Sample records for intel xeon phi

  1. Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors

    NASA Astrophysics Data System (ADS)

    Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.

    2016-05-01

    This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for the Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency, and vectorization, leading to an overall 4.2× speedup on CPU and 7.5× on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6×.
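
    A minimal illustrative sketch (not PICADOR source code; all names are hypothetical) of the data-locality and vectorization optimizations described above: a structure-of-arrays particle container whose unit-stride component arrays let the compiler generate 512-bit vector code for the position update.

      // pic_push.cpp -- hedged sketch of an SoA particle push with OpenMP SIMD.
      #include <cstddef>
      #include <vector>

      struct ParticlesSoA {
          std::vector<double> x, y, z;    // positions, one contiguous array per component
          std::vector<double> vx, vy, vz; // velocities
      };

      void push_positions(ParticlesSoA& p, double dt) {
          const std::size_t n = p.x.size();
          // Unit-stride access per component array vectorizes cleanly on the
          // 512-bit units of the Xeon Phi (and on AVX CPUs).
          #pragma omp parallel for simd
          for (std::size_t i = 0; i < n; ++i) {
              p.x[i] += p.vx[i] * dt;
              p.y[i] += p.vy[i] * dt;
              p.z[i] += p.vz[i] * dt;
          }
      }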

  2. Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors

    NASA Astrophysics Data System (ADS)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multiple levels of parallelism, combining tightly coupled thread-level and task-level parallelism with the 512-bit vector registers. The simulation results show that the parallel performance of SIMD implementations on the Xeon Phi is clearly superior to that of the x86 CPU architecture.
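
    A hedged sketch of the hierarchical thread-plus-SIMD scheme described above, using a simplified Lennard-Jones pair kernel as a stand-in for the actual Tersoff implementation; the fixed-width neighbour list and all names are assumptions.

      // md_forces.cpp -- outer loop threaded over atoms, inner loop vectorized
      // over neighbours to fill the 512-bit vector registers.
      #include <cstddef>

      void pair_forces_x(const double* x, const double* y, const double* z,
                         double* fx, const int* neigh, const int* nneigh,
                         std::size_t natoms) {
          #pragma omp parallel for schedule(dynamic, 64)
          for (std::size_t i = 0; i < natoms; ++i) {
              double fxi = 0.0;
              const int m = nneigh[i];
              #pragma omp simd reduction(+:fxi)
              for (int k = 0; k < m; ++k) {
                  const int j = neigh[i * 128 + k]; // hypothetical fixed-width list
                  const double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
                  const double r2 = dx * dx + dy * dy + dz * dz;
                  const double inv6 = 1.0 / (r2 * r2 * r2);
                  fxi += dx * 24.0 * inv6 * (2.0 * inv6 - 1.0) / r2; // LJ force, x-component
              }
              fx[i] = fxi;
          }
      }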

  3. Evaluating the transport layer of the ALFA framework for the Intel® Xeon Phi™ Coprocessor

    NASA Astrophysics Data System (ADS)

    Santogidis, Aram; Hirstius, Andreas; Lalis, Spyros

    2015-12-01

    The ALFA framework supports the software development of major High Energy Physics experiments. As part of our research effort to optimize the transport layer of ALFA, we focus on profiling its data transfer performance for inter-node communication on the Intel Xeon Phi coprocessor. In this article we present the collected performance measurements with the related analysis of the results. The optimization opportunities that we discover help us formulate future plans for enabling high-performance data transfer for ALFA on the Intel Xeon Phi architecture.

  4. Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

    PubMed

    Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal

    2015-06-01

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
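
    One common way to drive a Xeon Phi coprocessor with pragmas, sketched below as a hedged illustration (not the eFindSite source; requires the classic Intel compiler's offload support, and the kernel body is a hypothetical stand-in). With a compiler that lacks offload support, the pragma is ignored and the loop simply runs on the host.

      // offload_sketch.cpp -- ship inputs to the coprocessor, run the OpenMP
      // region there, copy results back.
      void score_targets(const float* features, float* scores, int n) {
          #pragma offload target(mic:0) in(features : length(n)) out(scores : length(n))
          {
              #pragma omp parallel for
              for (int i = 0; i < n; ++i)
                  scores[i] = features[i] * 0.5f; // stand-in for the alignment kernel
          }
      }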

  5. Exact diagonalization of quantum lattice models on coprocessors

    NASA Astrophysics Data System (ADS)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
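
    The dominant cost in each Lanczos step is the sparse matrix-vector product with the Hamiltonian. A minimal OpenMP sketch of that step follows (CSR storage and all names are assumptions, not the paper's code):

      // lanczos_step.cpp -- w = H*v and alpha = <v, w> in one threaded pass.
      #include <cstddef>
      #include <vector>

      struct CSR {
          std::vector<double> val;
          std::vector<int> col, rowptr; // rowptr holds nrows+1 offsets
      };

      double lanczos_step(const CSR& H, const std::vector<double>& v,
                          std::vector<double>& w) {
          const int n = static_cast<int>(v.size());
          double alpha = 0.0;
          #pragma omp parallel for reduction(+:alpha)
          for (int i = 0; i < n; ++i) {
              double s = 0.0;
              for (int k = H.rowptr[i]; k < H.rowptr[i + 1]; ++k)
                  s += H.val[k] * v[H.col[k]];
              w[i] = s;
              alpha += v[i] * s;
          }
          return alpha;
      }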

  6. Optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme for Intel Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.

    2015-05-01

    Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of the Xeon Phi requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved the performance of the original code on the Xeon Phi 7120P by a factor of 1.3x.

  7. Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather Research and Forecasting (WRF) Purdue-Lin microphysics scheme

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow, and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores, connected to a CPU via the PCI Express (PCIe) bus. We discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for the Xeon Phi. In particular, getting good performance required utilizing multiple cores, wide vector operations, and efficient use of memory. The results show that the optimizations improved the performance of the original code on the Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on an Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.

  8. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    NASA Astrophysics Data System (ADS)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.

  9. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes

    PubMed Central

    2017-01-01

    Optimizing the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding an NVIDIA K80 to a dual socket workstation. PMID:28582389

  10. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes.

    PubMed

    Einkemmer, Lukas

    2017-01-01

    Optimizing the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding an NVIDIA K80 to a dual socket workstation.

  11. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    NASA Astrophysics Data System (ADS)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation, a computationally intensive part of PET image reconstruction and a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10^3 to 10^4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path, taking attenuation into account, is determined. The core components are implemented using the Intel SPMD program compiler (ISPC) to enable a portable implementation with efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented with hardware-specific intrinsic instructions (so-called `intrinsics') to allow manually optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon E5-2670 CPUs, each with 8 cores at 2.6 to 3.3 GHz. Using a second Xeon Phi card, the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and host implementations. The examination showed that a reasonable speedup of the sensitivity map calculation can be achieved on the Xeon Phi by either a portable or a hardware-specific implementation.

  12. Evaluation of the Xeon phi processor as a technology for the acceleration of real-time control in high-order adaptive optics systems

    NASA Astrophysics Data System (ADS)

    Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah; Vick, Andy; Schnetler, Hermine

    2014-08-01

    We present wavefront reconstruction acceleration of high-order AO systems using an Intel Xeon Phi processor. The Xeon Phi is a coprocessor providing many integrated cores and designed for accelerating compute intensive, numerical codes. Unlike other accelerator technologies, it allows virtually unchanged C/C++ to be recompiled to run on the Xeon Phi, giving the potential of making development, upgrade and maintenance faster and less complex. We benchmark the Xeon Phi in the context of AO real-time control by running a matrix vector multiply (MVM) algorithm. We investigate variability in execution time and demonstrate a substantial speed-up in loop frequency. We examine the integration of a Xeon Phi into an existing RTC system and show that performance improvements can be achieved with limited development effort.
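
    The benchmarked MVM reconstruction step is a dense control-matrix times slope-vector product; a minimal OpenMP sketch follows (names and layout are illustrative, not the paper's RTC code):

      // ao_mvm.cpp -- actuator commands = M (nact x nslopes, row-major) * slopes.
      #include <cstddef>

      void reconstruct(const float* M, const float* slopes, float* actuators,
                       std::size_t nact, std::size_t nslopes) {
          #pragma omp parallel for
          for (std::size_t i = 0; i < nact; ++i) {
              float acc = 0.0f;
              #pragma omp simd reduction(+:acc)
              for (std::size_t j = 0; j < nslopes; ++j)
                  acc += M[i * nslopes + j] * slopes[j];
              actuators[i] = acc;
          }
      }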

  13. Revisiting Intel Xeon Phi optimization of Thompson cloud microphysics scheme in Weather Research and Forecasting (WRF) model

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen

    2015-10-01

    The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel MIC architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. New optimizations for an updated Thompson scheme are discussed in this paper. The optimizations improved the performance of the original Thompson code on the Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.

  14. Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

    PubMed

    Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas

    2015-01-01

    Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.

  15. GW Calculations of Materials on the Intel Xeon-Phi Architecture

    NASA Astrophysics Data System (ADS)

    Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek; Biller, Ariel; Chelikowsky, James R.; Louie, Steven G.

    Intel Xeon-Phi processors are expected to power a large number of High-Performance Computing (HPC) systems around the United States and the world in the near future. We evaluate the ability of GW and prerequisite Density Functional Theory (DFT) calculations for materials to utilize the Xeon-Phi architecture. We describe the optimization process and the performance improvements achieved. We find that the GW method, like other higher-level Many-Body methods beyond standard local/semilocal approximations to Kohn-Sham DFT, is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-waves, band-pairs and frequencies. Support provided by the SCIDAC program, Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. Grant Numbers DE-SC0008877 (Austin) and DE-AC02-05CH11231 (LBNL).

  16. Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

    PubMed

    Aprà, E; Kowalski, K

    2016-03-08

    In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.

  17. ELT-scale Adaptive Optics real-time control with the Intel Xeon Phi Many Integrated Core Architecture

    NASA Astrophysics Data System (ADS)

    Jenkins, David R.; Basden, Alastair; Myers, Richard M.

    2018-05-01

    We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) Architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter and so the next generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low power x86 CPU cores and high bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT scale single conjugate AO real-time control computation at over 1.0kHz with less than 20μs RMS jitter. We have also shown that with a wavefront sensor camera attached the KNL can process the real-time control loop at up to 966Hz, the maximum frame-rate of the camera, with jitter remaining below 20μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real time control.

  18. Performance tuning Weather Research and Forecasting (WRF) Goddard longwave radiative transfer scheme on Intel Xeon Phi

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2015-10-01

    The next-generation mesoscale numerical weather prediction system, the Weather Research and Forecasting (WRF) model, is designed for dual use in forecasting and research. WRF offers multiple physics options that can be combined in any way. One of the physics options is radiance computation. The major source of energy for the earth's climate is solar radiation; thus, it is imperative to accurately model the horizontal and vertical distribution of the heating. The Goddard solar radiative transfer model includes the absorption due to water vapor, ozone, oxygen, carbon dioxide, clouds and aerosols. The model computes the interactions among the absorption and scattering by clouds, aerosols, molecules and surface. Finally, fluxes are integrated over the entire longwave spectrum. In this paper, we present our results of optimizing the Goddard longwave radiative transfer scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The optimizations improved the performance of the original Goddard longwave radiative transfer scheme on the Xeon Phi 7120P by a factor of 2.2x. Furthermore, the same optimizations improved the performance of the scheme on a dual-socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 2.1x compared to the original code.

  19. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .

  20. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P, 1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  1. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.

    PubMed

    Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H

    2015-10-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel ® Xeon Phi ™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 × on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 × and 1.62 × , respectively, as compared to efficient CPU and GPU implementations.

  2. Intel Xeon Phi accelerated Weather Research and Forecasting (WRF) Goddard microphysics scheme

    NASA Astrophysics Data System (ADS)

    Mielikainen, J.; Huang, B.; Huang, A. H.-L.

    2014-12-01

    The Weather Research and Forecasting (WRF) model is a numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. WRF development is done in collaboration around the globe, and WRF is used by academic atmospheric scientists, weather forecasters at operational centers, and others. WRF contains several physics components, of which the most time-consuming is the microphysics. One microphysics scheme is the Goddard cloud microphysics scheme, a sophisticated scheme in the WRF model. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the Goddard scheme code. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs, rather than just kernels as the GPU does. The MIC coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved the performance of the Goddard microphysics scheme on the Xeon Phi 7120P by a factor of 4.7×. In addition, the optimizations reduced the Goddard microphysics scheme's share of the total WRF processing time from 20.0% to 7.5%. Furthermore, the same optimizations improved performance on an Intel Xeon E5-2670 by a factor of 2.8× compared to the original code.

  3. Does the Intel Xeon Phi processor fit HEP workloads?

    NASA Astrophysics Data System (ADS)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  4. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    NASA Astrophysics Data System (ADS)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider a standing wave simulation that is interesting from a computational point of view, solving coupled 2D perturbed sine-Gordon equations. We present an OpenMP realization that exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the HybriLIT cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.

  5. Investigating the Use of the Intel Xeon Phi for Event Reconstruction

    NASA Astrophysics Data System (ADS)

    Sherman, Keegan; Gilfoyle, Gerard

    2014-09-01

    The physics goal of Jefferson Lab is to understand how quarks and gluons form nuclei, and the lab is being upgraded to a higher, 12-GeV beam energy. The new CLAS12 detector in Hall B will collect 5-10 terabytes of data per day and will require considerable computing resources. We are investigating tools, such as the Intel Xeon Phi, to speed up the event reconstruction. The Kalman Filter is one of the methods being studied. It is a linear algebra algorithm that estimates the state of a system by combining existing data and predictions of those measurements. The tools required to apply this technique (i.e. matrix multiplication, matrix inversion) are being written using C++ intrinsics for Intel's Xeon Phi Coprocessor, which uses the Many Integrated Core (MIC) architecture. The Intel MIC is a new high-performance chip that connects to a host machine through the PCIe bus and is built to run highly vectorized and parallelized code, making it a well-suited device for applications such as the Kalman Filter. Our tests of the MIC-optimized algorithms needed for the filter show significant increases in speed. For example, multiplication of 5x5 matrices on the MIC ran up to 69 times faster than on the host core. Work supported by the University of Richmond and the US Department of Energy.
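
    A hedged sketch of a masked 5x5 single-precision matrix multiply with 512-bit intrinsics, in the spirit of the kernels described above. It uses AVX-512F intrinsic names; the original KNC coprocessor used the closely related IMCI intrinsics, so treat this as an illustration rather than the group's actual code.

      // kalman_mm5.cpp -- C = A * B for row-major 5x5 float matrices.
      #include <immintrin.h>

      void matmul5x5(const float* A, const float* B, float* C) {
          const __mmask16 row5 = 0x1F; // low 5 lanes of a 16-float vector
          for (int i = 0; i < 5; ++i) {
              __m512 acc = _mm512_setzero_ps();
              for (int k = 0; k < 5; ++k) {
                  const __m512 a = _mm512_set1_ps(A[i * 5 + k]);           // broadcast A(i,k)
                  const __m512 b = _mm512_maskz_loadu_ps(row5, &B[k * 5]); // row k of B
                  acc = _mm512_fmadd_ps(a, b, acc);                        // acc += A(i,k)*B(k,:)
              }
              _mm512_mask_storeu_ps(&C[i * 5], row5, acc);                 // row i of C
          }
      }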

  6. Accelerating gravitational microlensing simulations using the Xeon Phi coprocessor

    NASA Astrophysics Data System (ADS)

    Chen, B.; Kantowski, R.; Dai, X.; Baron, E.; Van der Mark, P.

    2017-04-01

    Recently Graphics Processing Units (GPUs) have been used to speed up very CPU-intensive gravitational microlensing simulations. In this work, we use the Xeon Phi coprocessor to accelerate such simulations and compare its performance on a microlensing code with that of NVIDIA's GPUs. For the selected set of parameters evaluated in our experiment, we find that the speedup by Intel's Knights Corner coprocessor is comparable to that by NVIDIA's Fermi family of GPUs with compute capability 2.0, but less significant than GPUs with higher compute capabilities such as the Kepler. However, the very recently released second generation Xeon Phi, Knights Landing, is about 5.8 times faster than the Knights Corner, and about 2.9 times faster than the Kepler GPU used in our simulations. We conclude that the Xeon Phi is a very promising alternative to GPUs for modern high performance microlensing simulations.

  7. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

    DOE PAGES

    Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...

    2017-10-04

    The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.
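
    A minimal sketch of the hybrid MPI-plus-threads pattern the paper applies to the SCF loop: a few MPI ranks, each with an OpenMP team sharing one rank-level accumulator, which is what cuts the memory footprint. The task loop is a hypothetical stand-in for ERI/Fock work, not GAMESS code.

      // hybrid_scf.cpp -- compile with an MPI wrapper and OpenMP enabled.
      #include <mpi.h>

      int main(int argc, char** argv) {
          int provided, rank, nranks;
          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nranks);

          const long ntasks = 1000000; // e.g. integral batches
          double local = 0.0;
          // Each rank takes a strided share of tasks; its threads reduce into
          // a single rank-level accumulator instead of one copy per process.
          #pragma omp parallel for reduction(+:local)
          for (long t = rank; t < ntasks; t += nranks)
              local += 1e-6 * t; // stand-in for ERI/Fock contributions

          double global = 0.0;
          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          MPI_Finalize();
          return 0;
      }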

  8. Application of Intel Many Integrated Core (MIC) accelerators to the Pleim-Xiu land surface scheme

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2015-10-01

    The land-surface model (LSM) is one physics process in the weather research and forecast (WRF) model. The LSM combines atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes with internal information on the land's state variables and land-surface properties, and provides heat and moisture fluxes over land points and sea-ice points. The Pleim-Xiu (PX) scheme is one LSM; it features three pathways for moisture fluxes: evapotranspiration, soil evaporation, and evaporation from wet canopies. To accelerate this scheme, we employ the Intel Xeon Phi Many Integrated Core (MIC) architecture, a many-core design offering efficient parallelization and vectorization. Our results show that the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves the performance by 2.3x and 11.7x compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, respectively.

  9. Acceleration of Monte Carlo simulation of photon migration in complex heterogeneous media using Intel many-integrated core architecture.

    PubMed

    Gorshkov, Anton V; Kirillin, Mikhail Yu

    2015-08-01

    Over two decades, the Monte Carlo technique has become a gold standard in the simulation of light propagation in turbid media, including biotissues. Technological solutions provide further advances of this technique. The Intel Xeon Phi coprocessor is a new type of accelerator for highly parallel general-purpose computing, which allows execution of a wide range of applications without substantial code modification. We present a technical approach for porting our previously developed Monte Carlo (MC) code for simulation of light transport in tissues to the Intel Xeon Phi coprocessor. We show that employing the accelerator reduces the computational time of MC simulation and yields a simulation speed-up comparable to a GPU. We demonstrate the performance of the developed code for simulation of light transport in the human head and determination of the measurement volume in near-infrared spectroscopy brain sensing.

  10. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.
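
    Optimisation (5) above replaces file-based data exchange with direct MPI calls; below is a hedged illustration of the idea (the buffer names and the gather-to-root pattern are assumptions, not GNAQPMS code):

      // mpi_exchange.cpp -- one collective instead of writing an interface file
      // that another rank reads back from disk.
      #include <mpi.h>
      #include <vector>

      void gather_field(const std::vector<double>& local_field, int rank, int nranks) {
          std::vector<double> all;
          if (rank == 0) all.resize(local_field.size() * nranks);
          MPI_Gather(local_field.data(), static_cast<int>(local_field.size()), MPI_DOUBLE,
                     all.data(), static_cast<int>(local_field.size()), MPI_DOUBLE,
                     0, MPI_COMM_WORLD);
      }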

  11. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™

    PubMed Central

    Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.

    2016-01-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591

  12. Optimizing the Betts-Miller-Janjic cumulus parameterization with Intel Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.-L.

    2015-10-01

    Cumulus parameterization schemes are responsible for the sub-grid-scale effects of convective and/or shallow clouds, and are intended to represent vertical fluxes due to unresolved updrafts and downdrafts and compensating motion outside the clouds. Some schemes additionally provide cloud and precipitation field tendencies in the convective column, and momentum tendencies due to convective transport of momentum. The schemes all provide the convective component of surface rainfall. Betts-Miller-Janjic (BMJ) is one scheme that fulfills these purposes in the weather research and forecast (WRF) model. The National Centers for Environmental Prediction (NCEP) has tried to optimize the BMJ scheme for operational application. As there are no interactions among horizontal grid points, this scheme is very suitable for parallel computation, and the efficient parallelization and vectorization essentials of the Intel Xeon Phi Many Integrated Core (MIC) architecture allow us to optimize it. Compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves the performance by 2.4x and 17.0x, respectively.

  13. Optimizing meridional advection of the Advanced Research WRF (ARW) dynamics for Intel Xeon Phi coprocessor

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.

    2015-05-01

    The most widely used community weather forecast and research model in the world is the Weather Research and Forecast (WRF) model. Two distinct varieties of WRF exist. The one we are interested in, the Advanced Research WRF (ARW), is an experimental, advanced research version featuring very high resolution; the WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we optimize a meridional (north-south direction) advection subroutine for the Intel Xeon Phi coprocessor. Advection is one of the most time-consuming routines in the ARW dynamics core. It advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture, and discuss lessons learned from the code optimization process. The results show that the optimizations improved the performance of the original code on the Xeon Phi 7120P by a factor of 1.2x.

  14. Intel Many Integrated Core (MIC) architecture optimization strategies for a memory-bound Weather Research and Forecasting (WRF) Goddard microphysics scheme

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. WRF is a widely used weather prediction system whose development is done in collaboration around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core (MIC) architecture hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture; it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs, rather than just kernels as GPUs do. The MIC coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved the performance of the original code on the Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual-socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.

  15. Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sitaraman, Hariswaran; Grout, Ray W

    This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero- and multi-dimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via a large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandy Bridge/Ivy Bridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine-grain parallelism, and extensive use of vendor-supported libraries (Cilk Plus and the Math Kernel Library). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~3 speed-up using the Intel 2013 compiler and ~1.5 using the Intel 2017 compiler for large chemical mechanisms, compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general-purpose computational fluid dynamics codes.

  16. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    NASA Astrophysics Data System (ADS)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2015-05-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  17. HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

    DOE PAGES

    Dongarra, Jack; Gates, Mark; Haidar, Azzam; ...

    2015-01-01

    This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK-compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.

  18. Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi

    DOE PAGES

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...

    2015-05-22

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  19. Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

    NASA Astrophysics Data System (ADS)

    Stoykov, S.; Atanassov, E.; Margenov, S.

    2016-10-01

    Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method involves simultaneous operations with sparse and dense matrices, and one of its computationally expensive operations is the multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse-matrix by dense-matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse-matrix by dense-matrix algorithm and the one provided by Intel MKL, and it is shown that by considering the properties of the sparse matrix, better algorithms can be developed.
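
    For reference, a baseline CSR-times-dense product looks like the following (a minimal sketch; the paper's algorithm additionally exploits the FEM structure of the sparse matrix, which is not reproduced here):

      // spmm.cpp -- C (n x m) = S (n x n, CSR) * D (n x m), C and D dense row-major.
      #include <cstddef>
      #include <vector>

      void csr_times_dense(const std::vector<double>& val, const std::vector<int>& col,
                           const std::vector<int>& rowptr, const double* D, double* C,
                           int n, int m) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
              double* Ci = &C[static_cast<std::size_t>(i) * m];
              for (int j = 0; j < m; ++j) Ci[j] = 0.0;
              for (int k = rowptr[i]; k < rowptr[i + 1]; ++k) {
                  const double s = val[k];
                  const double* Dr = &D[static_cast<std::size_t>(col[k]) * m];
                  #pragma omp simd
                  for (int j = 0; j < m; ++j)
                      Ci[j] += s * Dr[j]; // unit-stride, maps onto 512-bit FMAs
              }
          }
      }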

  20. Discrete Particle Model for Porous Media Flow using OpenFOAM at Intel Xeon Phi Coprocessors

    NASA Astrophysics Data System (ADS)

    Shang, Zhi; Nandakumar, Krishnaswamy; Liu, Honggao; Tyagi, Mayank; Lupo, James A.; Thompson, Karten

    2015-11-01

    The discrete particle model (DPM) in OpenFOAM was used to study turbulent solid-particle suspension flows through the porous media of a natural dual-permeability rock. The 2D and 3D pore geometries of the porous media were generated by sphere packing with a radius ratio of 3; the porosity is about 38%, the same as the natural dual-permeability rock. In the 2D case, the mesh reaches 5 million cells with 1 million solid particles, and in the 3D case, the mesh exceeds 10 million cells with 5 million solid particles. The solid particle sizes follow a Gaussian distribution from 20 μm to 180 μm with a mean of 100 μm. Through the numerical simulations, we studied both the HPC performance of Intel Xeon Phi coprocessors and the flow behavior of large-scale solid suspension flows in porous media. The authors would like to thank the support by IPCC@LSU-Intel Parallel Computing Center (LSU # Y1SY1-1) and the HPC resources at Louisiana State University (http://www.hpc.lsu.edu).

  1. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.

    2017-10-20

    Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5-2698v3 processors.
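
    A hedged sketch of the hybrid MPI+OpenMP pattern described above, assuming the tall-and-skinny matrix is distributed by rows and that the Gram matrix A^T A is the product in question; this illustrates the idea, not NWChem's actual code:

    ```cpp
    #include <cstddef>
    #include <mpi.h>
    #include <vector>

    // Hybrid MPI+OpenMP sketch: each rank holds local_rows rows of a
    // tall-and-skinny matrix A (n columns, n small). The Gram matrix
    // C = A^T * A is formed from threaded local products plus one Allreduce.
    void gram(const std::vector<double>& A, int local_rows, int n,
              std::vector<double>& C /* n*n, row-major */)
    {
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double s = 0.0;
                for (int k = 0; k < local_rows; ++k)
                    s += A[(std::size_t)k * n + i] * A[(std::size_t)k * n + j];
                C[(std::size_t)i * n + j] = s;
            }
        // Sum the partial Gram matrices across all MPI ranks.
        MPI_Allreduce(MPI_IN_PLACE, C.data(), n * n, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
    }
    ```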

  2. Scaling Support Vector Machines On Modern HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Fu, Haohuan; Song, Shuaiwen

    2015-02-01

    We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures; these can also serve as general optimization methods for other machine learning tools.

  3. Wilson and Domainwall Kernels on Oakforest-PACS

    NASA Astrophysics Data System (ADS)

    Kanamori, Issaku; Matsufuru, Hideo

    2018-03-01

    We report the performance of Wilson and Domainwall kernels on a new Intel Xeon Phi Knights Landing based machine named Oakforest-PACS, which is co-hosted by the University of Tokyo and the University of Tsukuba and is currently the fastest in Japan. This machine uses Intel Omni-Path for the internode network. We compare the performance of several types of implementation, including one that makes use of the Grid library. The code is incorporated into the Bridge++ code set.

  4. Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

    NASA Astrophysics Data System (ADS)

    Hadade, Ioan; di Mare, Luca

    2016-08-01

    Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel Sandy Bridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
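
    As a minimal illustration of the data-parallel style these optimisations target (the Roe flux kernel itself is far more involved), a SIMD-friendly flux-difference update over a structure-of-arrays layout could look like this sketch:

    ```cpp
    #include <cstddef>

    // Generic SIMD-friendly flux-difference update in SoA layout: one array per
    // flow variable, unit-stride access, no aliasing. This only illustrates the
    // data layout and pragma usage, not the paper's actual kernels.
    void update_cells(std::size_t n, double dt_over_dx,
                      const double* __restrict flux, // face fluxes, size n+1
                      double* __restrict u)          // cell values, size n
    {
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i)
            u[i] -= dt_over_dx * (flux[i + 1] - flux[i]);
    }
    ```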

  5. GeantV: from CPU to accelerators

    NASA Astrophysics Data System (ADS)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Arora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Sehgal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, the Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs with vector units, but we have foreseen from the early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach allows us to abstract out basic types such as scalar/vector, but also to formalize generic computation kernels transparently using library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This keeps our application maintainable in the long term and versatile to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.

  6. Reducing adaptive optics latency using Xeon Phi many-core processors

    NASA Astrophysics Data System (ADS)

    Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah

    2015-11-01

    The next generation of Extremely Large Telescopes (ELTs) for astronomy will rely heavily on the performance of their adaptive optics (AO) systems. Real-time control is at the heart of the critical technologies that will enable telescopes to deliver the best possible science, and it will require a very significant extrapolation from the AO hardware currently existing for 4-10 m telescopes. Investigating novel real-time computing architectures and testing their eligibility against anticipated challenges is one of the main priorities of technology development for the ELTs. This paper investigates the suitability of the Intel Xeon Phi, which is a commercial off-the-shelf hardware accelerator. We focus on wavefront reconstruction performance, implementing a straightforward matrix-vector multiplication (MVM) algorithm. We present benchmarking results of the Xeon Phi on a real-time Linux platform, both as a standalone processor and integrated into an existing real-time controller (RTC). The performance of single and multiple Xeon Phis is investigated. We show that this technology has the potential of greatly reducing the mean latency and the variations in execution time (jitter) of large AO systems. We present a detailed performance analysis of the Xeon Phi for a typical E-ELT first-light instrument, along with a more general approach that enables us to extend to any AO system size. We show that systematic and detailed performance analysis is an essential part of testing novel real-time control hardware to guarantee optimal science results.
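
    A toy version of such an MVM benchmark, with per-iteration timing to estimate mean latency and jitter, is sketched below; the matrix sizes and iteration count are placeholders, not the paper's E-ELT parameters:

    ```cpp
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Repeated dense matrix-vector multiply (slopes -> actuator commands),
    // recording per-iteration latency to estimate mean execution time and jitter.
    int main() {
        const int rows = 2048, cols = 4096, iters = 100;
        std::vector<float> M((std::size_t)rows * cols, 0.5f), x(cols, 1.0f), y(rows);
        std::vector<double> t(iters);
        for (int it = 0; it < iters; ++it) {
            auto t0 = std::chrono::steady_clock::now();
            #pragma omp parallel for
            for (int i = 0; i < rows; ++i) {
                float s = 0.0f;
                const float* Mi = &M[(std::size_t)i * cols];
                #pragma omp simd reduction(+:s)
                for (int j = 0; j < cols; ++j) s += Mi[j] * x[j];
                y[i] = s;
            }
            auto t1 = std::chrono::steady_clock::now();
            t[it] = std::chrono::duration<double, std::milli>(t1 - t0).count();
        }
        double mn = *std::min_element(t.begin(), t.end());
        double mx = *std::max_element(t.begin(), t.end());
        std::printf("latency: min %.3f ms, max %.3f ms, jitter %.3f ms\n",
                    mn, mx, mx - mn);
        return 0;
    }
    ```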

  7. Deploying electromagnetic particle-in-cell (EM-PIC) codes on Xeon Phi accelerator boards

    NASA Astrophysics Data System (ADS)

    Fonseca, Ricardo

    2014-10-01

    The complexity of the phenomena involved in several relevant plasma physics scenarios, where highly nonlinear and kinetic processes dominate, makes purely theoretical descriptions impossible. Further understanding of these scenarios requires detailed numerical modeling, but fully relativistic particle-in-cell codes such as OSIRIS are computationally intensive. The quest towards Exaflop computer systems has led to the development of HPC systems based on add-on accelerator cards, such as GPGPUs and, more recently, the Xeon Phi accelerators that power the current number 1 system in the world. These cards, also referred to as Intel Many Integrated Core Architecture (MIC), offer peak theoretical performances of >1 TFlop/s for general purpose calculations on a single board, and are receiving significant attention as an attractive alternative to CPUs for plasma modeling. In this work we report on our efforts towards the deployment of an EM-PIC code on a Xeon Phi architecture system. We will focus on the parallelization and vectorization strategies followed, and present a detailed evaluation of code performance in comparison with the CPU code.

  8. Time-efficient simulations of tight-binding electronic structures with Intel Xeon Phi™ many-core processors

    NASA Astrophysics Data System (ADS)

    Ryu, Hoon; Jeong, Yosang; Kang, Ji-Hoon; Cho, Kyu Nam

    2016-12-01

    Modelling of multi-million atomic semiconductor structures is important as it not only predicts the properties of physically realizable novel materials, but can also accelerate advanced device designs. This work elaborates a new Technology-Computer-Aided-Design (TCAD) tool for nanoelectronics modelling, which uses a sp3d5s∗ tight-binding approach to describe multi-million atomic structures and simulates electronic structures with high performance computing (HPC), including atomic effects such as alloy and dopant disorders. Named the Quantum simulation tool for Advanced Nanoscale Devices (Q-AND), the tool shows good scalability on traditional multi-core HPC clusters, implying a strong capability for large-scale electronic structure simulations, with particularly remarkable performance enhancement on the latest clusters of Intel Xeon Phi™ coprocessors. A review of a recent modelling study conducted to understand an experimental work on highly phosphorus-doped silicon nanowires is presented to demonstrate the utility of Q-AND. Having been developed via an Intel Parallel Computing Center project, Q-AND will be opened to the public to establish a sound framework for nanoelectronics modelling with advanced many-core HPC clusters. With details of the development methodology and an exemplary study of dopant electronics, this work presents a practical guideline for TCAD development to researchers in the field of computational nanoelectronics.

  9. Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doerfler, Douglas; Austin, Brian; Cook, Brandon

    There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.

  10. Acceleration of boundary element method for linear elasticity

    NASA Astrophysics Data System (ADS)

    Zapletal, Jan; Merta, Michal; Čermák, Martin

    2017-07-01

    In this work we describe the accelerated assembly of system matrices for the boundary element method using the Intel Xeon Phi coprocessors. We present a model problem, provide a brief overview of its discretization and acceleration of the system matrices assembly using the coprocessors, and test the accelerated version using a numerical benchmark.

  11. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

    NASA Astrophysics Data System (ADS)

    Lyakh, Dmitry I.

    2015-04-01

    An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). Particular emphasis is placed on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). The tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
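
    The core idea of cache-aware transposition can be illustrated with a blocked 2D transpose, the building block of the higher-dimensional permutations discussed above; the tile size and layout here are illustrative assumptions:

    ```cpp
    #include <cstddef>

    // Cache-blocked 2D transpose: B = A^T for an m x n row-major matrix.
    // Tiling keeps both the read and write streams resident in cache, unlike
    // the naive scattering loop with its strided writes.
    void transpose_blocked(const double* A, double* B,
                           std::size_t m, std::size_t n, std::size_t tile = 32)
    {
        #pragma omp parallel for collapse(2)
        for (std::size_t ii = 0; ii < m; ii += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                for (std::size_t i = ii; i < ii + tile && i < m; ++i)
                    for (std::size_t j = jj; j < jj + tile && j < n; ++j)
                        B[j * m + i] = A[i * n + j];
    }
    ```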

  12. GeantV: From CPU to accelerators

    DOE PAGES

    Amadio, G.; Ananya, A.; Apostolakis, J.; ...

    2016-01-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, the Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs with vector units, but we have foreseen from the early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach allows us to abstract out basic types such as scalar/vector, but also to formalize generic computation kernels transparently using library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This keeps our application maintainable in the long term and versatile to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. Lastly, we also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.

  13. List-mode PET image reconstruction for motion correction using the Intel Xeon Phi co-processor

    NASA Astrophysics Data System (ADS)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires the projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple-data threads per core, with each thread having a 512-bit single-instruction, multiple-data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater on the Phi than on the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading, and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to the co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with a Xeon Phi co-processor card and a top-of-the-range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.

  14. Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery

    We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using Kokkos, an open-source framework for manycore platforms. A performance evaluation is presented on both Intel Sandy Bridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about a 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and a 19.2x speedup over serial Cholesky performance, which does not carry tasking overhead, using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.
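
    The algorithms-by-blocks idea can be sketched with a dense Cholesky-by-blocks using OpenMP task dependences. The paper itself targets a sparse incomplete factorization through a portable tasking API on Kokkos; this dense OpenMP analogue only illustrates how the block layout induces a task graph:

    ```cpp
    #include <cmath>

    // blk is an nb x nb array of pointers to b x b row-major blocks of a
    // symmetric positive definite matrix (lower triangle used).
    static void potrf(double* A, int b) {              // A := chol(A), lower
        for (int j = 0; j < b; ++j) {
            for (int k = 0; k < j; ++k) A[j*b+j] -= A[j*b+k] * A[j*b+k];
            A[j*b+j] = std::sqrt(A[j*b+j]);
            for (int i = j + 1; i < b; ++i) {
                for (int k = 0; k < j; ++k) A[i*b+j] -= A[i*b+k] * A[j*b+k];
                A[i*b+j] /= A[j*b+j];
            }
        }
    }
    static void trsm(const double* L, double* A, int b) { // A := A * L^-T
        for (int i = 0; i < b; ++i)
            for (int j = 0; j < b; ++j) {
                for (int k = 0; k < j; ++k) A[i*b+j] -= A[i*b+k] * L[j*b+k];
                A[i*b+j] /= L[j*b+j];
            }
    }
    static void syrk(const double* A, const double* B, double* C, int b) { // C -= A*B^T
        for (int i = 0; i < b; ++i)
            for (int j = 0; j < b; ++j)
                for (int k = 0; k < b; ++k) C[i*b+j] -= A[i*b+k] * B[j*b+k];
    }

    void chol_by_blocks(double** blk, int nb, int b) {
        #pragma omp parallel
        #pragma omp single
        for (int k = 0; k < nb; ++k) {
            // Factor the diagonal block; dependent tasks wait on its data.
            #pragma omp task depend(inout: blk[k*nb+k][0])
            potrf(blk[k*nb+k], b);
            for (int i = k + 1; i < nb; ++i) {
                #pragma omp task depend(in: blk[k*nb+k][0]) depend(inout: blk[i*nb+k][0])
                trsm(blk[k*nb+k], blk[i*nb+k], b);
            }
            for (int i = k + 1; i < nb; ++i)
                for (int j = k + 1; j <= i; ++j) {
                    #pragma omp task depend(in: blk[i*nb+k][0], blk[j*nb+k][0]) depend(inout: blk[i*nb+j][0])
                    syrk(blk[i*nb+k], blk[j*nb+k], blk[i*nb+j], b);
                }
        }
    }
    ```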

  15. An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

    DOE PAGES

    Lyakh, Dmitry I.

    2015-01-05

    An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). Particular emphasis is placed on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).

  16. (Re)engineering Earth System Models to Expose Greater Concurrency for Ultrascale Computing: Practice, Experience, and Musings

    NASA Astrophysics Data System (ADS)

    Mills, R. T.

    2014-12-01

    As the high performance computing (HPC) community pushes towards the exascale horizon, the importance and prevalence of fine-grained parallelism in new computer architectures is increasing. This is perhaps most apparent in the proliferation of so-called "accelerators" such as the Intel Xeon Phi or NVIDIA GPGPUs, but the trend also holds for CPUs, where serial performance has grown slowly and effective use of hardware threads and vector units are becoming increasingly important to realizing high performance. This has significant implications for weather, climate, and Earth system modeling codes, many of which display impressive scalability across MPI ranks but take relatively little advantage of threading and vector processing. In addition to increasing parallelism, next generation codes will also need to address increasingly deep hierarchies for data movement: NUMA/cache levels, on node vs. off node, local vs. wide neighborhoods on the interconnect, and even in the I/O system. We will discuss some approaches (grounded in experiences with the Intel Xeon Phi architecture) for restructuring Earth science codes to maximize concurrency across multiple levels (vectors, threads, MPI ranks), and also discuss some novel approaches for minimizing expensive data movement/communication.

  17. The parallel algorithm for the 2D discrete wavelet transform

    NASA Astrophysics Data System (ADS)

    Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

    2018-04-01

    The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing using single-core CPUs. However, considering parallel processing using multi-core processors, this scheme is inappropriate due to a large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on a 61-core Intel Xeon Phi and 8-core Intel Xeon processors.

  18. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-12-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques including Cellular Automata or returning to Hough Transform. The most common track finding techniques in use today are however those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust and are exactly those being used today for the design of the tracking system for HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with Kalman Filter can achieve large speedup both with Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.

  19. Kalman filter tracking on parallel architectures

    NASA Astrophysics Data System (ADS)

    Cerati, G.; Elmer, P.; Krutelyov, S.; Lantz, S.; Lefebvre, M.; McDermott, K.; Riley, D.; Tadel, M.; Wittich, P.; Wurthwein, F.; Yagil, A.

    2017-10-01

    We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on Nvidia GPUs.
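
    A minimal sketch of the structure-of-arrays idea that makes small-matrix Kalman operations vectorizable: store element (i,j) of N matrices contiguously and let the SIMD lanes run across tracks. This illustrates the concept only; it is not the authors' actual data structure:

    ```cpp
    #include <cstddef>
    #include <vector>

    // SoA batch of N small DxD matrices: element (i,j) of all N matrices is
    // stored contiguously, so one vector lane processes one track. Multiplying
    // N matrices then vectorizes across tracks rather than within one matrix.
    constexpr int D = 6;
    struct Batch {
        int N;
        std::vector<double> a; // a[(i*D+j)*N + n] = element (i,j) of matrix n
        explicit Batch(int n) : N(n), a((std::size_t)D * D * n, 0.0) {}
        double* at(int i, int j) { return &a[(std::size_t)(i * D + j) * N]; }
        const double* at(int i, int j) const { return &a[(std::size_t)(i * D + j) * N]; }
    };

    void multiply(const Batch& A, const Batch& B, Batch& C) {
        for (int i = 0; i < D; ++i)
            for (int j = 0; j < D; ++j) {
                double* c = C.at(i, j);
                for (int n = 0; n < C.N; ++n) c[n] = 0.0;
                for (int k = 0; k < D; ++k) {
                    const double* x = A.at(i, k);
                    const double* y = B.at(k, j);
                    #pragma omp simd
                    for (int n = 0; n < C.N; ++n)
                        c[n] += x[n] * y[n];  // same (i,j,k) op on N tracks at once
                }
            }
    }
    ```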

  20. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava

    Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs, and similar technologies. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.

  1. Wilson Dslash Kernel From Lattice QCD Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joo, Balint; Smelyanskiy, Mikhail; Kalamkar, Dhiraj D.

    2015-07-01

    Lattice Quantum Chromodynamics (LQCD) is a numerical technique used for calculations in theoretical nuclear and high energy physics. LQCD is traditionally one of the first applications ported to many new high performance computing architectures, and indeed LQCD practitioners have been known to design and build custom LQCD computers. Lattice QCD kernels are frequently used as benchmarks (e.g. 168.wupwise in the SPEC suite) and are generally well understood, and as such are ideal to illustrate several optimization techniques. In this chapter we detail our work in optimizing the Wilson-Dslash kernels for Intel Xeon Phi; however, as we will show, the technique gives excellent performance on the regular Xeon architecture as well.

  2. Modeling & Analysis of Multicore Architectures for Embedded SIGINT Applications

    DTIC Science & Technology

    2015-03-01

    Processor comparison (cores; clock, MHz; power, W; peak GFLOPS; GFLOPS/W):
      NVIDIA Kepler K20 [7][8]:   2496 cores, 706 MHz, 225 W, 3520 GFLOPS, 15.6 GFLOPS/W
      Intel Xeon Phi 5110P [9]:   60 cores, 1050 MHz, 225 W, 1010 GFLOPS, 4.5 GFLOPS/W
      Adapteva Epiphany [10]:     16-4K cores, 800 MHz, 0.270 W, 19 GFLOPS, 70.4 GFLOPS/W
    [...] Cortex A15 and a Kepler GPU with 192 "CUDA" cores, and is more comparable as an HPEEC platform than Tesla series GPUs, such as the NVIDIA C2075 and K20

  3. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

    PubMed

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-04-01

    Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with GATE/GEANT4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  4. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    NASA Astrophysics Data System (ADS)

    DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

    2018-03-01

    With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  5. Application of Intel Many Integrated Core (MIC) architecture to the Yonsei University planetary boundary layer scheme in Weather Research and Forecasting model

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecasting (WRF) model provides operational services worldwide in many areas and is linked to our daily activities, in particular during severe weather events. The Yonsei University (YSU) scheme is one of the planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column; it determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provides atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum over the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation process of the YSU scheme, we employ the Intel Many Integrated Core (MIC) architecture, whose efficient parallelization and vectorization capabilities are essential here. Our results show that the MIC-based optimization improved the performance of the first version of the multi-threaded code on a Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on an Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of the multi-threaded code.

  6. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  7. P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

    2017-03-14

    An increasing number of studies have used whole genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method of studying DNA cytosine methylation. However, existing tools often suffer from inaccuracy and long runtimes. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve these problems. Through an optimized complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt predicts DNA methylation status on large-scale datasets, speed and temporal-spatial efficiency remain problems. In order to address the cost of Smith-Waterman dynamic programming and the low temporal-spatial efficiency, we further designed a deeply parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run both on CPUs and Intel Xeon Phi coprocessors. Moreover, we deploy and evaluate Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt gains a deep acceleration on the CPU and Intel Xeon Phi heterogeneous platform, fully exploiting the advantages of the multi-core CPUs and many-core Phis.
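
    For reference, the dynamic-programming core being parallelized is classic Smith-Waterman local alignment; a minimal scalar version is sketched below with illustrative scoring parameters (the tool's actual scoring and parallel decomposition are not given in the abstract):

    ```cpp
    #include <algorithm>
    #include <string>
    #include <vector>

    // Classic Smith-Waterman local alignment score with a linear gap penalty,
    // using a two-row rolling DP table. Scoring values are illustrative only.
    int smith_waterman(const std::string& a, const std::string& b,
                       int match = 2, int mismatch = -1, int gap = 2)
    {
        std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
        int best = 0;
        for (std::size_t i = 1; i <= a.size(); ++i) {
            cur[0] = 0;
            for (std::size_t j = 1; j <= b.size(); ++j) {
                int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
                // Local alignment: scores never drop below zero.
                cur[j] = std::max({0, diag, prev[j] - gap, cur[j - 1] - gap});
                best = std::max(best, cur[j]);
            }
            std::swap(prev, cur);
        }
        return best;
    }
    ```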

  8. Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

    PubMed

    Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S

    2014-03-11

    Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Xeon Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations occurring on the accelerator and/or the host. For data transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB, with consistent upload and download rates of 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
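
    The automatic host-versus-accelerator selection can be sketched as a simple cost model comparing compute time against PCI-e transfer time; all rates and the speedup factor below are placeholder assumptions, not the paper's measured or implemented values:

    ```cpp
    // Hypothetical size-based dispatch for a matrix multiply offload: estimate
    // whether the accelerator's speed advantage outweighs PCI-e transfer cost.
    enum class Device { Host, Accelerator };

    Device choose_device(long m, long n, long k) {
        const double flops     = 2.0 * m * n * k;               // work in C = A*B
        const double bytes     = 8.0 * (m * k + k * n + m * n); // A, B in; C out
        const double pci_bps   = 5.0e9;   // ~5 GB/s PCI-e rate (assumed)
        const double acc_gain  = 2.0;     // assumed accelerator speedup factor
        const double host_rate = 2.0e11;  // assumed host rate, 200 GFLOP/s
        double t_host = flops / host_rate;
        double t_acc  = flops / (acc_gain * host_rate) + bytes / pci_bps;
        return t_acc < t_host ? Device::Accelerator : Device::Host;
    }
    ```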

  9. WinHPC System Configuration | High-Performance Computing | NREL

    Science.gov Websites

    CPUs with 48GB of memory. Node 04 has dual Intel Xeon E5530 CPUs with 24GB of memory. Nodes 05-20 have dual AMD Opteron 2374 HE CPUs with 16GB of memory. Nodes 21-30 have been decommissioned. Nodes 31-35 have dual Intel Xeon X5675 CPUs with 48GB of memory. Nodes 36-37 have dual Intel Xeon E5-2680 CPUs with

  10. Implementation of 5-layer thermal diffusion scheme in weather research and forecasting model with Intel Many Integrated Cores

    NASA Astrophysics Data System (ADS)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land-Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme, together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The efficient parallelization and vectorization features of the Intel Many Integrated Core (MIC) architecture allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the computing performance results for this scheme on the Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of the multi-threaded code on a Xeon Phi 5110P by a factor of 2.1x. Accordingly, the same CPU-based optimizations improved the performance on an Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of the multi-threaded code.
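
    The column independence that makes such LSM schemes thread-friendly can be illustrated with a toy explicit 5-layer diffusion step, threaded over horizontal grid points; the coefficients and boundary treatment are illustrative, not WRF's:

    ```cpp
    #include <cstddef>

    // Horizontal grid points do not interact, so the outer loop threads cleanly
    // and the short vertical loop stays in registers/cache. An explicit time
    // step is shown for brevity (requires kappa*dt/dz^2 <= 0.5 for stability).
    void soil_diffusion_step(int ncols, double dt, double dz, double kappa,
                             double* T /* [ncols][5], layer-major per column */)
    {
        const double r = kappa * dt / (dz * dz);
        #pragma omp parallel for
        for (int c = 0; c < ncols; ++c) {
            double* Tc = T + (std::size_t)c * 5;
            double Tnew[5];
            for (int l = 0; l < 5; ++l) {
                double up   = (l > 0) ? Tc[l - 1] : Tc[0]; // zero-flux boundaries
                double down = (l < 4) ? Tc[l + 1] : Tc[4];
                Tnew[l] = Tc[l] + r * (up - 2.0 * Tc[l] + down);
            }
            for (int l = 0; l < 5; ++l) Tc[l] = Tnew[l];
        }
    }
    ```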

  11. Exploiting MIC architectures for the simulation of channeling of charged particles in crystals

    NASA Astrophysics Data System (ADS)

    Bagli, Enrico; Karpusenko, Vadim

    2016-08-01

    The study of coherent effects of ultra-relativistic particles in crystals is a developing area of science. DYNECHARM++ is a toolkit for the simulation of coherent interactions between high-energy charged particles and complex crystal structures. The particle trajectory in a crystal is computed through numerical integration of the equation of motion. The code was revised and improved in order to exploit parallelization on multi-cores and vectorization of single instructions on multiple data. An Intel Xeon Phi card was adopted for the performance measurements. The computation time was shown to scale linearly as a function of the number of physical and virtual cores. By enabling the compiler's auto-vectorization flag, a threefold speedup was obtained. The performance of the card was compared with that of a dual-Xeon system.
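
    A sketch of the kind of integration loop that benefits from auto-vectorization, assuming a simple harmonic approximation of the interplanar potential (the actual DYNECHARM++ potentials and integrator are more elaborate):

    ```cpp
    #include <cstddef>

    // SoA velocity-Verlet step for particles in a planar harmonic channel,
    // U(x) = 0.5*k*x^2, so acceleration a = -(k/m)*x. Unit-stride arrays and a
    // branch-free body let the compiler vectorize the loop automatically.
    void step(std::size_t n, double dt, double k_over_m,
              double* __restrict x, double* __restrict v)
    {
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i) {
            double a0 = -k_over_m * x[i];
            x[i] += v[i] * dt + 0.5 * a0 * dt * dt;
            double a1 = -k_over_m * x[i];
            v[i] += 0.5 * (a0 + a1) * dt;
        }
    }
    ```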

  12. Performance of VPIC on Trinity

    NASA Astrophysics Data System (ADS)

    Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Li, H.; Nam, H. A.; Pang, X.; Rust, W. N., III; Wohlbier, J.; Yin, L.; Albright, B. J.

    2016-10-01

    Trinity is a new major DOE computing resource which is going through final acceptance testing at Los Alamos National Laboratory. Trinity has several new and unique architectural features, including two compute partitions, one with dual-socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes. Additional unique features include the use of on-package high bandwidth memory (HBM) on the KNL nodes, the ability to configure the KNL nodes with respect to HBM mode and on-die network topology in a variety of operational modes at run time, and the use of solid state storage via burst buffer technology to reduce the time required to perform I/O. An effort is in progress to port and optimize VPIC for Trinity and evaluate its performance. Because VPIC was recently released as open source, it is being used as part of acceptance testing for Trinity and is participating in the Trinity Open Science Program, which has resulted in excellent collaboration with both Cray and Intel. Results of this work will be presented on the performance of VPIC on both the Haswell and KNL partitions, for both single-node runs and runs at scale. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  13. DD-αAMG on QPACE 3

    NASA Astrophysics Data System (ADS)

    Georg, Peter; Richtmann, Daniel; Wettig, Tilo

    2018-03-01

    We describe our experience porting the Regensburg implementation of the DD-αAMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications in the communication library necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as the scaling on many nodes, where in both cases the speedup factor is close to the theoretical expectations.

  14. Cognitive Medical Wireless Testbed System (COMWITS)

    DTIC Science & Technology

    2016-11-01

    This testbed merges two ARO grants... CPUs: Intel Xeon Processor E5-1650v3 (6C, 3.5 GHz, Turbo, HT, 15M, 140W); Intel Core i7-3770 (3.4 GHz Quad Core, 77W); Dual Intel Xeon

  15. MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Song, Shuaiwen; Fu, Haohuan

    2014-08-16

    Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).

  16. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo; Sterpin, Edmond

    2016-04-15

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  17. Recent Performance Results of VPIC on Trinity

    NASA Astrophysics Data System (ADS)

    Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Le, A.; Li, H.; Nam, H.; Pang, X.; Stark, D. J.; Rust, W. N., III; Yin, L.; Albright, B. J.

    2017-10-01

    Trinity is a new DOE compute resource now in production at Los Alamos National Laboratory. Trinity has several new and unique features, including two compute partitions, one with dual-socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes; use of on-package high bandwidth memory (HBM) on the KNL nodes; the ability to configure the KNL nodes with respect to HBM mode and on-die network topology in a variety of operational modes at run time; and use of solid state storage via burst buffer technology to reduce the time required to perform I/O. An effort is in progress to optimize VPIC on Trinity by taking advantage of these new architectural features. Results will be presented on the performance of VPIC on the Haswell and KNL partitions, for single-node runs and runs at scale. Results include the use of burst buffers at scale to optimize I/O, a comparison of strategies for using MPI and threads, the performance benefits of using HBM, and the effectiveness of using intrinsics for vectorization. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  18. A polyphase filter for many-core architectures

    NASA Astrophysics Data System (ADS)

    Adámek, K.; Novotný, J.; Armour, W.

    2016-07-01

    In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and as such a well established algorithm. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell) and on the Intel Xeon CPU and Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of L1/texture cache, the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFLOP/s) and type conversions (GTc/s). We include a presentation of our results in terms of the sample rate which can be processed in real time by a chosen platform, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.5× to 1.92× greater than our CPU implementation, however this is still not sufficient to compete with the performance of GPUs. We conclude with a comparison of our best performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.
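
    For orientation, the FIR front end of a polyphase channelizer can be sketched as follows; an FFT across the P channel outputs (omitted here) completes the filter bank, and the strided reuse of the input samples is the data-reuse opportunity discussed above. The structure is generic, not the authors' implementation:

    ```cpp
    #include <cstddef>
    #include <vector>

    // FIR front end of a P-channel polyphase filter bank: each channel applies
    // a T-tap sub-filter of the prototype filter h to strided input samples.
    void polyphase_fir(const std::vector<float>& x,  // input, size >= n0 + P*T
                       const std::vector<float>& h,  // prototype filter, size P*T
                       std::size_t n0,               // start of current window
                       std::size_t P, std::size_t T,
                       std::vector<float>& y)        // P outputs, pre-sized
    {
        #pragma omp parallel for
        for (long p = 0; p < (long)P; ++p) {
            float s = 0.0f;
            for (std::size_t j = 0; j < T; ++j)
                s += x[n0 + p + j * P] * h[p + j * P]; // every sample is reused by T windows
            y[p] = s;
        }
    }
    ```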

  19. Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax–Wendroff Correction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Fu, Haohuan; Song, Shuaiwen

    2014-07-18

    Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits the application's performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, the NVIDIA Fermi C2070 GPU, the NVIDIA Kepler K20x GPU, and the Intel Xeon Phi co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best performance.

  20. Baryonic and mesonic 3-point functions with open spin indices

    NASA Astrophysics Data System (ADS)

    Bali, Gunnar S.; Collins, Sara; Gläßle, Benjamin; Heybrock, Simon; Korcyl, Piotr; Löffler, Marius; Rödl, Rudolf; Schäfer, Andreas

    2018-03-01

    We have implemented a new way of computing three-point correlation functions. It is based on a factorization of the entire correlation function into two parts which are evaluated with open spin-(and to some extent flavor-) indices. This allows us to estimate the two contributions simultaneously for many different initial and final states and momenta, with little computational overhead. We explain this factorization as well as its efficient implementation in a new library which has been written to provide the necessary functionality on modern parallel architectures and on CPUs, including Intel's Xeon Phi series.

  1. Aging in the three-dimensional random-field Ising model

    NASA Astrophysics Data System (ADS)

    von Ohr, Sebastian; Manssen, Markus; Hartmann, Alexander K.

    2017-07-01

    We studied the nonequilibrium aging behavior of the random-field Ising model in three dimensions for various values of the disorder strength. This allowed us to investigate how the aging behavior changes across the ferromagnetic-paramagnetic phase transition. We investigated a large system size of N = 256^3 spins and up to 10^8 Monte Carlo sweeps. To reach these necessary long simulation times, we employed an implementation running on Intel Xeon Phi coprocessors, reaching single-spin-flip times as short as 6 ps. We measured typical correlation functions in space and time to extract a growing length scale and corresponding exponents.
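
    The single-spin-flip kernel at the heart of such a simulation is standard Metropolis dynamics for the random-field Ising Hamiltonian H = -J·Σ_<ij> s_i s_j - Σ_i h_i s_i. A serial sketch is given below; the 6 ps per flip quoted above relies on aggressive parallelization and low-level optimizations not shown here:

    ```cpp
    #include <cmath>
    #include <random>
    #include <vector>

    // 3D random-field Ising model with periodic boundaries: Metropolis sweep.
    struct RFIM {
        int L;                    // lattice edge, N = L^3 spins
        double J, T;
        std::vector<int> s;       // spins, +1/-1
        std::vector<double> h;    // quenched random fields
        std::mt19937 rng{12345};

        int idx(int x, int y, int z) const {
            auto w = [this](int a) { return (a + L) % L; }; // periodic wrap
            return (w(x) * L + w(y)) * L + w(z);
        }
        void sweep() {
            std::uniform_real_distribution<double> u(0.0, 1.0);
            for (int x = 0; x < L; ++x)
                for (int y = 0; y < L; ++y)
                    for (int z = 0; z < L; ++z) {
                        int i = idx(x, y, z);
                        int nb = s[idx(x-1,y,z)] + s[idx(x+1,y,z)]
                               + s[idx(x,y-1,z)] + s[idx(x,y+1,z)]
                               + s[idx(x,y,z-1)] + s[idx(x,y,z+1)];
                        // Energy change for flipping spin i.
                        double dE = 2.0 * s[i] * (J * nb + h[i]);
                        if (dE <= 0.0 || u(rng) < std::exp(-dE / T))
                            s[i] = -s[i];
                    }
        }
    };
    ```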

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Yao; Balaprakash, Prasanna; Meng, Jiayuan

    We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of the architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors, the IBM Blue Gene/Q compute chip and the Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3-22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list of architectural scaling options for future processor designs.

  3. Genten: Software for Generalized Tensor Decompositions v. 1.0.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Phipps, Eric T.; Kolda, Tamara G.; Dunlavy, Daniel

    Tensors, or multidimensional arrays, are a powerful mathematical means of describing multiway data. This software provides computational means for decomposing or approximating a given tensor in terms of smaller tensors of lower dimension, focusing on decomposition of large, sparse tensors. These techniques have applications in many scientific areas, including signal processing, linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience and more. The software is designed to take advantage of parallelism present in emerging computer architectures such as multi-core CPUs, many-core accelerators such as the Intel Xeon Phi, and computation-oriented GPUs to enable efficient processing of large tensors.

  4. Employing OpenCL to Accelerate Ab Initio Calculations on Graphics Processing Units.

    PubMed

    Kussmann, Jörg; Ochsenfeld, Christian

    2017-06-13

    We present an extension of our graphics processing units (GPU)-accelerated quantum chemistry package to employ OpenCL compute kernels, which can be executed on a wide range of computing devices like CPUs, Intel Xeon Phi, and AMD GPUs. Here, we focus on the use of AMD GPUs and discuss differences as compared to CUDA-based calculations on NVIDIA GPUs. First illustrative timings are presented for hybrid density functional theory calculations using serial as well as parallel compute environments. The results show that AMD GPUs are as fast or faster than comparable NVIDIA GPUs and provide a viable alternative for quantum chemical applications.
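
    As background, the appeal of OpenCL is that a single host/kernel pair can be dispatched to CPUs, Xeon Phi coprocessors, or AMD and NVIDIA GPUs simply by selecting a different device. A minimal, self-contained host sketch with a hypothetical one-line kernel might look like this (error handling omitted for brevity; this is not the package's actual code):

        #include <CL/cl.h>
        #include <cstdio>
        #include <vector>

        // Trivial kernel: scale a vector in place.
        const char* src =
          "__kernel void scale(__global float* x, float a) {"
          "  size_t i = get_global_id(0); x[i] *= a; }";

        int main() {
          cl_platform_id plat; clGetPlatformIDs(1, &plat, nullptr);
          cl_device_id dev;
          // Use CL_DEVICE_TYPE_ACCELERATOR to target a Xeon Phi card,
          // CL_DEVICE_TYPE_GPU for a GPU, CL_DEVICE_TYPE_CPU for the host.
          clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, nullptr);
          cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
          cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);
          cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, nullptr);
          clBuildProgram(prog, 1, &dev, "", nullptr, nullptr);
          cl_kernel k = clCreateKernel(prog, "scale", nullptr);

          std::vector<float> x(1024, 1.0f);
          cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                      x.size() * sizeof(float), x.data(), nullptr);
          float a = 2.0f;
          clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
          clSetKernelArg(k, 1, sizeof(float), &a);
          size_t gws = x.size();
          clEnqueueNDRangeKernel(q, k, 1, nullptr, &gws, nullptr, 0, nullptr, nullptr);
          clEnqueueReadBuffer(q, buf, CL_TRUE, 0, x.size() * sizeof(float),
                              x.data(), 0, nullptr, nullptr);
          printf("x[0] = %f\n", x[0]);  // expect 2.0
          return 0;
        }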

  5. A programming framework for data streaming on the Xeon Phi

    NASA Astrophysics Data System (ADS)

    Chapeland, S.; ALICE Collaboration

    2017-10-01

    ALICE (A Large Ion Collider Experiment) is the dedicated heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). After the second long shut-down of the LHC, the ALICE detector will be upgraded to cope with an interaction rate of 50 kHz in Pb-Pb collisions, producing in the online computing system (O2) a sustained throughput of 3.4 TB/s. This data will be processed on the fly so that the stream to permanent storage does not exceed 90 GB/s peak, the raw data being discarded. In the context of assessing different computing platforms for the O2 system, we have developed a framework for the Intel Xeon Phi processors (MIC). It provides the components to build a processing pipeline streaming the data from the PC memory to a pool of permanent threads running on the MIC, and back to the host after processing. It is based on explicit offloading mechanisms (data transfer, asynchronous tasks) and basic building blocks (FIFOs, memory pools, C++11 threads). The user only needs to implement the processing method to be run on the MIC. We present in this paper the architecture, implementation, and performance of this system.
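
    The FIFO building block mentioned above can be pictured as a small bounded, thread-safe queue built from C++11 primitives alone. The class and member names below are hypothetical, not taken from the ALICE framework:

        // A bounded, blocking FIFO: producers block when full, consumers when empty.
        #include <condition_variable>
        #include <mutex>
        #include <queue>

        template <typename T>
        class BoundedFifo {
          std::queue<T> q_;
          std::mutex m_;
          std::condition_variable not_empty_, not_full_;
          size_t cap_;
         public:
          explicit BoundedFifo(size_t cap) : cap_(cap) {}
          void push(T v) {
            std::unique_lock<std::mutex> lk(m_);
            not_full_.wait(lk, [&]{ return q_.size() < cap_; });
            q_.push(std::move(v));
            not_empty_.notify_one();
          }
          T pop() {
            std::unique_lock<std::mutex> lk(m_);
            not_empty_.wait(lk, [&]{ return !q_.empty(); });
            T v = std::move(q_.front()); q_.pop();
            not_full_.notify_one();
            return v;
          }
        };

    A producer thread pushing filled buffers while a pool of persistent std::thread workers pops and processes them, combined with explicit offload of the processing call, yields the memory-to-coprocessor-and-back pipeline described above.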

  6. Many-integrated core (MIC) technology for accelerating Monte Carlo simulation of radiation transport: A study based on the code DPM

    NASA Astrophysics Data System (ADS)

    Rodriguez, M.; Brualla, L.

    2018-04-01

    Monte Carlo simulation of radiation transport is computationally demanding if reasonably low statistical uncertainties of the estimated quantities are to be obtained. Therefore, it can benefit to a large extent from high-performance computing. This work is aimed at assessing the performance of the first generation of the many-integrated core architecture (MIC) Xeon Phi coprocessor against that of a CPU consisting of two 12-core Xeon processors in Monte Carlo simulation of coupled electron-photon showers. The comparison was made in two ways: first, through a suite of basic tests, including parallel versions of the random number generators Mersenne Twister and a modified implementation of RANECU, intended to establish a baseline comparison between the two devices; second, through the pDPM code developed in this work. pDPM is a parallel version of the Dose Planning Method (DPM) program for fast Monte Carlo simulation of radiation transport in voxelized geometries. A variety of techniques aimed at obtaining large scalability on the Xeon Phi were implemented in pDPM. Maximum scalabilities of 84.2× and 107.5× were obtained on the Xeon Phi for simulations of electron and photon beams, respectively. Nevertheless, in none of the tests involving radiation transport did the Xeon Phi perform better than the CPU. The disadvantage of the Xeon Phi with respect to the CPU is due to the low performance of its individual cores: a single core of the Xeon Phi was more than 10 times less efficient than a single core of the CPU for all radiation transport simulations.
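
    For reference, RANECU is L'Ecuyer's combined linear congruential generator; because its state is just two integer seeds, each thread can carry an independent stream, which makes it attractive for parallel tests like those above. A compact sketch using the standard published constants (not the authors' modified version):

        // RANECU combined LCG (L'Ecuyer 1988), via Schrage's method to avoid
        // integer overflow. Each thread keeps its own independent seed pair.
        struct Ranecu {
          int s1, s2;
          double next() {
            int k = s1 / 53668;
            s1 = 40014 * (s1 - k * 53668) - k * 12211;
            if (s1 < 0) s1 += 2147483563;
            k = s2 / 52774;
            s2 = 40692 * (s2 - k * 52774) - k * 3791;
            if (s2 < 0) s2 += 2147483399;
            int z = s1 - s2;
            if (z < 1) z += 2147483562;
            return z * 4.656613057e-10;  // uniform deviate in (0,1)
          }
        };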

  7. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-11-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.
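
    As a reminder of the underlying arithmetic, a single Kalman predict/update step in one dimension looks as follows. Real track fitting applies the same algebra to small state vectors and covariance matrices, which is where vectorization pays off. This is a didactic sketch, not the code discussed above:

        // One-dimensional Kalman filter step (random-walk process model).
        struct Kalman1D {
          double x, P;                        // state estimate and its variance
          void predict(double q) { P += q; }  // inflate variance by process noise q
          void update(double z, double r) {   // measurement z with noise variance r
            double K = P / (P + r);           // Kalman gain
            x += K * (z - x);                 // pull estimate toward measurement
            P *= (1.0 - K);                   // shrink the variance
          }
        };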

  8. Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Shuaiwen; Tallent, Nathan R.; Vishnu, Abhinav

    2013-09-23

    Future exascale systems must be optimized for both power and performance at scale in order to achieve DOE’s goal of a sustained petaflop within 20 Megawatts by 2022 [1]. Massive parallelism of the future systems combined with complex memory hierarchies will form a barrier to efficient application and architecture design. These challenges are exacerbated with emerging complex architectures such as GPGPUs and Intel Xeon Phi, as parallelism increases by orders of magnitude and system power consumption can easily triple or quadruple. Therefore, we need techniques that can reduce the search space for optimization, isolate power-performance bottlenecks, identify root causes of software/hardware inefficiency, and effectively direct runtime scheduling.

  9. Performance of GeantV EM Physics Models

    NASA Astrophysics Data System (ADS)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2017-10-01

    The recent progress in parallel hardware architectures with deeper vector pipelines or many-cores technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation on modern parallel hardware architecture. Due to the complexity of geometry description and physics algorithms of a typical HEP application, performance analysis is indispensable in identifying factors limiting parallel execution. In this report, we will present design considerations and preliminary computing performance of GeantV physics models on coprocessors (Intel Xeon Phi and NVidia GPUs) as well as on mainstream CPUs.

  10. Electromagnetic Physics Models for Parallel Computing Architectures

    NASA Astrophysics Data System (ADS)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  11. Using Intel Xeon Phi to accelerate the WRF TEMF planetary boundary layer scheme

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen

    2014-05-01

    The Weather Research and Forecasting (WRF) model is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components, such as dynamic solvers and physics schemes. Numerical models are used to resolve the large-scale flow, while subgrid-scale parameterizations estimate small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation), which have a significant influence on the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the Total Energy - Mass Flux (TEMF) approach, which unifies turbulence and moist convection components, produces better results than the other PBL schemes. For that reason, the TEMF scheme was chosen as the PBL scheme to optimize for the Intel Many Integrated Core (MIC) architecture, which lets developers use familiar programming models at very high computational throughput. In this paper, we present our optimization results for the TEMF planetary boundary layer scheme. The optimizations performed were quite generic in nature: the code was vectorized to utilize the vector units inside each core, and memory access was improved by scalarizing some of the intermediate arrays. The results show that the optimizations improved MIC performance by 14.8x. Furthermore, the optimizations increased CPU performance by 2.6x compared to the original multi-threaded code on a quad-core Intel Xeon E5-2603 running at 1.8 GHz. Compared to the optimized code running on a single CPU socket, the optimized MIC code is 6.2x faster.
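
    The two generic optimizations named above can be illustrated together in a few lines: an intermediate array that was stored and reloaded per grid level is replaced by a scalar that stays in a vector register inside a SIMD loop. The array names here are hypothetical, not from the TEMF code:

        // Scalarization plus vectorization of an inner grid loop.
        void temf_like(const float* a, const float* b, float* out, int ni, int nk) {
          for (int k = 0; k < nk; ++k) {
            #pragma omp simd
            for (int i = 0; i < ni; ++i) {
              // Before: tmp was a scratch array written here and reloaded later,
              // costing extra memory traffic. After: a register-resident scalar.
              float tmp = a[k * ni + i] * b[k * ni + i];
              out[k * ni + i] = tmp + tmp * tmp;
            }
          }
        }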

  12. A comparison of SuperLU solvers on the intel MIC architecture

    NASA Astrophysics Data System (ADS)

    Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.

    2016-10-01

    In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices arising from 2D problems and hepta-diagonal matrices from 3D problems in incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on Intel Xeon Phi coprocessors using the offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and matrices with randomly located entries. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU benefited from up to a 45% performance improvement through offload programming, depending on the sparse matrix type and the size of the transferred and processed data.
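
    The offload programming model referred to here is Intel's legacy "#pragma offload" (LEO) interface for Knights Corner, in which annotated buffers are shipped to the coprocessor around a compute region. A schematic sketch follows, using a sparse matrix-vector product as the offloaded work; the buffer names are illustrative, not SuperLU's:

        // Function compiled for both host and MIC.
        __attribute__((target(mic)))
        void spmv(const double* val, const int* col, const int* row,
                  const double* x, double* y, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int j = row[i]; j < row[i + 1]; ++j) s += val[j] * x[col[j]];
            y[i] = s;
          }
        }

        void spmv_on_mic(const double* val, const int* col, const int* row,
                         const double* x, double* y, int n, int nnz) {
          // The in()/out() clauses copy buffers to the coprocessor and back;
          // the enclosed call executes on MIC card 0.
          #pragma offload target(mic:0) in(val : length(nnz)) in(col : length(nnz)) \
                                        in(row : length(n + 1)) in(x : length(n)) \
                                        out(y : length(n))
          spmv(val, col, row, x, y, n);
        }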

  13. GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing

    PubMed Central

    Fang, Ye; Ding, Yun; Feinstein, Wei P.; Koppelman, David M.; Moreno, Juana; Jarrell, Mark; Ramanujam, J.; Brylinski, Michal

    2016-01-01

    Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249. PMID:27420300

  14. GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing.

    PubMed

    Fang, Ye; Ding, Yun; Feinstein, Wei P; Koppelman, David M; Moreno, Juana; Jarrell, Mark; Ramanujam, J; Brylinski, Michal

    2016-01-01

    Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of a large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249.

  15. Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

    NASA Astrophysics Data System (ADS)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we use the Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine, one of the most time-consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics subroutine for the MIC architecture and discuss lessons learned from the code optimization process. The results show that the optimizations improved performance of the original code on a Xeon Phi 5110P by a factor of 2.4x.

  16. Innovative HPC architectures for the study of planetary plasma environments

    NASA Astrophysics Data System (ADS)

    Amaya, Jorge; Wolf, Anna; Lembège, Bertrand; Zitz, Anke; Alvarez, Damian; Lapenta, Giovanni

    2016-04-01

    DEEP-ER is a European Commission funded project that develops a new type of High Performance Computer architecture. The system is currently used by KU Leuven to study the effects of the solar wind on the global environments of the Earth and Mercury. The new architecture combines the versatility of Intel Xeon computing nodes with the power of the upcoming Intel Xeon Phi accelerators. Contrary to classical heterogeneous HPC architectures, where CPUs and accelerators are customarily found in the same computing nodes, in the DEEP-ER system the CPU nodes are grouped together (the Cluster) independently from the accelerator nodes (the Booster). The system is equipped with a state-of-the-art interconnection network, highly scalable and fast I/O, and a fail-recovery resiliency system. The final objective of the project is to introduce a scalable system that can be used to create the next generation of exascale supercomputers. The code iPic3D from KU Leuven is being adapted to this new architecture. This particle-in-cell code can now compute the electromagnetic fields on the Cluster while the particles are moved on the Booster side. Using fast and scalable Xeon Phi accelerators in the Booster, many more particles per cell can be introduced in the simulation than is possible in the current generation of HPC systems, allowing fully kinetic plasmas to be calculated with very low interpolation noise. The system will be used to perform fully kinetic, low-noise, 3D simulations of the interaction of the solar wind with the magnetospheres of the Earth and Mercury. Preliminary simulations have been performed in other HPC centers in order to compare the results across different systems. In this presentation we show the complexity of the plasma flow around the planets, including the development of hydrodynamic instabilities at the flanks, the presence of the collisionless shock, the magnetosheath, the magnetopause, reconnection zones, the formation of the plasma sheet and the magnetotail, and the variation of ion/electron plasma flows when crossing these frontiers. The simulations also give access to detailed information about the particle dynamics and their velocity distributions at locations that can be used for comparison with satellite data.

  17. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  18. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2017-08-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  19. Accelerated Application Development: The ORNL Titan Experience

    DOE PAGES

    Joubert, Wayne; Archibald, Richard K.; Berrill, Mark A.; ...

    2015-05-09

    The use of computational accelerators such as NVIDIA GPUs and Intel Xeon Phi processors is now widespread in the high performance computing community, with many applications delivering impressive performance gains. However, programming these systems for high performance, performance portability and software maintainability has been a challenge. In this paper we discuss experiences porting applications to the Titan system. Titan, whose planning began in 2009 and which was deployed for general use in 2013, was the first multi-petaflop system based on accelerator hardware. To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan. In this paper we report experiences and lessons learned from this process and describe how users are currently making use of computational accelerators on Titan.

  20. Accelerated application development: The ORNL Titan experience

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Archibald, Rick; Berrill, Mark

    2015-08-01

    The use of computational accelerators such as NVIDIA GPUs and Intel Xeon Phi processors is now widespread in the high performance computing community, with many applications delivering impressive performance gains. However, programming these systems for high performance, performance portability and software maintainability has been a challenge. In this paper we discuss experiences porting applications to the Titan system. Titan, whose planning began in 2009 and which was deployed for general use in 2013, was the first multi-petaflop system based on accelerator hardware. To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan. In this paper we report experiences and lessons learned from this process and describe how users are currently making use of computational accelerators on Titan.

  1. Real-time dedispersion for fast radio transient surveys, using auto tuning on many-core accelerators

    NASA Astrophysics Data System (ADS)

    Sclocco, A.; van Leeuwen, J.; Bal, H. E.; van Nieuwpoort, R. V.

    2016-01-01

    Dedispersion, the removal of deleterious smearing of impulsive signals by the interstellar matter, is one of the most intensive processing steps in any radio survey for pulsars and fast transients. We here present a study of the parallelization of this algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. We find that dedispersion is inherently memory-bound. Even in a perfect scenario, hardware limitations keep the arithmetic intensity low, thus limiting performance. We next exploit auto-tuning to adapt dedispersion to different accelerators, observations, and even telescopes. We demonstrate that the optimal settings differ between observational setups, and that auto-tuning significantly improves performance. This impacts time-domain surveys from Apertif to SKA.
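
    Schematically, brute-force dedispersion accumulates each frequency channel at a trial-DM-dependent delay. Every output sample costs one memory load per channel but only one add, so the arithmetic intensity is inherently low, which is exactly the memory-bound behavior the study reports. A minimal sketch, with layout and names that are illustrative rather than taken from the paper's code:

        // Brute-force incoherent dedispersion over ndm trial dispersion measures.
        void dedisperse(const float* in,   // [nchan][nsamp + max_delay], padded
                        float* out,       // [ndm][nsamp] dedispersed series
                        const int* delay, // [ndm][nchan] precomputed delays
                        int nchan, int nsamp, int ndm) {
          #pragma omp parallel for
          for (int dm = 0; dm < ndm; ++dm)
            for (int t = 0; t < nsamp; ++t) {
              float sum = 0.0f;
              for (int c = 0; c < nchan; ++c)
                sum += in[c * (size_t)nsamp + t + delay[dm * nchan + c]];
              out[dm * (size_t)nsamp + t] = sum;
            }
        }

    Auto-tuning, as described above, would then search over work decomposition and loop ordering per device and observational setup rather than fixing them as this sketch does.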

  2. Electromagnetic physics models for parallel computing architectures

    DOE PAGES

    Amadio, G.; Ananya, A.; Apostolakis, J.; ...

    2016-11-21

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Finally, results of preliminary performance evaluation and physics validation are presented as well.

  3. First experience of vectorizing electromagnetic physics models for detector simulation

    NASA Astrophysics Data System (ADS)

    Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.

    2015-12-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.

  4. Evaluating and optimizing the NERSC workload on Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, T; Cook, B; Deslippe, J

    2017-01-30

    NERSC has partnered with 20 representative application teams to evaluate performance on the Xeon Phi Knights Landing architecture and to develop an application-optimization strategy for the greater NERSC workload on the recently installed Cori system. In this article, we present early case studies and summarized results from a subset of the 20 applications, highlighting the impact of important architecture differences between the Xeon Phi and traditional Xeon processors. We summarize the status of the applications and describe the broader optimization strategy that has emerged.

  5. Evaluating and Optimizing the NERSC Workload on Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnes, Taylor; Cook, Brandon; Doerfler, Douglas

    2016-01-01

    NERSC has partnered with 20 representative application teams to evaluate performance on the Xeon Phi Knights Landing architecture and to develop an application-optimization strategy for the greater NERSC workload on the recently installed Cori system. In this article, we present early case studies and summarized results from a subset of the 20 applications, highlighting the impact of important architecture differences between the Xeon Phi and traditional Xeon processors. We summarize the status of the applications and describe the broader optimization strategy that has emerged.

  6. Performance Analysis of GFDL's GCM Line-By-Line Radiative Transfer Model on GPU and MIC Architectures

    NASA Astrophysics Data System (ADS)

    Menzel, R.; Paynter, D.; Jones, A. L.

    2017-12-01

    Due to their relatively low computational cost, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high resolution line-by-line radiative transfer models may soon approach those of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model currently under development at GFDL. Although originally designed to specifically exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended to include OpenMP in an effort to also effectively target MIC architectures such as Intel's Xeon Phi. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, as part of CMIP 6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures to analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.

  7. Speeding-up Bioinformatics Algorithms with Heterogeneous Architectures: Highly Heterogeneous Smith-Waterman (HHeterSW).

    PubMed

    Gálvez, Sergio; Ferusic, Adis; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan A; Dorado, Gabriel

    2016-10-01

    The Smith-Waterman algorithm has great sensitivity when used for biological sequence-database searches, but at the expense of high computing-power requirements. To overcome this problem, there are implementations in the literature that exploit the different hardware architectures available in a standard PC, such as the GPU, CPU, and coprocessors. We introduce an application that splits the original database-search problem into smaller parts, resolves each of them by executing the most efficient implementation of the Smith-Waterman algorithm on each hardware architecture, and finally unifies the generated results. Using non-overlapping hardware allows simultaneous execution, and yields up to a 2.58-fold performance gain when compared with any other algorithm for searching sequence databases. Even the performance of the popular BLAST heuristic is exceeded in 78% of the tests. The application has been tested with standard hardware: an Intel i7-4820K CPU, Intel Xeon Phi 31S1P coprocessors, and nVidia GeForce GTX 960 graphics cards. An important increase in performance has been obtained in a wide range of situations, effectively exploiting the available hardware.
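
    The splitting step can be pictured as a static partition of the sequence database in proportion to each device's measured throughput. A small sketch with hypothetical weights (not the application's actual scheduling logic):

        #include <cstddef>
        #include <vector>

        // Returns the start index of each device's chunk, plus a final end sentinel,
        // partitioning nseq sequences in proportion to the throughput weights w.
        std::vector<size_t> split_by_speed(size_t nseq, const std::vector<double>& w) {
          double total = 0.0;
          for (double wi : w) total += wi;
          std::vector<size_t> bounds{0};
          double acc = 0.0;
          for (double wi : w) {
            acc += wi;
            bounds.push_back(static_cast<size_t>(nseq * (acc / total)));
          }
          bounds.back() = nseq;  // guard against floating-point rounding
          return bounds;
        }

    For example, weights of 4, 2, and 1 for a GPU, a Xeon Phi, and a CPU would hand the GPU the first 4/7 of the database, letting all three devices finish at roughly the same time.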

  8. Acceleration of Cherenkov angle reconstruction with the new Intel Xeon/FPGA compute platform for the particle identification in the LHCb Upgrade

    NASA Astrophysics Data System (ADS)

    Faerber, Christian

    2017-10-01

    The LHCb experiment at the LHC will upgrade its detector by 2018/2019 to a ‘triggerless’ readout scheme, where all the readout electronics and several sub-detector parts will be replaced. The new readout electronics will be able to read out the detector at 40 MHz. This increases the data bandwidth from the detector down to the Event Filter farm to 40 TBit/s, which also has to be processed to select the interesting proton-proton collisions for later storage. The architecture of a computing farm that can process this amount of data as efficiently as possible is a challenging task, and several compute accelerator technologies are being considered for use inside the new Event Filter farm. In the high performance computing sector, more and more FPGA compute accelerators are used to improve compute performance and reduce power consumption (e.g. in the Microsoft Catapult project and the Bing search engine). For the LHCb upgrade, the usage of an experimental FPGA-accelerated computing platform in the Event Building or in the Event Filter farm is therefore being considered and tested. This platform from Intel hosts a general CPU and a high performance FPGA linked via a high speed link, which on this platform is a QPI link. An accelerator is implemented on the FPGA. The system used is a two-socket platform from Intel with a Xeon CPU and an FPGA; the FPGA has cache-coherent memory access to the main memory of the server and can collaborate with the CPU. As a first step, a computationally intensive algorithm to reconstruct Cherenkov angles for the LHCb RICH particle identification was successfully ported in Verilog to the Intel Xeon/FPGA platform and accelerated by a factor of 35. The same algorithm was then ported to the platform with OpenCL, and the implementation work and performance of the two approaches are compared. Another FPGA accelerator, the Nallatech 385 PCIe card with the same Stratix V FPGA, was also tested for performance. The results show that the Intel Xeon/FPGA platforms, which are built in general for high performance computing, are also very interesting for the High Energy Physics community.

  9. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    DOE PAGES

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; ...

    2017-06-01

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
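
    The core SpMM kernel multiplies one sparse matrix by a block of vectors so that each loaded matrix element is reused across the whole block. A plain CSR sketch conveys the idea; the paper's kernels use the CSB format instead, which additionally helps the transpose product:

        // Sparse matrix times multiple vectors, CSR layout, row-major X and Y
        // with nvec columns. Each nonzero a is reused across all nvec vectors.
        void spmm_csr(const double* val, const int* col, const int* rowptr,
                      const double* X, double* Y, int nrows, int nvec) {
          #pragma omp parallel for
          for (int i = 0; i < nrows; ++i) {
            for (int v = 0; v < nvec; ++v) Y[i * nvec + v] = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; ++j) {
              double a = val[j];
              const double* xrow = &X[col[j] * nvec];
              #pragma omp simd
              for (int v = 0; v < nvec; ++v)   // contiguous, vectorizable
                Y[i * nvec + v] += a * xrow[v];
            }
          }
        }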

  10. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.

  11. Large Scale GW Calculations on the Cori System

    NASA Astrophysics Data System (ADS)

    Deslippe, Jack; Del Ben, Mauro; da Jornada, Felipe; Canning, Andrew; Louie, Steven

    The NERSC Cori system, powered by 9000+ Intel Xeon-Phi processors, represents one of the largest HPC systems for open-science in the United States and the world. We discuss the optimization of the GW methodology for this system, including both node level and system-scale optimizations. We highlight multiple large scale (thousands of atoms) case studies and discuss both absolute application performance and comparison to calculations on more traditional HPC architectures. We find that the GW method is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism across many layers of the system. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program.

  12. PyPWA: A partial-wave/amplitude analysis software framework

    NASA Astrophysics Data System (ADS)

    Salgado, Carlos

    2016-05-01

    The PyPWA project aims to develop a software framework for partial-wave and amplitude analysis of data, providing the user with software tools to identify resonances from multi-particle final states in photoproduction. Most of the code is written in Python. The software is divided into two main branches: a general shell where amplitude parameters (or those of any parametric model) are estimated from the data, which also includes software to produce simulated data sets using the fitted amplitudes; and a second branch containing a specific realization of the isobar model (with room to include Deck-type and other isobar-model extensions) to perform PWA with an interface to the computing resources at Jefferson Lab. We are currently implementing parallelism and vectorization using Intel's Xeon Phi family of coprocessors.

  13. High Performance Computing and Visualization Infrastructure for Simultaneous Parallel Computing and Parallel Visualization Research

    DTIC Science & Technology

    2016-11-09

    The report body consists largely of personnel and subcontractor forms (DD882); the recoverable hardware details describe nodes with two Intel Xeon E5-2680 v3 processors (2.5 GHz, 30 MB cache, 9.60 GT/s QPI, Turbo, HT, 12C/24T, 120 W) and Broadcom 5720 QP 1 Gb network daughter cards.

  14. TH-A-19A-08: Intel Xeon Phi Implementation of a Fast Multi-Purpose Monte Carlo Simulation for Proton Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Souris, K; Lee, J; Sterpin, E

    2014-06-15

    Purpose: Recent studies have demonstrated the capability of graphics processing units (GPUs) to compute dose distributions using Monte Carlo (MC) methods within clinical time constraints. However, GPUs have a rigid vectorial architecture that favors the implementation of simplified particle transport algorithms, adapted to specific tasks. Our new, fast, and multipurpose MC code, named MCsquare, runs on Intel Xeon Phi coprocessors. This technology offers 60 independent cores, and therefore more flexibility to implement fast and yet generic MC functionalities, such as prompt gamma simulations. Methods: MCsquare implements several models and hence allows users to make their own tradeoff between speed and accuracy. A 200 MeV proton beam is simulated in a heterogeneous phantom using Geant4 and two configurations of MCsquare. The first one is the most conservative and accurate. The method of fictitious interactions handles the interfaces and secondary charged particles emitted in nuclear interactions are fully simulated. The second, faster configuration simplifies interface crossings and simulates only secondary protons after nuclear interaction events. Integral depth-dose and transversal profiles are compared to those of Geant4. Moreover, the production profile of prompt gammas is compared to PENH results. Results: Integral depth dose and transversal profiles computed by MCsquare and Geant4 are within 3%. The production of secondaries from nuclear interactions is slightly inaccurate at interfaces for the fastest configuration of MCsquare but this is unlikely to have any clinical impact. The computation time varies between 90 seconds for the most conservative settings to merely 59 seconds in the fastest configuration. Finally prompt gamma profiles are also in very good agreement with PENH results. Conclusion: Our new, fast, and multi-purpose Monte Carlo code simulates prompt gammas and calculates dose distributions in less than a minute, which complies with clinical time constraints. It has been successfully validated with Geant4. This work has been financially supported by InVivoIGT, a public/private partnership between UCL and IBA.

  15. Performance Study of Monte Carlo Codes on Xeon Phi Coprocessors — Testing MCNP 6.1 and Profiling ARCHER Geometry Module on the FS7ONNi Problem

    NASA Astrophysics Data System (ADS)

    Liu, Tianyu; Wolfe, Noah; Lin, Hui; Zieb, Kris; Ji, Wei; Caracappa, Peter; Carothers, Christopher; Xu, X. George

    2017-09-01

    This paper contains two parts revolving around Monte Carlo transport simulation on Intel Many Integrated Core coprocessors (MIC, also known as Xeon Phi). (1) MCNP 6.1 was recompiled into multithreading (OpenMP) and multiprocessing (MPI) forms, respectively, without modification to the source code. The new codes were tested on a 60-core 5110P MIC. The test case was FS7ONNi, a radiation shielding problem used in MCNP's verification and validation suite. It was observed that both codes became slower on the MIC than on a 6-core X5650 CPU, by a factor of 4 for the MPI code and, abnormally, 20 for the OpenMP code, and both exhibited limited strong-scaling capability. (2) We have recently added a Constructive Solid Geometry (CSG) module to our ARCHER code to provide better support for geometry modelling in radiation shielding simulation. The functions of this module are frequently called in the particle random walk process. To identify the performance bottleneck we developed a CSG proxy application and profiled the code using the geometry data from FS7ONNi. The profiling data showed that the code was primarily memory-latency bound on the MIC. This study suggests that despite the low initial porting effort, Monte Carlo codes do not naturally lend themselves to the MIC platform, just as they do not to GPUs, and that the memory latency problem needs to be addressed in order to achieve a decent performance gain.

  16. Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John

    2016-01-01

    We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5-2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with the Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell, including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks, including a subset of HPCC and HPCG, and four full-scale scientific and engineering applications. We also present a model that predicts the performance of HPCG and Cart3D within 5% accuracy, and of Overflow within 10%.

  17. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    NASA Astrophysics Data System (ADS)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
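
    The two key features, dynamic assignment of independent tracks to threads and a wide vectorizable inner loop, can be sketched as follows. The data structures and the simplified flat-source update are hypothetical stand-ins, not the proxy applications' actual code:

        #include <cmath>

        // One MOC transport sweep: tracks are dispatched to threads dynamically
        // for load balance; the inner loop over energy groups maps onto SIMD lanes.
        void transport_sweep(float* flux, const float* tau, const float* source,
                             const int* seg_region,
                             int ntracks, int nsegs, int ngroups) {
          #pragma omp parallel for schedule(dynamic)
          for (int t = 0; t < ntracks; ++t) {
            for (int s = 0; s < nsegs; ++s) {
              int r = seg_region[t * nsegs + s];          // flat source region
              float* psi = &flux[(t * nsegs + s) * ngroups];
              #pragma omp simd
              for (int g = 0; g < ngroups; ++g) {         // wide vector loop
                // Simplified attenuation: tau folds the segment length into
                // the group-wise optical thickness.
                float dpsi = (psi[g] - source[r * ngroups + g]) *
                             (1.0f - std::exp(-tau[r * ngroups + g]));
                psi[g] -= dpsi;
              }
            }
          }
        }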

  18. Lattice QCD Calculations in Nuclear Physics towards the Exascale

    NASA Astrophysics Data System (ADS)

    Joo, Balint

    2017-01-01

    The combination of algorithmic advances and new highly parallel computing architectures are enabling lattice QCD calculations to tackle ever more complex problems in nuclear physics. In this talk I will review some computational challenges that are encountered in large scale cold nuclear physics campaigns such as those in hadron spectroscopy calculations. I will discuss progress in addressing these with algorithmic improvements such as multi-grid solvers and software for recent hardware architectures such as GPUs and Intel Xeon Phi (Knights Landing). Finally, I will highlight some current topics for research and development as we head towards the Exascale era. This material is funded by the U.S. Department of Energy, Office Of Science, Offices of Nuclear Physics, High Energy Physics and Advanced Scientific Computing Research, as well as the Office of Nuclear Physics under contract DE-AC05-06OR23177.

  19. A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

    2017-06-16

    Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially with the utility of the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time, and we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.

  20. Modeling of Radiotherapy Linac Source Terms Using ARCHER Monte Carlo Code: Performance Comparison for GPU and MIC Parallel Computing Devices

    NASA Astrophysics Data System (ADS)

    Lin, Hui; Liu, Tianyu; Su, Lin; Bednarz, Bryan; Caracappa, Peter; Xu, X. George

    2017-09-01

    Monte Carlo (MC) simulation is well recognized as the most accurate method for radiation dose calculations. For radiotherapy applications, accurate modelling of the source term, i.e. the clinical linear accelerator, is critical to the simulation. The purpose of this paper is to perform source modelling, examine the accuracy and performance of the models on Intel Many Integrated Core coprocessors (aka Xeon Phi) and Nvidia GPUs using ARCHER, and explore potential optimization methods. Phase-space-based source modelling has been implemented. Good agreement was found in a TomoTherapy prostate patient case and a TrueBeam breast case. In terms of performance, the whole simulation for the prostate plan and the breast plan took about 173 s and 73 s, respectively, with 1% statistical error.

  1. Peregrine System | High-Performance Computing | NREL

    Science.gov Websites

    Website excerpt (navigation and table fragments removed): Peregrine's /home, /nopt, and /projects file systems, the latter providing longer-term storage, are mounted on all nodes. Compute nodes are based on Intel Xeon E5-2670 "Sandy Bridge" processors with 64 GB of memory per node; the original page tabulated node counts, cores per node, memory per node, and peak double-precision performance per node.

  2. NASA Center for Climate Simulation (NCCS) Presentation

    NASA Technical Reports Server (NTRS)

    Webster, William P.

    2012-01-01

    The NASA Center for Climate Simulation (NCCS) offers integrated supercomputing, visualization, and data interaction technologies to enhance NASA's weather and climate prediction capabilities. It serves hundreds of users at NASA Goddard Space Flight Center, as well as other NASA centers, laboratories, and universities across the US. Over the past year, NCCS has continued expanding its data-centric computing environment to meet the increasingly data-intensive challenges of climate science. We doubled our Discover supercomputer's peak performance to more than 800 teraflops by adding 7,680 Intel Xeon Sandy Bridge processor-cores and most recently 240 Intel Xeon Phi Many Integrated Core (MIC) coprocessors. A supercomputing-class analysis system named Dali gives users rapid access to their data on Discover and high-performance software including the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT), with interfaces from user desktops and a 17- by 6-foot visualization wall. NCCS also is exploring highly efficient climate data services and management with a new MapReduce/Hadoop cluster while augmenting its data distribution to the science community. Using NCCS resources, NASA completed its modeling contributions to the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report this summer as part of the ongoing Coupled Model Intercomparison Project Phase 5 (CMIP5). Ensembles of simulations run on Discover reached back to the year 1000 to test model accuracy and projected climate change through the year 2300 based on four different scenarios of greenhouse gases, aerosols, and land use. The data resulting from several thousand IPCC/CMIP5 simulations, as well as a variety of other simulation, reanalysis, and observation datasets, are available to scientists and decision makers through an enhanced NCCS Earth System Grid Federation Gateway. Worldwide downloads have totaled over 110 terabytes of data.

  3. Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

    2016-12-01

    The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
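
    The dominant cost in k-means is the distance/assignment phase; structuring it so the innermost loop runs over feature dimensions lets it map onto wide SIMD lanes such as Knights Landing's AVX-512 units. A minimal OpenMP sketch, not the authors' accelerated implementation:

        #include <cfloat>

        // Assign each point to its nearest centroid (squared Euclidean distance).
        void assign_points(const float* pts, const float* centroids, int* label,
                           int npts, int k, int dim) {
          #pragma omp parallel for
          for (int i = 0; i < npts; ++i) {
            float best = FLT_MAX; int bestc = 0;
            for (int c = 0; c < k; ++c) {
              float d = 0.0f;
              #pragma omp simd reduction(+ : d)   // vectorized over dimensions
              for (int j = 0; j < dim; ++j) {
                float diff = pts[i * dim + j] - centroids[c * dim + j];
                d += diff * diff;
              }
              if (d < best) { best = d; bestc = c; }
            }
            label[i] = bestc;
          }
        }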

  4. Optimizing legacy molecular dynamics software with directive-based offload

    DOE PAGES

    Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; ...

    2015-05-14

    Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We also demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel Xeon Phi coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
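
    As a rough illustration of the directive-based offload style on this hardware (a sketch under assumptions, not LAMMPS source), Intel's Language Extensions for Offload let a marked region run on an attached Xeon Phi while the same loop body remains valid host code; the function and array names here are hypothetical.

        // Offload a force-scaling loop to coprocessor 0 when one is present;
        // without a coprocessor the Intel compiler falls back to the host CPU,
        // which is what lets one set of subroutines serve both targets.
        void scale_forces(double* f, const double* factor, int n) {
            #pragma offload target(mic:0) in(factor : length(n)) inout(f : length(n))
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                f[i] *= factor[i];
        }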

  5. Optimization of atmospheric transport models on HPC platforms

    NASA Astrophysics Data System (ADS)

    de la Cruz, Raúl; Folch, Arnau; Farré, Pau; Cabezas, Javier; Navarro, Nacho; Cela, José María

    2016-12-01

    The performance and scalability of atmospheric transport models in high-performance computing environments is often far from optimal for multiple reasons including, for example, sequential input and output, synchronous communications, work imbalance, memory access latency or lack of task overlapping. We investigate how different software optimizations and porting to non-general-purpose hardware architectures improve code scalability and execution times considering, as an example, the FALL3D volcanic ash transport model. To this end, we implement the FALL3D model equations in the WARIS framework, a software designed from scratch to solve in a parallel and efficient way different geoscience problems on a wide variety of architectures. In addition, we consider further improvements in WARIS such as hybrid MPI-OpenMP parallelization, spatial blocking, auto-tuning and thread affinity. Considering all these aspects together, the FALL3D execution times for a realistic test case running on general-purpose cluster architectures (Intel Sandy Bridge) decrease by a factor between 7 and 40 depending on the grid resolution. Finally, we port the application to Intel Xeon Phi (MIC) and NVIDIA GPU (CUDA) accelerator-based architectures and compare performance, cost and power consumption on all the architectures. Implications for time-constrained operational model configurations are discussed.
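
    Of the optimizations listed, spatial blocking is the easiest to illustrate. The sketch below (illustrative assumptions only; WARIS's actual kernels differ) tiles the two fastest dimensions of a 7-point stencil sweep so that each tile's working set stays in cache:

        #include <algorithm>
        #include <cstddef>

        // Linearize (i,j,k) for an nx*ny*nz grid.
        inline std::size_t idx(int i, int j, int k, int ny, int nz) {
            return (static_cast<std::size_t>(i) * ny + j) * nz + k;
        }

        // 7-point stencil sweep, blocked over j and k; c0 and c1 are scheme
        // coefficients. The tile edge B is a natural target for auto-tuning.
        void sweep_blocked(const double* in, double* out,
                           int nx, int ny, int nz, double c0, double c1) {
            const int B = 32;
            for (int jj = 1; jj < ny - 1; jj += B)
                for (int kk = 1; kk < nz - 1; kk += B)
                    for (int i = 1; i < nx - 1; ++i)
                        for (int j = jj; j < std::min(jj + B, ny - 1); ++j)
                            for (int k = kk; k < std::min(kk + B, nz - 1); ++k)
                                out[idx(i,j,k,ny,nz)] = c0 * in[idx(i,j,k,ny,nz)]
                                    + c1 * (in[idx(i-1,j,k,ny,nz)] + in[idx(i+1,j,k,ny,nz)]
                                          + in[idx(i,j-1,k,ny,nz)] + in[idx(i,j+1,k,ny,nz)]
                                          + in[idx(i,j,k-1,ny,nz)] + in[idx(i,j,k+1,ny,nz)]);
        }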

  6. Face classification using electronic synapses

    NASA Astrophysics Data System (ADS)

    Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H.-S. Philip; Qian, He

    2017-05-01

    Conventional hardware platforms consume huge amounts of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow cognitive tasks to be completed more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry-friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using an Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results demonstrate the feasibility of analogue synaptic arrays and pave the way toward building an energy-efficient and large-scale neuromorphic system.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shipman, Galen M.

    These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.

  8. Kernel optimization for short-range molecular dynamics

    NASA Astrophysics Data System (ADS)

    Hu, Changjun; Wang, Xianmeng; Li, Jianjiang; He, Xinfu; Li, Shigang; Feng, Yangde; Yang, Shaofeng; Bai, He

    2017-02-01

    To optimize short-range force computations in Molecular Dynamics (MD) simulations, multi-threading and SIMD optimizations are presented in this paper. With respect to multi-threading optimization, a Partition-and-Separate-Calculation (PSC) method is designed to avoid the write conflicts caused by using Newton's third law. Serial bottlenecks are eliminated with no additional memory usage. The method is implemented using the OpenMP model. Furthermore, the PSC method is employed on Intel Xeon Phi coprocessors in both native and offload models. We also evaluate the performance of the PSC method under different thread affinities on the MIC architecture. For SIMD execution, we analyze the performance impact of the cutoff-radius "if-clause" within the PSC method. The experimental results show that our PSC method is more efficient than traditional approaches. In double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.
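
    The write-conflict problem is easy to see: with Newton's third law a pair force updates both atoms i and j, and j may belong to another thread's partition. The sketch below is a loose, hypothetical illustration of ownership-based conflict avoidance in OpenMP (it is not the authors' PSC implementation): interior pairs are applied directly, while cross-partition contributions are deferred to a phase in which each thread updates only the atoms it owns.

        #include <omp.h>
        #include <vector>

        struct Vec3 { double x, y, z; };
        inline Vec3& operator+=(Vec3& a, const Vec3& b) { a.x += b.x; a.y += b.y; a.z += b.z; return a; }
        inline Vec3 operator-(const Vec3& v) { return {-v.x, -v.y, -v.z}; }

        struct PairForce  { int i, j; Vec3 fij; };  // force on atom i from atom j
        struct OwnedForce { int i; Vec3 fij; };     // deferred contribution to an owned atom

        // interior_pairs[t]: pairs whose atoms both lie in thread t's partition.
        // boundary_contrib[t]: cross-partition contributions, grouped by owner t.
        void accumulate_forces(std::vector<Vec3>& f,
                               const std::vector<std::vector<PairForce>>& interior_pairs,
                               const std::vector<std::vector<OwnedForce>>& boundary_contrib) {
            #pragma omp parallel
            {
                const int t = omp_get_thread_num();
                // Phase 1: apply Newton's third law inside the partition; no two
                // threads touch the same atom, so no atomics are needed.
                for (const PairForce& p : interior_pairs[t]) {
                    f[p.i] += p.fij;
                    f[p.j] += -p.fij;
                }
                #pragma omp barrier
                // Phase 2: each thread applies only contributions to atoms it owns.
                for (const OwnedForce& c : boundary_contrib[t])
                    f[c.i] += c.fij;
            }
        }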

  9. Face classification using electronic synapses.

    PubMed

    Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H-S Philip; Qian, He

    2017-05-12

    Conventional hardware platforms consume huge amounts of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow cognitive tasks to be completed more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry-friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using an Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results demonstrate the feasibility of analogue synaptic arrays and pave the way toward building an energy-efficient and large-scale neuromorphic system.

  10. (U) Status of Trinity and Crossroads Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Archer, Billy Joe; Lujan, James Westley; Hemmert, K. S.

    2017-01-10

    (U) This paper provides a general overview of current and future plans for the Advanced Simulation and Computing (ASC) Advanced Technology (AT) systems fielded by the New Mexico Alliance for Computing at Extreme Scale (ACES), a collaboration between Los Alamos National Laboratory and Sandia National Laboratories. Additionally, this paper touches on research into technology beyond traditional CMOS. The status of Trinity, ASC's first AT system, and Crossroads, anticipated to succeed Trinity as the third AT system in 2020, will be presented, along with initial performance studies of the Intel Knights Landing Xeon Phi processors introduced on Trinity. The challenges and opportunities for our production simulation codes on AT systems will also be discussed. Trinity and Crossroads are a joint procurement by ACES and Lawrence Berkeley National Laboratory as part of the Alliance for application Performance at EXtreme scale (APEX), http://apex.lanl.gov.

  11. Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek

    2016-10-06

    We profile and optimize calculations performed with the BerkeleyGW code on the Xeon Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.

  12. LTE-Enhanced Cognitive Radio Network Testbed (LTE-CORNET)

    DTIC Science & Technology

    2016-11-01

    [Record excerpt: the testbed hardware includes an Intel Core i7-3770 (3.4 GHz quad-core, 77 W) and dual Intel Xeon E5-2695 v4 processors (18 cores, 2.1 GHz base, 3.3 GHz Turbo, 2400 MHz, 45 MB cache, 120 W); the remainder of the record is administrative reporting data.]

  13. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    NASA Astrophysics Data System (ADS)

    Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

    2017-10-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases, with the first phase online at the end of 2015 and the second phase online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9,000 compute nodes, each with 96 GB of DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.

  14. Early Experiences Writing Performance Portable OpenMP 4 Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Hernandez, Oscar R

    In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
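
    For readers unfamiliar with the OpenMP 4 accelerator directives under evaluation, a minimal example of the target and map clauses follows (an illustrative sketch, not the paper's kernel); the same construct can be mapped to an attached device or back onto the host's shared memory.

        #include <cstddef>

        // y += a * x, offloaded with OpenMP 4 target directives. The map clauses
        // make host/device data movement explicit; on a shared-memory target they
        // may degenerate to no-ops, which is the locality question the paper
        // examines against the traditional first-touch approach.
        void daxpy(double a, const double* x, double* y, std::size_t n) {
            #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
            #pragma omp teams distribute parallel for
            for (std::size_t i = 0; i < n; ++i)
                y[i] += a * x[i];
        }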

  15. Development of seismic tomography software for hybrid supercomputers

    NASA Astrophysics Data System (ADS)

    Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

    2015-04-01

    Seismic tomography is a technique for computing the velocity model of a geologic structure from the first-arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As seismic monitoring systems develop and the volume of seismic data grows, there is a growing need for new, more effective computational algorithms for seismic tomography with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high-performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and a software package for such systems, to be used in processing large volumes of seismic data (hundreds of gigabytes and more). These algorithms and the software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using an eikonal equation solver, arrival times of seismic waves are computed based on the assumed velocity model of the geologic structure being analyzed. In order to solve the linearized inverse problem, a tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on the target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.

  16. Evaluation of stochastic algorithms for financial mathematics problems from point of view of energy-efficiency

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Atanassov, E.; Dimitrov, D.; Gurov, T.

    2015-10-28

    The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs and Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for option pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.

  17. LHCb Kalman Filter cross architecture studies

    NASA Astrophysics Data System (ADS)

    Cámpora Pérez, Daniel Hugo

    2017-10-01

    The 2020 upgrade of the LHCb detector will vastly increase the rate of collisions the Online system needs to process in software, in order to filter events in real time. 30 million collisions per second will pass through a selection chain, where each step is executed conditional to its prior acceptance. The Kalman Filter is a fit applied to all reconstructed tracks which, due to its time characteristics and early execution in the selection chain, consumes 40% of the whole reconstruction time in the current trigger software. This makes the Kalman Filter a time-critical component as the LHCb trigger evolves into a full software trigger in the Upgrade. I present a new Kalman Filter algorithm for LHCb that can efficiently make use of any kind of SIMD processor, and its design is explained in depth. Performance benchmarks are compared between a variety of hardware architectures, including x86_64 and Power8, and the Intel Xeon Phi accelerator, and the suitability of said architectures to efficiently perform the LHCb Reconstruction process is determined.

  18. Efficient Calculation of Exact Exchange Within the Quantum Espresso Software Package

    NASA Astrophysics Data System (ADS)

    Barnes, Taylor; Kurth, Thorsten; Carrier, Pierre; Wichmann, Nathan; Prendergast, David; Kent, Paul; Deslippe, Jack

    Accurate simulation of condensed matter at the nanoscale requires careful treatment of the exchange interaction between electrons. In the context of plane-wave DFT, these interactions are typically represented through the use of approximate functionals. Greater accuracy can often be obtained through the use of functionals that incorporate some fraction of exact exchange; however, evaluation of the exact exchange potential is often prohibitively expensive. We present an improved algorithm for the parallel computation of exact exchange in Quantum Espresso, an open-source software package for plane-wave DFT simulation. Through the use of aggressive load balancing and on-the-fly transformation of internal data structures, our code exhibits speedups of approximately an order of magnitude for practical calculations. Additional optimizations are presented targeting the many-core Intel Xeon Phi "Knights Landing" architecture, which largely powers NERSC's new Cori system. We demonstrate the successful application of the code to difficult problems, including simulation of water at a platinum interface and computation of the X-ray absorption spectra of transition metal oxides.

  19. Traditional Tracking with Kalman Filter on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; MacNeill, Ian; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2015-05-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this, we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. However, the most common track finding techniques in use today are those based on the Kalman Filter. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. We report the results of our investigations into the potential and limitations of these algorithms on the new parallel hardware.

  20. Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures

    DOE PAGES

    Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...

    2015-07-13

    Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained by OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.
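
    As a point of reference, the kind of OpenACC kernel whose portability is being measured looks like the minimal sketch below (illustrative only); the parallel loop construct and data clauses are what a compiler must map onto CUDA, GCN, or Xeon Phi back-ends.

        // Vector addition expressed in OpenACC: one portable annotation, three
        // very different hardware targets.
        void vec_add(const float* a, const float* b, float* c, int n) {
            #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
            for (int i = 0; i < n; ++i)
                c[i] = a[i] + b[i];
        }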

  1. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis

    PubMed Central

    Teodoro, George; Kurc, Tahsin; Andrade, Guilherme; Kong, Jun; Ferreira, Renato; Saltz, Joel

    2015-01-01

    We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core, MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, computation complexities, and parallelization forms of the operations. The results show a significant variability in the performance of operations with respect to the device used. The performance of operations with regular data access is comparable on a MIC and a GPU, and sometimes better on the MIC. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations. PMID:28239253

  2. High performance in silico virtual drug screening on many-core processors.

    PubMed

    McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A

    2015-05-01

    Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.

  3. Evaluation of stochastic algorithms for financial mathematics problems from point of view of energy-efficiency

    NASA Astrophysics Data System (ADS)

    Atanassov, E.; Dimitrov, D.; Gurov, T.

    2015-10-01

    The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs and Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for option pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.

  4. Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

    NASA Astrophysics Data System (ADS)

    Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.

    2018-05-01

    Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into ODE solver methods that are both SIMD-friendly and computationally efficient.
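
    The SIMD strategy above amounts to advancing many independent chemistry ODE systems in lockstep, vectorizing across the system index. A much-simplified illustration (hypothetical code; the paper's solvers are OpenCL Rosenbrock and Runge-Kutta integrators with adaptive steps) is a fixed-step RK4 update for a batch of independent linear ODEs:

        #include <cstddef>

        // One classic RK4 step for nsys independent ODEs y' = -lambda[s] * y[s].
        // Vectorizing across s keeps all SIMD lanes on the same instruction
        // stream; with adaptive step sizes the lanes would diverge, which is
        // exactly the thread-divergence effect the paper analyzes.
        void rk4_step_batch(double* y, const double* lambda, double dt, std::size_t nsys) {
            #pragma omp simd
            for (std::size_t s = 0; s < nsys; ++s) {
                const double k1 = -lambda[s] * y[s];
                const double k2 = -lambda[s] * (y[s] + 0.5 * dt * k1);
                const double k3 = -lambda[s] * (y[s] + 0.5 * dt * k2);
                const double k4 = -lambda[s] * (y[s] + dt * k3);
                y[s] += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
            }
        }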

  5. Big Data, Deep Learning and Tianhe-2 at Sun Yat-Sen University, Guangzhou

    NASA Astrophysics Data System (ADS)

    Yuen, D. A.; Dzwinel, W.; Liu, J.; Zhang, K.

    2014-12-01

    In this decade the big data revolution has permeated many fields, ranging from financial transactions to medical surveys and scientific endeavors, because of the big opportunities people see ahead. What to do with all this data remains an intriguing question. This is where computer scientists together with applied mathematicians have made some significant inroads in developing deep learning techniques for unraveling new relationships among the different variables by means of correlation analysis and data-assimilation methods. Deep learning and big data taken together pose a grand challenge task in high-performance computing, demanding both ultrafast speed and large memory. The Tianhe-2, recently installed at Sun Yat-Sen University in Guangzhou, is well positioned to take up this challenge because it is currently the world's fastest computer at 34 Petaflops. Each compute node of Tianhe-2 has two Intel Xeon E5-2600 CPUs and three Xeon Phi accelerators. The Tianhe-2 has 88 gigabytes of fast RAM on each node, and the system has a total memory of 1,375 terabytes. All of these technical features will allow very high-dimensional (more than 10) problems in deep learning to be explored carefully on the Tianhe-2. Problems in seismology which can be solved include three-dimensional seismic wave simulations of the whole Earth with a few km resolution and the recognition of new phases in seismic waveforms from assemblages of large data sets.

  6. High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nagasaka, Y; Matsuoka, S; Azad, A

    Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike earlier studies, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while different algorithms dominate the other scenarios depending on matrix size, sparsity, compression factor and operation type. We summarize our in-depth evaluation results and provide a recipe for choosing the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
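
    The hash-table-based approach is easiest to see one output row at a time. The sketch below (illustrative; the paper uses a custom hash table tuned for KNL rather than std::unordered_map) accumulates one row of C = A * B from CSR inputs. Rows of C are independent, so a thread-parallel loop over i completes the algorithm, and leaving the accumulator unsorted is the cheap option highlighted in the final sentence above.

        #include <cstddef>
        #include <unordered_map>
        #include <vector>

        // Accumulate row i of C = A * B, with A and B in CSR format.
        // arow/brow: row pointers; acol/bcol: column indices; aval/bval: values.
        void spgemm_row(std::size_t i,
                        const std::vector<std::size_t>& arow, const std::vector<int>& acol,
                        const std::vector<double>& aval,
                        const std::vector<std::size_t>& brow, const std::vector<int>& bcol,
                        const std::vector<double>& bval,
                        std::unordered_map<int, double>& crow) {
            crow.clear();
            for (std::size_t p = arow[i]; p < arow[i + 1]; ++p)            // A(i,k)
                for (std::size_t q = brow[acol[p]]; q < brow[acol[p] + 1]; ++q)
                    crow[bcol[q]] += aval[p] * bval[q];                    // += A(i,k)*B(k,j)
        }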

  7. Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Druinsky, Alex; Ghysels, Pieter; Li, Xiaoye S.

    In this paper, we study the performance of a two-level algebraic-multigrid algorithm, with a focus on the impact of the coarse-grid solver on performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data comes from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread counts. We also developed a bounds-and-bottlenecks performance model of the solver, which we used to guide us through the optimization effort, and carried out performance tuning in the solver’s large parameter space. As a result, significant speedups were obtained on both machines.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry

    Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.

  9. Earth system modelling on system-level heterogeneous architectures: EMAC (version 2.42) on the Dynamical Exascale Entry Platform (DEEP)

    NASA Astrophysics Data System (ADS)

    Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik

    2016-09-01

    We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of autonomous coprocessors interconnected together, called Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.

  10. An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

    DOE PAGES

    Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...

    2016-10-27

    Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lyakh, Dmitry I.

    An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on the NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). Furthermore, the tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
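
    The cache-utilization idea reduces, in the simplest rank-2 case, to the familiar blocked transpose sketched below (an illustration of the principle, not TAL-SH code; the library handles arbitrary-rank tensors and index permutations):

        #include <algorithm>
        #include <cstddef>

        // Cache-blocked transpose of a rows*cols row-major matrix. Working on
        // B*B tiles keeps both the read tile and the write tile resident in
        // cache, avoiding the strided-access misses of the naive loop.
        void transpose_blocked(const double* in, double* out,
                               std::size_t rows, std::size_t cols) {
            const std::size_t B = 64;   // tile edge, sized so two tiles fit in cache
            for (std::size_t ii = 0; ii < rows; ii += B)
                for (std::size_t jj = 0; jj < cols; jj += B)
                    for (std::size_t i = ii; i < std::min(ii + B, rows); ++i)
                        for (std::size_t j = jj; j < std::min(jj + B, cols); ++j)
                            out[j * rows + i] = in[i * cols + j];
        }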

  12. Toward Exascale Earthquake Ground Motion Simulations for Near-Fault Engineering Analysis

    DOE PAGES

    Johansen, Hans; Rodgers, Arthur; Petersson, N. Anders; ...

    2017-09-01

    Modernizing SW4 for massively parallel time-domain simulations of earthquake ground motions in 3D earth models increases resolution and provides ground motion estimates for critical infrastructure risk evaluations. Simulations of ground motions from large (M ≥ 7.0) earthquakes require domains on the order of 100 to 500 km and spatial granularity on the order of 1 to 5 m, resulting in hundreds of billions of grid points. Surface-focused structured mesh refinement (SMR) allows for more constant grid-points-per-wavelength scaling in typical Earth models, where wavespeeds increase with depth. In fact, SMR allows simulations to double the frequency content relative to a fixed-grid calculation on a given resource. The authors report improvements to the SW4 algorithm developed while porting the code to the Cori Phase 2 (Intel Xeon Phi) systems at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. Investigations of the performance of the innermost loop of the calculations found that reorganizing the order of operations can improve performance for massive problems.

  13. Toward Exascale Earthquake Ground Motion Simulations for Near-Fault Engineering Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johansen, Hans; Rodgers, Arthur; Petersson, N. Anders

    Modernizing SW4 for massively parallel time-domain simulations of earthquake ground motions in 3D earth models increases resolution and provides ground motion estimates for critical infrastructure risk evaluations. Simulations of ground motions from large (M ≥ 7.0) earthquakes require domains on the order of 100 to 500 km and spatial granularity on the order of 1 to 5 m, resulting in hundreds of billions of grid points. Surface-focused structured mesh refinement (SMR) allows for more constant grid-points-per-wavelength scaling in typical Earth models, where wavespeeds increase with depth. In fact, SMR allows simulations to double the frequency content relative to a fixed-grid calculation on a given resource. The authors report improvements to the SW4 algorithm developed while porting the code to the Cori Phase 2 (Intel Xeon Phi) systems at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. Investigations of the performance of the innermost loop of the calculations found that reorganizing the order of operations can improve performance for massive problems.

  14. Numerical solution of the Navier-Stokes equations by discontinuous Galerkin method

    NASA Astrophysics Data System (ADS)

    Krasnov, M. M.; Kuchugov, P. A.; E Ladonkina, M.; E Lutsky, A.; Tishkin, V. F.

    2017-02-01

    Detailed unstructured grids and numerical methods of high accuracy are frequently used in the numerical simulation of gasdynamic flows in areas with complex geometry. The Galerkin method with discontinuous basis functions, or Discontinuous Galerkin Method (DGM), works well in dealing with such problems. This approach offers a number of advantages inherent to both finite-element and finite-difference approximations. Moreover, the present paper shows that DGM schemes can be viewed as an extension of the Godunov method to piecewise-polynomial functions. As is known, DGM involves significant computational complexity, and this brings up the question of ensuring the most effective use of all the computational capacity available. In order to speed up the calculations, an operator programming method has been applied while creating the computational module. This approach enables compact encoding of mathematical formulas and facilitates the porting of programs to parallel architectures, such as NVidia CUDA and Intel Xeon Phi. With the software package based on DGM, numerical simulations of supersonic flow past solid bodies have been carried out. The numerical results are in good agreement with the experimental ones.

  15. Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

    PubMed

    Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

    2018-05-03

    Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work-stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors and use cases, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Contact: rene.rahn@fu-berlin.de.
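
    The inter-sequence layout can be pictured as follows: instead of vectorizing within one dynamic-programming matrix, the same cell update is applied to N independent alignments at once, with one SIMD lane per alignment. The sketch below is a deliberately simplified illustration of that idea (hypothetical names and a linear gap cost; not SeqAn code):

        #include <algorithm>
        #include <cstddef>

        // One DP cell update for N alignments in lockstep. diag/up/left hold the
        // three neighboring cells of each alignment's matrix; match holds the
        // per-alignment substitution score; gap is the linear gap cost.
        void dp_cell_batch(const int* diag, const int* up, const int* left,
                           const int* match, int gap, int* score, std::size_t N) {
            #pragma omp simd
            for (std::size_t a = 0; a < N; ++a)
                score[a] = std::max(diag[a] + match[a],
                                    std::max(up[a] + gap, left[a] + gap));
        }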

  16. Analytical Performance Modeling and Validation of Intel’s Xeon Phi Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chunduri, Sudheer; Balaprakash, Prasanna; Morozov, Vitali

    Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive because even a reasonably accurate model can be useful for performance tuning before the hardware is made available. In this paper, we develop a hardware model for Intel’s second-generation Xeon Phi architecture code-named Knights Landing (KNL) for the SKOPE framework. We validate the KNL hardware model by projecting the performance of mini-benchmarks and application kernels. The results show that our KNL model can project the performance with prediction errors of 10% to 20%. The hardware model also provides informative recommendations for code transformations and tuning.

  17. Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias

    2006-01-01

    The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmark (IMB) results to study the performance of 11 MPI communication functions on these systems.

  18. Evaluation of an Adaptive Automation Trigger Based on Task Performance, Priority, and Frequency

    DTIC Science & Technology

    2013-06-01

    [Record excerpt: the experiment workstation used dual Intel Xeon X5550 processors at 2.67 GHz, 12.0 GB RAM, and a 1.5 GB PCIe NVIDIA Quadro FX 4800 graphics card; the remainder of the record is reference-list residue.]

  19. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T), due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
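
    The "straightforward application of OpenMP to the deep loop nests" can be pictured with a generic contraction kernel like the sketch below (illustrative only; the actual CCSD(T) loop nests and index ranges differ). Collapsing the outer loops exposes enough work items to occupy the roughly 240 hardware threads of a Xeon Phi card:

        // Generic contraction C(i,j) += sum_k A(i,k) * B(k,j), threaded with
        // OpenMP. collapse(2) fuses the i and j loops into ni*nj work items.
        void contract(const double* A, const double* B, double* C,
                      int ni, int nj, int nk) {
            #pragma omp parallel for collapse(2)
            for (int i = 0; i < ni; ++i)
                for (int j = 0; j < nj; ++j) {
                    double sum = 0.0;
                    for (int k = 0; k < nk; ++k)
                        sum += A[i * nk + k] * B[k * nj + j];
                    C[i * nj + j] += sum;
                }
        }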

  20. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; de Jong, Wibe

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T), due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  1. Preparing for Exascale: Towards convection-permitting, global atmospheric simulations with the Model for Prediction Across Scales (MPAS)

    NASA Astrophysics Data System (ADS)

    Heinzeller, Dominikus; Duda, Michael G.; Kunstmann, Harald

    2017-04-01

    With strong financial and political support from national and international initiatives, exascale computing is projected for the end of this decade. Energy requirements and physical limitations imply the use of accelerators and scaling out to orders of magnitude more cores than today to achieve this milestone. In order to fully exploit the capabilities of these exascale computing systems, existing applications need to undergo significant development. The Model for Prediction Across Scales (MPAS) is a novel set of Earth system simulation components and consists of an atmospheric core, an ocean core, a land-ice core and a sea-ice core. Its distinct features are the use of unstructured Voronoi meshes and C-grid discretisation, which address the shortcomings, with respect to parallel scalability, numerical accuracy and physical consistency, of global models on regular grids and of limited-area models nested in a forcing data set. Here, we present work towards the application of the atmospheric core (MPAS-A) on current and future high performance computing systems for problems at extreme scale. In particular, we address the issue of massively parallel I/O by extending the model to support the highly scalable SIONlib library. Using global uniform meshes with a convection-permitting resolution of 2-3 km, we demonstrate the ability of MPAS-A to scale out to half a million cores while maintaining a high parallel efficiency. We also demonstrate the potential benefit of a hybrid parallelisation of the code (MPI/OpenMP) on the latest generation of Intel's Many Integrated Core architecture, the Intel Xeon Phi Knights Landing.
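
    The hybrid MPI/OpenMP layout mentioned above follows a standard pattern: MPI ranks partition the mesh while OpenMP threads share each rank's partition, reducing rank counts (and halo traffic) on many-core nodes such as KNL. A minimal skeleton of that pattern is sketched below (illustrative; MPAS-A itself is written in Fortran):

        #include <mpi.h>
        #include <omp.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            int provided, rank;
            // FUNNELED: only the master thread performs MPI calls
            // (e.g., halo exchanges and parallel I/O).
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            #pragma omp parallel
            {
                // ... each thread advances its share of the rank's mesh cells ...
                #pragma omp master
                std::printf("rank %d using %d threads\n", rank, omp_get_num_threads());
            }
            MPI_Finalize();
            return 0;
        }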

  2. High-performance modeling of plasma-based acceleration and laser-plasma interactions

    NASA Astrophysics Data System (ADS)

    Vay, Jean-Luc; Blaclard, Guillaume; Godfrey, Brendan; Kirchen, Manuel; Lee, Patrick; Lehe, Remi; Lobet, Mathieu; Vincenti, Henri

    2016-10-01

    Large-scale numerical simulations are essential to the design of plasma-based accelerators and laser-plasma interactions for ultra-high intensity (UHI) physics. The electromagnetic Particle-In-Cell (PIC) approach is the method of choice for self-consistent simulations, as it is based on first principles, captures all kinetic effects, and also scales favorably to many cores on supercomputers. The standard PIC algorithm relies on second-order finite-difference discretization of the Maxwell and Newton-Lorentz equations. We present here novel formulations, based on very high-order pseudo-spectral Maxwell solvers, which enable near-total elimination of the numerical Cherenkov instability and increased accuracy over the standard PIC method for standard laboratory-frame and Lorentz-boosted-frame simulations. We also present the latest implementations in the PIC modules Warp-PICSAR and FBPIC on the Intel Xeon Phi and GPU architectures. Examples of applications will be given on the simulation of laser-plasma accelerators and high-harmonic generation with plasma mirrors. Work supported by US-DOE Contracts DE-AC02-05CH11231 and by the European Commission through the Marie Skłodowska-Curie fellowship PICSSAR Grant Number 624543. Used resources of NERSC.

  3. Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same situation arises in utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi processors, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy-efficiency requirements of general computation, Field Programmable Gate Arrays (FPGAs) may be a better solution when their computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  4. Python in the NERSC Exascale Science Applications Program for Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ronaghi, Zahra; Thomas, Rollin; Deslippe, Jack

    We describe a new effort at the National Energy Research Scientific Computing Center (NERSC) in performance analysis and optimization of scientific Python applications targeting the Intel Xeon Phi (Knights Landing, KNL) many-core architecture. The Python-centered work outlined here is part of a larger effort called the NERSC Exascale Science Applications Program (NESAP) for Data. NESAP for Data focuses on applications that process and analyze high-volume, high-velocity data sets from experimental/observational science (EOS) facilities supported by the US Department of Energy Office of Science. We present three case study applications from NESAP for Data that use Python. These codes vary in terms of “Python purity” from applications developed in pure Python to ones that use Python mainly as a convenience layer for scientists without expertise in lower-level programming languages like C, C++ or Fortran. The science case, requirements, constraints, algorithms, and initial performance optimizations for each code are discussed. Our goal with this paper is to contribute to the larger conversation around the role of Python in high-performance computing today and tomorrow, highlighting areas for future work and emerging best practices.

  5. Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

    NASA Astrophysics Data System (ADS)

    Bernaschi, M.; Bisson, M.; Salvadore, F.

    2014-10-01

    We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in cluster configurations, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model using the over-relaxation algorithm. We also present data for a traditional high-end multi-core architecture, the Intel Sandy Bridge. The results show that although the two Intel architectures can run essentially the same code, the performance of an Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that to obtain reasonable scalability with the Intel Phi coprocessor (the Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode, which reduces the performance of the single system. As to the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture while maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good when using the CPU as a communication co-processor for the GPU. All source codes are provided for inspection and for double-checking the results.

  6. Connectivity: Performance Portable Algorithms for graph connectivity v. 0.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Slota, George; Rajamanickam, Sivasankaran; Madduri, Kamesh

    Graphs occur in many real-world settings, from road networks and social networks to scientific simulations. Connectivity is a graph analysis software package for computing graph connectivity on modern architectures such as multicore CPUs, the Xeon Phi and GPUs.

  7. A High Performance Computing Framework for Physics-based Modeling and Simulation of Military Ground Vehicles

    DTIC Science & Technology

    2011-03-25

    number one and Nebulae at number three. Both systems rely on GPU co-processing and use Intel Xeon processors and NVIDIA Tesla C2050 GPUs. In...spite of a theoretical peak capability of almost 3 Petaflop/s, Nebulae clocked at 1.271 PFlop/s when running the Linpack benchmark, which puts it

  8. Using Intel's Knights Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a mercury transport module to investigate mercury pollution. In this study, we present our work on porting the GNAQPMS model to the Intel Xeon Phi processor, Knights Landing (KNL), to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL offers new hardware features and can be used as a standalone processor as well as a coprocessor alongside another CPU. Using the Vtune tool, the high-overhead modules in the GNAQPMS model were identified, including the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These modules were accelerated by optimizing the code and exploiting new features of KNL. The following optimization measures were applied: 1) changing the pure MPI parallel mode to a hybrid MPI/OpenMP parallel mode; 2) vectorizing the code to use the 512-bit wide vector units; 3) reducing unnecessary memory accesses and computation; 4) reducing Thread Local Storage (TLS) for common variables within each OpenMP thread in CBMZ; 5) changing global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS increased greatly on both the CPU and the KNL platform: single-node tests showed that the optimized version achieves a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a one-socket KNL platform compared with the baseline code, which means the KNL has a 1.29x speedup when compared with the two-socket CPU platform.
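
    Measure 2 above relies on compiler vectorization. The fragment below is a hedged, generic illustration (not GNAQPMS code) of exposing a unit-stride loop to KNL's 512-bit vector units with OpenMP SIMD; with AVX-512, each vector operation processes eight doubles:

      #include <vector>
      #include <cstdio>

      // A chemistry-style rate update; #pragma omp simd asks the compiler
      // to vectorize the loop (e.g., icpc -qopenmp -xMIC-AVX512).
      void update_rates(int n, const double* __restrict conc,
                        const double* __restrict k,
                        double* __restrict rate) {
          #pragma omp simd
          for (int i = 0; i < n; ++i)
              rate[i] = k[i] * conc[i] * conc[i];   // simple 2nd-order rate law
      }

      int main() {
          const int n = 1024;
          std::vector<double> conc(n, 2.0), k(n, 0.5), rate(n);
          update_rates(n, conc.data(), k.data(), rate.data());
          printf("rate[0] = %g\n", rate[0]);        // 0.5 * 2 * 2 = 2
          return 0;
      }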

  9. Porting plasma physics simulation codes to modern computing architectures using the libmrc framework

    NASA Astrophysics Data System (ADS)

    Germaschewski, Kai; Abbott, Stephen

    2015-11-01

    Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both by using more cores and by having more parallelism within each core, e.g. in GPUs and the Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source libmrc framework, which has been used to modularize and port three plasma physics codes: the extended MHD code MRCv3 with implicit time integration and curvilinear grids; the OpenGGCM global magnetosphere model; and the particle-in-cell code PSC. libmrc consolidates basic functionality needed for simulations based on structured grids (I/O, load balancing, time integrators), and also introduces a parallel object model that makes it possible to maintain multiple implementations of computational kernels, e.g. for conventional processors and GPUs. It handles data layout conversions and enables us to port performance-critical parts of a code to a new architecture step by step, while the rest of the code remains unchanged. We will show examples of the performance gains and some physics applications.

  10. Chapter 13. Exploring Use of the Reserved Core

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holmen, John; Humphrey, Alan; Berzins, Martin

    2015-07-29

    In this chapter, we illustrate the benefits of thinking in terms of thread management techniques when using a centralized scheduler model along with interoperability of MPI and PThreads. This is facilitated through an exploration of thread placement strategies for an algorithm modeling radiative heat transfer, with special attention to the 61st core. This algorithm plays a key role within the Uintah Computational Framework (UCF) and current efforts at the University of Utah to model next-generation, large-scale clean coal boilers. In such simulations, this algorithm models the dominant form of heat transfer and consumes a large portion of compute time. Exemplified by a real-world example, this chapter presents our early efforts in porting a key portion of a scalability-centric codebase to the Intel Xeon Phi coprocessor. Specifically, this chapter presents results from our experiments profiling the native execution of a reverse Monte Carlo ray tracing-based radiation model on a single coprocessor. These results demonstrate that our fastest run configurations utilized the 61st core and that performance was not profoundly impacted when explicitly oversubscribing the coprocessor's operating-system thread. Additionally, this chapter presents a portion of the radiation model source code, a MIC-centric UCF cross-compilation example, and a less conventional thread management technique for developers utilizing the PThreads threading model.
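
    The thread-placement strategies explored in the chapter come down to controlling affinity explicitly. The snippet below is a hedged Linux/PThreads illustration of pinning a thread to a chosen core (our example, not the UCF source; the OS core id corresponding to the Phi's 61st physical core depends on the platform's enumeration; link with -lpthread):

      #ifndef _GNU_SOURCE
      #define _GNU_SOURCE
      #endif
      #include <pthread.h>
      #include <sched.h>
      #include <cstdio>

      // Pin the calling thread to one OS core; returns 0 on success.
      static int pin_to_core(int core) {
          cpu_set_t set;
          CPU_ZERO(&set);
          CPU_SET(core, &set);
          return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
      }

      int main() {
          const int target_core = 0;   // replace with the reserved core's OS id
          if (pin_to_core(target_core) == 0)
              printf("pinned to core %d (now on CPU %d)\n",
                     target_core, sched_getcpu());
          return 0;
      }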

  11. Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Sourouri, Mohammed; Birger Raknes, Espen

    2017-04-01

    In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI), the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. It is therefore essential to optimize the solution time for the EWE, as this has a major impact on the total computational cost of running RTM or FWI. From a computational point of view, applications implementing EWEs face two major challenges. The first is the amount of memory-bound computation involved; the second is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which requires explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor, based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator while requiring only the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (the number of cores, tiles and memory varies with the model) organized in tiles that are connected via a 2D-mesh-based interconnect. In contrast to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most accelerators, KNL has a memory subsystem consisting of low-level caches and 16 GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400 GB of conventional DDR4 memory is provided. Such a strictly hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE-based application to reduce the time-to-solution for the following 3D model sizes in grid points: 128³, 256³ and 512³. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement with the memkind library, we could achieve higher speedups. Additionally, using the MCDRAM as cache for problem sizes smaller than 16 GB unlocked further performance improvements. Depending on the problem size, our overall results indicate that the KNL-based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
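
    The memkind-based placement mentioned above can be sketched as follows (a hedged illustration, not the authors' code): allocate the bandwidth-critical wavefields from MCDRAM with hbw_malloc and fall back to DDR4 where MCDRAM is absent (link with -lmemkind):

      #include <hbwmalloc.h>
      #include <cstdlib>
      #include <cstdio>

      int main() {
          const size_t n = size_t(512) * 512 * 512;   // one 512^3 wavefield
          double* u;
          // hbw_check_available() returns 0 when MCDRAM-backed memory exists.
          const bool have_hbw = (hbw_check_available() == 0);
          if (have_hbw) {
              u = static_cast<double*>(hbw_malloc(n * sizeof(double)));
              printf("wavefield allocated in MCDRAM\n");
          } else {
              u = static_cast<double*>(malloc(n * sizeof(double)));
              printf("MCDRAM unavailable, using DDR4\n");
          }
          // ... time stepping over u ...
          if (have_hbw) hbw_free(u); else free(u);
          return 0;
      }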

  12. Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors

    NASA Astrophysics Data System (ADS)

    Linderman, R.; Spetka, S.; Fitzgerald, D.; Emeny, S.

    The Physically-Constrained Iterative Deconvolution (PCID) image deblurring code is being ported to heterogeneous networks of multi-core systems, including Intel Xeons and IBM Cell Broadband Engines. This paper reports results from experiments using the JAWS supercomputer at MHPCC (60 TFLOPS of dual-dual Xeon nodes linked with Infiniband) and the Cell Cluster at AFRL in Rome, NY. The Cell Cluster has 52 TFLOPS of PlayStation 3 (PS3) nodes with IBM Cell Broadband Engine multi-cores and 15 dual-quad Xeon head nodes. The interconnect fabric includes Infiniband, 10 Gigabit Ethernet and 1 Gigabit Ethernet to each of the 336 PS3s. The results compare approaches to parallelizing FFT executions across the Xeons and the Cell's Synergistic Processing Elements (SPEs) for frame-level image processing. The experiments included Intel's Performance Primitives and Math Kernel Library, FFTW3.2, and Carnegie Mellon's SPIRAL. Optimization of FFTs in the PCID code led to a decrease in the relative processing time spent in FFTs. Profiling PCID version 6.2, about one year ago, showed that the 13 functions accounting for the highest percentage of processing were all FFT processing functions; they accounted for over 88% of processing time in one run on Xeons. FFT optimizations led to improvement in the current PCID version 8.0: a recent profile showed that only two of the 19 functions with the highest processing time were FFT processing functions, and timing measurements showed that FFT processing in PCID version 8.0 has been reduced to less than 19% of overall processing time. We are working toward a goal of scaling to 200-400 cores per job (1-2 imagery frames/core). Running a pair of cores on each set of frames reduces latency by implementing parallel FFT processing. Our current results show scaling well out to 100 pairs of cores. These results support the next higher level of parallelism in PCID, where groups of several hundred frames, each producing one resolved image, are sent to cliques of several hundred cores in round-robin fashion. Current efforts toward further performance enhancement for PCID are shifting toward using the PlayStations in conjunction with the Xeons to take advantage of their outstanding price/performance as well as their Flops/Watt cost advantage. We are fine-tuning the PCID parallelization strategy to balance processing over Xeons and Cell BEs and find an optimal partitioning of PCID over the heterogeneous processors. A high-performance information management system that exploits native Infiniband multicast is used to improve latency among the head nodes. Using a publication/subscription-oriented information management system to implement a unified communications platform makes runs on large HPCs with thousands of intercommunicating cores more flexible and more fault tolerant. It features a loose coupling of publishers to subscribers through intervening brokers. We are also working on enhancing performance for both Xeons and Cell BEs by moving selected operations to single precision. Techniques for adapting the code to single precision and performance results are reported.

  13. A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

    DTIC Science & Technology

    2012-02-17

    to be solved. Disclaimer: Reference herein to any specific commercial company, product, process, or service by trade name, trademark...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26 GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of

  14. Time-domain seismic modeling in viscoelastic media for full waveform inversion on heterogeneous computing platforms with OpenCL

    NASA Astrophysics Data System (ADS)

    Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard

    2017-03-01

    Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite differences and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code's portability across architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and the Intel Xeon Phi. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitude over a single-threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models over several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.
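
    The portability SeisCL exploits starts with OpenCL's device model. The sketch below (our example, not SeisCL source) enumerates every OpenCL platform and device on a node, which is how a single code can be pointed at Intel CPUs, NVidia GPUs or a Xeon Phi (link with -lOpenCL):

      #include <CL/cl.h>
      #include <cstdio>

      int main() {
          cl_uint np = 0;
          clGetPlatformIDs(0, nullptr, &np);       // count platforms first
          cl_platform_id platforms[16];
          if (np > 16) np = 16;                    // keep the fixed buffer safe
          clGetPlatformIDs(np, platforms, nullptr);

          for (cl_uint p = 0; p < np; ++p) {
              cl_uint nd = 0;
              clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, nullptr, &nd);
              cl_device_id devs[16];
              if (nd > 16) nd = 16;
              clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, nd, devs, nullptr);
              for (cl_uint d = 0; d < nd; ++d) {
                  char name[256];
                  clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, nullptr);
                  printf("platform %u device %u: %s\n", p, d, name);
              }
          }
          return 0;
      }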

  15. Case for a field-programmable gate array multicore hybrid machine for an image-processing application

    NASA Astrophysics Data System (ADS)

    Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos

    2011-01-01

    General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs), have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.

  16. Particle In Cell Codes on Highly Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Tableman, Adam

    2014-10-01

    We describe strategies and examples of Particle-In-Cell codes running on Nvidia GPU and Intel Phi architectures. This includes basic implementations in skeleton codes and full-scale development versions (encompassing 1D, 2D, and 3D codes) in Osiris. Both the similarities and differences between Intel's and Nvidia's hardware will be examined. Work supported by grants NSF ACI 1339893, DOE DE SC 000849, DOE DE SC 0008316, DOE DE NA 0001833, and DOE DE FC02 04ER 54780.

  17. Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

    NASA Astrophysics Data System (ADS)

    Ford, Eric B.; Dindar, Saleh; Peters, Jorg

    2015-08-01

    The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to the details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and the Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than an order of magnitude speed-up and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with the support of the Penn State Center for Astrostatistics and Institute for CyberScience.

  18. Optimization of Selected Remote Sensing Algorithms for Embedded NVIDIA Kepler GPU Architecture

    NASA Technical Reports Server (NTRS)

    Riha, Lubomir; Le Moigne, Jacqueline; El-Ghazawi, Tarek

    2015-01-01

    This paper evaluates the potential of the embedded Graphics Processing Unit in NVIDIA's Tegra K1 for onboard processing. The performance is compared to a general purpose multi-core CPU and a full-fledged GPU accelerator. This study uses two algorithms: Wavelet Spectral Dimension Reduction of Hyperspectral Imagery and the Automated Cloud-Cover Assessment (ACCA) Algorithm. The Tegra K1 achieved 51 for the ACCA algorithm and 20 for the dimension reduction algorithm, as compared to the performance of a high-end 8-core server Intel Xeon CPU with 13.5 times higher power consumption.

  19. Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.

    PubMed

    Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin

    2014-10-01

    High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and the Intel Xeon Phi (MIC). These processors have made a tremendous amount of computing power available at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines still execute on a single processor, leaving the other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse grain, but each of which may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also show experimentally that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.

  20. Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems

    PubMed Central

    Andrade, G.; Ferreira, R.; Teodoro, George; Rocha, Leonardo; Saltz, Joel H.; Kurc, Tahsin

    2015-01-01

    High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and the Intel Xeon Phi (MIC). These processors have made a tremendous amount of computing power available at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines still execute on a single processor, leaving the other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse grain, but each of which may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also show experimentally that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales. PMID:26640423

  1. Locality Aware Concurrent Start for Stencil Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.

    Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex groupings of nodes, cores, threads, caches and memory connected by an ever-evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory-hierarchy-aware tile groups. Each execution schedule and tile shape exploits the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvements ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
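
    For orientation, the fragment below shows plain spatial tiling of a 2D 5-point Jacobi sweep with OpenMP, a deliberately simple relative of the hierarchical, concurrent-start tiling developed in the paper (a generic illustration; the tile size T is a tunable stand-in for the cache-aware tile shapes discussed above):

      #include <vector>
      #include <algorithm>
      #include <cstdio>

      int main() {
          const int N = 1024, T = 64;            // grid size and tile size
          std::vector<double> a(N * N, 1.0), b(N * N, 0.0);

          // Tiles are distributed over threads; each tile is swept with
          // unit-stride inner loops for cache locality.
          #pragma omp parallel for collapse(2)
          for (int ti = 1; ti < N - 1; ti += T)
              for (int tj = 1; tj < N - 1; tj += T)
                  for (int i = ti; i < std::min(ti + T, N - 1); ++i)
                      for (int j = tj; j < std::min(tj + T, N - 1); ++j)
                          b[i * N + j] = 0.25 * (a[(i - 1) * N + j] + a[(i + 1) * N + j]
                                               + a[i * N + j - 1] + a[i * N + j + 1]);

          printf("b[N/2][N/2] = %g\n", b[(N / 2) * N + N / 2]);  // 1.0 for a flat field
          return 0;
      }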

  2. MSTor: A program for calculating partition functions, free energies, enthalpies, entropies, and heat capacities of complex molecules including torsional anharmonicity

    NASA Astrophysics Data System (ADS)

    Zheng, Jingjing; Mielke, Steven L.; Clarkson, Kenneth L.; Truhlar, Donald G.

    2012-08-01

    We present a Fortran program package, MSTor, which calculates partition functions and thermodynamic functions of complex molecules involving multiple torsional motions by the recently proposed MS-T method. This method interpolates between the local harmonic approximation in the low-temperature limit and the limit of free internal rotation of all torsions at high temperature. The program can also carry out calculations in the multiple-structure local harmonic approximation. The program package also includes six utility codes that can be used as stand-alone programs to calculate reduced moment of inertia matrices by the method of Kilpatrick and Pitzer, to generate conformational structures, to calculate, either analytically or by Monte Carlo sampling, volumes for torsional subdomains defined by Voronoi tessellation of the conformational subspace, to generate template input files, and to calculate one-dimensional torsional partition functions using the torsional eigenvalue summation method.
    Catalogue identifier: AEMF_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMF_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 77 434
    No. of bytes in distributed program, including test data, etc.: 3 264 737
    Distribution format: tar.gz
    Programming language: Fortran 90, C, and Perl
    Computer: Itasca (HP Linux cluster; each node has two-socket, quad-core 2.8 GHz Intel Xeon X5560 “Nehalem EP” processors), Calhoun (SGI Altix XE 1300 cluster; each node contains two quad-core 2.66 GHz Intel Xeon “Clovertown”-class processors sharing 16 GB of main memory), Koronis (Altix UV 1000 server with 190 6-core Intel Xeon X7542 “Westmere” processors at 2.66 GHz), Elmo (Sun Fire X4600 Linux cluster with AMD Opteron cores), and Mac Pro (two 2.8 GHz quad-core Intel Xeon processors)
    Operating system: Linux/Unix/Mac OS
    RAM: 2 Mbytes
    Classification: 16.3, 16.12, 23
    Nature of problem: Calculation of the partition functions and thermodynamic functions (standard-state energy, enthalpy, entropy, and free energy as functions of temperature) of complex molecules involving multiple torsional motions.
    Solution method: The multi-structural approximation with torsional anharmonicity (MS-T). The program also provides results for the multi-structural local harmonic approximation [1].
    Restrictions: There is no limit on the number of torsions that can be included in either the Voronoi calculation or the full MS-T calculation. In practice, the range of problems that can be addressed with the present method consists of all multi-torsional problems for which one can afford to calculate all the conformations and their frequencies.
    Unusual features: The method can be applied to transition states as well as stable molecules. The program package also includes the hull program for the calculation of Voronoi volumes and six utility codes that can be used as stand-alone programs to calculate reduced moment-of-inertia matrices by the method of Kilpatrick and Pitzer, to generate conformational structures, to calculate, either analytically or by Monte Carlo sampling, volumes for torsional subdomains defined by Voronoi tessellation of the conformational subspace, to generate template input files, and to calculate one-dimensional torsional partition functions using the torsional eigenvalue summation method.
    Additional comments: The program package includes a manual, installation script, and input and output files for a test suite.
    Running time: There are 24 test runs. The running time of the test runs on a single processor of the Itasca computer is less than 2 seconds.
    References: [1] J. Zheng, T. Yu, E. Papajak, I.M. Alecu, S.L. Mielke, D.G. Truhlar, Practical methods for including torsional anharmonicity in thermochemical calculations of complex molecules: The internal-coordinate multi-structural approximation, Phys. Chem. Chem. Phys. 13 (2011) 10885-10907.
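
    As a brief aside (standard statistical mechanics, hedged here because it is not taken from the MSTor documentation itself), the two limits that the MS-T method interpolates between can be written, for a single torsion of frequency \omega, reduced moment of inertia I and symmetry number \sigma, as

      q_{\mathrm{HO}} = \frac{e^{-\beta\hbar\omega/2}}{1 - e^{-\beta\hbar\omega}},
      \qquad
      q_{\mathrm{FR}} = \frac{\sqrt{2\pi I k_{B} T}}{\sigma\hbar},
      \qquad
      \beta = \frac{1}{k_{B} T},

    while the torsional eigenvalue summation method used by the last utility code evaluates q_{\mathrm{TES}} = \sum_{i} e^{-\beta E_{i}} over the computed torsional energy levels E_{i}.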

  3. An efficient implementation of semi-numerical computation of the Hartree-Fock exchange on the Intel Phi processor

    NASA Astrophysics Data System (ADS)

    Liu, Fenglai; Kong, Jing

    2018-07-01

    Unique technical challenges and their solutions for implementing the semi-numerical Hartree-Fock exchange on the Phi processor are discussed, especially concerning the single-instruction-multiple-data type of processing and the small cache size. Benchmark calculations on a series of buckyball molecules with various Gaussian basis sets on a Phi processor and a six-core CPU show that the Phi processor provides as much as a 12-fold speedup with large basis sets compared with the conventional four-center electron repulsion integration approach performed on the CPU. The accuracy of the semi-numerical scheme is also evaluated and found to be comparable to that of the resolution-of-identity approach.

  4. Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

    PubMed

    Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

    2014-06-25

    The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their use resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi 5110P coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in raw performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.

  5. Heterogeneous computing architecture for fast detection of SNP-SNP interactions

    PubMed Central

    2014-01-01

    Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their use resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi 5110P coprocessor, but also requires considerably more programming effort. Conclusions General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in raw performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems. PMID:24964802

  6. Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU: A Case Study from Microscopy Image Analysis

    PubMed Central

    Teodoro, George; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel

    2014-01-01

    We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and characterization of objects based on these features (object classification). In this work, we have identified the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable to, or sometimes better than, that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly. This is a result of the low performance of MICs when it comes to random data access. We have also examined the coordinated use of MICs and CPUs. Our experiments show that using a performance-aware task scheduling strategy for application operations improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems: the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs). PMID:25419088

  7. MFIX-DEM Phi: Performance and Capability Improvements Towards Industrial Grade Open-source DEM Framework with Integrated Uncertainty Quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    GEL, Aytekin; Jiao, Yang; Emady, Heather

    Two major challenges hinder the effective use and adoption of multiphase computational fluid dynamics tools by industry. The first is the need for significant computational resources, which is inversely proportional to the accuracy of solutions due to the computational intensity of the algorithms. The second barrier is assessing the prediction credibility and confidence in the simulation results. In this project, a multi-tiered approach has been proposed under four broad activities to overcome these challenges while addressing all of the objectives outlined in FOA-0001238 through Phases 1 and 2 of the project. The present report consists of the results for only Phase 1, which was the funded performance period. From the start of the project, all of the objectives outlined in the FOA were addressed through four major activity tasks in an integrated and balanced fashion to improve adoption of the MFIX suite of solvers for industrial use. The first task aimed to improve the performance of MFIX-DEM, specifically targeting peak performance on Intel Xeon and Xeon Phi based systems, which are expected to be among the primary high-performance computing platforms, both affordable and available for industrial users, in the next two to five years. However, due to a number of changes in the course of the project, the scope of the performance-improvement task was significantly reduced to avoid duplicate work; hence, more emphasis was placed on the other three tasks as discussed below. The second task aimed at physical modeling enhancements through the implementation of a polydispersity capability and the validation of heat transfer models in MFIX. An extended verification and validation (V&V) study was performed for the new polydispersity feature implemented in MFIX-DEM, both for granular and coupled gas-solid flows. The features of the polydispersity capability and results for an industrially relevant problem were disseminated through journal papers (one published and one under review at the time of writing of the final technical report). As part of the validation efforts, another industrially relevant problem of interest, based on rotary drums, was studied for several modes of heat transfer and the results were presented at conferences. The third task was aimed at an important and unique contribution of the project: developing a unified uncertainty quantification framework by integrating MFIX-DEM with a graphical user interface (GUI) driven uncertainty quantification (UQ) engine, i.e., MFIX-GUI and PSUADE. The goal was to enable a user with only modest knowledge of statistics to effectively utilize the UQ framework offered with MFIX-DEM Phi to perform UQ analysis routinely. For Phase 1, a proof-of-concept demonstration of the proposed framework was completed and shared. Direct industry involvement was one of the key virtues of this project and was pursued through the fourth task. For this purpose, even at the proposal stage, the project team received strong interest in the proposed capabilities from two major corporations; this interest was further expanded throughout Phase 1, and a new collaboration with another major corporation from the chemical industry was also initiated. The level of interest received and the continued collaboration during Phase 1 clearly show the relevance and potential impact of the project for industrial users.

  8. Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method

    DTIC Science & Technology

    2015-06-01

    5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 GFLOPs (billions of floating-point operations per second); and 288 GB/s Kepler K40 memory

  9. Optimizing legacy molecular dynamics software with directive-based offload

    NASA Astrophysics Data System (ADS)

    Michael Brown, W.; Carrillo, Jan-Michael Y.; Gavhane, Nitin; Thakkar, Foram M.; Plimpton, Steven J.

    2015-10-01

    Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms can run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU; in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and NVIDIA GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.
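
    Directive-based offload of the kind described above can be illustrated with Intel's compiler-specific offload pragmas (a hedged sketch, not the LAMMPS Intel-package source; it builds with the classic Intel compiler, and on a host without a coprocessor the block simply runs on the CPU):

      #include <cstdio>

      int main() {
          const int n = 1 << 20;
          float* x = new float[n];
          float* f = new float[n];
          for (int i = 0; i < n; ++i) x[i] = 1.0f;

          // Ship x to coprocessor 0, run the loop there with OpenMP
          // threads, and bring f back over PCIe.
          #pragma offload target(mic : 0) in(x : length(n)) out(f : length(n))
          {
              #pragma omp parallel for
              for (int i = 0; i < n; ++i)
                  f[i] = 24.0f * x[i];      // placeholder "force" kernel
          }
          printf("f[0] = %g\n", f[0]);      // 24
          delete[] x; delete[] f;
          return 0;
      }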

  10. Spectral-element simulation of two-dimensional elastic wave propagation in fully heterogeneous media on a GPU cluster

    NASA Astrophysics Data System (ADS)

    Rudianto, Indra; Sudarmaji

    2018-04-01

    We present an implementation of the spectral-element method for the simulation of two-dimensional elastic wave propagation in fully heterogeneous media. We have incorporated most realistic geological features in the model, including surface topography, curved layer interfaces, and 2-D wave-speed heterogeneity. To accommodate such complexity, we use an unstructured quadrilateral meshing technique. The simulation was performed on a GPU cluster consisting of 24-core Intel Xeon CPUs and 4 NVIDIA Quadro graphics cards, using a CUDA and MPI implementation. We speed up the computation by a factor of about 5 compared to MPI only, and by a factor of about 40 compared to a serial implementation.

  11. Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

    NASA Astrophysics Data System (ADS)

    Pankratius, V.; Gowanlock, M.; Blair, D. M.

    2015-12-01

    Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community heads towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem with machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, and (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electron content data quickly. Potential applications of cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
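
    To make the two DBSCAN parameters concrete, a minimal serial reference implementation follows (our sketch of the textbook algorithm; the manycore version described above parallelizes the neighbour queries over batched workloads):

      #include <vector>
      #include <queue>
      #include <cmath>
      #include <cstdio>

      struct Pt { double x, y; };

      // All points within distance eps of point i (including i itself).
      std::vector<int> region_query(const std::vector<Pt>& p, int i, double eps) {
          std::vector<int> nb;
          for (int j = 0; j < (int)p.size(); ++j)
              if (std::hypot(p[i].x - p[j].x, p[i].y - p[j].y) <= eps)
                  nb.push_back(j);
          return nb;
      }

      // labels: -1 = noise, 0 = unvisited, >0 = cluster id
      std::vector<int> dbscan(const std::vector<Pt>& p, double eps, int min_pts) {
          std::vector<int> label(p.size(), 0);
          int cluster = 0;
          for (int i = 0; i < (int)p.size(); ++i) {
              if (label[i] != 0) continue;
              auto nb = region_query(p, i, eps);
              if ((int)nb.size() < min_pts) { label[i] = -1; continue; }
              label[i] = ++cluster;                 // i is a core point: new cluster
              std::queue<int> q;
              for (int j : nb) q.push(j);
              while (!q.empty()) {                  // expand the cluster
                  int j = q.front(); q.pop();
                  if (label[j] == -1) label[j] = cluster;   // noise becomes border
                  if (label[j] != 0) continue;
                  label[j] = cluster;
                  auto nb2 = region_query(p, j, eps);
                  if ((int)nb2.size() >= min_pts)   // j is itself a core point
                      for (int k : nb2) q.push(k);
              }
          }
          return label;
      }

      int main() {
          std::vector<Pt> pts = {{0,0},{0.1,0},{0,0.1},{5,5},{5.1,5},{5,5.1},{9,0}};
          auto lab = dbscan(pts, 0.5, 3);           // eps = 0.5, minPts = 3
          for (size_t i = 0; i < pts.size(); ++i)
              printf("point %zu -> %d\n", i, lab[i]);   // two clusters, one noise
          return 0;
      }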

  12. Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators

    NASA Astrophysics Data System (ADS)

    Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.

    2015-12-01

    Seismic wave propagation codes are essential tools for investigating a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients for improving the resolution of tomographic images, to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited by high computational costs. They thus need high-performance computing (HPC) facilities to improve the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphics cards, while OpenCL has been adopted by additional hardware accelerators, e.g. AMD graphics cards, ARM-based processors and Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both the CUDA and OpenCL languages within the source code package. Seismic wave simulations are thus now able to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performance for different simulations and hardware usages.

  13. Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

    DOE PAGES

    Wang, Bei; Ethier, Stephane; Tang, William; ...

    2017-06-29

    The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partitioning and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.

  14. Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Bei; Ethier, Stephane; Tang, William

    The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partitioning and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.

  15. ASC-ATDM Performance Portability Requirements for 2015-2019

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Harold C.; Trott, Christian Robert

    This report outlines the research, development, and support requirements for the Advanced Simulation and Computing (ASC) Advanced Technology, Development, and Mitigation (ATDM) Performance Portability (a.k.a. Kokkos) project for 2015-2019. The research and development (R&D) goal for Kokkos (v2) has been to create and demonstrate a thread-parallel programming model and standard C++ library-based implementation that enables performance portability across diverse manycore architectures such as multicore CPU, Intel Xeon Phi, and NVIDIA Kepler GPU. This R&D goal has been achieved for algorithms that use data-parallel patterns including parallel-for, parallel-reduce, and parallel-scan. Current R&D is focusing on hierarchical parallel patterns such as a directed acyclic graph (DAG) of asynchronous tasks where each task contains nested data-parallel algorithms. This five-year plan includes the R&D required to fully and performance-portably exploit thread parallelism across current and anticipated next-generation platforms (NGP). The Kokkos library is being evaluated by many projects exploring algorithms and code design for NGP. Some production libraries and applications such as Trilinos and LAMMPS have already committed to Kokkos as their foundation for manycore parallelism and performance portability. These five-year requirements include the support required for current and anticipated ASC projects to be effective and productive in their use of Kokkos on NGP. The greatest risk to the success of Kokkos and of the ASC projects relying upon Kokkos is a lack of staffing resources to support Kokkos to the degree needed by these projects. This support includes up-to-date tutorials, documentation, multi-platform (hardware and software stack) testing, minor feature enhancements, thread-scalable algorithm consulting, and managing collaborative R&D.
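
    The data-parallel patterns named above look as follows in Kokkos (a hedged minimal example, not taken from the report); the same source compiles against an OpenMP backend for CPUs or the Xeon Phi, or a CUDA backend for GPUs:

      #include <Kokkos_Core.hpp>
      #include <cstdio>

      int main(int argc, char* argv[]) {
          Kokkos::initialize(argc, argv);
          {
              const int n = 1 << 20;
              // A View lives in the chosen backend's memory space.
              Kokkos::View<double*> a("a", n);

              // parallel-for: fill the array on the device.
              Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) {
                  a(i) = 0.5 * i;
              });

              // parallel-reduce: sum the array on the device.
              double sum = 0.0;
              Kokkos::parallel_reduce(n, KOKKOS_LAMBDA(const int i, double& s) {
                  s += a(i);
              }, sum);
              printf("sum = %g\n", sum);
          }
          Kokkos::finalize();
          return 0;
      }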

  16. NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere-Ocean Coupled Models on Many-Core Computer Architectures Annual Report

    DTIC Science & Technology

    2014-09-30

    portability is difficult to achieve on future supercomputers that use various types of accelerators (GPUs, Xeon-Phi, SIMD, etc.). All of these...bottlenecks of NUMA. For example, in the CG code the state vector was originally stored as q(1:Nvar, 1:Npoin), where Nvar is the number of...a Global Grid Point (GGP) storage. On the other hand, in the DG code the state vector is typically stored as q(1:Nvar, 1:Npts, 1:Nelem) where

  17. Parallelization of MRCI based on hole-particle symmetry.

    PubMed

    Suo, Bing; Zhai, Gaohong; Wang, Yubin; Wen, Zhenyi; Hu, Xiangqian; Li, Lemin

    2005-01-15

    The parallel implementation of a multireference configuration interaction program based on hole-particle symmetry is described. The platform used to implement the parallelization is an Intel-architecture cluster consisting of 12 nodes, each of which is equipped with two 2.4-GHz XEON processors, 3 GB of memory, and a 36-GB disk; the nodes are connected by a Gigabit Ethernet switch. The dependence of the speedup on molecular symmetries and task granularities is discussed. Test calculations show that the speedup scales by a factor of about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h) when the number of nodes is doubled. The largest calculation performed on this cluster involves 5.6 × 10^8 CSFs.

  18. GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

    NASA Astrophysics Data System (ADS)

    Gong, Chunye; Liu, Jie; Chi, Lihua; Huang, Haowei; Fang, Jingyue; Gong, Zhenghu

    2011-07-01

    The Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now provides great capability for solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU accelerated simulation of one-energy-group, time-independent, deterministic, discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for simulations with vacuum boundary conditions. The relative advantages and disadvantages of the GPU implementation, simulation on multiple GPUs, the programming effort, and code portability are also discussed. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip without flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.

  19. Static analysis of the hull plate using the finite element method

    NASA Astrophysics Data System (ADS)

    Ion, A.

    2015-11-01

    This paper presents a static analysis at two levels of a container ship's construction: the first at the girder/hull-plate level, and the second over the entire strength hull of the vessel. This article describes the work for the static analysis of a hull plate. We use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 processors at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system. The distributed memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allada, Veerendra; Benjegerdes, Troy; Bode, Brett

    Commodity clusters augmented with application accelerators are evolving as competitive high performance computing systems. The Graphical Processing Unit (GPU), with a very high arithmetic density and performance per price ratio, is a good platform for scientific application acceleration. In addition to the interconnect bottlenecks among the cluster compute nodes, the cost of memory copies between the host and the GPU device has to be carefully amortized to improve the overall efficiency of the application. Scientific applications also rely on efficient implementations of the Basic Linear Algebra Subprograms (BLAS), among which the General Matrix Multiply (GEMM) is considered the workhorse subroutine. In this paper, we study the performance of the memory copies and GEMM subroutines that are critical to porting computational chemistry algorithms to GPU clusters. To that end, a benchmark based on the NetPIPE framework is developed to evaluate the latency and bandwidth of the memory copies between the host and the GPU device. The performance of the single and double precision GEMM subroutines from the NVIDIA CUBLAS 2.0 library is studied. The results have been compared with those of the BLAS routines from the Intel Math Kernel Library (MKL) to understand the computational trade-offs. The test bed is an Intel Xeon cluster equipped with NVIDIA Tesla GPUs.

  1. Comparison of the new intermediate complex atmospheric research (ICAR) model with the WRF model in a mesoscale catchment in Central Europe

    NASA Astrophysics Data System (ADS)

    Härer, Stefan; Bernhardt, Matthias; Gutmann, Ethan; Bauer, Hans-Stefan; Schulz, Karsten

    2017-04-01

    Until recently, a large gap existed in atmospheric downscaling strategies. On the one hand, computationally efficient statistical approaches are widely used; on the other hand, dynamic but CPU-intensive numerical atmospheric models like the Weather Research and Forecasting (WRF) model exist. The intermediate complex atmospheric research (ICAR) model developed at NCAR (Boulder, Colorado, USA) addresses this gap by combining the strengths of both approaches: the process-based structure of a dynamic model and its applicability in a changing climate, as well as the speed of a parsimonious modelling approach, which facilitates the modelling of ensembles and offers a straightforward way to test new parametrization schemes as well as various input data sources. However, the ICAR model has not yet been tested in Europe or on gently undulating terrain. This study now compares, for the first time, ICAR model results to WRF model runs in Central Europe, using a complete year of model results in the mesoscale Attert catchment (Luxembourg). In addition to these modelling results, we also describe the first implementation of ICAR on an Intel Phi architecture and perform speed tests on the Vienna cluster, a standard workstation, and an Intel Phi coprocessor. Finally, the study gives an outlook on sensitivity studies using slightly different input data sources.

  2. BrainFrame: a node-level heterogeneous accelerator platform for neuron simulations

    NASA Astrophysics Data System (ADS)

    Smaragdos, Georgios; Chatzikonstantis, Georgios; Kukreja, Rahul; Sidiropoulos, Harry; Rodopoulos, Dimitrios; Sourdis, Ioannis; Al-Ars, Zaid; Kachris, Christoforos; Soudris, Dimitrios; De Zeeuw, Chris I.; Strydis, Christos

    2017-12-01

    Objective. The advent of high-performance computing (HPC) in recent years has led to its increasing use in brain studies through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit a homogeneous acceleration platform to effectively address the complete array of modeling requirements. Approach. In this paper we propose and build BrainFrame, a heterogeneous acceleration platform that incorporates three distinct acceleration technologies: an Intel Xeon-Phi CPU, an NVidia GP-GPU and a Maxeler Dataflow Engine. The PyNN software framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different experiment instances of a state-of-the-art neuron model, representing the inferior-olivary nucleus using a biophysically-meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity densities, which can drastically affect the workload's performance characteristics. Main results. The combined use of different HPC technologies demonstrates that BrainFrame is better able to cope with the modeling diversity encountered in realistic experiments while at the same time running on significantly lower energy budgets. Our performance analysis clearly shows that the model directly affects performance and all three technologies are required to cope with all the model use cases. Significance. The BrainFrame framework is designed to transparently configure and select the appropriate back-end accelerator technology for use per simulation run. The PyNN integration provides a familiar bridge to the vast number of models already available. Additionally, it gives a clear roadmap for extending the platform support beyond the proof of concept, with improved usability and directly useful features to the computational-neuroscience community, paving the way for wider adoption.

  3. BrainFrame: a node-level heterogeneous accelerator platform for neuron simulations.

    PubMed

    Smaragdos, Georgios; Chatzikonstantis, Georgios; Kukreja, Rahul; Sidiropoulos, Harry; Rodopoulos, Dimitrios; Sourdis, Ioannis; Al-Ars, Zaid; Kachris, Christoforos; Soudris, Dimitrios; De Zeeuw, Chris I; Strydis, Christos

    2017-12-01

    The advent of high-performance computing (HPC) in recent years has led to its increasing use in brain studies through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit a homogeneous acceleration platform to effectively address the complete array of modeling requirements. In this paper we propose and build BrainFrame, a heterogeneous acceleration platform that incorporates three distinct acceleration technologies: an Intel Xeon-Phi CPU, an NVidia GP-GPU and a Maxeler Dataflow Engine. The PyNN software framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different experiment instances of a state-of-the-art neuron model, representing the inferior-olivary nucleus using a biophysically-meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity densities, which can drastically affect the workload's performance characteristics. The combined use of different HPC technologies demonstrates that BrainFrame is better able to cope with the modeling diversity encountered in realistic experiments while at the same time running on significantly lower energy budgets. Our performance analysis clearly shows that the model directly affects performance and all three technologies are required to cope with all the model use cases. The BrainFrame framework is designed to transparently configure and select the appropriate back-end accelerator technology for use per simulation run. The PyNN integration provides a familiar bridge to the vast number of models already available. Additionally, it gives a clear roadmap for extending the platform support beyond the proof of concept, with improved usability and directly useful features to the computational-neuroscience community, paving the way for wider adoption.

  4. Scalability of a Low-Cost Multi-Teraflop Linux Cluster for High-End Classical Atomistic and Quantum Mechanical Simulations

    NASA Technical Reports Server (NTRS)

    Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash

    2003-01-01

    Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and space-filling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, the 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.

  5. A hybrid algorithm for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
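
    The combination of thread-level parallelism with SIMD vectorization described above can be illustrated with a hedged sketch of a short-range force loop: OpenMP threads own atoms, while each atom's neighbor loop is vectorized. The Lennard-Jones form and all names here are illustrative, not the authors' code.

        #include <vector>

        // Illustrative short-range (Lennard-Jones, epsilon = sigma = 1) force
        // kernel: threads over atoms, SIMD over each atom's neighbor list.
        void compute_forces(int n,
                            const std::vector<double>& x, const std::vector<double>& y,
                            const std::vector<double>& z,
                            const std::vector<std::vector<int>>& nbr,
                            std::vector<double>& fx, std::vector<double>& fy,
                            std::vector<double>& fz) {
          #pragma omp parallel for schedule(dynamic)   // thread-level parallelism
          for (int i = 0; i < n; ++i) {
            double fxi = 0.0, fyi = 0.0, fzi = 0.0;
            const int m = (int)nbr[i].size();
            #pragma omp simd reduction(+ : fxi, fyi, fzi)  // SIMD over neighbors
            for (int k = 0; k < m; ++k) {
              const int j = nbr[i][k];
              const double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
              const double r2 = dx * dx + dy * dy + dz * dz;
              const double ir2 = 1.0 / r2, ir6 = ir2 * ir2 * ir2;
              const double w = 24.0 * ir6 * (2.0 * ir6 - 1.0) * ir2;
              fxi += w * dx; fyi += w * dy; fzi += w * dz;
            }
            fx[i] = fxi; fy[i] = fyi; fz[i] = fzi;
          }
        }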

  6. Performance optimization of Qbox and WEST on Intel Knights Landing

    NASA Astrophysics Data System (ADS)

    Zheng, Huihuo; Knight, Christopher; Galli, Giulia; Govoni, Marco; Gygi, Francois

    We present the optimization of the electronic structure codes Qbox and WEST targeting the Intel® Xeon Phi™ processor, codenamed Knights Landing (KNL). Qbox is an ab initio molecular dynamics code based on plane-wave density functional theory (DFT) and WEST is a post-DFT code for excited state calculations within many-body perturbation theory. Both Qbox and WEST employ highly scalable algorithms which enable accurate large-scale electronic structure calculations on leadership-class supercomputer platforms beyond 100,000 cores, such as Mira and Theta at the Argonne Leadership Computing Facility. In this work, features of the KNL architecture (e.g. hierarchical memory) are explored to achieve higher performance in key algorithms of the Qbox and WEST codes and to develop a road-map for further development targeting next-generation computing architectures. In particular, the optimizations of the Qbox and WEST codes on the KNL platform will target efficient large-scale electronic structure calculations of nanostructured materials exhibiting complex structures and prediction of their electronic and thermal properties for use in solar and thermal energy conversion devices. This work was supported by MICCoM, as part of the Comp. Mats. Sci. Program funded by the U.S. DOE, Office of Sci., BES, MSE Division. This research used resources of the ALCF, which is a DOE Office of Sci. User Facility under Contract DE-AC02-06CH11357.

  7. Closeout Report ARRA supplement to DE-FG02-08ER41546, 03/15/2010 to 03/14/2011 - Advanced Transfer Map Methods for the Description of Particle Beam Dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berz, Martin; Makino, Kyoko

    The ARRA funds were utilized to acquire a cluster of high-performance computers, consisting of one Altus 2804 server based on quad AMD Opteron 6174 12C processors, with four 2.2-GHz sockets of 12 cores each, resulting in 48 directly usable cores, as well as a Relion 1751 server using an Intel Xeon X5677 consisting of four 3.46-GHz cores supporting 8 threads. Both systems run the Unix flavor CentOS, which is designed for use without the need for frequent updates, which greatly enhances their reliability. The systems are used to operate our COSY INFINITY environment, which supports MPI parallelization. The units arrived at MSU in September 2010 and were taken into operation shortly thereafter.

  8. FPGA Online Tracking Algorithm for the PANDA Straw Tube Tracker

    NASA Astrophysics Data System (ADS)

    Liang, Yutie; Ye, Hua; Galuska, Martin J.; Gessler, Thomas; Kuhn, Wolfgang; Lange, Jens Soren; Wagner, Milan N.; Liu, Zhen'an; Zhao, Jingzhou

    2017-06-01

    A novel FPGA-based online tracking algorithm for helix track reconstruction in a solenoidal field, developed for the PANDA spectrometer, is described. Employing the Straw Tube Tracker detector with 4636 straw tubes, the algorithm includes a complex track finder and a track fitter. Implemented in VHDL, the algorithm is tested on a Xilinx Virtex-4 FX60 FPGA chip with different types of events, at different event rates. A processing time of 7 μs per event for an average of 6 charged tracks is obtained. The momentum resolution is about 3% (4%) for p_t (p_z) at 1 GeV/c. Compared to the algorithm running on a CPU chip (single-core Intel Xeon E5520 at 2.26 GHz), an improvement of 3 orders of magnitude in processing time is obtained. The algorithm can handle severe overlapping of events, which is typical for interaction rates above 10 MHz.

  9. Toward performance portability of the Albany finite element analysis code using the Kokkos library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.

    Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPUs), Intel Xeon Phis, and multicore CPUs.

  10. Toward performance portability of the Albany finite element analysis code using the Kokkos library

    DOE PAGES

    Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.; ...

    2018-02-05

    Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPUs), Intel Xeon Phis, and multicore CPUs.

  11. Parallelization of the preconditioned IDR solver for modern multicore computer systems

    NASA Astrophysics Data System (ADS)

    Bessonov, O. A.; Fedoseyev, A. I.

    2012-10-01

    This paper presents the analysis, parallelization, and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic, and quantum problems. It employs an iterative IDR algorithm with ILU preconditioning (of user-chosen order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, recent years have brought dramatic changes in processor architectures and computer system organization. Due to this, performance criteria and methods have been revisited, together with the parallelization of the solver and preconditioner using the OpenMP environment. Results of a successful implementation of efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
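
    The kernels that dominate a preconditioned IDR iteration are sparse matrix-vector products (plus the triangular solves of the ILU preconditioner). A minimal OpenMP sketch of a CSR-format matrix-vector product, illustrative rather than CNSPACK's actual code:

        #include <vector>

        // y = A*x for a CSR-format sparse matrix; each thread owns whole rows,
        // so there are no write conflicts on y.
        void spmv_csr(int nrows,
                      const std::vector<int>& rowptr, const std::vector<int>& col,
                      const std::vector<double>& val, const std::vector<double>& x,
                      std::vector<double>& y) {
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < nrows; ++i) {
            double acc = 0.0;
            for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
              acc += val[k] * x[col[k]];
            y[i] = acc;
          }
        }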

  12. Rapid insights from remote sensing in the geosciences

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio

    2015-03-01

    The growing availability of capacity computing for atomistic materials modeling has encouraged the use of high-accuracy, computationally intensive interatomic potentials, such as SNAP. These potentials also happen to scale well on petascale computing platforms. SNAP has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected on to a basis of hyperspherical harmonics in four dimensions. The computational cost per atom is much greater than that of simpler potentials such as Lennard-Jones or EAM, while the communication cost remains modest. We discuss a variety of strategies for implementing SNAP in the LAMMPS molecular dynamics package. We present scaling results obtained running SNAP on three different classes of machine: a conventional Intel Xeon CPU cluster; the Titan GPU-based system; and the combined Sequoia and Vulcan BlueGene/Q. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corp., for the U.S. Dept. of Energy's National Nuclear Security Admin. under Contract DE-AC04-94AL85000.

  13. Semiconductor Ion Implanters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    MacKinnon, Barry A.; Ruffell, John P.

    In 1953 the Raytheon CK722 transistor was priced at $7.60. Based upon this, an Intel Xeon Quad Core processor containing 820,000,000 transistors should list at $6.2 billion. Particle accelerator technology plays an important part in the remarkable story of why that Intel product can be purchased today for a few hundred dollars. Most people of the mid twentieth century would be astonished at the ubiquity of semiconductors in the products we now buy and use every day. Though relatively expensive in the nineteen fifties, they now exist in a wide range of items, from high-end multicore microprocessors like the Intel product to disposable items containing 'only' hundreds or thousands, like RFID chips and talking greeting cards. This historical development has been fueled by continuous advancement of the several individual technologies involved in the production of semiconductor devices, including ion implantation and the charged particle beamlines at the heart of implant machines. In the course of its 40-year development, the worldwide implanter industry has reached annual sales levels around $2B, installed thousands of dedicated machines, and directly employs thousands of workers. It represents, in all these measures, as much and possibly more than any other industrial application of particle accelerator technology. This presentation discusses the history of implanter development. It touches on some of the people involved and on some of the developmental changes and challenges imposed as the requirements of the semiconductor industry evolved.

  14. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  15. Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized fine-grain tile runtime, to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism, and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intel Xeon Phi board of up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.

  16. Underwater Threat Source Localization: Processing Sensor Network TDOAs with a Terascale Optical Core Device

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barhen, Jacob; Imam, Neena

    2007-01-01

    Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near-term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront time-difference-of-arrival (TDOA). The corresponding algorithms are implemented on the EnLight processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimized for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight 64a prototype processor as compared to a dual Intel Xeon™ processor.
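
    As background, a common way to estimate a TDOA between two sensors is to pick the lag that maximizes the cross-correlation of their signals. The sketch below shows this on a conventional CPU (the optical-core processor performs the underlying array operations in hardware); all names are illustrative.

        #include <vector>

        // Estimate the time-difference-of-arrival (in samples) between two
        // sensor traces as the lag maximizing their cross-correlation.
        int estimate_tdoa(const std::vector<double>& a,
                          const std::vector<double>& b, int max_lag) {
          int best_lag = 0;
          double best = -1e300;
          for (int lag = -max_lag; lag <= max_lag; ++lag) {
            double c = 0.0;                       // correlation at this lag
            for (int i = 0; i < (int)a.size(); ++i) {
              const int j = i + lag;
              if (j >= 0 && j < (int)b.size()) c += a[i] * b[j];
            }
            if (c > best) { best = c; best_lag = lag; }
          }
          return best_lag;  // multiply by the sampling period to get seconds
        }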

  17. IGA-ADS: Isogeometric analysis FEM using ADS solver

    NASA Astrophysics Data System (ADS)

    Łoś, Marcin M.; Woźniak, Maciej; Paszyński, Maciej; Lenharth, Andrew; Hassaan, Muhammad Amber; Pingali, Keshav

    2017-08-01

    In this paper we present a fast explicit solver for the solution of non-stationary problems using L2 projections with the isogeometric finite element method. The solver has been implemented within the GALOIS framework. It enables parallel multi-core simulations of different time-dependent problems, in 1D, 2D, or 3D. We have prepared the solver framework in a way that enables direct implementation of the selected PDE and corresponding boundary conditions. In this paper we describe the installation, the implementation of three exemplary PDEs, and the execution of the simulations on multi-core Linux cluster nodes. We consider three case studies, including heat transfer, linear elasticity, as well as non-linear flow in heterogeneous media. The presented package generates output suitable for interfacing with Gnuplot and ParaView visualization software. The exemplary simulations show near perfect scalability on the Gilbert shared-memory node with four Intel® Xeon® CPU E7-4860 processors, each possessing 10 physical cores (for a total of 40 cores).

  18. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    PubMed

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of solving the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2), because the fractional time derivative at each time step couples to all previous time levels. In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm agrees well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed-memory cluster system. We believe that parallel computing technology will become a very basic method for computationally intensive fractional applications in the near future.

  19. Accelerating a three-dimensional eco-hydrological cellular automaton on GPGPU with OpenCL

    NASA Astrophysics Data System (ADS)

    Senatore, Alfonso; D'Ambrosio, Donato; De Rango, Alessio; Rongo, Rocco; Spataro, William; Straface, Salvatore; Mendicino, Giuseppe

    2016-10-01

    This work presents an effective implementation of a numerical model for complete eco-hydrological Cellular Automata modeling on Graphics Processing Units (GPUs) with OpenCL (Open Computing Language) for heterogeneous computation (i.e., on CPUs and/or GPUs). Different types of parallel implementations were carried out (e.g., use of fast local memory, loop unrolling, etc.), showing increasing performance improvements in terms of speedup and also adopting some original optimization strategies. Moreover, numerical analysis of the results (i.e., comparison of CPU and GPU outcomes in terms of rounding errors) has proven to be satisfactory. Experiments were carried out on a workstation with two CPUs (Intel Xeon E5440 at 2.83 GHz), one AMD R9 280X GPU and one NVIDIA Tesla K20c GPU. Results have been extremely positive, but further testing should be performed to assess the functionality of the adopted strategies on other complete models and their ability to fruitfully exploit parallel systems resources.

  20. Spectral Element Method for the Simulation of Unsteady Compressible Flows

    NASA Technical Reports Server (NTRS)

    Diosady, Laslo Tibor; Murman, Scott M.

    2013-01-01

    This work uses a discontinuous-Galerkin spectral-element method (DGSEM) to solve the compressible Navier-Stokes equations [1-3]. The inviscid flux is computed using the approximate Riemann solver of Roe [4]. The viscous fluxes are computed using the second form of Bassi and Rebay (BR2) [5] in a manner consistent with the spectral-element approximation. The method of lines with the classical 4th-order explicit Runge-Kutta scheme is used for time integration. Results for polynomial orders up to p = 15 (16th order) are presented. The code is parallelized using the Message Passing Interface (MPI). The computations presented in this work are performed using the Sandy Bridge nodes of the NASA Pleiades supercomputer at NASA Ames Research Center. Each Sandy Bridge node consists of 2 eight-core Intel Xeon E5-2670 processors with a clock speed of 2.6 GHz and 2 GB of memory per core. On a Sandy Bridge node the Tau Benchmark [6] runs in a time of 7.6 s.
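
    For reference, the classical 4th-order Runge-Kutta step mentioned above, written for a generic semi-discrete system du/dt = R(u); a schematic sketch, not the DGSEM code itself.

        #include <cstddef>
        #include <functional>
        #include <vector>

        // One classical RK4 step: u_{n+1} = u_n + dt/6 (k1 + 2 k2 + 2 k3 + k4).
        using Vec = std::vector<double>;
        Vec rk4_step(const Vec& u, double dt, const std::function<Vec(const Vec&)>& R) {
          auto axpy = [](const Vec& a, double s, const Vec& b) {
            Vec r(a.size());
            for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + s * b[i];
            return r;
          };
          Vec k1 = R(u);
          Vec k2 = R(axpy(u, 0.5 * dt, k1));
          Vec k3 = R(axpy(u, 0.5 * dt, k2));
          Vec k4 = R(axpy(u, dt, k3));
          Vec out(u.size());
          for (std::size_t i = 0; i < u.size(); ++i)
            out[i] = u[i] + dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
          return out;
        }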

  1. CUDA-based acceleration of collateral filtering in brain MR images

    NASA Astrophysics Data System (ADS)

    Li, Cheng-Yuan; Chang, Herng-Hua

    2017-02-01

    Image denoising is one of the fundamental and essential tasks within image processing. In medical imaging, finding an effective algorithm that can remove random noise in MR images is important. This paper proposes an effective noise reduction method for brain magnetic resonance (MR) images. Our approach is based on the collateral filter, which is a more powerful method than the bilateral filter in many cases. However, the computation of the collateral filter algorithm is quite time-consuming. To solve this problem, we improved the collateral filter algorithm with parallel computing using a GPU. We adopted CUDA, an application programming interface for GPUs by NVIDIA, to accelerate the computation. Our experimental evaluation on an Intel Xeon E5-2620 v3 CPU at 2.40 GHz with an NVIDIA Tesla K40c GPU indicated that the proposed implementation runs dramatically faster than the traditional collateral filter. We believe that the proposed framework has established a general blueprint for achieving fast and robust filtering in a wide variety of medical image denoising applications.

  2. Methods for compressible fluid simulation on GPUs using high-order finite differences

    NASA Astrophysics Data System (ADS)

    Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer

    2017-08-01

    We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6× speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves a rate of 168 million updates per second.
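
    As a concrete illustration of the kind of stencil involved, the standard sixth-order central difference for a first derivative on a uniform grid is shown below (interior points only); a 1D sketch, not the authors' 55-point or 19-point GPU kernels.

        #include <vector>

        // Sixth-order central difference for df/dx on a uniform grid of spacing h.
        // Coefficients: (-1/60, 3/20, -3/4, 0, 3/4, -3/20, 1/60) / h.
        void ddx6(const std::vector<double>& f, double h, std::vector<double>& df) {
          const double c1 = 3.0 / 4.0, c2 = -3.0 / 20.0, c3 = 1.0 / 60.0;
          for (int i = 3; i < (int)f.size() - 3; ++i) {
            df[i] = (c1 * (f[i + 1] - f[i - 1]) +
                     c2 * (f[i + 2] - f[i - 2]) +
                     c3 * (f[i + 3] - f[i - 3])) / h;
          }
        }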

  3. The Ettention software package.

    PubMed

    Dahmen, Tim; Marsalek, Lukas; Marniok, Nico; Turoňová, Beata; Bogachev, Sviatoslav; Trampert, Patrick; Nickels, Stefan; Slusallek, Philipp

    2016-02-01

    We present a novel software package for the problem "reconstruction from projections" in electron microscopy. The Ettention framework consists of a set of modular building-blocks for tomographic reconstruction algorithms. The well-known block iterative reconstruction method based on the Kaczmarz algorithm is implemented using these building-blocks, including adaptations specific to electron tomography. Ettention simultaneously features (1) a modular, object-oriented software design, (2) optimized access to high-performance computing (HPC) platforms such as graphic processing units (GPU) or many-core architectures like Xeon Phi, and (3) accessibility to microscopy end-users via integration in the IMOD package and the eTomo user interface. We also provide developers with a clean and well-structured application programming interface (API) that allows for extending the software easily, and thus makes it an ideal platform for algorithmic research while hiding most of the technical details of high-performance computing.

  4. Examining Troughs in the Mass Distribution of All Theoretically Possible Tryptic Peptides

    PubMed Central

    Nefedov, Alexey V.; Mitra, Indranil; Brasier, Allan R.; Sadygov, Rovshan G.

    2011-01-01

    This work describes the mass distribution of all theoretically possible tryptic peptides made of 20 amino acids, up to the mass of 3 kDa, with a resolution of 0.001 Da. We characterize regions between the peaks of the distribution, including gaps (forbidden zones) and low-populated areas (quiet zones). We show how the gaps shrink over the mass range, and when they completely disappear. We demonstrate that peptide compositions in quiet zones are less diverse than those in the peaks of the distribution, and that by eliminating certain types of unrealistic compositions the gaps in the distribution may be increased. The mass distribution is generated using a parallel implementation of a recursive procedure that enumerates all amino acid compositions. It allows us to enumerate all compositions of tryptic peptides below 3 kDa in 48 minutes using a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores). The results of this work can be used to facilitate protein identification and mass defect labeling in mass spectrometry-based proteomics experiments. PMID:21780838
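
    The recursive enumeration idea can be sketched as follows: walk through the residue alphabet, adding copies of each residue while the running mass stays under the cap. The four-residue alphabet and small mass cap below are illustrative stand-ins for the full 20 amino acids and the 3 kDa limit used in the paper.

        #include <cstdio>
        #include <utility>
        #include <vector>

        // Illustrative subset of monoisotopic residue masses (Da).
        const std::vector<std::pair<char, double>> residues = {
            {'G', 57.02146}, {'A', 71.03711}, {'S', 87.03203}, {'V', 99.06841}};

        long long n_compositions = 0;

        // Recursively enumerate residue compositions with total mass <= cap:
        // for residue idx, try 0, 1, 2, ... copies, then move to the next residue.
        void enumerate(std::size_t idx, double mass, double cap) {
          if (idx == residues.size()) { ++n_compositions; return; }
          for (double m = mass; m <= cap; m += residues[idx].second)
            enumerate(idx + 1, m, cap);
        }

        int main() {
          enumerate(0, 0.0, 500.0);  // small cap for illustration
          std::printf("%lld compositions\n", n_compositions);
          return 0;
        }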

  5. PERI - Auto-tuning Memory Intensive Kernels for Multicore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, David H; Williams, Samuel; Datta, Kaushik

    2008-06-24

    We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to Sparse Matrix Vector Multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4X improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.
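
    The essence of the search-based approach is to generate many candidate variants of a kernel (different unrolling, blocking, or prefetch settings), time each on the target machine, and keep the fastest. A schematic sketch, not the authors' generator:

        #include <chrono>
        #include <cstdio>
        #include <functional>
        #include <vector>

        // Time each candidate kernel variant and return the index of the fastest.
        // The variants would be emitted by a code generator; here they are
        // simply callables supplied by the caller.
        int autotune(const std::vector<std::function<void()>>& variants) {
          int best = 0;
          double best_ms = 1e300;
          for (int v = 0; v < (int)variants.size(); ++v) {
            auto t0 = std::chrono::steady_clock::now();
            for (int rep = 0; rep < 10; ++rep) variants[v]();  // average over reps
            auto t1 = std::chrono::steady_clock::now();
            double ms =
                std::chrono::duration<double, std::milli>(t1 - t0).count() / 10.0;
            if (ms < best_ms) { best_ms = ms; best = v; }
            std::printf("variant %d: %.3f ms\n", v, ms);
          }
          return best;
        }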

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trędak, Przemysław; Rudnicki, Witold R.

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in forces computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.

  7. A derivation and scalable implementation of the synchronous parallel kinetic Monte Carlo method for simulating long-time dynamics

    NASA Astrophysics Data System (ADS)

    Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2017-10-01

    Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
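
    For context, one step of the conventional serial KMC algorithm that SPKMC parallelizes looks roughly as follows; SPKMC executes such steps synchronously within spatial sub-domains. The sketch is illustrative and assumes a non-empty rate catalog.

        #include <cmath>
        #include <random>
        #include <vector>

        // One conventional KMC step: select event k with probability r_k / R,
        // then advance time by an exponential increment dt = -ln(u) / R.
        int kmc_step(const std::vector<double>& rates, double& time,
                     std::mt19937& rng) {
          double R = 0.0;
          for (double r : rates) R += r;                  // total rate
          std::uniform_real_distribution<double> uni(0.0, 1.0);
          double target = uni(rng) * R;
          int k = 0;
          double acc = rates[0];
          while (acc < target && k + 1 < (int)rates.size()) acc += rates[++k];
          time += -std::log(1.0 - uni(rng)) / R;          // time advance
          return k;                                       // selected event
        }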

  8. Speeding up spin-component-scaled third-order perturbation theory with the chain of spheres approximation: the COSX-SCS-MP3 method

    NASA Astrophysics Data System (ADS)

    Izsák, Róbert; Neese, Frank

    2013-07-01

    The 'chain of spheres' approximation, developed earlier for the efficient evaluation of the self-consistent field exchange term, is introduced here into the evaluation of the external exchange term of higher order correlation methods. Its performance is studied in the specific case of the spin-component-scaled third-order Møller-Plesset perturbation (SCS-MP3) theory. The results indicate that the approximation performs excellently in terms of both computer time and achievable accuracy. Significant speedups over a conventional method are obtained for larger systems and basis sets. Owing to this development, SCS-MP3 calculations on molecules of the size of penicillin (42 atoms) with a polarised triple-zeta basis set can be performed in ∼3 hours using 16 cores of an Intel Xeon E7-8837 processor with a 2.67 GHz clock speed, which represents a speedup by a factor of 8-9 compared to the previously most efficient algorithm. Thus, the increased accuracy offered by SCS-MP3 can now be explored for at least medium-sized molecules.

  9. A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

    PubMed

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

    2017-10-01

    The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied for many years in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration. The algorithm possesses good global optimality, convergence stability, and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on an Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and an NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results, though with the most complex source code. The parallel SCE-UA holds great promise for real-world applications.
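
    The hot spot parallelized here is the repeated evaluation of the objective function over populations of candidate points. A minimal OpenMP sketch using the Griewank benchmark mentioned above (the population layout is illustrative):

        #include <cmath>
        #include <vector>

        // Griewank benchmark: f(x) = 1 + sum(x_i^2)/4000 - prod(cos(x_i/sqrt(i+1))).
        double griewank(const std::vector<double>& x) {
          double s = 0.0, p = 1.0;
          for (std::size_t i = 0; i < x.size(); ++i) {
            s += x[i] * x[i] / 4000.0;
            p *= std::cos(x[i] / std::sqrt((double)(i + 1)));
          }
          return 1.0 + s - p;
        }

        // Evaluate a whole population in parallel -- the step that dominates
        // SCE-UA's runtime and is spread across cores (or offloaded) here.
        void evaluate(const std::vector<std::vector<double>>& pop,
                      std::vector<double>& fit) {
          #pragma omp parallel for
          for (int i = 0; i < (int)pop.size(); ++i)
            fit[i] = griewank(pop[i]);
        }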

  10. Application of the multireference equation of motion coupled cluster method, including spin-orbit coupling, to the atomic spectra of Cr, Mn, Fe and Co

    NASA Astrophysics Data System (ADS)

    Liu, Zhebing; Huntington, Lee M. J.; Nooijen, Marcel

    2015-10-01

    The recently introduced multireference equation of motion (MR-EOM) approach is combined with a simple treatment of spin-orbit coupling, as implemented in the ORCA program. The resulting multireference equation of motion spin-orbit coupling (MR-EOM-SOC) approach is applied to the first-row transition metal atoms Cr, Mn, Fe and Co, for which experimental data are readily available. Using the MR-EOM-SOC approach, the splittings in each L-S multiplet can be accurately assessed (root mean square (RMS) errors of about 70 cm⁻¹). The RMS errors for J-specific excitation energies range from 414 to 783 cm⁻¹ and are comparable to previously reported J-averaged MR-EOM results using the ACESII program. The MR-EOM approach is highly efficient. A typical MR-EOM calculation of a full spin-orbit spectrum takes about 2 CPU hours on a single processor of a 12-core node, consisting of Intel XEON 2.93 GHz CPUs with 12.3 MB of shared cache memory.

  11. SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes

    NASA Astrophysics Data System (ADS)

    Homann, Holger; Laenen, Francois

    2018-03-01

    The numerical study of physical problems often requires integrating the dynamics of a large number of particles evolving according to a given set of equations. Particles are characterized by the information they carry, such as an identity, a position, and others. There are, generally speaking, two different possibilities for handling particles in high performance computing (HPC) codes. The concept of an Array of Structures (AoS) is in the spirit of the object-oriented programming (OOP) paradigm in that the particle information is implemented as a structure. Here, an object (realization of the structure) represents one particle and a set of many particles is stored in an array. In contrast, using the concept of a Structure of Arrays (SoA), a single structure holds several arrays, each representing one property (such as the identity) of the whole set of particles. The AoS approach is often implemented in HPC codes due to its handiness and flexibility. For a class of problems, however, it is known that the performance of SoA is much better than that of AoS. We confirm this observation for our particle problem. Using a benchmark we show that on modern Intel Xeon processors the SoA implementation is typically several times faster than the AoS one. On Intel's MIC co-processors the performance gap even attains a factor of ten. The same is true for GPU computing, using both computational and multi-purpose GPUs. Combining performance and handiness, we present the library SoAx, which has optimal performance (on CPUs, MICs, and GPUs) while providing the same handiness as AoS. For this, SoAx uses modern C++ design techniques such as template metaprogramming, which allow code for user-defined heterogeneous data structures to be generated automatically.
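
    The AoS/SoA distinction in a nutshell, as a hedged sketch with illustrative particle fields:

        #include <vector>

        // Array of Structures: one object per particle; a position update
        // strides over whole Particle objects, which wastes SIMD lanes and
        // cache bandwidth.
        struct Particle { long id; double x, y, z, vx, vy, vz; };
        using AoS = std::vector<Particle>;

        // Structure of Arrays: one contiguous array per property; the same
        // update streams through unit-stride arrays and vectorizes cleanly.
        struct SoA {
          std::vector<long> id;
          std::vector<double> x, y, z, vx, vy, vz;
        };

        void push_aos(AoS& p, double dt) {
          for (auto& q : p) { q.x += dt * q.vx; q.y += dt * q.vy; q.z += dt * q.vz; }
        }

        void push_soa(SoA& p, double dt) {
          for (std::size_t i = 0; i < p.x.size(); ++i) {  // SIMD-friendly
            p.x[i] += dt * p.vx[i];
            p.y[i] += dt * p.vy[i];
            p.z[i] += dt * p.vz[i];
          }
        }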

  12. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems

    PubMed Central

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C. M. A.; Saltz, Joel

    2017-01-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in the output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual-socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies. PMID:29081725

  13. Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide

    DOE PAGES

    Tang, William; Wang, Bei; Ethier, Stephane; ...

    2016-11-01

    The goal of the extreme scale plasma turbulence studies described in this paper is to expedite the delivery of reliable predictions on confinement physics in large magnetic fusion systems by using world-class supercomputers to carry out simulations with unprecedented resolution and temporal duration. This has involved architecture-dependent optimizations of performance scaling and addressing code portability and energy issues, with the metrics for multi-platform comparisons being 'time-to-solution' and 'energy-to-solution'. Realistic results addressing how confinement losses caused by plasma turbulence scale from present-day devices to the much larger $25 billion international ITER fusion facility have been enabled by innovative advances in the GTC-P code including (i) implementation of one-sided communication from the MPI 3.0 standard; (ii) creative optimization techniques on Xeon Phi processors; and (iii) development of a novel performance model for the key kernels of the PIC code. Our results show that modeling data movement is sufficient to predict performance on modern supercomputer platforms.

  14. High performance in silico virtual drug screening on many-core processors

    PubMed Central

    Price, James; Sessions, Richard B; Ibarra, Amaurys A

    2015-01-01

    Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727

  15. Efficient molecular dynamics simulations with many-body potentials on graphics processing units

    NASA Astrophysics Data System (ADS)

    Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari

    2017-09-01

    Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there has been much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which could result in write conflicts between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread, and the evaluation is free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).
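
    The conflict-free scheme can be sketched in host-side C++ with OpenMP threads standing in for CUDA threads: each thread owns one atom and gathers pairwise contributions from its neighbors, so every output element has a single writer. Here pair_force is a placeholder for the explicit pairwise expression, not the actual Tersoff/REBO formula.

        #include <vector>

        struct Vec3 { double x, y, z; };

        // Gather-style accumulation: thread i owns atom i and sums pairwise
        // contributions F_ij over its neighbors, so no write conflicts arise.
        void accumulate(int n, const std::vector<std::vector<int>>& nbr,
                        Vec3 (*pair_force)(int, int), std::vector<Vec3>& f) {
          #pragma omp parallel for   // one CUDA thread per atom in the GPU version
          for (int i = 0; i < n; ++i) {
            Vec3 fi{0.0, 0.0, 0.0};
            for (int j : nbr[i]) {
              Vec3 fij = pair_force(i, j);   // placeholder pairwise expression
              fi.x += fij.x; fi.y += fij.y; fi.z += fij.z;
            }
            f[i] = fi;   // single writer per element
          }
        }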

  16. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Christopher H.; Long, Hai; Sides, Scott

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance: limitations to application parallelism, or resource contention among concurrently running but independent tasks, limit effective utilization of the added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance for procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth; balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once than fast things in order.

  17. Optimization of a Lattice Boltzmann Computation on State-of-the-Art Multicore Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Samuel; Carter, Jonathan; Oliker, Leonid

    2009-04-10

    We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon E5345 (Clovertown), AMD Opteron 2214 (Santa Rosa), AMD Opteron 2356 (Barcelona), Sun T5140 T2+ (Victoria Falls), as well as a QS20 IBM Cell Blade. Rather than hand-tuning LBMHD for each system, we develop a code generator that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 15x improvement compared with the original code at a given concurrency. Additionally, we present a detailed analysis of each optimization, which reveals surprising hardware bottlenecks and software challenges for future multicore systems and applications.
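
    The search-based tuning loop itself is simple; the engineering effort lies in generating the candidate kernels. The sketch below is a toy version of the idea with a single hypothetical blocking parameter (the real LBMHD auto-tuner explores a far larger, code-generated space):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Stand-in "kernel" whose performance depends on a tunable blocking factor.
static void blocked_kernel(std::vector<double>& dst,
                           const std::vector<double>& src, std::size_t block) {
    for (std::size_t b = 0; b < src.size(); b += block)
        for (std::size_t i = b; i < std::min(b + block, src.size()); ++i)
            dst[i] = 2.0 * src[i] + 1.0;
}

int main() {
    const std::size_t n = 1 << 22;
    std::vector<double> src(n, 1.0), dst(n);
    std::size_t best_block = 0;
    double best_ms = 1e300;
    for (std::size_t block : {256u, 1024u, 4096u, 16384u}) {  // search space
        auto t0 = std::chrono::steady_clock::now();
        blocked_kernel(dst, src, block);
        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::steady_clock::now() - t0).count();
        std::printf("block %zu: %.3f ms\n", block, ms);
        if (ms < best_ms) { best_ms = ms; best_block = block; }
    }
    std::printf("auto-tuner keeps block = %zu\n", best_block);
}
```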

  18. Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

    NASA Astrophysics Data System (ADS)

    Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

    2016-09-01

    The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming models. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions to other problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to the highly optimized CPU implementation (both single-core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80, compared to a high-end 16-core Intel Xeon processor.

  19. Particle Identification on an FPGA Accelerated Compute Platform for the LHCb Upgrade

    NASA Astrophysics Data System (ADS)

    Färber, Christian; Schwemmer, Rainer; Machen, Jonathan; Neufeld, Niko

    2017-07-01

    The current LHCb readout system will be upgraded in 2018 to a “triggerless” readout of the entire detector at the Large Hadron Collider collision rate of 40 MHz. The corresponding bandwidth from the detector down to the foreseen dedicated computing farm (event filter farm), which acts as the trigger, has to be increased by a factor of almost 100, from currently 500 Gb/s up to 40 Tb/s. The event filter farm will preanalyze the data and select events on an event-by-event basis, reducing the bandwidth to a manageable size so that the interesting physics data can be written to tape. The design of such a system is a challenging task, which is why different new technologies are being considered and investigated for the different parts of the system. For use in the event building farm or in the event filter farm (trigger), an experimental field-programmable gate array (FPGA) accelerated computing platform is considered and tested. FPGA compute accelerators are used more and more in standard servers, for example for Microsoft Bing search or Baidu search. The platform we use hosts a general Intel CPU and a high-performance FPGA linked via the high-speed Intel QuickPath Interconnect. An accelerator is implemented on the FPGA. It is very likely that these platforms, built in general for high-performance computing, are also very interesting for the high-energy physics community. First, the performance results of smaller test cases performed at the beginning are presented. Afterward, a part of the existing LHCb RICH particle identification is ported to the experimental FPGA-accelerated platform and tested. We compare the performance of the LHCb RICH particle identification running on a normal CPU with the performance of the same algorithm running on the Xeon-FPGA compute accelerator platform.

  20. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for the Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study the performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e., the ratio of flops performed to bytes moved from memory, in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis at the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory- versus computation-related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques, illustrated by the kernel cases, will be addressed: cache blocking, structure refactoring, data alignment, and vectorization. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver, which allows it to accurately resolve the edge plasma of a magnetic fusion device. After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addressing and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase the trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori Phase I Haswell node, and we expect to bridge this gap by reducing the memory footprint of compute-intensive routines to improve cache reuse. ## PICSAR PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting MIC architectures, first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a Fortran stand-alone kernel for performance studies and benchmarks. MPI domain decomposition is used between NUMA domains, and a tile decomposition (cache blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management. The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domains have been merged and parallelized. All considered, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance is similar between a node of Cori Phase 1 and KNL at order 1, and better on KNL by a factor of 1.6 at order 3 for the considered test case (homogeneous thermal plasma).
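
    For reference, the roofline bound that the Advisor feature visualizes can be computed directly: attainable performance is the lesser of peak compute and arithmetic intensity times memory bandwidth. In this sketch the peak and bandwidth figures are placeholders, not measurements of any machine discussed above.

```cpp
#include <algorithm>
#include <cstdio>

// Roofline bound: min(peak compute, arithmetic intensity * memory bandwidth).
int main() {
    const double peak_gflops = 2662.0;  // assumed DP peak (KNL-class placeholder)
    const double bw_gbytes   = 450.0;   // assumed MCDRAM-class bandwidth placeholder
    for (double ai : {0.1, 0.5, 2.0, 8.0, 32.0}) {           // FLOPs per byte
        double bound = std::min(peak_gflops, ai * bw_gbytes);
        std::printf("AI %5.1f flop/byte -> %8.1f GFLOP/s bound (%s)\n",
                    ai, bound,
                    ai * bw_gbytes < peak_gflops ? "memory-bound"
                                                 : "compute-bound");
    }
}
```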

  1. Open release of the DCA++ project

    NASA Astrophysics Data System (ADS)

    Haehner, Urs; Solca, Raffaele; Staar, Peter; Alvarez, Gonzalo; Maier, Thomas; Summers, Michael; Schulthess, Thomas

    We present the first open release of the DCA++ project, a highly scalable and efficient research code to solve quantum many-body problems with cutting-edge quantum cluster algorithms. The implemented dynamical cluster approximation (DCA) and its DCA+ extension with a continuous self-energy capture nonlocal correlations in strongly correlated electron systems, thereby allowing insight into high-Tc superconductivity. With the increasing heterogeneity of modern machines, DCA++ provides portable performance on conventional and emerging architectures, such as hybrid CPU-GPU and Xeon Phi, sustaining multiple petaflops on ORNL's Titan and CSCS' Piz Daint. Moreover, we will describe how best practices in software engineering can be applied to make software development sustainable and scalable in a research group. Software testing and documentation not only prevent productivity collapse; more importantly, they are necessary for the correctness, credibility and reproducibility of scientific results. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) awarded by the INCITE program, and of the Swiss National Supercomputing Center. OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

  2. Optimizing Approximate Weighted Matching on Nvidia Kepler K40

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

    Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms, on the other hand, generally compute high quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on the Nvidia Kepler K40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set of 269 inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by 10 to 100 times for over 100 instances, and from 100 to 1000 times for 15 instances. We also demonstrate up to 20 times speedup relative to 2 threads, and up to 5 times relative to 16 threads, on an Intel Xeon platform with 16 cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, the algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.

  3. GPU-based ultra-fast dose calculation using a finite size pencil beam model.

    PubMed

    Gu, Xuejun; Choi, Dongju; Men, Chunhua; Pan, Hubert; Majumdar, Amitava; Jiang, Steve B

    2009-10-21

    Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposition coefficient calculation is a critical component of the online planning process that is required for plan optimization of intensity-modulated radiation therapy (IMRT). Computer graphics processing units (GPUs) are well suited to provide the requisite fast performance for the data-parallel nature of dose calculation. In this work, we develop a dose calculation engine based on a finite-size pencil beam (FSPB) algorithm and a GPU parallel computing framework. The developed framework can accommodate any FSPB model. We test our implementation in the case of a water phantom and the case of a prostate cancer patient with varying beamlet and voxel sizes. All testing scenarios achieved a speedup ranging from 200 to 400 times when using an NVIDIA Tesla C1060 card in comparison with a 2.27 GHz Intel Xeon CPU. The computational time for calculating dose deposition coefficients for a nine-field prostate IMRT plan with this new framework is less than 1 s. This indicates that the GPU-based FSPB algorithm is well suited for online re-planning for adaptive radiotherapy.

  4. Hot Chips and Hot Interconnects for High End Computing Systems

    NASA Technical Reports Server (NTRS)

    Saini, Subhash

    2005-01-01

    I will discuss several processors: 1. the Cray proprietary processor used in the Cray X1; 2. the IBM Power3 and Power4 used in IBM SP3 and SP4 systems; 3. the Intel Itanium and Xeon, used in SGI Altix systems and clusters, respectively; 4. the IBM System-on-a-Chip used in IBM BlueGene/L; 5. the HP Alpha EV68 processor used in the DOE ASCI Q cluster; 6. the SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. an NEC proprietary processor, which is used in the NEC SX-6/7; 8. the Power4+ processor, which is used in the Hitachi SR11000; 9. the NEC proprietary processor used in the Earth Simulator. The IBM POWER5 and Red Storm computing systems will also be discussed. The architectures of these processors will first be presented, followed by the interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF, and hybrid MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high-performance computing (quantum computing, DNA computing, cellular engineering, and neural networks).

  5. Advanced electronics for the CTF MEG system.

    PubMed

    McCubbin, J; Vrba, J; Spear, P; McKenzie, D; Willis, R; Loewen, R; Robinson, S E; Fife, A A

    2004-11-30

    Development of the CTF MEG system has been advanced with the introduction of a computer processing cluster between the data acquisition electronics and the host computer. The advent of fast processors, memory, and network interfaces has made this innovation feasible for large data streams at high sampling rates. We have implemented tasks including anti-alias filtering, sample rate decimation, higher gradient balancing, crosstalk correction, and optional filters with a cluster consisting of 4 dual Intel Xeon processors operating on up to 275-channel MEG systems at a 12 kHz sample rate. The architecture is expandable with additional processors to implement advanced processing tasks, which may include, e.g., continuous head localization/motion correction, optional display filters, coherence calculations, or real-time synthetic channels (via beamformer). We also describe an electronics configuration upgrade to provide operator console access to peripheral interface features such as analog signal and trigger I/O. This allows remote location of the acoustically noisy electronics cabinet and fitting of the cabinet with doors for improved EMI shielding. Finally, we present the latest performance results available for the CTF 275-channel MEG system, including an unshielded SEF (median nerve electrical stimulation) measurement enhanced by application of an adaptive beamformer technique (SAM), which allows recognition of the nominal 20-ms response in the unaveraged signal.

  6. Interactive high-resolution isosurface ray casting on multicore processors.

    PubMed

    Wang, Qin; JaJa, Joseph

    2008-01-01

    We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on multi-cores while maintaining spatial locality. We also make careful use of the memory management environment common to multi-core processors. We test our system on a two-processor Clovertown platform, each processor being a quad-core 1.86-GHz Intel Xeon, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in an overall performance that is superior to any of the previous algorithms. In fact, we achieve interactive isosurface rendering on a 1024x1024 screen for all the datasets tested, up to the maximum size of the main memory of our platform.
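
    The dynamic allocation scheme can be sketched with a shared atomic counter that hands out contiguous groups of tasks, balancing load while preserving spatial locality within each group. The tile count and per-tile work below are made-up stand-ins for the actual ray-casting workload.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int tiles = 64 * 64;   // assumed number of screen tile tasks
    const int group = 16;        // tasks handed out per grab (locality knob)
    std::atomic<int> next{0};

    const unsigned nthreads =
        std::max(1u, std::thread::hardware_concurrency());
    std::vector<long long> work(nthreads, 0);

    auto worker = [&](unsigned id) {
        for (;;) {
            int first = next.fetch_add(group);     // grab a contiguous group
            if (first >= tiles) break;
            int last = std::min(first + group, tiles);
            for (int t = first; t < last; ++t)
                work[id] += 1 + t % 7;             // stand-in for casting rays
        }
    };

    std::vector<std::thread> pool;
    for (unsigned id = 0; id < nthreads; ++id) pool.emplace_back(worker, id);
    for (std::thread& th : pool) th.join();
    for (unsigned id = 0; id < nthreads; ++id)
        std::printf("thread %u processed %lld units\n", id, work[id]);
}
```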

  7. Efficient Approximation Algorithms for Weighted b-Matching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khan, Arif; Pothen, Alex; Mostofa Ali Patwary, Md.

    2016-01-01

    We describe a half-approximation algorithm, b-Suitor, for computing a b-Matching of maximum weight in a graph with weights on the edges. b-Matching is a generalization of the well-known Matching problem in graphs, where the objective is to choose a subset M of edges in the graph such that at most a specified number b(v) of edges in M are incident on each vertex v. Subject to this restriction, we maximize the sum of the weights of the edges in M. We prove that the b-Suitor algorithm computes the same b-Matching as the one obtained by the greedy algorithm for the problem. We implement the algorithm on serial and shared-memory parallel processors, and compare its performance against a collection of approximation algorithms that have been proposed for the Matching problem. Our results show that the b-Suitor algorithm outperforms the Greedy and Locally Dominant edge algorithms by one to two orders of magnitude on a serial processor. The b-Suitor algorithm has a high degree of concurrency, and it scales well up to 240 threads on a shared-memory multiprocessor. The b-Suitor algorithm outperforms the Locally Dominant edge algorithm by a factor of fourteen on 16 cores of an Intel Xeon multiprocessor.
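
    The greedy algorithm referenced above, whose output b-Suitor provably reproduces, is straightforward to state in code. A minimal serial sketch with illustrative edge weights and degree bounds b(v):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Edge { int u, v; double w; };

// Greedy half-approximation for b-Matching: scan edges in order of
// decreasing weight and keep an edge if both endpoints still have
// unused capacity b(v).
std::vector<Edge> greedy_b_matching(std::vector<Edge> edges,
                                    std::vector<int> b) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& c) { return a.w > c.w; });
    std::vector<Edge> M;
    for (const Edge& e : edges)
        if (b[e.u] > 0 && b[e.v] > 0) {   // capacity left at both endpoints
            M.push_back(e);
            --b[e.u]; --b[e.v];
        }
    return M;
}

int main() {
    std::vector<Edge> edges{{0, 1, 5.0}, {1, 2, 4.0}, {0, 2, 3.0}, {2, 3, 2.0}};
    std::vector<int> b{1, 2, 2, 1};       // per-vertex degree bounds b(v)
    for (const Edge& e : greedy_b_matching(edges, b))
        std::printf("(%d,%d) w=%.1f\n", e.u, e.v, e.w);  // keeps (0,1),(1,2),(2,3)
}
```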

  8. GPU Lossless Hyperspectral Data Compression System for Space Applications

    NASA Technical Reports Server (NTRS)

    Keymeulen, Didier; Aranki, Nazeeh; Hopson, Ben; Kiely, Aaron; Klimesh, Matthew; Benkrid, Khaled

    2012-01-01

    On-board lossless hyperspectral data compression reduces data volume in order to meet NASA and DoD limited downlink capabilities. At JPL, a novel, adaptive and predictive technique for lossless compression of hyperspectral data, named the Fast Lossless (FL) algorithm, was recently developed. This technique uses an adaptive filtering method and achieves state-of-the-art performance in both compression effectiveness and low complexity. Because of its outstanding performance and suitability for real-time onboard hardware implementation, the FL compressor is being formalized as the emerging CCSDS Standard for Lossless Multispectral & Hyperspectral image compression. The FL compressor is well suited for parallel hardware implementation. A GPU hardware implementation was developed for FL targeting the current state-of-the-art GPUs from NVIDIA. The GPU implementation on an NVIDIA GeForce GTX 580 achieves a throughput performance of 583.08 Mbits/sec (44.85 MSamples/sec), an acceleration of at least 6 times over a software implementation running on a 3.47 GHz single-core Intel Xeon processor. This paper describes the design and implementation of the FL algorithm on the GPU. The massively parallel implementation will provide in the future a fast and practical real-time solution for airborne and space applications.

  9. Leveraging FPGAs for Accelerating Short Read Alignment.

    PubMed

    Arram, James; Kaplan, Thomas; Luk, Wayne; Jiang, Peiyong

    2017-01-01

    One of the key challenges facing genomics today is how to efficiently analyze the massive amounts of data produced by next-generation sequencing platforms. With general-purpose computing systems struggling to address this challenge, specialized processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. The means by which to leverage this technology for accelerating genomic data analysis is however largely unexplored. In this paper, we present a runtime reconfigurable architecture for accelerating short read alignment using FPGAs. This architecture exploits the reconfigurability of FPGAs to allow the development of fast yet flexible alignment designs. We apply this architecture to develop an alignment design which supports exact and approximate alignment with up to two mismatches. Our design is based on the FM-index, with optimizations to improve the alignment performance. In particular, the n-step FM-index, index oversampling, a seed-and-compare stage, and bi-directional backtracking are included. Our design is implemented and evaluated on a 1U Maxeler MPC-X2000 dataflow node with eight Altera Stratix-V FPGAs. Measurements show that our design is 28 times faster than Bowtie2 running with 16 threads on dual Intel Xeon E5-2640 CPUs, and nine times faster than Soap3-dp running on an NVIDIA Tesla C2070 GPU.
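
    The FM-index at the core of such designs answers exact-match queries by backward search over the Burrows-Wheeler transform. The following self-contained toy (naive suffix sorting, exact matching only, and none of the paper's n-step, oversampling, or backtracking extensions) shows the mechanism:

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct FMIndex {
    std::string bwt;
    std::map<char, int> C;                    // # of characters smaller than c
    std::map<char, std::vector<int>> occ;     // occ[c][i]: count of c in bwt[0,i)

    explicit FMIndex(std::string text) {
        text += '$';                          // sentinel, lexicographically smallest
        int n = (int)text.size();
        std::vector<int> sa(n);
        for (int i = 0; i < n; ++i) sa[i] = i;
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return text.substr(a) < text.substr(b);    // naive suffix sort
        });
        for (int i = 0; i < n; ++i)
            bwt += text[(sa[i] + n - 1) % n];          // character before suffix
        std::map<char, int> count;
        for (char c : bwt) ++count[c];
        int sum = 0;
        for (auto& [c, k] : count) { C[c] = sum; sum += k; }
        for (auto& [c, k] : count) occ[c].assign(n + 1, 0);
        for (int i = 0; i < n; ++i)
            for (auto& [c, v] : occ)
                v[i + 1] = v[i] + (bwt[i] == c ? 1 : 0);
    }

    // Backward search: number of occurrences of pattern p in the text.
    int count(const std::string& p) const {
        int lo = 0, hi = (int)bwt.size();
        for (int i = (int)p.size() - 1; i >= 0 && lo < hi; --i) {
            auto it = C.find(p[i]);
            if (it == C.end()) return 0;               // character absent
            lo = it->second + occ.at(p[i])[lo];
            hi = it->second + occ.at(p[i])[hi];
        }
        return hi > lo ? hi - lo : 0;
    }
};

int main() {
    FMIndex fm("GATTACAGATTACA");
    std::printf("ATTA: %d hits\n", fm.count("ATTA"));  // expect 2
    std::printf("GATC: %d hits\n", fm.count("GATC"));  // expect 0
}
```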

  10. Global fully kinetic models of planetary magnetospheres with iPic3D

    NASA Astrophysics Data System (ADS)

    Gonzalez, D.; Sanna, L.; Amaya, J.; Zitz, A.; Lembege, B.; Markidis, S.; Schriver, D.; Walker, R. J.; Berchem, J.; Peng, I. B.; Travnicek, P. M.; Lapenta, G.

    2016-12-01

    We report on the latest developments of our approach to model planetary magnetospheres, mini magnetospheres and the Earth's magnetosphere with the fully kinetic, electromagnetic particle-in-cell code iPic3D. The code treats electrons and multiple species of ions as full kinetic particles. We review: 1) why a fully kinetic model, and in particular why kinetic electrons, are needed for capturing some of the most important aspects of the physics processes of planetary magnetospheres; 2) why the energy conserving implicit method (ECIM) in its newest implementation [1] is the right approach to reach this goal. We consider the different electron scales and study how the new ECIM can be tuned to resolve only the electron scales of interest while averaging over the unresolved scales, preserving their contribution to the evolution; 3) how, with modern computing, planetary magnetospheres, mini magnetospheres and eventually the Earth's magnetosphere can be modeled with fully kinetic electrons. The path from petascale to exascale for iPic3D is outlined based on the DEEP-ER project [2], using dynamic allocation of different processor architectures (Xeon and Xeon Phi) and innovative I/O technologies. Specifically, results from models of Mercury are presented and compared with MESSENGER observations and with previous hybrid (fluid electrons and kinetic ions) simulations. The plasma convection around the planet includes the development of hydrodynamic instabilities at the flanks, the presence of the collisionless shock, the magnetosheath, the magnetopause, reconnection zones, the formation of the plasma sheet and the magnetotail, and the variation of ion/electron plasma flows when crossing these frontiers. Given the fully kinetic nature of our approach, we focus on detailed particle dynamics and distributions at locations that can be used for comparison with satellite data. [1] Lapenta, G. (2016). Exactly Energy Conserving Implicit Moment Particle in Cell Formulation. arXiv preprint arXiv:1602.06326. [2] www.deep-er.eu

  11. Fast 2D FWI on a multi and many-cores workstation.

    NASA Astrophysics Data System (ADS)

    Thierry, Philippe; Donno, Daniela; Noble, Mark

    2014-05-01

    Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12-core E5-v2 x86-64 CPUs, we present here an MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code which runs simultaneously on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is the ability to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each supporting up to 4 threads, this many-core device can be seen as a shared-memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can host several co-processors, making the workstation a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is simply recompiled to get a Xeon Phi x86 binary. This multi- and many-core configuration uses standard compilers and the associated MPI and math libraries under Linux; therefore, the cost of code development remains constant while computation time improves. We choose to implement the code in the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e., running only on the co-processor) thanks to the Linux ssh and NFS capabilities. The usual care in optimization and SIMD vectorization is taken to ensure optimal performance, and to analyze the application's performance and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (L-BFGS) optimization scheme for the model parameter updates. Parallelization is achieved through standard MPI distribution of shot gathers and OpenMP domain decomposition within the co-processor. Taking advantage of the 16 GB of memory available on the co-processor, we are able to keep wavefields in memory and compute the gradient by cross-correlation of forward- and back-propagated wavefields, as needed by our time-domain FWI scheme, without heavy traffic on the I/O subsystem and PCIe bus. In this presentation we will also review some simple methodologies for comparing expected performance with measured performance, in order to estimate the optimization effort before starting any major modification or rewrite of research codes. The key message is the ease of use and development of this hybrid configuration, reaching not the absolute peak performance but the optimal one that ensures the best balance between geophysical and computing developments.
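
    The two-level parallelization described, MPI across shot gathers and OpenMP within each domain, has a simple generic shape. The sketch below is not the authors' Fortran code; the shot count and the cost-function body are placeholders.

```cpp
#include <mpi.h>
#include <cstdio>

// Generic hybrid FWI loop shape: round-robin shot distribution over MPI
// ranks, OpenMP threads over the per-shot domain, global reduction of the
// cost function for the quasi-Newton (L-BFGS) update.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int nshots = 32;                  // assumed number of shot gathers
    double local_misfit = 0.0;

    for (int shot = rank; shot < nshots; shot += nprocs) {  // MPI: shots
        double shot_misfit = 0.0;
        #pragma omp parallel for reduction(+ : shot_misfit) // OpenMP: domain
        for (int cell = 0; cell < 100000; ++cell)
            shot_misfit += 1e-9 * cell;     // stand-in for propagation + residual
        local_misfit += shot_misfit;
    }

    double misfit = 0.0;                    // total cost function
    MPI_Reduce(&local_misfit, &misfit, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) std::printf("misfit = %f\n", misfit);
    MPI_Finalize();
}
```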

  12. Measurements of the LHCb software stack on the ARM architecture

    NASA Astrophysics Data System (ADS)

    Vijay Kartik, S.; Couturier, Ben; Clemencic, Marco; Neufeld, Niko

    2014-06-01

    The ARM architecture is a power-efficient design used in most processors in mobile devices all around the world today, since they provide reasonable compute performance per watt. The current LHCb software stack is designed (and thus expected) to build and run on machines with the x86/x86_64 architecture. This paper outlines the process of measuring the performance of the LHCb software stack on the ARM architecture - specifically, the ARMv7 architecture on Cortex-A9 processors from NVIDIA and on full-fledged ARM servers with chipsets from Calxeda - and makes comparisons with the performance on x86_64 architectures on the Intel Xeon L5520/X5650 and AMD Opteron 6272. The paper emphasises performance per core with respect to the power drawn by the compute nodes for the given performance; this ensures a fair real-world comparison with much more 'powerful' Intel/AMD processors. The comparisons of these real workloads in the context of LHCb are also complemented with the standard synthetic benchmarks HEPSPEC and Coremark. The pitfalls and solutions for the non-trivial task of porting the source code to build for the ARMv7 instruction set are presented. The specific changes in the build process needed for ARM-specific portions of the software stack are described, to serve as pointers for further attempts taken up by other groups in this direction. Cases where architecture-specific tweaks at the assembler level (both in ROOT and the LHCb software stack) were needed for a successful compile are detailed; these cases are good indicators of where and how the software stack as well as the build system can be made more portable and multi-arch friendly. The experience gained from the tasks described in this paper is intended to i) assist in making an informed choice about ARM-based server solutions as a feasible low-power alternative to the current compute nodes, and ii) revisit the software design and build system for portability and generic improvements.

  13. GPU accelerated Monte-Carlo simulation of SEM images for metrology

    NASA Astrophysics Data System (ADS)

    Verduin, T.; Lokhorst, S. R.; Hagen, C. W.

    2016-03-01

    In this work we address the computation times of numerical studies in dimensional metrology. In particular, full Monte-Carlo simulation programs for scanning electron microscopy (SEM) image acquisition are known to be notoriously slow. Our quest to reduce the computation time of SEM image simulation has led us to investigate the use of graphics processing units (GPUs) for metrology. We have succeeded in creating a full Monte-Carlo simulation program for SEM images, which runs entirely on a GPU. The physical scattering models of this GPU simulator are identical to a previous CPU-based simulator, which includes the dielectric function model for inelastic scattering and also refinements for low-voltage SEM applications. As a case study for the performance, we considered the simulated exposure of a complex feature: an isolated silicon line with rough sidewalls located on a flat silicon substrate. The surface of the rough feature is decomposed into 408 012 triangles. We have used an exposure dose of 6 mC/cm2, which corresponds to 6 553 600 primary electrons on average (Poisson distributed). We repeat the simulation for various primary electron energies: 300 eV, 500 eV, 800 eV, 1 keV, 3 keV and 5 keV. At first we run the simulation on a GeForce GTX480 from NVIDIA. The very same simulation is duplicated on our CPU-based program, for which we have used an Intel Xeon X5650. Apart from statistical fluctuations, no difference is found between the CPU- and GPU-simulated results. The GTX480 generates the images (depending on the primary electron energy) 350 to 425 times faster than a single-threaded Intel X5650 CPU. Although this is a tremendous speedup, we have not actually reached the maximum throughput because of the limited amount of available memory on the GTX480. Nevertheless, the speedup enables the fast acquisition of simulated SEM images for metrology. We now have the potential to investigate case studies in CD-SEM metrology which would otherwise take unreasonable amounts of computation time.
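
    The dose-to-primary-electron conversion mentioned above is a small calculation: the mean count is dose times area divided by the elementary charge, and the actual count is Poisson distributed. In this hedged sketch the exposure area is an assumed value, chosen only so the mean lands near the abstract's figure.

```cpp
#include <cstdio>
#include <random>

int main() {
    const double dose_C_per_cm2 = 6.0e-3;          // 6 mC/cm^2, from the abstract
    const double area_cm2       = 1.75e-13;        // assumed exposure area
    const double e_charge       = 1.602176634e-19; // elementary charge [C]

    // Mean number of primary electrons delivered by the dose.
    const double mean_electrons = dose_C_per_cm2 * area_cm2 / e_charge;

    // The actual count in a simulated exposure is Poisson distributed.
    std::mt19937_64 rng(42);
    std::poisson_distribution<long long> poisson(mean_electrons);
    long long n_primaries = poisson(rng);

    std::printf("mean = %.0f, sampled = %lld primary electrons\n",
                mean_electrons, n_primaries);
}
```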

  14. Using a high-definition stereoscopic video system to teach microscopic surgery

    NASA Astrophysics Data System (ADS)

    Ilgner, Justus; Park, Jonas Jae-Hyun; Labbé, Daniel; Westhofen, Martin

    2007-02-01

    Introduction: While there is an increasing demand for minimally invasive operative techniques in Ear, Nose and Throat surgery, these operations are difficult to learn for junior doctors and demanding to supervise for experienced surgeons. The motivation for this study was to integrate high-definition (HD) stereoscopic video monitoring into microscopic surgery in order to facilitate teaching interaction between senior and junior surgeon. Material and methods: We attached a 1280x1024 HD stereo camera (TrueVision Systems, Inc., Santa Barbara, CA, USA) to an operating microscope (Zeiss ProMagis, Zeiss Co., Oberkochen, Germany), whose images were processed online by a PC workstation with dual Intel® Xeon® CPUs (Intel Co., Santa Clara, CA). The live image was displayed by two LCD projectors at 1280x768 pixels on a 1.25 m rear-projection screen through polarized filters. While the junior surgeon performed the surgical procedure based on the displayed stereoscopic image, all other participants (senior surgeon, nurse and medical students) shared the same stereoscopic image from the screen. Results: With the basic setup performed only once on the day before surgery, fine adjustments required about 10 minutes extra during the operation schedule, which fitted into the time interval between patients and thus did not prolong operation times. As all relevant features of the operative field were demonstrated on one large screen, four major effects were obtained: A) Stereoscopy facilitated orientation for the junior surgeon as well as for medical students. B) The stereoscopic image served as an unequivocal guide for the senior surgeon to demonstrate the next surgical steps to the junior colleague. C) The theatre nurse shared the same image, anticipating the next instruments needed. D) Medical students instantly shared the information given by all staff and the image, thus avoiding the need for an extra teaching session. Conclusion: High-definition stereoscopy bears the potential to compress the learning curve for undergraduate as well as postgraduate medical professionals in minimally invasive surgery. Further studies will focus on the long-term effect on operative training as well as on post-processing of HD stereoscopic video content for off-line interactive medical education.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Seyong; Kim, Jungwon; Vetter, Jeffrey S

    This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into OpenCL code, which is further compiled into an FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high-level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates the portability of OpenARC. In fact, this design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrates the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.
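
    For readers unfamiliar with the input side of such a framework, a generic OpenACC-annotated loop looks like the following (shown on C++ here; this is not one of the paper's eight benchmarks). Tools in the OpenARC mold translate the directives into OpenCL and, via the vendor toolchain, into an FPGA configuration; an ordinary OpenACC compiler, or plain host execution with the pragma ignored, runs it unchanged.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    float* pa = a.data();
    float* pb = b.data();
    float* pc = c.data();

    // Offload region: copy a and b to the device, bring c back out.
    #pragma acc parallel loop copyin(pa[0:n], pb[0:n]) copyout(pc[0:n])
    for (int i = 0; i < n; ++i)
        pc[i] = pa[i] + pb[i];

    std::printf("c[0] = %.1f\n", pc[0]);  // 3.0, offloaded or on the host
}
```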

  16. SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

    PubMed

    Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai

    2015-09-18

    Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and lack of access to high-performance computing facilities, and discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms and accepts various combinations, all via a one-line command, therefore allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and allows quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.

  17. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators were also integrated and studied. The new MC4 version was then parallelized for shared- and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics, as more particle tracks can be simulated in low response time.
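
    The parallel random-number issue that the SPRNG and DCMT integrations address is that every worker needs its own non-overlapping stream. A simple, far less rigorous stand-in is per-rank seeding, sketched below (rank here is a plain loop variable, not an actual MPI rank):

```cpp
#include <cstdio>
#include <random>

int main() {
    // Each worker derives its generator from a common base seed plus its
    // rank id. Dedicated libraries such as SPRNG/DCMT give stronger
    // stream-independence guarantees than this simple scheme.
    const int nranks = 4;                      // assumed number of workers
    for (int rank = 0; rank < nranks; ++rank) {
        std::seed_seq seq{20100501, rank};     // base seed + rank id
        std::mt19937_64 rng(seq);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        std::printf("rank %d first draws: %.4f %.4f\n", rank, u(rng), u(rng));
    }
}
```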

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yeung, Yu-Hong; Pothen, Alex; Halappanavar, Mahantesh

    We present an augmented matrix approach to update the solution to a linear system of equations when the coefficient matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to perform N-x contingency analysis, i.e., determine the state of the system when up to x links from N fail. Our algorithms augment the coefficient matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms: a direct method, and a hybrid direct-iterative method for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the coefficient matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude) and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy. Our algorithms are capable of computing N-x contingency analysis on a 778K-bus grid, updating a solution with x = 20 elements in 1.6 × 10^-2 seconds on an Intel Xeon processor.
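
    The record does not spell out the augmented formulation, but a standard bordered form consistent with the description, assuming the few modified elements are written as a low-rank update A' = A + UV^T, is:

```latex
% Bordered (augmented) system for A' = A + U V^T: eliminating y recovers A'x = b.
\begin{pmatrix} A & U \\ V^{T} & -I \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} b \\ 0 \end{pmatrix}
\;\Longrightarrow\;
y = V^{T}x, \qquad (A + U V^{T})\,x = b .
```

    Solving the bordered system reuses the existing factorization of A; only the border is new, which is why no refactorization of the modified matrix is required.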

  19. The Acceleration of Structural Microarchitectural Simulation via Scheduling

    DTIC Science & Technology

    2006-11-01

    Table 1.1 shows the total and estimated non-cache transistor counts in succeeding generations of Intel microprocessors (cache array transistors are excluded from the non-cache estimate): Intel486 (1989): 1,200,000 total, 800,000 non-cache; Pentium (1993): 3,100,000 total, 2,300,000 non-cache; Pentium II (1997): 7,500,000 total, 5,500,000 non-cache; Pentium III (1999): counts truncated in this record excerpt.

  20. Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces

    DOE PAGES

    Eghtesad, Adnan; Germaschewski, Kai; Beyerlein, Irene J.; ...

    2017-10-14

    We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation takes advantage of the portable OpenACC standard directive pragmas along with Nvidia's compute unified device architecture (CUDA) fast Fourier transform (FFT) library, CUFFT, to execute the FFT computations within the PFDD formulation on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OpenACC-CUDA interoperability, in which calls to CUDA functions are replaced with OpenACC data regions for the host central processing unit (CPU) and device (GPU). A comprehensive benchmark study has been conducted, which compares a number of FFT routines: the Numerical Recipes FFT (FOURN), the Fastest Fourier Transform in the West (FFTW), and CUFFT, the last of which exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified using the analytical solution for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu–Ni) interface. It is demonstrated that the ACCPFDD-CUFFT implementation on a single Tesla K80 GPU offers a 27.6X speedup relative to the serial version and a 5X speedup relative to the 22-core Intel Xeon CPU E5-2699 v4 @ 2.20 GHz version of the code.
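
    The interoperability pattern described, keeping arrays device-resident across OpenACC regions and CUFFT calls, typically looks like the hedged sketch below. This is not the ACCPFDD-CUFFT source; it requires an OpenACC compiler targeting CUDA devices (e.g., NVHPC) and linking against CUFFT.

```cpp
#include <cstdio>
#include <cufft.h>
#include <vector>

int main() {
    const int nx = 64, ny = 64, nz = 64;
    std::vector<cufftDoubleComplex> field(nx * ny * nz,
                                          cufftDoubleComplex{1.0, 0.0});
    cufftDoubleComplex* f = field.data();

    cufftHandle plan;
    cufftPlan3d(&plan, nx, ny, nz, CUFFT_Z2Z);

    // Keep the field resident on the GPU for the whole region.
    #pragma acc data copy(f[0 : nx * ny * nz])
    {
        // ... OpenACC compute kernels updating f would run here ...

        // Hand CUFFT the *device* address of f, so the transforms run
        // in place with no host round trip.
        #pragma acc host_data use_device(f)
        {
            cufftExecZ2Z(plan, f, f, CUFFT_FORWARD);
            cufftExecZ2Z(plan, f, f, CUFFT_INVERSE);  // unnormalized
        }

        // ... more OpenACC kernels; f never leaves the GPU in between ...
    }
    cufftDestroy(plan);
    std::printf("f[0].x = %.1f (unnormalized round trip: expect %d)\n",
                f[0].x, nx * ny * nz);
}
```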

  1. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. When using only a single thread, parasail was 1.7 times faster than Rognes's SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

  2. The Impact of IBM Cell Technology on the Programming Paradigm in the Context of Computer Systems for Climate and Weather Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Shujia; Duffy, Daniel; Clune, Thomas

    The call for ever-increasing model resolutions and physical processes in climate and weather models demands a continual increase in computing power. The IBM Cell processor's order-of-magnitude peak performance increase over conventional processors makes it very attractive for fulfilling this requirement. However, the Cell's characteristics, 256KB local memory per SPE and the new low-level communication mechanism, make it very challenging to port an application. As a trial, we selected the solar radiation component of the NASA GEOS-5 climate model, which: (1) is representative of column physics components (half the total computational time), (2) has an extremely high computational intensity: the ratio of computational load to main memory transfers, and (3) exhibits embarrassingly parallel column computations. In this paper, we converted the baseline code (single-precision Fortran) to C and ported it to an IBM BladeCenter QS20. For performance, we manually SIMDize four independent columns and include several unrolling optimizations. Our results show that when compared with the baseline implementation running on one core of Intel's Xeon Woodcrest, Dempsey, and Itanium2, the Cell is approximately 8.8x, 11.6x, and 12.8x faster, respectively. Our preliminary analysis shows that the Cell can also accelerate the dynamics component (~25% of total computational time). We believe these dramatic performance improvements make the Cell processor very competitive as an accelerator.

  3. Multi-GPU Accelerated Admittance Method for High-Resolution Human Exposure Evaluation.

    PubMed

    Xiong, Zubiao; Feng, Shi; Kautz, Richard; Chandra, Sandeep; Altunyurt, Nevin; Chen, Ji

    2015-12-01

    A multi-graphics processing unit (GPU) accelerated admittance method solver is presented for solving the induced electric field in high-resolution anatomical models of the human body when exposed to external low-frequency magnetic fields. In the solver, the anatomical model is discretized as a three-dimensional network of admittances. The conjugate orthogonal conjugate gradient (COCG) iterative algorithm is employed to take advantage of the symmetric property of the complex-valued linear system of equations. Compared against the widely used biconjugate gradient stabilized method, the COCG algorithm reduces the solving time by 3.5 times and the storage requirement by about 40%. The iterative algorithm is then accelerated further by using multiple NVIDIA GPUs. The computations and data transfers between GPUs are overlapped in time by using an asynchronous concurrent execution design. The communication overhead is well hidden, so that the acceleration is nearly linear in the number of GPU cards. Numerical examples show that our GPU implementation running on four NVIDIA Tesla K20c cards can be 90 times faster than the CPU implementation running on eight CPU cores (two Intel Xeon E5-2603 processors). The implemented solver is able to solve large dimensional problems efficiently: a whole adult body discretized at 1-mm resolution can be solved in just several minutes. The high efficiency achieved makes it practical to investigate human exposure involving a large number of cases with a high resolution that meets the requirements of international dosimetry guidelines.
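
    COCG is plain CG with the unconjugated bilinear form x^T y in place of the Hermitian inner product, which is exactly what makes it applicable to the complex symmetric (non-Hermitian) admittance system. A dense single-threaded toy sketch follows; the real solver is sparse and multi-GPU.

```cpp
#include <complex>
#include <cstdio>
#include <vector>

using cd = std::complex<double>;

// Unconjugated bilinear form x^T y, the only change relative to CG.
static cd dotu(const std::vector<cd>& a, const std::vector<cd>& b) {
    cd s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static std::vector<cd> matvec(const std::vector<std::vector<cd>>& A,
                              const std::vector<cd>& x) {
    std::vector<cd> y(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

int main() {
    // Small complex *symmetric* system (A == A^T, not Hermitian).
    std::vector<std::vector<cd>> A = {{{4, 1}, {1, 0}}, {{1, 0}, {3, -1}}};
    std::vector<cd> b = {{1, 0}, {2, 0}}, x(2, 0.0), r = b, p = r;

    cd rho = dotu(r, r);
    for (int it = 0; it < 100 && std::abs(rho) > 1e-24; ++it) {
        std::vector<cd> q = matvec(A, p);
        cd alpha = rho / dotu(p, q);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
        }
        cd rho_new = dotu(r, r);
        cd beta = rho_new / rho;
        rho = rho_new;
        for (std::size_t i = 0; i < x.size(); ++i) p[i] = r[i] + beta * p[i];
    }
    std::printf("x = (%.4f%+.4fi, %.4f%+.4fi)\n",
                x[0].real(), x[0].imag(), x[1].real(), x[1].imag());
}
```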

  4. 75 FR 21353 - Intel Corporation, Fab 20 Division, Including On-Site Leased Workers From Volt Technical...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-23

    ... DEPARTMENT OF LABOR Employment and Training Administration [TA-W-73,642] Intel Corporation, Fab 20... of Intel Corporation, Fab 20 Division, including on-site leased workers of Volt Technical Resources... Precision, Inc. were employed on-site at the Hillsboro, Oregon location of Intel Corporation, Fab 20...

  5. Using the Intel Math Kernel Library on Peregrine | High-Performance

    Science.gov Websites

    Computing | NREL. Learn how to use the Intel Math Kernel Library (MKL) with Peregrine system software. Core math functions in MKL include BLAS, LAPACK, ScaLAPACK, sparse solvers, and fast Fourier transforms.

  6. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    NASA Astrophysics Data System (ADS)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    A simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Here, a comparison of two computer architectures (AMD and Intel) is performed. The results show that the Intel architecture gives better CPU time than AMD. However, in terms of efficiency, the computer with the AMD architecture is slightly ahead of Intel. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, for a similar number of particles, AMD obtains an efficiency of 50.09% and Intel up to 49.42%.
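
    For clarity, the speedup and efficiency figures quoted above follow the standard definitions for p threads:

```latex
S(p) = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}(p)}, \qquad
E(p) = \frac{S(p)}{p} \times 100\% .
```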

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cohen, J; Dossa, D; Gokhale, M

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems and, for a broad range of data-intensive problems, to deliver an order-of-magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063, Storage Intensive Supercomputing, during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool, iotrace, developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion-io parallel NAND Flash array. The image resampling benchmark compared software-only performance to GPU-accelerated performance. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the 40GB parallel NAND Flash disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon dual-socket Blackford server motherboard; 2 Intel Xeon dual-core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80GB hard drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with an NVIDIA graphics card (see Chapter 5 for the full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that showed greater than 50% of the time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling benchmark and language classification showed order-of-magnitude speedups over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit in boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid-state nonvolatile memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.

  10. Intel NX to PVM 3.2 message passing conversion library

    NASA Technical Reports Server (NTRS)

    Arthur, Trey; Nelson, Michael L.

    1993-01-01

    NASA Langley Research Center has developed a library that allows Intel NX message passing codes to be executed under the more popular and widely supported Parallel Virtual Machine (PVM) message passing library. PVM was developed at Oak Ridge National Labs and has become the de facto standard for message passing. This library will allow the many programs that were developed on the Intel iPSC/860 or Intel Paragon in a Single Program Multiple Data (SPMD) design to be ported to the numerous architectures that PVM (version 3.2) supports. Also, the library adds global operations capability to PVM. A familiarity with Intel NX and PVM message passing is assumed.

  11. A configurable distributed high-performance computing framework for satellite's TDI-CCD imaging simulation

    NASA Astrophysics Data System (ADS)

    Xue, Bo; Mao, Bingjing; Chen, Xiaomei; Ni, Guoqiang

    2010-11-01

    This paper presents a configurable distributed high-performance computing (HPC) framework for TDI-CCD imaging simulation. It uses the strategy pattern to accommodate multiple algorithms, helping to decrease simulation time at low expense. Imaging simulation for a TDI-CCD mounted on a satellite comprises four processes: 1) atmosphere-induced degradation, 2) optical-system-induced degradation, 3) TDI-CCD electronic-system-induced degradation and the re-sampling process, and 4) data integration. Processes 1) to 3) use diverse data-intensive algorithms such as FFT, convolution, and Lagrange interpolation, which require powerful CPUs. Even on an Intel Xeon X5550 processor, a conventional serial method takes more than 30 hours for a simulation whose result image size is 1500 * 1462. A literature study found no mature distributed HPC framework in this field. Here we developed a distributed computing framework for TDI-CCD imaging simulation, based on WCF [1], which uses a Client/Server (C/S) layer and invokes the free CPU resources in the LAN. The server pushes the tasks of processes 1) to 3) to that free computing capacity, ultimately delivering HPC at low cost. In a computing experiment with 4 symmetric nodes and 1 server, this framework reduced simulation time by about 74%. Adding more asymmetric nodes to the computing network decreased the time accordingly. In conclusion, this framework can provide effectively unlimited computation capacity provided that the network and the task management server are affordable, and it is a brand-new HPC solution for TDI-CCD imaging simulation and similar applications.

  12. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    PubMed Central

    2011-01-01

    Background The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance. PMID:21631914
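
    A minimal NumPy sketch of the inter-sequence idea behind SWIPE is given below: each vector lane holds a different database sequence, so one query residue is scored against many database residues per step. The toy alphabet, match/mismatch scores, linear gap penalty, and equal-length sequence batch are illustrative assumptions; SWIPE itself uses SSSE3 intrinsics, sixteen 8-bit lanes, substitution score profiles, and more elaborate bookkeeping.

        import numpy as np

        def sw_batch(query, db_batch, match=2, mismatch=-1, gap=2):
            """Local alignment scores of one query against S database sequences.

            query:    (Q,) integer-encoded residues
            db_batch: (L, S) integer-encoded residues; lane s holds sequence s
            Returns (S,) best Smith-Waterman scores (linear gap penalty).
            """
            Q, (L, S) = len(query), db_batch.shape
            H = np.zeros((Q + 1, L + 1, S))   # DP matrix, one lane per sequence
            best = np.zeros(S)
            for i in range(1, Q + 1):
                for j in range(1, L + 1):
                    sub = np.where(db_batch[j - 1] == query[i - 1], match, mismatch)
                    H[i, j] = np.maximum(0, np.maximum(
                        H[i - 1, j - 1] + sub,
                        np.maximum(H[i - 1, j] - gap, H[i, j - 1] - gap)))
                    best = np.maximum(best, H[i, j])   # track lane-wise maxima
            return best

        rng = np.random.default_rng(0)
        q = rng.integers(0, 4, 25)             # toy 4-letter alphabet
        db = rng.integers(0, 4, (40, 16))      # 16 "lanes", like one SSE register
        print(sw_batch(q, db))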

  13. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

    PubMed

    Rognes, Torbjørn

    2011-06-01

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.

  14. Abstract: Inference and Interval Estimation for Indirect Effects With Latent Variable Models.

    PubMed

    Falk, Carl F; Biesanz, Jeremy C

    2011-11-30

    Models specifying indirect effects (or mediation) and structural equation modeling are both popular in the social sciences. Yet relatively little research has compared methods that test for indirect effects among latent variables and provided precise estimates of the effectiveness of different methods. This simulation study provides an extensive comparison of methods for constructing confidence intervals and for making inferences about indirect effects with latent variables. We compared the percentile (PC) bootstrap, bias-corrected (BC) bootstrap, bias-corrected accelerated (BCa) bootstrap, likelihood-based confidence intervals (Neale & Miller, 1997), partial posterior predictive (Biesanz, Falk, and Savalei, 2010), and joint significance tests based on Wald tests or likelihood ratio tests. All models included three reflective latent variables representing the independent, dependent, and mediating variables. The design included the following fully crossed conditions: (a) sample size: 100, 200, and 500; (b) number of indicators per latent variable: 3 versus 5; (c) reliability per set of indicators: .7 versus .9; and (d) 16 different path combinations for the indirect effect (α = 0, .14, .39, or .59; and β = 0, .14, .39, or .59). Simulations were performed using a WestGrid cluster of 1680 3.06GHz Intel Xeon processors running R and OpenMx. Results based on 1,000 replications per cell and 2,000 resamples per bootstrap method indicated that the BC and BCa bootstrap methods have inflated Type I error rates. Likelihood-based confidence intervals and the PC bootstrap emerged as methods that adequately control Type I error and have good coverage rates.
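
    As a concrete illustration of one interval that performed well here, the percentile (PC) bootstrap for an indirect effect can be sketched as below. The sample size, path values, and 2,000 resamples mirror the simulation design, but observed single-indicator variables stand in for the latent-variable models actually studied.

        import numpy as np

        rng = np.random.default_rng(1)
        n, alpha, beta = 200, 0.39, 0.39
        x = rng.normal(size=n)
        m = alpha * x + rng.normal(size=n)          # mediator
        y = beta * m + rng.normal(size=n)           # outcome

        def indirect_effect(idx):
            # a: slope of m on x; b: slope of y on m, controlling for x
            a = np.linalg.lstsq(np.column_stack([np.ones(idx.size), x[idx]]),
                                m[idx], rcond=None)[0][1]
            b = np.linalg.lstsq(np.column_stack([np.ones(idx.size), x[idx], m[idx]]),
                                y[idx], rcond=None)[0][2]
            return a * b

        boot = np.array([indirect_effect(rng.integers(0, n, n))
                         for _ in range(2000)])
        lo, hi = np.percentile(boot, [2.5, 97.5])   # 95% percentile interval
        print(f"indirect effect CI: [{lo:.3f}, {hi:.3f}]")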

  15. A real-time coherent dedispersion pipeline for the giant metrewave radio telescope

    NASA Astrophysics Data System (ADS)

    De, Kishalay; Gupta, Yashwant

    2016-02-01

    A fully real-time coherent dedispersion system has been developed for the pulsar back-end at the Giant Metrewave Radio Telescope (GMRT). The dedispersion pipeline uses the single phased-array voltage beam produced by the existing GMRT software back-end (GSB) to produce coherently dedispersed intensity output in real time, for the currently operational bandwidths of 16 MHz and 32 MHz. Provision has also been made to coherently dedisperse voltage beam data from observations recorded on disk. We discuss the design and implementation of the real-time coherent dedispersion system, describing the steps carried out to optimise the performance of the pipeline. Presently functioning on an Intel Xeon X5550 CPU equipped with an NVIDIA Tesla C2075 GPU, the pipeline allows dispersion-free, high-time-resolution data to be obtained in real time. We illustrate the significant improvements over the existing incoherent dedispersion system at the GMRT, and present some preliminary results obtained from studies of pulsars using this system, demonstrating its potential as a useful tool for low frequency pulsar observations. We describe the salient features of our implementation, comparing it with other recently developed real-time coherent dedispersion systems. This implementation of a real-time coherent dedispersion pipeline for a large, low frequency array instrument like the GMRT will enable long-term observing programs using coherent dedispersion to be carried out routinely at the observatory. We also outline possible improvements for such a pipeline, including prospects for the upgraded GMRT, which will have bandwidths about ten times larger than at present.
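
    The core of any coherent dedispersion pipeline is deconvolution of the interstellar dispersion chirp in the Fourier domain; a hedged NumPy sketch follows. The GSB-specific framing, overlap-save blocking, and GPU kernels are omitted, and the parameter names (f0, bw, dm) and the sign convention are assumptions.

        import numpy as np

        K_DM = 4.148808e3   # dispersion constant, MHz^2 s per (pc cm^-3)

        def coherent_dedisperse(volts, f0, bw, dm):
            """Remove interstellar dispersion from complex baseband voltages.

            volts: complex voltage samples spanning bandwidth bw (MHz),
                   centred on f0 (MHz); dm in pc cm^-3.
            """
            f = np.fft.fftfreq(volts.size, d=1.0 / bw)   # offsets from f0, MHz
            # Phase of the inverse ISM transfer function; the factor 1e6
            # converts MHz*s to cycles in these units. Sign may flip with
            # the data recording convention.
            phase = 2 * np.pi * K_DM * 1e6 * dm * f**2 / (f0**2 * (f0 + f))
            return np.fft.ifft(np.fft.fft(volts) * np.exp(1j * phase))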

  16. SU-E-T-493: Accelerated Monte Carlo Methods for Photon Dosimetry Using a Dual-GPU System and CUDA.

    PubMed

    Liu, T; Ding, A; Xu, X

    2012-06-01

    To develop a Graphics Processing Unit (GPU) based Monte Carlo (MC) code that accelerates dose calculations on a dual-GPU system. We simulated a clinical case of prostate cancer treatment. A voxelized abdomen phantom derived from 120 CT slices was used, containing 218×126×60 voxels, and a GE LightSpeed 16-MDCT scanner was modeled. A CPU version of the MC code was first developed in C++ and tested on an Intel Xeon X5660 2.8 GHz CPU; it was then translated into a GPU version using CUDA C 4.1 and run on a dual Tesla M2090 GPU system. The code featured automatic assignment of simulation tasks to multiple GPUs, as well as accurate calculation of energy- and material-dependent cross-sections. Double-precision floating point format was used for accuracy. Doses to the rectum, prostate, bladder and femoral heads were calculated. When running on a single GPU, the MC GPU code was found to be 19 times faster than the CPU code and 42 times faster than MCNPX. These speedup factors were doubled on the dual-GPU system. The dose results were benchmarked against MCNPX, and a maximum difference of 1% was observed when the relative error was kept below 0.1%. A GPU-based MC code was developed for dose calculations using detailed patient and CT scanner models. Efficiency and accuracy were both guaranteed in this code. Scalability of the code was confirmed on the dual-GPU system. © 2012 American Association of Physicists in Medicine.

  17. Fast generation of computer-generated hologram by graphics processing unit

    NASA Astrophysics Data System (ADS)

    Matsuda, Sho; Fujii, Tomohiko; Yamaguchi, Takeshi; Yoshikawa, Hiroshi

    2009-02-01

    A cylindrical hologram is well known to be viewable over 360 deg. Such a hologram requires very high pixel resolution, so a Computer-Generated Cylindrical Hologram (CGCH) demands a huge amount of calculation. In our previous research, we used a look-up table method for fast calculation on an Intel Pentium 4 at 2.8 GHz. It took 480 hours to calculate a high-resolution CGCH (504,000 x 63,000 pixels, with an average of 27,000 object points). To improve the quality of the CGCH reconstructed image, the fringe pattern requires higher spatial frequency and resolution; therefore, to increase the calculation speed, we have to change the calculation method. In this paper, to reduce the calculation time of a CGCH (912,000 x 108,000 pixels), we employ a Graphics Processing Unit (GPU). Calculating this high-resolution CGCH would take 4,406 hours on a 3.4 GHz Xeon. Since a GPU has many streaming processors in a parallel processing structure, it works as a high-performance parallel processor, and it delivers maximum performance on 2-dimensional and streaming data. Recently, GPUs have become usable for general-purpose computation (GPGPU). For example, NVIDIA's GeForce 7 series became programmable with the Cg programming language, and the subsequent GeForce 8 series supports CUDA, a software development kit made by NVIDIA. Theoretically, the calculation ability of the GPU is quoted as 500 GFLOPS. From the experimental results, we achieved a calculation 47 times faster than our previous CPU-based work; the CGCH can therefore be generated in 95 hours, and the total time to calculate and print the CGCH is 110 hours.
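
    At its core the fringe computation is a sum of spherical waves from object points interfered with a reference beam; a small NumPy sketch of that kernel is below. The grid size, pixel pitch, wavelength, and on-axis plane reference are illustrative assumptions, and the paper's GPU version evaluates the same kind of kernel over billions of pixels.

        import numpy as np

        lam = 633e-9                      # wavelength (m), assumed
        k = 2 * np.pi / lam
        pitch = 8e-6                      # hologram pixel pitch (m), assumed
        nx, ny = 1024, 1024               # a tiny patch of the full fringe
        X, Y = np.meshgrid(np.arange(nx) * pitch, np.arange(ny) * pitch)

        rng = np.random.default_rng(2)
        pts = np.column_stack([rng.random(500) * nx * pitch,    # object x
                               rng.random(500) * ny * pitch,    # object y
                               0.05 + rng.random(500) * 0.05])  # z: 5-10 cm away

        fringe = np.zeros_like(X)
        for px, py, pz in pts:            # one spherical wave per object point
            r = np.sqrt((X - px)**2 + (Y - py)**2 + pz**2)
            fringe += np.cos(k * r)       # interference with plane reference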

  18. Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levine, Benjamin G., E-mail: ben.levine@temple.edu; Stone, John E., E-mail: johns@ks.uiuc.edu; Kohlmeyer, Axel, E-mail: akohlmey@temple.edu

    2011-05-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
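
    The histogramming step itself is compact; a NumPy sketch for a small system is shown below, assuming an orthorhombic box and the minimum-image convention. Its quadratic all-pairs cost is exactly what the paper's tiled multi-GPU kernels attack for million-atom selections.

        import numpy as np

        def rdf(A, B, box, nbins=200):
            """g(r) between selections A (N,3) and B (M,3) in an orthorhombic box."""
            rmax = box.min() / 2
            d = A[:, None, :] - B[None, :, :]          # all pair displacements
            d -= box * np.round(d / box)               # minimum-image convention
            r = np.sqrt((d ** 2).sum(axis=-1)).ravel()
            hist, edges = np.histogram(r, bins=nbins, range=(0.0, rmax))
            shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
            rho = len(B) / box.prod()                  # ideal-gas pair density
            g = hist / (shell_vol * rho * len(A))      # normalize per A atom
            return 0.5 * (edges[1:] + edges[:-1]), g

        rng = np.random.default_rng(3)
        box = np.array([20.0, 20.0, 20.0])
        r_mid, g = rdf(rng.random((500, 3)) * box, rng.random((500, 3)) * box, box)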

  19. Fast Analysis of Molecular Dynamics Trajectories with Graphics Processing Units—Radial Distribution Function Histogramming

    PubMed Central

    Stone, John E.; Kohlmeyer, Axel

    2011-01-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007

  20. Comparison of multihardware parallel implementations for a phase unwrapping algorithm

    NASA Astrophysics Data System (ADS)

    Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

    2018-04-01

    Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. It is therefore important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm for a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that minimizes a cost function by means of a serial Gauss-Seidel-type algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, it is a parallel Jacobi-class algorithm with alternating minimizations. This strategy is known as the chessboard type: red pixels can be updated in parallel within the same iteration, since they are independent of one another, and black pixels are then updated in parallel in the alternate iteration. We present parallel implementations of our algorithm for different parallel architectures: multicore CPU, the Xeon Phi coprocessor, and an Nvidia graphics processing unit. In all cases, our parallel algorithm outperforms the original serial version. In addition, we present a detailed performance comparison of the developed parallel versions.
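
    A minimal NumPy sketch of the chessboard (red-black) update pattern follows, using the 5-point Laplace system as a stand-in for the accumulation-of-residual-maps cost function; only the coloring and the data-parallel same-color update are meant to carry over.

        import numpy as np

        def color_mask(shape, color):
            i, j = np.indices(shape)
            return (i + j) % 2 == color            # 0 = "red", 1 = "black"

        def redblack_sweep(u, f, color):
            """Jacobi-style update of one chessboard color for -laplace(u) = f (h = 1)."""
            mask = color_mask(u.shape, color)
            mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = False  # fixed boundary
            nbr = np.zeros_like(u)
            nbr[1:-1, 1:-1] = (u[:-2, 1:-1] + u[2:, 1:-1]
                               + u[1:-1, :-2] + u[1:-1, 2:])
            u_new = u.copy()
            u_new[mask] = (nbr[mask] + f[mask]) / 4.0  # all same-color pixels at once
            return u_new

        u = np.zeros((128, 128))
        f = np.ones_like(u)
        for _ in range(500):                   # alternate colors: red, then black
            u = redblack_sweep(u, f, 0)
            u = redblack_sweep(u, f, 1)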

  1. Intel Teach to the Future: A Partnership for Professional Development.

    ERIC Educational Resources Information Center

    Metcalf, Teri; Jolly, Deborah

    This paper describes a public/private partnership program designed to provide staff development to help classroom teachers integrate technology in the curriculum by using the train-the-trainer model. The Intel[R] Teach to the Future Project was developed by Intel[R] in collaboration with other public and private sector partners, and has been…

  2. Anharmonicity Raises the Thermal Conductivity in Amorphous Silicon

    NASA Astrophysics Data System (ADS)

    Lv, Wei; Henry, Asegun

    We recently proposed a new method called the Direct Green-Kubo Modal Analysis (GKMA) method, which has been shown to calculate the thermal conductivity (TC) of several amorphous materials accurately. The Allen-Feldman (A-F) method has been widely used for amorphous materials; however, researchers have found that it fails for several different materials. The missing components in the A-F method are its harmonic approximation and its restriction to interactions between modes of similar frequencies, which neglects interactions between modes with large frequency differences. On the contrary, the GKMA method, which is based on molecular dynamics, intrinsically includes all types of phonon interactions. In GKMA, each mode's TC contribution comes from both mode self-correlations (autocorrelations) and mode-mode correlations (cross-correlations). We have demonstrated that the GKMA-predicted TC of a-Si from the Tersoff potential is in excellent agreement with experimental results. In this work, we present GKMA applications to a-Si using multiple potentials, giving more insight into the effect of anharmonicity on the TC of amorphous silicon. This research was supported by Intel grant AGMT DTD 1-15-13 and by computational resources from NSF-supported XSEDE resources under allocations DMR130105 and TG-PHY130049.
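
    For orientation, the plain (non-modal) Green-Kubo estimate integrates the heat-flux autocorrelation; a hedged SI-units sketch is below. GKMA additionally projects the flux onto normal modes to resolve per-mode and cross-mode contributions, which this sketch does not attempt; the flux units and function names are assumptions.

        import numpy as np

        KB = 1.380649e-23  # Boltzmann constant, J/K

        def autocorr(x, nlags):
            n = x.size
            return np.array([np.mean(x[:n - k] * x[k:]) for k in range(nlags)])

        def green_kubo_kappa(J, dt, volume, T, nlags):
            """Thermal conductivity from a heat-flux time series.

            J:      (nsteps, 3) heat flux density, W/m^2 (SI assumed)
            dt:     sampling interval (s); volume in m^3; T in K
            Returns kappa in W/(m K), averaged over the three directions.
            """
            acf = sum(autocorr(J[:, a], nlags) for a in range(3))
            return volume / (3 * KB * T**2) * np.trapz(acf, dx=dt)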

  3. Performance of a plasma fluid code on the Intel parallel computers

    NASA Technical Reports Server (NTRS)

    Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchstone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchstone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.

  4. A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers.

    PubMed

    Cooper, Christopher D; Bardhan, Jaydeep P; Barba, L A

    2014-03-01

    The continuum theory applied to biomolecular electrostatics leads to an implicit-solvent model governed by the Poisson-Boltzmann equation. Solvers relying on a boundary integral representation typically do not consider features like solvent-filled cavities or ion-exclusion (Stern) layers, due to the added difficulty of treating multiple boundary surfaces. This has hindered meaningful comparisons with volume-based methods, and the effects on accuracy of including these features have remained unknown. This work presents a solver called PyGBe that uses a boundary-element formulation and can handle multiple interacting surfaces. It was used to study the effects of solvent-filled cavities and Stern layers on the accuracy of calculating solvation energy and binding energy of proteins, using the well-known APBS finite-difference code for comparison. The results suggest that if the required accuracy for an application allows errors larger than about 2% in solvation energy, then the simpler, single-surface model can be used. When calculating binding energies, the need for a multi-surface model is problem-dependent, becoming more critical when ligand and receptor are of comparable size. Compared with the APBS solver, the boundary-element solver is faster when the accuracy requirements are higher. The cross-over point for the PyGBe code is on the order of 1-2% error, when running on one GPU card (NVIDIA Tesla C2075), compared with APBS running on six Intel Xeon CPU cores. PyGBe achieves algorithmic acceleration of the boundary element method using a treecode, and hardware acceleration using GPUs via PyCUDA from a user-visible code that is all Python. The code is open-source under the MIT license.

  5. An empirical comparison of several recent epistatic interaction detection methods.

    PubMed

    Wang, Yue; Liu, Guimei; Feng, Mengling; Wong, Limsoon

    2011-11-01

    Many new methods have recently been proposed for detecting epistatic interactions in GWAS data. There is, however, no in-depth independent comparison of these methods yet. Five recent methods, TEAM, BOOST, SNPHarvester, SNPRuler, and Screen and Clean (SC), are evaluated here in terms of power, type-1 error rate, scalability and completeness. In terms of power, TEAM performs best on data with main effect and BOOST performs best on data without main effect. In terms of type-1 error rate, TEAM and BOOST have higher type-1 error rates than SNPRuler and SNPHarvester. SC does not control type-1 error rate well. In terms of scalability, we tested the five methods using a dataset with 100,000 SNPs on a 64-bit Ubuntu system with an Intel Xeon CPU at 2.66 GHz and 16 GB of memory. TEAM takes ~36 days to finish and SNPRuler reports heap allocation problems. BOOST scales up to 100,000 SNPs and its cost is much lower than that of TEAM. SC and SNPHarvester are the most scalable. In terms of completeness, we study how frequently the pruning techniques employed by these methods incorrectly prune away the most significant epistatic interactions. We find that, on average, 20% of datasets without main effect and 60% of datasets with main effect are pruned incorrectly by BOOST, SNPRuler and SNPHarvester. The software for the five methods tested is available from the URLs below. TEAM: http://csbio.unc.edu/epistasis/download.php BOOST: http://ihome.ust.hk/~eeyang/papers.html. SNPHarvester: http://bioinformatics.ust.hk/SNPHarvester.html. SNPRuler: http://bioinformatics.ust.hk/SNPRuler.zip. Screen and Clean: http://wpicr.wpic.pitt.edu/WPICCompGen/. wangyue@nus.edu.sg.

  6. FPGA-based GEM detector signal acquisition for SXR spectroscopy system

    NASA Astrophysics Data System (ADS)

    Wojenski, A.; Pozniak, K. T.; Kasprowicz, G.; Kolasinski, P.; Krawczyk, R.; Zabolotny, W.; Chernyshova, M.; Czarski, T.; Malinowski, K.

    2016-11-01

    The presented work relates to the Gas Electron Multiplier (GEM) detector soft X-ray spectroscopy system for tokamak applications. The GEM detector used has a one-dimensional, 128-channel readout structure. The channels are connected to radiation-hard electronics with a configurable analog stage and fast ADCs supporting speeds of 125 MSPS for each channel. The digitized data is sent directly to the FPGAs using fast serial links. The preprocessing algorithms are implemented in the FPGAs, with data buffered in the on-board 2 Gb DDR3 memory chips. After the algorithmic stage, the data is sent to an Intel Xeon-based PC for further postprocessing over a PCI-Express Gen 2 link. For connecting multiple FPGAs, an 8-to-1 PCI-Express switch was designed. The whole system can support up to 2048 analog channels. The scope of this work is an FPGA-based implementation of a recorder for the raw GEM detector signal. Since the system will work in a very challenging environment (neutron radiation, intense electromagnetic fields), the registered signals from the GEM detector can be corrupted. In the case of very intense hot plasma radiation (e.g., laser-generated plasma), the registered signals can overlap. It is therefore valuable to register the raw signals from the GEM detector with a high number of events during soft X-ray radiation. The signal analysis will have a direct impact on the implementation of photon-energy computation algorithms. As a result, the system will produce energy spectra and topological distributions of soft X-ray radiation. Advanced software was developed to perform complex system startup and to monitor the hardware units. Using an array of two one-dimensional GEM detectors, it will be possible to perform tomographic reconstruction of plasma impurity radiation in the SXR region.

  7. Using the GeoFEST Faulted Region Simulation System

    NASA Technical Reports Server (NTRS)

    Parker, Jay W.; Lyzenga, Gregory A.; Donnellan, Andrea; Judd, Michele A.; Norton, Charles D.; Baker, Teresa; Tisdale, Edwin R.; Li, Peggy

    2004-01-01

    GeoFEST (the Geophysical Finite Element Simulation Tool) simulates stress evolution, fault slip and plastic/elastic processes in realistic materials, and so is suitable for earthquake cycle studies in regions such as Southern California. Many new capabilities and means of access for GeoFEST are now supported. New abilities include MPI-based cluster parallel computing using automatic PYRAMID/Parmetis-based mesh partitioning, automatic mesh generation for layered media with rectangular faults, and results visualization that is integrated with remote sensing data. The parallel GeoFEST application has been successfully run on over a half-dozen computers, including Intel Xeon clusters, Itanium II and Altix machines, and the Apple G5 cluster. It is not separately optimized for different machines, but relies on good domain partitioning for load balance and low communication, and on careful writing of the parallel diagonally preconditioned conjugate gradient solver to keep communication overhead low. Demonstrated thousand-step solutions for over a million finite elements on 64 processors require under three hours, and scaling tests show high efficiency when using on the order of 4000 or more elements per processor. The source code and documentation for GeoFEST are available at no cost from the Open Channel Foundation. In addition, GeoFEST may be used through a browser-based portal environment available to approved users. That environment includes semi-automated geometry creation and mesh generation tools, GeoFEST, and RIVA-based visualization tools that include the ability to generate a flyover animation showing deformations and topography. Work is in progress to support simulation of a region with several faults using 16 million elements, using a strain energy metric to adapt the mesh to faithfully represent the solution in a region of widely varying strain.
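
    The solver named here, diagonally preconditioned conjugate gradient, can be sketched in a few lines of NumPy. This serial version is only illustrative; in the parallel code the matrix-vector product is distributed across mesh partitions with halo exchanges.

        import numpy as np

        def diag_pcg(A, b, tol=1e-8, maxit=5000):
            """Conjugate gradient with Jacobi (diagonal) preconditioning for SPD A."""
            m_inv = 1.0 / A.diagonal()          # preconditioner: M = diag(A)
            x = np.zeros_like(b)
            r = b - A @ x
            z = m_inv * r
            p = z.copy()
            rz = r @ z
            for _ in range(maxit):
                Ap = A @ p
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                    break
                z = m_inv * r                    # preconditioned residual
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x

        # toy SPD system as a usage example
        rng = np.random.default_rng(4)
        Q = rng.random((50, 50))
        A = Q @ Q.T + 50 * np.eye(50)
        b = rng.random(50)
        print(np.linalg.norm(A @ diag_pcg(A, b) - b))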

  8. Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Men Chunhua; Romeijn, H. Edwin; Jia Xun

    2010-11-15

    Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting of two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with consideration of MLC mechanical constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. Results: The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. Conclusions: The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.
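
    A toy NumPy/SciPy sketch of the column-generation loop described here: the pricing subproblem picks an aperture from the cost-function gradient, and the master problem re-optimizes nonnegative dose rates over all apertures generated so far. The dose-deposition coefficients are synthetic, and the MLC mechanical constraints and smoothness term are deliberately omitted.

        import numpy as np
        from scipy.optimize import nnls

        rng = np.random.default_rng(5)
        nvox, nbix, nangles = 400, 60, 180
        Dij = rng.random((nvox, nbix)) / nvox    # synthetic dose per unit bixel fluence
        target = np.ones(nvox)                   # desired dose

        columns, dose = [], np.zeros(nvox)
        for angle in range(nangles):
            grad = dose - target                      # gradient of 0.5*||dose - target||^2
            price = -(Dij.T @ grad)                   # how much opening each bixel helps
            aperture = (price > 0).astype(float)      # pricing subproblem (no MLC rules)
            columns.append(Dij @ aperture)            # dose per unit dose rate of aperture
            A = np.column_stack(columns)
            rates, _ = nnls(A, target)                # master problem: nonneg dose rates
            dose = A @ rates

        print("final cost:", 0.5 * np.sum((dose - target) ** 2))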

  9. Sci-Thur PM – Brachytherapy 01: Fast brachytherapy dose calculations: Characterization of egs-brachy features to enhance simulation efficiency

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chamberland, Marc; Taylor, Randle E.P.; Rogers, D.W.O.

    2016-08-15

    Purpose: egs-brachy is a fast, new EGSnrc user-code for brachytherapy applications. This study characterizes egs-brachy features that enhance simulation efficiency. Methods: Calculations are performed to characterize efficiency gains from various features. Simulations include radionuclide and miniature x-ray tube sources in water phantoms and idealized prostate, breast, and eye plaque treatments. Features characterized include voxel indexing of sources to reduce boundary checks during radiation transport, scoring collision kerma via tracklength estimator, recycling photons emitted from sources, and using phase space data to initiate simulations. Bremsstrahlung cross section enhancement (BCSE), uniform bremsstrahlung splitting (UBS), and Russian Roulette (RR) are considered for electronic brachytherapy. Results: Efficiency is enhanced by a factor of up to 300 using tracklength versus interaction scoring of collision kerma, and by up to 2.7 and 2.6 using phase space sources and particle recycling, respectively, compared to simulations in which particles are initiated within sources. On a single 2.5 GHz Intel Xeon E5-2680 processor core, simulations approximating prostate and breast permanent implant ((2 mm)³ voxels) and eye plaque ((1 mm)³) treatments take as little as 9 s (prostate, eye) and up to 31 s (breast) to achieve 2% statistical uncertainty on doses within the PTV. For electronic brachytherapy, BCSE, UBS, and RR enhance efficiency by a factor >2000 compared to a factor of >10⁴ using a phase space source. Conclusion: egs-brachy features provide substantial efficiency gains, resulting in calculation times sufficiently fast for full Monte Carlo simulations for routine brachytherapy treatment planning.
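
    The tracklength estimator credited with the factor-of-up-to-300 gain scores every photon step through a voxel, not just interaction sites. A schematic Python fragment is below; the function and array names are illustrative, and the μ_en/ρ lookup, units (MeV, cm, cm³), and geometry-crossing logic are assumed to be supplied by the transport code.

        def score_tracklength_kerma(kerma, segments, energy, weight,
                                    mu_en_rho, voxel_volume):
            """Accumulate collision kerma (MeV/g) along one photon step.

            segments:   list of (voxel_index, path_length_cm) the step crosses
            mu_en_rho:  callable giving the mass energy-absorption
                        coefficient (cm^2/g) at this photon energy
            """
            for voxel, length in segments:
                # every traversed voxel scores, whether or not the photon
                # interacts there -- hence the large variance reduction
                kerma[voxel] += (weight * energy * mu_en_rho(energy)
                                 * length / voxel_volume)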

  10. Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT).

    PubMed

    Men, Chunhua; Romeijn, H Edwin; Jia, Xun; Jiang, Steve B

    2010-11-01

    To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting of two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with consideration of MLC mechanical constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.

  11. egs_brachy: a versatile and fast Monte Carlo code for brachytherapy

    NASA Astrophysics Data System (ADS)

    Chamberland, Marc J. P.; Taylor, Randle E. P.; Rogers, D. W. O.; Thomson, Rowan M.

    2016-12-01

    egs_brachy is a versatile and fast Monte Carlo (MC) code for brachytherapy applications. It is based on the EGSnrc code system, enabling simulation of photons and electrons. Complex geometries are modelled using the EGSnrc C++ class library and egs_brachy includes a library of geometry models for many brachytherapy sources, in addition to eye plaques and applicators. Several simulation efficiency enhancing features are implemented in the code. egs_brachy is benchmarked by comparing TG-43 source parameters of three source models to previously published values. 3D dose distributions calculated with egs_brachy are also compared to ones obtained with the BrachyDose code. Well-defined simulations are used to characterize the effectiveness of many efficiency improving techniques, both as an indication of the usefulness of each technique and to find optimal strategies. Efficiencies and calculation times are characterized through single source simulations and simulations of idealized and typical treatments using various efficiency improving techniques. In general, egs_brachy shows agreement within uncertainties with previously published TG-43 source parameter values. 3D dose distributions from egs_brachy and BrachyDose agree at the sub-percent level. Efficiencies vary with radionuclide and source type, number of sources, phantom media, and voxel size. The combined effects of efficiency-improving techniques in egs_brachy lead to short calculation times: simulations approximating prostate and breast permanent implant (both with (2 mm)³ voxels) and eye plaque (with (1 mm)³ voxels) treatments take between 13 and 39 s, on a single 2.5 GHz Intel Xeon E5-2680 v3 processor core, to achieve 2% average statistical uncertainty on doses within the PTV. egs_brachy will be released as free and open source software to the research community.

  12. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

    PubMed

    Daily, Jeff

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available, at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids; however, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach, though the opal library is faster for single-threaded applications. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

  13. A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers

    NASA Astrophysics Data System (ADS)

    Cooper, Christopher D.; Bardhan, Jaydeep P.; Barba, L. A.

    2014-03-01

    The continuum theory applied to biomolecular electrostatics leads to an implicit-solvent model governed by the Poisson-Boltzmann equation. Solvers relying on a boundary integral representation typically do not consider features like solvent-filled cavities or ion-exclusion (Stern) layers, due to the added difficulty of treating multiple boundary surfaces. This has hindered meaningful comparisons with volume-based methods, and the effects on accuracy of including these features has remained unknown. This work presents a solver called PyGBe that uses a boundary-element formulation and can handle multiple interacting surfaces. It was used to study the effects of solvent-filled cavities and Stern layers on the accuracy of calculating solvation energy and binding energy of proteins, using the well-known APBS finite-difference code for comparison. The results suggest that if required accuracy for an application allows errors larger than about 2% in solvation energy, then the simpler, single-surface model can be used. When calculating binding energies, the need for a multi-surface model is problem-dependent, becoming more critical when ligand and receptor are of comparable size. Comparing with the APBS solver, the boundary-element solver is faster when the accuracy requirements are higher. The cross-over point for the PyGBe code is on the order of 1-2% error, when running on one GPU card (NVIDIA Tesla C2075), compared with APBS running on six Intel Xeon CPU cores. PyGBe achieves algorithmic acceleration of the boundary element method using a treecode, and hardware acceleration using GPUs via PyCuda from a user-visible code that is all Python. The code is open-source under MIT license.

  14. GPU-Accelerated Voxelwise Hepatic Perfusion Quantification

    PubMed Central

    Wang, H; Cao, Y

    2012-01-01

    Voxelwise quantification of hepatic perfusion parameters from dynamic contrast enhanced (DCE) imaging greatly contributes to assessment of liver function in response to radiation therapy. However, the efficiency of estimating hepatic perfusion parameters voxel-by-voxel in the whole liver using a dual-input single-compartment model requires substantial improvement for routine clinical applications. In this paper, we utilize the parallel computation power of a graphics processing unit (GPU) to accelerate the computation, while maintaining the same accuracy as the conventional method. Using CUDA-GPU, the hepatic perfusion computations over multiple voxels are run across the GPU blocks concurrently but independently. At each voxel, non-linear least squares fitting of the time series of the liver DCE data to the compartmental model is distributed to multiple threads in a block, and the computations of different time points are performed simultaneously and synchronously. An efficient fast Fourier transform in a block is also developed for the convolution computation in the model. The GPU computations of the voxel-by-voxel hepatic perfusion images are compared with those by the CPU using simulated DCE data and experimental DCE MR images from patients. The computation speed is improved by 30 times using an NVIDIA Tesla C2050 GPU compared to a 2.67 GHz Intel Xeon CPU. To obtain liver perfusion maps with 626,400 voxels in a patient's liver, it takes 0.9 min with the GPU-accelerated voxelwise computation, compared to 110 min with the CPU, while both methods yield perfusion parameter differences of less than 10⁻⁶. The method will be useful for generating liver perfusion images in clinical settings. PMID:22892645
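
    Per voxel, the dual-input single-compartment fit amounts to nonlinear least squares on C(t) = (k1a·Ca(t) + k1p·Cp(t)) convolved with exp(-k2·t). A serial SciPy sketch with synthetic input functions follows (parameter names and input curves are assumptions); the GPU version of the paper runs such a fit for every liver voxel concurrently.

        import numpy as np
        from scipy.optimize import curve_fit

        t = np.linspace(0.0, 120.0, 241)         # time, s
        dt = t[1] - t[0]
        Ca = np.exp(-(t - 15.0) ** 2 / 30.0)     # synthetic arterial input
        Cp = np.exp(-(t - 25.0) ** 2 / 60.0)     # synthetic portal-venous input

        def model(t, k1a, k1p, k2):
            influx = k1a * Ca + k1p * Cp
            # discrete approximation of the convolution integral
            return dt * np.convolve(influx, np.exp(-k2 * t))[: t.size]

        rng = np.random.default_rng(6)
        voxel = model(t, 0.2, 0.5, 0.05) + 0.01 * rng.normal(size=t.size)
        (k1a, k1p, k2), _ = curve_fit(model, t, voxel,
                                      p0=[0.1, 0.1, 0.1], bounds=(0.0, 2.0))
        print(k1a, k1p, k2)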

  15. Comparing performance of many-core CPUs and GPUs for static and motion compensated reconstruction of C-arm CT data.

    PubMed

    Hofmann, Hannes G; Keck, Benjamin; Rohkohl, Christopher; Hornegger, Joachim

    2011-01-01

    Interventional reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. Hardware optimization is not an option but mandatory for interventional image processing and, in particular, for image reconstruction due to the high demands on performance. Several groups have published fast analytical 3-D reconstruction on highly parallel hardware such as GPUs to mitigate this issue. The authors show that the performance of modern CPU-based systems is of the same order as current GPUs for static 3-D reconstruction and outperforms them for a recent motion compensated (3-D+time) image reconstruction algorithm. This work investigates two algorithms: static 3-D reconstruction as well as a recent motion compensated algorithm. The evaluation was performed using a standardized reconstruction benchmark, RabbitCT, to get comparable results, and two additional clinical data sets. The authors demonstrate for a parametric B-spline motion estimation scheme that the derivative computation, which requires many write operations to memory, performs poorly on the GPU and can highly benefit from modern CPU architectures with large caches. Moreover, on a 32-core Intel Xeon server system, the authors achieve linear scaling with the number of cores used and reconstruction times almost in the same range as current GPUs. Algorithmic innovations in the field of motion compensated image reconstruction may lead to a shift back to CPUs in the future. For analytical 3-D reconstruction, the authors show that the gap between GPUs and CPUs has become smaller. It can be performed in less than 20 s (on-the-fly) using a 32-core server.

  16. Automated high-dose rate brachytherapy treatment planning for a single-channel vaginal cylinder applicator

    NASA Astrophysics Data System (ADS)

    Zhou, Yuhong; Klages, Peter; Tan, Jun; Chi, Yujie; Stojadinovic, Strahinja; Yang, Ming; Hrycushko, Brian; Medin, Paul; Pompos, Arnold; Jiang, Steve; Albuquerque, Kevin; Jia, Xun

    2017-06-01

    High dose rate (HDR) brachytherapy treatment planning is conventionally performed manually and/or with the aid of preplanned templates. In general, the standard of care would be elevated by an automated process that improves treatment planning efficiency, eliminates human error, and reduces plan quality variations. Thus, our group is developing AutoBrachy, an automated HDR brachytherapy planning suite of modules used to augment a clinical treatment planning system. This paper describes our proof-of-concept module for vaginal cylinder HDR planning, which has been fully developed. After a patient CT scan is acquired, the cylinder applicator is automatically segmented using image-processing techniques. The target CTV is generated based on physician-specified treatment depth and length. Locations of the dose calculation point, apex point and vaginal surface point, as well as the central applicator channel coordinates and the corresponding dwell positions, are determined according to their geometric relationship with the applicator and written to a structure file. Dwell times are computed through iterative quadratic optimization techniques. The planning information is then transferred to the treatment planning system through a DICOM-RT interface. The entire process was tested for nine patients. The AutoBrachy cylindrical applicator module was able to generate treatment plans for these cases with clinical-grade quality. Computation times varied between 1 and 3 min on an Intel Xeon CPU E3-1226 v3 processor. All geometric components in the automated treatment plans were generated accurately. The applicator channel tip positions agreed with the manually identified positions with submillimeter deviations, and the channel orientations between the plans agreed to within 1 degree. The automatically generated plans achieved clinically acceptable quality.
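
    One simple way to pose the dwell-time step is nonnegative least squares: A[i, j] holds the dose to constraint point i per unit dwell time at position j (e.g., from the TG-43 formalism), and dwell times t >= 0 are solved to meet the prescription. The sketch below uses a synthetic inverse-square kernel and simplifies away the iterative quadratic optimization actually used by AutoBrachy.

        import numpy as np
        from scipy.optimize import nnls

        rng = np.random.default_rng(7)
        npoints, ndwells = 200, 12
        dist = 0.5 + 4.5 * rng.random((npoints, ndwells))   # cm, synthetic geometry
        A = 1.0 / dist**2                                    # toy dose-rate kernel
        prescription = np.full(npoints, 6.0)                 # Gy at the dose points

        dwell_times, residual = nnls(A, prescription)        # enforces t >= 0
        dose = A @ dwell_times
        print(dwell_times, residual)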

  17. egs_brachy: a versatile and fast Monte Carlo code for brachytherapy.

    PubMed

    Chamberland, Marc J P; Taylor, Randle E P; Rogers, D W O; Thomson, Rowan M

    2016-12-07

    egs_brachy is a versatile and fast Monte Carlo (MC) code for brachytherapy applications. It is based on the EGSnrc code system, enabling simulation of photons and electrons. Complex geometries are modelled using the EGSnrc C++ class library and egs_brachy includes a library of geometry models for many brachytherapy sources, in addition to eye plaques and applicators. Several simulation efficiency enhancing features are implemented in the code. egs_brachy is benchmarked by comparing TG-43 source parameters of three source models to previously published values. 3D dose distributions calculated with egs_brachy are also compared to ones obtained with the BrachyDose code. Well-defined simulations are used to characterize the effectiveness of many efficiency improving techniques, both as an indication of the usefulness of each technique and to find optimal strategies. Efficiencies and calculation times are characterized through single source simulations and simulations of idealized and typical treatments using various efficiency improving techniques. In general, egs_brachy shows agreement within uncertainties with previously published TG-43 source parameter values. 3D dose distributions from egs_brachy and BrachyDose agree at the sub-percent level. Efficiencies vary with radionuclide and source type, number of sources, phantom media, and voxel size. The combined effects of efficiency-improving techniques in egs_brachy lead to short calculation times: simulations approximating prostate and breast permanent implant (both with (2 mm)³ voxels) and eye plaque (with (1 mm)³ voxels) treatments take between 13 and 39 s, on a single 2.5 GHz Intel Xeon E5-2680 v3 processor core, to achieve 2% average statistical uncertainty on doses within the PTV. egs_brachy will be released as free and open source software to the research community.

  18. IntellEditS: intelligent learning-based editor of segmentations.

    PubMed

    Harrison, Adam P; Birkbeck, Neil; Sofka, Michal

    2013-01-01

    Automatic segmentation techniques, despite demonstrating excellent overall accuracy, can often produce inaccuracies in local regions. As a result, correcting segmentations remains an important task that is often laborious, especially when done manually for 3D datasets. This work presents a powerful tool called Intelligent Learning-Based Editor of Segmentations (IntellEditS) that minimizes user effort and further improves segmentation accuracy. The tool partners interactive learning with an energy-minimization approach to editing. Based on interactive user input, a discriminative classifier is trained and applied to the edited 3D region to produce soft voxel labeling. The labels are integrated into a novel energy functional along with the existing segmentation and image data. Unlike the state of the art, IntellEditS is designed to correct segmentation results represented not only as masks but also as meshes. In addition, IntellEditS accepts intuitive boundary-based user interactions. The versatility and performance of IntellEditS are demonstrated on both MRI and CT datasets consisting of varied anatomical structures and resolutions.

  19. phyA-GFP is spectroscopically and photochemically similar to phyA and comprises both its native types, phyA' and phyA''.

    PubMed

    Sineshchekov, Vitaly; Sudnitsin, Artem; Ádám, Éva; Schäfer, Eberhard; Viczián, András

    2014-12-01

    Low-temperature fluorescence investigations of phyA-GFP used in experiments on its nuclear-cytoplasmic partitioning were carried out. In etiolated hypocotyls of phyA-deficient Arabidopsis thaliana expressing phyA-GFP, it was found that it is similar to phyA in spectroscopic parameters with both its native types, phyA' and phyA'', present and their ratio shifted towards phyA'. In transgenic tobacco hypocotyls, native phyA and rice phyA-GFP were also identical to phyA in the wild type whereas phyA-GFP belonged primarily to the phyA' type. Finally, truncated oat Δ6-12 phyA-GFP expressed in phyA-deficient Arabidopsis was represented by the phyA' type in contrast to full-length oat phyA-GFP with an approximately equal proportion of the two phyA types. This correlates with a previous observation that Δ6-12 phyA-GFP can form only numerous tiny subnuclear speckles while its wild-type counterpart can also localize into bigger and fewer subnuclear protein complexes. Thus, phyA-GFP is spectroscopically and photochemically similar or identical to the native phyA, suggesting that the GFP tag does not affect the chromophore. phyA-GFP comprises phyA'-GFP and phyA''-GFP, suggesting that both of them are potential participants in nuclear-cytoplasmic partitioning, which may contribute to its complexity.

  20. Nicotine can skew the characterization of the macrophage type-1 (MΦ1) phenotype differentiated with granulocyte-macrophage colony-stimulating factor to the MΦ2 phenotype

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yanagita, Manabu; Kobayashi, Ryohei; Murakami, Shinya, E-mail: ipshinya@dent.osaka-u.ac.jp

    Macrophages (MΦs) exhibit functional heterogeneity and plasticity in the local microenvironment. Recently, it was reported that MΦs can be divided into proinflammatory MΦs (MΦ1) and anti-inflammatory MΦs (MΦ2) based on their polarized functional properties. Here, we report that nicotine, the major ingredient of cigarette smoke, can modulate the characteristics of MΦ1. Granulocyte-macrophage colony-stimulating factor-driven MΦ1 with nicotine (Ni-MΦ1) showed the phenotypic characteristics of MΦ2. Like MΦ2, Ni-MΦ1 exhibited antigen-uptake activities. Ni-MΦ1 suppressed IL-12, but maintained IL-10 and produced high amounts of MCP-1 upon lipopolysaccharide stimulation compared with MΦ1. Moreover, we observed strong proliferative responses of T cells to lipopolysaccharide-stimulated MΦ1, whereas Ni-MΦ1 reduced T cell proliferation and inhibited IFN-γ production by T cells. These results suggest that nicotine can change the functional characteristics of MΦ and skew the MΦ1 phenotype to MΦ2. We propose that nicotine is a potent regulator that modulates immune responses in microenvironments.

  1. Summing up the Euler φ Function

    ERIC Educational Resources Information Center

    Loomis, Paul; Plytage, Michael; Polhill, John

    2008-01-01

    The Euler φ function counts the number of positive integers less than and relatively prime to a positive integer n. Here we look at perfect totient numbers, numbers for which φ(n) + φ(φ(n)) + φ(φ(φ(n))) + ... + 1 = n.
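
    A short Python check of this definition, sieving φ and summing its iterates down to 1; the printed list matches the known small perfect totient numbers.

        def phi_sieve(n):
            """Euler's phi for 0..n via a multiplicative sieve."""
            phi = list(range(n + 1))
            for p in range(2, n + 1):
                if phi[p] == p:                  # p untouched so far => p is prime
                    for k in range(p, n + 1, p):
                        phi[k] -= phi[k] // p    # multiply in the (1 - 1/p) factor
            return phi

        N = 300
        phi = phi_sieve(N)

        def is_perfect_totient(n):
            total, m = 0, n
            while m > 1:                         # phi(n) + phi(phi(n)) + ... + 1
                m = phi[m]
                total += m
            return total == n

        print([n for n in range(2, N + 1) if is_perfect_totient(n)])
        # -> [3, 9, 15, 27, 39, 81, 111, 183, 243, 255]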

  2. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    NASA Astrophysics Data System (ADS)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    We present Open Multi-Processing (OpenMP) versions of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both the commercially-licensed Intel Fortran compiler and the popular free open-source GNU Fortran compiler. The programs are easy to use and are elaborated with helpful comments for the users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for the new versions of the programs. Program files doi: http://dx.doi.org/10.17632/y8zk3jgn84.2 Licensing provisions: Apache License 2.0 Programming language: OpenMP GNU and Intel Fortran 90. Computer: Any multi-core personal computer or workstation with the appropriate OpenMP-capable Fortran compiler installed. Number of processors used: All available CPU cores on the executing computer. Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid. 204 (2016) 209. Does the new version supersede the previous version?: Not completely. It does supersede the previous Fortran programs from both references above, but not the OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209. Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with the commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for six different trap symmetries: axially and radially symmetric traps in 3d, circularly symmetric traps in 2d, fully isotropic (spherically symmetric) and fully anisotropic traps in 2d and 3d, as well as 1d traps, where no spatial symmetry is considered. Solution method: We employ the split-step Crank-Nicolson algorithm to discretize the time-dependent GP equation in space and time. The discretized equation is then solved by imaginary- or real-time propagation, employing adequately small space and time steps, to yield the solution of stationary and non-stationary problems, respectively. Reasons for the new version: Previously published Fortran programs [1,2] have now become popular tools [3] for solving the GP equation. These programs have been translated into the C programming language [4] and later extended to the more complex scenario of dipolar atoms [5]. Now virtually all computers have multi-core processors and some have motherboards with more than one physical central processing unit (CPU), which may increase the number of available CPU cores on a single computer to several tens. The C programs have been adapted to run very fast on such multi-core modern computers using general-purpose graphics processing units (GPGPU) with Nvidia CUDA and on computer clusters using the Message Passing Interface (MPI) [6]. Nevertheless, the previously developed Fortran programs are also commonly used for scientific computation, and most of them use only a single CPU core at a time on modern multi-core laptops, desktops, and workstations.
Unless the Fortran programs are made capable of efficiently using the available CPU cores, the solution of even a realistic dynamical 1d problem, not to mention the more complicated 2d and 3d problems, could be time consuming. Previously, we published auto-parallel Fortran programs [2] suitable for the Intel (but not GNU) compiler for solving the GP equation. Hence, the need for a full OpenMP version of the Fortran programs to reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: Previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. Six different trap symmetries are considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, for a total of 12 programs included in the BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. The present programs introduce a new input parameter, designated Number_of_Threads, which defines the number of CPU cores of the processor to be used in the calculation. If one sets the value 0 for this parameter, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for the system's background jobs. For example, on a machine with 20 CPU cores, such as the one we used for testing, it is advisable to use up to 19 CPU cores. The used CPU cores can also be divided among more than one job; for instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, for a total of 19 used CPU cores on a 20-core computer. The Fortran source programs are located in the directory src and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. Examples of the produced output files can be found in the directory output, although some large density files are omitted to save space. The programs calculate the dimensionless nonlinearities actually used from the physical input parameters; the input parameters correspond to the same nonlinearity values as in the previously published programs [1], so that the output files of the old and new programs can be directly compared. The output files are conveniently named such that their contents can be easily identified, following the naming convention introduced in Ref. [2]. For example, a file named <code>-out.txt, where <code> is the name of the individual program, is the general output file containing input data, time and space steps, nonlinearity, energy, and chemical potential; it was named fort.7 in the old Fortran version of the programs [1]. A file named <code>-den.txt is the output file with the condensate density, which had the names fort.3 and fort.4 in the old Fortran version [1] for the imaginary- and real-time propagation programs, respectively.
Other possible density outputs, such as the initial density, are commented out in the programs to have a simpler set of output files, but users can uncomment and re-enable them, if needed. In addition, there are output files for reduced (integrated) 1d and 2d densities for different programs. In the real-time programs there is also an output file reporting the dynamics of evolution of root-mean-square sizes after a perturbation is introduced. The supplied real-time programs solve the stationary GP equation, and then calculate the dynamics. As the imaginary-time programs are more accurate than the real-time programs for the solution of a stationary problem, one can first solve the stationary problem using the imaginary-time programs, adapt the real-time programs to read the pre-calculated wave function and then study the dynamics. In that case the parameter NSTP in the real-time programs should be set to zero and the space mesh and nonlinearity parameters should be identical in both programs. The reader is advised to consult our previous publication where a complete description of the output files is given [2]. A readme.txt file, included in the root directory, explains the procedure to compile and run the programs. We tested our programs on a workstation with two 10-core Intel Xeon E5-2650 v3 CPUs. The parameters used for testing are given in sample input files, provided in the corresponding directory together with the programs. In Table 1 we present wall-clock execution times for runs on 1, 6, and 19 CPU cores for programs compiled using Intel and GNU Fortran compilers. The corresponding columns "Intel speedup" and "GNU speedup" give the ratio of wall-clock execution times of runs on 1 and 19 CPU cores, and denote the actual measured speedup for 19 CPU cores. In all cases and for all numbers of CPU cores, although the GNU Fortran compiler gives excellent results, the Intel Fortran compiler turns out to be slightly faster. Note that during these tests we always ran only a single simulation on a workstation at a time, to avoid any possible interference issues. Therefore, the obtained wall-clock times are more reliable than the ones that could be measured with two or more jobs running simultaneously. We also studied the speedup of the programs as a function of the number of CPU cores used. The performance of the Intel and GNU Fortran compilers is illustrated in Fig. 1, where we plot the speedup and actual wall-clock times as functions of the number of CPU cores for 2d and 3d programs. We see that the speedup increases monotonically with the number of CPU cores in all cases and has large values (between 10 and 14 for 3d programs) for the maximal number of cores. This fully justifies the development of OpenMP programs, which enable much faster and more efficient solving of the GP equation. However, a slow saturation in the speedup with the further increase in the number of CPU cores is observed in all cases, as expected. The speedup tends to increase for programs in higher dimensions, as they become more complex and have to process more data. This is why the speedups of the supplied 2d and 3d programs are larger than those of 1d programs. Also, for a single program the speedup increases with the size of the spatial grid, i.e., with the number of spatial discretization points, since this increases the amount of calculations performed by the program. To demonstrate this, we tested the supplied real2d-th program and varied the number of spatial discretization points NX=NY from 20 to 1000. 
The measured speedup obtained when running this program on 19 CPU cores as a function of the number of discretization points is shown in Fig. 2. The speedup first increases rapidly with the number of discretization points and eventually saturates. Additional comments: Example inputs provided with the programs take less than 30 minutes to run on a workstation with two Intel Xeon E5-2650 v3 processors (2 QPI links, 10 CPU cores, 25 MB cache, 2.3 GHz).
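
    As an illustration of the Number_of_Threads convention described above, here is a minimal sketch in C/OpenMP (ours, not the package's Fortran; the select_threads helper is hypothetical), where 0 means "use all available cores":

```c
#include <stdio.h>
#include <omp.h>

/* Convention from the text: a value of 0 means "use all available CPU
   cores"; any positive value requests exactly that many threads. */
static void select_threads(int number_of_threads)
{
    if (number_of_threads > 0)
        omp_set_num_threads(number_of_threads);
    /* else: keep the OpenMP default, which uses all available cores */
}

int main(void)
{
    select_threads(0);                  /* 0 -> all available CPU cores */
    #pragma omp parallel
    {
        #pragma omp single
        printf("running with %d threads\n", omp_get_num_threads());
    }
    return 0;
}
```

    On a 20-core workstation such as the test machine above, setting the parameter to 19 would leave one core free for background jobs, as the authors recommend.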

  3. Spectral turning bands for efficient Gaussian random fields generation on GPUs and accelerators

    NASA Astrophysics Data System (ADS)

    Hunger, L.; Cosenza, B.; Kimeswenger, S.; Fahringer, T.

    2015-11-01

    A random field (RF) is a set of correlated random variables associated with different spatial locations. RF generation algorithms are of crucial importance for many scientific areas, such as astrophysics, geostatistics, computer graphics, and many others. Current approaches commonly make use of 3D fast Fourier transform (FFT), which does not scale well for RF bigger than the available memory; they are also limited to regular rectilinear meshes. We introduce random field generation with the turning band method (RAFT), an RF generation algorithm based on the turning band method that is optimized for massively parallel hardware such as GPUs and accelerators. Our algorithm replaces the 3D FFT with a lower-order, one-dimensional FFT followed by a projection step and is further optimized with loop unrolling and blocking. RAFT can easily generate RF on non-regular (non-uniform) meshes and efficiently produce fields with mesh sizes bigger than the available device memory by using a streaming, out-of-core approach. Our algorithm generates RF with the correct statistical behavior and is tested on a variety of modern hardware, such as NVIDIA Tesla, AMD FirePro and Intel Phi. RAFT is faster than the traditional methods on regular meshes and has been successfully applied to two real case scenarios: planetary nebulae and cosmological simulations.
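
    To make the projection step concrete, here is a minimal C sketch of the turning-bands idea the abstract describes (our illustration, not RAFT's code): each 3D point receives the normalized sum of precomputed 1D line processes evaluated at the point's projection onto each line direction.

```c
#include <math.h>

/* Turning-bands projection (schematic): dir[i] is a unit direction and
   line[i] holds nline samples of the i-th 1D random process, covering
   projections in [-1, 1] with spacing dt (points are assumed scaled so
   their projections fall in that range). Note that this works for
   arbitrary, non-regular point sets, the property the abstract stresses. */
double tb_field(const double p[3], int nlines, int nline, double dt,
                double dir[][3], double *const *line)
{
    double sum = 0.0;
    for (int i = 0; i < nlines; ++i) {
        double t = p[0]*dir[i][0] + p[1]*dir[i][1] + p[2]*dir[i][2];
        int k = (int)((t + 1.0) / dt);       /* nearest line sample */
        if (k < 0) k = 0;
        if (k >= nline) k = nline - 1;
        sum += line[i][k];
    }
    return sum / sqrt((double)nlines);       /* variance normalization */
}
```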

  4. Understanding the branching ratios of χ_c1 → φφ, ωω, ωφ observed at BES-III

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen Dianyong; He Jun; Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000

    In this work, we discuss the contribution of the mesonic loops to the decay rates of χ_c1 → φφ, ωω, which are suppressed by the helicity selection rules, and χ_c1 → φω, which is a doubly Okubo-Zweig-Iizuka (OZI) forbidden process. We find that the mesonic loop effects naturally explain the clear signals of the χ_c1 → φφ, ωω decay modes observed by the BES Collaboration. Moreover, we investigate the effects of ω-φ mixing, which may result in a branching ratio BR(χ_c1 → ωφ) of order 10⁻⁷. Thus, we await accurate measurements of BR(χ_c1 → ωω), BR(χ_c1 → φφ), and BR(χ_c1 → ωφ), which may be very helpful for testing the long-distance contribution and the ω-φ mixing in χ_c1 → φφ, ωω, ωφ decays.

  5. Performance Evaluation of Parallel Branch and Bound Search with the Intel iPSC (Intel Personal SuperComputer) Hypercube Computer.

    DTIC Science & Technology

    1986-12-01


  6. ICON-MIC: Implementing a CPU/MIC Collaboration Parallel Framework for ICON on Tianhe-2 Supercomputer.

    PubMed

    Wang, Zihao; Chen, Yu; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Liu, Zhiyong; Sun, Fei; Zhang, Fa

    2018-03-01

    Electron tomography (ET) is an important technique for studying the three-dimensional ultrastructure of biological specimens. Recently, ET has reached sub-nanometer resolution for investigating the native structure and conformational dynamics of macromolecular complexes when combined with the sub-tomogram averaging approach. Due to the limited sampling angles, ET reconstruction typically suffers from the "missing wedge" problem. Using a validation procedure, iterative compressed-sensing optimized nonuniform fast Fourier transform (NUFFT) reconstruction (ICON) demonstrates its power in restoring validated missing information for low-signal-to-noise-ratio biological ET datasets. However, its huge computational demand has become a bottleneck for the application of ICON. In this work, we implemented ICON-MIC, a parallel acceleration of ICON on many-integrated-core (MIC) Xeon Phi cards, to address this computational demand. We parallelized the element-wise matrix operations and used an efficient matrix summation to reduce the cost of matrix computation. We also developed parallel versions of the NUFFT on MIC to achieve a high acceleration of ICON through more efficient fast Fourier transform (FFT) calculation. We then proposed a hybrid task allocation strategy (two-level load balancing) to improve the overall performance of ICON-MIC by making full use of the idle resources on the Tianhe-2 supercomputer. Experimental results on two different datasets show that ICON-MIC retains high accuracy for biological specimens under different noise levels and achieves a significant acceleration, up to 13.3×, compared with the CPU version. Further, ICON-MIC has good scalability efficiency and overall performance on the Tianhe-2 supercomputer.
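
    The two kernel-level optimizations mentioned, parallel element-wise matrix operations and an efficient matrix summation, can be sketched in OpenMP C as one would write them for a MIC card (our illustration; the actual ICON-MIC kernels are more involved):

```c
#include <stddef.h>

/* Element-wise update: one independent operation per element, so the
   loop parallelizes and vectorizes directly. */
void elementwise_scale_add(double *a, const double *b, double s, size_t n)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] * s + b[i];
}

/* Full-matrix summation via an OpenMP reduction, which combines
   per-thread partial sums instead of serializing on one accumulator. */
double matrix_sum(const double *a, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```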

  7. Hierarchical algorithms for modeling the ocean on hierarchical architectures

    NASA Astrophysics Data System (ADS)

    Hill, C. N.

    2012-12-01

    This presentation will describe an approach to using accelerator/co-processor technology that maps hierarchical, multi-scale modeling techniques to an underlying hierarchical hardware architecture. The focus of this work is on making effective use of both the CPU and the accelerator/co-processor parts of a system for large-scale ocean modeling. In this work, a lower-resolution basin-scale ocean model is locally coupled to multiple "embedded" limited-area higher-resolution sub-models. The higher-resolution models execute on co-processor/accelerator hardware and do not interact directly with other sub-models. The lower-resolution basin-scale model executes on the system CPU(s). The result is a multi-scale algorithm that aligns with hardware designs in the co-processor/accelerator space. We demonstrate this approach being used to substitute explicit process models for standard parameterizations. Code for our sub-models is implemented through a generic abstraction layer, so that we can target multiple accelerator architectures with different programming environments. We will present two application and implementation examples. One uses the CUDA programming environment and targets GPU hardware; this example employs a simple non-hydrostatic two-dimensional sub-model to represent vertical motion more accurately. The second example uses a highly threaded three-dimensional model at high resolution; this targets a MIC/Xeon Phi-like environment and uses sub-models as a way to explicitly compute sub-mesoscale terms. In both cases the accelerator/co-processor capability provides extra compute cycles that allow improved model fidelity for little or no extra wall-clock time cost.

  8. Single event effect testing of the Intel 80386 family and the 80486 microprocessor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moran, A.; LaBel, K.; Gates, M.

    The authors present single event effect test results for the Intel 80386 microprocessor, the 80387 coprocessor, the 82380 peripheral device, and the 80486 microprocessor. Both single event upset and latchup conditions were monitored.

  9. FastLane: An Agile Congestion Signaling Mechanism for Improving Datacenter Performance

    DTIC Science & Technology

    2013-05-20


  10. 76 FR 32372 - Notice of Receipt of Complaint; Solicitation of Comments Relating to the Public Interest

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-06

    ... Rica S.A. of Costa Rica, Intel Malaysia Sdn. Bhd of Malaysia, Intel (Philippines) of the Philippines... any public health, safety, or welfare concerns in the United States relating to the potential orders...

  11. The development of two postnatal health instruments: one for mothers (M-PHI) and one for fathers (F-PHI) to measure health during the first year of parenting.

    PubMed

    Jones, G L; Morrell, C J; Cooke, J M; Speier, D; Anumba, D; Stewart-Brown, S

    2011-09-01

    To develop and psychometrically evaluate two questionnaires measuring both positive and negative postnatal health of mothers (M-PHI) and fathers (F-PHI) during the first year of parenting. The M-PHI and the F-PHI were developed in four stages. Stage 1: Postnatal women's focus group (M-PHI) and postnatal fathers' postal questionnaire (F-PHI); Stage 2: Qualitative interviews; Stage 3: Pilot postal survey and main postal survey; and Stage 4: Test-retest postal survey. The M-PHI consisted of a 29-item core questionnaire with six main scales and five conditional scales. The F-PHI consisted of a 27-item questionnaire with six main scales. All scales achieved good internal reliability (Cronbach's α 0.66-0.87 for M-PHI, 0.72-0.90 for F-PHI). Intraclass correlation coefficients demonstrated high test-retest reliability (0.60-0.88). Correlation coefficients supported the criterion validity of the M-PHI and the F-PHI when tested against the Short-Form-12 (SF-12), Edinburgh Postnatal Depression Scale (EPDS) and the Warwick and Edinburgh Mental Well-Being Scale (WEMWBS). The M-PHI and F-PHI are valid, reliable, parent-generated instruments. These unique instruments will be invaluable for practitioners wishing to promote family-centred care and for trialists and other researchers requiring a validated instrument to measure both positive and negative health during the first postnatal year, as to date no such measurement has existed.

  12. PKS1 plays a role in red-light-based positive phototropism in roots.

    PubMed

    Molas, Maria Lia; Kiss, John Z

    2008-06-01

    Aerial parts of plants curve towards the light (i.e. positive phototropism), and roots typically grow away from the light (i.e. negative phototropism). In addition, Arabidopsis roots exhibit positive phototropism relative to red light (RL), and this response is mediated by phytochromes A and B (phyA and phyB). Upon light stimulation, phyA and phyB interact with the phytochrome kinase substrate (PKS1) in the cytoplasm. In this study, we investigated the role of PKS1, along with phyA and phyB, in the positive phototropic responses to RL in roots. Using a high-resolution feedback system, we studied the phenotypic responses of roots of phyA, phyB, pks1, phyA pks1 and phyB pks1 null mutants as well as the PKS1-overexpressing line in response to RL. PKS1 emerged as an intermediary in the signalling pathways and appears to promote a negative curvature to RL in roots. In addition, phyA and phyB were both essential for a positive response to RL and act in a complementary fashion. However, either photoreceptor acting without the other results in negative curvature in response to red illumination so that the mode of action differs depending on whether phyA and phyB act independently or together. Our results suggest that PKS1 is part of a signalling pathway independent of phyA and phyB and that PKS1 modulates RL-based root phototropism.

  13. SiRen: Leveraging Similar Regions for Efficient and Accurate Variant Calling

    DTIC Science & Technology

    2015-05-30


  14. General dynamical properties of cosmological models with nonminimal kinetic coupling

    NASA Astrophysics Data System (ADS)

    Matsumoto, Jiro; Sushkov, Sergey V.

    2018-01-01

    We consider cosmological dynamics in the theory of gravity with a scalar field possessing a nonminimal kinetic coupling to curvature given as η G^{μν} φ_{,μ} φ_{,ν}, where η is an arbitrary coupling parameter, and a scalar potential V(φ) which is assumed to be as general as possible. With an appropriate dimensionless parametrization we represent the field equations as an autonomous dynamical system which ultimately contains only one arbitrary function χ(x) = 8π|η| V(x/√(8π)) with x = √(8π) φ. Then, assuming rather general properties of χ(x), we analyze the stationary points and their stability, as well as all possible asymptotic regimes of the dynamical system. It is shown that for a broad class of χ(x) there exist attractors representing three accelerated regimes of the Universe's evolution, including de Sitter expansion (or late-time inflation), the Little Rip scenario, and the Big Rip scenario. As specific examples, we consider a power-law potential V(φ) = M⁴(φ/φ₀)^σ, a Higgs-like potential V(φ) = (λ/4)(φ² − φ₀²)², and an exponential potential V(φ) = M⁴ e^(−φ/φ₀).

  15. How Managers' everyday decisions create or destroy your company's strategy.

    PubMed

    Bower, Joseph L; Gilbert, Clark G

    2007-02-01

    Senior executives have long been frustrated by the disconnection between the plans and strategies they devise and the actual behavior of the managers throughout the company. This article approaches the problem from the ground up, recognizing that every time a manager allocates resources, that decision moves the company either into or out of alignment with its announced strategy. A well-known story, Intel's exit from the memory business, illustrates this point. When discussing what businesses Intel should be in, Andy Grove asked Gordon Moore what they would do if Intel were a company that they had just acquired. When Moore answered, "Get out of memory," they decided to do just that. It turned out, though, that Intel's revenues from memory were by this time only 4% of total sales. Intel's lower-level managers had already exited the business. What Intel hadn't done was to shut down the flow of research funding into memory (which was still eating up one-third of all research expenditures); nor had the company announced its exit to the outside world. Because divisional and operating managers, as well as customers and capital markets, have such a powerful impact on the realized strategy of the firm, senior management might consider focusing less on the company's formal strategy and more on the processes by which the company allocates resources. Top managers must know the track record of the people who are making resource allocation proposals; recognize the strategic issues at stake; reach down to operational managers to work across division lines; frame resource questions to reflect the corporate perspective, especially when large sums of money are involved and conditions are highly uncertain; and create a new context that allows top executives to circumvent the regular resource allocation process when necessary.

  16. Lysine 206 in Arabidopsis phytochrome A is the major site for ubiquitin-dependent protein degradation.

    PubMed

    Rattanapisit, Kaewta; Cho, Man-Ho; Bhoo, Seong Hee

    2016-02-01

    Phytochrome A (phyA) is a light-labile phytochrome that mediates plant development under red/far-red light conditions. Degradation of phyA is initiated by red-light-induced phyA-ubiquitin conjugation through the 26S proteasome pathway. The N-terminal domain of phyA is known to be important in phyA degradation. To determine the specific lysine residues in the N-terminal domain of phyA involved in light-induced ubiquitination and protein degradation, we aligned the amino acid sequence of the N-terminal domain of Arabidopsis phyA with those of phyA from other plant species. Based on the alignment results, phytochrome-overexpressing Arabidopsis plants were generated. In particular, wild-type and mutant (substitutions of conserved lysines by arginines) phytochromes fused with GFP were expressed in phyA-211 Arabidopsis plants. Degradation kinetics of the overexpressed phyA proteins revealed that degradation of the K206R phyA mutant protein was delayed. The delayed degradation of the K206R phyA mutant protein resulted in reduced red-light-induced phyA-ubiquitin conjugation. Furthermore, seedlings expressing the K206R phyA mutant protein showed an enhanced phyA response under far-red light, resulting in inhibition of hypocotyl elongation as well as cotyledon opening. Together, these results suggest that lysine 206 is the main lysine for rapid ubiquitination and protein degradation of Arabidopsis phytochrome A. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.

  17. Long Range Strategy v3.2

    DTIC Science & Technology

    2010-01-12


  18. Bacteriophage phi11 lysin: physicochemical characterization and comparison with phage phi80a lysin

    USDA-ARS?s Scientific Manuscript database

    Phage lytic enzymes are promising antimicrobial agents. Lysins of phage phi11 (LysPhi11) and phi80a (LysPhi80a) can lyse (destroy) biofilms and cells of antibiotic-resistant strains of Staphylococcus aureus. Stability of enzymes is one of the parameters making their practical use possible. The obj...

  19. Arabidopsis fhl/fhy1 double mutant reveals a distinct cytoplasmic action of phytochrome A

    PubMed Central

    Rösler, Jutta; Klein, Ilse; Zeidler, Mathias

    2007-01-01

    Phytochrome A (phyA) plays an important role during germination and early seedling development. Because phyA is the primary photoreceptor for the high-irradiance response and the very-low-fluence response, it can trigger development not only in red and far-red (FR) light but also in a wider range of light qualities. Although phyA action is generally associated with translocation to the nucleus and regulation of transcription, there is evidence for additional cytoplasmic functions. Because nuclear accumulation of phyA has been shown to depend on far-red-elongated hypocotyl 1 (FHY1) and FHL (FHY1-like), investigation of phyA function in a double fhl/fhy1 mutant might be valuable in revealing the mechanism of phyA translocation and possible cytoplasmic functions. In fhl/fhy1, the FR-triggered nuclear translocation of phyA could no longer be detected but could be restored by transgenic expression of CFP:FHY1. Whereas the fhl/fhy1 mutant showed a phyA phenotype in respect to hypocotyl elongation and cotyledon opening under high-irradiance response conditions as well as a typical phyA germination phenotype under very-low-fluence response conditions, fhl/fhy1 showed no phenotype with respect to the phyA-dependent abrogation of negative gravitropism in blue light and in red-enhanced phototropism, demonstrating clear cytoplasmic functions of phyA. Disturbance of phyA nuclear import in fhl/fhy1 led to formation of FR-induced phyA:GFP cytoplasmic foci resembling the sequestered areas of phytochrome. FHY1 and FHL play crucial roles in phyA nuclear translocation and signaling. Thus the double-mutant fhl/fhy1 allows nuclear and cytoplasmic phyA functions to be separated, leading to the novel identification of cytoplasmic phyA responses. PMID:17566111

  20. Interactions of phytochromes A, B1 and B2 in light-induced competence for adventitious shoot formation in hypocotyl of tomato (Solanum lycopersicum L.).

    PubMed

    Lercari, B; Bertram, L

    2004-02-01

    The interactions of phytochrome A (phyA), phytochrome B1 (phyB1) and phytochrome B2 (phyB2) in light-dependent shoot regeneration from the hypocotyl of tomato were analysed using all eight possible homozygous allelic combinations of the null mutants. The donor plants were pre-grown either in the dark or under red or far-red light for 8 days after sowing; thereafter hypocotyl segments (apical, middle and basal portions) were transferred onto hormone-free medium for culture under different light qualities. Etiolated apical segments cultured in vitro under white light showed a very high frequency of regeneration for all of the genotypes tested except the phyB1phyB2, phyAphyB1 and phyAphyB1phyB2 mutants. Evidence is provided of a specific interference of phyB2 with the phyA-mediated high-irradiance response (HIR) to far-red and blue light in etiolated explants. Pre-treatment of donor plants by growth under red light enhanced the competence of the phyB1phyB2, phyAphyB1 and phyAphyB1phyB2 mutants for shoot regeneration, whereas pre-irradiation with far-red light enhanced the frequency of regeneration only in the phyAphyB1 mutant. Multiple phytochromes are involved in the red-light- and far-red-light-dependent acquisition of competence for shoot regeneration. The position of the segments along the hypocotyl influenced the role of the various phytochromes and the interactions between them. The culture of competent hypocotyl segments under red, far-red or blue light reduced the frequency of explants forming shoots compared to those cultured under white light, with different genotypes having different response patterns.

  1. Canonical single field slow-roll inflation with a non-monotonic tensor-to-scalar ratio

    NASA Astrophysics Data System (ADS)

    Germán, Gabriel; Herrera-Aguilar, Alfredo; Hidalgo, Juan Carlos; Sussman, Roberto A.

    2016-05-01

    We take a pragmatic, model-independent approach to single-field slow-roll canonical inflation by imposing conditions, not on the potential, but on the slow-roll parameter ε(φ) and its derivatives ε′(φ) and ε″(φ), thereby extracting general conditions on the tensor-to-scalar ratio r and the running n_sk at φ_H, where the perturbations are produced, some 50-60 e-folds before the end of inflation. We find quite generally that for models where ε(φ) develops a maximum, a relatively large r is most likely accompanied by a positive running, while a negligible tensor-to-scalar ratio implies negative running. The definitive answer, however, is given in terms of the slow-roll parameter ξ²(φ). To accommodate a large tensor-to-scalar ratio that meets the limiting values allowed by the Planck data, we study a non-monotonic ε(φ) decreasing during most of inflation. Since at φ_H the slow-roll parameter ε(φ) is increasing, we thus require that ε(φ) develops a maximum for φ > φ_H, after which ε(φ) decreases to small values where most e-folds are produced. The end of inflation might occur through a hybrid mechanism, and a small field excursion Δφ_e ≡ |φ_H − φ_e| is obtained with a sufficiently thin profile for ε(φ), which, however, should not conflict with the second slow-roll parameter η(φ). As a consequence of this analysis we find bounds for Δφ_e, r_H and for the scalar spectral index n_sH. Finally we provide examples where these considerations are explicitly realised.
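
    For reference, the analysis above relies on the standard potential slow-roll definitions; in units with the reduced Planck mass set to one, these read (our typeset restatement, not taken verbatim from the paper):

```latex
\[
  \epsilon(\phi) = \frac{1}{2}\left(\frac{V'}{V}\right)^{2}, \qquad
  \eta(\phi) = \frac{V''}{V}, \qquad
  \xi^{2}(\phi) = \frac{V'\,V'''}{V^{2}},
\]
\[
  r \simeq 16\,\epsilon, \qquad
  n_{s} \simeq 1 - 6\,\epsilon + 2\,\eta, \qquad
  n_{sk} \equiv \frac{dn_{s}}{d\ln k}
        \simeq 16\,\epsilon\,\eta - 24\,\epsilon^{2} - 2\,\xi^{2},
\]
```

    all evaluated at φ_H, some 50-60 e-folds before the end of inflation; the ξ² term is what makes ξ²(φ) decisive for the sign of the running.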

  2. Connecting Effective Instruction and Technology. Intel-ebration: Safari.

    ERIC Educational Resources Information Center

    Burton, Larry D.; Prest, Sharon

    Intel-ebration is an attempt to integrate the following research-based instructional frameworks and strategies: (1) dimensions of learning; (2) multiple intelligences; (3) thematic instruction; (4) cooperative learning; (5) project-based learning; and (6) instructional technology. This paper presents a thematic unit on safari, using the…

  3. Proton Irradiation of the 16GB Intel Optane SSD

    NASA Technical Reports Server (NTRS)

    Wyrwas, E. J.

    2017-01-01

    The purpose of this test is to assess the single event effects (SEE) and radiation susceptibility of the Intel Optane Memory device (SSD) containing the 3D XPoint phase change memory (PCM) technology. This test is supported by the NASA Electronics Parts and Packaging Program (NEPP).

  4. Incentive Spirometry after Lung Resection: A Randomized Controlled Trial.

    PubMed

    Malik, Peter Ra; Fahim, Christine; Vernon, Jordyn; Thomas, Priya; Schieman, Colin; Finley, Christian J; Agzarian, John; Shargall, Yaron; Farrokhyar, Forough; Hanna, Wael C

    2018-04-24

    Incentive spirometry (IS) is thought to reduce the incidence of postoperative pulmonary complications (PPC) after lung resection. We sought to determine whether the addition of IS to routine physiotherapy following lung resection results in a lower rate of PPC, as compared to physiotherapy alone. A single-blind prospective randomized controlled trial was conducted in adults undergoing lung resection. Individuals with previous lung surgery or home oxygen were excluded. Participants randomized to the control arm (PHY) received routine physiotherapy alone (deep breathing, ambulation and shoulder exercises). Those randomized to the intervention arm (PHY/IS) received IS in addition to routine physiotherapy. The trial was powered to detect a 10% difference in the rate of PPC (power = 80%). Student's t-test and chi-square were utilized for continuous and categorical variables, respectively, with a significance level of p=0.05. A total of 387 participants (n=195 PHY/IS; n=192 PHY) were randomized between 2014 and 2017. Baseline characteristics were comparable for both arms. The majority of patients underwent a pulmonary lobectomy (PHY/IS=59.5%, PHY=61.0%, p=0.84), with no difference in the rates of minimally invasive and open procedures. There were no differences in the incidence of PPC at 30 days postoperatively (PHY/IS=12.3%, PHY=13.0%, p=0.88). There were no differences in rates of pneumonia (PHY/IS=4.6%, PHY=7.8%, p=0.21), mechanical ventilation (PHY/IS=2.1%, PHY=1.0%, p=0.41), home oxygen (PHY/IS=13.8%, PHY=14.6%, p=0.89), hospital length of stay (PHY/IS=4 days, PHY=4 days, p=0.34), or rate of readmission to hospital (PHY/IS=10.3%, PHY=9.9%, p=1.00). The addition of IS to routine postoperative physiotherapy does not reduce the incidence of PPC after lung resection. Copyright © 2018. Published by Elsevier Inc.

  5. Purification and characterization of two distinct acidic phytases with broad pH stability from Aspergillus niger NCIM 563.

    PubMed

    Soni, S K; Magdum, A; Khire, J M

    2010-11-01

    Aspergillus niger NCIM 563 produced two different extracellular phytases (Phy I and Phy II) under submerged fermentation conditions at 30°C in medium containing dextrin-glucose-sodium nitrate-salts. Both enzymes were purified to homogeneity using Rotavapor concentration, Phenyl-Sepharose column chromatography and Sephacryl S-200 gel filtration. The molecular masses of Phy I and Phy II, as determined by SDS-PAGE and gel filtration, were 66 and 264 kDa for Phy I and 150 and 148 kDa for Phy II, respectively, indicating that Phy I consists of four identical subunits and Phy II is a monomer. The pI values of Phy I and II were 3.55 and 3.91, respectively. Phy I was highly acidic with an optimum pH of 2.5 and was stable over a broad pH range (1.5-9.0), while Phy II showed a pH optimum of 5.0 with stability in the range of pH 3.5-9.0. Phy I exhibited very broad substrate specificity while Phy II was more specific for sodium phytate. Similarly, Phy II was strongly inhibited by Ag(+) and Hg(2+) (1 mM) metal ions while Phy I was only partially inhibited. Peptide analysis by MALDI-TOF mass spectrometry (MS) also indicated that the two proteins are entirely different. The K(m) values of Phy I and II for sodium phytate were 2.01 and 0.145 mM, while V(max) values were 5,018 and 1,671 μmol min(-1) mg(-1), respectively. The N-terminal amino acid sequences of Phy I and Phy II were FSYGAAIPQQ and GVDERFPYTG, respectively. Phy II showed no homology with Phy I or any other known phytase from the literature, suggesting its unique nature. To our knowledge, this is the first report of two distinct novel phytases from Aspergillus niger.

  6. Phytochrome B Requires PIF Degradation and Sequestration to Induce Light Responses Across a Wide Range of Light Conditions.

    PubMed

    Park, Eunae; Kim, Yeojae; Choi, Giltsu

    2018-05-15

    Phytochrome B (phyB) inhibits the function of phytochrome-interacting factors (PIFs) by inducing their degradation and sequestration, but the relative physiological importance of these two phyB activities is unclear. In an analysis of published Arabidopsis thaliana phyB mutations, we identified a point mutation in the N-terminal half of phyB (phyBG111D) that abolishes its PIF sequestration activity without affecting its PIF degradation activity. We also identified a point mutation in the phyB C-terminal domain, which, when combined with a deletion of the C-terminal end (phyB990G767R), does the opposite; it blocks PIF degradation without affecting PIF sequestration. The resulting phyB proteins, phyB990G767R and phyBG111D, are equally capable of inducing light responses under continuous red light. However, phyBG111D, which exhibits only the PIF degradation activity, induces stronger light responses than phyB990G767R under white light with prolonged dark periods (i.e., diurnal cycles). In contrast, phyB990G767R, which exhibits only the PIF sequestration activity, induces stronger light responses in flickering light (a condition that mimics sunflecks). Together, our results indicate that both of these separable phyB activities are required for light responses in varying light conditions. © 2018 American Society of Plant Biologists. All rights reserved.

  7. Role of the N*(1535) in pp → ppφ and π⁻p → nφ reactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xie Jujun; Graduate University of Chinese Academy of Sciences, Beijing 100049; Zou Bingsong

    2008-01-15

    The near-threshold φ-meson production in proton-proton and π⁻p collisions is studied with the assumption that the production mechanism is due to the sub-Nφ-threshold N*(1535) resonance. The π⁰-, η-, and ρ⁰-meson exchanges for proton-proton collisions are considered. It is shown that the contribution to the pp → ppφ reaction from the t-channel π⁰-meson exchange is dominant. With a significant N*(1535)Nφ coupling [g²_{N*(1535)Nφ}/4π = 0.13], both pp → ppφ and π⁻p → nφ data are very well reproduced. The significant coupling of the N*(1535) resonance to Nφ is compatible with previous indications of a large ss̄ component in the quark wave function of the N*(1535) resonance and may be the real origin of the significant enhancement of the φ production over the naive OZI-rule predictions.

  8. The differential production cross section of the φ(1020) meson in √s = 7 TeV pp collisions measured with the ATLAS detector

    DOE PAGES

    Aad, G.; Abajyan, T.; Abbott, B.; ...

    2014-07-01

    A measurement is presented of the φ(1020) × BR(φ → K⁺K⁻) production cross section at √s = 7 TeV using pp collision data corresponding to an integrated luminosity of 383 μb⁻¹, collected with the ATLAS experiment at the LHC. Selection of φ(1020) mesons is based on the identification of charged kaons by their energy loss in the pixel detector. The differential cross section is measured as a function of the transverse momentum, p_T,φ, and rapidity, y_φ, of the φ(1020) meson in the fiducial region 500 < p_T,φ < 1200 MeV, |y_φ| < 0.8, kaon p_T,K > 230 MeV and kaon momentum p_K < 800 MeV. The integrated φ(1020)-meson production cross section in this fiducial range is measured to be σ_φ × BR(φ → K⁺K⁻) = 570 ± 8 (stat) ± 66 (syst) ± 20 (lumi) μb.

  9. WinHPC System Programming | High-Performance Computing | NREL

    Science.gov Websites

    Learn how to build and run an MPI (Message Passing Interface) program on the WinHPC system, including where the MPI header (mpi.h) and library (msmpi.lib) are located. To build from the command line, run... Start > Intel Software Development Tools > Intel C++ Compiler Professional... > C++ Build Environment for applications running...
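
    A minimal MPI program of the kind such a page walks through (our example; build flags and library paths depend on the local MS-MPI and compiler installation):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start the MPI runtime     */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

    Compiling against mpi.h, linking msmpi.lib as the page describes, and launching with mpiexec prints one line per process.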

  10. SU-E-T-628: A Cloud Computing Based Multi-Objective Optimization Method for Inverse Treatment Planning.

    PubMed

    Na, Y; Suh, T; Xing, L

    2012-06-01

    Multi-objective (MO) plan optimization entails the generation of an enormous number of IMRT or VMAT plans constituting the Pareto surface, which presents a computationally challenging task. The purpose of this work is to overcome this hurdle by developing an efficient MO method using an emerging cloud computing platform. As the backbone of cloud computing for optimizing inverse treatment planning, Amazon Elastic Compute Cloud with a master node (17.1 GB memory, 2 virtual cores, 420 GB instance storage, 64-bit platform) is used. The master node is able to seamlessly scale a number of working-group instances, called workers, based on user-defined settings to account for MO functions in the clinical setting. Each worker solves the objective function with an efficient sparse decomposition method. Workers are automatically terminated when their tasks are finished. The optimized plans are archived to the master node to generate the Pareto solution set. Three clinical cases have been planned using the developed MO IMRT and VMAT planning tools to demonstrate the advantages of the proposed method. The target dose coverage and critical structure sparing of plans obtained using the cloud computing platform are identical to those obtained using a desktop PC (Intel Xeon® CPU 2.33 GHz, 8 GB memory). It is found that MO planning substantially speeds up the generation of the Pareto set for both types of plans. The speedup scales approximately linearly with the number of nodes used for computing. With the use of N nodes, the computational time follows the fitted model 0.2 + 2.3/N, with r² > 0.99 on average over the cases, making real-time MO planning possible. A cloud computing infrastructure is developed for MO optimization. The algorithm substantially improves the speed of inverse plan optimization. The platform is valuable for both MO planning and future off- or on-line adaptive re-planning. © 2012 American Association of Physicists in Medicine.
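
    Taking the fitted timing model at face value (our arithmetic, with times normalized so that T(1) = 2.5):

```latex
\[
  T(N) \approx 0.2 + \frac{2.3}{N}, \qquad
  S(N) = \frac{T(1)}{T(N)} = \frac{2.5}{0.2 + 2.3/N}
  \;\longrightarrow\; 12.5 \quad (N \to \infty),
\]
```

    so the constant 0.2 plays the role of the serial fraction in Amdahl's law and caps the attainable speedup at about 12.5×, consistent with the near-linear scaling reported for modest node counts.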

  11. MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    PubMed Central

    Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio

    2014-01-01

    We have developed MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved in performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18 times faster than the original Clustal W algorithm, and more than 7 times faster than the best x86 parallel implementation to date, and is publicly available through a web service. Moreover, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies, and the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention, and the protection of intellectual property (identification/traceability), including the protected designation of origin, among other applications. PMID:24710354

  12. Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

    NASA Astrophysics Data System (ADS)

    Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

    2012-11-01

    In recent decades, finite element (FE) techniques have been extensively used for predicting the effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of the effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C and subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel-processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near-linear speedup progression when using only the CPU cores, and executes more than 20 times faster when the GPU is used as well.

  13. WE-D-BRA-04: Online 3D EPID-Based Dose Verification for Optimum Patient Safety

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Spreeuw, H; Rozendaal, R; Olaciregui-Ruiz, I

    2015-06-15

    Purpose: To develop an online 3D dose verification tool based on EPID transit dosimetry to ensure optimum patient safety in radiotherapy treatments. Methods: A new software package was developed which processes EPID portal images online using a back-projection algorithm for the 3D dose reconstruction. The package processes portal images faster than the acquisition rate of the portal imager (∼2.5 fps). After a portal image is acquired, the software searches for "hot spots" in the reconstructed 3D dose distribution. A hot spot is defined in this study as a 4 cm³ cube where the average cumulative reconstructed dose exceeds the average total planned dose by at least 20% and 50 cGy. If a hot spot is detected, an alert is generated, resulting in a linac halt. The software has been tested by irradiating an Alderson phantom after introducing various types of serious delivery errors. Results: In our first experiment the Alderson phantom was irradiated with two arcs from a 6 MV VMAT H&N treatment having a large leaf position error or a large monitor unit error. For both arcs and both errors the linac was halted before dose delivery was completed. When no error was introduced, the linac was not halted. The complete processing of a single portal frame, including hot spot detection, takes about 220 ms on a dual hexa-core Intel Xeon X5650 CPU at 2.66 GHz. Conclusion: A prototype online 3D dose verification tool using portal imaging has been developed and successfully tested for various kinds of gross delivery errors. The detection of hot spots was proven to be effective for the timely detection of these errors. Current work is focused on hot spot detection criteria for various treatment sites and the introduction of a clinical pilot program with online verification of hypo-fractionated (lung) treatments.
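
    The hot-spot criterion above is concrete enough to sketch. A brute-force C illustration (ours, not the authors' software), scanning every cube of a dose grid for the 20%-and-50-cGy condition:

```c
#include <stdbool.h>
#include <stddef.h>

/* Doses are flat [nz][ny][nx] arrays in cGy; `side` is the cube edge in
   voxels, chosen so the cube volume is 4 cm^3 for the grid spacing. */
static double cube_mean(const double *d, int nx, int ny,
                        int x0, int y0, int z0, int side)
{
    double s = 0.0;
    for (int z = z0; z < z0 + side; ++z)
        for (int y = y0; y < y0 + side; ++y)
            for (int x = x0; x < x0 + side; ++x)
                s += d[((size_t)z * ny + y) * nx + x];
    return s / ((double)side * side * side);
}

bool has_hot_spot(const double *recon, const double *plan,
                  int nx, int ny, int nz, int side)
{
    for (int z = 0; z + side <= nz; ++z)
        for (int y = 0; y + side <= ny; ++y)
            for (int x = 0; x + side <= nx; ++x) {
                double r = cube_mean(recon, nx, ny, x, y, z, side);
                double p = cube_mean(plan,  nx, ny, x, y, z, side);
                if (r >= 1.2 * p && r - p >= 50.0)  /* both conditions */
                    return true;
            }
    return false;
}
```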

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carver, R; Popple, R; Benhabib, S

    Purpose: To evaluate the accuracy of electron dose distributions calculated by the Varian Eclipse electron Monte Carlo (eMC) algorithm for use with recently commercially available bolus electron conformal therapy (ECT). Methods: eMC-calculated electron dose distributions for bolus ECT have been compared to those previously measured for cylindrical phantoms (retromolar trigone and nose), whose axial cross sections were based on the mid-PTV CT anatomy for each site. The phantoms consisted of SR4 muscle substitute, SR4 bone substitute, and air. The bolus ECT treatment plans were imported into the Eclipse treatment planning system and calculated using the maximum allowable number of histories (2×10⁹), resulting in a statistical error of <0.2%. Smoothing was not used for these calculations. Differences between eMC-calculated and measured dose distributions were evaluated in terms of absolute dose difference as well as distance to agreement (DTA). Results: Results from the eMC for the retromolar trigone phantom showed 89% (41/46) of dose points within 3% dose difference or 3 mm DTA. There was an average dose difference of −0.12% with a standard deviation of 2.56%. Results for the nose phantom showed 95% (54/57) of dose points within 3% dose difference or 3 mm DTA. There was an average dose difference of 1.12% with a standard deviation of 3.03%. Dose calculation times for the retromolar trigone and nose treatment plans were 15 min and 22 min, respectively, using 16 processors (Intel Xeon E5-2690, 2.9 GHz) on a Varian Eclipse framework agent server (FAS). Results of this study were consistent with those previously reported for the accuracy of the eMC electron dose algorithm and for the .decimal, Inc. pencil beam redefinition algorithm used to plan the bolus. Conclusion: These results show that the accuracy of the Eclipse eMC algorithm is suitable for clinical implementation of bolus ECT.
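
    The 3% / 3 mm acceptance test used above can be illustrated with a simple per-point check (our sketch, on a 1D dose profile, using nearest-sample distances rather than interpolation):

```c
#include <math.h>
#include <stdbool.h>

/* A measured point passes if the dose difference at its own location is
   within tol_pct percent, or if some calculated point within dta_mm
   carries the measured dose value (dose match within eps_pct percent). */
bool point_passes(double meas, double x,
                  const double *calc, const double *pos, int n,
                  double tol_pct, double dta_mm, double eps_pct)
{
    for (int i = 0; i < n; ++i) {
        double ddiff = 100.0 * fabs(calc[i] - meas) / meas;
        if (fabs(pos[i] - x) < 1e-9 && ddiff <= tol_pct)
            return true;                 /* dose-difference criterion */
        if (fabs(pos[i] - x) <= dta_mm && ddiff <= eps_pct)
            return true;                 /* distance-to-agreement     */
    }
    return false;
}
```

    Counting passes over all measured points gives tallies of the 41/46 and 54/57 kind reported above; tol_pct = 3, dta_mm = 3, and a small eps_pct would mirror the paper's criterion.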

  15. Cache and energy efficient algorithms for Nussinov's RNA Folding.

    PubMed

    Zhao, Chunchun; Sahni, Sartaj

    2017-12-06

    An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run-time and energy efficiency by reducing the number of cache misses. Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses, followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms (Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2) using two programming languages (C and Java) show that our cache-efficient algorithms are also efficient in terms of run time and energy. Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox gives the best run-time and energy performance. The C versions of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical, and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run-time and energy efficiency at the expense of memory, as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
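
    For readers unfamiliar with the recurrence being tiled, a plain C version of the Classical Nussinov dynamic program (our sketch, without a minimum loop-length constraint) looks like this; the non-local M[i][k] + M[k+1][j] accesses are what make the Classical layout cache-unfriendly:

```c
#include <stdlib.h>
#include <string.h>

/* Classical Nussinov recurrence:
   M[i][j] = max( M[i+1][j-1] + pair(s[i], s[j]),
                  max over i <= k < j of M[i][k] + M[k+1][j] ). */
static int pair(char a, char b)
{
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G');
}

int nussinov(const char *s)
{
    int n = (int)strlen(s);
    int *M = calloc((size_t)n * n, sizeof *M); /* M[i*n+j], zeroed */
    for (int len = 1; len < n; ++len)          /* by increasing span */
        for (int i = 0; i + len < n; ++i) {
            int j = i + len;
            int best = M[(i + 1) * n + (j - 1)] + pair(s[i], s[j]);
            for (int k = i; k < j; ++k) {      /* bifurcation term */
                int v = M[i * n + k] + M[(k + 1) * n + j];
                if (v > best) best = v;
            }
            M[i * n + j] = best;
        }
    int result = (n > 0) ? M[n - 1] : 0;       /* M[0][n-1] */
    free(M);
    return result;
}
```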

  16. SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moriya, S; Sato, M; Tachibana, H

    Purpose: The calculation time is a trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated accelerating the convolution calculation using an ideal kernel on Graphics Processing Units (GPUs). Methods: The calculation was performed on AMD graphics hardware (a Dual FirePro D700), and our algorithm was implemented using Aparapi, which converts Java bytecode to OpenCL. The process of dose calculation was separated into the TERMA and KERMA steps, and the dose deposited at the coordinate (x, y, z) was determined in the process. In the dose calculation running on the central processing unit (CPU), an Intel Xeon E5, the calculation loops were performed for all calculation points. In the GPU computation, all of the calculation processes for the points were sent to the GPU and multi-threaded computation was done. In this study, the dose calculation was performed in a water-equivalent homogeneous phantom with 150³ voxels (2 mm calculation grid), the calculation speed on the GPU was compared to that on the CPU, and the accuracy of the PDD was evaluated. Results: The calculation times for the GPU and the CPU were 3.3 s and 4.4 h, respectively; the GPU was thus 4800 times faster than the CPU. The PDD curve for the GPU matched that for the CPU perfectly. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable in terms of time and may be more accurate in inhomogeneous regions. Intensity modulated arc therapy needs dose calculations for different gantry angles at many control points. Thus, it would be more practical for the kernel to use a coarser spacing technique if the calculation is faster while keeping accuracy similar to a current treatment planning system.

  17. SU-E-T-37: A GPU-Based Pencil Beam Algorithm for Dose Calculations in Proton Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kalantzis, G; Leventouri, T; Tachibana, H

    Purpose: Recent developments in radiation therapy have been focused on applications of charged particles, especially protons. Over the years several dose calculation methods have been proposed in proton therapy. A common characteristic of all these methods is their extensive computational burden. In the current study we present, for the first time to the best of our knowledge, a GPU-based PBA for proton dose calculations in Matlab. Methods: We employed an analytical expression for the proton depth-dose distribution. The central-axis term is taken from the broad-beam central-axis depth dose in water modified by an inverse square correction, while the distribution of the off-axis term was considered Gaussian. The serial code was implemented in MATLAB and was launched on a desktop with a quad-core Intel Xeon X5550 at 2.67 GHz with 8 GB of RAM. For the parallelization on the GPU, the parallel computing toolbox was employed and the code was launched on a GTX 770 with Kepler architecture. The performance comparison was based on the speedup factors. Results: The performance of the GPU code was evaluated for three different energies: low (50 MeV), medium (100 MeV) and high (150 MeV). Four square fields were selected for each energy, and the dose calculations were performed with both the serial and parallel codes for a homogeneous water phantom with size 300×300×300 mm³. The resolution of the PBs was set to 1.0 mm. The maximum speedup of ∼127 was achieved for the highest energy and the largest field size. Conclusion: A GPU-based PB algorithm for proton dose calculations in Matlab was presented. A maximum speedup of ∼127 was achieved. Future directions of the current work include extension of our method for dose calculation in heterogeneous phantoms.
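
    The analytical pencil-beam model described, a central-axis depth dose times an inverse-square correction times a Gaussian off-axis term, can be written compactly. A C sketch (ours, in C rather than the authors' Matlab; DD, sigma, and ssd are placeholders for measured beam data):

```c
#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Dose at (x, y, z) for one pencil beam: broad-beam central-axis depth
   dose DD(z), an inverse-square correction for the effective source
   distance ssd, and a normalized Gaussian off-axis term of width
   sigma(z). All of DD, sigma, and ssd stand in for measured data. */
double pb_dose(double x, double y, double z, double ssd,
               double (*DD)(double),        /* central-axis depth dose */
               double (*sigma)(double))     /* lateral Gaussian width  */
{
    double inv_sq = (ssd * ssd) / ((ssd + z) * (ssd + z));
    double s = sigma(z);
    double r2 = x * x + y * y;
    double off_axis = exp(-r2 / (2.0 * s * s)) / (2.0 * M_PI * s * s);
    return DD(z) * inv_sq * off_axis;
}
```

    A broad field is then the sum of such pencil beams over the field area, which is the embarrassingly parallel structure the GPU exploits.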

  18. Theorem Proving in Intel Hardware Design

    NASA Technical Reports Server (NTRS)

    O'Leary, John

    2009-01-01

    For the past decade, a framework combining model checking (symbolic trajectory evaluation) and higher-order logic theorem proving has been in production use at Intel. Our tools and methodology have been used to formally verify execution cluster functionality (including floating-point operations) for a number of Intel products, including the Pentium(R) 4 and Core(TM) i7 processors. Hardware verification in 2009 is much more challenging than it was in 1999 - today's CPU chip designs contain many processor cores and significant firmware content. This talk will attempt to distill the lessons learned over the past ten years, discuss how they apply to today's problems, and outline some future directions.

  19. Measurement of the decays B--> phiK and B--> phiK*.

    PubMed

    Aubert, B; Boutigny, D; Gaillard, J M; Hicheur, A; Karyotakis, Y; Lees, J P; Robbe, P; Tisserand, V; Palano, A; Chen, G P; Chen, J C; Qi, N D; Rong, G; Wang, P; Zhu, Y S; Eigen, G; Reinertsen, P L; Stugu, B; Abbott, B; Abrams, G S; Borgland, A W; Breon, A B; Brown, D N; Button-Shafer, J; Cahn, R N; Clark, A R; Fan, Q; Gill, M S; Gowdy, S J; Gritsan, A; Groysman, Y; Jacobsen, R G; Kadel, R W; Kadyk, J; Kerth, L T; Kluth, S; Kolomensky, Y G; Kral, J F; LeClerc, C; Levi, M E; Liu, T; Lynch, G; Meyer, A B; Momayezi, M; Oddone, P J; Perazzo, A; Pripstein, M; Roe, N A; Romosan, A; Ronan, M T; Shelkov, V G; Telnov, A V; Wenzel, W A; Bright-Thomas, P G; Harrison, T J; Hawkes, C M; Kirk, A; Knowles, D J; O'Neale, S W; Penny, R C; Watson, A T; Watson, N K; Deppermann, T; Koch, H; Krug, J; Kunze, M; Lewandowski, B; Peters, K; Schmuecker, H; Steinke, M; Andress, J C; Barlow, N R; Bhimji, W; Chevalier, N; Clark, P J; Cottingham, W N; De Groot, N; Dyce, N; Foster, B; Mass, A; McFall, J D; Wallom, D; Wilson, F F; Abe, K; Hearty, C; Mattison, T S; McKenna, J A; Thiessen, D; Camanzi, B; Jolly, S; McKemey, A K; Tinslay, J; Blinov, V E; Bukin, A D; Bukin, D A; Buzykaev, A R; Dubrovin, M S; Golubev, V B; Ivanchenko, V N; Korol, A A; Kravchenko, E A; Onuchin, A P; Salnikov, A A; Serednyakov, S I; Skovpen, Y I; Telnov, V I; Yushkov, A N; Lankford, A J; Mandelkern, M; McMahon, S; Stoker, D P; Ahsan, A; Arisaka, K; Buchanan, C; Chun, S; Branson, J G; MacFarlane, D B; Prell, S; Rahatlou, S; Raven, G; Sharma, V; Campagnari, C; Dahmes, B; Hart, P A; Kuznetsova, N; Levy, S L; Long, O; Lu, A; Richman, J D; Verkerke, W; Witherell, M; Yellin, S; Beringer, J; Dorfan, D E; Eisner, A M; Frey, A; Grillo, A A; Grothe, M; Heusch, C A; Johnson, R P; Kroeger, W; Lockman, W S; Pulliam, T; Sadrozinski, H; Schalk, T; Schmitz, R E; Schumm, B A; Seiden, A; Turri, M; Walkowiak, W; Williams, D C; Wilson, M G; Chen, E; Dubois-Felsmann, G P; Dvoretskii, A; Hitlin, D G; Metzler, S; Oyang, J; Porter, F C; Ryd, A; Samuel, A; Weaver, M; Yang, S; Zhu, R Y; Devmal, S; Geld, T L; Jayatilleke, S; Mancinelli, G; Meadows, B T; Sokoloff, M D; Bloom, P; Fahey, S; Ford, W T; Gaede, F; Johnson, D R; Michael, A K; Nauenberg, U; Olivas, A; Park, H; Rankin, P; Roy, J; Sen, S; Smith, J G; van Hoek, W C; Wagner, D L; Blouw, J; Harton, J L; Krishnamurthy, M; Soffer, A; Toki, W H; Wilson, R J; Zhang, J; Brandt, T; Brose, J; Colberg, T; Dahlinger, G; Dickopp, M; Dubitzky, R S; Maly, E; Müller-Pfefferkorn, R; Otto, S; Schubert, K R; Schwierz, R; Spaan, B; Wilden, L; Behr, L; Bernard, D; Bonneaud, G R; Brochard, F; Cohen-Tanugi, J; Ferrag, S; Roussot, E; T'Jampens, S; Thiebaux, C; Vasileiadis, G; Verderi, M; Anjomshoaa, A; Bernet, R; Di Lodovico, F; Khan, A; Muheim, F; Playfer, S; Swain, J E; Falbo, M; Bozzi, C; Dittongo, S; Folegani, M; Piemontese, L; Treadwell, E; Anulli, F; Baldini-Ferroli, R; Calcaterra, A; de Sangro, R; Falciai, D; Finocchiaro, G; Patteri, P; Peruzzi, I M; Piccolo, M; Xie, Y; Zallo, A; Bagnasco, S; Buzzo, A; Contri, R; Crosetti, G; Fabbricatore, P; Farinon, S; Lo Vetere, M; Macri, M; Monge, M R; Musenich, R; Pallavicini, M; Parodi, R; Passaggio, S; Pastore, F C; Patrignani, C; Pia, M G; Priano, C; Robutti, E; Santroni, A; Morii, M; Bartoldus, R; Dignan, T; Hamilton, R; Mallik, U; Cochran, J; Crawley, H B; Fischer, P A; Lamsa, J; Meyer, W T; Rosenberg, E I; Benkebil, M; Grosdidier, G; Hast, C; Höcker, A; Lacker, H M; LePeltier, V; Lutz, A M; Plaszczynski, S; Schune, M H; Trincaz-Duvoid, S; Valassi, A; Wormser, G; Bionta, R M; 
Brigljevic, V; Fackler, O; Fujino, D; Lange, D J; Mugge, M; Shi, X; van Bibber, K; Wenaus, T J; Wright, D M; Wuest, C R; Carroll, M; Fry, J R; Gabathuler, E; Gamet, R; George, M; Kay, M; Payne, D J; Sloane, R J; Touramanis, C; Aspinwall, M L; Bowerman, D A; Dauncey, P D; Egede, U; Eschrich, I; Gunawardane, N J; Martin, R; Nash, J A; Sanders, P; Smith, D; Azzopardi, D E; Back, J J; Dixon, P; Harrison, P F; Potter, R J; Shorthouse, H W; Strother, P; Vidal, P B; Williams, M I; Cowan, G; George, S; Green, M G; Kurup, A; Marker, C E; McGrath, P; McMahon, T R; Ricciardi, S; Salvatore, F; Scott, I; Vaitsas, G; Brown, D; Davis, C L; Allison, J; Barlow, R J; Boyd, J T; Forti, A; Fullwood, J; Jackson, F; Lafferty, G D; Savvas, N; Simopoulos, E T; Weatherall, J H; Farbin, A; Jawahery, A; Lillard, V; Olsen, J; Roberts, D A; Schieck, J R; Blaylock, G; Dallapiccola, C; Flood, K T; Hertzbach, S S; Kofler, R; Lin, C S; Moore, T B; Staengle, H; Willocq, S; Wittlin, J; Brau, B; Cowan, R; Sciolla, G; Taylor, F; Yamamoto, R K; Britton, D I; Milek, M; Patel, P M; Trischuk, J; Lanni, F; Palombo, F; Bauer, J M; Booke, M; Cremaldi, L; Eschenburg, V; Kroeger, R; Reidy, J; Sanders, D A; Summers, D J; Martin, J P; Nief, J Y; Seitz, R; Taras, P; Zacek, V; Nicholson, H; Sutton, C S; Cartaro, C; Cavallo, N; De Nardo, G; Fabozzi, F; Gatto, C; Lista, L; Paolucci, P; Piccolo, D; Sciacca, C; LoSecco, J M; Alsmiller, J R; Gabriel, T A; Handler, T; Brau, J; Frey, R; Iwasaki, M; Sinev, N B; Strom, D; Colecchia, F; Dal Corso, F; Dorigo, A; Galeazzi, F; Margoni, M; Michelon, G; Morandin, M; Posocco, M; Rotondo, M; Simonetto, F; Stroili, R; Torassa, E; Voci, C; Benayoun, M; Briand, H; Chauveau, J; David, P; De La Vaissière, C; Del Buono, L; Hamon, O; Le Diberder, F; Leruste, P; Lory, J; Roos, L; Stark, J; Versillé, S; Manfredi, P F; Re, V; Speziali, V; Frank, E D; Gladney, L; Guo, Q H; Panetta, J H; Angelini, C; Batignani, G; Bettarini, S; Bondioli, M; Carpinelli, M; Forti, F; Giorgi, M A; Lusiani, A; Martinez-Vidal, F; Morganti, M; Neri, N; Paoloni, E; Rama, M; Rizzo, G; Sandrelli, F; Simi, G; Triggiani, G; Walsh, J; Haire, M; Judd, D; Paick, K; Turnbull, L; Wagoner, D E; Albert, J; Bula, C; Lu, C; McDonald, K T; Miftakov, V; Schaffner, S F; Smith, A J; Tumanov, A; Varnes, E W; Cavoto, G; del Re, D; Faccini, R; Ferrarotto, F; Ferroni, F; Fratini, K; Lamanna, E; Leonardi, E; Mazzoni, M A; Morganti, S; Piredda, G; Safai Tehrani, F; Serra, M; Voena, C; Christ, S; Waldi, R; Adye, T; Franek, B; Geddes, N I; Gopal, G P; Xella, S M; Aleksan, R; De Domenico, G; Emery, S; Gaidot, A; Ganzhur, S F; Giraud, P F; Hamel De Monchenault, G; Kozanecki, W; Langer, M; London, G W; Mayer, B; Serfass, B; Vasseur, G; Yeche, C; Zito, M; Copty, N; Purohit, M V; Singh, H; Yumiceva, F X; Adam, I; Anthony, P L; Aston, D; Baird, K; Bartelt, J; Bloom, E; Boyarski, A M; Bulos, F; Calderini, G; Claus, R; Convery, M R; Coupal, D P; Coward, D H; Dorfan, J; Doser, M; Dunwoodie, W; Field, R C; Glanzman, T; Godfrey, G L; Grosso, P; Himel, T; Huffer, M E; Innes, W R; Jessop, C P; Kelsey, M H; Kim, P; Kocian, M L; Langenegger, U; Leith, D W; Luitz, S; Luth, V; Lynch, H L; Manzin, G; Marsiske, H; Menke, S; Messner, R; Moffeit, K C; Mount, R; Muller, D R; O'Grady, C P; Petrak, S; Quinn, H; Ratcliff, B N; Robertson, S H; Rochester, L S; Roodman, A; Schietinger, T; Schindler, R H; Schwiening, J; Serbo, V V; Snyder, A; Soha, A; Spanier, S M; Stahl, A; Stelzer, J; Su, D; Sullivan, M K; Talby, M; Tanaka, H A; Trunov, A; Va'vra, J; Wagner, S R; Weinstein, A J; Wisniewski, 
W J; Young, C C; Burchat, P R; Cheng, C H; Kirkby, D; Meyer, T I; Roat, C; De Silva, A; Henderson, R; Bugg, W; Cohn, H; Hart, E; Weidemann, A W; Benninger, T; Izen, J M; Kitayama, I; Lou, X C; Turcotte, M; Bianchi, F; Bona, M; Di Girolamo, B; Gamba, D; Smol, A; Zanin, D; Bosisio, L; Della Ricca, G; Lanceri, L; Pompili, A; Poropat, P; Prest, M; Vallazza, E; Vuagnin, G; Panvini, R S; Brown, C M; Kowalewski, R; Roney, J M; Band, H R; Charles, E; Dasu, S; Elmer, P; Hu, H; Johnson, J R; Liu, R; Nielsen, J; Orejudos, W; Pan, Y; Prepost, R; Scott, I J; Sekula, S J; von Wimmersperg-Toeller, J H; Wu, S L; Yu, Z; Zobering, H; Kordich, T M; Neal, H

    2001-10-08

    We have observed the decays B--> phiK and phiK(*) in a sample of over 45 million B mesons collected with the BABAR detector at the PEP-II collider. The measured branching fractions are B(B+--> phiK+) = (7.7(+1.6)(-1.4)+/-0.8)x10(-6), B(B0--> phiK0) = (8.1(+3.1)(-2.5)+/-0.8)x10(-6), B(B+--> phiK(*+)) = (9.7(+4.2)(-3.4)+/-1.7)x10(-6), and B(B0--> phiK(*0)) = (8.7(+2.5)(-2.1)+/-1.1)x10(-6). We also report the upper limit B(B+--> phipi(+))<1.4x10(-6) (90% C.L.).

  20. Effect of dietary phosphorus, phytase, and 25-hydroxycholecalciferol on broiler chicken bone mineralization, litter phosphorus, and processing yields.

    PubMed

    Angel, R; Saylor, W W; Mitchell, A D; Powers, W; Applegate, T J

    2006-07-01

    Three floor pen experiments (Exp) were conducted to evaluate low nonphytin P (NPP) concentrations and the NPP-sparing effect of phytase (PHY) and 25-hydroxycholecalciferol (25D) on bone mineralization, bone breaking during commercial processing, litter P, and water-soluble P (WSP) concentrations. Tested treatments (TRT) were: control, National Research Council NPP; University of Maryland (UMD) NPP; UMD + PHY, UMD NPP reduced by 0.064% NPP + 600 U of PHY/kg; UMD + PHY + 25D, UMD NPP reduced by 0.090% NPP + 600 U of PHY and 70 microg of 25D/kg; control + PHY, which mimicked the industry practice of reducing dietary NPP by 0.1% when PHY is added; and a negative control with 90% of UMD NPP concentrations. UMD + PHY and control + PHY diets contained 600 U of PHY/kg, and UMD + PHY + 25D contained 600 U of PHY + 70 microg of 25D/kg. Performance results were presented separately. After each Exp, litter P and WSP were determined, and bone measurements were obtained on 8 or 10 broilers per pen. Tested TRT did not affect broiler BW. Femur ash weight of broilers fed the UMD and UMD + PHY + 25D diets was lower in all Exp compared with that of broilers fed the control diet. Femur ash was similar for control and UMD + PHY broilers; yet, averaged over all Exp, UMD + PHY broilers consumed 39% less NPP and required less NPP per gram of femur ash than those on the control (4.87 vs. 7.77 g of NPP/g of ash, Exp 3). At the end of Exp 3, broilers were processed in a commercial facility. Despite reductions in NPP intake and bone mineralization, no differences were observed in measurements of economic importance (parts lost, carcass yield, and incidence of broken bones). P excretion per bird was lowest for birds fed the UMD + PHY + 25D diet, followed by those fed the UMD + PHY and negative control diets (10.44, 12.00, and 13.78 g of P/bird, respectively), and was highest for those fed the control diet (19.55 g of P/bird). These results suggest that feeding diets low in P together with PHY and 25D will not affect performance or increase losses at processing, while resulting in improved P retention and reductions in P and WSP excreted.

  1. 75 FR 48338 - Intel Corporation; Analysis of Proposed Consent Order to Aid Public Comment

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-10

    ... integrated into chipsets as well as discrete graphics cards. NVIDIA has been at the forefront of developing... to connect peripheral products such as discrete GPUs to the CPU. A bus is a connection point between... platform. Intel's commitment to maintain an open PCIe bus will provide discrete graphics manufacturers...

  2. Guide to Evaluating the Essentials Training

    ERIC Educational Resources Information Center

    Education Development Center, Inc, 2006

    2006-01-01

    Countries that begin implementing the Intel[R] Teach to the Future Essentials course after March of 2006 are required to collect data using the Intel[R] Teach Essentials End of Training Survey to help support program improvement. This End of Training evaluation toolkit provides guidelines on: (1) End of Training Survey administration; (2) The…

  3. Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path. Our evaluation consists of a cross-section of convolutional neural net workloads: CifarNet, CaffeNet, AlexNet and GoogleNet topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall raw performance, the gap closes for some convolutional networks, and KNL can be competitive when considering performance/watt. Furthermore, NVLink is critical to GPU scaling.

  4. Utilising the Intel RealSense Camera for Measuring Health Outcomes in Clinical Research.

    PubMed

    Siena, Francesco Luke; Byrom, Bill; Watts, Paul; Breedon, Philip

    2018-02-05

    Applications utilising 3D camera technologies for the measurement of health outcomes in the health and wellness sector continue to expand. The Intel® RealSense™ is one of the leading 3D depth-sensing cameras currently available on the market and lends itself to use in many applications, including robotics, automation, and medical systems. One of the most prominent areas is the production of interactive solutions for rehabilitation, which includes gait analysis and facial tracking. Advancements in depth camera technology have resulted in a noticeable increase in the integration of these technologies into portable platforms, suggesting significant future potential for pervasive in-clinic and field-based health assessment solutions. This paper reviews the technical capabilities of the Intel RealSense technology, discusses its application to clinical research, and includes examples where the Intel RealSense camera range has been used for the measurement of health outcomes. This review supports the use of the technology to develop robust, objective movement- and mobility-based endpoints to enable accurate tracking of the effects of treatment interventions in clinical trials.

  5. Observation of {chi}{sub c1} Decays into Vector Meson Pairs {phi}{phi}, {omega}{omega}, and {omega}{phi}

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ablikim, M.; An, Z. H.; Bai, J. Z.

    Using (106{+-}4)x10{sup 6} {psi}(3686) events accumulated with the BESIII detector at the BEPCII e{sup +}e{sup -} collider, we present the first measurement of decays of {chi}{sub c1} to vector meson pairs {phi}{phi}, {omega}{omega}, and {omega}{phi}. The branching fractions are measured to be (4.4{+-}0.3{+-}0.5)x10{sup -4}, (6.0{+-}0.3{+-}0.7)x10{sup -4}, and (2.2{+-}0.6{+-}0.2)x10{sup -5}, for {chi}{sub c1}{yields}{phi}{phi}, {omega}{omega}, and {omega}{phi}, respectively, which indicates that the hadron helicity selection rule is significantly violated in {chi}{sub cJ} decays. In addition, the measurement of {chi}{sub cJ}{yields}{omega}{phi} provides the first indication of the rate of doubly OZI-suppressed {chi}{sub cJ} decay. Finally, we present improved measurements for the branching fractions of {chi}{sub c0} and {chi}{sub c2} to vector meson pairs.

  6. Application of high-performance computing to numerical simulation of human movement

    NASA Technical Reports Server (NTRS)

    Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.

    1995-01-01

    We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
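
    The derivative evaluations that suited the Intel iPSC/860 are independent model runs, one per perturbed control, which is why they distribute naturally over MIMD nodes. A schematic C++/OpenMP sketch of that pattern (not the original code; the objective J, control vector u, and forward differencing are illustrative assumptions):

        #include <functional>
        #include <vector>

        // Forward-difference gradient of J; each component is an independent
        // simulation, so the loop parallelizes with no communication.
        std::vector<double> fd_gradient(
                const std::function<double(const std::vector<double>&)>& J,
                std::vector<double> u, double h = 1e-6) {
            const double J0 = J(u);                        // baseline evaluation
            std::vector<double> g(u.size());
            #pragma omp parallel for firstprivate(u)       // each thread perturbs its own copy
            for (int i = 0; i < (int)u.size(); ++i) {
                u[i] += h;
                g[i] = (J(u) - J0) / h;                    // one model run per control
                u[i] -= h;
            }
            return g;
        }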

  7. Bridging FPGA and GPU technologies for AO real-time control

    NASA Astrophysics Data System (ADS)

    Perret, Denis; Lainé, Maxime; Bernard, Julien; Gratadour, Damien; Sevin, Arnaud

    2016-07-01

    Our team has developed a common environment for high-performance simulations and real-time control of AO systems based on the use of Graphics Processing Units in the context of the COMPASS project. Such a solution, based on the ability of the real-time core in the simulation to provide adequate computing performance, limits the cost of developing AO RTC systems and makes them more scalable. A code developed and validated in the context of the simulation may be injected directly into the system and tested on sky. Furthermore, the use of relatively low-cost components also offers significant advantages for the system hardware platform. However, the use of GPUs in an AO loop comes with drawbacks: the traditional way of offloading computation from CPU to GPUs - involving multiple copies and unacceptable overhead in kernel launching - is not well suited to a real-time context. This application requires a solution enabling direct memory access (DMA) to the GPU memory from a third-party device, bypassing the operating system; this allows the device to communicate directly with the real-time core of the simulation, feeding it the WFS camera pixel stream. We show that DMA between a custom FPGA-based frame grabber and a computation unit (GPU, FPGA, or coprocessor such as the Xeon Phi) across PCIe allows us to reach latencies compatible with what will be needed on ELTs. As a fine-grained synchronization mechanism is not yet made available by GPU vendors, we propose the use of memory polling to avoid interrupt handling and the involvement of a CPU. Network and vision protocols are handled by the FPGA-based Network Interface Card (NIC). We present the results we obtained on a complete AO loop using camera and deformable mirror simulators.
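
    The memory-polling idea, spinning on a flag that the FPGA's DMA engine writes last instead of waiting for an interrupt, can be sketched as follows. This is a minimal C++ illustration, not COMPASS code: the slot layout, frame size, and flag protocol are invented for the example.

        #include <atomic>
        #include <cstdint>

        struct FrameSlot {
            std::atomic<uint32_t> ready;    // written last by the DMA engine
            uint16_t pixels[240 * 240];     // WFS frame payload (size assumed)
        };

        // Busy-wait on the flag, then hand the frame to the real-time core.
        const uint16_t* wait_for_frame(FrameSlot* slot) {
            while (slot->ready.load(std::memory_order_acquire) == 0) {
                // spin: no interrupt latency, no kernel involvement
            }
            slot->ready.store(0, std::memory_order_release);  // re-arm for the next frame
            return slot->pixels;
        }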

  8. AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams.

    PubMed

    Chen, Qiuwen; Luley, Ryan; Wu, Qing; Bishop, Morgan; Linderman, Richard W; Qiu, Qinru

    2018-05-01

    The evolution of high-performance computing technologies has enabled the large-scale implementation of neuromorphic models and pushed research in computational intelligence into a new era. Among machine learning applications, unsupervised detection of anomalous streams is especially challenging due to the requirements of detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of multicore systems while maintaining high sensitivity and specificity to anomalies is an urgent research topic. In this paper, we propose anomaly recognition and detection (AnRAD), a bio-inspired detection framework that performs probabilistic inferences. We analyze the feature dependency and develop a self-structuring method that learns an efficient confabulation network using unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base using streaming data. Compared with several existing anomaly detection approaches, our method provides competitive detection quality. Furthermore, we exploit the massively parallel structure of the AnRAD framework. Our implementations of the detection algorithm on the graphics processing unit and the Xeon Phi coprocessor both obtain substantial speedups over the sequential implementation on a general-purpose microprocessor. The framework provides real-time service to concurrent data streams within diversified knowledge contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle behavior detection, the framework is able to monitor up to 16000 vehicles (data streams) and their interactions in real time with a single commodity coprocessor, and uses less than 0.2 ms per testing subject. Finally, the detection network is ported to our spiking neural network simulator to show the potential of adapting to emerging neuromorphic architectures.

  9. Phosphite, an analog of phosphate, suppresses the coordinated expression of genes under phosphate starvation.

    PubMed

    Varadarajan, Deepa K; Karthikeyan, Athikkattuvalasu S; Matilda, Paino Durzo; Raghothama, Kashchandra G

    2002-07-01

    Phosphate (Pi) and its analog phosphite (Phi) are acquired by plants via Pi transporters. Although the uptake and mobility of Phi and Pi are similar, there is no evidence suggesting that plants can utilize Phi as a sole source of phosphorus. Phi is also known to interfere with many of the Pi starvation responses in plants and yeast (Saccharomyces cerevisiae). In this study, effects of Phi on plant growth and coordinated expression of genes induced by Pi starvation were analyzed. Phi suppressed many of the Pi starvation responses that are commonly observed in plants. Enhanced root growth and root to shoot ratio, a hallmark of Pi stress response, was strongly inhibited by Phi. The negative effects of Phi were not obvious in plants supplemented with Pi. The expression of Pi starvation-induced genes such as LePT1, LePT2, AtPT1, and AtPT2 (high-affinity Pi transporters); LePS2 (a novel acid phosphatase); LePS3 and TPSI1 (novel genes); and PAP1 (purple acid phosphatase) was suppressed by Phi in plants and cell cultures. Expression of luciferase reporter gene driven by the Pi starvation-induced AtPT2 promoter was also suppressed by Phi. These analyses showed that suppression of Pi starvation-induced genes is an early response to addition of Phi. These data also provide evidence that Phi interferes with gene expression at the level of transcription. Synchronized suppression of multiple Pi starvation-induced genes by Phi points to its action on the early molecular events, probably signal transduction, in Pi starvation response.

  10. USE OF THE PROSTATE HEALTH INDEX FOR DETECTION OF PROSTATE CANCER: RESULTS FROM A LARGE ACADEMIC PRACTICE

    PubMed Central

    Tosoian, Jeffrey J.; Druskin, Sasha C.; Andreas, Darian; Mullane, Patrick; Chappidi, Meera; Joo, Sarah; Ghabili, Kamyar; Agostino, Joseph; Macura, Katarzyna J.; Carter, H. Ballentine; Schaeffer, Edward M.; Partin, Alan W.; Sokoll, Lori J.; Ross, Ashley E.

    2016-01-01

    BACKGROUND The Prostate Health Index (phi) outperforms PSA and other PSA derivatives for the diagnosis of prostate cancer (PCa). The impact of phi testing in the real-world clinical setting has not been previously assessed. METHODS In a single, large, academic center, phi was tested in 345 patients presenting for diagnostic evaluation for PCa. Findings on prostate biopsy (including Grade Group [GG], defined as GG1: Gleason score [GS] 6, GG2: GS 3+4=7, GG3: GS 4+3=7, GG4: GS 8, and GG5: GS 9-10), magnetic resonance imaging (MRI), and radical prostatectomy (RP) were prospectively recorded. Biopsy rates and outcomes were compared to a contemporary cohort that did not undergo phi testing (n=1318). RESULTS Overall, 39% of men with phi testing underwent prostate biopsy. No men with phi<19.6 were diagnosed with PCa, and only 3 men with phi<27 had cancer of GG≥2. Phi was superior to PSA for the prediction of any PCa (AUC 0.72 vs. 0.47) and GG≥2 PCa (AUC 0.77 vs. 0.53) on prostate biopsy. Among men undergoing MRI and phi, no men with phi<27 and PI-RADS≤3 had GG≥2 cancer. For those men proceeding to RP, increasing phi was associated with higher pathologic GG (p=0.002) and stage (p=0.001). Compared to patients who did not undergo phi testing, the use of phi was associated with a 9% reduction in the rate of prostate biopsy (39% vs. 48%; p<0.001). Importantly, the reduction in biopsy among the phi population was secondary to decreased incidence of negative (8%) and GG1 (1%) biopsies, while the proportion of biopsies detecting GG≥2 cancers remained unchanged. CONCLUSIONS In this large, real-time clinical experience, phi outperformed PSA alone, was associated with high-grade PCa, and provided complementary information to MRI. Incorporation of phi into clinical practice reduced the rate of unnecessary biopsies without changing the frequency of detection of higher grade cancers. PMID:28117387
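
    For reference, the abstract above does not define the index itself; in the phi literature it is computed from three serum measurements as

        \mathrm{phi} = \frac{[-2]\mathrm{proPSA}}{\mathrm{fPSA}} \times \sqrt{\mathrm{tPSA}}

    so a rising proportion of [-2]proPSA raises the score faster than tPSA alone would.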

  11. Phytochromes play a role in phototropism and gravitropism in Arabidopsis roots

    NASA Astrophysics Data System (ADS)

    Correll, Melanie J.; Coveney, Katrina M.; Raines, Steven V.; Mullen, Jack L.; Hangarter, Roger P.; Kiss, John Z.

    2003-05-01

    Phototropism as well as gravitropism plays a role in the oriented growth of roots in flowering plants. In blue or white light, roots exhibit negative phototropism, but red light induces positive phototropism in Arabidopsis roots. Phytochrome A (phyA) and phyB mediate the positive red-light-based photoresponse in roots since single mutants (and the double phyAB mutant) were severely impaired in this response. In blue-light-based negative phototropism, phyA and phyAB (but not phyB) were inhibited in the response relative to the WT. In root gravitropism, phyB and phyAB (but not phyA) were inhibited in the response compared to the WT. The differences observed in tropistic responses were not due to growth limitations since the growth rates among all the mutants tested were not significantly different from that of the WT. Thus, our study shows that the blue-light and red-light systems interact in roots and that phytochrome plays a key role in plant development by integrating multiple environmental stimuli.

  12. Multiple phytochromes are involved in red-light-induced enhancement of first-positive phototropism in arabidopsis thaliana

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Janoudi, A.K.; Gordon, W.R.; Poff, K.L.

    1997-03-01

    The amplitude of phototropic curvature to blue light is enhanced by a prior exposure of seedlings to red light. This enhancement is mediated by phytochrome. Fluence-response relationships have been constructed for red-light-induced enhancement in the phytochrome A (phyA) null mutant, the phytochrome B (phyB)-deficient mutant, and in two transgenic lines of Arabidopsis thaliana that overexpress either phyA or phyB. These fluence-response relationships demonstrate the existence of two responses in enhancement: a response in the very-low-to-low-fluence range, and a response in the high-fluence range. Only the response in the high-fluence range is present in the phyA null mutant. In contrast, the phyB-deficient mutant is indistinguishable from the wild-type parent in red-light responsiveness. These data indicate that phyA is necessary for the very-low-to-low but not the high-fluence response, and that phyB is not necessary for either response range. Based on these results, the high-fluence response, if controlled by a single phytochrome, must be controlled by a phytochrome other than phyA or phyB. Overexpression of phyA has a negative effect and overexpression of phyB has an enhancing effect in the high-fluence range. These results suggest that overexpression of either phytochrome perturbs the function of the endogenous photoreceptor system in an unpredictable fashion. 25 refs., 3 figs.

  13. SUMOylation of phytochrome-B negatively regulates light-induced signaling in Arabidopsis thaliana

    PubMed Central

    Sadanandom, Ari; Ádám, Éva; Orosa, Beatriz; Viczián, András; Klose, Cornelia; Zhang, Cunjin; Josse, Eve-Marie; Kozma-Bognár, László; Nagy, Ferenc

    2015-01-01

    The red/far red light absorbing photoreceptor phytochrome-B (phyB) cycles between the biologically inactive (Pr, λmax, 660 nm) and active (Pfr; λmax, 730 nm) forms and functions as a light quality and quantity controlled switch to regulate photomorphogenesis in Arabidopsis. At the molecular level, phyB interacts in a conformation-dependent fashion with a battery of downstream regulatory proteins, including PHYTOCHROME INTERACTING FACTOR transcription factors, and by modulating their activity/abundance, it alters expression patterns of genes underlying photomorphogenesis. Here we report that the small ubiquitin-like modifier (SUMO) is conjugated (SUMOylation) to the C terminus of phyB; the accumulation of SUMOylated phyB is enhanced by red light and displays a diurnal pattern in plants grown under light/dark cycles. Our data demonstrate that (i) transgenic plants expressing the mutant phyBLys996Arg-YFP photoreceptor are hypersensitive to red light, (ii) light-induced SUMOylation of the mutant phyB is drastically decreased compared with phyB-YFP, and (iii) SUMOylation of phyB inhibits binding of PHYTOCHROME INTERACTING FACTOR 5 to phyB Pfr. In addition, we show that OVERLY TOLERANT TO SALT 1 (OTS1) de-SUMOylates phyB in vitro, it interacts with phyB in vivo, and the ots1/ots2 mutant is hyposensitive to red light. Taken together, we conclude that SUMOylation of phyB negatively regulates light signaling and it is mediated, at least partly, by the action of OTS SUMO proteases. PMID:26283376

  14. Game-Based Experiential Learning in Online Management Information Systems Classes Using Intel's IT Manager 3

    ERIC Educational Resources Information Center

    Bliemel, Michael; Ali-Hassan, Hossam

    2014-01-01

    For several years, we used Intel's flash-based game "IT Manager 3: Unseen Forces" as an experiential learning tool, where students had to act as a manager making real-time prioritization decisions about repairing computer problems, training and upgrading systems with better technologies as well as managing increasing numbers of technical…

  15. Newsgroups, Activist Publics, and Corporate Apologia: The Case of Intel and Its Pentium Chip.

    ERIC Educational Resources Information Center

    Hearit, Keith Michael

    1999-01-01

    Applies J. Grunig's theory of publics to the phenomenon of Internet newsgroups using the case of the flawed Intel Pentium chip. Argues that technology facilitates the rapid movement of publics from the theoretical construct stage to the active stage. Illustrates some of the difficulties companies face in establishing their identity in cyberspace.…

  16. Mask manufacturing improvement through capability definition and bottleneck line management

    NASA Astrophysics Data System (ADS)

    Strott, Al

    1994-02-01

    In 1989, Intel's internal mask operation limited itself to research and development activities, plus re-inspection and pellicle application of externally manufactured masks. Recognizing the rising capital cost of mask manufacturing at the leading edge, Intel's Mask Operation management decided to offset some of these costs by manufacturing more masks internally. This was the beginning of the challenge they set: to manufacture at least 50% of Intel's mask volume internally, at world-class performance levels. The first step in responding to this challenge was the completion of a comprehensive operation capability analysis. A series of bottleneck improvements by focus teams then resulted in an average cycle time improvement to less than five days on all products and less than two days on critical products.

  17. Multisite light-induced phosphorylation of the transcription factor PIF3 is necessary for both its rapid degradation and concomitant negative feedback modulation of photoreceptor phyB levels in Arabidopsis

    USDA-ARS?s Scientific Manuscript database

    Plants constantly monitor informational light signals using sensory photoreceptors, which include the phytochrome (phy) family (phyA to phyE), and adjust their growth and development accordingly. Following light-induced nuclear translocation, photoactivated phy molecules bind to and induce rapid pho...

  18. [Genetic study of bacteriophage phi81. I. Isolation, study of complementation and preliminary mapping of amber-mutants of bacteriophage phi81].

    PubMed

    Sineokiĭ, S P; Pogosov, V Z; Iankovskiĭ, N K; Krylov, V N

    1976-01-01

    123 amber mutants of the lambdoid bacteriophage phi81 were isolated and distributed into 19 complementation groups. Deletion mapping made it possible to locate 5 gene groups on the genetic map of bacteriophage phi81 and to determine a region of possible location of the mm' sticky ends on the prophage genetic map. A gene of phage phi81 that controls adsorption specificity was localized, and its functional similarity to the respective gene of phage phi80 was demonstrated.

  19. Evaluation of PHI Hunter in Natural Language Processing Research.

    PubMed

    Redd, Andrew; Pickard, Steve; Meystre, Stephane; Scehnet, Jeffrey; Bolton, Dan; Heavirland, Julia; Weaver, Allison Lynn; Hope, Carol; Garvin, Jennifer Hornung

    2015-01-01

    We introduce and evaluate a new, easily accessible tool using a common statistical analysis and business analytics software suite, SAS, which can be programmed to remove specific protected health information (PHI) from a text document. Removal of PHI is important because the quantity of text documents used for research with natural language processing (NLP) is increasing. When using existing data for research, an investigator must remove all PHI not needed for the research to comply with human subjects' right to privacy. This process is similar, but not identical, to de-identification of a given set of documents. PHI Hunter removes PHI from free-form text. It is a set of rules to identify and remove patterns in text. PHI Hunter was applied to 473 Department of Veterans Affairs (VA) text documents randomly drawn from a research corpus stored as unstructured text in VA files. PHI Hunter performed well with PHI in the form of identification numbers such as Social Security numbers, phone numbers, and medical record numbers. The most commonly missed PHI items were names and locations. Incorrect removal of information occurred with text that looked like identification numbers. PHI Hunter fills a niche role that is related to but not equal to the role of de-identification tools. It gives research staff a tool to reasonably increase patient privacy. It performs well for highly sensitive PHI categories that are rarely used in research, but still shows possible areas for improvement. More development for patterns of text and linked demographic tables from electronic health records (EHRs) would improve the program so that more precise identifiable information can be removed. PHI Hunter is an accessible tool that can flexibly remove PHI not needed for research. If it can be tailored to the specific data set via linked demographic tables, its performance will improve in each new document set.
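
    PHI Hunter itself is a SAS rule set; purely to illustrate the pattern-removal idea it describes, here is a C++ sketch with two toy rules (these regexes are examples, not the tool's actual patterns):

        #include <iostream>
        #include <regex>
        #include <string>

        // Replace PHI-like patterns with category placeholders.
        std::string scrub_phi(std::string text) {
            static const std::regex ssn(R"(\b\d{3}-\d{2}-\d{4}\b)");                   // Social Security numbers
            static const std::regex phone(R"(\b\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b)");   // US phone numbers
            text = std::regex_replace(text, ssn, "[SSN]");
            text = std::regex_replace(text, phone, "[PHONE]");
            return text;
        }

        int main() {
            std::cout << scrub_phi("Call 555-867-5309 re: SSN 123-45-6789.") << "\n";
            // prints: Call [PHONE] re: SSN [SSN].
        }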

  20. Contribution of elevated intracellular calcium to pulmonary arterial myocyte alkalinization during chronic hypoxia

    PubMed Central

    Luke, Trevor; Shimoda, Larissa A.

    2016-01-01

    In the lung, exposure to chronic hypoxia (CH) causes pulmonary hypertension, a debilitating disease. Development of this condition arises from increased muscularity and contraction of pulmonary vessels, associated with increases in pulmonary arterial smooth muscle cell (PASMC) intracellular pH (pHi) and Ca2+ concentration ([Ca2+]i). In this study, we explored the interaction between pHi and [Ca2+]i in PASMCs from rats exposed to normoxia or CH (3 weeks, 10% O2). PASMC pHi and [Ca2+]i were measured with fluorescent microscopy and the dyes BCECF and Fura-2. Both pHi and [Ca2+]i levels were elevated in PASMCs from hypoxic rats. Exposure to KCl increased [Ca2+]i and pHi to a similar extent in normoxic and hypoxic PASMCs. Conversely, removal of extracellular Ca2+ or blockade of Ca2+ entry with NiCl2 or SKF 96365 decreased [Ca2+]i and pHi only in hypoxic cells. Neither increasing pHi with NH4Cl nor decreasing pHi by removal of bicarbonate impacted PASMC [Ca2+]i. We also examined the roles of Na+/Ca2+ exchange (NCX) and Na+/H+ exchange (NHE) in mediating the elevated basal [Ca2+]i and Ca2+-dependent changes in PASMC pHi. Bepridil, dichlorobenzamil, and KB-R7943, which are NCX inhibitors, decreased resting [Ca2+]i and pHi only in hypoxic PASMCs and blocked the changes in pHi induced by altering [Ca2+]i. Exposure to ethyl isopropyl amiloride, an NHE inhibitor, decreased resting pHi and prevented changes in pHi due to changing [Ca2+]i. Our findings indicate that, during CH, the elevation in basal [Ca2+]i may contribute to the alkaline shift in pHi in PASMCs, likely via mechanisms involving reverse-mode NCX and NHE. PMID:27076907

  1. Clinical utility of the Prostate Health Index (phi) for biopsy decision management in a large group urology practice setting.

    PubMed

    White, Jay; Shenoy, B Vittal; Tutrone, Ronald F; Karsh, Lawrence I; Saltzstein, Daniel R; Harmon, William J; Broyles, Dennis L; Roddy, Tamra E; Lofaro, Lori R; Paoli, Carly J; Denham, Dwight; Reynolds, Mark A

    2018-04-01

    Deciding when to biopsy a man with non-suspicious DRE findings and tPSA in the 4-10 ng/ml range can be challenging, because two-thirds of such biopsies are typically found to be benign. The Prostate Health Index (phi) exhibits significantly improved diagnostic accuracy for prostate cancer detection when compared to tPSA and %fPSA, however only one published study to date has investigated its impact on biopsy decisions in clinical practice. An IRB approved observational study was conducted at four large urology group practices using a physician reported two-part questionnaire. Physician recommendations were recorded before and after receiving the phi test result. A historical control group was queried from each site's electronic medical records for eligible men who were seen by the same participating urologists prior to the implementation of the phi test in their practice. 506 men receiving a phi test were prospectively enrolled and 683 men were identified for the historical control group (without phi). Biopsy and pathological findings were also recorded for both groups. Men receiving a phi test showed a significant reduction in biopsy procedures performed when compared to the historical control group (36.4% vs. 60.3%, respectively, P < 0.0001). Based on questionnaire responses, the phi score impacted the physician's patient management plan in 73% of cases, including biopsy deferrals when the phi score was low, and decisions to perform biopsies when the phi score indicated an intermediate or high probability of prostate cancer (phi ≥36). phi testing significantly impacted the physician's biopsy decision for men with tPSA in the 4-10 ng/ml range and non-suspicious DRE findings. Appropriate utilization of phi resulted in a significant reduction in biopsy procedures performed compared to historical patients seen by the same participating urologists who would have met enrollment eligibility but did not receive a phi test.

  2. Evaluation of PHI Hunter in Natural Language Processing Research

    PubMed Central

    Redd, Andrew; Pickard, Steve; Meystre, Stephane; Scehnet, Jeffrey; Bolton, Dan; Heavirland, Julia; Weaver, Allison Lynn; Hope, Carol; Garvin, Jennifer Hornung

    2015-01-01

    Objectives We introduce and evaluate a new, easily accessible tool using a common statistical analysis and business analytics software suite, SAS, which can be programmed to remove specific protected health information (PHI) from a text document. Removal of PHI is important because the quantity of text documents used for research with natural language processing (NLP) is increasing. When using existing data for research, an investigator must remove all PHI not needed for the research to comply with human subjects’ right to privacy. This process is similar, but not identical, to de-identification of a given set of documents. Materials and methods PHI Hunter removes PHI from free-form text. It is a set of rules to identify and remove patterns in text. PHI Hunter was applied to 473 Department of Veterans Affairs (VA) text documents randomly drawn from a research corpus stored as unstructured text in VA files. Results PHI Hunter performed well with PHI in the form of identification numbers such as Social Security numbers, phone numbers, and medical record numbers. The most commonly missed PHI items were names and locations. Incorrect removal of information occurred with text that looked like identification numbers. Discussion PHI Hunter fills a niche role that is related to but not equal to the role of de-identification tools. It gives research staff a tool to reasonably increase patient privacy. It performs well for highly sensitive PHI categories that are rarely used in research, but still shows possible areas for improvement. More development for patterns of text and linked demographic tables from electronic health records (EHRs) would improve the program so that more precise identifiable information can be removed. Conclusions PHI Hunter is an accessible tool that can flexibly remove PHI not needed for research. If it can be tailored to the specific data set via linked demographic tables, its performance will improve in each new document set. PMID:26807078

  3. Nuclear phytochrome A signaling promotes phototropism in Arabidopsis.

    PubMed

    Kami, Chitose; Hersch, Micha; Trevisan, Martine; Genoud, Thierry; Hiltbrunner, Andreas; Bergmann, Sven; Fankhauser, Christian

    2012-02-01

    Phototropin photoreceptors (phot1 and phot2 in Arabidopsis thaliana) enable responses to directional light cues (e.g., positive phototropism in the hypocotyl). In Arabidopsis, phot1 is essential for phototropism in response to low light, a response that is also modulated by phytochrome A (phyA), representing a classical example of photoreceptor coaction. The molecular mechanisms underlying promotion of phototropism by phyA remain unclear. Most phyA responses require nuclear accumulation of the photoreceptor, but interestingly, it has been proposed that cytosolic phyA promotes phototropism. By comparing the kinetics of phototropism in seedlings with different subcellular localizations of phyA, we show that nuclear phyA accelerates the phototropic response, whereas in the fhy1 fhl mutant, in which phyA remains in the cytosol, phototropic bending is slower than in the wild type. Consistent with this data, we find that transcription factors needed for full phyA responses are needed for normal phototropism. Moreover, we show that phyA is the primary photoreceptor promoting the expression of phototropism regulators in low light (e.g., PHYTOCHROME KINASE SUBSTRATE1 [PKS1] and ROOT PHOTO TROPISM2 [RPT2]). Although phyA remains cytosolic in fhy1 fhl, induction of PKS1 and RPT2 expression still occurs in fhy1 fhl, indicating that a low level of nuclear phyA signaling is still present in fhy1 fhl.

  4. Nuclear Phytochrome A Signaling Promotes Phototropism in Arabidopsis

    PubMed Central

    Kami, Chitose; Hersch, Micha; Trevisan, Martine; Genoud, Thierry; Hiltbrunner, Andreas; Bergmann, Sven; Fankhauser, Christian

    2012-01-01

    Phototropin photoreceptors (phot1 and phot2 in Arabidopsis thaliana) enable responses to directional light cues (e.g., positive phototropism in the hypocotyl). In Arabidopsis, phot1 is essential for phototropism in response to low light, a response that is also modulated by phytochrome A (phyA), representing a classical example of photoreceptor coaction. The molecular mechanisms underlying promotion of phototropism by phyA remain unclear. Most phyA responses require nuclear accumulation of the photoreceptor, but interestingly, it has been proposed that cytosolic phyA promotes phototropism. By comparing the kinetics of phototropism in seedlings with different subcellular localizations of phyA, we show that nuclear phyA accelerates the phototropic response, whereas in the fhy1 fhl mutant, in which phyA remains in the cytosol, phototropic bending is slower than in the wild type. Consistent with this data, we find that transcription factors needed for full phyA responses are needed for normal phototropism. Moreover, we show that phyA is the primary photoreceptor promoting the expression of phototropism regulators in low light (e.g., PHYTOCHROME KINASE SUBSTRATE1 [PKS1] and ROOT PHOTO TROPISM2 [RPT2]). Although phyA remains cytosolic in fhy1 fhl, induction of PKS1 and RPT2 expression still occurs in fhy1 fhl, indicating that a low level of nuclear phyA signaling is still present in fhy1 fhl. PMID:22374392

  5. How does gravity save or kill Q-balls?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tamaki, Takashi; Sakai, Nobuyuki; Department of Education, Yamagata University, Yamagata 990-8560

    2011-02-15

    We explore stability of gravitating Q-balls with potential V{sub 4}({phi})=(m{sup 2}/2){phi}{sup 2}-{lambda}{phi}{sup 4}+({phi}{sup 6}/M{sup 2}) via catastrophe theory, as an extension of our previous work on Q-balls with potential V{sub 3}({phi})=(m{sup 2}/2){phi}{sup 2}-{mu}{phi}{sup 3}+{lambda}{phi}{sup 4}. In flat spacetime Q-balls with V{sub 4} in the thick-wall limit are unstable and there is a minimum charge Q{sub min}, where Q-balls with Q
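
    For readability, the two potentials written in {sub}/{sup} notation above are

        V_3(\phi) = \frac{m^2}{2}\phi^2 - \mu\phi^3 + \lambda\phi^4,
        \qquad
        V_4(\phi) = \frac{m^2}{2}\phi^2 - \lambda\phi^4 + \frac{\phi^6}{M^2}.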

  6. Interaction between physostigmine and soman on brain regional cholinesterase activity and /sup 3/H-physostigmine distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hallak, M.E.; Woodruff, E.; Giacobini, E.

    1986-03-05

    Physostigmine (Phy) concentrations (as radioactivity) were studied in various brain areas after {sup 3}H-Phy administration as a function of time. Five min after 500 µg/kg i.m., cortex (CX) and total brain showed similar concentrations (370 ng/g), which were 50-90% higher than those of other brain regions (striatum, hippocampus, and medulla oblongata). Soman did not affect Phy levels in whole brain after pretreatment with Phy (100 or 500 µg/kg); however, the regional distribution of Phy was altered by soman, as was ChE inhibition. A significant increase in Phy concentration was seen in HC (22 and 45% at 5 and 30 min, respectively) and CX (21% at 30 min). ChE activity in total brain was 12, 30, and 24% (5, 15 and 30 min after soman administration) lower than after Phy alone. If the pretreatment dose of Phy was increased to 500 µg/kg {sup 3}H-Phy, ChE activity was further reduced to 4, 13 and 19%. This might indicate that higher doses of Phy provide more protection of the enzyme from soman than lower doses. The protective role of Phy seen in total brain was not consistent for all brain regions. Soman alone produced a 95% ChE inhibition and there were no differences in its effect between total brain and brain areas. Pretreatment of the rat with Phy produced a protective effect upon ChE activity up to 30 min. However, no protective effect on survival was observed.

  7. Phytochromes play a role in phototropism and gravitropism in Arabidopsis roots.

    PubMed

    Correll, Melanie J; Coveney, Katrina M; Raines, Steven V; Mullen, Jack L; Hangarter, Roger P; Kiss, John Z

    2003-01-01

    Phototropism as well as gravitropism plays a role in the oriented growth of roots in flowering plants. In blue or white light, roots exhibit negative phototropism, but red light induces positive phototropism in Arabidopsis roots. Phytochrome A (phyA) and phyB mediate the positive red-light-based photoresponse in roots since single mutants (and the double phyAB mutant) were severely impaired in this response. In blue-light-based negative phototropism, phyA and phyAB (but not phyB) were inhibited in the response relative to the WT. In root gravitropism, phyB and phyAB (but not phyA) were inhibited in the response compared to the WT. The differences observed in tropistic responses were not due to growth limitations since the growth rates among all the mutants tested were not significantly different from that of the WT. Thus, our study shows that the blue-light and red-light systems interact in roots and that phytochrome plays a key role in plant development by integrating multiple environmental stimuli. © 2003 COSPAR. Published by Elsevier Ltd. All rights reserved.

  8. Phytochromes play a role in phototropism and gravitropism in Arabidopsis roots

    NASA Technical Reports Server (NTRS)

    Correll, Melanie J.; Coveney, Katrina M.; Raines, Steven V.; Mullen, Jack L.; Hangarter, Roger P.; Kiss, John Z.

    2003-01-01

    Phototropism as well as gravitropism plays a role in the oriented growth of roots in flowering plants. In blue or white light, roots exhibit negative phototropism, but red light induces positive phototropism in Arabidopsis roots. Phytochrome A (phyA) and phyB mediate the positive red-light-based photoresponse in roots since single mutants (and the double phyAB mutant) were severely impaired in this response. In blue-light-based negative phototropism, phyA and phyAB (but not phyB) were inhibited in the response relative to the WT. In root gravitropism, phyB and phyAB (but not phyA) were inhibited in the response compared to the WT. The differences observed in tropistic responses were not due to growth limitations since the growth rates among all the mutants tested were not significantly different from that of the WT. Thus, our study shows that the blue-light and red-light systems interact in roots and that phytochrome plays a key role in plant development by integrating multiple environmental stimuli. © 2003 COSPAR. Published by Elsevier Ltd. All rights reserved.

  9. Sequential and coordinated action of phytochromes A and B during Arabidopsis stem growth revealed by kinetic analysis

    NASA Technical Reports Server (NTRS)

    Parks, B. M.; Spalding, E. P.; Evans, M. L. (Principal Investigator)

    1999-01-01

    Photoreceptor proteins of the phytochrome family mediate light-induced inhibition of stem (hypocotyl) elongation during the development of photoautotrophy in seedlings. Analyses of overt mutant phenotypes have established the importance of phytochromes A and B (phyA and phyB) in this developmental process, but kinetic information that would augment emerging molecular models of phytochrome signal transduction is absent. We have addressed this deficiency by genetically dissecting phytochrome-response kinetics, after having solved the technical issues that previously limited growth studies of small Arabidopsis seedlings. We show here, with resolution on the order of minutes, that phyA initiated hypocotyl growth inhibition upon the onset of continuous red light. This primary contribution of phyA began to decrease after 3 hr of irradiation, the same time at which immunochemically detectable phyA disappeared and an exclusively phyB-dependent phase of inhibition began. The sequential and coordinated actions of phyA and phyB in red light were not observed in far-red light, which inhibited growth persistently through an exclusively phyA-mediated pathway.

  10. Intracellular pH Regulation in Cultured Astrocytes from Rat Hippocampus

    PubMed Central

    Bevensee, Mark O.; Weed, Regina A.; Boron, Walter F.

    1997-01-01

    We studied the regulation of intracellular pH (pHi) in single cultured astrocytes passaged once from the hippocampus of the rat, using the dye 2′,7′-biscarboxyethyl-5,6-carboxyfluorescein (BCECF) to monitor pHi. Intrinsic buffering power (βI) was 10.5 mM (pH unit)−1 at pHi 7.0, and decreased linearly with pHi; the best-fit line to the data had a slope of −10.0 mM (pH unit)−2. In the absence of HCO3−, pHi recovery from an acid load was mediated predominantly by a Na-H exchanger because the recovery was inhibited 88% by amiloride and 79% by ethylisopropylamiloride (EIPA) at pHi 6.05. The ethylisopropylamiloride-sensitive component of acid extrusion fell linearly with pHi. Acid extrusion was inhibited 68% (pHi 6.23) by substituting Li+ for Na+ in the bath solution. Switching from a CO2/HCO3−-free to a CO2/HCO3−-containing bath solution caused mean steady-state pHi to increase from 6.82 to 6.90, due to a Na+-driven HCO3− transporter. The HCO3−-induced pHi increase was unaffected by amiloride, but was inhibited 75% (pHi 6.85) by 400 μM 4,4′-diisothiocyanatostilbene-2,2′-disulfonic acid (DIDS), and 65% (pHi 6.55–6.75) by pretreating astrocytes for up to ∼6.3 h with 400 μM 4-acetamide-4′-isothiocyanatostilbene-2,2′-disulfonic acid (SITS). The CO2/HCO3−-induced pHi increase was blocked when external Na+ was replaced with N-methyl-d-glucammonium (NMDG+). In the presence of HCO3−, the Na+-driven HCO3− transporter contributed to the pHi recovery from an acid load. For example, HCO3− shifted the plot of acid-extrusion rate vs. pHi by 0.15–0.3 pH units in the alkaline direction. Also, with Na-H exchange inhibited by amiloride, HCO3− increased acid extrusion 3.8-fold (pHi 6.20). When astrocytes were acid loaded in amiloride, with Li+ as the major cation, HCO3− failed to elicit a substantial increase in pHi. Thus, Li+ does not appear to substitute well for Na+ on the HCO3− transporter. We conclude that an amiloride-sensitive Na-H exchanger and a Na+-driven HCO3− transporter are the predominant acid extruders in astrocytes. PMID:9379175
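
    The stated intercept and slope imply the linear relation

        \beta_I(\mathrm{pH_i}) = 10.5 - 10.0\,(\mathrm{pH_i} - 7.0) \quad \mathrm{mM\,(pH\ unit)^{-1}},

    i.e., intrinsic buffering of about 15.5 mM (pH unit)−1 at pHi 6.5 and 5.5 at pHi 7.5.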

  11. Benchmarking GNU Radio Kernels and Multi-Processor Scheduling

    DTIC Science & Technology

    2013-01-14

    Benchmark platforms include an Intel i7, an Intel Atom, an AMD E350 APU (comparable to the Atom), and an ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110. The excerpt's Section 3.2 (Multi-Processor Scheduling) includes Figure 1, GFLOPs per second through an FFT array on an Intel i7.
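
    The excerpt does not show how the GFLOPs figure in Figure 1 is derived; FFT benchmarks conventionally estimate it from the radix-2 operation count, i.e.

        \mathrm{FLOP/s} \approx \frac{5\,N \log_2 N}{t_{\mathrm{FFT}}}

    for a length-N complex FFT completing in time t_FFT (an assumption about this report, not a statement of its method).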

  12. The Intelcities Community of Practice: The Capacity-Building, Co-Design, Evaluation, and Monitoring of E-Government Services

    ERIC Educational Resources Information Center

    Deakin, Mark; Lombardi, Patrizia; Cooper, Ian

    2011-01-01

    The paper examines the IntelCities Community of Practice (CoP) supporting the development of the organization's capacity-building, co-design, monitoring, and evaluation of e-government services. It begins by outlining the IntelCities CoP and goes on to set out the integrated model of electronically enhanced government (e-government) services…

  13. Parallel Climate Data Assimilation PSAS Package Achieves 18 GFLOPs on 512-Node Intel Paragon

    NASA Technical Reports Server (NTRS)

    Ding, H. Q.; Chan, C.; Gennery, D. B.; Ferraro, R. D.

    1995-01-01

    Several algorithms were added to the Physical-space Statistical Analysis System (PSAS) from Goddard, which assimilates observational weather data by correcting for different levels of uncertainty about the data and different locations for mobile observation platforms. The new algorithms and use of the 512-node Intel Paragon allowed a hundred-fold decrease in processing time.

  14. Why K-12 IT Managers and Administrators Are Embracing the Intel-Based Mac

    ERIC Educational Resources Information Center

    Technology & Learning, 2007

    2007-01-01

    Over the past year, Apple has dramatically increased its share of the school computer marketplace--especially in the category of notebook computers. A recent study conducted by Grunwald Associates and Rockman et al. reports that one of the major reasons for this growth is Apple's introduction of the Intel processor to the entire line of Mac…

  15. 76 FR 63342 - Petition for Exemption; Summary of Petition Received

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-12

    ...-1039. Petitioner: PHI, Inc. Section of 14 CFR Affected: Sec. 91.9(a). Description of Relief Sought: PHI, Inc. (PHI), requests an exemption from 91.9(a) to allow PHI to operate S-92A helicopters in accordance...

  16. Intracellular pH change does not accompany egg activation in the mouse.

    PubMed

    Phillips, K P; Baltz, J M

    1996-09-01

    In the sea urchin, some other marine invertebrates, and the frog, Xenopus, egg activation at fertilization is accompanied by an increase in intracellular pH (pHi). We measured pHi in germinal vesicle (GV)-intact mouse oocytes, ovulated eggs, and in vivo fertilized zygotes using the pH indicator dye, SNARF-1. The mean pHi was 6.96 ± 0.004 (± SEM) in GV-intact oocytes, 7.00 ± 0.01 in ovulated, unfertilized eggs, and 7.02 ± 0.01 in fertilized zygotes, indicating no sustained changes in pHi after germinal vesicle breakdown (GVBD) or fertilization. To examine whether transient changes in pHi occur shortly after egg activation, mouse eggs were parthenogenetically activated by 7% ethanol in phosphate buffered saline (PBS); no significant change in pHi followed ethanol activation. Since increased Na+/H+ antiporter activity is responsible for the pHi increase in the sea urchin, pHi was measured in the absence of added bicarbonate or CO2 (a condition under which the antiporter would be the only major pHi regulatory mechanism able to operate, since the others are bicarbonate-dependent) in GV-intact oocytes, ovulated eggs, and in vivo fertilized zygotes to determine whether a Na+/H+ antiporter was activated. There was no physiologically significant difference in pHi after GVBD or fertilization when pHi was measured in bicarbonate-free medium, nor any change upon parthenogenetic activation. Thus, a change in pHi is not a feature of egg activation in the mouse.

  17. [Application study on PHI and 16PF and SCL-90 for freshman's psychology inspection].

    PubMed

    Niu, Peng

    2009-07-01

    To explore the application of the PHI, 16PF, and SCL-90 measurement scales in the psychological screening of college freshmen, the PHI and 16PF scales were administered to the freshman classes of 2004-2007 to sift out candidates for crisis intervention. Four consecutive years of testing showed reasonable stability: apart from the excitement factor, freshmen's PHI factor scores were lower than the norms. The incidence of mental problems screened by the PHI scale alone was very low, and 3-5 students with serious mental problems went undetected. This shortcoming can be addressed by combining PHI with 16PF and re-measuring the suspected cases with SCL-90. The combined application of PHI, 16PF, and SCL-90 gives better results.

  18. Force-field parameters of the Psi and Phi around glycosidic bonds to oxygen and sulfur atoms.

    PubMed

    Saito, Minoru; Okazaki, Isao

    2009-12-01

    The Psi and Phi torsion angles around glycosidic bonds in a glycoside chain are the most important determinants of the conformation of a glycoside chain. We determined force-field parameters for Psi and Phi torsion angles around a glycosidic bond bridged by a sulfur atom, as well as a bond bridged by an oxygen atom as a preparation for the next study, i.e., molecular dynamics free energy calculations for protein-sugar and protein-inhibitor complexes. First, we extracted the Psi or Phi torsion energy component from a quantum mechanics (QM) total energy by subtracting all the molecular mechanics (MM) force-field components except for the Psi or Phi torsion angle. The Psi and Phi energy components extracted (hereafter called "the remaining energy components") were calculated for simple sugar models and plotted as functions of the Psi and Phi angles. The remaining energy component curves of Psi and Phi were well represented by the torsion force-field functions consisting of four and three cosine functions, respectively. To confirm the reliability of the force-field parameters and to confirm its compatibility with other force-fields, we calculated adiabatic potential curves as functions of Psi and Phi for the model glycosides by adopting the Psi and Phi force-field parameters obtained and by energetically optimizing other degrees of freedom. The MM potential energy curves obtained for Psi and Phi well represented the QM adiabatic curves and also these curves' differences with regard to the glycosidic oxygen and sulfur atoms. Our Psi and Phi force-fields of glycosidic oxygen gave MM potential energy curves that more closely represented the respective QM curves than did those of the recently developed GLYCAM force-field. (c) 2009 Wiley Periodicals, Inc.
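
    The cosine series mentioned above is the standard molecular mechanics dihedral form; in generic notation (the fitted parameter values themselves are not given in this record):

        E_{tors}(\theta) = \sum_{n=1}^{N} \frac{V_n}{2}\left[1 + \cos(n\theta - \gamma_n)\right], \qquad N = 4\ \text{for}\ \Psi,\quad N = 3\ \text{for}\ \Phi

    where θ stands for the Psi or Phi torsion angle, V_n are barrier heights, and γ_n are phase offsets.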

  19. Molecular characterization of a genomic region in a Lactococcus bacteriophage that is involved in its sensitivity to the phage defense mechanism AbiA.

    PubMed

    Dinsmore, P K; Klaenhammer, T R

    1997-05-01

    A spontaneous mutant of the lactococcal phage phi31 that is insensitive to the phage defense mechanism AbiA was characterized in an effort to identify the phage factor(s) involved in sensitivity of phi31 to AbiA. A point mutation was localized in the genome of the AbiA-insensitive phage (phi31A) by heteroduplex analysis of a 9-kb region. The mutation (G to T) was within a 738-bp open reading frame (ORF245) and resulted in an arginine-to-leucine change in the predicted amino acid sequence of the protein. The mutant phi31A-ORF245 reduced the sensitivity of phi31 to AbiA when present in trans, indicating that the mutation in ORF245 is responsible for the AbiA insensitivity of phi31A. Transcription of ORF245 occurs early in the phage infection cycles of phi31 and phi31A and is unaffected by AbiA. Expansion of the phi31 sequence revealed ORF169 (immediately upstream of ORF245) and ORF71 (which ends 84 bp upstream of ORF169). Two inverted repeats lie within the 84-bp region between ORF71 and ORF169. Sequence analysis of an independently isolated AbiA-insensitive phage, phi31B, identified a mutation (G to A) in one of the inverted repeats. A 118-bp fragment from phi31, encompassing the 84-bp region between ORF71 and ORF169, eliminates AbiA activity against phi31 when present in trans, establishing a relationship between AbiA and this fragment. The study of this region of phage phi31 has identified an open reading frame (ORF245) and a 118-bp DNA fragment that interact with AbiA and are likely to be involved in the sensitivity of this phage to AbiA.

  20. [Improvement of Phi bodies stain and its clinical significance].

    PubMed

    Gong, Xu-Bo; Lu, Xing-Guo; Yan, Li-Juan; Xiao, Xi-Bin; Wu, Dong; Xu, Gen-Bo; Zhang, Xiao-Hong; Zhao, Xiao-Ying

    2009-02-01

    The aim of this study was to improve the staining method for hydroperoxidase (HPO), to analyze the morphologic features of Phi bodies, and to evaluate the clinical application of this method. 128 bone marrow or peripheral blood smears from patients with myeloid and lymphoid malignancies were stained by the improved HPO method, and the Phi bodies and their detection rates in different leukemias were observed. 69 acute myeloid leukemia (AML) specimens were chosen randomly, and the positive rate and number of Phi bodies obtained with the improved HPO stain were compared with those of the POX stain based on the same substrate, 3,3'-diaminobenzidine. The results showed that the shape of bundle-like Phi bodies was variable, long or short, while the nubbly Phi bodies usually appeared oval and smooth. Club-like Phi bodies were found in M(3). The detection rates of bundle-like Phi bodies in AML M(1)-M(5) were 42.9% (6/14), 83.3% (15/18), 92.0% (23/25), 52.3% (11/21), and 33.3% (5/15), respectively, and those of nubbly Phi bodies were 28.6% (4/14), 66.7% (12/18), 11.1% (3/25), 33.3% (7/21), and 20.0% (3/15), respectively. The detection rate of bundle-like Phi bodies in M(3) was significantly higher than that in the (M(1) + M(2)) or (M(4) + M(5)) groups, whereas the detection rate of nubbly Phi bodies in the (M(1) + M(2)) group was higher than that in M(3). In conclusion, with the improved staining method the HPO stain becomes simple, the detection rate of Phi bodies is higher than with the previous method, the positive granules are more obvious, and the results are stable. This improved method plays an important role in differentiating AML from ALL, subtyping AML, and evaluating therapeutic results.

  1. The role and uptake of private health insurance in different health care systems: are there lessons for developing countries?

    PubMed

    Odeyemi, Isaac Ao; Nixon, John

    2013-01-01

    Social and national health insurance schemes are being introduced in many developing countries in moving towards universal health care. However, gaps in coverage are common and can only be met by out-of-pocket payments, general taxation, or private health insurance (PHI). This study provides an overview of PHI in different health care systems and discusses factors that affect its uptake and equity. A representative sample of countries was identified (United States, United Kingdom, The Netherlands, France, Australia, and Latvia) that illustrates the principal forms and roles of PHI. Literature describing each country's health care system was used to summarize how PHI is utilized and the factors that affect its uptake and equity. In the United States, PHI is a primary source of funding in conjunction with tax-based programs to support vulnerable groups; in the UK and Latvia, PHI is used in a supplementary role to universal tax-based systems; in France and Latvia, complementary PHI is utilized to cover gaps in public funding; in The Netherlands, PHI is supplementary to statutory private and social health insurance; in Australia, the government incentivizes the uptake of complementary PHI through tax rebates and penalties. The uptake of PHI is influenced by age, income, education, health care system typology, and the incentives or disincentives applied by governments. The effect on equity can either be positive or negative depending on the type of PHI adopted and its role within the wider health care system. PHI has many manifestations depending on the type of health care system used and its role within that system. This study has illustrated its common applications and the factors that affect its uptake and equity in different health care systems. The results are anticipated to be helpful in informing how developing countries may utilize PHI to meet the aim of achieving universal health care.

  2. Density perturbations in general modified gravitational theories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    De Felice, Antonio; Tsujikawa, Shinji; Mukohyama, Shinji

    2010-07-15

    We derive the equations of linear cosmological perturbations for the general Lagrangian density f(R, φ, X)/2 + L_c, where R is the Ricci scalar, φ is a scalar field, and X = −∂^μφ ∂_μφ/2 is the field kinetic energy. We take into account a nonlinear self-interaction term L_c = ξ(φ) □φ (∂^μφ ∂_μφ), recently studied in the context of "Galileon" cosmology, which keeps the field equations at second order. Taking into account a scalar-field mass explicitly, the equations of matter density perturbations and gravitational potentials are obtained under a quasistatic approximation on subhorizon scales. We also derive conditions for the avoidance of ghosts and Laplacian instabilities associated with propagation speeds. Our analysis includes most of the modified gravity models of dark energy proposed in the literature, and it is thus convenient for testing the viability of such models from both theoretical and observational points of view.

  3. P2P Watch: Personal Health Information Detection in Peer-to-Peer File-Sharing Networks

    PubMed Central

    El Emam, Khaled; Arbuckle, Luk; Neri, Emilio; Rose, Sean; Jonker, Elizabeth

    2012-01-01

    Background Users of peer-to-peer (P2P) file-sharing networks risk the inadvertent disclosure of personal health information (PHI). In addition to potentially causing harm to the affected individuals, this can heighten the risk of data breaches for health information custodians. Automated PHI detection tools that crawl the P2P networks can identify PHI and alert custodians. While there has been previous work on the detection of personal information in electronic health records, there has been a dearth of research on the automated detection of PHI in heterogeneous user files. Objective To build a system that accurately detects PHI in files sent through P2P file-sharing networks. The system, which we call P2P Watch, uses a pipeline of text processing techniques to automatically detect PHI in files exchanged through P2P networks. P2P Watch processes unstructured texts regardless of the file format, document type, and content. Methods We developed P2P Watch to extract and analyze PHI in text files exchanged on P2P networks. We labeled texts as PHI if they contained identifiable information about a person (eg, name and date of birth) and specifics of the person’s health (eg, diagnosis, prescriptions, and medical procedures). We evaluated the system’s performance through its efficiency and effectiveness on 3924 files gathered from three P2P networks. Results P2P Watch successfully processed 3924 P2P files of unknown content. A manual examination of 1578 randomly selected files marked by the system as non-PHI confirmed that these files indeed did not contain PHI, making the false-negative detection rate equal to zero. Of 57 files marked by the system as PHI, all contained both personally identifiable information and health information: 11 files were PHI disclosures, and 46 files contained organizational materials such as unfilled insurance forms, job applications by medical professionals, and essays. Conclusions PHI can be successfully detected in free-form textual files exchanged through P2P networks. Once the files with PHI are detected, affected individuals or data custodians can be alerted to take remedial action. PMID:22776692
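
    The record above describes the detection rule but publishes no code; the following Python fragment is a minimal, hypothetical sketch of that rule only: flag a text as PHI when it contains both personally identifying information and health-specific information. The patterns, vocabulary, and names (IDENTIFIER_PATTERNS, HEALTH_TERMS, looks_like_phi) are illustrative assumptions, not P2P Watch internals.

        import re

        # Hypothetical identifier patterns: dates (e.g., date of birth) and
        # phone-style numbers are easy to pattern-match in free text.
        IDENTIFIER_PATTERNS = [
            re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),  # date-like token
            re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b"),  # phone-like token
        ]

        # Tiny illustrative health vocabulary; a real system would use large
        # curated lexicons of diagnoses, prescriptions, and procedures.
        HEALTH_TERMS = {"diagnosis", "prescription", "insulin", "chemotherapy"}

        def looks_like_phi(text: str) -> bool:
            """Flag a text only if it pairs identifying info with health info."""
            has_identifier = any(p.search(text) for p in IDENTIFIER_PATTERNS)
            words = set(re.findall(r"[a-z]+", text.lower()))
            return has_identifier and bool(words & HEALTH_TERMS)

        # A filled-in record is flagged; a blank form template (no identifier)
        # is not, mirroring the non-PHI "organizational materials" above.
        print(looks_like_phi("Prescription renewal, patient DOB 04/12/1978"))    # True
        print(looks_like_phi("Blank prescription claim form, no patient data"))  # False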

  4. P2P watch: personal health information detection in peer-to-peer file-sharing networks.

    PubMed

    Sokolova, Marina; El Emam, Khaled; Arbuckle, Luk; Neri, Emilio; Rose, Sean; Jonker, Elizabeth

    2012-07-09

    Users of peer-to-peer (P2P) file-sharing networks risk the inadvertent disclosure of personal health information (PHI). In addition to potentially causing harm to the affected individuals, this can heighten the risk of data breaches for health information custodians. Automated PHI detection tools that crawl the P2P networks can identify PHI and alert custodians. While there has been previous work on the detection of personal information in electronic health records, there has been a dearth of research on the automated detection of PHI in heterogeneous user files. To build a system that accurately detects PHI in files sent through P2P file-sharing networks. The system, which we call P2P Watch, uses a pipeline of text processing techniques to automatically detect PHI in files exchanged through P2P networks. P2P Watch processes unstructured texts regardless of the file format, document type, and content. We developed P2P Watch to extract and analyze PHI in text files exchanged on P2P networks. We labeled texts as PHI if they contained identifiable information about a person (eg, name and date of birth) and specifics of the person's health (eg, diagnosis, prescriptions, and medical procedures). We evaluated the system's performance through its efficiency and effectiveness on 3924 files gathered from three P2P networks. P2P Watch successfully processed 3924 P2P files of unknown content. A manual examination of 1578 randomly selected files marked by the system as non-PHI confirmed that these files indeed did not contain PHI, making the false-negative detection rate equal to zero. Of 57 files marked by the system as PHI, all contained both personally identifiable information and health information: 11 files were PHI disclosures, and 46 files contained organizational materials such as unfilled insurance forms, job applications by medical professionals, and essays. PHI can be successfully detected in free-form textual files exchanged through P2P networks. Once the files with PHI are detected, affected individuals or data custodians can be alerted to take remedial action.

  5. Tonometry revisited: perfusion-related, metabolic, and respiratory components of gastric mucosal acidosis in acute cardiorespiratory failure.

    PubMed

    Jakob, Stephan M; Parviainen, Ilkka; Ruokonen, Esko; Kogan, Alexander; Takala, Jukka

    2008-05-01

    Mucosal pH (pHi) is influenced by local perfusion and metabolism (mucosal-arterial pCO2 gradient, ΔpCO2), systemic metabolic acidosis (arterial bicarbonate), and respiration (arterial pCO2). We determined these components of pHi and their relation to outcome during the first 24 h of intensive care. We studied 103 patients with acute respiratory or circulatory failure (age, 63 ± 2 [mean ± SEM]; Acute Physiology and Chronic Health Evaluation II score, 20 ± 1; Sequential Organ Failure Assessment score, 8 ± 0). pHi, and the effects of bicarbonate and arterial and mucosal pCO2 on pHi, were assessed at admission, 6, and 24 h. pHi was reduced (at admission, 7.27 ± 0.01) due to low arterial bicarbonate and increased ΔpCO2. Low pHi (<7.32) at admission (n=58; mortality, 29% vs. 13% in those with pHi ≥ 7.32 at admission; P=0.061) was associated with an increased ΔpCO2 in 59% of patients (mortality, 47% vs. 4% for patients with low pHi and normal ΔpCO2; P=0.0003). An increased versus normal ΔpCO2, regardless of pHi, was associated with increased mortality at admission (51% vs. 5%; P<0.0001; n=39) and at 6 h (34% vs. 13%; P=0.016; n=45). A delayed normalization or persistently low pHi (n=47) or high ΔpCO2 (n=25) was associated with high mortality (low pHi [34%] vs. high ΔpCO2 [60%]; P=0.046). In nonsurvivors, hypocapnia increased pHi at baseline, 6, and 24 h (all P
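
    Tonometric pHi itself comes from the Henderson-Hasselbalch relation applied to arterial bicarbonate and mucosal pCO2, which is what lets this record split pHi into perfusion-related (ΔpCO2), metabolic (bicarbonate), and respiratory (arterial pCO2) components:

        \mathrm{pH_i} = 6.1 + \log_{10}\frac{[\mathrm{HCO_3^-}]_{art}}{0.03 \times p\mathrm{CO_2^{mucosal}}}

    with bicarbonate in mmol/l, pCO2 in mmHg, and 0.03 the CO2 solubility coefficient.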

  6. Some demographic issues affecting private health insurance.

    PubMed

    Hanning, Brian

    2004-01-01

    There will be significant changes in the demography of persons with Private Health Insurance (PHI). Two methods of projecting PHI coverage are discussed in this paper. The first assumes the only factors affecting PHI coverage are demographic change and mortality and facilitates comparisons between actual and projected PHI coverage. The second projects the percentage of the population insured in each five year age cohort, and makes allowance for changes in PHI coverage due to all factors. Demographic change will increase Registered Health Benefit Organization (RHBO) premiums by 1.7% per annum. The role of these projections in analysing the effect of future premium increases on PHI retention rates is also discussed.
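
    As a worked instance of the projection above: a 1.7% per annum premium increase compounds, so over ten years it amounts to

        (1.017)^{10} \approx 1.18,

    i.e. roughly an 18% cumulative rise attributable to demographic change alone.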

  7. Independent regulation of reovirus membrane penetration and apoptosis by the mu1 phi domain.

    PubMed

    Danthi, Pranav; Coffey, Caroline M; Parker, John S L; Abel, Ty W; Dermody, Terence S

    2008-12-01

    Apoptosis plays an important role in the pathogenesis of reovirus encephalitis. Reovirus outer-capsid protein mu1, which functions to penetrate host cell membranes during viral entry, is the primary regulator of apoptosis following reovirus infection. Ectopic expression of full-length and truncated forms of mu1 indicates that the mu1 phi domain is sufficient to elicit a cell death response. To evaluate the contribution of the mu1 phi domain to the induction of apoptosis following reovirus infection, phi mutant viruses were generated by reverse genetics and analyzed for the capacity to penetrate cell membranes and elicit apoptosis. We found that mutations in phi diminish reovirus membrane penetration efficiency by preventing conformational changes that lead to generation of key reovirus entry intermediates. Independent of effects on membrane penetration, amino acid substitutions in phi affect the apoptotic potential of reovirus, suggesting that phi initiates apoptosis subsequent to cytosolic delivery. In comparison to wild-type virus, apoptosis-defective phi mutant viruses display diminished neurovirulence following intracranial inoculation of newborn mice. These results indicate that the phi domain of mu1 plays an important regulatory role in reovirus-induced apoptosis and disease.

  8. Phi is not beta, and why Wertheimer's discovery launched the Gestalt revolution.

    PubMed

    Steinman, R M; Pizlo, Z; Pizlo, F J

    2000-01-01

    Max Wertheimer (1880-1943), the founder of the Gestalt School of Psychology, published a monograph on the perception of apparent motion in 1912, which initiated a new direction for a great deal of subsequent perceptual theory and research. Wertheimer's research was inspired by a serendipitous observation of a pure apparent movement, which he called the phi-phenomenon to distinguish it from optimal apparent movement (beta), which resembles real movement. Wertheimer called his novel observation 'pure' because it was perceived in the absence of any object being seen to change its position in space. The phi-phenomenon, as well as the best conditions for seeing it, were not described clearly in this monograph, leading to considerable subsequent confusion about its appearance and occurrence. We review the history leading to the discovery of the phi-phenomenon, and then describe: (i) a likely source for the confusion evident in most contemporary research on the phi-phenomenon; (ii) the best conditions for seeing the phi-phenomenon; (iii) new conditions that provide a particularly vivid phi-phenomenon; and (iv) two lines of thought that may provide explanations of the phi-phenomenon and also distinguish phi from beta.

  9. PHI-base: a new interface and further additions for the multi-species pathogen–host interactions database

    PubMed Central

    Urban, Martin; Cuzick, Alayne; Rutherford, Kim; Irvine, Alistair; Pedro, Helder; Pant, Rashmi; Sadanadan, Vidyendra; Khamari, Lokanath; Billal, Santoshkumar; Mohanty, Sagar; Hammond-Kosack, Kim E.

    2017-01-01

    The pathogen–host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen–host interactions reported in peer-reviewed research articles. In addition, literature that indicates specific gene alterations that did not affect the disease interaction phenotype is curated to provide complete datasets for comparative purposes. Viruses are not included. Here we describe a revised PHI-base Version 4 data platform with improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided and a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. The new release of PHI-base Version 4.2 (October 2016) has an increased data content containing information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions. Prokaryotic and eukaryotic pathogens are represented in almost equal numbers. Host species are ∼70% plants and 30% other species of medical and/or environmental importance. Additional data types included in PHI-base 4 are the direct targets of pathogen effector proteins in experimental and natural host organisms. The curation problems encountered and the future directions of the PHI-base project are briefly discussed. PMID:27915230

  10. Self-reproduction in k-inflation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Helmer, Ferdinand; Winitzki, Sergei

    2006-09-15

    We study cosmological self-reproduction in models of inflation driven by a scalar field φ with a noncanonical kinetic term (k-inflation). We develop a general criterion for the existence of attractors and establish conditions selecting a class of k-inflation models that admit a unique attractor solution. We then consider quantum fluctuations on the attractor background. We show that the correlation length of the fluctuations is of order c_s H^(−1), where c_s is the speed of sound. By computing the magnitude of field fluctuations, we determine the coefficients of Fokker-Planck equations describing the probability distribution of the spatially averaged field φ. The field fluctuations are generally large in the inflationary attractor regime; hence, eternal self-reproduction is a generic feature of k-inflation. This is established more formally by demonstrating the existence of stationary solutions of the relevant Fokker-Planck equations. We also show that there exists a (model-dependent) range φ_R < φ < φ_max within which large fluctuations are likely to drive the field towards the upper boundary φ = φ_max, where the semiclassical consideration breaks down. An exit from inflation into reheating without reaching φ_max will occur almost surely (with probability 1) only if the initial value of φ is below φ_R. In this way, strong self-reproduction effects constrain models of k-inflation.

  11. Latency-Information Theory: The Mathematical-Physical Theory of Communication-Observation

    DTIC Science & Technology

    2010-01-01

    Werner Heisenberg of quantum mechanics; 3) the source-entropy and channel-capacity lossless performance bounds of Claude Shannon that guide... through noisy intel-space channels, and where the physical time-dislocations of intel-space exhibit a passing-of-time Heisenberg information... life-space sensor, and where the physical time-dislocations of life-space exhibit a passing-of-time Heisenberg information-uncertainty; and 4...

  12. Communication overhead on the Intel iPSC-860 hypercube

    NASA Technical Reports Server (NTRS)

    Bokhari, Shahid H.

    1990-01-01

    Experiments were conducted on the Intel iPSC-860 hypercube in order to evaluate the overhead of interprocessor communication. It is demonstrated that: (1) contrary to popular belief, the distance between two communicating processors has a significant impact on communication time, (2) edge contention can increase communication time by a factor of more than 7, and (3) node contention has no measurable impact.

  13. Comparisons between Intel 386 and i486 microprocessors

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei

    1989-01-01

    A quick and preliminary comparison is made between the Intel 386 and i486 microprocessors. The following topics are discussed: the i486 key elements, comparison of instruction set architecture, the i486 on-chip cache characteristics, the i486 multiprocessor support, comparison of performance, comparison of power consumption, comparison of radiation hardening potential, and recommendations for the Space Station Freedom (SSF) Data Management System (DMS).

  14. Student Intern Freed Competes at Intel ISEF, Two Others Awarded at Local Science Fair | Poster

    Cancer.gov

    Class of 2014–2015 Werner H. Kirsten (WHK) student intern Rebecca “Natasha” Freed earned a fourth-place award in biochemistry at the 2015 Intel International Science and Engineering Fair (ISEF), the largest high school science research competition in the world, according to the Society for Science & the Public’s website. Freed described the event as “transformative

  15. The prostate health index PHI predicts oncological outcome and biochemical recurrence after radical prostatectomy - analysis in 437 patients

    PubMed Central

    Maxeiner, Andreas; Kilic, Ergin; Matalon, Julia; Friedersdorff, Frank; Miller, Kurt; Jung, Klaus; Stephan, Carsten; Busch, Jonas

    2017-01-01

    The purpose of this study was to investigate the Prostate Health Index (PHI) for pathological outcome prediction following radical prostatectomy and also for biochemical recurrence prediction in comparison to established parameters such as Gleason score, pathological tumor stage, resection status (R0/1) and prostate-specific antigen (PSA). Out of a cohort of 460 cases with preoperative PHI measurements (World Health Organization calibration: Beckman Coulter Access-2-Immunoassay) between 2001 and 2014, 437 patients with complete follow-up data were included. From these 437 patients, 87 (19.9%) developed a biochemical recurrence. Patient characteristics were compared by using the chi-square test. Predictors were analyzed by multivariate adjusted logistic and Cox regression. The median follow-up for a biochemical recurrence was 65 (range 3-161) months. PHI, PSA, [-2]proPSA, PHI- and PSA-density performed as significant variables (p < 0.05) for cancer aggressiveness: Gleason score <7 or ≥7 (ISUP grade 1 or ≥2). Concerning pathological tumor stage discrimination and prediction, variables such as PHI, PSA, %fPSA, [-2]proPSA, PHI- and PSA-density significantly discriminated between stages
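
    For reference, the index evaluated in this study combines the three PSA measurements named above in the standard Beckman Coulter formula:

        \mathrm{PHI} = \frac{[-2]\mathrm{proPSA}}{\mathrm{fPSA}} \times \sqrt{\mathrm{PSA}}

    so PHI rises when [-2]proPSA is high relative to free PSA and total PSA is elevated.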

  16. The prostate health index PHI predicts oncological outcome and biochemical recurrence after radical prostatectomy - analysis in 437 patients.

    PubMed

    Maxeiner, Andreas; Kilic, Ergin; Matalon, Julia; Friedersdorff, Frank; Miller, Kurt; Jung, Klaus; Stephan, Carsten; Busch, Jonas

    2017-10-03

    The purpose of this study was to investigate the Prostate Health Index (PHI) for pathological outcome prediction following radical prostatectomy and also for biochemical recurrence prediction in comparison to established parameters such as Gleason score, pathological tumor stage, resection status (R0/1) and prostate-specific antigen (PSA). Out of a cohort of 460 cases with preoperative PHI measurements (World Health Organization calibration: Beckman Coulter Access-2-Immunoassay) between 2001 and 2014, 437 patients with complete follow-up data were included. From these 437 patients, 87 (19.9%) developed a biochemical recurrence. Patient characteristics were compared by using the chi-square test. Predictors were analyzed by multivariate adjusted logistic and Cox regression. The median follow-up for a biochemical recurrence was 65 (range 3-161) months. PHI, PSA, [-2]proPSA, PHI- and PSA-density performed as significant variables (p < 0.05) for cancer aggressiveness: Gleason score <7 or ≥7 (ISUP grade 1 or ≥2). Concerning pathological tumor stage discrimination and prediction, variables such as PHI, PSA, %fPSA, [-2]proPSA, PHI- and PSA-density significantly discriminated between stages

  17. X-Phi and Carnapian Explication.

    PubMed

    Shepherd, Joshua; Justus, James

    2015-04-01

    The rise of experimental philosophy (x-phi) has placed metaphilosophical questions, particularly those concerning concepts, at the center of philosophical attention. X-phi offers empirically rigorous methods for identifying conceptual content, but what exactly it contributes towards evaluating conceptual content remains unclear. We show how x-phi complements Rudolf Carnap's underappreciated methodology for concept determination, explication. This clarifies and extends x-phi's positive philosophical import, and also exhibits explication's broad appeal. But there is a potential problem: Carnap's account of explication was limited to empirical and logical concepts, but many concepts of interest to philosophers (experimental and otherwise) are essentially normative. With formal epistemology as a case study, we show how x-phi assisted explication can apply to normative domains.

  18. Cost/Performance Ratio Achieved by Using a Commodity-Based Cluster

    NASA Technical Reports Server (NTRS)

    Lopez, Isaac

    2001-01-01

    Researchers at the NASA Glenn Research Center acquired a commodity cluster based on Intel Corporation processors to compare its performance with a traditional UNIX cluster in the execution of aeropropulsion applications. Since the cost differential of the clusters was significant, a cost/performance ratio was calculated. After executing a propulsion application on both clusters, the researchers demonstrated a 9.4 cost/performance ratio in favor of the Intel-based cluster. These researchers utilize the Aeroshark cluster as one of the primary testbeds for developing NPSS parallel application codes and system software. The Aeroshark cluster provides 64 Intel Pentium II 400-MHz processors, housed in 32 nodes. Recently, APNASA, a code developed by a Government/industry team for the design and analysis of turbomachinery systems, was used for a simulation on Glenn's Aeroshark cluster.

  19. Full cycle trigonometric function on Intel Quartus II Verilog

    NASA Astrophysics Data System (ADS)

    Mustapha, Muhazam; Zulkarnain, Nur Antasha

    2018-02-01

    This paper discusses an improvement of previous research on hardware-based trigonometric calculations. The tangent function is also implemented to complete the set, and the number of bits has been extended for each trigonometric function. The functions have been simulated in Quartus II and the results compared with the previous work. The design is based on RTL due to its resource-efficient nature. In the first stage, a technology-independent test bench simulation was conducted in ModelSim, due to its convenience in capturing simulation data, so that accuracy information could be obtained. In the second stage, Intel/Altera Quartus II was used to simulate on a technology-dependent platform, namely Intel/Altera's own. Real data on the number of logic elements used and on propagation delay were also obtained.
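
    The abstract above does not include the RTL itself; as a rough illustration of the "full cycle" idea (extending a first-quadrant table to 0-2π by quadrant folding, with tangent derived as sine over cosine), here is a Python sketch. The table size, names, and folding scheme are assumptions for illustration, not details from the paper.

        import math

        # Hypothetical first-quadrant lookup table: 256 sine samples over [0, pi/2].
        QUARTER = [math.sin(i * (math.pi / 2) / 255) for i in range(256)]

        def _lut_sin(angle):
            """First-quadrant sine via table lookup, angle in [0, pi/2]."""
            idx = round(angle / (math.pi / 2) * 255)
            return QUARTER[idx]

        def full_sin(theta):
            """Full-cycle sine by quadrant folding of the first-quadrant table."""
            theta = theta % (2 * math.pi)            # range reduction to [0, 2*pi)
            quad, frac = divmod(theta, math.pi / 2)  # quadrant index and remainder
            quad = int(quad)
            if quad == 0:
                return _lut_sin(frac)
            if quad == 1:
                return _lut_sin(math.pi / 2 - frac)
            if quad == 2:
                return -_lut_sin(frac)
            return -_lut_sin(math.pi / 2 - frac)

        def full_cos(theta):
            return full_sin(theta + math.pi / 2)      # phase-shift identity

        def full_tan(theta):
            # Undefined at odd multiples of pi/2, as in any sin/cos-based design.
            return full_sin(theta) / full_cos(theta)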

  20. phiGENOME: an integrative navigation throughout bacteriophage genomes.

    PubMed

    Stano, Matej; Klucar, Lubos

    2011-11-01

    phiGENOME is a web-based genome browser generating dynamic and interactive graphical representations of phage genomes stored in phiSITE, a database of gene regulation in bacteriophages. phiGENOME is an integral part of the phiSITE web portal (http://www.phisite.org/phigenome) and was optimised for visualisation of phage genomes with emphasis on gene regulatory elements. phiGENOME consists of three components: (i) a genome map viewer built using Adobe Flash technology, providing dynamic and interactive graphical display of phage genomes; (ii) a sequence browser based on precisely formatted HTML tags, providing detailed exploration of genome features on the sequence level; and (iii) a regulation illustrator, based on Scalable Vector Graphics (SVG) and designed for graphical representation of gene regulations. Bringing together 542 complete genome sequences accompanied by rich annotations and references makes phiGENOME a unique information resource in the field of phage genomics. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. DNA conformational change induced by the bacteriophage phi 29 connector.

    PubMed Central

    Valpuesta, J M; Serrano, M; Donate, L E; Herranz, L; Carrascosa, J L

    1992-01-01

    Translocation of viral DNA into and out of the capsid of double-stranded DNA bacteriophages occurs through the connector, a key viral structure that is known to interact with DNA. It is shown here that the phage phi 29 connector binds both linear and circular double-stranded DNA. However, DNA-mediated protection of phi 29 connectors against Staphylococcus aureus endoprotease V8 digestion suggests that binding to linear DNA is more stable than to circular DNA. Endoprotease V8-protection assays also suggest that the length of linear DNA required to produce a stable phi 29 connector-DNA interaction is at least twice that of the phi 29 connector channel. This result is confirmed by experiments on phi 29 connector protection of DNA against DNase I digestion. Furthermore, DNA circularization assays indicate that phi 29 connectors restrain negative supercoiling when bound to linear DNA. This DNA conformational change is not observed upon binding to circular DNA, and it could reflect the existence of some left-handed DNA coiling or DNA untwisting inside the phi 29 connector channel. PMID:1454519

  2. Scaling of ion expansion energy with laser flux in moderate-Z plasmas produced by lasers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gupta, P.D.; Goel, S.K.; Uppal, J.S.

    1982-09-01

    Ion expansion energy measurements in plasmas created by focusing a 1-GW, 5-nsec Nd:glass laser on plane solid targets of polythene, carbon, and aluminum are reported. It is observed that the scaling of ion expansion energy with laser flux Φ varies between Φ^0.28 and Φ^0.66 for polythene, Φ^0.28 and Φ^0.70 for carbon, and Φ^0.51 and Φ^0.44 for aluminum in the flux range 5 × 10^10 to 5 × 10^12 W/cm^2 of our experiment. The scaling is either much slower or faster than the Φ^(4/9) scaling expected from a self-regulating model for plasmas created in the low flux range. It is shown that this behavior, as well as the results of experiments on similar plasmas reported by other authors, can be explained when radiation losses and the energy spent in ionization are also considered in the self-regulating model.

  3. Defining, Describing, and Categorizing Public Health Infrastructure Priorities for Tropical Cyclone, Flood, Storm, Tornado, and Tsunami-Related Disasters.

    PubMed

    Ryan, Benjamin J; Franklin, Richard C; Burkle, Frederick M; Watt, Kerrianne; Aitken, Peter; Smith, Erin C; Leggat, Peter

    2016-08-01

    The study aim was to undertake a qualitative research literature review to analyze available databases to define, describe, and categorize public health infrastructure (PHI) priorities for tropical cyclone, flood, storm, tornado, and tsunami-related disasters. Five electronic publication databases were searched to define, describe, or categorize PHI and discuss tropical cyclone, flood, storm, tornado, and tsunami-related disasters and their impact on PHI. The data were analyzed through aggregation of individual articles to create an overall data description. The data were grouped into PHI themes, which were then prioritized on the basis of degree of interdependency. Sixty-seven relevant articles were identified. PHI was categorized into 13 themes with a total of 158 descriptors. The highest priority PHI identified was workforce. This was followed by water, sanitation, equipment, communication, physical structure, power, governance, prevention, supplies, service, transport, and surveillance. This review identified workforce as the most important of the 13 thematic areas related to PHI and disasters. If its functionality fails, workforce has the greatest impact on the performance of health services. If addressed post-disaster, the remaining forms of PHI will then be progressively addressed. These findings are a step toward providing an evidence base to inform PHI priorities in the disaster setting. (Disaster Med Public Health Preparedness. 2016;10:598-610).

  4. The occurrence of phi in dento-facial beauty of fine art from antiquity through the Renaissance.

    PubMed

    Wiener, R Constance; Wiener Pla, Regina M

    2012-01-01

    External beauty is a complex construct that influences lives and may be impacted by dentists. Beauty is not easily quantified, but one cited anthropometric of beauty is the ratio phi, the number 1.618033(...). This study examined phi as a measure of female frontal facial beauty in classic Western art, using pre-Renaissance (N = 30) and Renaissance (N = 30) artwork. Four horizontal and five vertical ratios were determined in the works of art, which were then compared with the phi ratio. All horizontal ratios for both pre-Renaissance and Renaissance artwork were similar to each other, but did not contain the phi ratio (P < 0.001). Nevertheless, all vertical ratios for pre-Renaissance and Renaissance artwork did contain the phi ratio within their confidence intervals, with the exception of the vertical ratio "intereye point to soft tissue menton / intereye point to stomion", which was found to be less than phi in the Renaissance group. The study provides evidence of the presence of the phi ratio in the vertical aspect of females in artwork from pre-Renaissance through the Renaissance, demonstrating consistent temporal preferences. Therefore, the phi ratio seems to be an important consideration in altering vertical facial dimensions in full mouth rehabilitation and reconstructive orthognathic surgery involving females.
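
    The anthropometric referred to above is the golden ratio, the positive root of x² = x + 1:

        \varphi = \frac{1 + \sqrt{5}}{2} = 1.6180339\ldots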

  5. Cytoplasmic pH influences cytoplasmic calcium in MC3T3-E1 osteoblast cells

    NASA Technical Reports Server (NTRS)

    Lin, H. S.; Hughes-Fulford, M.; Kumegawa, M.; Pitts, A. C.; Snowdowne, K. W.

    1993-01-01

    We found that the cytoplasmic concentration of calcium (Cai) of MC3T3-E1 osteoblasts was influenced by the type of pH buffer we used in the perfusing medium, suggesting that intracellular pH (pHi) might influence Cai. To study this effect, the Cai and pHi were monitored as we applied various experimental conditions known to change pHi. Exposure to NH4Cl caused a transient increase in both pHi and Cai without a change in extracellular pH (pHo). Decreasing pHo and pHi by lowering the bicarbonate concentration of the medium decreased Cai, and increasing pHi by the removal of 5% CO2 increased Cai. Clamping pHi to known values with 10 microM nigericin, a potassium proton ionophore, also influenced Cai: acid pHi lowered Cai, whereas alkaline pHi increased it. The rise in Cai appears to be very sensitive to the extracellular concentration of calcium, suggesting the existence of a pH-sensitive calcium influx mechanism. We conclude that physiologic changes in pH could modulate Cai by controlling the influx of calcium ions and could change the time course of the Cai transient associated with hormonal activation.

  6. RNA packaging device of double-stranded RNA bacteriophages, possibly as simple as hexamer of P4 protein.

    PubMed

    Kainov, Denis E; Pirttimaa, Markus; Tuma, Roman; Butcher, Sarah J; Thomas, George J; Bamford, Dennis H; Makeyev, Eugene V

    2003-11-28

    Genomes of complex viruses have been demonstrated, in many cases, to be packaged into preformed empty capsids (procapsids). This reaction is performed by molecular motors translocating nucleic acid against the concentration gradient at the expense of NTP hydrolysis. At present, the molecular mechanisms of packaging remain elusive due to the complex nature of packaging motors. In the case of the double-stranded RNA bacteriophage phi 6 from the Cystoviridae family, packaging of single-stranded genomic precursors requires a hexameric NTPase, P4. In the present study, the purified P4 proteins from two other cystoviruses, phi 8 and phi 13, were characterized and compared with phi 6 P4. All three proteins are hexameric, single-stranded RNA-stimulated NTPases with alpha/beta folds. Using a direct motor assay, we found that phi 8 and phi 13 P4 hexamers translocate 5' to 3' along ssRNA, whereas the analogous activity of phi 6 P4 requires association with the procapsid. This difference is explained by the intrinsically high affinity of phi 8 and phi 13 P4s for nucleic acids. The unidirectional translocation results in RNA helicase activity. Thus, P4 proteins of Cystoviridae exhibit extensive similarity to hexameric helicases and are simple models for studying viral packaging motor mechanisms.

  7. Measurements of CP Asymmetries in the Decay B --> {phi}K

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aubert, B

    The authors present a preliminary measurement of the time-dependent CP asymmetry for the neutral B-meson decay B⁰ → φK⁰. They use a sample of approximately 227 million B-meson pairs recorded at the Υ(4S) resonance with the BABAR detector at the PEP-II B-meson Factory at SLAC. They reconstruct the CP eigenstates φK_S⁰ and φK_L⁰, where φ → K⁺K⁻, K_S⁰ → π⁺π⁻, and K_L⁰ is observed via its hadronic interactions. The other B meson in the event is tagged as either a B⁰ or a B̄⁰ from its decay products. The values of the CP-violation parameters derived from the combined φK⁰ dataset are S_φK = +0.50 ± 0.25 (stat.) +0.07/−0.04 (syst.) and C_φK = 0.00 ± 0.23 (stat.) ± 0.05 (syst.). In addition, the authors measure the CP-violating charge asymmetry A_CP(B⁺ → φK⁺) = 0.054 ± 0.056 (stat.) ± 0.012 (syst.). All results are preliminary.
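
    For context, in the convention used at the B factories the time-dependent asymmetry behind these parameters has the textbook form (not quoted from this record):

        \mathcal{A}_{CP}(\Delta t) = S_{\phi K}\,\sin(\Delta m_d\,\Delta t) - C_{\phi K}\,\cos(\Delta m_d\,\Delta t)

    where Δm_d is the B⁰-B̄⁰ mixing frequency and Δt the decay-time difference; in the Standard Model S_φK is expected to be close to sin2β for this mode.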

  8. Kinetics and metabolism of physostigmine in rat in the presence of soman

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khalique, A.; Somani, S.M.

    1986-03-01

    The effect of soman (105 μg/kg; 1.5 LD₅₀, s.c.) administration on the pharmacokinetics and metabolism of ³H-physostigmine (Phy) was studied in rats. The rats were pretreated with either Phy 100 μg/kg i.v. or 500 μg/kg i.m., 5 or 15 min prior to soman administration. Phy and metabolites were determined in plasma and brain by HPLC. The half-life of Phy in plasma after i.v. administration was 15.5 min both in the presence and absence of soman; however, the t½ in brain was 11 min and 13 min, respectively. Clearance was 71.4 ml/min/kg in the Phy-treated rat and 90 ml/min/kg in the presence of soman. The half-life of Phy in plasma was 18 min and 17 min, and in brain 17 min and 15 min, respectively, in the absence and presence of soman after the i.m. dose of Phy. Clearance after Phy treatment was 85.2 ml/min/kg; however, in the presence of soman it was 66.7 ml/min/kg. Phy was slightly less metabolized to eseroline and two other metabolites, M₁ and M₂, in the presence of soman after i.v. as well as after i.m. administration in plasma and brain.

  9. PHI-base: a new interface and further additions for the multi-species pathogen-host interactions database.

    PubMed

    Urban, Martin; Cuzick, Alayne; Rutherford, Kim; Irvine, Alistair; Pedro, Helder; Pant, Rashmi; Sadanadan, Vidyendra; Khamari, Lokanath; Billal, Santoshkumar; Mohanty, Sagar; Hammond-Kosack, Kim E

    2017-01-04

    The pathogen-host interactions database (PHI-base) is available at www.phi-base.org. PHI-base contains expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions reported in peer-reviewed research articles. In addition, literature that indicates specific gene alterations that did not affect the disease interaction phenotype is curated to provide complete datasets for comparative purposes. Viruses are not included. Here we describe a revised PHI-base Version 4 data platform with improved search, filtering and extended data display functions. A PHIB-BLAST search function is provided and a link to PHI-Canto, a tool for authors to directly curate their own published data into PHI-base. The new release of PHI-base Version 4.2 (October 2016) has an increased data content containing information from 2219 manually curated references. The data provide information on 4460 genes from 264 pathogens tested on 176 hosts in 8046 interactions. Prokaryotic and eukaryotic pathogens are represented in almost equal numbers. Host species are ∼70% plants and 30% other species of medical and/or environmental importance. Additional data types included in PHI-base 4 are the direct targets of pathogen effector proteins in experimental and natural host organisms. The curation problems encountered and the future directions of the PHI-base project are briefly discussed. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Novel phytochrome sequences in Arabidopsis thaliana: Structure, evolution, and differential expression of a plant regulatory photoreceptor family

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharrock, R.A.; Quail, P.H.

    1989-01-01

    Phytochrome is a plant regulatory photoreceptor that mediates red light effects on a wide variety of physiological and molecular responses. DNA blot analysis indicates that the Arabidopsis thaliana genome contains four to five phytochrome-related gene sequences. The authors have isolated and sequenced cDNA clones corresponding to three of these genes and have deduced the amino acid sequence of the full-length polypeptide encoded in each case. One of these proteins (phyA) shows 65-80% amino acid sequence identity with the major, etiolated-tissue phytochrome apoproteins described previously in other plant species. The other two polypeptides (phyB and phyC) are unique in that they have low sequence identity with each other, with phyA, and with all previously described phytochromes. The phyA, phyB, and phyC proteins are of similar molecular mass, have related hydropathic profiles, and contain a conserved chromophore attachment region. However, the sequence comparison data indicate that the three phy genes diverged early in plant evolution, well before the divergence of the two major groups of angiosperms, the monocots and dicots. The steady-state level of the phyA transcript is high in dark-grown A. thaliana seedlings and is down-regulated by light. In contrast, the phyB and phyC transcripts are present at lower levels and are not strongly light-regulated. These findings indicate that the red/far red light-responsive phytochrome photoreceptor system in A. thaliana, and perhaps in all higher plants, consists of a family of chromoproteins that are heterogeneous in structure and regulation.

  11. Time course of cholinesterase activity in plasma, brain and muscle of rat pretreated with physostigmine, and then soman

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giacobini, E.; Boyer, A.; Somani, S.M.

    1986-03-05

    Time course of ³H-physostigmine (Phy) concentration and cholinesterase (ChE) activity in plasma and tissues was studied in rats pretreated with Phy and then soman. Rats were dosed with Phy (100 μg/kg, i.v.), 5 or 15 min prior to soman (105 μg/kg, 1.5 LD₅₀, s.c.) treatment and were sacrificed at various times; Phy conc. and ChE activity were determined. BuChE activity in plasma was 5% of control from 7-30 min after Phy i.v. pretreatment and soman, or soman alone, treatment. Plasma Phy conc. steadily declined (32.6 ng/ml at 7 min) to 15 ng/ml at 30 min. ChE activity in muscle was 60-50% of control for Phy pretreated rats, but soman alone gave 85-72% activity from 2-30 min. Brain ChE activity was about 5% of control within 2 min after soman treatment; however, with Phy pretreatment, the activity was about 52% at 7 min and 40% at 22 min, which recovered to 45% of control at 35 min, indicating that Phy protected brain ChE. Brain Phy conc. steadily declined (58.6 ng/g at 7 min) to 11.7 ng/g at 30 min. However, pretreatment of rats with a higher dose of Phy and then soman showed BuChE in plasma and ChE in brain and muscle to be about 25, 35 and 51%, in comparison to about 5% in plasma and brain with soman alone treatment, indicating higher protection of the ChE enzyme with a higher conc. of Phy in plasma and brain.

  12. A phytochrome/phototropin chimeric photoreceptor of fern functions as a blue/far-red light-dependent photoreceptor for phototropism in Arabidopsis.

    PubMed

    Kanegae, Takeshi; Kimura, Izumi

    2015-08-01

    In the fern Adiantum capillus-veneris, the phototropic response of the protonemal cells is induced by blue light and partially inhibited by subsequent irradiation with far-red light. This observation strongly suggests the existence of a phytochrome that mediates this blue/far-red reversible response; however, the phytochrome responsible for this response has not been identified. PHY3/NEO1, one of the three phytochrome genes identified in Adiantum, encodes a chimeric photoreceptor composed of both a phytochrome and a phototropin domain. It was demonstrated that phy3 mediates the red light-dependent phototropic response of Adiantum, and that phy3 potentially functions as a phototropin. These findings suggest that phy3 is the phytochrome that mediates the blue/far-red response in Adiantum protonemata. In the present study, we expressed Adiantum phy3 in a phot1 phot2 phototropin-deficient Arabidopsis line, and investigated the ability of phy3 to induce phototropic responses under various light conditions. Blue light irradiation clearly induced a phototropic response in the phy3-expressing transgenic seedlings, and this effect was fully inhibited by simultaneous irradiation with far-red light. In addition, experiments using amino acid-substituted phy3 indicated that FMN-cysteinyl adduct formation in the light, oxygen, voltage (LOV) domain was not necessary for the induction of blue light-dependent phototropism by phy3. We thus demonstrate that phy3 is the phytochrome that mediates the blue/far-red reversible phototropic response in Adiantum. Furthermore, our results imply that phy3 can function as a phototropin, but that it acts principally as a phytochrome that mediates both the red/far-red and blue/far-red light responses. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.

  13. Tumor-cytolytic human macrophages cultured as nonadherent cells: potential for the adoptive immunotherapy of cancer.

    PubMed

    Helinski, E H; Hurley, E L; Streck, R J; Bielat, K L; Pauly, J L

    1990-01-01

    Tumor-cytolytic lymphokine (e.g., interleukin-2; IL-2)-activated killer cells are currently being evaluated in IL-2/LAK cell adoptive immunotherapy regimens for the treatment of cancer. Monocyte-derived macrophages (M phi) are also known to be efficient tumor killer cells; accordingly, M phi that have been activated in vitro may also be of therapeutic merit. However, attempts to cultivate M phi for morphological and functional studies have often been compromised because M phi adhere rapidly and tenaciously to cultureware. Studies that we have conducted to address this problem have proven successful in developing procedures for the long-term cultivation of non-adherent immunocompetent M phi in serum-free medium using petri dishes containing a thin Teflon liner. The utility of this technology is documented by the results of studies presented herein in which light and scanning electron microscopy was used to analyze tumor-cytolytic human M phi. In these experiments, we demonstrated that nonadherent immunocompetent human M phi can be prepared for detailed examinations of their pleomorphic membrane architecture. Moreover, nonadherent human M phi could readily be collected for preparing conjugates of M phi and tumor cells. It is anticipated that this technology should prove useful for future structure-function studies defining the topographical location and spatial distribution of antigens and receptors on M phi membrane ultrastructures, particularly the microvilli-like projections that bridge together an immunocompetent effector M phi and target cell (e.g., tumor cells and microbial pathogens) and which provide the physical interaction required for the initial phases of a cellular immune response that includes antigen recognition and cell-to-cell adhesion.

  14. Expression of Aspergillus nidulans phy Gene in Nicotiana benthamiana Produces Active Phytase with Broad Specificities

    PubMed Central

    Oh, Tae-Kyun; Oh, Sung; Kim, Seongdae; Park, Jae Sung; Vinod, Nagarajan; Jang, Kyung Min; Kim, Sei Chang; Choi, Chang Won; Ko, Suk-Min; Jeong, Dong Kee; Udayakumar, Rajangam

    2014-01-01

    A full-length phytase gene (phy) of Aspergillus nidulans was amplified from the cDNA library by polymerase chain reaction (PCR), and it was introduced into a bacterial expression vector, pET-28a. The recombinant protein (rPhy-E, 56 kDa) was overexpressed in the insoluble fraction of Escherichia coli culture, purified by Ni-NTA resin under denaturing conditions and injected into rats as an immunogen. To express A. nidulans phytase in a plant, the full-length of phy was cloned into a plant expression binary vector, pPZP212. The resultant construct was tested for its transient expression by Agrobacterium-infiltration into Nicotiana benthamiana leaves. Compared with a control, the agro-infiltrated leaf tissues showed the presence of phy mRNA and its high expression level in N. benthamiana. The recombinant phytase (rPhy-P, 62 kDa) was strongly reacted with the polyclonal antibody against the nonglycosylated rPhy-E. The rPhy-P showed glycosylation, two pH optima (pH 4.5 and pH 5.5), an optimum temperature at 45~55 °C, thermostability and broad substrate specificities. After deglycosylation by peptide-N-glycosidase F (PNGase-F), the rPhy-P significantly lost the phytase activity and retained 1/9 of the original activity after 10 min of incubation at 45 °C. Therefore, the deglycosylation caused a significant reduction in enzyme thermostability. In animal experiments, oral administration of the rPhy-P at 1500 U/kg body weight/day for seven days caused a significant reduction of phosphorus excretion by 16% in rat feces. Besides, the rPhy-P did not result in any toxicological changes and clinical signs. PMID:25192284

  15. Genome of Enterobacteriophage Lula/phi80 and Insights into Its Ability To Spread in the Laboratory Environment

    PubMed Central

    Rotman, Ella; Kouzminova, Elena; Plunkett, Guy

    2012-01-01

    The novel temperate bacteriophage Lula, contaminating laboratory Escherichia coli strains, turned out to be the well-known lambdoid phage phi80. Our previous studies revealed that two characteristics of Lula/phi80 facilitate its spread in the laboratory environment: cryptic lysogen productivity and stealthy infectivity. To understand the genetics/genomics behind these traits, we sequenced and annotated the Lula/phi80 genome, encountering an E. coli-toxic gene revealed as a gap in the sequencing contig and analyzing a few genes in more detail. Lula/phi80's genome layout copies that of lambda, yet homology with other lambdoid phages is mostly limited to the capsid genes. Lula/phi80's DNA is resistant to cutting with several restriction enzymes, suggesting DNA modification, but deletion of the phage's damL gene, coding for DNA adenine methylase, did not make DNA cuttable. The damL mutation of Lula/phi80 also did not change the phage titer in lysogen cultures, whereas the host dam mutation did increase it almost 100-fold. Since the high phage titer in cultures of Lula/phi80 lysogens is apparently in response to endogenous DNA damage, we deleted the only Lula/phi80 SOS-controlled gene, dinL. We found that dinL mutant lysogens release fewer phage in response to endogenous DNA damage but are unchanged in their response to external DNA damage. The toxic gene of Lula/phi80, gamL, encodes an inhibitor of the host ATP-dependent exonucleases, RecBCD and SbcCD. Its own antidote, agt, apparently encoding a modifier protein, was found nearby. Interestingly, Lula/phi80 lysogens are recD and sbcCD phenocopies, so GamL and Agt are part of lysogenic conversion. PMID:23042999

  16. Relationship between intracellular pH and proton mobility in rat and guinea-pig ventricular myocytes.

    PubMed

    Swietach, Pawel; Vaughan-Jones, Richard D

    2005-08-01

    Intracellular H+ mobility in eukaryotic cells is low because of intracellular buffering. We have investigated whether intracellular H+ (H+i) mobility varies with pHi. A dual microperfusion apparatus was used to expose guinea-pig or rat myocytes to small localized doses (3–5 mM) of ammonium chloride (applied in Hepes-buffered solution). Intracellular pH (pHi) was monitored confocally using the fluorescent dye carboxy-SNARF-1. Local ammonium exposure produced a stable, longitudinal pHi gradient. Its size was fed into a look-up table (LUT) to give an estimate of the apparent intracellular proton diffusion coefficient, Dapp(H). LUTs were generated using a diffusion-reaction model of H+i mobility based on intracellular buffer diffusion. To examine the pHi sensitivity of Dapp(H), whole-cell pHi was initially displaced using a whole-cell ammonium or acetate prepulse before locally applying the low dose of ammonium. In both rat and guinea-pig, Dapp(H) decreased with pHi over the range 7.5–6.5. In separate pipette-loading experiments, the intracellular diffusion coefficient for carboxy-SNARF-1 (a mobile-buffer analogue) exhibited no significant pHi dependence. The pHi sensitivity of Dapp(H) is thus likely to be governed by the mobile fraction of intrinsic buffering capacity. These results reinforce the buffer hypothesis of H+i mobility. The pHi dependence of Dapp(H) was used to characterize the mobile and fixed buffer components, and to estimate Dmob (the average diffusion coefficient for intracellular mobile buffer). One consequence of a decline in H+i mobility at low pHi is that it will predispose the myocardium to pHi nonuniformity. The physiological relevance of this is discussed.
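
    The decline of Dapp(H) with acidification has a compact interpretation in the buffer-shuttle picture: protons travel mainly while bound to diffusible buffers, so the apparent diffusion coefficient is the mobile-buffer diffusion coefficient weighted by the mobile share of total buffering. A minimal sketch of that standard relation (a textbook simplification from the buffering literature, not the authors' full diffusion-reaction model):

    $$ D_{\mathrm{app}}(\mathrm{H}) \;\approx\; D_{\mathrm{mob}} \, \frac{\beta_{\mathrm{mob}}}{\beta_{\mathrm{mob}} + \beta_{\mathrm{fix}}} $$

    A pHi-dependent mobile fraction, βmob/(βmob + βfix), then translates directly into the pHi-dependent Dapp(H) measured above.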

  17. Hydrogen ion dynamics in human red blood cells

    PubMed Central

    Swietach, Pawel; Tiffert, Teresa; Mauritz, Jakob M A; Seear, Rachel; Esposito, Alessandro; Kaminski, Clemens F; Lew, Virgilio L; Vaughan-Jones, Richard D

    2010-01-01

    Our understanding of pH regulation within red blood cells (RBCs) has been inferred mainly from indirect experiments rather than from in situ measurements of intracellular pH (pHi). The present work shows that carboxy-SNARF-1, a pH fluorophore, when used with confocal imaging or flow cytometry, reliably reports pHi in individual, human RBCs, provided intracellular fluorescence is calibrated using a ‘null-point’ procedure. Mean pHi was 7.25 in CO2/HCO3−-buffered medium and 7.15 in Hepes-buffered medium, and varied linearly with extracellular pH (slope of 0.77). Intrinsic (non-CO2/HCO3−-dependent) buffering power, estimated in the intact cell (85 mmol (l cell)−1 (pH unit)−1 at resting pHi), was somewhat higher than previous estimates from cell lysates (50–70 mmol (l cell)−1 (pH unit)−1). Acute displacement of pHi (superfusion of weak acids/bases) triggered rapid pHi recovery. This was mediated via membrane Cl−/HCO3− exchange (the AE1 gene product), irrespective of whether recovery was from an intracellular acid or base load, and with no evident contribution from other transporters such as Na+/H+ exchange. H+-equivalent flux through AE1 was a linear function of [H+]i and reversed at resting pHi, indicating that its activity is not allosterically regulated by pHi, in contrast to other AE isoforms. By simultaneously monitoring pHi and markers of cell volume, a functional link between membrane ion transport, volume and pHi was demonstrated. RBC pHi is therefore tightly regulated via AE1 activity, but modulated during changes of cell volume. A comparable volume–pHi link may also be important in other cell types expressing anion exchangers. Direct measurement of pHi should be useful in future investigations of RBC physiology and pathology. PMID:20962000
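
    For reference, the intrinsic buffering power quoted above (85 mmol (l cell)−1 (pH unit)−1) is defined operationally from the pHi displacement produced by a measured acid or base load; this is the standard definition, not a relation specific to this study:

    $$ \beta_{i} \;=\; -\,\frac{\Delta[\mathrm{acid}]_{i}}{\Delta \mathrm{pH}_{i}} $$

    so an acid load of 8.5 mmol per litre of cells would displace resting pHi by only about 0.1 units.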

  18. European Scientific Notes. Volume 35, Number 12,

    DTIC Science & Technology

    1981-12-31

    ...has been redesigned to work with the Intel 8085 microprocessor; it has the ... operational set. ... A. Osorio, which was organized some 3 years ago and contains about half of the ... MOISE is based on the Intel 8085A microprocessor, and ... FACILITY software interface; a Research Signal Processor (RSP) using reduced-computational-complexity algorithms for ... an attempt to derive a set of invariants upon which virtually speaker-invariant ... It has been IBM International's ...

  19. An Analysis of Hardware-Assisted Virtual Machine Based Rootkits

    DTIC Science & Technology

    2014-06-01

    ...certain aspects of TPM implementation, just to name a few. HyperWall is an architecture proposed by Szefer and Lee to protect guest VMs from... The use of virtual machine (VM) technology has expanded rapidly since AMD and Intel implemented... Intel VT-x implementations of Blue Pill to identify commonalities in the respective versions' attack methodologies from both a functional and technical...

  20. The Employer-Led Health Care Revolution.

    PubMed

    McDonald, Patricia A; Mecklenburg, Robert S; Martin, Lindsay A

    2015-01-01

    To tame its soaring health care costs, Intel tried many popular approaches: "consumer-driven health care" offerings such as high-deductible/low-premium plans, on-site clinics, and employee wellness programs. But by 2009 Intel realized that those programs alone would not enable the company to solve the problem, because they didn't affect its root cause: the steadily rising cost of the care employees and their families were receiving. Intel projected that its health care expenditures would hit a whopping $1 billion by 2012. So the company decided to try a novel approach. As a large purchaser of health services, with expertise in quality improvement and supplier management, Intel was uniquely positioned to drive transformation in its local health care market. The company decided to manage the quality and cost of its health care suppliers with the same rigor it applied to its equipment suppliers. It spearheaded a collaborative effort in Portland, Oregon, that included two health systems, a plan administrator, and a major government employer. So far the Portland collaborative has reduced treatment costs for certain medical conditions by 24% to 49%, improved patient satisfaction, and eliminated over 10,000 hours' worth of waste in the two health systems' business processes.

  1. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD, and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling --- sometimes encouraged by restricted GPU memory --- NVLink is less important.
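
    To make the two derived metrics in this comparison concrete, here is a minimal Python sketch of scaling efficiency and performance per watt; every number in it is a hypothetical placeholder, not a measurement from the study:

    ```python
    # Hypothetical figures for illustration; none of these numbers are
    # measurements from the study above.

    def scaling_efficiency(t_one: float, t_n: float, n: int) -> float:
        """Strong-scaling efficiency: 1.0 means n workers give an n-fold speedup."""
        return t_one / (n * t_n)

    def perf_per_watt(throughput: float, watts: float) -> float:
        """Throughput (e.g., images/s) normalized by measured board power."""
        return throughput / watts

    # Example: one device processes an epoch in 100 s; eight devices take 16 s.
    print(f"efficiency: {scaling_efficiency(100.0, 16.0, 8):.2f}")   # -> 0.78
    print(f"images/s/W: {perf_per_watt(2200.0, 300.0):.2f}")         # -> 7.33
    ```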

  2. EUV mask pilot line at Intel Corporation

    NASA Astrophysics Data System (ADS)

    Stivers, Alan R.; Yan, Pei-Yang; Zhang, Guojing; Liang, Ted; Shu, Emily Y.; Tejnil, Edita; Lieberman, Barry; Nagpal, Rajesh; Hsia, Kangmin; Penn, Michael; Lo, Fu-Chang

    2004-12-01

    The introduction of extreme ultraviolet (EUV) lithography into high volume manufacturing requires the development of a new mask technology. In support of this, Intel Corporation has established a pilot line devoted to encountering and eliminating barriers to manufacturability of EUV masks. It concentrates on EUV-specific process modules and makes use of the captive standard photomask fabrication capability of Intel Corporation. The goal of the pilot line is to accelerate EUV mask development to intersect the 32nm technology node. This requires EUV mask technology to be comparable to standard photomask technology by the beginning of the silicon wafer process development phase for that technology node. The pilot line embodies Intel's strategy to lead EUV mask development in the areas of the mask patterning process, mask fabrication tools, the starting material (blanks) and the understanding of process interdependencies. The patterning process includes all steps from blank defect inspection through final pattern inspection and repair. We have specified and ordered the EUV-specific tools and most will be installed in 2004. We have worked with International Sematech and others to provide for the next generation of EUV-specific mask tools. Our process of record is run repeatedly to ensure its robustness. This primes the supply chain and collects information needed for blank improvement.

  3. Prostate Health Index (Phi) and Prostate Cancer Antigen 3 (PCA3) Significantly Improve Prostate Cancer Detection at Initial Biopsy in a Total PSA Range of 2–10 ng/ml

    PubMed Central

    Perdonà, Sisto; Marino, Ada; Mazzarella, Claudia; Perruolo, Giuseppe; D’Esposito, Vittoria; Cosimato, Vincenzo; Buonerba, Carlo; Di Lorenzo, Giuseppe; Musi, Gennaro; De Cobelli, Ottavio; Chun, Felix K.; Terracciano, Daniela

    2013-01-01

    Many efforts to reduce prostate specific antigen (PSA) overdiagnosis and overtreatment have been made. To this aim, the Prostate Health Index (Phi) and Prostate Cancer Antigen 3 (PCA3) have been proposed as new, more specific biomarkers. We evaluated the ability of phi and PCA3 to identify prostate cancer (PCa) at initial prostate biopsy in men with a total PSA in the range of 2–10 ng/ml. The performance of phi and PCA3 was evaluated in 300 patients undergoing first prostate biopsy. ROC curve analyses tested the accuracy (AUC) of phi and PCA3 in predicting PCa. Decision curve analyses (DCA) were used to compare the clinical benefit of the two biomarkers. We found that the AUC value of phi (0.77) was comparable to those of %p2PSA (0.76) and PCA3 (0.73), with no significant differences in pairwise comparison (%p2PSA vs. phi p = 0.673, %p2PSA vs. PCA3 p = 0.417 and phi vs. PCA3 p = 0.247). These three biomarkers significantly outperformed fPSA (AUC = 0.60), %fPSA (AUC = 0.62) and p2PSA (AUC = 0.63). In the DCA, phi and PCA3 exhibited very close net-benefit profiles up to a threshold probability of 25%, beyond which the phi index showed a higher net benefit than PCA3. Multivariable analysis showed that the addition of phi and PCA3 to the base multivariable model (age, PSA, %fPSA, DRE, prostate volume) increased predictive accuracy, whereas no model improved single biomarker performance. Finally, we showed that subjects with active surveillance (AS)-compatible cancer had significantly lower phi and PCA3 values (p<0.001 and p = 0.01, respectively). In conclusion, both phi and PCA3 comparably increase the accuracy of predicting the presence of PCa in the total PSA range of 2–10 ng/ml at initial biopsy, outperforming the currently used %fPSA. PMID:23861782

  4. Prostate Health Index (Phi) and Prostate Cancer Antigen 3 (PCA3) significantly improve prostate cancer detection at initial biopsy in a total PSA range of 2-10 ng/ml.

    PubMed

    Ferro, Matteo; Bruzzese, Dario; Perdonà, Sisto; Marino, Ada; Mazzarella, Claudia; Perruolo, Giuseppe; D'Esposito, Vittoria; Cosimato, Vincenzo; Buonerba, Carlo; Di Lorenzo, Giuseppe; Musi, Gennaro; De Cobelli, Ottavio; Chun, Felix K; Terracciano, Daniela

    2013-01-01

    Many efforts to reduce prostate specific antigen (PSA) overdiagnosis and overtreatment have been made. To this aim, the Prostate Health Index (Phi) and Prostate Cancer Antigen 3 (PCA3) have been proposed as new, more specific biomarkers. We evaluated the ability of phi and PCA3 to identify prostate cancer (PCa) at initial prostate biopsy in men with a total PSA in the range of 2-10 ng/ml. The performance of phi and PCA3 was evaluated in 300 patients undergoing first prostate biopsy. ROC curve analyses tested the accuracy (AUC) of phi and PCA3 in predicting PCa. Decision curve analyses (DCA) were used to compare the clinical benefit of the two biomarkers. We found that the AUC value of phi (0.77) was comparable to those of %p2PSA (0.76) and PCA3 (0.73), with no significant differences in pairwise comparison (%p2PSA vs. phi p = 0.673, %p2PSA vs. PCA3 p = 0.417 and phi vs. PCA3 p = 0.247). These three biomarkers significantly outperformed fPSA (AUC = 0.60), %fPSA (AUC = 0.62) and p2PSA (AUC = 0.63). In the DCA, phi and PCA3 exhibited very close net-benefit profiles up to a threshold probability of 25%, beyond which the phi index showed a higher net benefit than PCA3. Multivariable analysis showed that the addition of phi and PCA3 to the base multivariable model (age, PSA, %fPSA, DRE, prostate volume) increased predictive accuracy, whereas no model improved single biomarker performance. Finally, we showed that subjects with active surveillance (AS)-compatible cancer had significantly lower phi and PCA3 values (p<0.001 and p = 0.01, respectively). In conclusion, both phi and PCA3 comparably increase the accuracy of predicting the presence of PCa in the total PSA range of 2-10 ng/ml at initial biopsy, outperforming the currently used %fPSA.

  5. The phi-meson and Chiral-mass-meson production in heavy-ion collisions as potential probes of quark-gluon-plasma and Chiral symmetry transitions

    NASA Technical Reports Server (NTRS)

    Takahashi, Y.; Eby, P. B.

    1985-01-01

    Possibilities of observing abundances of phi mesons and narrow hadronic pairs, as results of QGP and chiral transitions, are considered for nucleus-nucleus interactions. Kinematical requirements for forming close pairs are satisfied in K+K- decays of S(975) and delta(980) mesons with small pT, and of phi(1020) mesons with large pT, and in pi-pi decays of familiar resonance mesons only under partially restored chiral symmetry. Gluon-gluon dominance in QGP can enhance phi meson production. High hadronization rates of primordial resonance mesons which form narrow hadronic pairs are not implausible. Past cosmic ray evidence of anomalous phi production and narrow pair abundances is considered.

  6. Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors

    DTIC Science & Technology

    2009-09-01

    ...TFLOPS of PlayStation 3 (PS3) nodes with IBM Cell Broadband Engine multi-cores and 15 dual-quad Xeon head nodes. The interconnect fabric includes...

  7. IRBIT plays an important role in NHE3-mediated pHi regulation in HSG cells.

    PubMed

    Tran, Tien Manh; Park, Moon-Yong; Lee, Jiyeon; Bae, Jun-Seok; Hwang, Sung-Min; Choi, Se-Young; Mikoshiba, Katsuhiko; Park, Kyungpyo

    2013-07-19

    Expression of inositol-1,4,5-trisphosphate (IP3) receptor-binding protein (IRBIT) has been reported in epithelial cells. However, its role in pHi regulation is not well understood. In this study, we investigated the role of IRBIT in pHi regulation mediated by Na+/H+ exchangers (NHEs) in salivary glands. We measured pHi recovery from cell acidification in BCECF-loaded salivary HSG cells. Western blot and co-immunoprecipitation (Co-IP) assays were also performed, showing that NHE1, 2 and 3 are expressed and that IRBIT binds to NHE3. HOE642, a specific NHE1 blocker, inhibited pHi recovery, but 40% of pHi recovery was still observed even at the highest concentration of HOE642. Furthermore, pretreatment of the cells with siIRBIT significantly inhibited pHi recovery, indicating that NHE3 potentially plays a role in pHi recovery as well. The amount of membrane-localized NHE3 and its interaction with IRBIT are also significantly increased by cell acidification. In addition, we found that Ste20p-related proline-alanine-rich kinase (SPAK) reverses the effect of IRBIT on membrane NHE3 translocation. Taken together, we conclude that IRBIT plays an important role in pHi regulation mediated by NHE3, which is further regulated by SPAK. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Characterisation of transition state structures for protein folding using 'high', 'medium' and 'low' Phi-values.

    PubMed

    Geierhaas, Christian D; Salvatella, Xavier; Clarke, Jane; Vendruscolo, Michele

    2008-03-01

    It has been suggested that Phi-values, which allow structural information about transition states (TSs) for protein folding to be obtained, are most reliably interpreted when divided into three classes (high, medium and low). High Phi-values indicate almost completely folded regions in the TS, medium Phi-values indicate regions with a detectable amount of structure, and low Phi-values indicate mostly unstructured regions. To explore the extent to which this classification can be used to characterise in detail the structure of TSs for protein folding, we used Phi-values divided into these classes as restraints in molecular dynamics simulations. This type of procedure is related to that used in NMR spectroscopy to define the structure of native proteins from the measurement of inter-proton distances derived from nuclear Overhauser effects. We illustrate this approach by determining the TS ensembles of five proteins and by showing that the results are similar to those obtained by using as restraints the actual numerical Phi-values measured experimentally. Our results indicate that the simultaneous consideration of a set of low-resolution Phi-values can provide sufficient information for characterising the architecture of a TS for folding of a protein.
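
    For readers outside the field, the Phi-value being classified here is, in the standard protein-engineering definition (a textbook relation, not something introduced by this study), the ratio of the mutation-induced change in transition-state stability to the change in native-state stability, both measured relative to the unfolded state:

    $$ \Phi \;=\; \frac{\Delta\Delta G_{\mathrm{TS}}}{\Delta\Delta G_{\mathrm{N}}} $$

    so Phi ≈ 1 marks residues whose native contacts are essentially fully formed in the TS, and Phi ≈ 0 marks residues that are still unstructured there.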

  9. A novel protein tyrosine phosphatase-like phytase from Lactobacillus fermentum NKN51: Cloning, characterization and application in mineral release for food technology applications.

    PubMed

    Sharma, Rekha; Kumar, Piyush; Kaushal, Vandana; Das, Rahul; Kumar Navani, Naveen

    2018-02-01

    A novel protein tyrosine phosphatase-like phytase (PTPLP), designated PhyLf, from the probiotic bacterium Lactobacillus fermentum NKN51 was identified, cloned, expressed and characterized. The recombinant PhyLf showed a specific activity of 174.5 U/mg. PhyLf exhibited strict specificity towards phytate, with an optimum temperature of 60 °C, an optimum pH of 5.0 and an optimum ionic strength of 100 mM. The Km and kcat of PhyLf for phytate were 0.773 mM and 84.31 s−1, respectively. PhyLf exhibited high resistance to oxidative inactivation. PhyLf shares no homology with reported PTPLPs apart from the active site, warranting its classification as a new subclass. Dephytinization of durum wheat and finger millet under in vitro gastrointestinal conditions using PhyLf enhanced the bioaccessibility of mineral ions. Probiotic origin, phytate specificity, and resistance to oxidative environments and the gastric milieu, coupled with the ability to release micronutrients, are unique properties of PhyLf that present a strong case for its use in improving the nutritional value of cereals and animal feed. Copyright © 2017 Elsevier Ltd. All rights reserved.
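
    As a quick sanity check on the kinetic constants above, the Michaelis-Menten rate law and the derived catalytic efficiency (a standard calculation, using the reported values) give:

    $$ v = \frac{k_{\mathrm{cat}}[\mathrm{E}]_{0}[\mathrm{S}]}{K_{\mathrm{m}} + [\mathrm{S}]}, \qquad \frac{k_{\mathrm{cat}}}{K_{\mathrm{m}}} = \frac{84.31\ \mathrm{s^{-1}}}{0.773 \times 10^{-3}\ \mathrm{M}} \approx 1.1 \times 10^{5}\ \mathrm{M^{-1}\,s^{-1}} $$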

  10. Student Intern Ben Freed Competes as Finalist in Intel STS Competition, Three Other Interns Named Semifinalists | Poster

    Cancer.gov

    By Ashley DeVine, Staff Writer Werner H. Kirstin (WHK) student intern Ben Freed was one of 40 finalists to compete in the Intel Science Talent Search (STS) in Washington, DC, in March. “It was seven intense days of interacting with amazing judges and incredibly smart and interesting students. We met President Obama, and then the MIT astronomy lab named minor planets after each

  11. Introducing Argonne’s Theta Supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    Theta, the Argonne Leadership Computing Facility’s (ALCF) new Intel-Cray supercomputer, is officially open to the research community. Theta’s massively parallel, many-core architecture puts the ALCF on the path to Aurora, the facility’s future Intel-Cray system. Capable of nearly 10 quadrillion calculations per second, Theta enables researchers to break new ground in scientific investigations that range from modeling the inner workings of the brain to developing new materials for renewable energy applications.

  12. Electronics Industry Study Report: Semiconductors and Defense Electronics

    DTIC Science & Technology

    2003-01-01

    ...Access Memory (DRAM) chips and microprocessors. Samsung, Micron, Hynix, and Infineon control almost three-fourths of the DRAM market, while Intel alone... [fragment of a vendor-ranking table: rank 2002/2001, company, country, 2001 sales ($B), 2002 sales ($B), % change, % 2002 market share; e.g., 1/1 Intel, U.S., 23.7, 24.0, 1%, 16.9%; 2/3 Samsung Semiconductor, S. Korea, 6.3...] ...located in four major regions: the United States, Europe, Japan, and the Asia-Pacific region (includes South Korea, China, Singapore, Malaysia, Taiwan...

  13. Perfmon2: a leap forward in performance monitoring

    NASA Astrophysics Data System (ADS)

    Jarp, S.; Jurga, R.; Nowak, A.

    2008-07-01

    This paper describes perfmon2, a software component about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice, and details how the CERN openlab team has participated in the testing and development of these tools.
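
    As a present-day illustration of the kind of counter access such an interface standardizes, one might sample basic PMU events as below. Note the hedge: this sketch drives the Linux perf tool, built on the perf_events interface that ultimately superseded perfmon2 in the mainline kernel; it is not the perfmon2/pfmon toolchain described in the paper.

    ```python
    import subprocess

    # Count cycles and instructions for a child process via the Linux 'perf'
    # tool (perf_events interface, not perfmon2); '-x ,' emits CSV on stderr.
    # Requires a system with 'perf' installed and permission to read counters.
    cmd = ["perf", "stat", "-x", ",", "-e", "cycles,instructions", "--", "sleep", "1"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    for line in result.stderr.strip().splitlines():
        fields = line.split(",")
        if len(fields) >= 3:
            print(fields[2], fields[0])   # event name, counter value
    ```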

  14. Regionalization: The Cure for an Ailing Intelligence Career Field

    DTIC Science & Technology

    2013-03-01

    ...To borrow a marketing analogy, business-to-business (B2B) marketing is critical when operating in a resource-constrained environment. "If left... Intel is the ultimate ingredient brand. It makes zero sales to end consumers, yet Intel built a consumer demand pull for its chips that required... the ultimate ingredient brand, complementary to MI but distinct and in high demand. A Vision for FA34 – The Regionalization Argument: Vision is...

  15. A new parallel-vector finite element analysis software on distributed-memory computers

    NASA Technical Reports Server (NTRS)

    Qin, Jiangning; Nguyen, Duc T.

    1993-01-01

    A new parallel-vector finite element analysis software package, MPFEA (Massively Parallel-vector Finite Element Analysis), is developed for large-scale structural analysis on massively parallel computers with distributed memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. A block-skyline storage scheme, along with vector-unrolling techniques, is used to enhance vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
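
    The overlap of communication with arithmetic described above is the classic nonblocking-message pattern. A minimal sketch in Python with mpi4py follows; this is illustrative only (MPFEA itself targets the Intel iPSC/860 and is not this code), and the array names are made up:

    ```python
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    halo_out = np.full(1000, rank, dtype="d")   # boundary values we own
    halo_in = np.empty(1000, dtype="d")         # neighbour's boundary values
    left, right = (rank - 1) % size, (rank + 1) % size

    # Start the halo exchange, then do interior arithmetic while it is in flight.
    reqs = [comm.Isend(halo_out, dest=right), comm.Irecv(halo_in, source=left)]
    interior = np.sin(np.arange(1_000_000) * 1e-6).sum()  # stand-in for stiffness work
    MPI.Request.Waitall(reqs)                   # ensure halo data has arrived

    print(f"rank {rank}: interior={interior:.3f}, halo mean={halo_in.mean():.1f}")
    ```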

  16. IntellWheels: modular development platform for intelligent wheelchairs.

    PubMed

    Braga, Rodrigo Antonio Marques; Petry, Marcelo; Reis, Luis Paulo; Moreira, António Paulo

    2011-01-01

    Intelligent wheelchairs (IWs) can become an important solution to the challenge of assisting individuals who have disabilities and are thus unable to perform their daily activities using classic powered wheelchairs. This article describes the concept and design of IntellWheels, a modular platform to facilitate the development of IWs through a multiagent system paradigm. In fact, modularity is achieved not only in the software perspective, but also through a generic hardware framework that was designed to fit, in a straightforward manner, almost any commercial powered wheelchair. Experimental results demonstrate the successful integration of all modules in the platform, providing safe motion to the IW. Furthermore, the results achieved with a prototype running in autonomous mode in simulated and mixed-reality environments also demonstrate the potential of our approach. Although some future research is still necessary to fully accomplish our objectives, preliminary tests have shown that IntellWheels will effectively reduce users' limitations, offering them a much more independent life.

  17. PASCAL/48 reference manual

    NASA Technical Reports Server (NTRS)

    Knight, J. C.; Hamm, R. W.

    1984-01-01

    PASCAL/48 is a programming language for the Intel MCS-48 series of microcomputers. In particular, it can be used with the Intel 8748. It is designed to allow the programmer to control most of the instructions being generated and the allocation of storage. The language can be used instead of ASSEMBLY language in most applications while allowing the user the necessary degree of control over hardware resources. Although it is called PASCAL/48, the language differs in many ways from PASCAL. The program structure and statements of the two languages are similar, but the expression mechanism and data types are different. The PASCAL/48 cross-compiler is written in PASCAL and runs on the CDC CYBER NOS system. It generates object code in Intel hexadecimal format that can be used to program the MCS-48 series of microcomputers. This reference manual defines the language, describes the predeclared procedures, lists error messages, illustrates use, and includes language syntax diagrams.
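
    Since the cross-compiler emits Intel hexadecimal object format, a short sketch of decoding one such record may be helpful. The format (a colon, byte count, address, record type, data bytes, two's-complement checksum) is standard Intel HEX; the sample record is the commonly cited specification example, and the parser below is an illustrative sketch rather than any tool from the manual:

    ```python
    def parse_ihex(record: str) -> dict:
        """Decode one Intel HEX record, e.g. ':10010000214601360121470136007EFE09D2190140'."""
        assert record.startswith(":")
        raw = bytes.fromhex(record[1:])
        count, addr, rtype = raw[0], int.from_bytes(raw[1:3], "big"), raw[3]
        data, checksum = raw[4:4 + count], raw[-1]
        # Checksum: two's complement of the sum of all preceding bytes.
        assert (sum(raw[:-1]) + checksum) & 0xFF == 0, "bad checksum"
        return {"address": addr, "type": rtype, "data": data}

    print(parse_ihex(":10010000214601360121470136007EFE09D2190140"))
    ```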

  18. Ordovician and Silurian Phi Kappa and Trail Creek formations, Pioneer Mountains, central Idaho; stratigraphic and structural revisions, and new data on graptolite faunas

    USGS Publications Warehouse

    Dover, James H.; Berry, William B.N.; Ross, Reuben James

    1980-01-01

    Recent geologic mapping in the northern Pioneer Mountains combined with the identification of graptolites from 116 new collections indicate that the Ordovician and Silurian Phi Kappa and Trail Creek Formations occur in a series of thrust-bounded slices within a broad zone of imbricate thrust faulting. Though confirming a deformational style first reported in a 1963 study by Michael Churkin, our data suggest that the complexity and regional extent of the thrust zone were not previously recognized. Most previously published sections of the Phi Kappa and Trail Creek Formations were measured across unrecognized thrust faults and therefore include not only structural repetitions of graptolitic Ordovician and Silurian rocks but also other tectonically juxtaposed lithostratigraphic units of diverse ages as well. Because of this discovery, the need to reconsider the stratigraphic validity of these formations and their lithology, nomenclature, structural distribution, facies relations, and graptolite faunas has arisen. The Phi Kappa Formation in most thrust slices has internal stratigraphic continuity despite the intensity of deformation to which it was subjected. As revised herein, the Phi Kappa Formation is restricted to a structurally repeated succession of predominantly black, carbonaceous, graptolitic argillite and shale. Some limy, light-gray-weathering shale occurs in the middle part of the section, and fine-grained locally pebbly quartzite is present at the base. The basal quartzite is here named the Basin Gulch Quartzite Member of the Phi Kappa. The Phi Kappa redefined on a lithologic basis represents the span of Ordovician time from W. B. N. Berry's graptolite zones 2-4 through 15 and also includes approximately 17 m of lithologically identical shale of Early and Middle Silurian age at the top. The lower contact of the formation as revised is tectonic. The Phi Kappa is gradationally overlain by the Trail Creek Formation as restricted herein. Most of the coarser clastic rocks reported in previously measured sections of the Phi Kappa, as well as the sequence along Phi Kappa Creek from which the name originates, are excluded from the Phi Kappa as revised and are reassigned to two structural plates of Mississippian Copper Basin Formation; other strata now excluded from the formation are reassigned to the Trail Creek Formation and to an unnamed Silurian and Devonian unit. As redefined, the Phi Kappa Formation is only about 240 m thick, compared with the 3,860 m originally estimated, and it occupies only about 25 percent of the outcrop area previously mapped in 1930 by H. G. Westgate and C. P. Ross. Despite this drastic reduction in thickness and the exclusion of the rocks along Phi Kappa Creek, the name Phi Kappa is retained because of widely accepted prior usage to denote the Ordovician graptolitic shale facies of central Idaho, and because the Phi Kappa Formation as revised is present in thrust slices on Phi Kappa Mountain, at the head of Phi Kappa Creek. The lithic and faunal consistency of this unit throughout the area precludes the necessity for major facies telescoping along individual faults within the outcrop belt. However, tens of kilometers of tectonic shortening seems required to juxtapose the imbricated Phi Kappa shale facies with the Middle Ordovician part of the carbonate and quartzite shale sequence of east central Idaho. 
The shelf rocks are exposed in the Wildhorse structural window of the northeastern Pioneer Mountains, and attain a thickness of at least 1,500 m throughout the region north and east of the Pioneer Mountains. The Phi Kappa is in direct thrust contact on intensely deformed medium- to high-grade metamorphic equivalents of the same shelf sequence in the Pioneer window at the south end of the Phi Kappa-Trail Creek outcrop belt. Along East Pass, Big Lake, and Pine Creeks, north of the Pioneer Mountains, some rocks previously mapped as Ramshorn Slate are lithologically and faunally equivalent to the P

  19. The extraction of the Φ–N total cross section from d(γ, pK+K−)n

    DOE PAGES

    Qian, X.; Chen, W.; Gao, H.; ...

    2009-10-01

    We report on the first measurement of the differential cross section of φ-meson photoproduction for the d(γ, pK+K−)n exclusive reaction channel. The experiment was performed using a tagged-photon beam and the CEBAF Large Acceptance Spectrometer (CLAS) at Jefferson Lab. A combined analysis using data from the d(γ, pK+K−)n channel and those from a previous publication on coherent φ production on the deuteron has been carried out to extract the φ–N total cross section, σ(φN). The extracted φ–N total cross section favors a value above 20 mb. This value is larger than the value extracted using vector-meson dominance models for φ photoproduction on the proton.

  20. The Novel Phages phiCD5763 and phiCD2955 Represent Two Groups of Big Plasmidial Siphoviridae Phages of Clostridium difficile.

    PubMed

    Ramírez-Vargas, Gabriel; Goh, Shan; Rodríguez, César

    2018-01-01

    Until recently, Clostridium difficile phages were limited to Myoviruses and Siphoviruses of medium genome length (32-57 kb). Here we report the finding of phiCD5763, a Siphovirus with a large extrachromosomal circular genome (132.5 kb, 172 ORFs) and a large capsid (205.6 ± 25.6 nm in diameter) infecting MLST Clade 1 strains of C. difficile. Two subgroups of big phage genomes similar to phiCD5763 were identified in 32 NAPCR1/RT012/ST-54 C. difficile isolates from Costa Rica and in whole genome sequences (WGS) of 41 C. difficile isolates of Clades 1, 2, 3, and 4 from Canada, USA, UK, Belgium, Iraq, and China. Through comparative genomics we discovered another putative big phage genome in a non-NAPCR1 isolate from Costa Rica, phiCD2955, which represents other big phage genomes found in 130 WGS of MLST Clade 1 and 2 isolates from Canada, USA, Hungary, France, Austria, and UK. phiCD2955 (131.6 kb, 172 ORFs) is related to a previously reported C. difficile phage genome, phiCD211/phiCDIF1296T. Detailed genome analyses of phiCD5763, phiCD2955, phiCD211/phiCDIF1296T, and seven other putative C. difficile big phage genome sequences of 131-136 kb reconstructed from publicly available WGS revealed a modular gene organization and high levels of sequence heterogeneity at several hotspots, suggesting that these genomes correspond to biological entities undergoing recombination. Compared to other C. difficile phages, these big phages have unique predicted terminase, capsid, portal, neck and tail proteins, receptor binding proteins (RBPs), recombinases, resolvases, primases, helicases, ligases, and hypothetical proteins. Moreover, their predicted gene load suggests a complex regulation of both phage and host functions. Overall, our results indicate that C. difficile big bacteriophages are more widespread than previously realized, and they open new avenues of research aiming to decipher how these viral elements influence the biology of this emerging pathogen.

  1. Release and bioactivity of bone morphogenetic protein-2 are affected by scaffold binding techniques in vitro and in vivo.

    PubMed

    Suliman, Salwa; Xing, Zhe; Wu, Xujun; Xue, Ying; Pedersen, Torbjorn O; Sun, Yang; Døskeland, Anne P; Nickel, Joachim; Waag, Thilo; Lygre, Henning; Finne-Wistrand, Anna; Steinmüller-Nethl, Doris; Krueger, Anke; Mustafa, Kamal

    2015-01-10

    A low dose of 1 μg rhBMP-2 was immobilised by four different functionalising techniques on recently developed poly(L-lactide)-co-(ε-caprolactone) [poly(LLA-co-CL)] scaffolds. It was either (i) physisorbed on unmodified scaffolds [PHY], (ii) physisorbed onto scaffolds modified with nanodiamond particles [nDP-PHY], (iii) covalently linked onto nDPs that were used to modify the scaffolds [nDP-COV] or (iv) encapsulated in microspheres distributed on the scaffolds [MICS]. Release kinetics of BMP-2 from the different scaffolds were quantified using targeted mass spectrometry for up to 70 days. PHY scaffolds had an initial burst of release, while MICS showed a gradual and sustained increase in release. In contrast, nDP-PHY and nDP-COV scaffolds showed no significant release, although nDP-PHY scaffolds maintained the bioactivity of BMP-2. Human mesenchymal stem cells cultured in vitro showed upregulated BMP-2 and osteocalcin gene expression at both week 1 and week 3 in the MICS and nDP-PHY scaffold groups. These groups also demonstrated the highest BMP-2 extracellular protein levels as assessed by ELISA, and mineralization confirmed by Alizarin red. Cells grown on the PHY scaffolds in vitro expressed collagen type 1 alpha 2 early, but the scaffold could not sustain rhBMP-2 release long enough to support mineralization. Four weeks after implantation in a rat mandible critical-sized defect model, micro-CT and Masson trichrome results showed accelerated bone regeneration in the PHY, nDP-PHY and MICS groups. The results demonstrate that PHY scaffolds may not be desirable for clinical use, since similar osteogenic potential was not seen under both in vitro and in vivo conditions, in contrast to the nDP-PHY and MICS groups, where continuous low doses of BMP-2 induced satisfactory bone regeneration under both conditions. The nDP-PHY scaffolds, used here in critical-sized bone defects for the first time, appear promising compared with growth factors adsorbed onto a polymer alone, and their short-distance effect prevents adverse systemic side effects. Copyright © 2014. Published by Elsevier B.V.

  2. Intracellular pH in mammalian stages of Trypanosoma cruzi is K+-dependent and regulated by H+-ATPases.

    PubMed

    Van Der Heyden, N; Docampo, R

    2000-02-05

    Regulation of intracellular pH (pHi) was investigated in Trypanosoma cruzi amastigotes and trypomastigotes using 2',7'-bis-(carboxyethyl)-5(and-6)-carboxyfluorescein (BCECF). pHi was determined to be 7.33 +/- 0.08 and 7.35 +/- 0.07 in amastigotes and trypomastigotes, respectively, and there were no significant differences in the regulation of pHi between the two stages. Steady-state pHi, recovery of pHi from acidification, and H+ efflux were all decreased markedly by the H+-ATPase inhibitors N,N'-dicyclohexylcarbodi-imide (DCCD), diethylstilbestrol (DES) and N-ethylmaleimide (NEM), supporting a significant role for a plasma membrane H+-ATPase in the regulation of pHi. pHi was maintained at neutrality over a range of external pH (pHe) from 5 to 8 in parasites suspended in a buffer containing Na+ and K+ (standard buffer), but was acidified at low pHe in the absence of these cations (choline buffer). The pHi of trypomastigotes decreased significantly when they transformed into amastigotes. The rate of recovery of pHi by acidified parasites was similar in Na+-free buffer and standard buffer, but was slower in the absence of K+ (K+-free or choline buffer), and parasites suspended in choline buffer were acidic by 0.25 pH units compared with controls. Ba2+ and Cs+ decreased the pHi of parasites suspended in standard but not choline buffer, suggesting the presence of an inward-directed K+ channel. The pHi of amastigotes and trypomastigotes suspended in Cl−-free buffer was decreased by 0.13 and 0.2 pH units, respectively, supporting the presence of a chloride conductive channel. No evidence of pH regulation via a Na+/H+ or Cl−/HCO3− exchanger was found. These results are consistent with the presence of a plasma membrane H+-ATPase that regulates pHi and is supported by K+ and Cl− channels.

  3. Medically Unexplained and Explained Physical Symptoms in the General Population: Association with Prevalent and Incident Mental Disorders

    PubMed Central

    van Eck van der Sluijs, Jonna; ten Have, Margreet; Rijnders, Cees; van Marwijk, Harm; de Graaf, Ron; van der Feltz-Cornelis, Christina

    2015-01-01

    Background Clinical studies have shown that Medically Unexplained Symptoms (MUS) are related to common mental disorders. It is unknown how often common mental disorders occur in subjects who have explained physical symptoms (PHY), MUS or both, in the general population, what the incidence rates are, and whether there is a difference between PHY and MUS in this respect. Aim To study the prevalence and incidence rates of mood, anxiety and substance use disorders in groups with PHY, MUS and combined MUS and PHY compared to a no-symptoms reference group in the general population. Method Data were derived from the Netherlands Mental Health Survey and Incidence Study-2 (NEMESIS-2), a nationally representative face-to-face survey of the general population aged 18-64 years. We selected subjects with explained physical symptoms only (n=1952), with MUS only (n=177), with both MUS and PHY (n=209), and a reference group with no physical symptoms (n=4168). The assessment of common mental disorders was through the Composite International Diagnostic Interview 3.0. Multivariate logistic regression analyses were used to examine the association between group membership and the prevalence and first-incidence rates of comorbid mental disorders, adjusted for socio-demographic characteristics. Results MUS were associated with the highest prevalence rates of mood and anxiety disorders, and combined MUS and PHY with the highest prevalence rates of substance use disorder. Combined MUS and PHY were associated with a higher incidence rate of mood disorder only (OR 2.9, 95% CI 1.27–6.74). Conclusion In the general population, PHY, MUS and the combination of both are related to mood and anxiety disorder, but odds are highest for combined MUS and PHY in relation to substance use disorder. Combined MUS and PHY are related to a greater incidence of mood disorder. These findings warrant further research into possibilities to improve recognition and early intervention in subjects with combined MUS and PHY. PMID:25853676

  4. Specific binding of [3H-Tyr8]physalaemin to the rat submaxillary gland substance P receptor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahouth, S.W.; Lazaro, D.M.; Brundish, D.E.

    1985-01-01

    [3H]Physalaemin ([3H]PHY) binds to a single class of noninteracting sites on rat submaxillary gland membranes suspended in high ionic strength media with a KD of 2.7 nM, a Bmax of 240 fmol/mg of protein, and low nonspecific binding. The relative potencies of substance P (SP) and its fragments in competing with [3H]PHY correlate with their relative salivation potencies. This indicates that [3H]PHY interacts with a physiologically relevant SP receptor. In low ionic strength media, the KD of [3H]PHY does not change, but SP and some of its fragments are more potent than PHY in competing with [3H]PHY. Computer-assisted analysis of [3H]PHY and [3H]SP binding in high and low ionic strength media demonstrated that both peptides are equipotent in high ionic strength, but that the affinity of SP increases by 70-fold in low ionic strength. The SP fragments that contain a basic residue in positions 1 and/or 3 also display an increased affinity in low ionic strength. These findings document that [3H]PHY binding in high ionic strength (μ = 0.6) accurately reflects the pharmacological potencies of agonists on the SP-P receptor. The binding of [3H]PHY, like that of [3H]SP, is increased by the addition of divalent cations (Mg2+ > Ca2+ > Mn2+). Guanine nucleotides decrease [3H]PHY binding by decreasing the Bmax to the same level (160 fmol/mg of protein), in the presence or absence of Mg2+.

  5. Evolutionary divergence of phytochrome protein function in Zea mays PIF3 signaling.

    PubMed

    Kumar, Indrajit; Swaminathan, Kankshita; Hudson, Karen; Hudson, Matthew E

    2016-07-01

    Two maize phytochrome-interacting factor (PIF) basic helix-loop-helix (bHLH) family members, ZmPIF3.1 and ZmPIF3.2, were identified, cloned and expressed in vitro to investigate light-signaling interactions. A phylogenetic analysis of sequences of the maize bHLH transcription factor gene family revealed the extent of the PIF family, and a total of seven predicted PIF-encoding genes were identified from genes encoding bHLH family VIIa/b proteins in the maize genome. To investigate the role of maize PIFs in phytochrome signaling, full-length cDNAs for phytochromes PhyA2, PhyB1, PhyB2 and PhyC1 from maize were cloned and expressed in vitro as chromophorylated holophytochromes. We showed that ZmPIF3.1 and ZmPIF3.2 interact specifically with the Pfr form of maize holophytochrome B1 (ZmphyB1), showing no detectable affinity for the Pr form. Maize holophytochrome B2 (ZmphyB2) showed no detectable binding affinity for PIFs in either Pr or Pfr forms, but phyB Pfr from Arabidopsis interacted with ZmPIF3.1 similarly to ZmphyB1 Pfr. We conclude that subfunctionalization at the protein-protein interaction level has altered the role of phyB2 relative to that of phyB1 in maize. Since the phyB2 mutant shows photomorphogenic defects, we conclude that maize phyB2 is an active photoreceptor, without the binding of PIF3 seen in other phyB family proteins. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  6. Plasma HIV viral rebound following protocol-indicated cessation of ART commenced in primary and chronic HIV infection.

    PubMed

    Hamlyn, Elizabeth; Ewings, Fiona M; Porter, Kholoud; Cooper, David A; Tambussi, Giuseppe; Schechter, Mauro; Pedersen, Court; Okulicz, Jason F; McClure, Myra; Babiker, Abdel; Weber, Jonathan; Fidler, Sarah

    2012-01-01

    The magnitude of HIV viral rebound following ART cessation has consequences for clinical outcome and onward transmission. We compared plasma viral load (pVL) rebound after stopping ART initiated in primary (PHI) and chronic HIV infection (CHI), using two populations with protocol-indicated ART cessation from the SPARTAC (PHI, n = 182) and SMART (CHI, n = 1450) trials. Time for pVL to reach pre-ART levels after stopping ART was assessed in PHI using survival analysis. Differences in pVL between PHI and CHI populations 4 weeks after stopping ART were examined using linear and logistic regression. Differences in pVL slopes up to 48 weeks were examined using linear mixed models, and viral burden was estimated through a time-averaged area under the pVL curve. CHI participants were categorised by nadir CD4 at ART stop. Of 171 PHI participants, 71 (41.5%) rebounded to pre-ART pVL levels, at a median of 50 (95% CI 48–51) weeks after stopping ART. Four weeks after stopping treatment, although the proportion with pVL ≥ 400 copies/ml was similar (78% PHI versus 79% CHI), levels were 0.45 log10 copies/ml (95% CI 0.26–0.64) lower for PHI versus CHI, and remained lower up to 48 weeks. Lower CD4 nadir in CHI was associated with higher pVL after ART stop. Rebound for CHI participants with CD4 nadir >500 cells/mm3 was comparable to that experienced by PHI participants. Stopping ART initiated in PHI and CHI was associated with viral rebound to levels conferring increased transmission risk, although the level of rebound was significantly lower and sustained in PHI compared to CHI.

  7. A Systematic Review and Meta-analysis of the Diagnostic Accuracy of Prostate Health Index and 4-Kallikrein Panel Score in Predicting Overall and High-grade Prostate Cancer.

    PubMed

    Russo, Giorgio Ivan; Regis, Federica; Castelli, Tommaso; Favilla, Vincenzo; Privitera, Salvatore; Giardina, Raimondo; Cimino, Sebastiano; Morgia, Giuseppe

    2017-08-01

    Markers for prostate cancer (PCa) have progressed over recent years. In particular, the Prostate Health Index (PHI) and the 4-kallikrein (4K) panel have been demonstrated to improve the diagnosis of PCa. We aimed to review the diagnostic accuracy of PHI and the 4K panel for PCa detection. We performed a systematic literature search of the PubMed, EMBASE, Cochrane, and Academic OneFile databases up to July 2016. We included diagnostic accuracy studies that used PHI or the 4K panel for the diagnosis of PCa or high-grade PCa. Methodological quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Twenty-eight studies including 16,762 patients were included in the analysis. The pooled data showed a sensitivity of 0.89 and 0.74 for PHI and the 4K panel, respectively, for PCa detection, and a pooled specificity of 0.34 and 0.60 for PHI and the 4K panel, respectively. The area under the curve (AUC) derived from the hierarchical summary receiver operating characteristic (HSROC) showed an accuracy of 0.76 and 0.72 for PHI and the 4K panel, respectively. For high-grade PCa detection, the pooled sensitivity was 0.93 and 0.87 for PHI and the 4K panel, respectively, whereas the pooled specificity was 0.34 and 0.61 for PHI and the 4K panel, respectively. The AUC derived from the HSROC showed an accuracy of 0.82 and 0.81 for PHI and the 4K panel, respectively. Both PHI and the 4K panel provided good diagnostic accuracy in detecting overall and high-grade PCa. Copyright © 2016 Elsevier Inc. All rights reserved.
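
    As a reminder of what the pooled numbers above mean, sensitivity and specificity are the standard test-accuracy ratios (definitions, not study-specific results):

    $$ \mathrm{sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{specificity} = \frac{TN}{TN + FP} $$

    so PHI's pooled 0.89/0.34 profile trades many false positives for few missed cancers, while the 4K panel's 0.74/0.60 profile is more balanced.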

  8. An Integrative Model for Phytochrome B Mediated Photomorphogenesis: From Protein Dynamics to Physiology

    PubMed Central

    Kircher, Stefan; Kirchenbauer, Daniel; Timmer, Jens; Nagy, Ferenc; Schäfer, Eberhard; Fleck, Christian

    2010-01-01

    Background: Plants have evolved various sophisticated mechanisms to respond and adapt to changes of abiotic factors in their natural environment. Light is one of the most important abiotic environmental factors, and it regulates plant growth and development throughout the entire life cycle. To monitor the intensity and spectral composition of the ambient light environment, plants have evolved multiple photoreceptors, including the red/far-red light-sensing phytochromes. Methodology/Principal Findings: We have developed an integrative mathematical model that describes how phytochrome B (phyB), an essential receptor in Arabidopsis thaliana, controls growth. Our model is based on a multiscale approach and connects the mesoscopic intracellular phyB protein dynamics to the macroscopic growth phenotype. To establish reliable and relevant parameters for the model of phyB-regulated growth, we measured phyB accumulation and degradation, dark reversion kinetics and the dynamic behavior of different nuclear phyB pools, using in vivo spectroscopy, western blotting and the Fluorescence Recovery After Photobleaching (FRAP) technique, respectively. Conclusions/Significance: The newly developed model predicts that the phyB-containing nuclear bodies (NBs) (i) serve as storage sites for phyB and (ii) control prolonged dark-reversion kinetics as well as partial reversibility of phyB Pfr in extended darkness. The predictive power of this mathematical model is further validated by the fact that we are able to formalize a basic photobiological observation, namely that in light-grown seedlings hypocotyl length depends on the total amount of phyB. In addition, we demonstrate that our theoretical predictions are in excellent agreement with quantitative data concerning phyB levels and the corresponding hypocotyl lengths. Hence, we conclude that the integrative model suggested in this study captures the main features of phyB-mediated photomorphogenesis in Arabidopsis. PMID:20502669
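
    The photoreversible Pr/Pfr core that any such model builds on can be sketched as a two-state ODE with light-driven interconversion plus thermal dark reversion. The following is a generic toy model, not the paper's fitted model; the rate constants k1, k2 and kr are illustrative placeholders:

    ```python
    from scipy.integrate import solve_ivp

    # Illustrative rates (per minute): Pr->Pfr and Pfr->Pr photoconversion,
    # and thermal dark reversion of Pfr back to Pr. Not fitted parameters.
    k1, k2, kr = 0.5, 0.2, 0.01

    def phytochrome(t, y, light):
        pr, pfr = y
        photo = light * (k1 * pr - k2 * pfr)   # photoconversion, only under illumination
        dark = kr * pfr                        # thermal Pfr -> Pr reversion, always active
        return [-photo + dark, photo - dark]

    # 2 h of red light followed by 10 h of darkness, starting from all-Pr.
    lit = solve_ivp(phytochrome, (0, 120), [1.0, 0.0], args=(1.0,))
    dark = solve_ivp(phytochrome, (0, 600), lit.y[:, -1], args=(0.0,))
    print(f"Pfr after light: {lit.y[1, -1]:.2f}; after darkness: {dark.y[1, -1]:.2f}")
    ```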

  9. A search for a doubly-charged Higgs boson in pp collisions at √s = 7 TeV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chatrchyan, S.; Khachatryan, V.; Sirunyan, A. M.

    A search for a doubly-charged Higgs boson in pp collisions at √s = 7 TeV is presented. The data correspond to an integrated luminosity of 4.9 inverse femtobarns, collected by the CMS experiment at the LHC. The search is performed using events with three or more isolated charged leptons of any flavor, giving sensitivity to the decays of pair-produced triplet components Φ++Φ−−, and Φ++Φ− from associated production. No excess is observed compared to the background prediction, and upper limits at the 95% confidence level are set on the Φ++ production cross section, under specific assumptions on its branching fractions. Lower bounds on the Φ++ mass are reported, providing significantly more stringent constraints than previously published limits.

  10. Effects of intracellular pH on the mitotic apparatus and mitotic stage in the sand dollar egg.

    PubMed

    Watanabe, K; Hamaguchi, M S; Hamaguchi, Y

    1997-01-01

    The effect of a change in intracellular pH (pHi) on mitosis was investigated in the sand dollar egg. The pHi in the fertilized egg of Scaphechinus mirabilis and Clypeaster japonicus, 7.34 and 7.31 respectively, was changed by treating the egg at nuclear envelope breakdown with sea water containing acetate and/or ammonia at various pH values. The mitotic apparatus at pHi 6.70 became larger than that of normal fertilized eggs; that is, the mitotic spindle reached its maximal size, especially in length, at pHi 6.70. The spindle length decreased linearly as pHi increased from 6.70 to 7.84. By polarization microscopy, an increase in birefringence retardation was detected at slightly acidic pHi, suggesting that the increase in spindle size is caused by an increase in the amount of microtubules in the spindle. At pHi 6.30, the organization of the mitotic apparatus was inhibited. Furthermore, slightly acidic pHi caused cleavage retardation or inhibition. By counting the number of eggs at various mitotic stages over time after treatment with the media, we found that metaphase was prolonged and most of the S. mirabilis eggs were arrested at metaphase at pHi 6.70. It is concluded that at slightly acidic pHi, the microtubules in the spindle are stabilized and more microtubules assemble than in normal eggs.

  11. Multisite Light-Induced Phosphorylation of the Transcription Factor PIF3 Is Necessary for Both Its Rapid Degradation and Concomitant Negative Feedback Modulation of Photoreceptor phyB Levels in Arabidopsis

    PubMed Central

    Ni, Weimin; Xu, Shou-Ling; Chalkley, Robert J.; Pham, Thao Nguyen D.; Guan, Shenheng; Maltby, Dave A.; Burlingame, Alma L.; Wang, Zhi-Yong; Quail, Peter H.

    2013-01-01

    Plants constantly monitor informational light signals using sensory photoreceptors, which include the phytochrome (phy) family (phyA to phyE), and adjust their growth and development accordingly. Following light-induced nuclear translocation, photoactivated phy molecules bind to and induce rapid phosphorylation and degradation of phy-interacting basic helix-loop-helix (bHLH) transcription factors (PIFs), such as PIF3, thereby regulating the expression of target genes. However, the mechanisms underlying the signal-relay process are still not fully understood. Here, using mass spectrometry, we identify multiple, in vivo, light-induced Ser/Thr phosphorylation sites in PIF3. Using transgenic expression of site-directed mutants of PIF3, we provide evidence that a set of these phosphorylation events acts collectively to trigger rapid degradation of the PIF3 protein in response to initial exposure of dark-grown seedlings to light. In addition, we show that phyB-induced PIF3 phosphorylation is also required for the known negative feedback modulation of phyB levels in prolonged light, potentially through codegradation of phyB and PIF3. This mutually regulatory intermolecular transaction thus provides a mechanism with the dual capacity to promote early, graded, or threshold regulation of the primary, PIF3-controlled transcriptional network in response to initial light exposure, and later, to attenuate global sensitivity to the light signal through reductions in photoreceptor levels upon prolonged exposure. PMID:23903316

  12. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    DOE PAGES

    Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles; ...

    2018-05-05

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.
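
    As a hedged aside for readers new to these metrics: scaling efficiency of the kind discussed above reduces to simple arithmetic on measured throughput or runtime. The sketch below (Python, with made-up throughput numbers, not figures from the paper) shows the usual strong- and weak-scaling definitions.

        # Scaling-efficiency arithmetic; all numbers below are hypothetical.
        def strong_scaling_efficiency(t1, tn, n):
            """Runtime on 1 worker vs. n workers: speedup divided by n."""
            return (t1 / tn) / n

        def weak_scaling_efficiency(r1, rn, n):
            """Aggregate throughput on n workers vs. n x 1-worker throughput
            (problem size grows with n, as in weak scaling)."""
            return rn / (n * r1)

        if __name__ == "__main__":
            # e.g. one device sustains 450 images/s, eight devices 3100 images/s
            print(f"weak-scaling efficiency: {weak_scaling_efficiency(450.0, 3100.0, 8):.2f}")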

  13. Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. Here, this article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.

  14. Recent advances in PC-Linux systems for electronic structure computations by optimized compilers and numerical libraries.

    PubMed

    Yu, Jen-Shiang K; Yu, Chin-Hui

    2002-01-01

    One of the most frequently used packages for electronic structure research, GAUSSIAN 98, is compiled on Linux systems with various hardware configurations, including AMD Athlon (with the "Thunderbird" core), AthlonMP, and AthlonXP (with the "Palomino" core) systems as well as Intel Pentium 4 (with the "Willamette" core) machines. The default PGI FORTRAN compiler (pgf77) and the Intel FORTRAN compiler (ifc) are respectively employed with different architectural optimization options to compile GAUSSIAN 98 and test the performance improvement. In addition to the BLAS library included in revision A.11 of this package, the Automatically Tuned Linear Algebra Software (ATLAS) library is linked against the binary executables to improve the performance. Various Hartree-Fock, density-functional theory, and MP2 calculations are performed for benchmarking purposes. It is found that the combination of ifc with the ATLAS library gives the best performance for GAUSSIAN 98 on all of these PC-Linux computers, including AMD and Intel CPUs. Even on AMD systems, the Intel FORTRAN compiler invariably produces binaries with better performance than pgf77. The enhancement provided by the ATLAS library is more significant for post-Hartree-Fock calculations. The performance of a single CPU is potentially as good as that of an Alpha 21264A workstation or an SGI supercomputer. The SPECfp2000 floating-point scores show trends similar to the GAUSSIAN 98 results.
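
    The performance lever credited here, swapping a reference BLAS for a tuned one such as ATLAS, is easy to demonstrate outside GAUSSIAN. A minimal sketch (Python; NumPy dispatches its matrix product to whatever BLAS it was built against, which need not be ATLAS) contrasts a naive triple loop with a BLAS-backed dgemm-style multiply:

        import time
        import numpy as np

        def naive_matmul(a, b):
            """Unoptimized triple-loop multiply, for contrast with BLAS."""
            n, k = a.shape
            k2, m = b.shape
            assert k == k2
            c = np.zeros((n, m))
            for i in range(n):
                for j in range(m):
                    for p in range(k):
                        c[i, j] += a[i, p] * b[p, j]
            return c

        a = np.random.rand(120, 120)
        b = np.random.rand(120, 120)
        t0 = time.perf_counter(); c1 = naive_matmul(a, b); t1 = time.perf_counter()
        t2 = time.perf_counter(); c2 = a @ b; t3 = time.perf_counter()
        print(f"naive: {t1 - t0:.3f} s, BLAS-backed: {t3 - t2:.6f} s")
        print("max abs difference:", float(np.max(np.abs(c1 - c2))))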

  15. Elevated Prostate Health Index (phi) and Biopsy Reclassification During Active Surveillance of Prostate Cancer.

    PubMed

    Andreas, Darian; Tosoian, Jeffrey J; Landis, Patricia; Wolf, Sacha; Glavaris, Stephanie; Lotan, Tamara L; Schaeffer, Edward M; Sokoll, Lori J; Ross, Ashley E

    2016-07-01

    The Prostate Health Index (phi) has been FDA approved for decision-making regarding prostate biopsy. Phi has additionally been shown to correlate positively with tumor volume, extraprostatic disease, and higher Gleason grade tumors. Here we describe a case in which an elevated phi encouraged biopsy of a man undergoing active surveillance, leading to reclassification of his disease as high-risk prostate cancer.

  16. Dye to use with virus challenge for testing barrier materials.

    PubMed Central

    Lytle, C D; Felten, R P; Truscott, W

    1991-01-01

    Can FD&C Blue no. 1 dye photoinactivate bacteriophages phi X174, T7, PRD1, and phi 6 under laboratory lighting conditions? At high levels of light, the dye (500 microM) photoinactivated only phi 6. Thus, this dye can be used at concentrations up to 500 microM with bacteriophages phi X174, T7, and PRD1 to test barrier material integrity. PMID:1872612

  17. Tiny abortive initiation transcripts exert antitermination activity on an RNA hairpin-dependent intrinsic terminator.

    PubMed

    Lee, Sooncheol; Nguyen, Huong Minh; Kang, Changwon

    2010-10-01

    To date, no biological function has been identified for the tiny RNA transcripts that are abortively and repetitively released from initiation complexes of RNA polymerase in vitro and in vivo. In this study, we show that abortive initiation affects termination in transcription of bacteriophage T7 gene 10. Specifically, abortive transcripts produced from promoter phi 10 exert trans-acting antitermination activity on terminator T phi both in vitro and in vivo. Following abortive initiation cycling of T7 RNA polymerase at phi 10, short G-rich and oligo(G) RNAs were produced, and both specifically sequestered 5- and 6-nt C + U stretch sequences, consequently interfering with terminator hairpin formation. This antitermination activity depended on sequence-specific hybridization of abortive transcripts with the 5' but not the 3' half of T phi RNA. Antitermination was abolished when T phi was mutated to lack a C + U stretch, but restored when the abortive transcript sequence was additionally modified to complement the mutation in T phi, both in vitro and in vivo. Antitermination was enhanced in vivo when the abortive transcript concentration was increased via overproduction of RNA polymerase or ribonuclease deficiency. Accordingly, antitermination activity exerted on T phi by abortive transcripts should facilitate expression of the promoter-less genes 11 and 12 downstream of T phi in T7 infection of Escherichia coli.

  18. Estimation of Reconnection Flux Using Post-Eruption Arcades and Its Relevance to Magnetic Clouds at 1 AU

    NASA Technical Reports Server (NTRS)

    Gopalswamy, N.; Yashiro, S.; Akiyama, S.; Xie, H.

    2017-01-01

    We report on a new method to compute the flare reconnection (RC) flux from post-eruption arcades (PEAs) and the underlying photospheric magnetic fields. In previous works, the RC flux has been computed using the cumulative flare-ribbon area. Here we obtain the RC flux as the flux in half of the area underlying the PEA in EUV, imaged after the flare maximum. We apply this method to a set of 21 eruptions that originated near the solar disk center in Solar Cycle 23. We find that the RC flux from the arcade method ($\Phi_{\mathrm{rA}}$) has excellent agreement with the flux from the flare-ribbon method ($\Phi_{\mathrm{rR}}$) according to $\Phi_{\mathrm{rA}} = 1.24\,\Phi_{\mathrm{rR}}^{0.99}$. We also find $\Phi_{\mathrm{rA}}$ to be correlated with the poloidal flux ($\Phi_{P}$) of the associated magnetic cloud at 1 AU: $\Phi_{P} = 1.20\,\Phi_{\mathrm{rA}}^{0.85}$. This relation is nearly identical to that obtained by Qiu et al. (Astrophys. J. 659, 758, 2007) using a set of only 9 eruptions. Our result supports the idea that flare reconnection results in the formation of the flux rope and PEA as a common process.
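
    The two fitted power laws quoted above can be applied directly. The sketch below (Python) chains them for a hypothetical ribbon-method flux; the input value and its units are illustrative only, since the abstract does not state the normalization used in the fits.

        def arcade_flux_from_ribbon(phi_rR):
            """Phi_rA = 1.24 * Phi_rR**0.99, the fit quoted in the abstract."""
            return 1.24 * phi_rR ** 0.99

        def poloidal_flux_from_arcade(phi_rA):
            """Phi_P = 1.20 * Phi_rA**0.85, the fit quoted in the abstract."""
            return 1.20 * phi_rA ** 0.85

        phi_rR = 5.0  # hypothetical ribbon-method flux, in the fit's (unstated) units
        phi_rA = arcade_flux_from_ribbon(phi_rR)
        print(f"Phi_rA = {phi_rA:.2f}, Phi_P = {poloidal_flux_from_arcade(phi_rA):.2f}")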

  19. Role of phi cells and the endodermis under salt stress in Brassica oleracea.

    PubMed

    Fernandez-Garcia, N; Lopez-Perez, L; Hernandez, M; Olmos, E

    2009-01-01

    Phi cell layers were discovered in the 19th century in a small number of species, including members of the Brassicaceae family. A mechanical role was first suggested for this structure; however, this has never been demonstrated. The main objective of the present work was to analyse the ultrastructure of phi cells, their influence on ion movement from the cortex to the stele, and their contribution to salt stress tolerance in Brassica oleracea. Transmission electron microscopy and X-ray microanalysis studies were used to analyse the subcellular structure and distribution of ions in phi cells and the endodermis under salt stress. Ion movement was analysed using lanthanum as an apoplastic tracer. The ultrastructural results confirm that phi cells are specialized cells showing cell wall ingrowths in the inner tangential cell walls. X-ray microanalysis confirmed a build-up of sodium. Phi thickenings were lignified and lanthanum moved periplasmically at this level. To the best of our knowledge, this is the first study reporting the possible role of the phi cells as a barrier controlling the movement of ions from the cortex to the stele. Therefore, the phi cell layer and endodermis seem to be regulating ion transport in Brassica oleracea under salt stress.

  20. Yellow fluorescent protein phiYFPv (Phialidium): structure and structure-based mutagenesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pletneva, Nadya V.; Pletnev, Vladimir Z., E-mail: vzpletnev@gmail.com; Souslova, Ekaterina

    The yellow fluorescent protein phiYFPv ($\lambda_{\mathrm{em}}^{\mathrm{max}} \simeq 537$ nm) with improved folding has been developed from the spectrally identical wild-type phiYFP found in the marine jellyfish Phialidium. The latter fluorescent protein is one of only two known cases of naturally occurring proteins that exhibit emission spectra in the yellow-orange range (535-555 nm). Here, the crystal structure of phiYFPv has been determined at 2.05 Å resolution. The 'yellow' chromophore formed from the sequence triad Thr65-Tyr66-Gly67 adopts the bicyclic structure typical of fluorophores emitting in the green spectral range. It was demonstrated that perfect antiparallel π-stacking of chromophore Tyr66 and the proximal Tyr203, as well as Val205, facing the chromophore phenolic ring are chiefly responsible for the observed yellow emission of phiYFPv at 537 nm. Structure-based site-directed mutagenesis has been used to identify the key functional residues in the chromophore environment. The obtained results have been utilized to improve the properties of phiYFPv and its homologous monomeric biomarker tagYFP.

  1. Intracellular pH in sperm physiology.

    PubMed

    Nishigaki, Takuya; José, Omar; González-Cota, Ana Laura; Romero, Francisco; Treviño, Claudia L; Darszon, Alberto

    2014-08-01

    Intracellular pH (pHi) regulation is essential for cell function. Notably, several unique sperm ion transporters and enzymes whose elimination causes infertility are either pHi dependent or somehow related to pHi regulation. Amongst them are: CatSper, a Ca(2+) channel; Slo3, a K(+) channel; the sperm-specific Na(+)/H(+) exchanger and the soluble adenylyl cyclase. It is thus clear that pHi regulation is of the utmost importance for sperm physiology. This review briefly summarizes the key components involved in pHi regulation, their characteristics and participation in fundamental sperm functions such as motility, maturation and the acrosome reaction. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. On character amenability of Banach algebras

    NASA Astrophysics Data System (ADS)

    Kaniuth, E.; Lau, A. T.; Pym, J.

    2008-08-01

    We continue our work [E. Kaniuth, A.T. Lau, J. Pym, On φ-amenability of Banach algebras, Math. Proc. Cambridge Philos. Soc. 144 (2008) 85-96] in the study of amenability of a Banach algebra A defined with respect to a character φ of A. Various necessary and sufficient conditions of a global and a pointwise nature are found for a Banach algebra to possess a φ-mean of norm 1. We also completely determine the size of the set of φ-means for a separable weakly sequentially complete Banach algebra A with no φ-mean in A itself. A number of illustrative examples are discussed.
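
    A hedged note for orientation: in the literature this paper builds on, a φ-mean on a Banach algebra A with character φ is standardly formulated as a functional m in the second dual satisfying (notation assumed here, not quoted from the abstract):

        \[
          m \in A^{**}, \qquad \langle m, \phi \rangle = 1, \qquad
          \langle m, f \cdot a \rangle = \phi(a)\, \langle m, f \rangle
          \quad \text{for all } a \in A,\ f \in A^{*},
        \]

    where $(f \cdot a)(b) = f(ab)$; A is called φ-amenable when such an m exists, and the norm-1 condition studied in the paper asks in addition for $\|m\| = 1$.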

  3. The genome of the Erwinia amylovora phage PhiEaH1 reveals greater diversity and broadens the applicability of phages for the treatment of fire blight.

    PubMed

    Meczker, Katalin; Dömötör, Dóra; Vass, János; Rákhely, Gábor; Schneider, György; Kovács, Tamás

    2014-01-01

    The enterobacterium Erwinia amylovora is the causal agent of fire blight. This study presents the analysis of the complete genome of phage PhiEaH1, isolated from the soil surrounding an E. amylovora-infected apple tree in Hungary. Its genome is 218 kb in size, containing 244 ORFs. PhiEaH1 is the second E. amylovora-infecting phage from the Siphoviridae family whose complete genome sequence was determined. Besides PhiEaH2, PhiEaH1 is the other active component of Erwiphage, the first bacteriophage-based pesticide on the market against E. amylovora. Comparative genome analysis in this study has revealed that PhiEaH1 not only differs from the 10 formerly sequenced E. amylovora bacteriophages belonging to other phage families, but also from PhiEaH2. Sequencing of more Siphoviridae phage genomes might reveal further diversity, providing opportunities for the development of even more effective biological control agents, phage cocktails against Erwinia fire blight disease of commercial fruit crops.

  4. [PHI regulates histone methylation and acetylation in Burkitt lymphoma Daudi cell line].

    PubMed

    Hong, Ling-Ling; Ma, Xu-Dong; Huang, Yi-Qun

    2011-02-01

    This study aimed to investigate the effects of phenylhexyl isothiocyanate (PHI) on the Burkitt lymphoma Daudi cell line and its regulation of histone acetylation and methylation in Daudi cells, and to explore the potential mechanism. The apoptotic rate of Daudi cells treated with PHI was measured by flow cytometry, and the changes in histone H3 and H4 acetylation and histone H3K9 and H3K4 methylation in PHI-treated Daudi cells were detected by Western blot. The results showed that PHI could induce apoptosis of Daudi cells, increase the acetylation level of H3 and H4, enhance the methylation of H3K4, and reduce the methylation of H3K9. It is concluded that PHI can up-regulate the acetylation of histone H3 associated with transcriptional stimulation and the methylation of histone H3K4, down-regulate the methylation of histone H3K9 associated with transcriptional inhibition, and thereby promote the apoptosis of Daudi cells. PHI may be a potential agent for targeted therapy of lymphoma.

  5. The life of phi: the development of phi thickenings in roots of the orchids of the genus Miltoniopsis.

    PubMed

    Idris, Nurul A; Collings, David A

    2015-02-01

    Phi thickenings, bands of secondary wall thickenings that reinforce the primary wall of root cortical cells in a wide range of species, are described for the first time in the epiphytic orchid Miltoniopsis. As with phi thickenings found in other plants, the phi thickenings in Miltoniopsis contain highly aligned cellulose running along the lengths of the thickenings, and are lignified but not suberized. Using a combination of histological and immunocytochemical techniques, thickening development can be categorized into three different stages. Microtubules align lengthwise along the thickening during early and intermediate stages of development, and callose is deposited within the thickening in a pattern similar to the microtubules. These developing thickenings also label with the fluorescently tagged lectin wheat germ agglutinin (WGA). These associations with microtubules and callose, and the WGA labeling, all disappear when the phi thickenings are mature. This pattern of callose and WGA deposition show changes in the thickened cell wall composition and may shed light on the function of phi thickenings in plant roots, a role for which has yet to be established.

  6. Genome packaging in EL and Lin68, two giant phiKZ-like bacteriophages of P. aeruginosa.

    PubMed

    Sokolova, O S; Shaburova, O V; Pechnikova, E V; Shaytan, A K; Krylov, S V; Kiselev, N A; Krylov, V N

    2014-11-01

    A unique feature of the Pseudomonas aeruginosa giant phage phiKZ is its way of genome packaging onto a spool-like protein structure, the inner body. Until recently, no similar structures have been detected in other phages. We have studied DNA packaging in P. aeruginosa phages EL and Lin68 using cryo-electron microscopy and revealed the presence of inner bodies. The shape and positioning of the inner body and the density of the DNA packaging in EL are different from those found in phiKZ and Lin68. This internal organization explains how the shorter EL genome is packed into a large EL capsid, which has the same external dimensions as the capsids of phiKZ and Lin68. The similarity in the structural organization in EL and other phiKZ-like phages indicates that EL is phylogenetically related to other phiKZ-like phages, and, despite the lack of detectable DNA homology, EL, phiKZ, and Lin68 descend from a common ancestor. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. Corneal power evaluation after myopic corneal refractive surgery using artificial neural networks.

    PubMed

    Koprowski, Robert; Lanza, Michele; Irregolare, Carlo

    2016-11-15

    The efficacy and wide availability of surgical techniques for refractive defect correction have increased the number of patients who undergo this type of surgery. Regardless of that, with increasing age more and more of these patients must undergo cataract surgery. Accurate evaluation of corneal power is an extremely important element affecting the precision of intraocular lens (IOL) power calculation, and errors in this procedure can affect patients' quality of life and satisfaction with the service provided. The available devices for measuring corneal power have been shown to be unreliable after myopic refractive surgery. Artificial neural networks with error backpropagation and one hidden layer were proposed for corneal power prediction. The article analysed the features acquired from the Pentacam HR tomograph that were necessary to measure corneal power. Additionally, several billion iterations of artificial neural networks were conducted for several hundred simulations of different network configurations and different features derived from the Pentacam HR. The analysis was performed on a PC with an Intel ® Xeon ® X5680 3.33 GHz CPU in Matlab ® Version 7.11.0.584 (R2010b) with Signal Processing Toolbox Version 7.1 (R2010b), Neural Network Toolbox 7.0 (R2010b) and Statistics Toolbox (R2010b). A total corneal power prediction error was obtained for 172 patients (113 patients forming the training set and 59 patients in the test set) with an average age of 32 ± 9.4 years, 67% of them men. The error was at an average level of 0.16 ± 0.14 diopters, and its maximum value did not exceed 0.75 diopters. The Pentacam parameters (measurement results) providing the above result are the tangential anterior/posterior maps, the corneal net power, and the equivalent K-reading power. The analysis time for a single patient (a single eye) did not exceed 0.1 s, whereas the time of network training was about 3 s per 1000 iterations (with 400 neurons in the hidden layer).

  8. Large-scale high density 3D AMT for mineral exploration — A case history from volcanic massive sulfide Pb-Zn deposit with 2000 AMT sites

    NASA Astrophysics Data System (ADS)

    Chen, R.; Chen, S.; He, L.; Yao, H.; Li, H.; Xi, X.; Zhao, X.

    2017-12-01

    The EM method plays a key role in exploring volcanic massive sulfide (VMS) deposits, which have high grade and high economic value. However, the performance of high-density 3D AMT in detecting deep concealed VMS targets is not clear. The size of a typical VMS target is less than 100 m x 100 m x 50 m, so finding it at large depth is a challenging task. We carried out a test in a VMS Pb-Zn deposit using high-density 3D AMT with a site spacing of 20 m and a profile spacing of 40-80 m. About 2000 AMT sites were acquired in an area of 2000 m x 1500 m. We then used a server with 8 CPUs (Intel Xeon E7-8880 v3, 2.3 GHz, 144 cores), 2048 GB RAM, and a 40 TB disk array to invert the 3D AMT sites using integral-equation forward modeling and re-weighted conjugate-gradient inversion. The depth of the VMS ore body is about 600 m and its size is about 100 m x 100 m x 20 m, with a dip angle of about 45 degrees. We find that it is very hard to recover the location and shape of the ore body by 3D AMT inversion, even using the data of all AMT sites and frequencies. However, it is possible to recover the location and shape of the deep concealed ore body if the inversion parameters are adjusted carefully. A new set of inversion parameters needs to be found for the high-density 3D AMT data set; the inversion parameters that work well for Dublin Secret Model II (DSM 2) are not suitable for our real data. This problem may be caused by the different data density and different number of frequencies. We found a good set of inversion parameters by comparing the shape and location of the ore body with the inversion result while trying different inversion parameters, and the application of the new parameters in a nearby area with high-density AMT sites shows that the inversion result is improved greatly.
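
    The abstract names re-weighted conjugate-gradient inversion; as a rough sketch of the core building block only, here is plain conjugate gradient for a symmetric positive-definite system in Python (the re-weighting scheme itself is not described in the abstract, so it is not modeled here):

        import numpy as np

        def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
            """Solve A x = b for symmetric positive-definite A by plain CG."""
            x = np.zeros_like(b)
            r = b - A @ x          # initial residual
            p = r.copy()           # initial search direction
            rs = r @ r
            for _ in range(max_iter):
                Ap = A @ p
                alpha = rs / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                rs_new = r @ r
                if np.sqrt(rs_new) < tol:
                    break
                p = r + (rs_new / rs) * p
                rs = rs_new
            return x

        # Tiny demonstration on a random SPD system
        M = np.random.rand(50, 50)
        A = M @ M.T + 50 * np.eye(50)
        b = np.random.rand(50)
        x = conjugate_gradient(A, b)
        print("residual norm:", np.linalg.norm(A @ x - b))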

  9. Dosimetric verification and clinical evaluation of a new commercially available Monte Carlo-based dose algorithm for application in stereotactic body radiation therapy (SBRT) treatment planning

    NASA Astrophysics Data System (ADS)

    Fragoso, Margarida; Wen, Ning; Kumar, Sanath; Liu, Dezhi; Ryu, Samuel; Movsas, Benjamin; Munther, Ajlouni; Chetty, Indrin J.

    2010-08-01

    Modern cancer treatment techniques, such as intensity-modulated radiation therapy (IMRT) and stereotactic body radiation therapy (SBRT), have greatly increased the demand for more accurate treatment planning (structure definition, dose calculation, etc) and dose delivery. The ability to use fast and accurate Monte Carlo (MC)-based dose calculations within a commercial treatment planning system (TPS) in the clinical setting is now becoming more of a reality. This study describes the dosimetric verification and initial clinical evaluation of a new commercial MC-based photon beam dose calculation algorithm, within the iPlan v.4.1 TPS (BrainLAB AG, Feldkirchen, Germany). Experimental verification of the MC photon beam model was performed with film and ionization chambers in water phantoms and in heterogeneous solid-water slabs containing bone and lung-equivalent materials for a 6 MV photon beam from a Novalis (BrainLAB) linear accelerator (linac) with a micro-multileaf collimator (m3 MLC). The agreement between calculated and measured dose distributions in the water phantom verification tests was, on average, within 2%/1 mm (high dose/high gradient) and was within ±4%/2 mm in the heterogeneous slab geometries. Example treatment plans in the lung show significant differences between the MC and one-dimensional pencil beam (PB) algorithms within iPlan, especially for small lesions in the lung, where electronic disequilibrium effects are emphasized. Other user-specific features in the iPlan system, such as options to select dose to water or dose to medium, and the mean variance level, have been investigated. Timing results for typical lung treatment plans show the total computation time (including that for processing and I/O) to be less than 10 min for 1-2% mean variance (running on a single PC with 8 Intel Xeon X5355 CPUs, 2.66 GHz). Overall, the iPlan MC algorithm is demonstrated to be an accurate and efficient dose algorithm, incorporating robust tools for MC-based SBRT treatment planning in the routine clinical setting.
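
    For readers unfamiliar with how agreement criteria such as the 2%/1 mm figure above are evaluated, a common tool is the gamma index. The sketch below (Python) is a minimal 1-D global-gamma implementation on synthetic profiles; it is an illustration of the concept, not the method used in this study.

        import numpy as np

        def gamma_index_1d(x_ref, d_ref, x_eval, d_eval, dd=0.02, dta=1.0):
            """1-D global gamma analysis. dd: dose tolerance as a fraction of
            the maximum reference dose; dta: distance-to-agreement in mm.
            A reference point passes when its gamma value is <= 1."""
            dmax = d_ref.max()
            gammas = np.empty_like(d_ref)
            for i, (xr, dr) in enumerate(zip(x_ref, d_ref)):
                dist2 = ((x_eval - xr) / dta) ** 2
                dose2 = ((d_eval - dr) / (dd * dmax)) ** 2
                gammas[i] = np.sqrt(np.min(dist2 + dose2))
            return gammas

        x = np.linspace(0.0, 50.0, 501)               # position, mm
        ref = np.exp(-((x - 25.0) / 8.0) ** 2)        # reference dose profile
        ev = 1.01 * np.exp(-((x - 25.3) / 8.0) ** 2)  # shifted, rescaled copy
        g = gamma_index_1d(x, ref, x, ev, dd=0.02, dta=1.0)
        print(f"2%/1 mm pass rate: {np.mean(g <= 1.0):.1%}")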

  10. Developing Workforce Capacity in Public Health Informatics: Core Competencies and Curriculum Design

    PubMed Central

    Wholey, Douglas R.; LaVenture, Martin; Rajamani, Sripriya; Kreiger, Rob; Hedberg, Craig; Kenyon, Cynthia

    2018-01-01

    We describe a master's level public health informatics (PHI) curriculum to support workforce development. Public health decision-making requires intensive information management to organize responses to health threats and develop effective health education and promotion. PHI competencies prepare the public health workforce to design and implement these information systems. The objective for a Master's and Certificate in PHI is to prepare public health informaticians with the competencies to work collaboratively with colleagues in public health and other health professions to design and develop information systems that support population health improvement. The PHI competencies are drawn from the computer, information, and organizational sciences. A curriculum is proposed to deliver the competencies, and results of a pilot PHI program are presented. Since the public health workforce needs to use information technology effectively to improve population health, it is essential for public health academic institutions to develop and implement PHI workforce training programs. PMID:29770321

  11. Intracellular pH of symbiotic dinoflagellates

    NASA Astrophysics Data System (ADS)

    Gibbin, E. M.; Davy, S. K.

    2013-09-01

    Intracellular pH (pHi) is likely to play a key role in maintaining the functional success of cnidarian-dinoflagellate symbiosis, yet until now the pHi of the symbiotic dinoflagellates (genus Symbiodinium) has never been quantified. Flow cytometry was used in conjunction with the ratiometric fluorescent dye BCECF to monitor changes in pHi over a daily light/dark cycle. The pHi of Symbiodinium type B1 freshly isolated from the model sea anemone Aiptasia pulchella was 7.25 ± 0.01 (mean ± SE) in the light and 7.10 ± 0.02 in the dark. A comparable effect of irradiance was seen across a variety of cultured Symbiodinium genotypes (types A1, B1, E1, E2, F1, and F5) which varied between pHi 7.21-7.39 in the light and 7.06-7.14 in the dark. Of note, there was a significant genotypic difference in pHi, irrespective of irradiance.

  12. Analog quadrature signal to phase angle data conversion by a quadrature digitizer and quadrature counter

    DOEpatents

    Buchenauer, C.J.

    1981-09-23

    The quadrature phase angle $\phi(t)$ of a pair of quadrature signals $S_1(t)$ and $S_2(t)$ is digitally encoded on a real-time basis by a quadrature digitizer for fractional $\phi(t)$ rotational excursions and by a quadrature up/down counter for full $\phi(t)$ rotations. The pair of quadrature signals are of the form $S_1(t) = k(t)\sin\phi(t)$ and $S_2(t) = k(t)\cos\phi(t)$, where $k(t)$ is a signal common to both. The quadrature digitizer and the quadrature up/down counter may be used together or singularly as desired or required. Optionally, a digital-to-analog converter may follow the outputs of the quadrature digitizer and the quadrature up/down counter to provide an analog signal output of the quadrature phase angle $\phi(t)$.
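
    Numerically, the encoding described here can be mimicked in software: atan2 recovers the fractional phase angle from the signal pair, and phase unwrapping plays the role of the up/down counter by accumulating full rotations. A minimal sketch (Python, with a hypothetical phase trajectory; this illustrates the mathematics, not the patented hardware):

        import numpy as np

        t = np.linspace(0.0, 1.0, 2000)
        phi_true = 6.0 * np.pi * t**2          # hypothetical phase, several turns
        k = 1.0 + 0.2 * np.cos(2 * np.pi * t)  # common amplitude factor k(t)

        s1 = k * np.sin(phi_true)              # S1(t) = k(t) sin phi(t)
        s2 = k * np.cos(phi_true)              # S2(t) = k(t) cos phi(t)

        # atan2 gives the fractional angle in (-pi, pi]; unwrap accumulates
        # full rotations, as the up/down counter does in hardware.
        phi_recovered = np.unwrap(np.arctan2(s1, s2))

        print("max recovery error:", np.max(np.abs(phi_recovered - phi_true)))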

  13. Analog quadrature signal to phase angle data conversion by a quadrature digitizer and quadrature counter

    DOEpatents

    Buchenauer, C. Jerald

    1984-01-01

    The quadrature phase angle $\phi(t)$ of a pair of quadrature signals $S_1(t)$ and $S_2(t)$ is digitally encoded on a real-time basis by a quadrature digitizer for fractional $\phi(t)$ rotational excursions and by a quadrature up/down counter for full $\phi(t)$ rotations. The pair of quadrature signals are of the form $S_1(t) = k(t)\sin\phi(t)$ and $S_2(t) = k(t)\cos\phi(t)$, where $k(t)$ is a signal common to both. The quadrature digitizer and the quadrature up/down counter may be used together or singularly as desired or required. Optionally, a digital-to-analog converter may follow the outputs of the quadrature digitizer and the quadrature up/down counter to provide an analog signal output of the quadrature phase angle $\phi(t)$.

  14. Macrophage tumoricidal mechanisms are selectively altered by prenatal chlordane exposure.

    PubMed

    Theus, S A; Tabor, D R; Soderberg, L S; Barnett, J B

    1992-09-01

    Macrophages (Mφ) derived from mice treated in utero with chlordane show a significant delay in the induction of tumoricidal activity. In this study, Mφ from chlordane-treated animals required a 48 h in vitro period of induction with interferon-gamma and lipopolysaccharide (IFN/LPS) before they could kill P815 targets. Similarly, Mφ from chlordane-treated animals also failed to produce an immediate H2O2 burst upon perturbation. Conversely, their stimulated control Mφ counterparts were tumoricidal by 2 h and exhibited a respiratory burst without any delay. Moreover, the rise in levels of the second messenger inositol trisphosphate (IP3) was significantly delayed in chlordane-treated animals following interaction with IFN/LPS. When nitrate/nitrite production was analyzed as an alternative mechanism for killing tumors, stimulated Mφ from both normal and chlordane-treated animals responded equally. The data show that chlordane differentially introduces defects in the Mφ biochemical mechanisms associated with tumor killing.

  15. Order and disorder in crystals of hexameric NTPases from dsRNA bacteriophages.

    PubMed

    Mancini, Erika J; Grimes, Jonathan M; Malby, Robyn; Sutton, Geoffrey C; Kainov, Denis E; Juuti, Jarmo T; Makeyev, Eugene V; Tuma, Roman; Bamford, Dennis H; Stuart, David I

    2003-12-01

    The packaging of genomic RNA in members of the Cystoviridae is performed by P4, a hexameric protein with NTPase activity. Across family members such as Phi6, Phi8 and Phi13, the P4 proteins show low levels of sequence identity, but presumably have similar atomic structures. Initial structure-determination efforts for P4 from Phi6 and Phi8 were hampered by difficulties in obtaining crystals that gave ordered diffraction. Diffraction from crystals of full-length P4 showed a variety of disorder and anisotropy. Subsequently, crystals of Phi13 P4 were obtained which yielded well-ordered diffraction to 1.7 Å. Comparison of the packing arrangements of P4 hexamers in different crystal forms and analysis of the disorder provides insights into the flexibility of this family of proteins, which might be an integral part of their biological function.

  16. Measurement of differential cross sections in the $\phi^*$ variable for inclusive Z boson production in pp collisions at $\sqrt{s} = 8$ TeV

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sirunyan, Albert M; et al.

    Measurements of differential cross sections $\mathrm{d}\sigma/\mathrm{d}\phi^*$ and double-differential cross sections $\mathrm{d}^2\sigma/\mathrm{d}\phi^*\,\mathrm{d}|y|$ for inclusive Z boson production are presented using the dielectron and dimuon final states. The kinematic observable $\phi^*$ correlates with the dilepton transverse momentum but has better resolution, and $y$ is the dilepton rapidity. The analysis is based on data collected with the CMS experiment at a centre-of-mass energy of 8 TeV corresponding to an integrated luminosity of 19.7 fb$^{-1}$. The normalised cross section $(1/\sigma)\,\mathrm{d}\sigma/\mathrm{d}\phi^*$, within the fiducial kinematic region, is measured with a precision of better than 0.5% for $\phi^* < 1$. The measurements are compared to theoretical predictions and they agree, typically, within a few percent.
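
    The abstract does not define $\phi^*$; in the literature these measurements follow, it is commonly taken as $\phi^*_\eta = \tan(\phi_{\mathrm{acop}}/2)\,\sin\theta^*_\eta$ with $\phi_{\mathrm{acop}} = \pi - \Delta\phi$ and $\cos\theta^*_\eta = \tanh[(\eta^- - \eta^+)/2]$. Under that assumption, a minimal sketch (Python) of the per-event computation:

        import math

        def phi_star(eta_minus, phi_minus, eta_plus, phi_plus):
            """phi*_eta for a dilepton pair (definition assumed from the
            literature, not quoted from this abstract)."""
            dphi = abs(phi_minus - phi_plus)
            if dphi > math.pi:                       # fold into [0, pi]
                dphi = 2 * math.pi - dphi
            phi_acop = math.pi - dphi                # acoplanarity angle
            cos_theta_star = math.tanh((eta_minus - eta_plus) / 2.0)
            sin_theta_star = math.sqrt(1.0 - cos_theta_star ** 2)
            return math.tan(phi_acop / 2.0) * sin_theta_star

        # Nearly back-to-back leptons give a small phi*
        print(phi_star(eta_minus=0.5, phi_minus=0.1, eta_plus=-0.3, phi_plus=3.15))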

  17. Developing Workforce Capacity in Public Health Informatics: Core Competencies and Curriculum Design.

    PubMed

    Wholey, Douglas R; LaVenture, Martin; Rajamani, Sripriya; Kreiger, Rob; Hedberg, Craig; Kenyon, Cynthia

    2018-01-01

    We describe a master's level public health informatics (PHI) curriculum to support workforce development. Public health decision-making requires intensive information management to organize responses to health threats and develop effective health education and promotion. PHI competencies prepare the public health workforce to design and implement these information systems. The objective for a Master's and Certificate in PHI is to prepare public health informaticians with the competencies to work collaboratively with colleagues in public health and other health professions to design and develop information systems that support population health improvement. The PHI competencies are drawn from the computer, information, and organizational sciences. A curriculum is proposed to deliver the competencies, and results of a pilot PHI program are presented. Since the public health workforce needs to use information technology effectively to improve population health, it is essential for public health academic institutions to develop and implement PHI workforce training programs.

  18. Reusable design: A proposed approach to Public Health Informatics system design

    PubMed Central

    2011-01-01

    Background Since it was first defined in 1995, Public Health Informatics (PHI) has become a recognized discipline, with a research agenda, defined domain-specific competencies and a specialized corpus of technical knowledge. Information systems form a cornerstone of PHI research and implementation, representing significant progress for the nascent field. However, PHI does not advocate or incorporate standard, domain-appropriate design methods for implementing public health information systems. Reusable design is generalized design advice that can be reused in a range of similar contexts. We propose that PHI create and reuse information design knowledge by taking a systems approach that incorporates design methods from the disciplines of Human-Computer Interaction, Interaction Design and other related disciplines. Discussion Although PHI operates in a domain with unique characteristics, many design problems in public health correspond to classic design problems, suggesting that existing design methods and solution approaches are applicable to the design of public health information systems. Among the numerous methodological frameworks used in other disciplines, we identify scenario-based design and participatory design as two widely-employed methodologies that are appropriate for adoption as PHI standards. We make the case that these methods show promise to create reusable design knowledge in PHI. Summary We propose the formalization of a set of standard design methods within PHI that can be used to pursue a strategy of design knowledge creation and reuse for cost-effective, interoperable public health information systems. We suggest that all public health informaticians should be able to use these design methods and the methods should be incorporated into PHI training. PMID:21333000

  19. Student Intern Ben Freed Competes as Finalist in Intel STS Competition, Three Other Interns Named Semifinalists | Poster

    Cancer.gov

    By Ashley DeVine, Staff Writer Werner H. Kirstin (WHK) student intern Ben Freed was one of 40 finalists to compete in the Intel Science Talent Search (STS) in Washington, DC, in March. “It was seven intense days of interacting with amazing judges and incredibly smart and interesting students. We met President Obama, and then the MIT astronomy lab named minor planets after each of us,” Freed said of the competition.  

  20. Logic Design of a Shared Disk System in a Multi-Micro Computer Environment.

    DTIC Science & Technology

    1983-06-01

    overall system, is given. An exhaustive description of each device can be found in the cited references. A. INTEL 8085. The INTEL 8086 is a high...either could be accomplished, it was necessary to understand both the existing system architecture and software. The last chapter addressed that...to be adapted: the loader program and the boot ROM program. The loader program is a simplified version of CP/M-86 and contains only enough file

  1. DBPQL: A view-oriented query language for the Intel Data Base Processor

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (DBPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  2. Personal supercomputing by using transputer and Intel 80860 in plasma engineering

    NASA Astrophysics Data System (ADS)

    Ido, S.; Aoki, K.; Ishine, M.; Kubota, M.

    1992-09-01

    A transputer (T800) or the 64-bit RISC Intel 80860 (i860) added to a personal computer can be used as an accelerator. When 32-bit T800s in a parallel system or 64-bit i860s are used, scientific calculations are carried out several tens of times faster than on commonly used 32-bit personal computers or UNIX workstations. Benchmark tests and examples of physical simulations using T800s and the i860 are reported.

  3. Direct CP Violation in Charmless Hadronic B-Meson Decays at the PEP-II Asymmetric B-Meson Factory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Telnov, Alexandre Valerievich; /UC, Berkeley

    2005-05-06

    The study of the quark transition $b \to s\bar{s}s$, which is a pure loop-level ("penguin") process leading to several B-meson-decay final states, most notably $\phi K$, is arguably the hottest topic in B-meson physics today. The reason is the sensitivity of the amplitudes and the CP-violating asymmetries in such processes to physics beyond the Standard Model. By performing these measurements, we improve our understanding of the phenomenon of combined-parity (CP) violation, which is believed to be responsible for the dominance of matter over antimatter in our Universe. Here, we present measurements of branching fractions and charge asymmetries in the decays $B^+ \to \phi K^+$ and $B^0 \to \phi K^0$ in a sample of approximately 89 million $B\bar{B}$ pairs collected by the BABAR detector at the PEP-II asymmetric-energy B-meson Factory at SLAC. We determine $\mathcal{B}(B^+ \to \phi K^+) = (10.0^{+0.9}_{-0.8} \pm 0.5) \times 10^{-6}$ and $\mathcal{B}(B^0 \to \phi K^0) = (8.4^{+1.5}_{-1.3} \pm 0.5) \times 10^{-6}$, where the first error is statistical and the second is systematic. Additionally, we measure the CP-violating charge asymmetry $\mathcal{A}_{CP}(B^\pm \to \phi K^\pm) = 0.04 \pm 0.09 \pm 0.01$, with a 90% confidence-level interval of $[-0.10, 0.18]$, and set an upper limit on the CKM- and color-suppressed decay $B^+ \to \phi\pi^+$ of $\mathcal{B}(B^+ \to \phi\pi^+) < 0.41 \times 10^{-6}$ (at the 90% confidence level). Our results are consistent with the Standard Model, which predicts $\mathcal{A}_{CP}(B^\pm \to \phi K^\pm) \lesssim 1\%$ and $\mathcal{B}(B \to \phi\pi) \ll 10^{-7}$. Since many models of physics beyond the Standard Model introduce additional loop diagrams with new heavy particles and new CP-violating phases that would contribute to these decays, potentially making $\mathcal{A}_{CP}(B^\pm \to \phi K^\pm)$ and $\mathcal{B}(B \to \phi\pi)$ quite large, our results can be used to substantially constrain the parameter spaces of such models.
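
    For orientation, the charge asymmetry quoted above is conventionally built from the charge-separated decay rates; the sign convention below is the common one and is an assumption here, since the abstract does not spell it out:

        \[
          \mathcal{A}_{CP} =
          \frac{\Gamma(B^{-} \to \phi K^{-}) - \Gamma(B^{+} \to \phi K^{+})}
               {\Gamma(B^{-} \to \phi K^{-}) + \Gamma(B^{+} \to \phi K^{+})}.
        \]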

  4. Is there a field-theoretic explanation for precursor biopolymers?

    PubMed

    Rosen, Gerald

    2002-08-01

    A Hu-Barkana-Gruzinov cold dark matter scalar field $\phi$ may enter a weak-isospin-invariant derivative interaction that causes the flow of right-handed electrons to align parallel to $\nabla\phi$. Hence, in the outer regions of galaxies where $\nabla\phi$ is large, as in galactic halos, the derivative interaction may induce a chirality-imbued quantum chemistry. Such a chirality-imbued chemistry would in turn be conducive to the formation of abundant precursor biopolymers on interstellar dust grains, comets and meteors in galactic halo regions, with subsequent delivery to planets in the inner galactic regions where $\phi$ and $\nabla\phi$ are concomitantly near zero and left-right symmetric terrestrial quantum chemistry prevails.

  5. Thermal performance testing of two Thales 9310 pulse-tube cryocoolers for PHyTIR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paine, Christopher G.

    2014-01-29

    PHyTIR is a NASA-funded technology demonstration for a near-term earth-observing instrument in the thermal infrared spectrum, intended for use in the HyspIRI mission. PHyTIR will use two Thales 9310 single-stage pulse tube cryocoolers, one to directly cool the FPA, the other to simulate a passive radiator. We report performance measurements for the two Thales 9310 cryocoolers intended for inclusion in the PHyTIR demonstrator.

  6. Genomic characterization of Ralstonia solanacearum phage phiRSB1, a T7-like wide-host-range phage.

    PubMed

    Kawasaki, Takeru; Shimizu, Mio; Satsuma, Hideki; Fujiwara, Akiko; Fujie, Makoto; Usami, Shoji; Yamada, Takashi

    2009-01-01

    PhiRSB1 is a wide-host-range, T7-like bacteriophage that infects and efficiently lyses the phytopathogenic bacterium Ralstonia solanacearum. The phiRSB1 genome comprises 43,079 bp of double-stranded DNA (61.7% G+C) with 325-bp terminal repeats and contains 47 open reading frames. Strong activity of tandem early promoters and wide specificity of phage promoters of phiRSB1 were demonstrated.

  7. [The value of PHI/PCA3 in the early diagnosis of prostate cancer].

    PubMed

    Tan, S J; Xu, L W; Xu, Z; Wu, J P; Liang, K; Jia, R P

    2016-01-12

    To investigate the value of the prostate health index (PHI) and prostate cancer gene 3 (PCA3) in the early diagnosis of prostate cancer (PCa). A total of 190 patients with abnormal serum prostate-specific antigen (PSA) or an abnormal digital rectal examination were enrolled. All of them underwent initial biopsy, and 11 of them also underwent repeated biopsy. In addition, 25 healthy cases (with normal digital rectal examination and PSA < 4 ng/ml) formed the control group. PHI and PCA3 were detected by immunofluorescence and loop-mediated isothermal amplification (LAMP). The sensitivity and specificity of diagnosis were determined by ROC curve analysis. In addition, the relationships between PHI/PSA and the Gleason score and clinical stage were analyzed. A total of 89 patients were confirmed to have PCa by pathological diagnosis; the other 101 patients were diagnosed with benign prostatic hyperplasia (BPH). The sensitivity and specificity of the PCA3 test were 85.4% and 92.1%, respectively. The area under the curve (AUC) of PHI was higher than that of PSA (0.727 vs. 0.699). PHI in peripheral blood was positively correlated with Gleason score and clinical stage. The detection of PCA3 and PHI shows excellent diagnostic effectiveness. Compared with PSA alone, the combined detection of PHI and PCA3 improved diagnostic specificity. It can provide a new method for the early diagnosis of prostate cancer and avoid unnecessary biopsies.
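
    For orientation, the sensitivity and specificity figures quoted above follow from a 2x2 confusion matrix. A minimal sketch (Python) with hypothetical counts, not the study's data:

        def sensitivity_specificity(tp, fn, tn, fp):
            """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
            return tp / (tp + fn), tn / (tn + fp)

        # Hypothetical counts for a biopsy-decision test (illustrative only)
        sens, spec = sensitivity_specificity(tp=76, fn=13, tn=93, fp=8)
        print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")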

  8. Genome packaging in EL and Lin68, two giant phiKZ-like bacteriophages of P. aeruginosa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sokolova, O.S., E-mail: sokolova@mail.bio.msu.ru; A.V. Shoubnikov Institute of Crystallography RAS, Moscow; Shaburova, O.V.

    A unique feature of the Pseudomonas aeruginosa giant phage phiKZ is its way of genome packaging onto a spool-like protein structure, the inner body. Until recently, no similar structures have been detected in other phages. We have studied DNA packaging in P. aeruginosa phages EL and Lin68 using cryo-electron microscopy and revealed the presence of inner bodies. The shape and positioning of the inner body and the density of the DNA packaging in EL are different from those found in phiKZ and Lin68. This internal organization explains how the shorter EL genome is packed into a large EL capsid, which has the same external dimensions as the capsids of phiKZ and Lin68. The similarity in the structural organization in EL and other phiKZ-like phages indicates that EL is phylogenetically related to other phiKZ-like phages, and, despite the lack of detectable DNA homology, EL, phiKZ, and Lin68 descend from a common ancestor. - Highlights: • We performed a comparative structural study of giant P. aeruginosa phages: EL, Lin68 and phiKZ. • We revealed that the inner body is a common feature in giant phages. • The phage genome size correlates with the overall dimensions of the inner body.

  9. Expression, purification and characterization of a phyAm-phyCs fusion phytase*

    PubMed Central

    Zou, Li-kou; Wang, Hong-ning; Pan, Xin; Tian, Guo-bao; Xie, Zi-wen; Wu, Qi; Chen, Hui; Xie, Tao; Yang, Zhi-rong

    2008-01-01

    The phyAm gene encoding acid phytase and the optimized neutral phytase gene phyCs were inserted into the expression vector pPIC9K in the correct orientation and transformed into Pichia pastoris in order to broaden the pH profile of phytase and decrease the cost of production. The fusion phytase phyAm-phyCs gene was successfully overexpressed in P. pastoris as an active and extracellular phytase. The yield of total extracellular fusion phytase activity was (25.4±0.53) U/ml at the flask scale and (159.1±2.92) U/ml in high-cell-density fermentation. The purified fusion phytase exhibits an optimal temperature of 55 °C and an optimal pH of 5.5~6.0, and its relative activity remains above 70% in the range of pH 2.0 to 7.0. About 51% to 63% of its original activity remains after incubation at 75 °C to 95 °C for 10 min. Due to heavy glycosylation, the expressed fusion phytase shows a broad and diffuse band in SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis). After deglycosylation by endoglycosidase H (EndoHf), the enzyme has an apparent molecular size of 95 kDa. The characteristics of the fusion phytase were compared with those of phyCs and phyAm. PMID:18600783

  10. PhiC31 recombination system demonstrates heritable germinal transmission of site-specific excision from the Arabidopsis genome

    PubMed Central

    2010-01-01

    Background The large serine recombinase phiC31 from broad host range Streptomyces temperate phage, catalyzes the site-specific recombination of two recognition sites that differ in sequence, typically known as attachment sites attB and attP. Previously, we characterized the phiC31 catalytic activity and modes of action in the fission yeast Schizosaccharomyces pombe. Results In this work, the phiC31 recombinase gene was placed under the control of the Arabidopsis OXS3 promoter and introduced into Arabidopsis harboring a chromosomally integrated attB and attP-flanked target sequence. The phiC31 recombinase excised the attB and attP-flanked DNA, and the excision event was detected in subsequent generations in the absence of the phiC31 gene, indicating germinal transmission was possible. We further verified that the genomic excision was conservative and that introduction of a functional recombinase can be achieved through secondary transformation as well as manual crossing. Conclusion The phiC31 system performs site-specific recombination in germinal tissue, a prerequisite for generating stable lines with unwanted DNA removed. The precise site-specific deletion by phiC31 in planta demonstrates that the recombinase can be used to remove selectable markers or other introduced transgenes that are no longer desired and therefore can be a useful tool for genome engineering in plants. PMID:20178628

  11. Evaluating the Applicability of Phi Coefficient in Indicating Habitat Preferences of Forest Soil Fauna Based on a Single Field Study in Subtropical China.

    PubMed

    Cui, Yang; Wang, Silong; Yan, Shaokui

    2016-01-01

    Phi coefficient directly depends on the frequencies of occurrence of organisms and has been widely used in vegetation ecology to analyse the associations of organisms with site groups, providing a characterization of ecological preference, but its application in soil ecology remains rare. Based on a single field experiment, this study assessed the applicability of the phi coefficient in indicating the habitat preferences of soil fauna, by comparing phi coefficient-derived results with those of ordination methods in characterizing soil fauna-habitat (factor) relationships. Eight different habitats of soil fauna were implemented by reciprocal transfer of defaunated soil cores between two types of subtropical forests. Canonical correlation analysis (CCorA) showed that the ecological patterns of fauna-habitat relationships and inter-fauna taxa relationships expressed, respectively, by phi coefficients and by predicted abundances calculated from partial redundancy analysis (RDA) were extremely similar, and a highly significant relationship between the two datasets was observed (Pillai's trace statistic = 1.998, P = 0.007). In addition, highly positive correlations between phi coefficients and predicted abundances for Acari, Collembola, Nematoda and Hemiptera were observed using linear regression analysis. Quantitative relationships between habitat preferences and soil chemical variables were also obtained by linear regression, which were analogous to the results displayed in a partial RDA biplot. Our results suggest that the phi coefficient could be applicable on a local scale in evaluating habitat preferences of soil fauna at coarse taxonomic levels, and that the phi coefficient-derived information, such as ecological preferences and the associated quantitative relationships with habitat factors, will be largely complementary to the results of ordination methods. The application of the phi coefficient in soil ecology may extend our knowledge about habitat preferences and distribution-abundance relationships, which will benefit the understanding of biodistributions and variations in community compositions in the soil. Similar studies at other sites and scales will be needed for further evaluation of the phi coefficient.
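
    For orientation, the phi coefficient for a 2x2 presence/absence table reduces to the standard formula phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)). A minimal sketch (Python) with hypothetical occurrence counts, not data from this study:

        import math

        def phi_coefficient(a, b, c, d):
            """Phi for a 2x2 table with counts
               a: present in the site group,  b: present elsewhere,
               c: absent in the site group,   d: absent elsewhere."""
            denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
            return (a * d - b * c) / denom

        # Hypothetical occurrence counts for one taxon across two habitat groups
        print(f"phi = {phi_coefficient(a=18, b=4, c=2, d=16):.3f}")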

  12. Evaluating the Applicability of Phi Coefficient in Indicating Habitat Preferences of Forest Soil Fauna Based on a Single Field Study in Subtropical China

    PubMed Central

    Cui, Yang; Wang, Silong; Yan, Shaokui

    2016-01-01

    Phi coefficient directly depends on the frequencies of occurrence of organisms and has been widely used in vegetation ecology to analyse the associations of organisms with site groups, providing a characterization of ecological preference, but its application in soil ecology remains rare. Based on a single field experiment, this study assessed the applicability of the phi coefficient in indicating the habitat preferences of soil fauna, by comparing phi coefficient-derived results with those of ordination methods in characterizing soil fauna-habitat (factor) relationships. Eight different habitats of soil fauna were implemented by reciprocal transfer of defaunated soil cores between two types of subtropical forests. Canonical correlation analysis (CCorA) showed that the ecological patterns of fauna-habitat relationships and inter-fauna taxa relationships expressed, respectively, by phi coefficients and by predicted abundances calculated from partial redundancy analysis (RDA) were extremely similar, and a highly significant relationship between the two datasets was observed (Pillai's trace statistic = 1.998, P = 0.007). In addition, highly positive correlations between phi coefficients and predicted abundances for Acari, Collembola, Nematoda and Hemiptera were observed using linear regression analysis. Quantitative relationships between habitat preferences and soil chemical variables were also obtained by linear regression, which were analogous to the results displayed in a partial RDA biplot. Our results suggest that the phi coefficient could be applicable on a local scale in evaluating habitat preferences of soil fauna at coarse taxonomic levels, and that the phi coefficient-derived information, such as ecological preferences and the associated quantitative relationships with habitat factors, will be largely complementary to the results of ordination methods. The application of the phi coefficient in soil ecology may extend our knowledge about habitat preferences and distribution-abundance relationships, which will benefit the understanding of biodistributions and variations in community compositions in the soil. Similar studies at other sites and scales will be needed for further evaluation of the phi coefficient. PMID:26930593

  13. Embryonic common snapping turtles (Chelydra serpentina) preferentially regulate intracellular tissue pH during acid-base challenges.

    PubMed

    Shartau, Ryan B; Crossley, Dane A; Kohl, Zachary F; Brauner, Colin J

    2016-07-01

    The nests of embryonic turtles naturally experience elevated CO2 (hypercarbia), which leads to increased blood PCO2 and a respiratory acidosis, resulting in reduced blood pH [extracellular pH (pHe)]. Some fishes preferentially regulate tissue pH [intracellular pH (pHi)] against changes in pHe; this has been proposed to be associated with exceptional CO2 tolerance and has never been identified in amniotes. As embryonic turtles may be CO2 tolerant based on nesting strategy, we hypothesized that they preferentially regulate pHi, conferring tolerance to severe acute acid-base challenges. This hypothesis was tested by investigating pH regulation in common snapping turtles (Chelydra serpentina) reared in normoxia then exposed to hypercarbia (13 kPa PCO2) for 1 h at three developmental ages: 70% and 90% of incubation, and yearlings. Hypercarbia reduced pHe but not pHi, at all developmental ages. At 70% of incubation, pHe was depressed by 0.324 pH units while pHi of brain, white muscle and lung increased; heart, liver and kidney pHi remained unchanged. At 90% of incubation, pHe was depressed by 0.352 pH units but heart pHi increased with no change in pHi of other tissues. Yearlings exhibited a pHe reduction of 0.235 pH units but had no changes in pHi of any tissues. The results indicate common snapping turtles preferentially regulate pHi during development, but the degree of response is reduced throughout development. This is the first time preferential pHi regulation has been identified in an amniote. These findings may provide insight into the evolution of acid-base homeostasis during development of amniotes, and vertebrates in general. © 2016. Published by The Company of Biologists Ltd.

  14. Evidence for the role of a Na(+)/HCO(3)(-) cotransporter in trout hepatocyte pHi regulation.

    PubMed

    Furimsky, M; Moon, T W; Perry, S F

    2000-07-01

    The mechanisms of intracellular pH (pHi) regulation were examined in hepatocytes of the rainbow trout Oncorhynchus mykiss. pHi was monitored using the pH-sensitive fluorescent dye BCECF, and the effects of various media and pharmacological agents were examined for their influence on baseline pHi and recovery rates from acid and base loading. Rates of Na(+) uptake were measured using (22)Na, and changes in membrane potential were examined using the potentiometric fluorescent dye Oxonol VI. The rate of proton extrusion following acid loading was diminished by the blockade of either Na(+)/H(+) exchange (using amiloride) or anion transport (using DIDS). The removal of external HCO(3)(-) and the abolition of outward K(+) diffusion by the channel blocker Ba(2+) also decreased the rate of proton extrusion following acid load. Depolarization of the cell membrane with 50 mmol l(-)(1) K(+), however, did not affect pHi. The rate of recovery from base loading was significantly diminished by the blockade of anion transport, removal of external HCO(3)(-) and, to a lesser extent, by blocking Na(+)/H(+) exchange. The blockade of K(+) conductance had no effect. The decrease in Na(+) uptake rate observed in the presence of the anion transport blocker DIDS and the DIDS-sensitive hyperpolarization of membrane potential during recovery from acid loading suggest that a Na(+)-dependent electrogenic transport system is involved in the restoration of pHi after intracellular acidification. The effects on baseline pHi indicate that the different membrane exchangers are tonically active in the maintenance of steady-state pHi. This study confirms the roles of a Na(+)/H(+) exchanger and a Cl(-)/HCO(3)(-) exchanger in the regulation of trout hepatocyte pHi and provides new evidence that a Na(+)/HCO(3)(-) cotransporter contributes to pHi regulation.

  15. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed. They are: low-level vector (SIMD) processing, middle-level shared-memory parallel programming, and high-level distributed-memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated, and possibly specialized, to other programming models and languages as needed. For example, the vector processing and shared-memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open-source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C, so that implementations in C-based languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high-performing compiled languages. Parallel languages are still evolving, with interesting developments in Co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
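
    To make the three levels concrete, here is a minimal, self-contained sketch (written for illustration; it is not one of the PICKSC skeleton codes) of a toy particle push that combines MPI for distributed memory, OpenMP for shared memory, and a vectorizable inner loop for SIMD:

      // Minimal illustrative sketch (not a PICKSC mini-app): a toy particle
      // push combining the three levels named above.
      // Level 3: MPI distributes particles across nodes (distributed memory).
      // Level 2: OpenMP threads share the per-node work (shared memory).
      // Level 1: the inner loop is written so the compiler can emit SIMD code.
      // Build, e.g.:  mpicxx -O3 -fopenmp push.cpp -o push
      #include <mpi.h>
      #include <vector>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank = 0, size = 1;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int n = 1 << 20;                 // particles owned by this rank
          std::vector<float> x(n, 0.0f), v(n, 1.0f);
          const float dt = 0.01f, E = 0.5f;      // uniform field keeps the toy simple

          #pragma omp parallel for simd          // threads + vector lanes
          for (int i = 0; i < n; ++i) {
              v[i] += E * dt;                    // accelerate
              x[i] += v[i] * dt;                 // drift
          }

          double local = 0.0, total = 0.0;       // cross-rank checksum via MPI
          for (int i = 0; i < n; ++i) local += x[i];
          MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0) std::printf("ranks=%d checksum=%f\n", size, total);

          MPI_Finalize();
          return 0;
      }

    On a Xeon Phi-style target the same source maps directly onto the memory hierarchy: MPI ranks per card or node, OpenMP threads per core, and 512-bit vector lanes inside the simd loop; porting to CUDA would replace levels 1 and 2 while leaving the MPI level, and the algorithm itself, largely intact.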

  16. Web interfaces to relational databases

    NASA Technical Reports Server (NTRS)

    Carlisle, W. H.

    1996-01-01

    This report describes a project to extend the capabilities of a Virtual Research Center (VRC) for NASA's Advanced Concepts Office. The work was performed as part of NASA's 1995 Summer Faculty Fellowship program and involved the development of a prototype component of the VRC - a database system that provides data creation and access services within a room of the VRC. In support of VRC development, NASA has assembled a laboratory containing the variety of equipment expected to be used by scientists within the VRC. This laboratory consists of the major hardware platforms (SUN, Intel, and Motorola processors) and their most common operating systems (UNIX, Windows NT, Windows for Workgroups, and MacOS). The SPARC 20 runs SUN Solaris 2.4; an Intel Pentium runs Windows NT and is installed on a network separate from the other machines in the laboratory; a second Pentium PC runs Windows for Workgroups; two Intel 386 machines run Windows 3.1; and, finally, a PowerMacintosh and a Macintosh IIsi run MacOS.

  17. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized and combined with local preconditioning to construct an implicit parallel solver for obtaining steady-state solutions to the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and its solutions are found to be identical to those obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops has been achieved on 512 nodes of the Intel Delta machine for a problem size of 1024 K equations (256 K grid points).
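
    The SPMD message-passing pattern described above can be illustrated with a minimal sketch, assuming a 1-D static domain decomposition with halo (ghost-cell) exchange; the smoothing sweep below is a stand-in for the solver's local work and is not the authors' preconditioned GMRES code:

      // Illustrative SPMD sketch: static 1-D domain decomposition with halo
      // exchange. Each rank owns nlocal interior points plus two ghost cells;
      // the local Jacobi-style sweep stands in for the real per-node work.
      // Build, e.g.:  mpicxx -O3 halo.cpp -o halo
      #include <mpi.h>
      #include <vector>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank = 0, size = 1;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int nlocal = 1000;                  // interior points per rank
          std::vector<double> u(nlocal + 2, 0.0);   // +2 ghost cells
          if (rank == 0) u[1] = 1.0;                // a fixed boundary value

          // Neighbours; MPI_PROC_NULL turns physical-boundary sends/recvs
          // into no-ops automatically.
          int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
          int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

          std::vector<double> unew(u);
          for (int iter = 0; iter < 100; ++iter) {
              // Exchange halo values with both neighbours.
              MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                           &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 1,
                           &u[0], 1, MPI_DOUBLE, left, 1,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);

              for (int i = 1; i <= nlocal; ++i)     // local smoothing sweep
                  unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
              u.swap(unew);
              if (rank == 0) u[1] = 1.0;            // re-impose boundary value
          }

          if (rank == 0) std::printf("u[1] = %f\n", u[1]);
          MPI_Finalize();
          return 0;
      }

    The same pattern underlies an implicit solver: each iteration interleaves nearest-neighbour communication with purely local computation, which is what lets a static decomposition preserve the serial convergence behaviour.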

  18. Intracellular pH regulation by acid-base transporters in mammalian neurons

    PubMed Central

    Ruffin, Vernon A.; Salameh, Ahlam I.; Boron, Walter F.; Parker, Mark D.

    2014-01-01

    Intracellular pH (pHi) regulation in the brain is important in both physiological and pathophysiological conditions because changes in pHi generally result in altered neuronal excitability. In this review, we cover four major areas: (1) the effect of pHi on cellular processes in the brain, including channel activity and neuronal excitability; (2) pHi homeostasis and how it is determined by the balance between the rates of acid loading (JL) and acid extrusion (JE), a balance that sets both steady-state pHi and the ability of the cell to defend pHi in the face of extracellular acid-base disturbances (e.g., metabolic acidosis); (3) the properties and importance of the members of the SLC4 and SLC9 families of acid-base transporters expressed in the brain that contribute to JL (namely the Cl-HCO3 exchanger AE3) and to JE (the Na-H exchangers NHE1, NHE3, and NHE5, as well as the Na+-coupled HCO3− transporters NBCe1, NBCn1, NDCBE, and NBCn2); and (4) the effect of acid-base disturbances on neuronal function and the roles of acid-base transporters in defending neuronal pHi under pathophysiological conditions. PMID:24592239
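
    The balance in point (2) can be stated compactly. A minimal sketch, assuming a total intracellular buffering power \beta (introduced here for illustration; it is not named in the record):

      \frac{d\,\mathrm{pH_i}}{dt} = \frac{J_E - J_L}{\beta},
      \qquad \text{steady state: } J_E = J_L

    With J_E and J_L expressed in mmol l^{-1} min^{-1} and \beta in mmol l^{-1} (pH unit)^{-1}, pHi is stable whenever extrusion exactly offsets loading; if J_E transiently exceeds J_L the cell alkalinizes, and vice versa.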

  19. Comparative evaluation of urinary PCA3 and TMPRSS2:ERG scores and serum PHI in predicting prostate cancer aggressiveness.

    PubMed

    Tallon, Lucile; Luangphakdy, Devillier; Ruffion, Alain; Colombel, Marc; Devonec, Marian; Champetier, Denis; Paparel, Philippe; Decaussin-Petrucci, Myriam; Perrin, Paul; Vlaeminck-Guillem, Virginie

    2014-07-30

    It has been suggested that the urinary PCA3 and TMPRSS2:ERG fusion tests and serum PHI correlate with cancer-aggressiveness-related pathological criteria at prostatectomy. To evaluate and compare their ability to predict prostate cancer aggressiveness, PHI and urinary PCA3 and TMPRSS2:ERG (T2) scores were assessed in 154 patients who underwent radical prostatectomy for biopsy-proven prostate cancer. Univariate and multivariate analyses using logistic regression, together with decision curve analyses, were performed. All three markers were predictors of a tumor volume ≥0.5 mL. Only PHI predicted a Gleason score ≥7. T2 score and PHI were both independent predictors of extracapsular extension (≥pT3), while multifocality was predicted only by PCA3 score. Moreover, when compared to a base model (age, digital rectal examination, serum PSA, and Gleason sum at biopsy), the addition of both PCA3 score and PHI to the base model yielded a significant increase (+12%) in the prediction of tumor volume >0.5 mL. PHI and urinary PCA3 and T2 scores can therefore be considered complementary predictors of cancer aggressiveness at prostatectomy.
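
    For context, the multivariate analyses referred to above fit models of the generic logistic form sketched below; the covariates mirror the base model plus markers, and the coefficients are hypothetical, not those estimated in the study:

      \operatorname{logit}(p) = \ln\frac{p}{1-p}
        = \beta_0 + \beta_1\,\mathrm{PSA} + \beta_2\,\mathrm{PHI}
        + \beta_3\,\mathrm{PCA3} + \beta_4\,\mathrm{T2} + \cdots

    where p is the probability of the pathological endpoint (e.g., tumor volume >0.5 mL); decision curve analysis then compares the net clinical benefit of acting on p across a range of threshold probabilities.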

  20. Imaging of Intracellular pH in Tumor Spheroids Using Genetically Encoded Sensor SypHer2.

    PubMed

    Zagaynova, Elena V; Druzhkova, Irina N; Mishina, Natalia M; Ignatova, Nadezhda I; Dudenkova, Varvara V; Shirmanova, Marina V

    2017-01-01

    Intracellular pH (pHi) is one of the most important parameters regulating the physiological state of cells and tissues, and pHi homeostasis is crucial for normal cell functioning. Cancer cells are characterized by a higher (neutral to slightly alkaline) pHi and a lower (acidic) extracellular pH (pHe) than normal cells. This "reversed" pH gradient is essential in supporting their accelerated growth rate, invasion and migration, in suppressing anti-tumor immunity, in promoting metabolic coupling with fibroblasts, and in preventing apoptosis. Moreover, abnormal pH, both pHi and pHe, contributes to drug resistance in cancers. The development of methods for measuring pH in living tumor cells is therefore likely to lead to a better understanding of tumor biology and to open new avenues for cancer treatment. Genetically encoded, fluorescent, pH-sensitive probes are promising instruments that enable subcellular measurement of pHi with unrivaled specificity and high accuracy. Here, we describe a protocol for pHi imaging at the microscopic level in HeLa tumor spheroids, using the genetically encoded ratiometric (dual-excitation) pHi indicator SypHer2.
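
    As a rough sketch of how a ratiometric (dual-excitation) read-out becomes a pH value, the snippet below inverts a sigmoidal calibration curve for a single pixel ratio; the calibration parameters (Rmin, Rmax, pKa', slope) are hypothetical placeholders, since in a real experiment they are measured in situ (e.g., with nigericin/high-K(+) pH clamps):

      // Illustrative sketch of a ratiometric pH read-out (not the authors'
      // pipeline). A dual-excitation sensor reports pH through the ratio
      // R = F(ex1)/F(ex2); R is mapped to pH by inverting a sigmoidal
      // calibration R(pH) = Rmin + (Rmax - Rmin) / (1 + 10^((pKa - pH)/slope)).
      #include <cmath>
      #include <cstdio>

      double ratio_to_pH(double R, double Rmin, double Rmax,
                         double pKa, double slope) {
          // Inversion is valid only for Rmin < R < Rmax.
          double f = (R - Rmin) / (Rmax - R);
          return pKa + slope * std::log10(f);
      }

      int main() {
          // Hypothetical calibration values and one measured pixel ratio.
          const double Rmin = 0.4, Rmax = 2.6, pKa = 8.1, slope = 1.0;
          double R = 1.2;   // measured ratio of the two excitation channels
          std::printf("pHi estimate: %.2f\n",
                      ratio_to_pH(R, Rmin, Rmax, pKa, slope));
          return 0;
      }

    Applied per pixel, the same mapping turns a pair of excitation-channel images of a spheroid into a pHi map.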
