Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd, Joshua S; Gorentla Venkata, Manjunath; Shamis, Pavel
2011-01-01
The performance and scalability of collective operations play a key role in the performance and scalability of many scientific applications. Within the Open MPI code base we have developed a general-purpose hierarchical collective operations framework called Cheetah, and applied it at large scale on the Oak Ridge Leadership Computing Facility's (OLCF) Jaguar platform, obtaining better performance and scalability than the native MPI implementation. This paper discusses Cheetah's design and implementation, and optimizations to the framework for Cray XT5 platforms. Our results show that Cheetah's broadcast and barrier perform better than the native MPI implementation. For medium data, Cheetah's broadcast outperforms the native MPI implementation by 93% at 49,152 processes. For small and large data, it outperforms the native MPI implementation by 10% and 9%, respectively, at 24,576 processes. Cheetah's barrier performs 10% better than the native MPI implementation at 12,288 processes.
NASA Technical Reports Server (NTRS)
Phillips, Jennifer K.
1995-01-01
Two of the most popular current implementations of the Message Passing Interface (MPI) standard were contrasted: MPICH, by Argonne National Laboratory, and LAM, by the Ohio Supercomputer Center at Ohio State University. A parallel skyline matrix solver was adapted to run in a heterogeneous environment using MPI. The Message Passing Interface Forum, held in May 1994, led to a specification of library functions that implement the message-passing model of parallel communication. LAM, which creates its own environment, is more robust in a highly heterogeneous network. MPICH uses the environment native to the machine architecture. While neither of these freeware implementations provides the performance of native message-passing or vendors' implementations, MPICH begins to approach that performance on the SP-2. The machines used in this study were: IBM RS6000, 3 Sun4, SGI, and the IBM SP-2. Each machine is unique, and a few machines required specific modifications during installation. When installed correctly, both implementations worked well with only minor problems.
Implementing Multidisciplinary and Multi-Zonal Applications Using MPI
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.
1995-01-01
Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. Unfortunately, simple message passing models, like Intel's NX library, only allow point-to-point and global communication within a single system-defined partition. This makes implementation of these applications quite difficult, if not impossible. In this report it is shown that the new Message Passing Interface (MPI) standard is a viable portable library for implementing the message passing portion of multidisciplinary applications. Further, with the extension of a portable loader, fully portable multidisciplinary application programs can be developed. Finally, the performance of MPI is compared to that of some native message passing libraries. This comparison shows that MPI can be implemented to deliver performance commensurate with native message libraries.
Cheetah: A Framework for Scalable Hierarchical Collective Operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua S
2011-01-01
Collective communication operations, used by many scientific applications, tend to limit overall parallel application performance and scalability. Computer systems are becoming more heterogeneous with increasing node and core-per-node counts. Also, a growing number of data-access mechanisms, of varying characteristics, are supported within a single computer system. We describe a new hierarchical collective communication framework that takes advantage of hardware-specific data-access mechanisms. It is flexible, with run-time hierarchy specification and sharing of collective communication primitives between collective algorithms. Data buffers are shared between levels in the hierarchy, reducing collective communication management overhead. We have implemented several versions of the Message Passing Interface (MPI) collective operations MPI_Barrier() and MPI_Bcast(), and run experiments using up to 49,152 processes on a Cray XT5 and on a small InfiniBand-based cluster. At 49,152 processes our barrier implementation outperforms the optimized native implementation by 75%. 32-byte and one-megabyte broadcasts outperform it by 62% and 11%, respectively, with better scalability characteristics. Improvements relative to the default Open MPI implementation are much larger.
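The two-level idea behind hierarchical collectives like Cheetah can be sketched in a few lines. The model below is purely illustrative (not Cheetah's actual code, and the function name is hypothetical): ranks are grouped by node, the root sends once to each node's leader, and each leader forwards to its node-local ranks, so only one message per node crosses the network.

```python
def hierarchical_bcast_schedule(root, ranks_by_node):
    """Return an ordered list of (src, dst) messages for a two-level
    hierarchical broadcast: root -> node leaders -> node-local ranks.
    Illustrative sketch; real frameworks such as Cheetah use tree
    topologies per level and shared memory within a node."""
    schedule = []
    leaders = []
    for node_ranks in ranks_by_node:
        # The root leads its own node; otherwise the first rank leads.
        leader = root if root in node_ranks else node_ranks[0]
        leaders.append(leader)
        if leader != root:
            schedule.append((root, leader))      # inter-node stage
    for leader, node_ranks in zip(leaders, ranks_by_node):
        for r in node_ranks:
            if r != leader:
                schedule.append((leader, r))     # intra-node stage
    return schedule

# Two nodes of three ranks each, rooted at rank 0.
sched = hierarchical_bcast_schedule(0, [[0, 1, 2], [3, 4, 5]])
print(sched)  # → [(0, 3), (0, 1), (0, 2), (3, 4), (3, 5)]
```

Note that only one of the five messages, (0, 3), crosses node boundaries; a flat broadcast from rank 0 would send three inter-node messages.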
High-energy physics software parallelization using database techniques
NASA Astrophysics Data System (ADS)
Argante, E.; van der Stok, P. D. V.; Willers, I.
1997-02-01
A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is largely transparent to the programmer, resulting in a higher level of abstraction than native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with that of native PVM and MPI.
An implementation and evaluation of the MPI 3.0 one-sided communication interface
Dinan, James S.; Balaji, Pavan; Buntinas, Darius T.; ...
2016-01-09
The Message Passing Interface (MPI) 3.0 standard includes a significant revision to MPI's remote memory access (RMA) interface, which provides support for one-sided communication. MPI-3 RMA is expected to greatly enhance the usability and performance of MPI RMA. We present the first complete implementation of MPI-3 RMA and document implementation techniques and performance optimization opportunities enabled by the new interface. Our implementation targets messaging-based networks and is publicly available in the latest release of the MPICH MPI implementation. Using this implementation, we explore the performance impact of new MPI-3 functionality and semantics. Results indicate that the MPI-3 RMA interface provides significant advantages over the MPI-2 interface by enabling increased communication concurrency through relaxed semantics in the interface and additional routines that provide new window types, synchronization modes, and atomic operations.
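The one-sided model revised by MPI-3 can be illustrated with a toy in-process analogue. The sketch below (a conceptual model, not mpi4py or any real MPI binding; the class name is invented) mimics the fence-synchronized epoch semantics: a put issued inside an access epoch is nonblocking and is only guaranteed visible in the target's window after the closing fence, mirroring MPI_Win_fence.

```python
class ToyWindow:
    """Toy model of an MPI RMA window with fence synchronization.
    Puts issued inside an access epoch become visible only after the
    closing fence. Illustrative only; real MPI-3 RMA adds passive-target
    locks, flush operations, and atomics such as MPI_Fetch_and_op."""
    def __init__(self, nranks, size):
        # One memory region ("window") per simulated rank.
        self.mem = [[0] * size for _ in range(nranks)]
        self.pending = []
    def put(self, target, offset, value):
        self.pending.append((target, offset, value))  # deferred, nonblocking
    def fence(self):
        for target, offset, value in self.pending:    # complete the epoch
            self.mem[target][offset] = value
        self.pending.clear()

win = ToyWindow(nranks=2, size=4)
win.put(target=1, offset=0, value=42)
assert win.mem[1][0] == 0      # epoch still open: not yet visible
win.fence()
assert win.mem[1][0] == 42     # visible after the fence
```

The deferral is the point: because completion is decoupled from issue, an implementation is free to batch and reorder transfers within an epoch, which is one source of the concurrency gains the abstract reports.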
An MPI-1 Compliant Thread-Based Implementation
NASA Astrophysics Data System (ADS)
Díaz Martín, J. C.; Rico Gallego, J. A.; Álvarez Llorente, J. M.; Perogil Duque, J. F.
This work presents AzequiaMPI, the first fully compliant implementation of the MPI-1 standard in which the MPI node is a thread. Performance comparisons with MPICH2-Nemesis show that thread-based implementations adequately exploit multicore architectures under oversubscription, which could make MPI competitive with OpenMP-like solutions.
Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Jong, Wibe de
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism, coupled with reduced memory capacity, demands an altogether different approach. In this paper we explore augmenting two NWChem modules, the triples correction of CCSD(T) and Fock matrix construction, with OpenMP so that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC for our experiments. To proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient to attain high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of CCSD(T), due in large part to the fact that limited on-card memory restricts the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix construction when compared with the best MPI implementations running multiple processes per card.
Design and evaluation of Nemesis, a scalable, low-latency, message-passing communication subsystem.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buntinas, D.; Mercier, G.; Gropp, W.
2005-12-02
This paper presents a new low-level communication subsystem called Nemesis. Nemesis has been designed and implemented to be scalable and efficient both for intranode communication using shared memory and for internode communication using high-performance networks, and is natively multimethod-enabled. Nemesis has been integrated into MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorentla Venkata, Manjunath; Shamis, Pavel; Graham, Richard L
2013-01-01
Many scientific simulations using the Message Passing Interface (MPI) programming model are sensitive to the performance and scalability of reduction collective operations such as MPI_Allreduce and MPI_Reduce. These operations are the most widely used abstractions to perform mathematical operations over all processes that are part of the simulation. In this work, we propose a hierarchical design to implement the reduction operations on multicore systems. This design aims to improve the efficiency of reductions by 1) tailoring the algorithms and customizing the implementations for various communication mechanisms in the system, 2) providing the ability to configure the depth of the hierarchy to match the system architecture, and 3) providing the ability to independently progress each level of the hierarchy. Using this design, we implement the MPI_Allreduce and MPI_Reduce operations (and their nonblocking variants MPI_Iallreduce and MPI_Ireduce) for all message sizes, and evaluate them on multiple architectures, including InfiniBand clusters and the Cray XT5. We leverage and enhance our existing infrastructure, Cheetah, a framework for implementing hierarchical collective operations, to implement these reductions. The experimental results show that the Cheetah reduction operations outperform production-grade MPI implementations such as the Open MPI default, Cray MPI, and MVAPICH2, demonstrating their efficiency, flexibility, and portability. On InfiniBand systems, with a microbenchmark, a 512-process Cheetah nonblocking Allreduce and Reduce achieve speedups of 23x and 10x, respectively, compared to the default Open MPI reductions. The blocking variants of the reduction operations show similar performance benefits. A 512-process nonblocking Cheetah Allreduce achieves a speedup of 3x compared to the default MVAPICH2 Allreduce implementation. On a Cray XT5 system, a 6144-process Cheetah Allreduce outperforms Cray MPI by 145%. The evaluation with an application kernel, a Conjugate Gradient solver, shows that the Cheetah reductions speed up total time to solution by 195%, demonstrating the potential benefits for scientific simulations.
OPAL: An Open-Source MPI-IO Library over Cray XT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Weikuan; Vetter, Jeffrey S; Canon, Richard Shane
Parallel IO over Cray XT is supported by a vendor-supplied MPI-IO package. This package contains a proprietary ADIO implementation built on top of the sysio library. While it is reasonable to maintain a stable code base for application scientists' convenience, it is also very important for system developers and researchers to analyze and assess the effectiveness of parallel IO software and, accordingly, tune and optimize the MPI-IO implementation. A proprietary parallel IO code base relinquishes such flexibility. On the other hand, a generic UFS-based MPI-IO implementation is typically used on many Linux-based platforms. We have developed an open-source MPI-IO package over Lustre, referred to as OPAL (OPportunistic and Adaptive MPI-IO Library over Lustre). OPAL provides a single source-code base for MPI-IO over Lustre on Cray XT and Linux platforms. Compared to the Cray implementation, OPAL provides several useful features, including arbitrary specification of striping patterns and Lustre-stripe-aligned file domain partitioning. This paper presents performance comparisons between OPAL and Cray's proprietary implementation. Our evaluation demonstrates that OPAL achieves performance comparable to the Cray implementation. We also exemplify the benefits of an open source package in revealing the underpinnings of parallel IO performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amestoy, Patrick R.; Duff, Iain S.; L'Excellent, Jean-Yves
2001-10-10
We examine the mechanics of the send and receive mechanism of MPI and in particular how we can implement message passing in a robust way so that our performance is not significantly affected by changes to the MPI system. This leads us to use the Isend/Irecv protocol, which can entail significant algorithmic changes. We discuss this within the context of two different algorithms for sparse Gaussian elimination that we have parallelized. One is a multifrontal solver called MUMPS; the other is a supernodal solver called SuperLU. Both algorithms are difficult to parallelize on distributed memory machines. Our initial strategies were based on simple MPI point-to-point communication primitives. With such approaches, the parallel performance of both codes is very sensitive to the MPI implementation, in particular the way MPI internal buffers are used. We then modified our codes to use more sophisticated nonblocking versions of MPI communication. This significantly improved the performance robustness (independent of the MPI buffering mechanism) and scalability, but at the cost of increased code complexity.
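The buffering hazard motivating the move to Isend/Irecv can be demonstrated with a tiny discrete simulation. The model below is a sketch under stated assumptions (zero internal buffering, i.e. a pure rendezvous protocol; the function name is invented): a blocking send cannot complete until the peer has posted a receive, so two processes that both send first deadlock, while posting the receive first always completes.

```python
def exchange_completes(ops_a, ops_b):
    """Simulate two processes exchanging one message each under a
    zero-buffering (rendezvous) model: a blocking 'send' cannot finish
    until the peer has posted a receive ('irecv' posts one without
    blocking). Returns True iff both processes run to completion.
    Toy model only; real MPI also has an eager protocol whose buffer
    sizes vary by implementation, which is exactly the sensitivity
    the abstract describes."""
    posted = {"a": False, "b": False}   # has this side posted a recv?
    pc = {"a": 0, "b": 0}               # per-process program counter
    progress = True
    while progress:
        progress = False
        for me, peer, ops in (("a", "b", ops_a), ("b", "a", ops_b)):
            while pc[me] < len(ops):
                op = ops[pc[me]]
                if op == "irecv":
                    posted[me] = True   # nonblocking: post and continue
                elif op == "send" and not posted[peer]:
                    break               # blocks until peer posts a recv
                pc[me] += 1
                progress = True
    return pc["a"] == len(ops_a) and pc["b"] == len(ops_b)

# Both sides send first: deadlock when nothing is buffered internally.
print(exchange_completes(["send", "irecv"], ["send", "irecv"]))   # → False
# Posting the receive first (Irecv-then-send) always completes.
print(exchange_completes(["irecv", "send"], ["irecv", "send"]))   # → True
```

An MPI implementation with large eager buffers would let the first pattern succeed, which is why codes written that way run on one MPI library and hang on another.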
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sayan Ghosh, Jeff Hammond
OpenSHMEM is a community effort to unify and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recent release of MPI, version 3.0, was designed in part to support programming models like SHMEM. OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and wide availability of Linux and MPI-3. OSHMPI has been tested on a variety of systems and implementations of MPI-3, including InfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OS X if there is sufficient interest. The code is open source via https://github.com/jeffhammond/oshmpi
Ojeda-May, Pedro; Nam, Kwangho
2017-08-08
The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM Message Passing Interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model, in which both task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an SN2 symmetric reaction in water. The MPI version exceeded existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35% depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach, which displayed better scalability for a larger number of CPU cores (up to 64 CPUs in the tested systems).
What does fault tolerant Deep Learning need from MPI?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amatya, Vinay C.; Vishnu, Abhinav; Siegel, Charles M.
Deep Learning (DL) algorithms have become the de facto Machine Learning (ML) algorithms for large scale data analysis. DL algorithms are computationally expensive -- even distributed DL implementations that use MPI require days of training (model learning) time on commonly studied datasets. Long-running DL applications become susceptible to faults, requiring development of a fault tolerant system infrastructure in addition to fault tolerant DL algorithms. This raises an important question: What is needed from MPI for designing fault tolerant DL implementations? In this paper, we address this problem for permanent faults. We motivate the need for a fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data, and hybrid); the need (or lack thereof) for checkpointing of any critical data structures; and, most importantly, consideration of several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approach by extending MaTEx-Caffe to use a ULFM-based implementation. Our evaluation using the ImageNet dataset and the AlexNet neural network topology demonstrates the effectiveness of the proposed fault tolerant DL implementation using Open MPI-based ULFM.
Characterizing MPI matching via trace-based simulation
Ferreira, Kurt Brian; Levy, Scott Larson Nicoll; Pedretti, Kevin; ...
2017-01-01
With the increased scale expected on future leadership-class systems, detailed information about the resource usage and performance of MPI message matching provides important insights into how to maintain application performance on next-generation systems. However, obtaining MPI message matching performance data is often not possible without significant effort. A common approach is to instrument an MPI implementation to collect relevant statistics. While this approach can provide important data, collecting matching data at runtime perturbs the application's execution, including its matching performance, and is highly dependent on the MPI library's matchlist implementation. In this paper, we introduce a trace-based simulation approach to obtain detailed MPI message matching performance data for MPI applications without perturbing their execution. Using a number of key parallel workloads, we demonstrate that this simulation approach can rapidly and accurately characterize matching behavior. Specifically, we use our simulator to collect several important statistics about the operation of the MPI posted and unexpected queues. For example, we present data about search lengths and the duration that messages spend in the queues waiting to be matched. Data gathered using this simulation-based approach have significant potential to aid hardware designers in determining resource allocation for MPI matching functions and to provide application and middleware developers with insight into the scalability issues associated with MPI message matching.
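The posted/unexpected queue machinery being simulated can be sketched compactly. The toy trace-driven model below (an illustrative sketch, not the paper's simulator; the function name and event format are invented, and matching is on tag only, whereas real MPI also matches on source and communicator) replays a trace of posted receives and arriving messages and records the search length of each matching attempt, the kind of statistic the paper collects.

```python
from collections import deque

def simulate_matching(events):
    """Trace-driven toy model of MPI message matching. Each event is
    ('recv', tag) for a posted receive or ('msg', tag) for an arriving
    message. An arriving message searches the posted-receive queue;
    if unmatched it joins the unexpected queue, which each new receive
    searches first. Returns the search length (entries inspected) for
    every event."""
    posted, unexpected = deque(), deque()
    search_lengths = []
    def match(queue, tag):
        for i, t in enumerate(queue):
            if t == tag:
                del queue[i]
                return i + 1              # entries inspected to match
        return None
    for kind, tag in events:
        if kind == "msg":
            n = match(posted, tag)
            if n is None:
                unexpected.append(tag)
                n = len(posted)           # searched the whole queue
            search_lengths.append(n)
        else:                             # posted receive
            n = match(unexpected, tag)
            if n is None:
                posted.append(tag)
                n = len(unexpected)
            search_lengths.append(n)
    return search_lengths

# Two receives posted, then the messages arrive out of order:
print(simulate_matching([("recv", 1), ("recv", 2), ("msg", 2), ("msg", 1)]))
# → [0, 0, 2, 1]
```

Running the model over a real trace offline is what lets the approach report queue statistics without perturbing the application's own matching behavior.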
Towards a High-Performance and Robust Implementation of MPI-IO on Top of GPFS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prost, J.-P.; Treumann, R.; Blackmore, R.
2000-01-11
MPI-IO/GPFS is a prototype implementation of the I/O chapter of the Message Passing Interface (MPI) 2 standard. It uses the IBM General Parallel File System (GPFS), with prototyped extensions, as the underlying file system. This paper describes the features of this prototype that support its high performance and robustness. The use of hints at the file system level and at the MPI-IO level allows tailoring the use of the file system to the application's needs. Error handling in collective operations provides robust error reporting and deadlock prevention when errors are returned.
Enabling communication concurrency through flexible MPI endpoints
Dinan, James; Grant, Ryan E.; Balaji, Pavan; ...
2014-09-23
MPI defines a one-to-one relationship between MPI processes and ranks. This model captures many use cases effectively; however, it also limits communication concurrency and interoperability between MPI and programming models that utilize threads. This paper describes the MPI endpoints extension, which relaxes the longstanding one-to-one relationship between MPI processes and ranks. Using endpoints, an MPI implementation can map separate communication contexts to threads, allowing them to drive communication independently. Endpoints also enable threads to be addressable in MPI operations, enhancing interoperability between MPI and other programming models. These characteristics are illustrated through several examples and an empirical study that contrasts current multithreaded communication performance with the need for high degrees of communication concurrency to achieve peak communication performance.
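The core endpoints idea, giving each thread its own rank so it can be addressed and can drive communication independently, can be modeled in-process. The sketch below is a conceptual toy (the class name is invented; the real proposal extends MPI communicators rather than using queues): a parent process creates several endpoints, each with its own rank and receive queue, and individual threads attach to them.

```python
import queue
import threading

class ToyEndpointComm:
    """Toy model of the MPI endpoints extension: one parent process
    creates several endpoints, each with its own rank and receive
    queue, so individual threads become directly addressable and can
    progress communication independently. Illustrative only."""
    def __init__(self, num_endpoints):
        self.mailboxes = [queue.Queue() for _ in range(num_endpoints)]
    def send(self, dest_rank, payload):
        self.mailboxes[dest_rank].put(payload)
    def recv(self, my_rank):
        return self.mailboxes[my_rank].get()   # blocks until a message arrives

comm = ToyEndpointComm(num_endpoints=2)
results = []

def worker(rank):
    # Each thread attaches to one endpoint and receives on its own rank.
    results.append((rank, comm.recv(rank)))

threads = [threading.Thread(target=worker, args=(r,)) for r in (0, 1)]
for t in threads:
    t.start()
comm.send(0, "to-endpoint-0")   # addressed to a specific thread's endpoint
comm.send(1, "to-endpoint-1")
for t in threads:
    t.join()
print(sorted(results))  # → [(0, 'to-endpoint-0'), (1, 'to-endpoint-1')]
```

Under standard MPI the two threads would share one rank, so messages could not be targeted at a particular thread without an application-level tag protocol; per-endpoint ranks remove that serialization point.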
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program
NASA Astrophysics Data System (ADS)
Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.
2018-02-01
We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
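The speedup and efficiency metrics mentioned above are standard: speedup S = T1/Tp and efficiency E = S/p for p processors. A minimal sketch (the timings below are made-up illustrations, not numbers from the MPI_XSTAR paper):

```python
def speedup_and_efficiency(t_serial, t_parallel, nprocs):
    """Standard parallel-performance metrics:
    speedup   S = T_serial / T_parallel
    efficiency E = S / nprocs  (1.0 = ideal scaling).
    Timings passed in are hypothetical, for illustration only."""
    s = t_serial / t_parallel
    return s, s / nprocs

# Hypothetical timings: wall-clock speedup keeps growing while
# efficiency (resource usage per unit of speedup) declines,
# the trend the abstract reports.
print(speedup_and_efficiency(100.0, 20.0, 8))    # → (5.0, 0.625)
print(speedup_and_efficiency(100.0, 12.5, 16))   # → (8.0, 0.5)
```

The declining efficiency is typical when a fixed pool of independent runs is divided among more workers: load imbalance grows because the longest single run bounds the parallel time.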
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using the Message Passing Interface (MPI) standard. Implementation of the MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures, making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on the massively parallel supercomputers Intel Paragon XP/S 5 and XP/S 35, on a Kendall Square Research machine, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. Preliminary testing of the developed program has shown scalable performance for reasonably sized computational domains.
An MPI+X implementation of contact global search using Kokkos
Hansen, Glen A.; Xavier, Patrick G.; Mish, Sam P.; ...
2015-10-05
This paper describes an approach that seeks to parallelize the spatial search associated with computational contact mechanics. In contact mechanics, the purpose of the spatial search is to find “nearest neighbors,” which is the prelude to an imprinting search that resolves the interactions between the external surfaces of contacting bodies. In particular, we are interested in the contact global search portion of the spatial search associated with this operation on domain-decomposition-based meshes. Specifically, we describe an implementation that combines standard domain-decomposition-based MPI-parallel spatial search with thread-level parallelism (MPI-X) available on advanced computer architectures (those with GPU coprocessors). Our goal is to demonstrate the efficacy of the MPI-X paradigm in the overall contact search. Standard MPI-parallel implementations typically use a domain decomposition of the external surfaces of bodies within the domain in an attempt to efficiently distribute computational work. This decomposition may or may not be the same as the volume decomposition associated with the host physics. The parallel contact global search phase is then employed to find and distribute surface entities (nodes and faces) that are needed to compute contact constraints between entities owned by different MPI ranks without further inter-rank communication. Key steps of the contact global search include computing bounding boxes, building surface entity (node and face) search trees, and finding and distributing entities required to complete on-rank (local) spatial searches. To enable source-code portability and performance across a variety of different computer architectures, we implemented the algorithm using the Kokkos hardware abstraction library. While we targeted development towards machines with a GPU accelerator per MPI rank, we also report performance results for OpenMP with a conventional multi-core compute node per rank.
Results here demonstrate a 47% decrease in the time spent within the global search algorithm, comparing the reference ACME algorithm with the GPU implementation, on an 18M face problem using four MPI ranks. While further work remains to maximize performance on the GPU, this result illustrates the potential of the proposed implementation.
Memory Compression Techniques for Network Address Management in MPI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guo, Yanfei; Archer, Charles J.; Blocksome, Michael
MPI allows applications to treat processes as a logical collection of integer ranks for each MPI communicator, while internally translating these logical ranks into actual network addresses. In current MPI implementations the management and lookup of such network addresses use memory sizes that are proportional to the number of processes in each communicator. In this paper, we propose a new mechanism, called AV-Rankmap, for managing such translation. AV-Rankmap takes advantage of logical patterns in rank-address mapping that most applications naturally tend to have, and it exploits the fact that some parts of network address structures are naturally more performance critical than others. It uses this information to compress the memory used for network address management. We demonstrate that AV-Rankmap can achieve performance similar to or better than that of other MPI implementations while using significantly less memory.
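The core idea of exploiting logical rank-address patterns can be illustrated with a short sketch. The paper does not give the AV-Rankmap algorithm itself; the function names and the strided-pattern heuristic below are illustrative assumptions, showing only the general principle of replacing a per-rank address table with a compact rule when the mapping is regular.

```python
def compress_rankmap(addresses):
    """Return a compact representation of a rank -> address table.

    If the mapping follows a constant stride (a common logical pattern,
    e.g. ranks laid out consecutively across nodes), store only
    (base, stride, count) instead of the full O(nprocs) table.
    Otherwise fall back to the explicit list.  This is a sketch of the
    general idea, not the AV-Rankmap implementation.
    """
    if len(addresses) >= 2:
        stride = addresses[1] - addresses[0]
        if all(addresses[i] == addresses[0] + i * stride
               for i in range(len(addresses))):
            return ("strided", addresses[0], stride, len(addresses))
    return ("explicit", list(addresses))

def lookup(compressed, rank):
    """Translate a logical rank back to its network address."""
    if compressed[0] == "strided":
        _tag, base, stride, count = compressed
        assert 0 <= rank < count
        return base + rank * stride
    return compressed[1][rank]
```

For a communicator whose ranks map to regularly spaced addresses, the memory cost drops from one entry per process to a constant, while lookup stays O(1).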
NASA Astrophysics Data System (ADS)
Bonelli, Francesco; Tuttafesta, Michele; Colonna, Gianpiero; Cutrone, Luigi; Pascazio, Giuseppe
2017-10-01
This paper describes the most advanced results obtained in the context of fluid dynamic simulations of high-enthalpy flows using detailed state-to-state air kinetics. Thermochemical non-equilibrium, typical of supersonic and hypersonic flows, was modeled by using both the accurate state-to-state approach and the multi-temperature model proposed by Park. The accuracy of the two thermochemical non-equilibrium models was assessed by comparing the results with experimental findings, showing better predictions provided by the state-to-state approach. To overcome the huge computational cost of the state-to-state model, a multiple-node GPU implementation, based on an MPI-CUDA approach, was employed, and a comprehensive code performance analysis is presented. Both the pure MPI-CPU and the MPI-CUDA implementations exhibit excellent scalability. GPUs outperform CPUs especially when the state-to-state approach is employed, showing speed-ups of a single GPU with respect to a single-core CPU larger than 100, in the cases of both one MPI process and multiple MPI processes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Aiman; Laguna, Ignacio; Sato, Kento
Future high-performance computing systems may face frequent failures with their rapid increase in scale and complexity. Resilience to faults has become a major challenge for large-scale applications running on supercomputers, which demands fault tolerance support for prevalent MPI applications. Among failure scenarios, process failures are one of the most severe issues as they usually lead to termination of applications. However, the widely used MPI implementations do not provide mechanisms for fault tolerance. We propose FTA-MPI (Fault Tolerance Assistant MPI), a programming model that provides support for failure detection, failure notification and recovery. Specifically, FTA-MPI exploits a try/catch model that enables failure localization and transparent recovery of process failures in MPI applications. We demonstrate FTA-MPI with synthetic applications and a molecular dynamics code CoMD, and show that FTA-MPI provides high programmability for users and enables convenient and flexible recovery of process failures.
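The try/catch recovery model described above can be sketched in a few lines. This is a minimal stand-in, not the FTA-MPI API: the exception class, the `run_with_recovery` helper, and the retry policy are all illustrative assumptions, meant only to show how a detected process failure can be caught, a recovery handler invoked, and the step retried.

```python
class ProcessFailure(Exception):
    """Signals that a peer process failed during communication
    (standing in for the failure notification FTA-MPI delivers)."""
    def __init__(self, rank):
        self.rank = rank

def run_with_recovery(step, recover, max_retries=3):
    """Run one computation step inside a try/except loop, mirroring
    the try/catch style of failure handling.  On a ProcessFailure the
    recovery handler is called (e.g. respawn the rank, restore state)
    and the step is retried."""
    for _attempt in range(max_retries):
        try:
            return step()
        except ProcessFailure as f:
            recover(f.rank)
    raise RuntimeError("failure could not be recovered")
```

The application's main loop stays readable: the failure-handling path is localized in the except clause instead of being threaded through every communication call.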
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
SBML-PET-MPI: a parallel parameter estimation tool for Systems Biology Markup Language based models.
Zi, Zhike
2011-04-01
Parameter estimation is crucial for the modeling and dynamic analysis of biological systems. However, implementing parameter estimation is time consuming and computationally demanding. Here, we introduced a parallel parameter estimation tool for Systems Biology Markup Language (SBML)-based models (SBML-PET-MPI). SBML-PET-MPI allows the user to perform parameter estimation and parameter uncertainty analysis by collectively fitting multiple experimental datasets. The tool is developed and parallelized using the message passing interface (MPI) protocol, which provides good scalability with the number of processors. SBML-PET-MPI is freely available for non-commercial use at http://www.bioss.uni-freiburg.de/cms/sbml-pet-mpi.html or http://sites.google.com/site/sbmlpetmpi/.
High-Performance Design Patterns for Modern Fortran
Haveraaen, Magne; Morris, Karla; Rouson, Damian; ...
2015-01-01
This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran’s new coarray distributed data structure, the language’s class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented these patterns using coarrays and the message passing interface (MPI). We compared the codes’ complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5–2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel’s MPI library, with the result apparently being 2–2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays come in Fortran 2015, we expect this performance gap to narrow.
On the Suitability of MPI as a PGAS Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.; Vishnu, Abhinav; Palmer, Bruce J.
2014-12-18
Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically-matching interface such as MPI-RMA, as well as not-so-intuitive interfaces such as MPI two-sided with a combination of multi-threading and dynamic process management. With an in-depth exploration of these alternatives and their shortcomings, we propose a novel design which is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement the asynchronous progress ranks approach and other approaches within the Communication Runtime for Exascale, which is a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speed-up on 1008 processors over the other MPI-based designs.
Collignon, Barbara; Schulz, Roland; Smith, Jeremy C; Baudry, Jerome
2011-04-30
A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.
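The scheduling strategy described above (dock the most flexible ligands first, with an evaluation budget that grows with rotatable-bond count) is simple enough to sketch. The abstract does not give the budget formula, so the linear rule and all names below are illustrative assumptions.

```python
def schedule_dockings(compounds, base_evals=250_000, per_bond=50_000):
    """Order compounds from most to least flexible and assign each an
    energy-evaluation budget that grows with its rotatable-bond count.

    compounds: list of (name, n_rotatable_bonds) tuples.
    Returns a list of (name, n_evals) in docking order.  The linear
    budget formula is a hypothetical stand-in; the paper only states
    that evaluations scale with the number of rotatable bonds.
    """
    ordered = sorted(compounds, key=lambda c: c[1], reverse=True)
    return [(name, base_evals + bonds * per_bond)
            for name, bonds in ordered]
```

Front-loading the flexible (expensive) ligands reduces tail latency: the long-running jobs start first, so idle CPUs at the end of the run are filled by the many cheap rigid compounds.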
A Case for Application Oblivious Energy-Efficient MPI Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkatesh, Akshay; Vishnu, Abhinav; Hamidouche, Khaled
Power has become the major impediment in designing large scale high-end systems. Message Passing Interface (MPI) is the de facto communication interface used as the back-end for designing applications, programming models and runtimes for these systems. Slack, the time spent by an MPI process in a single MPI call, provides a potential for energy and power savings, if an appropriate power reduction technique such as core-idling/Dynamic Voltage and Frequency Scaling (DVFS) can be applied without perturbing the application's execution time. Existing techniques that exploit slack for power savings assume that application behavior repeats across iterations/executions. However, an increasing use of adaptive, data-dependent workloads combined with system factors (OS noise, congestion) makes this assumption invalid. This paper proposes and implements Energy Aware MPI (EAM), an application-oblivious energy-efficient MPI runtime. EAM uses a combination of communication models of common MPI primitives (point-to-point, collective, progress, blocking/non-blocking) and an online observation of slack for maximizing energy efficiency. Each power lever incurs time overhead, which must be amortized over slack to minimize degradation. When predicted communication time exceeds a lever overhead, the lever is used as soon as possible, to maximize energy efficiency. When mis-prediction occurs, the lever(s) are used automatically at specific intervals for amortization. We implement EAM using MVAPICH2 and evaluate it on ten applications using up to 4096 processes. Our performance evaluation on an InfiniBand cluster indicates that EAM can reduce energy consumption by 5-41% in comparison to the default approach, with negligible (less than 4% in all cases) performance loss.
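The lever-selection rule stated above (apply a power lever only when the predicted slack exceeds that lever's transition overhead) can be sketched directly. The data layout and function name below are illustrative assumptions, not the EAM implementation.

```python
def choose_lever(predicted_slack_us, levers):
    """Pick the most aggressive power lever whose transition overhead
    can be amortized within the predicted slack of an MPI call.

    levers: list of (name, overhead_us, power_savings_w) tuples,
    assumed sorted by increasing savings (and typically increasing
    overhead).  Returns the chosen lever name, or None if the slack
    is too short to amortize any lever.  A hypothetical sketch of
    EAM's decision rule, not its actual interface.
    """
    best = None
    for name, overhead_us, _savings_w in levers:
        if predicted_slack_us > overhead_us:
            best = name  # later entries save more, so keep the last fit
    return best
```

The key property is that a short slack window selects only a cheap lever such as core-idling, while a long predicted wait justifies the larger overhead of a DVFS transition.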
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Jian; Hamidouche, Khaled; Zheng, Jie
2015-08-05
Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. The k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm by taking advantage of scalable programming models. To improve the performance of k-NN in a large-scale environment with an InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systematic evaluation and analysis on typical workloads. The hybrid designs leverage one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on the k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training the KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for a small workload with balanced communication and computation. Experiments with varied numbers of cores show that our design can maintain good scalability.
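The data-parallel structure common to the distributed k-NN designs above can be shown in a small sequential sketch: each rank scans its own shard of the training data for local top-k candidates, the candidates are gathered, and a global top-k plus majority vote produces the label. The shard-as-list representation is a stand-in for data actually distributed across MPI/OpenSHMEM ranks.

```python
import heapq

def knn_predict(query, shards, k=3):
    """Classify `query` by merging per-shard candidate lists, the way
    a distributed k-NN gathers each rank's local top-k results.

    shards: list of shards, each a list of (features, label) pairs.
    Pure-Python illustration of the communication pattern, not the
    MaTEx/MVAPICH2-X implementation.
    """
    def sqdist(point):
        return sum((a - b) ** 2 for a, b in zip(point[0], query))

    candidates = []
    for shard in shards:                      # each "rank" scans its shard
        candidates.extend(heapq.nsmallest(k, shard, key=sqdist))
    top = heapq.nsmallest(k, candidates, key=sqdist)   # global reduce
    labels = [label for _features, label in top]
    return max(set(labels), key=labels.count)          # majority vote
```

Only k candidates per shard cross the "network", which is what makes the gather step cheap relative to the local scans.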
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
How Formal Dynamic Verification Tools Facilitate Novel Concurrency Visualizations
NASA Astrophysics Data System (ADS)
Aananthakrishnan, Sriram; Delisi, Michael; Vakkalanka, Sarvani; Vo, Anh; Gopalakrishnan, Ganesh; Kirby, Robert M.; Thakur, Rajeev
With the exploding scale of concurrency, presenting valuable pieces of information collected by formal verification tools intuitively and graphically can greatly enhance concurrent system debugging. Traditional MPI program debuggers present trace views of MPI program executions. Such views are redundant, often containing equivalent traces that permute independent MPI calls. In our ISP formal dynamic verifier for MPI programs, we present a collection of alternate views made possible by the use of formal dynamic verification. Some of ISP’s views help pinpoint errors, some facilitate discerning errors by eliminating redundancy, while others help understand the program better by displaying concurrent event orderings that must be respected by all MPI implementations, in the form of completes-before graphs. In this paper, we describe ISP’s graphical user interface (GUI) capabilities in all these areas, which are currently supported by a portable Java based GUI, a Microsoft Visual Studio GUI, and an Eclipse based GUI whose development is in progress.
Accelerating Climate Simulations Through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark
2009-01-01
Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with InfiniBand), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approx. 10% network overhead.
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We describe a new problem size, called Class D, for the NAS Parallel Benchmarks (NPB), whose MPI source code implementation is being released as NPB 2.4. A brief rationale is given for how the new class is derived. We also describe the modifications made to the MPI (Message Passing Interface) implementation to allow the new class to be run on systems with 32-bit integers, and with moderate amounts of memory. Finally, we give the verification values for the new problem size.
High Performance Geostatistical Modeling of Biospheric Resources
NASA Astrophysics Data System (ADS)
Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.
2004-12-01
We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostatistics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
OpenGeoSys-GEMS: Hybrid parallelization of a reactive transport code with MPI and threads
NASA Astrophysics Data System (ADS)
Kosakowski, G.; Kulik, D. A.; Shao, H.
2012-04-01
OpenGeoSys-GEMS is a general-purpose reactive transport code based on the operator splitting approach. The code couples the Finite-Element groundwater flow and multi-species transport modules of the OpenGeoSys (OGS) project (http://www.ufz.de/index.php?en=18345) with the GEM-Selektor research package to model thermodynamic equilibrium of aquatic (geo)chemical systems utilizing the Gibbs Energy Minimization approach (http://gems.web.psi.ch/). The combination of OGS and the GEM-Selektor kernel (GEMS3K) is highly flexible due to the object-oriented modular code structures and the well defined (memory based) data exchange modules. Like other reactive transport codes, the practical applicability of OGS-GEMS is often hampered by the long calculation time and large memory requirements.
• For realistic geochemical systems, which might include dozens of mineral phases and several (non-ideal) solid solutions, the time needed to solve the chemical system with GEMS3K may increase exceptionally.
• The codes are coupled in a sequential non-iterative loop. In order to keep the accuracy, the time step size is restricted. In combination with a fine spatial discretization, the time step size may become very small, which increases calculation times drastically even for small 1D problems.
• The current version of OGS is not optimized for memory use, and the MPI version of OGS does not distribute data between nodes. Even for moderately small 2D problems, the number of MPI processes that fit into the memory of up-to-date workstations or HPC hardware is limited.
One strategy to overcome the above mentioned restrictions of OGS-GEMS is to parallelize the coupled code. For OGS a parallelized version already exists. It is based on a domain decomposition method implemented with MPI and provides a parallel solver for fluid and mass transport processes.
In the coupled code, after solving fluid flow and solute transport, geochemical calculations are done in form of a central loop over all finite element nodes with calls to GEMS3K and consecutive calculations of changed material parameters. In a first step the existing MPI implementation was utilized to parallelize this loop. Calculations were split between the MPI processes and afterwards data was synchronized by using MPI communication routines. Furthermore, multi-threaded calculation of the loop was implemented with help of the boost thread library (http://www.boost.org). This implementation provides a flexible environment to distribute calculations between several threads. For each MPI process at least one and up to several dozens of worker threads are spawned. These threads do not replicate the complete OGS-GEM data structure and use only a limited amount of memory. Calculation of the central geochemical loop is shared between all threads. Synchronization between the threads is done by barrier commands. The overall number of local threads times MPI processes should match the number of available computing nodes. The combination of multi-threading and MPI provides an effective and flexible environment to speed up OGS-GEMS calculations while limiting the required memory use. Test calculations on different hardware show that for certain types of applications tremendous speedups are possible.
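The central step above, splitting the per-node geochemistry loop among MPI processes and worker threads, amounts to a block decomposition of the node index range. The chunking scheme below is the standard near-equal contiguous split, assumed rather than taken from the OGS-GEMS source; names are illustrative.

```python
def partition_nodes(n_nodes, n_workers):
    """Split a finite-element node loop into near-equal contiguous
    chunks, one per worker (MPI process or thread).  Each worker then
    calls the chemistry solver (GEMS3K in the paper) only for the
    node indices in its chunk, and results are synchronized afterwards.
    Standard block decomposition; a sketch, not the OGS-GEMS code."""
    base, extra = divmod(n_nodes, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

Because the per-node chemistry solves are independent within a transport step, this split needs communication only at the synchronization barrier after the loop, which is what makes the hybrid MPI+threads version effective.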
An evaluation of MPI message rate on hybrid-core processors
Barrett, Brian W.; Brightwell, Ron; Grant, Ryan; ...
2014-11-01
Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.
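MPI match processing, the workload being measured above, is the in-order search of the posted-receive queue for the first entry matching an incoming message's (source, tag) pair, with wildcards allowed. A minimal sketch of that core loop follows; the list-of-tuples queue is a simplification of the real match-list data structures.

```python
ANY = object()  # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards

def match_receive(posted, source, tag):
    """Find and remove the first posted receive matching an incoming
    message, as MPI match processing must (MPI requires in-order
    matching of the posted queue).  `posted` is a list of
    (source, tag) entries; ANY is a wildcard.  This linear scan is
    the kind of work whose cost on simple vs. complex cores the
    paper analyzes; the data structure here is a simplification."""
    for i, (s, t) in enumerate(posted):
        if (s is ANY or s == source) and (t is ANY or t == tag):
            return posted.pop(i)
    return None  # no match: the message goes to the unexpected queue
```

Long posted queues make this scan expensive, which is why match-processing throughput on simple throughput-oriented cores is a design concern.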
Exploiting Efficient Transpacking for One-Sided Communication and MPI-IO
NASA Astrophysics Data System (ADS)
Mir, Faisal Ghias; Träff, Jesper Larsson
Based on a construction of so-called input-output datatypes that define a mapping between non-consecutive input and output buffers, we outline an efficient method for copying of structured data. We term this operation transpacking, and show how transpacking can be applied for the MPI implementation of one-sided communication and MPI-IO. For one-sided communication via shared memory, we demonstrate the expected performance improvements by up to a factor of two. For individual MPI-IO, the time to read or write from file dominates the overall time, but even here efficient transpacking can in some scenarios reduce file I/O time considerably. The reported results have been achieved on a single NEC SX-8 vector node.
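The transpacking idea, copying directly between two non-contiguous layouts in one pass instead of packing into an intermediate contiguous buffer, can be sketched with explicit offset lists standing in for the input-output datatype. The interface below is illustrative, not the construction from the paper.

```python
def transpack(src, src_offsets, dst, dst_offsets):
    """Copy structured data directly from a non-contiguous source
    layout to a non-contiguous destination layout in a single pass.
    The paired offset lists play the role of an input-output datatype
    mapping; a naive pack-then-unpack scheme would instead stage the
    same elements through a temporary contiguous buffer."""
    assert len(src_offsets) == len(dst_offsets)
    for si, di in zip(src_offsets, dst_offsets):
        dst[di] = src[si]
    return dst
```

Eliminating the intermediate buffer halves the number of memory copies, which is consistent with the up-to-2x improvement reported for shared-memory one-sided communication.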
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load imbalance at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference: Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195.
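The two-way division rule at the start of this passage (divide processors among tasks when tasks are scarce, otherwise divide tasks among processors) is easy to sketch. The function name and return format below are ours, not part of PM; only the even-distribution default is taken from the text.

```python
def assign(n_tasks, n_procs):
    """Apply PM's default division rule at a parallel statement:
    divide processors among tasks when tasks are scarce, or tasks
    among processors when processors are scarce.  Returns a tag plus
    an even per-owner count list, remainders spread from the front.
    Illustrative sketch of the rule, not the PM runtime."""
    if n_tasks < n_procs:
        base, extra = divmod(n_procs, n_tasks)
        return ("procs_per_task",
                [base + (1 if i < extra else 0) for i in range(n_tasks)])
    base, extra = divmod(n_tasks, n_procs)
    return ("tasks_per_proc",
            [base + (1 if i < extra else 0) for i in range(n_procs)])
```

A nested parallel statement would apply the same rule recursively to the processor subset a task received, which is how the nested-parallelism model composes.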
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shamis, Pavel; Graham, Richard L; Gorentla Venkata, Manjunath
The scalability and performance of collective communication operations limit the scalability and performance of many scientific applications. This paper presents two new blocking and nonblocking Broadcast algorithms for communicators with arbitrary communication topology, and studies their performance. These algorithms benefit from increased concurrency and a reduced memory footprint, making them suitable for use on large-scale systems. Measuring small, medium, and large data Broadcasts on a Cray-XT5, using 24,576 MPI processes, the Cheetah algorithms outperform the native MPI on that system by 51%, 69%, and 9%, respectively, at the same process count. These results demonstrate an algorithmic approach to the implementation of the important class of collective communications, which is high performing, scalable, and also uses resources in a scalable manner.
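For context on what a broadcast algorithm's communication schedule looks like, the classic binomial tree (the textbook baseline such topology-aware algorithms are compared against) can be computed in a few lines. This is a standard algorithm shown for illustration, assuming rank 0 as root; it is not the Cheetah algorithm.

```python
def binomial_bcast_schedule(n_ranks):
    """Compute the (round, sender, receiver) send schedule of a
    binomial-tree broadcast rooted at rank 0: in round k, every rank
    that already holds the data sends it to rank + 2**k.  The whole
    broadcast completes in ceil(log2(n_ranks)) rounds with
    n_ranks - 1 point-to-point messages."""
    sends, have, rnd = [], {0}, 0
    while len(have) < n_ranks:
        for s in sorted(have):
            r = s + 2 ** rnd
            if r < n_ranks and r not in have:
                sends.append((rnd, s, r))
        have |= {r for _, _, r in sends}
        rnd += 1
    return sends
```

Hierarchical algorithms like Cheetah's restructure such a schedule around the machine topology (e.g. intra-node first, then inter-node), trading the flat tree for better use of the fastest communication levels.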
Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code
NASA Astrophysics Data System (ADS)
Payne, J.; McCune, D.; Prater, R.
2010-11-01
NUBEAM is a comprehensive Monte Carlo-based computational model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters; however, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. The recent introduction of double-precision capabilities on Graphical Processing Units (GPUs) presents a cost-effective and high-performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS in double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on a GPU implementation will be presented.
Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash
1997-01-01
Compilers supporting High Performance Fortran (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI) combinations is compared, based on the latest NAS Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results are compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI, or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we present NPB (Version 1.0) performance results for the following systems: DEC AlphaServer 8400 5/440, Fujitsu CAPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We also present sustained performance per dollar for the Class B LU, SP, and BT benchmarks.
Procacci, Piero
2016-06-27
We present a new release (6.0β) of the ORAC program [Marsili et al., J. Comput. Chem. 2010, 31, 1106-1116] with hybrid OpenMP/MPI (Open Multiprocessing/Message Passing Interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology, aimed at evaluating binding free energies in drug-receptor systems on high-performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak-scaling parallel approach at the MPI level only, while a strong-scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherently parallel nature of the ORAC implementation of the FS-DAM algorithm make the code a promising tool for second-generation high-throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .
NASA Astrophysics Data System (ADS)
Stuart, J. A.
2011-12-01
This paper explores the challenges in implementing a message passing interface usable on systems with data-parallel processors, and more specifically GPUs. As a case study, we design and implement on NVIDIA GPUs the "DCGN" API, which is similar to MPI and allows full access to the underlying architecture. We introduce the notion of data-parallel thread-groups as a way to map resources to MPI ranks. We use a method that also allows the data-parallel processors to run autonomously from user-written CPU code. To facilitate communication, we use a sleep-based polling system to store and retrieve messages. Unlike previous systems, our method provides both performance and flexibility. By running a test suite of applications with different communication requirements, we find that a tolerable amount of overhead is incurred, between one and five percent depending on the application, and indicate the locations where this overhead accumulates. We conclude that, with innovations in chipsets and drivers, this overhead will be mitigated, yielding performance similar to typical CPU-based MPI implementations while providing fully dynamic communication.
NASA Astrophysics Data System (ADS)
Rudianto, Indra; Sudarmaji
2018-04-01
We present an implementation of the spectral-element method for simulation of two-dimensional elastic wave propagation in fully heterogeneous media. We have incorporated most realistic geological features in the model, including surface topography, curved layer interfaces, and 2-D wave-speed heterogeneity. To accommodate such complexity, we use an unstructured quadrilateral meshing technique. The simulation was performed on a GPU cluster consisting of 24 Intel Xeon CPU cores and 4 NVIDIA Quadro graphics cards, using a combined CUDA and MPI implementation. We speed up the computation by a factor of about 5 compared to an MPI-only implementation, and by a factor of about 40 compared to a serial implementation.
Performance comparison analysis library communication cluster system using merge sort
NASA Astrophysics Data System (ADS)
Wulandari, D. A. R.; Ramadhan, M. E.
2018-04-01
Computing began with single processors; to increase computation speed, multiprocessor systems were introduced. This second paradigm is known as parallel computing, of which the cluster is one example. A cluster needs a communication protocol for its processing; one such protocol is the Message Passing Interface (MPI). MPI has several library implementations, among them Open MPI and MPICH2. The performance of a cluster machine depends on how well the performance characteristics of the communication library match the characteristics of the problem, so this study aims to compare the performance of such libraries in handling parallel computation. The case studies in this research are MPICH2 and Open MPI, which execute a sorting problem to gauge the performance of the cluster system. The sorting problem uses the merge sort method. The research method is to implement Open MPI and MPICH2 on a Linux-based cluster of five virtual machines and then analyze the performance of the system under different test scenarios, using three parameters to characterize the performance of MPICH2 and Open MPI: execution time, speedup, and efficiency. The results of this study showed that with each increase in data size, Open MPI and MPICH2 achieve average speedup and efficiency that tend to increase, but that decrease at large data sizes. An increased data size does not necessarily increase speedup and efficiency, only execution time, for example at a data size of 100,000. At a data size of 1,000, for example, the average execution time is 0.009721 with MPICH2 and 0.003895 with Open MPI; Open MPI can adapt to the communication needs.
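The cluster mergesort being benchmarked is typically structured as scatter, local sort, then pairwise merging up a tree of ranks. A single-process Python simulation of that structure, purely for illustration (this is not the benchmarked code, and no MPI calls are made):

```python
import heapq

def cluster_mergesort(data, nprocs=4):
    """Simulate mergesort on a cluster: scatter chunks to `nprocs`
    'ranks', sort each chunk locally, then merge the sorted chunks
    pairwise, as ranks would do over MPI in log2(nprocs) rounds."""
    chunk = (len(data) + nprocs - 1) // nprocs
    parts = [sorted(data[i:i + chunk]) for i in range(0, len(data), chunk)]
    while len(parts) > 1:
        merged = []
        for i in range(0, len(parts), 2):
            pair = parts[i:i + 2]          # a rank and its partner
            merged.append(list(heapq.merge(*pair)))
        parts = merged
    return parts[0]

print(cluster_mergesort([5, 3, 8, 1, 9, 2, 7, 4, 6]))
```

In the real benchmark, each local sort runs on a different machine and each pairwise merge is an MPI send/receive, which is where the two libraries' communication costs diverge.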
NDL-v2.0: A new version of the numerical differentiation library for parallel architectures
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.
2014-07-01
We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 
180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. 
In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. 
It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address spaces, so that a user function resides at the same virtual address in every process. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP version with 2 threads, and 53 ms and 1.01 s for the MPI-parallel distribution using 2 threads and 2 processes, respectively, with a yield time for idle workers of 10 ms. References: [1] P. Angelikopoulos, C. Papadimitriou, P. 
Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys. 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
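The library's central numerical recipe, central differencing with a step chosen to balance the O(h^2) truncation error against the O(eps/h) round-off error, can be sketched in a few lines. This is our own minimal Python sketch, not the NDL/PNDL interface; note that the 2N function evaluations are mutually independent, which is exactly the task parallelism the library distributes over OpenMP tasks or MPI workers:

```python
import sys

def fd_gradient(f, x):
    """First-order partial derivatives by central differences.
    A step h ~ eps**(1/3), scaled by |x_i|, roughly minimizes the sum
    of truncation and round-off errors for an O(h^2) formula."""
    eps = sys.float_info.epsilon
    grad = []
    for i, xi in enumerate(x):
        h = eps ** (1.0 / 3.0) * max(1.0, abs(xi))
        xp = list(x); xp[i] = xi + h
        xm = list(x); xm[i] = xi - h
        # The calls f(xp) and f(xm) for all i are independent tasks.
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

rosen = lambda v: 100.0 * (v[1] - v[0] ** 2) ** 2 + (1.0 - v[0]) ** 2
print(fd_gradient(rosen, [1.0, 1.0]))  # analytic gradient is [0, 0]
```

A Hessian by the analogous second-order formulas needs O(N^2) such evaluations, which is why the caching and throttling discussed above matter.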
Conflict Detection Algorithm to Minimize Locking for MPI-IO Atomicity
NASA Astrophysics Data System (ADS)
Sehrish, Saba; Wang, Jun; Thakur, Rajeev
Many scientific applications require high-performance concurrent I/O accesses to a file by multiple processes. Those applications rely indirectly on atomic I/O capabilities in order to perform updates to structured datasets, such as those stored in HDF5 format files. Current support for atomicity in MPI-IO is provided by locking around the operations, imposing lock overhead in all situations, even though in many cases these operations are non-overlapping in the file. We propose to isolate non-overlapping accesses from overlapping ones in independent I/O cases, allowing the non-overlapping ones to proceed without imposing lock overhead. To enable this, we have implemented an efficient conflict detection algorithm in MPI-IO using MPI file views and datatypes. We show that our conflict detection scheme incurs minimal overhead on I/O operations, making it an effective mechanism for avoiding locks when they are not needed.
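The idea of isolating non-overlapping accesses can be sketched as a sweep over sorted byte ranges. This is a simplified illustration in Python with our own function names: real MPI-IO accesses are described by file views and derived datatypes, which flatten to lists of such (offset, length) extents:

```python
def conflicting_ranks(accesses):
    """accesses: iterable of (rank, offset, length) half-open byte ranges.
    Returns the set of ranks whose ranges overlap another rank's range;
    only these accesses need locking, the rest can proceed lock-free."""
    events = sorted((off, off + length, rank) for rank, off, length in accesses)
    conflicted = set()
    for i, (start, end, rank) in enumerate(events):
        for start2, end2, rank2 in events[i + 1:]:
            if start2 >= end:   # sorted by start: no later range can overlap
                break
            if rank2 != rank:
                conflicted.update((rank, rank2))
    return conflicted

# Disjoint writes: no locks needed.
print(conflicting_ranks([(0, 0, 100), (1, 100, 100), (2, 200, 100)]))  # set()
# Ranks 1 and 2 overlap in [150, 200): only they must lock.
print(conflicting_ranks([(0, 0, 100), (1, 100, 100), (2, 150, 100)]))
```

Sorting makes the common all-disjoint case cheap, since each range is compared only against its immediate successors.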
Model-based phase-shifting interferometer
NASA Astrophysics Data System (ADS)
Liu, Dong; Zhang, Lei; Shi, Tu; Yang, Yongying; Chong, Shiyao; Miao, Liang; Huang, Wei; Shen, Yibing; Bai, Jian
2015-10-01
A model-based phase-shifting interferometer (MPI) is developed in which a novel calculation technique replaces the traditionally complicated system structure to achieve versatile, high-precision, quantitative surface tests. In the MPI, a partial null lens (PNL) is employed to implement the non-null test. With a set of alternative PNLs, similar to the transmission spheres in ZYGO interferometers, the MPI provides a flexible test for general spherical and aspherical surfaces. Based on modern computer modeling techniques, a reverse iterative optimizing construction (ROR) method is employed for the retrace error correction of the non-null test, as well as for figure error reconstruction. A self-compiled ray-tracing program is set up for accurate system modeling and reverse ray tracing. The surface figure error can then be easily extracted from the wavefront data in the form of Zernike polynomials by the ROR method. Experiments on spherical and aspherical tests are presented to validate the flexibility and accuracy. The test results are compared with those of a ZYGO interferometer (null tests), which demonstrates the high accuracy of the MPI. With such accuracy and flexibility, the MPI holds large potential in modern optical shop testing.
Message Passing and Shared Address Space Parallelism on an SMP Cluster
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas
2016-04-01
Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through the systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new software tool, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within a parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads, one for task management and communication and another for subtask execution, are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. mpiWrapper can be used to launch conventional Linux applications without the need to modify their original source code, and it supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
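The two-threads-per-node design, one thread blocking on communication while the other executes subtasks, can be sketched with ordinary Python threads standing in for the MPI side. This is our own illustrative sketch: mpiWrapper itself launches unmodified Linux executables rather than Python callables, and its communication thread makes real blocking MPI calls:

```python
import queue
import threading

def run_node(incoming_tasks, run_subtask):
    """One worker node: a communication thread keeps (simulated) blocking
    receives off the execution path, while the execution thread drains a
    local queue, so neither can deadlock the other."""
    local = queue.Queue()
    results = []

    def comm_thread():
        for task in incoming_tasks:   # stands in for blocking MPI receives
            local.put(task)
        local.put(None)               # shutdown sentinel

    def exec_thread():
        while True:
            task = local.get()
            if task is None:
                break
            results.append(run_subtask(task))  # stands in for launching a program

    threads = [threading.Thread(target=comm_thread),
               threading.Thread(target=exec_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_node([1, 2, 3], lambda n: n * n))  # [1, 4, 9]
```

The queue decouples the two threads: a slow subtask never stalls message progress, and a blocked receive never idles the executor while work is queued.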
NASA Astrophysics Data System (ADS)
Nguyen, An Hung; Guillemette, Thomas; Lambert, Andrew J.; Pickering, Mark R.; Garratt, Matthew A.
2017-09-01
Image registration is a fundamental image processing technique. It is used to spatially align two or more images that have been captured at different times, from different sensors, or from different viewpoints. Many algorithms have been proposed for this task, the most common being the well-known Lucas-Kanade (LK) and Horn-Schunck approaches. The main limitation of these approaches, however, is the computational complexity of the large number of iterations necessary for successful alignment of the images. Previously, a multi-pass image interpolation algorithm (MP-I2A) was developed to considerably reduce the number of iterations required for successful registration compared with the LK algorithm. This paper develops a kernel-warping algorithm (KWA), a modified version of the MP-I2A, which requires fewer iterations to successfully register two images and less memory space for a field-programmable gate array (FPGA) implementation than the MP-I2A. These reductions make the proposed algorithm more feasible to implement on FPGAs with very limited memory space and other hardware resources. A two-FPGA system, rather than a single-FPGA system, is developed to implement the KWA in order to compensate for the insufficient hardware resources of a single FPGA, and to increase the parallel processing ability and scalability of the system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorentla Venkata, Manjunath; Graham, Richard L; Ladd, Joshua S
This paper describes the design and implementation of InfiniBand (IB) CORE-Direct based blocking and nonblocking broadcast operations within the Cheetah collective operation framework. It describes a novel approach that fully offloads collective operations and employs only user-supplied buffers. For a 64-rank communicator, the latency of the CORE-Direct based hierarchical algorithm is better than that of production-grade Message Passing Interface (MPI) implementations: 150% better than the default Open MPI algorithm and 115% better than the shared-memory optimized MVAPICH implementation for a one kilobyte (KB) message, and 48% and 64% better, respectively, for eight megabytes (MB). The flat-topology broadcast achieves 99.9% overlap in a polling-based communication-computation test, and 95.1% overlap for a wait-based test, compared with 92.4% and 17.0%, respectively, for a similar Central Processing Unit (CPU) based implementation.
Enhancing nurses' roles to improve quality and efficiency of non-medical cardiac stress tests.
Bernhardt, Lizelle; Ross, Lisa; Greaves, Claire
Myocardial perfusion imaging (MPI) is a test that aids the diagnosis of coronary heart disease, of which pharmacological stress is a key component. An increase in demand had resulted in a 42 week waiting time for MPI in Leicester. This article looks at how implementing non-medically led stress tests reduced this waiting list. It discusses the obstacles involved and the measures needed to make the service a success.
Accelerating free breathing myocardial perfusion MRI using multi coil radial k - t SLR
NASA Astrophysics Data System (ADS)
Goud Lingala, Sajan; DiBella, Edward; Adluru, Ganesh; McGann, Christopher; Jacob, Mathews
2013-10-01
The clinical utility of myocardial perfusion MR imaging (MPI) is often restricted by the inability of current acquisition schemes to simultaneously achieve high spatio-temporal resolution, good volume coverage, and high signal to noise ratio. Moreover, many subjects often find it difficult to hold their breath for sufficiently long durations, making it difficult to obtain reliable MPI data. Accelerated acquisition of free breathing MPI data can overcome some of these challenges. Recently, an algorithm termed k-t SLR has been proposed to accelerate dynamic MRI by exploiting sparsity and low rank properties of dynamic MRI data. The main focus of this paper is to further improve k-t SLR and demonstrate its utility in considerably accelerating free breathing MPI. We extend its previous implementation to account for multi-coil radial MPI acquisitions. We perform k-t sampling experiments to compare different radial trajectories and determine the best sampling pattern. We also introduce a novel augmented Lagrangian framework to considerably improve the algorithm's convergence rate. The proposed algorithm is validated using free breathing rest and stress radial perfusion data sets from two normal subjects and one patient with ischemia. k-t SLR was observed to provide faithful reconstructions at high acceleration levels with minimal artifacts compared to existing MPI acceleration schemes such as spatio-temporal constrained reconstruction and k-t SPARSE/SENSE.
Practical Formal Verification of MPI and Thread Programs
NASA Astrophysics Data System (ADS)
Gopalakrishnan, Ganesh; Kirby, Robert M.
Large-scale simulation codes in science and engineering are written using the Message Passing Interface (MPI). Shared memory threads are widely used directly, or to implement higher level programming abstractions. Traditional debugging methods for MPI or thread programs are incapable of providing useful formal guarantees about coverage. They get bogged down in the sheer number of interleavings (schedules), often missing shallow bugs. In this tutorial we will introduce two practical formal verification tools: ISP (for MPI C programs) and Inspect (for Pthread C programs). Unlike other formal verification tools, ISP and Inspect run directly on user source codes (much like a debugger). They pursue only the relevant set of process interleavings, using our own customized Dynamic Partial Order Reduction algorithms. For a given test harness, DPOR allows these tools to guarantee the absence of deadlocks, instrumented MPI object leaks and communication races (using ISP), and shared memory races (using Inspect). ISP and Inspect have been used to verify large pieces of code: in excess of 10,000 lines of MPI/C for ISP in under 5 seconds, and about 5,000 lines of Pthread/C code in a few hours (and much faster with the use of a cluster or by exploiting special cases such as symmetry) for Inspect. We will also demonstrate the Microsoft Visual Studio and Eclipse Parallel Tools Platform integrations of ISP (these will be available on the LiveCD).
Parallel PAB3D: Experiences with a Prototype in MPI
NASA Technical Reports Server (NTRS)
Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul
1998-01-01
PAB3D is a three-dimensional Navier-Stokes solver that has gained acceptance in the research and industrial communities. It takes as its computational domain a set of disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We briefly discuss the characteristics of the code and define a prototype for testing. The principal data structure used for communication is derived from the preprocessing "patching" step. We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be useful when working on problems of this nature. Last, we identify levels of improvement over the current version and outline future work.
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
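The scalability difference between the two communication architectures comes down to the critical path: in a master-slave scheme the master handles one message per slave in sequence, while a divide-and-conquer combine proceeds in concurrent pairwise rounds whose count grows only logarithmically. A minimal step-counting sketch (our own illustration, not the MoSST code):

```python
def master_slave_steps(n_procs):
    """All n-1 slaves report to the master, which must receive them
    one after another: the critical path grows linearly with n."""
    return n_procs - 1

def divide_and_conquer_steps(n_procs):
    """Pairwise combine: each round, partners exchange and the number of
    active groups halves, so the critical path grows logarithmically."""
    steps, span = 0, 1
    while span < n_procs:
        span *= 2
        steps += 1
    return steps

for n in (16, 128):
    print(n, master_slave_steps(n), divide_and_conquer_steps(n))
```

At 128 nodes this is 127 sequential receives at the master versus 7 concurrent rounds, which is why only the second architecture scales in communication.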
DOE Office of Scientific and Technical Information (OSTI.GOV)
Computational Research Division, Lawrence Berkeley National Laboratory; NERSC, Lawrence Berkeley National Laboratory; Computer Science Department, University of California, Berkeley
2009-05-04
We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads per MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications.
Zhan, X.
2005-01-01
A parallel Fortran-MPI (Message Passing Interface) software package for numerical inversion of the Laplace transform, based on a Fourier series method, is developed to meet the need of solving computationally intensive problems involving oscillatory water-level responses to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementing MPI techniques on a distributed-memory architecture speeds up the processing and improves efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involving differential and integral equations. The package is free and easy to use for people with little or no previous MPI experience who wish to get off to a quick start in parallel computing. © 2004 Elsevier Ltd. All rights reserved.
DISP: Optimizations towards Scalable MPI Startup
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fu, Huansong; Pophale, Swaroop S; Gorentla Venkata, Manjunath
2016-01-01
Despite the popularity of MPI for high performance computing, the startup of MPI programs faces a scalability challenge as both the execution time and memory consumption increase drastically at scale. We have examined this problem using the collective modules of Cheetah and Tuned in Open MPI as representative implementations. Previous improvements for collectives have focused on algorithmic advances and hardware off-load. In this paper, we examine the startup cost of the collective module within a communicator and explore various techniques to improve its efficiency and scalability. Accordingly, we have developed a new scalable startup scheme with three internal techniques, namely Delayed Initialization, Module Sharing and Prediction-based Topology Setup (DISP). Our DISP scheme greatly benefits the collective initialization of the Cheetah module. At the same time, it helps boost the performance of non-collective initialization in the Tuned module. We evaluate the performance of our implementation on the Titan supercomputer at ORNL with up to 4096 processes. The results show that our delayed initialization can speed up the startup of Tuned and Cheetah by an average of 32.0% and 29.2%, respectively, our module sharing can reduce the memory consumption of Tuned and Cheetah by up to 24.1% and 83.5%, respectively, and our prediction-based topology setup can speed up the startup of Cheetah by up to 80%.
MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)
1994-01-01
Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.
Influence of the ionic liquid [C4mpy][Tf2N] on the structure of the miniprotein Trp-cage.
Baker, Joseph L; Furbish, Jeffrey; Lindberg, Gerrick E
2015-11-01
We examine the effect of the ionic liquid [C4mpy][Tf2N] on the structure of the miniprotein Trp-cage and contrast these results with the behavior of Trp-cage in water. We find the ionic liquid has a dramatic effect on Trp-cage, though many similarities with aqueous Trp-cage are observed. We assess Trp-cage folding by monitoring root mean square deviation from the crystallographic structure, radius of gyration, proline cis/trans isomerization state, protein secondary structure, amino acid contact formation and distance, and native and non-native contact formation. Starting from an unfolded configuration, Trp-cage folds in water at 298 K in less than 500 ns of simulation, but has very little mobility in the ionic liquid at the same temperature, which can be ascribed to the higher ionic liquid viscosity. At 365 K, the mobility of the ionic liquid is increased and initial stages of Trp-cage folding are observed; however, Trp-cage does not reach the native folded state in 2 μs of simulation in the ionic liquid. Therefore, in addition to conventional molecular dynamics, we also employ scaled molecular dynamics to expedite sampling, and we demonstrate that Trp-cage in the ionic liquid does closely approach the aqueous folded state. Interestingly, while the reduced mobility of the ionic liquid is found to restrict Trp-cage motion, the ionic liquid does facilitate proline cis/trans isomerization events that are not seen in our aqueous simulations. Copyright © 2015 Elsevier Inc. All rights reserved.
Arparsrithongsagul, Somsak; Kulsomboon, Vithaya; Zuckerman, Ilene H
2015-03-01
In Thailand, antibiotics are rampantly available in village groceries, despite the fact that it is illegal to sell antibiotics without a pharmacy license. This study implemented a multidisciplinary perspectives intervention with community involvement (MPI&CI), which was developed based on information obtained from focus groups that included multidisciplinary stakeholders. Community leaders in the intervention group were trained to implement MPI&CI in their villages. A quasi-experiment with a pretest-posttest design was conducted. Data were collected from 20 villages in Mahasarakham Province (intervention group) along with another 20 villages (comparison group). Using a generalized linear mixed model Poisson regression with repeated measures, groceries in the intervention group had 87% fewer antibiotics available at postintervention compared with preintervention (relative rate = 0.13; 95% confidence interval = 0.07-0.23), whereas the control group had only an 8% reduction in antibiotic availability (relative rate = 0.92; 95% confidence interval = 0.88-0.97) between the 2 time periods. Further study should be made to assess the sustainability and long-term effectiveness of MPI&CI. © 2013 APJPH.
NASA Technical Reports Server (NTRS)
Juang, Hann-Ming Henry; Tao, Wei-Kuo; Zeng, Xi-Ping; Shie, Chung-Lin; Simpson, Joanne; Lang, Steve
2004-01-01
The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented in a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The MPP design with MPI maintains a similar code structure for the whole domain and for the portions after decomposition; hence the model follows the same integration path for single and multiple tasks (CPUs). It also requires minimal changes to the original code, so it is easily modified and/or managed by the model developers and users who have little knowledge of MPP. The entire model domain can be sliced into a one- or two-dimensional decomposition with a halo regime, which is overlaid on the partial domains. The halo regime ensures that no data need be fetched across tasks during the computational stage, but it must be updated before the next computational stage through data exchange via MPI. For reproducibility, transposing data among tasks is required for the spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model to solve the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms. The major results are: 1) both versions achieve parallel efficiencies of about 99% up to 256 tasks, but not at 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computation than the compressible version; 3) equal or approximately equal numbers of slices in the x- and y-directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation during computation.
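The halo mechanism described above can be sketched independently of the GCE model itself. The following is a minimal serial Python illustration (all names are hypothetical, and the neighbor copies stand in for the MPI exchanges the abstract describes): each task owns a one-dimensional chunk padded with one halo cell per side, halos are refreshed from the neighbors before the computational stage, and the decomposed stencil update then reproduces the global one exactly.

```python
def stencil_global(a):
    """Three-point average with periodic boundaries on the full domain."""
    n = len(a)
    return [(a[(i - 1) % n] + a[i] + a[(i + 1) % n]) / 3.0 for i in range(n)]

def stencil_decomposed(a, p):
    """Same update on p one-dimensional slices, each padded with a
    one-cell halo per side; the neighbor copies below stand in for the
    MPI data exchange performed between computational stages."""
    n = len(a)
    w = n // p                             # assume p divides n evenly
    chunks = [a[r * w:(r + 1) * w] for r in range(p)]
    out = []
    for r in range(p):
        left = chunks[(r - 1) % p][-1]     # halo cell from left neighbor
        right = chunks[(r + 1) % p][0]     # halo cell from right neighbor
        padded = [left] + chunks[r] + [right]
        # computational stage: no remote data needed once halos are filled
        out += [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
                for i in range(1, w + 1)]
    return out
```

Because the halo cells hold exactly the neighbor values the global stencil would read, the decomposed result is bitwise identical to the undecomposed one, which is the property that lets the model keep the same integration path for single and multiple tasks.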
High performance Python for direct numerical simulations of turbulent flows
NASA Astrophysics Data System (ADS)
Mortensen, Mikael; Langtangen, Hans Petter
2016-06-01
Direct Numerical Simulation (DNS) of the Navier-Stokes equations is an invaluable research tool in fluid dynamics. Still, there are few publicly available research codes and, due to the heavy number crunching implied, available codes are usually written in low-level languages such as C/C++ or Fortran. In this paper we describe a pure scientific Python pseudo-spectral DNS code that nearly matches the performance of C++ for thousands of processors and billions of unknowns. We also describe a version optimized through Cython that is found to match the speed of C++. The solvers are written from scratch in Python, including the mesh, the MPI domain decomposition, and the temporal integrators. The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores. A very important part of the implementation is the mesh decomposition (we implement both slab and pencil decompositions) and the 3D parallel Fast Fourier Transforms (FFT). The mesh decomposition and FFT routines have been implemented in Python using serial FFT routines (either NumPy, pyFFTW or any other serial FFT module), NumPy array manipulations, and MPI communications handled by MPI for Python (mpi4py). We show how we are able to execute a 3D parallel FFT in Python for a slab mesh decomposition using 4 lines of compact Python code, for which the parallel performance on Shaheen is found to be slightly better than that of similar routines provided by the FFTW library. For a pencil mesh decomposition, 7 lines of code are required to execute a transform.
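The transpose-based slab algorithm the authors describe can be sketched serially with NumPy. This is an illustrative reconstruction, not the mpi4py code from the paper: the list comprehension standing in for an MPI Alltoall, and all names, are assumptions. Each "rank" performs a 2D FFT over its two locally complete axes, the data are redistributed so the remaining axis becomes local, and a final 1D FFT completes the transform.

```python
import numpy as np

def slab_fftn(a, p):
    """3D FFT via slab decomposition: local 2D FFTs, an all-to-all
    style transpose, then 1D FFTs along the remaining axis."""
    n0, n1, n2 = a.shape
    s0, s1 = n0 // p, n1 // p              # assume p divides n0 and n1
    # each "rank" r owns the slab a[r*s0:(r+1)*s0, :, :]
    slabs = [a[r * s0:(r + 1) * s0] for r in range(p)]
    # step 1: 2D FFT over the two locally complete axes
    slabs = [np.fft.fftn(s, axes=(1, 2)) for s in slabs]
    # step 2: redistribute (serial stand-in for MPI Alltoall) so each
    # rank holds all of axis 0 but only a slice of axis 1
    cols = [np.concatenate([s[:, j * s1:(j + 1) * s1] for s in slabs], axis=0)
            for j in range(p)]
    # step 3: 1D FFT along the now locally complete axis 0
    cols = [np.fft.fft(c, axis=0) for c in cols]
    return np.concatenate(cols, axis=1)    # reassemble for comparison
```

Since the FFT factorizes axis by axis, the order of the per-axis transforms does not matter, and the slab result agrees with `np.fft.fftn` applied to the whole array.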
Thomas, Dustin M; Lee, Joshua S; Charmforoush, Anthony; Rubal, Bernard J; Rosenblatt, Stephen A; Butler, Joshua T; Clemenshaw, Michael; Cheezum, Michael K; Slim, Ahmad M
2015-12-01
Small, observational trials have suggested a reduction in adjacent gastric activity with ingestion of soda water in myocardial perfusion imaging (MPI). We report our findings prior to and after implementation of soda water in 467 consecutive MPI studies. Consecutive MPI studies performed at a high-volume facility referred for vasodilator (VD) or exercise treadmill testing (ETT) were retrospectively reviewed before and after implementation of the soda water protocol. Patients undergoing the soda water protocol received 100 ml of soda water administered 30 min prior to image acquisition and after stress. Studies were performed using a same day rest/stress protocol. Incidence of adjacent gastric activity, diaphragmatic attenuation, stress and rest perfusion defects, and major adverse cardiovascular events (MACE) outcomes defined as death, myocardial infarction, stroke, reevaluation for chest pain, and late revascularization (>90 days from MPI) were abstracted using International Classification of Diseases, Ninth Revision (ICD-9) search. Two hundred and eighteen studies were performed prior to implementation of the soda water protocol and 249 studies were performed with the use of soda water. Baseline demographic data were equal between the groups with the exception of more patients undergoing VD stress receiving soda water (p < 0.001). Soda water was not associated with a decreased incidence of adjacent gastric activity with stress (54.7% versus 61.9% with no soda water, p = 0.129) or rest (68.6% versus 69.5% with no soda water, p = 0.919) imaging. Less adjacent gastric activity was observed with patients undergoing ETT who received soda water (42.5% versus 56.9% with no soda water, p = 0.031), but no difference was observed between the groups with VD stress (69.0% versus 68.1% with no soda water, p = 1.000). 
The use of soda water prior to technetium-99m MPI was associated with lower rates of adjacent gastric activity only in patients undergoing ETT stress but not rest or VD stress. This differs from previously published data. © The Author(s), 2015.
Fast 2D FWI on a multi and many-cores workstation.
NASA Astrophysics Data System (ADS)
Thierry, Philippe; Donno, Daniela; Noble, Mark
2014-05-01
Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard two-socket workstations using the latest 12-core E5-v2 x86-64 CPUs, we present here an MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code which simultaneously runs on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is the ability to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each supporting up to 4 hardware threads, this many-core device can be seen as a shared-memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can host several co-processors, turning the workstation into a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is simply recompiled to obtain a Xeon Phi x86 binary. This multi- and many-core configuration uses standard compilers and the associated MPI as well as math libraries under Linux; therefore, the cost of code development remains constant while computation time improves. We choose to implement the code in the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e., running only on the co-processor) thanks to the Linux ssh and NFS capabilities. Usual care is taken with optimization and SIMD vectorization to ensure optimal performance, and to analyze the application's performance and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (L-BFGS) optimization scheme for the model parameter updates. Parallelization is achieved through standard MPI distribution of shot gathers and OpenMP for domain decomposition within the co-processor.
Taking advantage of the 16 GB of memory available on the co-processor, we are able to keep wavefields in memory to compute the gradient by cross-correlation of the forward and back-propagated wavefields, as required by our time-domain FWI scheme, without heavy traffic on the I/O subsystem and PCIe bus. In this presentation we also review some simple methodologies for comparing performance expectations against measured performance, in order to estimate the optimization effort before starting any major modification or rewrite of research codes. The key message is the ease of use and development of this hybrid configuration, which reaches not the absolute peak performance but the optimal one that ensures the best balance between geophysical and computational development effort.
Hybrid x-space: a new approach for MPI reconstruction.
Tateo, A; Iurino, A; Settanni, G; Andrisani, A; Stifanelli, P F; Larizza, P; Mazzia, F; Mininni, R M; Tangaro, S; Bellotti, R
2016-06-07
Magnetic particle imaging (MPI) is a new medical imaging technique capable of recovering the distribution of superparamagnetic particles from their measured induced signals. In the literature there are two main MPI reconstruction techniques: measurement-based (MB) and x-space (XS). The MB method is expensive because it requires a long calibration procedure as well as a reconstruction phase that can be numerically costly. On the other hand, the XS method is simpler than MB, but exact knowledge of the field-free point (FFP) motion is essential for its implementation. Our simulation work focuses on the implementation of a new approach for MPI reconstruction, called hybrid x-space (HXS), a combination of the previous methods. Specifically, our approach is based on XS reconstruction because it requires knowledge of the FFP position and velocity at each time instant. The difference with respect to the original XS formulation is how the FFP velocity is computed: we estimate it from the experimental measurements of the calibration scans, typical of the MB approach. Moreover, a compressive sensing technique is applied in order to reduce the calibration time by setting a smaller number of sampling positions. Simulations highlight that the HXS and XS methods give similar results. Furthermore, appropriate use of compressive sensing is crucial for obtaining a good balance between time reduction and reconstructed image quality. Our proposal is suitable for open geometry configurations of human-size devices, where incidental factors could make the currents, the fields and the FFP trajectory irregular.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grant, Ryan E.; Barrett, Brian W.; Pedretti, Kevin
The Portals reference implementation is based on the Portals 4.X API, published by Sandia National Laboratories as a freely available public document. It is designed to be an implementation of the Portals Networking Application Programming Interface and is used by several other upper layer protocols such as SHMEM, GASNet and MPI. It is implemented over existing networks, specifically Ethernet and InfiniBand networks. This implementation provides Portals networking functionality and serves as a software emulation of Portals-compliant networking hardware. It can be used to develop software using the Portals API prior to the debut of Portals networking hardware, such as Bull's BXI interconnect, as well as a substitute for Portals hardware on development platforms that do not have Portals-compliant hardware. The reference implementation provides new capabilities beyond those of a typical network, namely the ability to have messages matched in hardware in a way compatible with upper layer software such as MPI or SHMEM. It also offers methods of offloading network operations via triggered operations, which can be used to create offloaded collective operations. Specific details on the Portals API can be found at http://portals4.org.
Multilevel Parallelization of AutoDock 4.2.
Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P
2011-04-28
Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. The performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales nearly linearly on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
Hybrid cloud and cluster computing paradigms for life science applications
2010-01-01
Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability in some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system, Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications. Methods We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.
Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey
2010-12-21
Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability in some areas, such as data mining, because MapReduce performs poorly on problems with the iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system, Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications. We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
Accelerating DNA analysis applications on GPU clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste
DNA analysis is an emerging application of high-performance bioinformatics. Modern sequencing machines are able to provide, in a few hours, large input streams of data which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple-pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphics Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, the number of patterns to search, and the number of matches, and poses significant challenges for current high performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI-based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI-with-pthreads implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
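As context for the abstract above, a compact serial reference version of Aho-Corasick (trie construction, breadth-first failure links, and a single streaming pass over the text) fits in a few dozen lines of Python. This sketch is an illustration of the algorithm itself, not code from the paper, and all names are hypothetical:

```python
from collections import deque

def build_automaton(patterns):
    """Build the Aho-Corasick goto/fail/output tables for the patterns."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                       # trie construction
        s = 0
        for c in p:
            if c not in goto[s]:
                goto[s][c] = len(goto)
                goto.append({}); fail.append(0); out.append(set())
            s = goto[s][c]
        out[s].add(p)
    q = deque(goto[0].values())              # BFS to set failure links
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][c] if c in goto[f] and goto[f][c] != t else 0
            out[t] |= out[fail[t]]           # inherit matches via failure link
    return goto, fail, out

def search(text, patterns):
    """Return (start_index, pattern) for every match in one pass."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

Each input character is consumed exactly once regardless of the number of patterns, which is what makes the algorithm attractive for the large streaming workloads described above; the performance variability the authors mention comes from the match-reporting loop and the automaton's memory footprint.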
Compiled MPI: Cost-Effective Exascale Applications Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Quinlan, D; Lumsdaine, A
2012-04-10
The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application.
We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Yan, Jerry
1999-01-01
We present an HPF (High Performance Fortran) implementation of the ARC3D code, along with profiling and performance data on the SGI Origin 2000. Advantages and limitations of HPF as a parallel programming language for CFD applications are discussed. To achieve good performance we used data distributions optimized for the implementation of the implicit and explicit operators of the solver and the boundary conditions. We compare the results with MPI and directive-based implementations.
Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias
2011-10-01
Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message-passing processes [e.g., the Message Passing Interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably when compared to an implementation using only MPI and, furthermore, that OpenMP in combination with MPI provides a satisfactory compromise between performance and code complexity. Having the ability to use threads within MPI processes enables the sophisticated use of all processor cores for both computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than were used in this study, we believe that faster-than-real-time multiscale cardiac simulations can be achieved on these systems.
Using WEED to simulate the global wetland distribution in an ESM
NASA Astrophysics Data System (ADS)
Stacke, Tobias; Hagemann, Stefan
2016-04-01
Lakes and wetlands are an important land surface feature. In terms of hydrology, they regulate river discharge, mitigate flood events and constitute a significant surface water storage. Considering physical processes, they link the surface water and energy balances by altering the separation of incoming energy into sensible and latent heat fluxes. Finally, they impact biogeochemical processes and may act as carbon sinks or sources. Most global hydrology and climate models regard wetland extent and properties as constant in time. However, to study interactions between wetlands and different states of climate, it is necessary to implement surface water bodies (hereafter referred to as wetlands) with dynamical behavior into these models. Besides an improved representation of geophysical feedbacks between wetlands, land surface and atmosphere, a dynamical wetland scheme could also provide estimates of soil wetness as input for biogeochemical models, which are used to compute methane production in wetlands. Recently, a model for the representation of wetland extent dynamics (WEED) was developed as part of the hydrology model (MPI-HM) of the Max-Planck-Institute for Meteorology (MPI-M). The WEED scheme computes wetland extent in agreement with the range of observations for the high northern latitudes. It simulates a realistic seasonal cycle which shows sensitivity to northern snow melt as well as to rainy seasons in the tropics. Furthermore, flood peaks in river discharge are mitigated. However, the WEED scheme overestimates wetland extent in the tropics, which might be related to the MPI-HM's simplified potential evapotranspiration computation. In order to overcome this limitation, the WEED scheme is implemented into the MPI-M's land surface model JSBACH. Thus, not only its effect on water fluxes can be investigated but also its impact on the energy cycle, which is not included in the MPI-HM.
Furthermore, it will be possible to analyze the physical effects of wetlands in a coupled land-atmosphere simulation. First simulations with JSBACH-WEED show results similar to the MPI-HM simulations. As the next step, the scheme is modified to account for energy cycle relevant issues such as the dynamical alteration of surface albedo as well as the allocation of appropriate thermal properties to the wetlands. In our presentation, we will give an overview on the functionality of the WEED scheme and the effect of wetlands in coupled land-atmosphere simulations.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require many more arithmetic operations and more data dependency. In this contribution, we have implemented a GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, and its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hjelm, Nathan Thomas; Pritchard, Howard Porter
These are slides for a presentation during ExxonMobil's visit to Los Alamos National Laboratory. Topics covered are: Open MPI - The Release Story, MPI-3 RMA in Open MPI, MPI dynamic process management and Open MPI, and new options with CLE 6. Open MPI RMA features: full support for the MPI-3.1 specification since v2.0.0; support for non-contiguous datatypes; support for direct use of the RDMA capabilities of high-performance networks (Cray Gemini/Aries, InfiniBand); and, starting in v2.1.0, support for using network atomic operations for MPI_Fetch_and_op and MPI_Compare_and_swap, tested with MPI_THREAD_MULTIPLE.
Development of instrumentation for differential spectroscopic measurements at millimeter wavelengths
NASA Astrophysics Data System (ADS)
D'Alessandro, G.; de Bernardis, P.; Masi, S.; Schillaci, A.
2016-07-01
The study of the spectral-spatial anisotropy of the high-latitude mm-wave sky is a powerful tool of cosmology. It can be used to provide deep insight into the Sunyaev-Zeldovich (SZ) effect, the Cosmic Infrared Background, and the anisotropy of the CMB, using the spectral dimension to provide substantially more information than is achievable by means of standard multiband photometry. Here we focus on spectral measurements of the SZ effect. Large mm-wave telescopes are now routinely mapping the SZ effect photometrically in a number of clusters, estimating their comptonisation parameters and using them as cosmological probes. Low-resolution spectroscopic measurements of the SZ effect would be very effective in removing the parameter degeneracies inevitable in photometric measurements. We describe a real-world implementation of this measurement strategy, based on an imaging, efficient, differential Fourier transform spectrometer (FTS). The instrument is based on a Martin-Puplett interferometer (MPI) configuration. We combined two MPIs working synchronously to use the entire input power. In our implementation the observed sky field is divided into two halves along the meridian. Each half-field corresponds to one of the two input ports of the MPI. Each detector in the FTS focal planes measures the difference in brightness between two sky pixels, symmetrically located with respect to the meridian. Exploiting the high common-mode rejection of the MPI, tiny sky brightness gradients embedded in an overwhelming isotropic background can be measured. We investigate experimentally the common-mode rejection achievable in the MPI at mm wavelengths, and discuss the use of such an instrument to measure the spectrum of cosmic microwave background (CMB) anisotropy and the SZ effect.
A Locality-Based Threading Algorithm for the Configuration-Interaction Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Johnson, Calvin
2017-07-03
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
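The locality-based partitioning idea described in this abstract can be sketched outside OpenMP: instead of scattering rows round-robin across threads, give each thread one contiguous block of rows with a roughly equal share of the nonzeros, so each thread's reads stay local. The helper below is illustrative only; it is not code from the paper.

```python
def partition_rows(row_nnz, num_threads):
    """Split rows into contiguous per-thread blocks with balanced nonzero counts.

    row_nnz[i] is the number of nonzeros (i.e., the work) attached to row i.
    Returns a list of (start, end) half-open row ranges, one per thread.
    """
    total = sum(row_nnz)
    target = total / num_threads       # fair share of work per thread
    blocks, start, acc = [], 0, 0
    for i, nnz in enumerate(row_nnz):
        acc += nnz
        # close the current block once cumulative work reaches its fair share
        if acc >= target * (len(blocks) + 1) and len(blocks) < num_threads - 1:
            blocks.append((start, i + 1))
            start = i + 1
    blocks.append((start, len(row_nnz)))
    return blocks
```

Each thread then iterates only over its own contiguous `(start, end)` range, which is the memory-locality property the paper's algorithm relies on.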
On the Performance of an Algebraic Multigrid Solver on Multicore Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, A H; Schulz, M; Yang, U M
2010-04-29
Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model.
Toward Abstracting the Communication Intent in Applications to Improve Portability and Productivity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mintz, Tiffany M; Hernandez, Oscar R; Kartsaklis, Christos
Programming with communication libraries such as the Message Passing Interface (MPI) obscures the high-level intent of the communication in an application and makes static communication analysis difficult to do. Compilers are unaware of the specifics of communication libraries, which excludes communication patterns from any automated analysis and optimization. To overcome this, communication patterns can be expressed at higher levels of abstraction and incrementally added to existing MPI applications. In this paper, we propose the use of directives to clearly express the communication intent of an application in a way that is not specific to a given communication library. Our communication directives allow programmers to express communication among processes in a portable way, giving hints to the compiler on regions of computation that can be overlapped with communication and relaxing the constraints on the ordering, completion and synchronization of the communication imposed by specific libraries such as MPI. The directives can then be translated by the compiler into message-passing calls that efficiently implement the intended pattern and can be targeted to multiple communication libraries. Thus far, we have used the directives to express point-to-point communication patterns in C, C++ and Fortran applications, and have translated them to MPI and SHMEM.
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Balley, David H. (Technical Monitor)
1997-01-01
Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing, with the accompanying performance degradation in shared-memory environments, or they must check each communication at run time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated.
CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. Unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. Still, the API is small enough to consider integrating into standard OS interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will also remain free to use CDS1 directly, with care.
CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high-performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation techniques, it even now outperforms highly optimized MPI implementations on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
Performance Analysis of Ivshmem for High-Performance Computing in Virtual Machines
NASA Astrophysics Data System (ADS)
Ivanovic, Pavle; Richter, Harald
2018-01-01
High-performance computing (HPC) is rarely accomplished via virtual machines (VMs). In this paper, we present a remake of ivshmem which can change this. Ivshmem was a shared-memory (SHM) mechanism between virtual machines on the same server, with SHM-access synchronization included, until about 5 years ago, when newer versions of Linux and its virtualization library libvirt evolved. We restored that SHM-access synchronization feature because it is indispensable for HPC, and made ivshmem runnable with contemporary versions of Linux, libvirt, KVM, QEMU and especially MPICH, which is an implementation of MPI, the standard HPC communication library. Additionally, we transparently modified MPICH to include ivshmem, resulting in a three- to ten-times performance improvement compared to TCP/IP. Furthermore, we have transparently replaced MPI_Put, a one-sided MPICH communication mechanism, by our own MPI_Put wrapper. As a result, our ivshmem even surpasses non-virtualized SHM data transfers for block lengths greater than 512 KBytes, showing the benefits of virtualization. All improvements were possible without using SR-IOV.
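As a rough stdlib analogue of the copy-avoiding transfer ivshmem provides between VMs, the sketch below passes a buffer through a named shared-memory segment using Python's `multiprocessing.shared_memory`: the two endpoints attach to the same segment, so the payload is never serialized through a socket. The segment name and helper functions are invented for illustration.

```python
from multiprocessing import shared_memory

def shm_send(name, payload: bytes):
    """Place payload into a named shared-memory segment (the 'sender' side)."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm  # keep the handle so the segment stays alive

def shm_recv(name, size) -> bytes:
    """Attach to the same segment by name and read the payload (the 'receiver')."""
    shm = shared_memory.SharedMemory(name=name)
    data = bytes(shm.buf[:size])
    shm.close()
    return data
```

In the real system the two sides are separate VMs sharing a host segment; here both run in one process purely to show the attach-by-name, no-copy-through-a-channel pattern.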
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm that provides consistency guarantees even in very large and extreme-scale systems while remaining memory and bandwidth efficient.
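The logarithmic scaling of gossip cycles with system size is easy to reproduce in a toy simulation. This is a generic push-gossip sketch, not the paper's algorithms: each cycle, every process that already knows about the failure tells one random peer, so the informed set can at most double per cycle.

```python
import random

def gossip_cycles(n, seed=0):
    """Cycles of push-gossip until every one of n processes has heard the rumour."""
    rng = random.Random(seed)
    informed = {0}            # process 0 detects the failure first
    cycles = 0
    while len(informed) < n:
        cycles += 1
        # each informed process pushes the rumour to one random peer per cycle
        for _ in list(informed):
            informed.add(rng.randrange(n))
    return cycles
```

Because the informed set at most doubles each cycle, at least log2(n) cycles are needed, and randomized pushes finish within a small constant factor of that bound.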
Fiechter, Michael; Ghadri, Jelena R; Wolfrum, Mathias; Kuest, Silke M; Pazhenkottil, Aju P; Nkoulou, Rene N; Herzog, Bernhard A; Gebhard, Cathérine; Fuchs, Tobias A; Gaemperli, Oliver; Kaufmann, Philipp A
2012-03-01
Low yield of invasive coronary angiography and unnecessary coronary interventions have been identified as key cost drivers in cardiology for evaluation of coronary artery disease (CAD). This has fuelled the search for noninvasive techniques providing comprehensive functional and anatomical information on coronary lesions. We have evaluated the impact of implementation of a novel hybrid cadmium-zinc-telluride (CZT)/64-slice CT camera into the daily clinical routine on downstream resource utilization. Sixty-two patients with known or suspected CAD were referred for same-day single-session hybrid evaluation with CZT myocardial perfusion imaging (MPI) and coronary CT angiography (CCTA). Hybrid MPI/CCTA images from the integrated CZT/CT camera served for decision-making towards conservative versus invasive management. Based on the hybrid images patients were classified into those with and those without matched findings. Matched findings were defined as the combination of MPI defect with a stenosis by CCTA in the coronary artery subtending the respective territory. All patients with normal MPI and CCTA as well as those with isolated MPI or CCTA finding or combined but unmatched findings were categorized as "no match". All 23 patients with a matched finding underwent invasive coronary angiography and 21 (91%) were revascularized. Of the 39 patients with no match, 5 (13%, p < 0.001 vs matched) underwent catheterization and 3 (8%, p < 0.001 vs matched) were revascularized. Cardiac hybrid imaging in CAD evaluation has a profound impact on patient management and may contribute to optimal downstream resource utilization.
Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton
2014-11-11
We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonication, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Pascual, Thomas N B; Mercuri, Mathew; El-Haj, Noura; Bom, Henry Hee-Sung; Lele, Vikram; Al-Mallah, Mouaz H; Luxenburg, Osnat; Karthikeyan, Ganesan; Vitola, Joao; Mahmarian, John J; Better, Nathan; Shaw, Leslee J; Rehani, Madan M; Kashyap, Ravi; Paez, Diana; Dondi, Maurizio; Einstein, Andrew J
2017-03-24
This paper examines the current status of radiation exposure to patients in myocardial perfusion imaging (MPI) in Asia. Methods and Results: Laboratories voluntarily provided information on MPI performed over a 1-week period. Eight best practice criteria regarding MPI were predefined by an expert panel. Implementation of ≥6 best practices (quality index [QI] ≥6) was pre-specified as a desirable goal for keeping radiation exposure at a low level. Radiation effective dose (ED) in 1,469 patients and QI of 69 laboratories in Asia were compared against data from 239 laboratories in the rest of the world (RoW). Mean ED was significantly higher in Asia (11.4 vs. 9.6 mSv; P<0.0001), with significantly lower doses in South-East vs. East Asia (9.7 vs. 12.7 mSv; P<0.0001). QI in Asia was lower than in RoW. In comparison with RoW, Asian laboratories used thallium more frequently, used weight-based technetium dosing less frequently, and trended towards a lower rate of stress-only imaging. MPI radiation dose in Asia is higher than that in the RoW and linked to less consistent use of laboratory best practices such as avoidance of thallium, weight-based dosing, and use of stress-only imaging. Given that MPI is performed in Asia within a diverse array of medical contexts, laboratory-specific adoption of best practices offers numerous opportunities to improve quality of care.
SurF: an innovative framework in biosecurity and animal health surveillance evaluation.
Muellner, Petra; Watts, Jonathan; Bingham, Paul; Bullians, Mark; Gould, Brendan; Pande, Anjali; Riding, Tim; Stevens, Paul; Vink, Daan; Stärk, Katharina Dc
2018-05-16
Surveillance for biosecurity hazards is being conducted by the New Zealand Competent Authority, the Ministry for Primary Industries (MPI), to support New Zealand's biosecurity system. Surveillance evaluation should be an integral part of the surveillance life cycle, as it provides a means to identify and correct problems and to sustain and enhance the existing strengths of a surveillance system. The surveillance evaluation framework (SurF) presented here was developed to provide a generic framework within which the MPI biosecurity surveillance portfolio, and all of its components, can be consistently assessed. SurF is an innovative, cross-sectoral effort that aims to provide a common umbrella for surveillance evaluation in the animal, plant, environment and aquatic sectors. It supports the conduct of the following four distinct components of an evaluation project: (i) motivation for the evaluation, (ii) scope of the evaluation, (iii) evaluation design and implementation and (iv) reporting and communication of evaluation outputs. Case studies, prepared by MPI subject matter experts, are included in the framework to guide users in their assessment. Three case studies were used in the development of SurF in order to assure practical utility and to confirm usability of SurF across all included sectors. It is anticipated that the structured approach and information provided by SurF will not only be of benefit to MPI but also to other New Zealand stakeholders. Although SurF was developed for internal use by MPI, it could be applied to any surveillance system in New Zealand or elsewhere. © 2018 The Authors. Transboundary and Emerging Diseases published by Blackwell Verlag GmbH.
NASA Astrophysics Data System (ADS)
Romanova, Vanya; Hense, Andreas; Wahl, Sabrina; Brune, Sebastian; Baehr, Johanna
2016-04-01
The decadal variability of surface net freshwater fluxes, and its predictability, is compared in a set of retrospective predictions, all using the same model setup and differing only in the implemented ocean initialisation method and ensemble generation method. The basic aim is to deduce the differences between the initialization/ensemble generation methods in view of the uncertainty of the verifying observational data sets. The analysis will give an approximation of the uncertainties of the net freshwater fluxes, which up to now appear to be one of the most uncertain products in observational data and model outputs. All ensemble generation methods are implemented into the MPI-ESM earth system model in the framework of the ongoing MiKlip project (www.fona-miklip.de). Hindcast experiments are initialised annually between 2000-2004, and from each start year 10 ensemble members are initialized for 5 years each. Four different ensemble generation methods are compared: (i) a method based on the Anomaly Transform method (Romanova and Hense, 2015), in which the initial oceanic perturbations represent orthogonal and balanced anomaly structures in space and time and between the variables, taken from a control run; (ii) one-day-lagged ocean states from the MPI-ESM-LR baseline system; (iii) one-day-lagged ocean and atmospheric states with preceding full-field nudging to re-analysis in both the atmospheric and the oceanic component of the system (the MPI-ESM-LR baseline-1 system); (iv) an Ensemble Kalman Filter (EnKF) implemented into the oceanic part of MPI-ESM (Brune et al. 2015), assimilating monthly subsurface oceanic temperature and salinity (EN3) using the Parallel Data Assimilation Framework (PDAF). The hindcasts are evaluated probabilistically using freshwater flux data sets from four different reanalysis data sets: MERRA, NCEP-R1, GFDL ocean reanalysis and GECCO2. The assessments show no clear differences in the evaluation scores on regional scales.
However, on the global scale the physically motivated methods (i) and (iv) provide probabilistic hindcasts with a consistently higher reliability than the lagged initialization methods (ii)/(iii) despite the large uncertainties in the verifying observations and in the simulations.
Processing MPI Datatypes Outside MPI
NASA Astrophysics Data System (ADS)
Ross, Robert; Latham, Robert; Gropp, William; Lusk, Ewing; Thakur, Rajeev
The MPI datatype functionality provides a powerful tool for describing structured memory and file regions in parallel applications, enabling noncontiguous data to be operated on by MPI communication and I/O routines. However, no facilities are provided by the MPI standard to allow users to efficiently manipulate MPI datatypes in their own codes.
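As an illustration of what "processing datatypes outside MPI" involves, the hypothetical sketch below flattens an MPI_Type_vector-style description (count, blocklength, stride, element size in bytes) into byte extents, coalescing blocks that happen to abut. This mirrors the type-map manipulation such tools must re-derive themselves, since the MPI standard exposes no API for it; the function name and representation are invented here.

```python
def flatten_vector(count, blocklength, stride, elem_size):
    """Flatten a strided-vector datatype into (byte_offset, byte_length) extents.

    count:       number of blocks
    blocklength: elements per block
    stride:      elements between block starts
    elem_size:   bytes per element
    """
    extents = []
    for i in range(count):
        off = i * stride * elem_size
        length = blocklength * elem_size
        if extents and extents[-1][0] + extents[-1][1] == off:
            prev_off, prev_len = extents[-1]
            extents[-1] = (prev_off, prev_len + length)  # merge adjacent blocks
        else:
            extents.append((off, length))
    return extents
```

For example, a vector of 3 blocks of 2 doubles every 4 elements yields three separate 16-byte extents, while blocklength equal to the stride collapses to one contiguous extent, the kind of normalization an I/O or packing library performs before moving data.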
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
2017-10-30
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low-rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solve alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices whose sizes span from a few hundred million to billions of entries. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
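One member of the algorithm class the framework targets, the Multiplicative Update rule (W ← W∘(AHᵀ)/(WHHᵀ), H ← H∘(WᵀA)/(WᵀWH)), can be sketched serially in pure Python. MPI-FAUN distributes these matrices across processors; this toy version keeps everything local, and all helper names are illustrative.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def frob_err(A, W, H):
    """Squared Frobenius norm of A - WH."""
    WH = matmul(W, H)
    return sum((A[i][j] - WH[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

def nmf(A, rank, iters=50, seed=0):
    """Lee-Seung multiplicative updates; keeps W, H elementwise non-negative."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    eps = 1e-9  # guard against division by zero
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H
```

In the distributed setting, the matrix products in the numerators and denominators are exactly the quantities computed collectively with MPI; the update rule itself is embarrassingly elementwise.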
Users manual for the Chameleon parallel programming tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, W.; Smith, B.
1993-06-01
Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon's goals are to (a) be very lightweight (low overhead), (b) be highly portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon's support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementations for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.
NASA Astrophysics Data System (ADS)
Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin
2016-06-01
CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software to a heterogeneous platform. First, we implement a full GPU version of the ADI solver to eliminate redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering that memory on a single node becomes inadequate as the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU compared with two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
2017-10-30
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low-rank factors W and H, for a given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices with sizes spanning from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
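The alternating-update structure the framework parallelizes can be illustrated with the simplest member of that algorithm family. The following is a minimal serial sketch of Multiplicative Update NMF in NumPy — not the paper's distributed implementation; function and variable names are our own.

```python
import numpy as np

def nmf_multiplicative_update(A, k, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates: find non-negative W (m x k) and
    H (k x n) with A ~= W @ H. Serial sketch; the paper's framework solves
    the same alternating subproblems with A, W, H distributed over MPI ranks."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

# An exactly rank-4 non-negative matrix is recovered to small relative error.
rng = np.random.default_rng(1)
A = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf_multiplicative_update(A, k=4)
err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
```

Multiplicative updates preserve non-negativity because each factor is only ever multiplied by ratios of non-negative quantities; HALS and Block Principal Pivoting replace the two update lines but keep the same alternating loop.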
X-space MPI: magnetic nanoparticles for safe medical imaging.
Goodwill, Patrick William; Saritas, Emine Ulku; Croft, Laura Rose; Kim, Tyson N; Krishnan, Kannan M; Schaffer, David V; Conolly, Steven M
2012-07-24
One quarter of all iodinated contrast X-ray clinical imaging studies are now performed on Chronic Kidney Disease (CKD) patients. Unfortunately, the iodine contrast agent used in X-ray is often toxic to CKD patients' weak kidneys, leading to significant morbidity and mortality. Hence, we are pioneering a new medical imaging method, called Magnetic Particle Imaging (MPI), to replace X-ray and CT iodinated angiography, especially for CKD patients. MPI uses magnetic nanoparticle contrast agents that are much safer than iodine for CKD patients. MPI already offers superb contrast and extraordinary sensitivity. The iron oxide nanoparticle tracers required for MPI are also used in MRI, and some are already approved for human use, but the contrast agents are far more effective at illuminating blood vessels when used in the MPI modality. We have recently developed a systems theoretic framework for MPI called x-space MPI, which has already dramatically improved the speed and robustness of MPI image reconstruction. X-space MPI has allowed us to optimize the hardware for five MPI scanners. Moreover, x-space MPI provides a powerful framework for optimizing the size and magnetic properties of the iron oxide nanoparticle tracers used in MPI. Currently MPI nanoparticles have diameters in the 10-20 nanometer range, enabling millimeter-scale resolution in small animals. X-space MPI theory predicts that larger nanoparticles could enable up to 250 micrometer resolution imaging, which would represent a major breakthrough in safe imaging for CKD patients.
System identification of the JPL micro-precision interferometer truss - Test-analysis reconciliation
NASA Technical Reports Server (NTRS)
Red-Horse, J. R.; Marek, E. L.; Levine-West, M.
1993-01-01
The JPL Micro-Precision Interferometer (MPI) is a testbed for studying the use of control-structure interaction technology in the design of space-based interferometers. A layered control architecture will be employed to regulate the interferometer optical system to tolerances in the nanometer range. An important aspect of designing and implementing the control schemes for such a system is the need for high fidelity, test-verified analytical structural models. This paper focuses on one aspect of the effort to produce such a model for the MPI structure, test-analysis model reconciliation. Pretest analysis, modal testing, and model refinement results are summarized for a series of tests at both the component and full system levels.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using the message passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for the design of the parallel algorithm, and theoretical functions for the 3D DEM code's scalability and memory usage are derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, and eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and is tested on up to 2048 compute nodes for simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code, including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles), are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
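The border/ghost-layer and migration-layer concepts can be sketched in miniature. The fragment below reduces the idea to a 1-D decomposition simulated in a single process; in the actual code these particle lists are exchanged between MPI ranks every time step, and all names here are illustrative.

```python
def block_bounds(length, nblocks):
    """Equal 1-D decomposition of the domain [0, length) into blocks."""
    w = length / nblocks
    return [(i * w, (i + 1) * w) for i in range(nblocks)]

def border_layers(blocks, bounds, cutoff):
    """Per block: the (left, right) border-layer particles, i.e. those within
    the interaction cutoff of a block face, which neighbor ranks need as
    ghost particles for contact detection."""
    out = []
    for (lo, hi), parts in zip(bounds, blocks):
        out.append(([p for p in parts if p < lo + cutoff],
                    [p for p in parts if p >= hi - cutoff]))
    return out

def migrants(blocks, bounds):
    """Per block: particles that moved outside its bounds and must migrate
    to the owning neighbor after a time step."""
    return [[p for p in parts if not (lo <= p < hi)]
            for (lo, hi), parts in zip(bounds, blocks)]

bounds = block_bounds(10.0, 2)            # two blocks: [0, 5) and [5, 10)
blocks = [[1.0, 4.8], [4.9, 5.2, 9.0]]    # positions after a time step
layers = border_layers(blocks, bounds, cutoff=0.5)
moved = migrants(blocks, bounds)          # 4.9 now belongs to block 0
```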
Einstein, Andrew J; Pascual, Thomas N B; Mercuri, Mathew; Karthikeyan, Ganesan; Vitola, João V; Mahmarian, John J; Better, Nathan; Bouyoucef, Salah E; Hee-Seung Bom, Henry; Lele, Vikram; Magboo, V Peter C; Alexánderson, Erick; Allam, Adel H; Al-Mallah, Mouaz H; Flotats, Albert; Jerome, Scott; Kaufmann, Philipp A; Luxenburg, Osnat; Shaw, Leslee J; Underwood, S Richard; Rehani, Madan M; Kashyap, Ravi; Paez, Diana; Dondi, Maurizio
2015-07-07
To characterize patient radiation doses from nuclear myocardial perfusion imaging (MPI) and the use of radiation-optimizing 'best practices' worldwide, and to evaluate the relationship between laboratory use of best practices and patient radiation dose. We conducted an observational cross-sectional study of protocols used for all 7911 MPI studies performed in 308 nuclear cardiology laboratories in 65 countries for a single week in March-April 2013. Eight 'best practices' relating to radiation exposure were identified a priori by an expert committee, and a radiation-related quality index (QI) devised indicating the number of best practices used by a laboratory. Patient radiation effective dose (ED) ranged between 0.8 and 35.6 mSv (median 10.0 mSv). Average laboratory ED ranged from 2.2 to 24.4 mSv (median 10.4 mSv); only 91 (30%) laboratories achieved the median ED ≤ 9 mSv recommended by guidelines. Laboratory QIs ranged from 2 to 8 (median 5). Both ED and QI differed significantly between laboratories, countries, and world regions. The lowest median ED (8.0 mSv), in Europe, coincided with high best-practice adherence (mean laboratory QI 6.2). The highest doses (median 12.1 mSv) and low QI (4.9) occurred in Latin America. In hierarchical regression modelling, patients undergoing MPI at laboratories following more 'best practices' had lower EDs. Marked worldwide variation exists in radiation safety practices pertaining to MPI, with targeted EDs currently achieved in a minority of laboratories. The significant relationship between best-practice implementation and lower doses indicates numerous opportunities to reduce radiation exposure from MPI globally. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Cardiology.
NASA Astrophysics Data System (ADS)
Schillaci, Alessandro; D'Alessandro, Giuseppe; de Bernardis, Paolo; Masi, Silvia; Paiva Novaes, Camila; Gervasi, Massimo; Zannoni, Mario
2014-05-01
Context. Precision measurements of the Sunyaev-Zel'dovich effect in clusters of galaxies require excellent rejection of common-mode signals and wide frequency coverage. Aims: We describe an imaging, efficient, differential Fourier transform spectrometer (FTS), optimized for measurements of faint brightness gradients at millimeter wavelengths. Methods: Our instrument is based on a Martin-Puplett interferometer (MPI) configuration. We combined two MPIs working synchronously to use the whole input power. In our implementation the observed sky field is divided into two halves along the meridian, and each half-field corresponds to one of the two input ports of the MPI. In this way, each detector in the FTS focal planes measures the difference in brightness between two sky pixels, symmetrically located with respect to the meridian. Exploiting the high common-mode rejection of the MPI, we can measure low sky brightness gradients over a high isotropic background. Results: The instrument works in the range ~1-20 cm⁻¹ (30-600 GHz), has a maximum spectral resolution 1/(2 OPD) = 0.063 cm⁻¹ (1.9 GHz), and an unvignetted throughput of 2.3 cm² sr. It occupies a volume of 0.7 × 0.7 × 0.33 m³ and has a weight of 70 kg. This design can be implemented as a cryogenic unit to be used in space, as well as a room-temperature unit working at the focus of suborbital and ground-based mm-wave telescopes. The first in-flight test of the instrument is with the OLIMPO experiment on a stratospheric balloon; a larger implementation is being prepared for the Sardinia radio telescope.
Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; ...
2017-10-04
The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self-Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel Xeon Phi supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sitaraman, Hariswaran; Grout, Ray
2015-10-30
The load balancing strategies for hybrid solvers that couple a grid-based partial differential equation solution with particle tracking are presented in this paper. A typical Message Passing Interface (MPI) based parallelization of the grid-based solve uses a spatial domain decomposition, while particle tracking is primarily done using one of two techniques. The first technique distributes the particles to the MPI ranks that own the grid region they belong to, while the second shares the particles equally among all ranks, irrespective of their spatial location. The former technique provides spatial locality for field interpolation but cannot assure load balance in terms of the number of particles, which is achieved by the latter. The two techniques are compared for a case of particle tracking in a homogeneous isotropic turbulence box as well as a turbulent jet case. We performed a strong scaling study on more than 32,000 cores, which results in particle densities representative of anticipated exascale machines. The use of alternative implementations of MPI collectives and efficient load equalization strategies are studied to reduce data communication overheads.
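The trade-off between the two particle-distribution techniques is easy to quantify. The sketch below (ours, not the paper's code) computes the per-rank particle counts under each strategy for a clustered particle field and reports load imbalance as maximum over mean load.

```python
import random

def spatial_counts(positions, nranks):
    """Particles per rank when each rank owns an equal slice of [0, 1)."""
    counts = [0] * nranks
    for x in positions:
        counts[min(int(x * nranks), nranks - 1)] += 1
    return counts

def equal_share_counts(n_particles, nranks):
    """Particles per rank when shared equally regardless of location."""
    base, extra = divmod(n_particles, nranks)
    return [base + (1 if r < extra else 0) for r in range(nranks)]

def imbalance(counts):
    """Maximum load divided by mean load; 1.0 means perfect balance."""
    return max(counts) / (sum(counts) / len(counts))

# A clustered particle field (think of a jet core): spatial ownership is
# heavily imbalanced, while equal sharing is balanced by construction but
# gives up locality for field interpolation.
rng = random.Random(0)
positions = [rng.random() ** 4 for _ in range(100_000)]
spatial = spatial_counts(positions, 8)
equal = equal_share_counts(len(positions), 8)
```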
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Epidemic failure detection and consensus for extreme parallelism
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas; ...
2017-02-01
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.
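The logarithmic scaling of gossip-based dissemination can be seen in a toy simulation. The following sketch implements simple push gossip of a failure list in a single process; it illustrates the general mechanism only, not any of the paper's three algorithms.

```python
import random

def gossip_cycles_to_consensus(n, failed, seed=0):
    """Push-gossip dissemination of a failure list: each cycle, every process
    that already knows the failures tells one random alive peer. Returns the
    number of cycles until all alive processes know."""
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    informed = {alive[0]}          # one process detected the failures
    cycles = 0
    while len(informed) < len(alive):
        cycles += 1
        for p in list(informed):
            informed.add(rng.choice(alive))
    return cycles

cycles = gossip_cycles_to_consensus(1024, failed={0})
```

Because the informed set can at best double each cycle, at least log2(n) cycles are needed, and randomized push gossip is known to finish in O(log n) cycles with high probability, consistent with the logarithmic scaling the paper reports.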
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performances of our implementation of the parallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A. T.
2014-04-20
This code adds an implementation of PMIX_Ring to the existing PMI2 library in the SLURM open source software package (Simple Linux Utility for Resource Management). PMIX_Ring executes a particular communication pattern that is used to bootstrap connections between MPI processes in a parallel job.
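The communication pattern itself is simple to state: each process contributes its own address and receives the addresses of its left and right neighbors in a ring. The sketch below shows the resulting data movement; it is an illustration only, not SLURM's implementation, and the address strings are invented.

```python
def pmix_ring(addresses):
    """Illustration of a ring exchange: rank i contributes addresses[i] and
    learns the addresses of ranks i-1 and i+1 (wrapping around), which is
    enough for each MPI process to bootstrap connections to its neighbors."""
    n = len(addresses)
    return {rank: (addresses[(rank - 1) % n], addresses[(rank + 1) % n])
            for rank in range(n)}

neighbors = pmix_ring(["node0:5000", "node1:5000", "node2:5000", "node3:5000"])
```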
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would even better be automated by parallelizing tools and compilers. The definition of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like Fortran and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and the pros and cons of the different approaches will be discussed, along with our experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of the techniques to realistic aerospace applications will be presented.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI
Donato, David I.
2017-01-01
In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
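The effect described can be reproduced with a small scheduling simulation. The sketch below (our own, with illustrative names, not the paper's C/MPI reference implementation) compares the makespan of on-demand pulling from a bag of tasks against a static round-robin allocation on a cluster with one slow node.

```python
import heapq

def makespan_on_demand(task_times, speeds):
    """Makespan when idle workers pull the next task from a shared bag
    (worker-initiated, as in the paper)."""
    free = [(0.0, w) for w in range(len(speeds))]  # (time worker is free, id)
    heapq.heapify(free)
    finish = 0.0
    for t in task_times:
        time_free, w = heapq.heappop(free)
        done = time_free + t / speeds[w]
        finish = max(finish, done)
        heapq.heappush(free, (done, w))
    return finish

def makespan_static(task_times, speeds):
    """Makespan when tasks are dealt out round-robin before the run."""
    loads = [0.0] * len(speeds)
    for i, t in enumerate(task_times):
        loads[i % len(speeds)] += t / speeds[i % len(speeds)]
    return max(loads)

tasks = [1.0] * 100          # 100 identical modelling runs
speeds = [1.0, 1.0, 0.25]    # heterogeneous cluster: one node is 4x slower
on_demand = makespan_on_demand(tasks, speeds)
static = makespan_static(tasks, speeds)
```

The on-demand schedule automatically routes most tasks to the fast nodes, while the static schedule leaves the slow node as the bottleneck.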
NASA Astrophysics Data System (ADS)
Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.
2016-04-01
We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, which forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 × 12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P, 1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve performance on Intel Xeon processors while retaining simulation accuracy.
High-Performance Data Analysis Tools for Sun-Earth Connection Missions
NASA Technical Reports Server (NTRS)
Messmer, Peter
2011-01-01
The data analysis tool of choice for many Sun-Earth Connection missions is the Interactive Data Language (IDL) by ITT VIS. The increasing amount of data produced by these missions and the increasing complexity of image processing algorithms requires access to higher computing power. Parallel computing is a cost-effective way to increase the speed of computation, but algorithms oftentimes have to be modified to take advantage of parallel systems. Enhancing IDL to work on clusters gives scientists access to increased performance in a familiar programming environment. The goal of this project was to enable IDL applications to benefit from both computing clusters as well as graphics processing units (GPUs) for accelerating data analysis tasks. The tool suite developed in this project enables scientists now to solve demanding data analysis problems in IDL that previously required specialized software, and it allows them to be solved orders of magnitude faster than on conventional PCs. The tool suite consists of three components: (1) TaskDL, a software tool that simplifies the creation and management of task farms, collections of tasks that can be processed independently and require only small amounts of data communication; (2) mpiDL, a tool that allows IDL developers to use the Message Passing Interface (MPI) inside IDL for problems that require large amounts of data to be exchanged among multiple processors; and (3) GPULib, a tool that simplifies the use of GPUs as mathematical coprocessors from within IDL. mpiDL is unique in its support for the full MPI standard and its support of a broad range of MPI implementations. GPULib is unique in enabling users to take advantage of an inexpensive piece of hardware, possibly already installed in their computer, and achieve orders of magnitude faster execution time for numerically complex algorithms. TaskDL enables the simple setup and management of task farms on compute clusters. 
The products developed in this project have the potential to interact, so one can build a cluster of PCs, each equipped with a GPU, and use mpiDL to communicate between the nodes and GPULib to accelerate the computations on each node.
Parallelization of Rocket Engine Simulator Software (PRESS)
NASA Technical Reports Server (NTRS)
Cezzar, Ruknet
1998-01-01
We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the progress of the project at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how the TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Because X Windows is not implemented, we are also unable to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is in the public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as ours. In effect, the review of the literature on both MPI and PVM, and there is a lot of it, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where, despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes.
From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10/18/99). At the least, the research would need to be done on Windows 95/Windows NT based platforms. Moreover, with the acquisition of the Lahey Fortran package for the PC platform, and the existing Borland C++ 5.0, we can work on C++ wrapper issues. We have carefully studied the blueprint for the Space Transportation Propulsion Integrated Design Environment for the next 25 years [13] and found the inclusion of HBCUs in that effort encouraging. Especially over the long period for which a map is provided, there is no doubt that HBCUs will grow and become better equipped to do meaningful research. In the shorter period, as was suggested in our presentation at the HBCU conference, some key decisions regarding the aging Fortran-based software for rocket propellants will need to be made. One important issue is whether or not object-oriented languages such as C++ or Java should be used for distributed computing. Whether or not "distributed computing" is necessary for the existing software is yet another, larger, question to be tackled.
MPI Runtime Error Detection with MUST: Advances in Deadlock Detection
Hilbrich, Tobias; Protze, Joachim; Schulz, Martin; ...
2013-01-01
The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real world applications.
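At its core, graph-based deadlock detection looks for a cycle in a wait-for graph. The sketch below shows that classic check in a simplified form; MUST's actual analysis is far richer (it models MPI's AND/OR wait semantics and nonblocking operations), so this is an illustration only.

```python
def find_deadlock(wait_for):
    """Return a cycle in the wait-for graph (a list of processes, each waiting
    on the next, wrapping around), or None if there is no deadlock.
    Depth-first search with the usual white/gray/black coloring."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}

    def dfs(p, path):
        color[p] = GRAY
        path.append(p)
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GRAY:
                return path[path.index(q):]        # found a cycle
            if color.get(q, WHITE) == WHITE:
                found = dfs(q, path)
                if found:
                    return found
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            cycle = dfs(p, [])
            if cycle:
                return cycle
    return None

# Ranks 0 and 1 each block in a receive posted for the other rank: deadlock.
cycle = find_deadlock({0: [1], 1: [0], 2: []})
```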
Scalable Algorithms for Parallel Discrete Event Simulation Systems in Multicore Environments
2013-05-01
consolidated at the sender side. At the receiver side, the messages are deconsolidated and delivered to the appropriate thread. This approach bears some... Jiang, S. Kini, W. Yu, D. Buntinas, P. Wyckoff, and D. Panda. "Performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics."
Scalable Unix commands for parallel processors: a high-performance implementation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ong, E.; Lusk, E.; Gropp, W.
2001-06-22
We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
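A hedged sketch of the general pattern such commands follow (names and structure are illustrative, not the actual ptools implementation): each rank runs the local command, and a root process gathers and merges the per-node output. The MPI gather step is simulated here with plain Python data:

```python
# Illustrative sketch of the gather-and-merge pattern behind a parallel
# `ls` (hypothetical helper, not the real Parallel Unix Commands code).
# With MPI, the per-node listings would arrive at rank 0 via a gather;
# here they are simulated as a plain dict.

def merge_listings(per_node):
    """per_node: dict mapping hostname -> list of file names.
    Returns merged lines 'host: name', sorted for stable output."""
    lines = []
    for host in sorted(per_node):
        for name in sorted(per_node[host]):
            lines.append(f"{host}: {name}")
    return lines

listings = {
    "node002": ["app.log"],
    "node001": ["a.out", "run.sh"],
}
for line in merge_listings(listings):
    print(line)
# node001: a.out
# node001: run.sh
# node002: app.log
```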
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, Adam
2007-05-22
MpiGraph consists of an MPI application called mpiGraph, written in C, to measure message bandwidth, and an associated crunch_mpiGraph script, written in Perl, to process the application output into an HTML report. The mpiGraph application is designed to inspect the health and scalability of a high-performance interconnect while under heavy load. This is useful to detect hardware and software problems in a system, such as slow nodes, links, switches, or contention in switch routing. It is also useful to characterize how interconnect performance changes with different settings or how one interconnect type compares to another.
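The post-processing step amounts to scanning a pairwise bandwidth matrix for outliers. A minimal sketch of that idea (the threshold rule and data are illustrative assumptions; the real crunch_mpiGraph script is more elaborate):

```python
# Hypothetical sketch of slow-link detection over an NxN send-bandwidth
# matrix in MB/s: flag links whose measured bandwidth falls below half
# the matrix peak -- the kind of check used to spot degraded nodes,
# links, or switch routes under load.

def slow_links(bw, threshold=0.5):
    """bw[src][dst] is measured bandwidth; returns (src, dst, mbps)
    tuples for links below threshold * peak (diagonal ignored)."""
    peak = max(max(row) for row in bw)
    bad = []
    for src, row in enumerate(bw):
        for dst, mbps in enumerate(row):
            if src != dst and mbps < threshold * peak:
                bad.append((src, dst, mbps))
    return bad

bw = [
    [0, 1800, 1750],
    [1790, 0, 600],   # link 1 -> 2 is degraded
    [1760, 1810, 0],
]
print(slow_links(bw))  # -> [(1, 2, 600)]
```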
High-performance iron oxide nanoparticles for magnetic particle imaging - guided hyperthermia (hMPI)
NASA Astrophysics Data System (ADS)
Bauer, Lisa M.; Situ, Shu F.; Griswold, Mark A.; Samia, Anna Cristina S.
2016-06-01
Magnetic particle imaging (MPI) is an emerging imaging modality that allows the direct and quantitative mapping of iron oxide nanoparticles. In MPI, the development of tailored iron oxide nanoparticle tracers is paramount to achieving high sensitivity and good spatial resolution. To date, most MPI tracers being developed for potential clinical applications are based on spherical undoped magnetite nanoparticles. For the first time, we report on the systematic investigation of the effects of changes in chemical composition and shape anisotropy on the MPI performance of iron oxide nanoparticle tracers. We observed a 2-fold enhancement in MPI signal through selective doping of magnetite nanoparticles with zinc. Moreover, we demonstrated focused magnetic hyperthermia heating by adapting the field gradient used in MPI. By saturating the iron oxide nanoparticles outside of a field free region (FFR) with an external static field, we can selectively heat a target region in our test sample. By comparing zinc-doped magnetite cubic nanoparticles with undoped spherical nanoparticles, we could show a 5-fold improvement in the specific absorption rate (SAR) in magnetic hyperthermia while providing good MPI signal, thereby demonstrating the potential for high-performance focused hyperthermia therapy through an MPI-guided approach (hMPI).
Electronic supplementary information (ESI) available: Detailed IONP synthetic methods, description of magnetic particle relaxometer set-up, TEM of reference IONP (Senior Scientific PrecisionMRX™ 25 nm oleic acid-coated nanoparticles), concentration dependent PSF of all IONP samples, PSF and SAR of Zn-Sph and Zn-Cube mixture sample, upper right quadrant of field-dependent hysteresis curve labelled with static field strengths, and the magnetic hyperthermia temperature profiles with and without the presence of external magnetic fields. See DOI: 10.1039/c6nr01877g
NASA Astrophysics Data System (ADS)
Sudarmaji; Rudianto, Indra; Eka Nurcahya, Budi
2018-04-01
A strong tectonic earthquake with a magnitude of 5.9 on the Richter scale occurred in Yogyakarta and Central Java on May 26, 2006. The earthquake caused severe damage in Yogyakarta and the southern part of Central Java, Indonesia. Understanding the seismic response of an earthquake, including the relation between ground shaking and the level of building damage, is important. We present numerical modeling of 3D seismic wave propagation around Yogyakarta and the southern part of Central Java using the spectral-element method on an MPI-GPU (Graphics Processing Unit) computer cluster to observe the seismic response due to the earthquake. A homogeneous 3D realistic model is generated with a detailed topographic surface. The influences of free-surface topography and layer discontinuities of the 3D model on the seismic response are observed. The seismic wave field is discretized using the spectral-element method, which is solved on a mesh of hexahedral elements adapted to the free-surface topography and the internal discontinuities of the model. To increase data-processing capability, the simulation is performed on a GPU cluster with an implementation of MPI (Message Passing Interface).
Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fiala, David J; Mueller, Frank; Engelmann, Christian
Faults have become the norm rather than the exception for high-end computing on clusters with tens or hundreds of thousands of cores. Exacerbating this situation, some of these faults remain undetected, manifesting themselves as silent errors that corrupt memory while applications continue to operate and report incorrect results. This paper studies the potential for redundancy to both detect and correct soft errors in MPI message-passing applications. Our study investigates the challenges inherent to detecting soft errors within MPI applications while providing transparent MPI redundancy. By assuming a model wherein corruption in application data manifests itself by producing differing MPI message data between replicas, we study the protocols best suited for detecting and correcting MPI data that is the result of corruption. To experimentally validate our proposed detection and correction protocols, we introduce RedMPI, an MPI library which resides in the MPI profiling layer. RedMPI is capable of both online detection and correction of soft errors that occur in MPI applications, without requiring any modifications to the application source, by utilizing either double or triple redundancy. Our results indicate that our most efficient consistency protocol can successfully protect applications experiencing even high rates of silent data corruption, with runtime overheads between 0% and 30% as compared to unprotected applications without redundancy. Using our fault injector within RedMPI, we observe that even a single soft error can have profound effects on running applications, causing a cascading pattern of corruption that in most cases spreads to all other processes. RedMPI's protection has been shown to successfully mitigate the effects of soft errors while allowing applications to complete with correct results even in the face of errors.
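The detection/correction idea reduces to comparing the same logical message across replicas and, with triple redundancy, taking a majority vote. A simplified hash-and-vote sketch of that concept (hypothetical simplification, not RedMPI's actual consistency protocols):

```python
# Sketch of the idea behind triple-redundancy soft-error correction
# (illustrative only): each replica of a sender produces the same MPI
# message unless corrupted; the receiver side compares a hash of each
# copy and takes a majority vote.

import hashlib

def vote(copies):
    """copies: the same logical message from the replicas (as bytes).
    Returns (winner, corrupted_indices); winner is None if no majority."""
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    for i, d in enumerate(digests):
        if digests.count(d) >= 2:          # a majority agrees with copy i
            bad = [j for j, e in enumerate(digests) if e != d]
            return copies[i], bad
    return None, list(range(len(copies)))  # no majority: detect only

msg = b"\x00\x01\x02\x03"
flipped = b"\x00\x01\x02\x07"              # silent bit flip in replica 1
winner, bad = vote([msg, flipped, msg])
print(winner == msg, bad)                  # -> True [1]
```

With only double redundancy the same comparison can detect a mismatch but cannot decide which copy is correct, which is why correction requires the third replica.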
Sharma, Parichit; Mantri, Shrikant S
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely the GPU-based Cray XK7 Titan supercomputer at Oak Ridge National Laboratory. Using resolution-of-the-identity second-order Møller-Plesset perturbation theory (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24,440 basis functions and 91,280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Towards quantitative magnetic particle imaging: A comparison with magnetic particle spectroscopy
NASA Astrophysics Data System (ADS)
Paysen, Hendrik; Wells, James; Kosch, Olaf; Steinhoff, Uwe; Trahms, Lutz; Schaeffter, Tobias; Wiekhorst, Frank
2018-05-01
Magnetic Particle Imaging (MPI) is a quantitative imaging modality with promising features for several biomedical applications. Here, we study quantitatively the raw data obtained during MPI measurements. We present a method for the calibration of the MPI scanner output using measurements from a magnetic particle spectrometer (MPS) to yield data in units of magnetic moments. The calibration technique is validated in a simplified MPI mode with a 1D excitation field. Using the calibrated results from MPS and MPI, we determine and compare the detection limits for each system. The detection limits were found to be 5×10⁻¹² Am² for MPS and 3.6×10⁻¹⁰ Am² for MPI. Finally, the quantitative information contained in a standard MPI measurement with a 3D excitation is analyzed and compared to the previous results, showing a decrease in signal amplitudes of the odd harmonics relative to the case of 1D excitation. We propose physical explanations for all acquired results and discuss the possible benefits for the improvement of MPI technology.
Performance Analysis of Scientific and Engineering Applications Using MPInside and TAU
NASA Technical Reports Server (NTRS)
Saini, Subhash; Mehrotra, Piyush; Taylor, Kenichi Jun Haeng; Shende, Sameer Suresh; Biswas, Rupak
2010-01-01
In this paper, we present performance analysis of two NASA applications using performance tools like Tuning and Analysis Utilities (TAU) and SGI MPInside. MITgcmUV and OVERFLOW are two production-quality applications used extensively by scientists and engineers at NASA. MITgcmUV is a global ocean simulation model, developed by the Estimating the Circulation and Climate of the Ocean (ECCO) Consortium, for solving the fluid equations of motion using the hydrostatic approximation. OVERFLOW is a general-purpose Navier-Stokes solver for computational fluid dynamics (CFD) problems. Using these tools, we analyze the MPI functions (MPI_Sendrecv, MPI_Bcast, MPI_Reduce, MPI_Allreduce, MPI_Barrier, etc.) with respect to message size of each rank, time consumed by each function, and how ranks communicate. MPI communication is further analyzed by studying the performance of MPI functions used in these two applications as a function of message size and number of cores. Finally, we present the compute time, communication time, and I/O time as a function of the number of cores.
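The kind of breakdown such profilers produce can be pictured as bucketing each MPI call's cost by function and message size. A toy sketch of that aggregation (the record format and numbers are invented for illustration; neither TAU nor MPInside works this way internally):

```python
# Minimal sketch of per-function, per-message-size MPI time aggregation
# (hypothetical record format): bucket each call's elapsed time by the
# power-of-two message-size bin it falls into.

from collections import defaultdict

def profile(records):
    """records: (mpi_function, message_bytes, microseconds) tuples.
    Returns {function: {size_bin: total_microseconds}}."""
    out = defaultdict(lambda: defaultdict(int))
    for func, nbytes, usecs in records:
        size_bin = 1
        while size_bin < max(nbytes, 1):
            size_bin *= 2                  # round up to a power of two
        out[func][size_bin] += usecs
    return out

calls = [("MPI_Bcast", 1000, 2000), ("MPI_Bcast", 1024, 1000),
         ("MPI_Allreduce", 8, 4000)]
prof = profile(calls)
print(prof["MPI_Bcast"][1024])             # -> 3000 (both land in the 1 KiB bin)
```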
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a Lagrangian tessellation
NASA Astrophysics Data System (ADS)
Sousbie, Thierry
2018-01-01
DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via a hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical Vlasov-Poisson solver for cold systems such as cold dark matter (CDM) dynamics.
2009-08-01
event of a fire. The mesh prevents cracking to the steel substrate, which would reduce the insulating properties of the char. The procedure is as... Top Coats: MPI #9, Exterior Alkyd Enamel, Gloss, MPI Gloss Level 6 (i.e., a semi-gloss). System 2: Primer: MPI #23, Surface Tolerant Metal... Metal Primer; MPI Paint #9, Exterior Alkyd Enamel, Gloss; MPI Paint #94, Exterior Alkyd
Hierarchical Parallelization of Gene Differential Association Analysis
2011-01-01
Background: Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Results: Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. Conclusions: The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels. PMID:21936916
Hierarchical parallelization of gene differential association analysis.
Needham, Mark; Hu, Rui; Dwarkadas, Sandhya; Qiu, Xing
2011-09-21
Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.
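The stated principle, choosing the thread/process split so that the working sets of the MPI processes sharing a chip fit in cache, can be sketched as a simple search. All numbers and the fitting rule below are illustrative assumptions, not the paper's model:

```python
# Hedged sketch of the "sweet spot" selection principle: use the most
# MPI processes (fewest threads each) such that the processes sharing a
# multicore chip keep their combined working sets inside the shared
# cache. Cache size and working-set figures are made up for illustration.

def threads_per_process(cores, cache_bytes, working_set_per_process):
    """Returns (mpi_processes, threads_per_process) for one chip, trying
    more processes first and backing off until the working sets fit."""
    for procs in range(cores, 0, -1):
        if cores % procs == 0 and procs * working_set_per_process <= cache_bytes:
            return procs, cores // procs
    return 1, cores  # fallback: one process using every core

# 8 cores, 16 MiB shared cache, 3 MiB working set per MPI process:
print(threads_per_process(8, 16 * 2**20, 3 * 2**20))  # -> (4, 2)
```

In practice the working set per process depends on the data decomposition, so the split is usually found empirically, as the paper does.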
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming-model (HPF and MPI (Message Passing Interface)) combinations will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors.
In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000.
Hierarchical Petascale Simulation Framework For Stress Corrosion Cracking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grama, Ananth
2013-12-18
A number of major accomplishments resulted from the project. These include:
• Data Structures, Algorithms, and Numerical Methods for Reactive Molecular Dynamics. We have developed a range of novel data structures, algorithms, and solvers (amortized ILU, Spike) for use with ReaxFF and charge equilibration.
• Parallel Formulations of Reactive MD (Purdue Reactive Molecular Dynamics Package: PuReMD, PuReMD-GPU, and PG-PuReMD) for Messaging, GPU, and GPU Cluster Platforms. We have developed efficient serial, parallel (MPI), GPU (CUDA), and GPU Cluster (MPI/CUDA) implementations. Our implementations have been demonstrated to be significantly better than the state of the art, both in terms of performance and scalability.
• Comprehensive Validation in the Context of Diverse Applications. We have demonstrated the use of our software in diverse systems, including silica-water and silicon-germanium nanorods, and, as part of other projects, extended it to applications ranging from explosives (RDX) to lipid bilayers (biomembranes under oxidative stress).
• Open Source Software Packages for Reactive Molecular Dynamics. All versions of our software have been released to the public domain. There are over 100 major research groups worldwide using our software.
• Implementation into the Department of Energy LAMMPS Software Package. We have also integrated our software into the Department of Energy LAMMPS software package.
MPI-Defrost: Extension of Defrost to MPI-based Cluster Environment
NASA Astrophysics Data System (ADS)
Amin, Mustafa A.; Easther, Richard; Finkel, Hal
2011-06-01
MPI-Defrost extends Frolov's Defrost to an MPI-based cluster environment. This version has been restricted to a single field. Restoring two-field support should be straightforward, but will require some code changes. Some output options may also not be fully supported under MPI. This code was produced to support our own work, and has been made available for the benefit of anyone interested in either oscillon simulations or an MPI-capable version of Defrost, and it is provided on an "as-is" basis. Andrei Frolov is the primary developer of Defrost and we thank him for placing his work under the GPL (GNU General Public License), thus allowing us to distribute this modified version.
Magnetic Particle Imaging for Real-Time Perfusion Imaging in Acute Stroke.
Ludewig, Peter; Gdaniec, Nadine; Sedlacik, Jan; Forkert, Nils D; Szwargulski, Patryk; Graeser, Matthias; Adam, Gerhard; Kaul, Michael G; Krishnan, Kannan M; Ferguson, R Matthew; Khandhar, Amit P; Walczak, Piotr; Fiehler, Jens; Thomalla, Götz; Gerloff, Christian; Knopp, Tobias; Magnus, Tim
2017-10-24
The fast and accurate assessment of cerebral perfusion is fundamental for the diagnosis and successful treatment of stroke patients. Magnetic particle imaging (MPI) is a new radiation-free tomographic imaging method with a superior temporal resolution compared to other conventional imaging methods. In addition, MPI scanners can be built as prehospital mobile devices, which require less complex infrastructure than computed tomography (CT) and magnetic resonance imaging (MRI). With these advantages, MPI could accelerate stroke diagnosis and treatment, thereby improving outcomes. Our objective was to investigate the capabilities of MPI to detect perfusion deficits in a murine model of ischemic stroke. Cerebral ischemia was induced by inserting a microfilament into the internal carotid artery in C57BL/6 mice, thereby blocking the blood flow into the medial cerebral artery. After the injection of a contrast agent (superparamagnetic iron oxide nanoparticles) specifically tailored for MPI, cerebral perfusion and vascular anatomy were assessed by the MPI scanner within seconds. To validate and compare our MPI data, we performed perfusion imaging with a small-animal MRI scanner. MPI detected the perfusion deficits in the ischemic brain in real time, with results comparable to those from MRI. For the first time, we showed that MPI could be used as a diagnostic tool for relevant diseases in vivo, such as ischemic stroke. Due to its shorter image acquisition times and increased temporal resolution compared to MRI or CT, we expect that MPI offers the potential to improve stroke imaging and treatment.
Dantas, Roberto Nery; Assuncao, Antonildes Nascimento; Marques, Ismar Aguiar; Fahel, Mateus Guimaraes; Nomura, Cesar Higa; Avila, Luiz Francisco Rodrigues; Giorgi, Maria Clementina Pinto; Soares, Jose; Meneghetti, Jose Claudio; Parga, Jose Rodrigues
2018-06-01
Despite advances in non-invasive myocardial perfusion imaging (MPI) evaluation, computed tomography (CT) multiphase MPI protocols have not yet been compared with the highly accurate rubidium-82 positron emission tomography (⁸²Rb PET) MPI. Thus, this study aimed to evaluate agreement between ⁸²Rb PET and 320-detector row CT (320-CT) MPI using a multiphase protocol in suspected CAD patients. Forty-four patients referred for MPI evaluation were prospectively enrolled and underwent dipyridamole stress ⁸²Rb PET and multiphase 320-CT MPI (five consecutive volumetric acquisitions during stress). Statistical analyses were performed using the R software. There was high agreement for recognizing summed stress scores ≥ 4 (kappa 0.77, 95% CI 0.55-0.98, p < 0.001) and moderate agreement for detecting SDS ≥ 2 (kappa 0.51, 95% CI 0.23-0.80, p < 0.001). In a per-segment analysis, agreement was high for the presence of perfusion defects during stress and rest (kappa 0.75 and 0.82, respectively) and moderate for impairment severity (kappa 0.58 and 0.65, respectively). The 320-CT protocol was safe, with a low radiation burden (9.3 ± 2.4 mSv). There was significant agreement between dipyridamole stress 320-CT MPI and ⁸²Rb PET MPI in the evaluation of suspected CAD patients at intermediate risk. The multiphase 320-CT MPI protocol was feasible, diagnostic, and associated with relatively low radiation exposure. • Rubidium-82 PET and 320-MDCT can perform MPI studies for CAD investigation. • There is high agreement between rubidium-82 PET and 320-MDCT for MPI assessment. • Multiphase CT perfusion protocols are feasible and with low radiation. • Multiphase CT perfusion protocols can identify image artefacts.
Salamon, Johannes; Hofmann, Martin; Jung, Caroline; Kaul, Michael Gerhard; Werner, Franziska; Them, Kolja; Reimer, Rudolph; Nielsen, Peter; Vom Scheidt, Annika; Adam, Gerhard; Knopp, Tobias; Ittrich, Harald
2016-01-01
In-vitro evaluation of the feasibility of 4D real-time tracking of endovascular devices and stenosis treatment with a magnetic particle imaging (MPI) / magnetic resonance imaging (MRI) roadmap approach and an MPI-guided approach using a blood pool tracer. A guide wire and an angioplasty catheter were labeled with a thin layer of magnetic lacquer. For real-time MPI, a custom-made software framework was developed. A stenotic vessel phantom filled with saline or superparamagnetic iron oxide nanoparticles (MM4) was equipped with bimodal fiducial markers for co-registration in preclinical 7T MRI and MPI. In-vitro angioplasty was performed by inflating the balloon with saline or MM4. MPI data were acquired using a field of view of 37.3×37.3×18.6 mm³ and a frame rate of 46 volumes/sec. Analysis of the magnetic lacquer marks on the devices was performed with electron microscopy, atomic absorption spectrometry, and micro-computed tomography. Magnetic marks allowed for MPI/MRI guidance of interventional devices. Bimodal fiducial markers enable MPI/MRI image fusion for MRI-based roadmapping. MRI roadmapping and the blood pool tracer approach facilitate MPI real-time monitoring of in-vitro angioplasty. Successful angioplasty was verified with MPI and MRI. Magnetic marks consist of micrometer-sized ferromagnetic plates mainly composed of iron and iron oxide. 4D real-time MP imaging, tracking, and guiding of endovascular instruments and in-vitro angioplasty is feasible. In addition to an approach that requires a blood pool tracer, MRI-based roadmapping might emerge as a promising tool for radiation-free 4D MPI-guided interventions.
NASA Astrophysics Data System (ADS)
Jung, C.; Salamon, J.; Hofmann, M.; Kaul, M. G.; Adam, G.; Ittrich, H.; Knopp, T.
2016-03-01
Purpose: The goal of this study was to achieve real-time 3D visualisation of the murine cardiovascular system by intravenously injected superparamagnetic nanoparticles using magnetic particle imaging (MPI). Material and Methods: MPI scans of FVB mice were performed using a 3D imaging sequence (1 T/m gradient strength, 10 mT drive-field strength). A dynamic scan with a temporal resolution of 21.5 ms per 3D volume acquisition was performed. 50 μl of ferucarbotran (Resovist®, Bayer Healthcare AG) was injected into the tail vein after baseline MPI measurements. As MPI delivers no anatomic information, MRI scans at a 7T ClinScan (Bruker) were performed using a T2-weighted 2D TSE sequence. The reconstruction of the MPI data was performed on the MPI console (ParaVision 6.0/MPI, Bruker). Image fusion was done using additional image processing software (Imalytics, Philips). The dynamic information was extracted using custom software developed in the Julia programming environment. Results: The combined MRI-MPI measurements were carried out successfully. MPI data clearly demonstrated the passage of the SPIO tracer through the inferior vena cava, the heart and finally the liver. By co-registration with MRI the anatomical regions were identified. Owing to the volume frame rate of about 46 volumes per second, a signal modulation at the frequency of the heart beat was detectable and a heart rate of 520 beats per minute (bpm) was estimated. Moreover, a blood flow velocity of approximately 5 cm/s in the vena cava was estimated. Conclusions: The high temporal resolution of MPI allows real-time imaging and bolus tracking of intravenously injected nanoparticles and offers a real-time tool to assess blood flow velocity.
Evaluation of SuperLU on multicore architectures
NASA Astrophysics Data System (ADS)
Li, X. S.
2008-07-01
The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We included both Pthreads and MPI implementations in this study and found that the Pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior.
Parallel implementation of approximate atomistic models of the AMOEBA polarizable model
NASA Astrophysics Data System (ADS)
Demerdash, Omar; Head-Gordon, Teresa
2016-11-01
In this work we present a replicated-data hybrid OpenMP/MPI implementation of a hierarchical progression of approximate classical polarizable models that yields speedups of up to ∼10 compared to the standard OpenMP implementation of the exact parent AMOEBA polarizable model. In addition, our parallel implementation exhibits reasonable weak and strong scaling. The resulting parallel software will prove useful for those interested in how molecular properties converge in the condensed phase with respect to the many-body expansion (MBE); it also provides a fruitful test bed for exploring different electrostatic embedding schemes, and offers an interesting possibility for future exascale computing paradigms.
On a model of three-dimensional bursting and its parallel implementation
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.
2008-04-01
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized on equally spaced grids by means of a linearly implicit finite difference method that is second-order accurate in both space and time. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message-passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared-address-space paradigm. These three implementations are evaluated on two current high-performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results, which emphasizes the most relevant factors affecting the performance of the parallel implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
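The PCG solver named in the abstract above can be sketched compactly. This is a generic Jacobi-preconditioned conjugate gradient on a dense symmetric positive-definite matrix, not the authors' production code; the 2×2 system is an illustrative toy problem.

```python
# Preconditioned Conjugate Gradient with a Jacobi (diagonal) preconditioner.
# Solves A x = b for symmetric positive-definite A; dense toy version.
def pcg(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # r = b - A x with x = 0
    Minv = [1.0 / A[i][i] for i in range(n)]   # M^{-1} = diag(A)^{-1}
    z = [Minv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break                              # residual small enough
        z = [Minv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b)
print([round(v, 4) for v in x])
```

In the paper's setting the matrix comes from the linearly implicit discretization at each time level, so one such solve is performed per time step.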
Parallel processing optimization strategy based on MapReduce model in cloud storage environment
NASA Astrophysics Data System (ADS)
Cui, Jianming; Liu, Jiayi; Li, Qiuyan
2017-05-01
Currently, many cloud storage systems transfer files by packaging them only after all packets have been received. In this stored procedure from the local transmitter to the server, packing and unpacking consume considerable time, and transmission efficiency is low. A new parallel processing algorithm is proposed to optimize the transmission mode. Following the MapReduce programming model, MPI technology is used to execute the Mapper and Reducer mechanisms in parallel. Simulation experiments on a Hadoop cloud computing platform show that this algorithm can not only accelerate the file transfer rate but also shorten the waiting time of the Reducer mechanism. It breaks through the traditional sequential transmission constraints and reduces storage coupling to improve transmission efficiency.
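The Mapper/Reducer roles that the abstract above parallelizes with MPI can be shown with the classic word-count shape of MapReduce. This single-process sketch only illustrates the map → shuffle → reduce data flow; the paper's contribution is running the map and reduce stages concurrently via MPI, which is not reproduced here.

```python
# Minimal MapReduce data flow: mapper emits (key, value) pairs, the shuffle
# groups them by key, and the reducer folds each group to one result.
from collections import defaultdict

def mapper(chunk):
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reducer(key, values):
    return key, sum(values)

chunks = ["a b a", "b c"]                       # two input splits
pairs = [kv for c in chunks for kv in mapper(c)]
result = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(sorted(result.items()))
```

Each `chunk` stands in for a packet or file split; in an MPI realization each rank would run `mapper` on its own chunks while a reducer rank consumes groups as they arrive, rather than waiting for all packets first.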
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oxberry, Geoffrey
Google Test MPI Listener is a plugin for the Google Test C++ unit-testing library that organizes the test output of software that uses both the MPI parallel programming model and Google Test. Typically, such output is ordered arbitrarily and disorganized, making test results difficult to interpret. This plugin organizes output in MPI rank order, enabling easy interpretation of test results.
Final report: Compiled MPI. Cost-Effective Exascale Application Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2015-12-21
This is the final report on Compiled MPI: Cost-Effective Exascale Application Development, and summarizes the results of this project. The project investigated runtime environments that improve the performance of MPI (Message-Passing Interface) programs; work at Illinois in the last period of this project looked at optimizing data access operations expressed with MPI datatypes.
Psychometric evaluation of the Spanish version of the MPI-SCI.
Soler, M D; Cruz-Almeida, Y; Saurí, J; Widerström-Noga, E G
2013-07-01
Postal surveys. To confirm the factor structure of the Spanish version of the MPI-SCI (MPI-SCI-S, Multidimensional Pain Inventory in the SCI population) and to test its internal consistency and construct validity in a Spanish population. Guttmann Institute, Barcelona, Spain. The MPI-SCI-S along with Spanish measures of pain intensity (Numerical Rating Scale), pain interference (Brief Pain Inventory), functional independence (Functional Independence Measure), depression (Beck Depression Inventory), locus of control (Multidimensional health Locus of Control), support (Functional Social Support Questionnaire (Duke-UNC)), psychological well-being (Psychological Global Well-Being Index) and demographic/injury characteristics were assessed in persons with spinal cord injury (SCI) and chronic pain (n=126). Confirmatory factor analysis suggested an adequate factor structure for the MPI-SCI-S. The internal consistency of the MPI-SCI-S subscales ranged from acceptable (r=0.66, Life Control) to excellent (r=0.94, Life Interference). All MPI-SCI-S subscales showed adequate construct validity, with the exception of the Negative and Solicitous Responses subscales. The Spanish version of the MPI-SCI is adequate for evaluating chronic pain impact following SCI in a Spanish-speaking population. Future studies should include additional measures of pain-related support in the Spanish-speaking SCI population.
Relaxation in x-space magnetic particle imaging.
Croft, Laura R; Goodwill, Patrick W; Conolly, Steven M
2012-12-01
Magnetic particle imaging (MPI) is a new imaging modality that noninvasively images the spatial distribution of superparamagnetic iron oxide nanoparticles (SPIOs). MPI has demonstrated high contrast and zero attenuation with depth, and MPI promises superior safety compared to current angiography methods: X-ray, computed tomography, and magnetic resonance imaging angiography. Nanoparticle relaxation can delay the SPIO magnetization, and in this work we investigate the open problem of the role relaxation plays in MPI scanning and its effect on the image. We begin by amending the x-space theory of MPI to include nanoparticle relaxation effects. We then validate the amended theory with experiments from a Berkeley x-space relaxometer and a Berkeley x-space projection MPI scanner. Our theory and experimental data indicate that relaxation reduces SNR and asymmetrically blurs the image in the scanning direction. While relaxation effects can have deleterious effects on the MPI scan, we show theoretically and experimentally that x-space reconstruction remains robust in the presence of relaxation. Furthermore, the role of relaxation in x-space theory provides guidance as we develop methods to minimize relaxation-induced blurring. This will be an important future area of research for the MPI community.
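The asymmetric blur described in the abstract above is commonly modeled as convolution of the ideal signal with a causal first-order (Debye-type) relaxation kernel r(t) = (1/τ)·exp(−t/τ). The discrete sketch below is a generic illustration of that kernel, not the paper's amended x-space theory; τ and dt values are arbitrary.

```python
# Causal exponential (Debye-type) blur: y[n] = sum_{k<=n} x[k] * r[(n-k)*dt] * dt,
# with kernel r(t) = (1/tau) * exp(-t/tau) for t >= 0. Illustrative values only.
import math

def debye_blur(signal, tau, dt):
    kernel = [(dt / tau) * math.exp(-n * dt / tau) for n in range(len(signal))]
    return [sum(signal[k] * kernel[n - k] for k in range(n + 1))
            for n in range(len(signal))]

ideal = [1.0, 0.0, 0.0, 0.0]    # an impulse: the blur reveals the kernel itself
out = debye_blur(ideal, tau=2.0, dt=1.0)
print([round(v, 3) for v in out])
```

Because the kernel is one-sided in time, the blur trails the scan direction, which matches the asymmetric blurring the authors report.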
Javadi, Hamid; Jallalat, Sara; Semnani, Shahriar; Mogharrabi, Mehdi; Nabipour, Iraj; Abbaszadeh, Moloud; Assadi, Majid
2013-01-01
False-positive findings with myocardial perfusion imaging (MPI) have frequently been identified in the presence of left bundle branch block (LBBB) and tend to lower the accuracy of MPI in individuals with normal coronary angiograms. Pharmacologic stress is recognized as the preferred method for MPI in patients with LBBB. In contrast, very few studies have evaluated the effect of right bundle branch block (RBBB) on MPI, and there is no consensus regarding the selection of pharmacologic versus exercise stress during MPI for the RBBB patient. In this study, we present a 45-year-old man with RBBB who had normal coronary artery angiography but showed abnormal myocardial perfusion with exercise MPI and normal perfusion on dipyridamole MPI. The aim of the study is to raise awareness that the stress method selected for patients with RBBB can potentially interfere with the accuracy of the data.
MPI_XSTAR: MPI-based parallelization of XSTAR program
NASA Astrophysics Data System (ADS)
Danehkar, A.
2017-12-01
MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.
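The per-processor job distribution described in the abstract above can be sketched as a simple partition of the generated job list across MPI ranks. The round-robin scheme and the job names below are hypothetical illustrations, not MPI_XSTAR's actual internals.

```python
# Hypothetical sketch: partition a job list of XSTAR commands across MPI ranks.
# In a real MPI driver, `rank` and `nprocs` would come from the communicator
# (e.g., MPI_Comm_rank / MPI_Comm_size); here they are plain arguments.
def jobs_for_rank(job_list, rank, nprocs):
    """Round-robin slice: rank r takes jobs r, r+nprocs, r+2*nprocs, ..."""
    return job_list[rank::nprocs]

jobs = [f"xstar_run_{i:03d}" for i in range(7)]   # illustrative job names
print(jobs_for_rank(jobs, 1, 3))
```

Each rank would then spawn its assigned XSTAR commands into separate output directories, matching the ascending-directory layout the abstract describes.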
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using resolution-of-the-identity second-order Møller–Plesset perturbation theory (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24,440 basis functions and 91,280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Woo, Minjeong; Wood, Connor; Kwon, Doyoon; Park, Kyu-Ho Paul; Fejer, György; Delorme, Vincent
2018-01-01
Lung alveolar macrophages (AMs) are in the first line of immune defense against respiratory pathogens and play key roles in the pathogenesis of Mycobacterium tuberculosis (Mtb) in humans. Nevertheless, AMs are available only in limited amounts for in vitro studies, which hampers the detailed molecular understanding of host–Mtb interactions in these macrophages. The recent establishment of the self-renewing and primary Max Planck Institute (MPI) cells, functionally very close to lung AMs, opens unique opportunities for in vitro studies of host–pathogen interactions in respiratory diseases. Here, we investigated the suitability of MPI cells as a host cell system for Mtb infection. Bacterial, cellular, and innate immune features of MPI cells infected with Mtb were characterized. Live bacteria were readily internalized and efficiently replicated in MPI cells, similarly to primary murine macrophages and other cell lines. MPI cells were also suitable for the determination of anti-tuberculosis (TB) drug activity. The primary innate immune response of MPI cells to live Mtb showed significantly higher and earlier induction of the pro-inflammatory cytokines TNFα, interleukin 6 (IL-6), IL-1α, and IL-1β, as compared to stimulation with heat-killed (HK) bacteria. MPI cells previously showed a lack of induction of the anti-inflammatory cytokine IL-10 in response to a wide range of stimuli, including HK Mtb. By contrast, we show here that live Mtb is able to induce significant amounts of IL-10 in MPI cells. Autophagy experiments using light chain 3B immunostaining, as well as LysoTracker labeling of acidic vacuoles, demonstrated that MPI cells efficiently control killed Mtb by elimination through phagolysosomes. MPI cells were also able to accumulate lipid droplets in their cytoplasm following exposure to lipoproteins. Collectively, this study establishes the MPI cells as a relevant, versatile host cell model for TB research, allowing a deeper understanding of AM functions in this pathology.
ConnectX-2 InfiniBand Management Queues: New support for Network Offloaded
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graham, Richard L; Poole, Stephen W; Shamis, Pavel
2010-01-01
This paper introduces the newly developed InfiniBand (IB) Management Queue capability, used by the Host Channel Adapter (HCA) to manage network task data flow dependencies and progress the communications associated with such flows. These tasks include sends, receives, and the newly supported wait task, and are scheduled by the HCA based on a data dependency description provided by the user. This functionality is supported by the ConnectX-2 HCA, and provides the means for delegating collective communication management and progress to the HCA, also known as collective communication offload. This provides a means for overlapping collective communications managed by the HCA and computation on the Central Processing Unit (CPU), thus making it possible to reduce the impact of system noise on parallel applications using collective operations. This paper further describes how this new capability can be used to implement scalable Message Passing Interface (MPI) collective operations, describing the high-level details of how this new capability is used to implement the MPI Barrier collective operation, focusing on the latency-sensitive performance aspects of this new capability. This paper concludes with small-scale benchmark experiments comparing implementations of the barrier collective operation, using the new network offload capabilities, with established point-to-point based implementations of these same algorithms, which manage the data flow using the central processing unit. These early results demonstrate the promise this new capability provides to improve the scalability of high-performance applications using collective communications. The latency of the HCA-based implementation of the barrier is similar to that of the best-performing point-to-point based implementation managed by the central processing unit, starting to outperform these as the number of processes involved in the collective operation increases.
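The point-to-point barrier baselines mentioned in the abstract above typically use logarithmic-round algorithms. As a hedged illustration (the paper does not specify which algorithm its baseline uses), the sketch below shows the round structure of a classic dissemination barrier, where n processes synchronize in ceil(log2 n) rounds.

```python
# Round structure of a dissemination barrier: in round k, rank r signals
# rank (r + 2^k) mod n and waits on rank (r - 2^k) mod n; after
# ceil(log2(n)) rounds every rank has (transitively) heard from all others.
import math

def barrier_rounds(n):
    return math.ceil(math.log2(n)) if n > 1 else 0

def partners(rank, n):
    return [((rank + (1 << k)) % n, (rank - (1 << k)) % n)
            for k in range(barrier_rounds(n))]

print(barrier_rounds(24576))   # rounds at a scale like the paper's experiments
print(partners(0, 8))          # (signal-to, wait-on) pairs for rank 0 of 8
```

The logarithmic round count is why CPU-managed barriers remain competitive at small scale, while the HCA-offloaded barrier pulls ahead as the process count (and hence the number of noise-exposed rounds) grows.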
Mathematical analysis of the 1D model and reconstruction schemes for magnetic particle imaging
NASA Astrophysics Data System (ADS)
Erb, W.; Weinmann, A.; Ahlborg, M.; Brandt, C.; Bringout, G.; Buzug, T. M.; Frikel, J.; Kaethner, C.; Knopp, T.; März, T.; Möddel, M.; Storath, M.; Weber, A.
2018-05-01
Magnetic particle imaging (MPI) is a promising new in vivo medical imaging modality in which distributions of super-paramagnetic nanoparticles are tracked based on their response in an applied magnetic field. In this paper we provide a mathematical analysis of the modeled MPI operator in the univariate situation. We provide a Hilbert space setup, in which the MPI operator is decomposed into simple building blocks and in which these building blocks are analyzed with respect to their mathematical properties. In turn, we obtain an analysis of the MPI forward operator and, in particular, of its ill-posedness properties. We further get that the singular values of the MPI core operator decrease exponentially. We complement our analytic results by some numerical studies which, in particular, suggest a rapid decay of the singular values of the MPI operator.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel computing clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
Automatic Thread-Level Parallelization in the Chombo AMR Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christen, Matthias; Keen, Noel; Ligocki, Terry
2011-05-26
The increasing on-chip parallelism has substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite-difference-type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already-used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
Temperature dependence in magnetic particle imaging
NASA Astrophysics Data System (ADS)
Wells, James; Paysen, Hendrik; Kosch, Olaf; Trahms, Lutz; Wiekhorst, Frank
2018-05-01
Experimental results are presented demonstrating how temperature can influence the dynamics of magnetic nanoparticles (MNPs) in liquid suspension, when exposed to alternating magnetic fields in the kilohertz frequency range. The measurements used to probe the nanoparticle systems are directly linked to both the emerging biomedical technique of magnetic particle imaging (MPI), and to the recently proposed concept of remote nanoscale thermometry using MNPs under AC field excitation. Here, we report measurements on three common types of MNPs, two of which are currently leading candidates for use as tracers in MPI. Using highly-sensitive magnetic particle spectroscopy (MPS), we demonstrate significant and divergent thermal dependences in several key measures used in the evaluation of MNP dynamics for use in MPI and other applications. The temperature range studied was between 296 and 318 Kelvin, making our findings of particular importance for MPI and other biomedical technologies. Furthermore, we report the detection of the same temperature dependences in measurements conducted using the detection coils within an operational preclinical MPI scanner. This clearly shows the importance of considering temperature during MPI development, and the potential for temperature-resolved MPI using this system. We propose possible physical explanations for the differences in the behaviors observed between the different particle types, and discuss our results in terms of the opportunities and concerns they raise for MPI and other MNP based technologies.
Parallel Implementation of the Discontinuous Galerkin Method
NASA Technical Reports Server (NTRS)
Baggag, Abdalkader; Atkins, Harold; Keyes, David
1999-01-01
This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time-dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with the Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
Characterization of UMT2013 Performance on Advanced Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howell, Louis
2014-12-31
This paper presents part of a larger effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. The focus here is on UMT2013, a proxy implementation of deterministic transport for unstructured meshes. I present weak and strong MPI scaling results and studies of OpenMP efficiency on the Sequoia BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Preliminary tests that exploit NVRAM as extended memory on an Ivy Bridge machine designed for “Big Data” applications are also included.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele
2014-11-11
It has been widely accepted that software virtualization has a large negative impact on high-performance computing (HPC) application performance. This work explores the potential use of InfiniBand hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an InfiniBand network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization and thereby minimize the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SR-IOV). Our solution spanned modifications to the Linux hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR InfiniBand network links, with micro-benchmarks (latency and bandwidth) as well as with an MPI-intensive application (the HPL Linpack benchmark).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gittens, Alex; Devarakonda, Aditya; Racah, Evan
We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to 1.6 TB particle physics, 2.2 TB and 16 TB climate modeling, and 1.1 TB bioimaging data. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark’s data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
Effect of central hypothyroidism on Doppler-derived myocardial performance index.
Doin, Fabio Luiz Casanova; Borges, Mariana da Rosa; Campos, Orlando; de Camargo Carvalho, Antonio Carlos; de Paola, Angelo Amato Vincenzo; Paiva, Marcelo Goulart; Abucham, Julio; Moises, Valdir Ambrosio
2004-06-01
Myocardial performance index (MPI) has been used to assess global ventricular function in different types of cardiac disease. Thyroid hormones influence cardiac performance directly and indirectly through changes in the peripheral circulation. The aim of this study was to evaluate the possible effect of central hypothyroidism (CH) on MPI. The study included 28 control subjects and 7 patients with CH without cardiac disease. MPI was defined as the sum of isovolumetric contraction time (ICT) and isovolumetric relaxation time divided by ejection time. Patients received hormone replacement therapy with thyroxine and the study was repeated after 35 to 42 days. MPI was significantly higher in patients with CH (0.54 +/- 0.08) than in control subjects (0.40 +/- 0.05) (P = .002). The increase in MPI was caused by the prolongation of ICT without significant variation in isovolumetric relaxation time and ejection time. After hormone therapy there was a significant reduction in MPI (0.54 +/- 0.08 vs 0.42 +/- 0.07; P = .028) and ICT. MPI was increased in patients with untreated CH. The increase was related to prolongation of ICT and was reverted by hormone therapy.
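The index defined above reduces to a one-line formula, MPI = (ICT + IRT)/ET; a minimal Python sketch (the function name and the sample timings are illustrative, not taken from the study):

```python
def myocardial_performance_index(ict_ms: float, irt_ms: float, et_ms: float) -> float:
    """Tei index: (isovolumetric contraction time + isovolumetric
    relaxation time) / ejection time, all in the same time unit."""
    if et_ms <= 0:
        raise ValueError("ejection time must be positive")
    return (ict_ms + irt_ms) / et_ms

# Illustrative timings in ms (not patient data from the study); the
# result happens to equal the 0.54 mean reported for the CH group:
print(round(myocardial_performance_index(55, 80, 250), 2))  # → 0.54
```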
Addressing the challenges of standalone multi-core simulations in molecular dynamics
NASA Astrophysics Data System (ADS)
Ocaya, R. O.; Terblans, J. J.
2017-07-01
Computational modelling in materials science involves mathematical abstractions of force fields between particles, with the aim to postulate, develop and understand materials by simulation. The aggregated pairwise interactions of the material's particles lead to a deduction of its macroscopic behaviour. For practically meaningful macroscopic scales, a large amount of data is generated, leading to vast execution times. Simulation times of hours, days or weeks for moderately sized problems are not uncommon. The reduction of simulation times, improved result accuracy and the associated software and hardware engineering challenges are the main motivations for much of the ongoing research in the computational sciences. This contribution is concerned mainly with simulations that can be done on a "standalone" computer: parallel code based on the Message Passing Interface (MPI), running on hardware platforms with wide specifications, such as single-/multi-processor, multi-core machines, with minimal reconfiguration for upward scaling of computational power. The widely available, documented and standardized MPI library provides this functionality through functions such as MPI_Comm_size(), MPI_Comm_rank() and MPI_Reduce(). A survey of the literature shows that relatively little has been written on the efficient extraction of the inherent computational power of a cluster. In this work, we discuss the main avenues available to tap into this extra power without compromising computational accuracy. We also present methods to overcome the high inertia encountered in single-node-based computational molecular dynamics. We begin by surveying the current state of the art and discuss what it takes to achieve parallelism, efficiency and enhanced computational accuracy through program threads and message-passing interfaces. Several code illustrations are given. The pros and cons of writing raw code as opposed to using heuristic, third-party code are also discussed.
The growing trend towards graphics processing units and virtual computing clouds for high-performance computing is also discussed. Finally, we present the comparative results of vacancy formation energy calculations using our own parallelized standalone code, called Verlet-Stormer velocity (VSV), operating on 30,000 copper atoms. The code is based on the Sutton-Chen implementation of the Finnis-Sinclair pairwise embedded-atom potential. A link to the code is also given.
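The division of labour described above — each MPI rank computing part of the pairwise sum and MPI_Reduce() combining the partial results — can be sketched without an MPI installation by emulating the ranks serially. The toy inverse-square potential below stands in for the Sutton-Chen form, and all names are illustrative:

```python
import itertools

def dist2(p, q):
    # Squared Euclidean distance between two 3D points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def pair_energy(r2, a=1.0):
    # Toy inverse-square pair term; a stand-in for the Finnis-Sinclair/
    # Sutton-Chen pairwise part, not the actual VSV potential.
    return a / r2

def total_energy_serial(positions):
    return sum(pair_energy(dist2(p, q))
               for p, q in itertools.combinations(positions, 2))

def total_energy_rank_split(positions, nranks):
    # Emulate the MPI pattern: rank r takes pairs r, r+nranks, ... and
    # the final sum plays the role of MPI_Reduce(..., MPI_SUM, root=0).
    pairs = list(itertools.combinations(positions, 2))
    partials = [sum(pair_energy(dist2(p, q)) for p, q in pairs[r::nranks])
                for r in range(nranks)]
    return sum(partials)

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 1.0)]
print(abs(total_energy_serial(atoms) - total_energy_rank_split(atoms, 3)) < 1e-9)  # → True
```

In a real MPI program each rank would evaluate only its own slice and the partial sums would travel over the network; the decomposition and the reduction semantics are the same.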
Tycho 2: A Proxy Application for Kinetic Transport Sweeps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, Charles Kristopher; Warsa, James S.
2016-09-14
Tycho 2 is a proxy application that implements discrete ordinates (SN) kinetic transport sweeps on unstructured, 3D, tetrahedral meshes. It has been designed to be small and require minimal dependencies to make collaboration and experimentation as easy as possible. Tycho 2 has been released as open source software. The software is currently in a beta release with plans for a stable release (version 1.0) before the end of the year. The code is parallelized via MPI across spatial cells and OpenMP across angles. Currently, several parallelization algorithms are implemented.
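A transport sweep for one ordinate amounts to solving cells in an order that respects upwind dependencies, i.e. a topological sort of the directed graph the sweep direction induces on the mesh. A minimal serial sketch of that ordering (not Tycho 2's actual code, which additionally parallelizes across cells with MPI and across angles with OpenMP):

```python
from collections import deque

def sweep_order(n_cells, upwind):
    """Order cells so every cell is solved after its upwind neighbours:
    a topological sort of the sweep dependency graph."""
    indeg = [len(deps) for deps in upwind]
    downwind = [[] for _ in range(n_cells)]
    for cell, deps in enumerate(upwind):
        for d in deps:
            downwind[d].append(cell)
    ready = deque(c for c in range(n_cells) if indeg[c] == 0)
    order = []
    while ready:
        c = ready.popleft()           # all upwind data for c is available
        order.append(c)
        for nxt in downwind[c]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return order

# Toy 4-cell mesh for one sweep direction (upwind[c] lists the cells that
# must be solved before c); cell 0 sees the inflow boundary:
print(sweep_order(4, [[], [0], [0], [1, 2]]))  # → [0, 1, 2, 3]
```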
The Wang Landau parallel algorithm for the simple grids. Optimizing OpenMPI parallel implementation
NASA Astrophysics Data System (ADS)
Kussainov, A. S.
2017-12-01
The Wang-Landau Monte Carlo algorithm for calculating the density of states of different simple spin lattices was implemented. The energy space was split between the individual threads and balanced according to the expected runtime of the individual processes. A custom spin-clustering mechanism, necessary for overcoming the critical slowdown in certain energy subspaces, was devised. Stable reconstruction of the density of states was of primary importance. Some data post-processing techniques were involved to produce the expected smooth density of states.
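The serial core of the Wang-Landau scheme — a random walk in energy accepted with probability min(1, g(E)/g(E')), with ln g(E) updated by a modification factor that is halved whenever the visit histogram is flat — can be sketched for a tiny 1D Ising ring. The parallel version in the paper splits the energy window across threads, which is omitted in this single-window sketch:

```python
import math, random

def ising_ring_energy(s):
    # E = -sum of nearest-neighbour products on a periodic ring.
    return -sum(s[i] * s[(i + 1) % len(s)] for i in range(len(s)))

def wang_landau(n=4, ln_f_final=1e-5, flatness=0.8, seed=1):
    rng = random.Random(seed)
    levels = list(range(-n, n + 1, 4))        # reachable energies of the ring
    ln_g = {e: 0.0 for e in levels}           # running estimate of ln g(E)
    hist = {e: 0 for e in levels}
    s = [rng.choice((-1, 1)) for _ in range(n)]
    e = ising_ring_energy(s)
    ln_f = 1.0                                # modification factor
    while ln_f > ln_f_final:
        for _ in range(1000):
            i = rng.randrange(n)
            s[i] = -s[i]                      # propose a single spin flip
            e_new = ising_ring_energy(s)
            if rng.random() < math.exp(min(0.0, ln_g[e] - ln_g[e_new])):
                e = e_new                     # accept with min(1, g(E)/g(E'))
            else:
                s[i] = -s[i]                  # reject: undo the flip
            ln_g[e] += ln_f
            hist[e] += 1
        if min(hist.values()) > flatness * sum(hist.values()) / len(hist):
            hist = {k: 0 for k in hist}       # histogram flat enough:
            ln_f /= 2                         # tighten the modification factor
    return ln_g

ln_g = wang_landau()
print(math.exp(ln_g[0] - ln_g[-4]))  # exact ratio g(0)/g(-4) for the 4-spin ring is 6
```

For the 4-spin ring the exact density of states is g(-4) = 2, g(0) = 12, g(4) = 2, so the recovered ratios should be close to 6 and 1 respectively.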
Ruisi, Michael; Levine, Michael; Finkielstein, Dennis
2013-12-01
The myocardial performance index (MPI), first described by Chuwa Tei in 1995, is a relatively new echocardiographic variable used for assessment of overall cardiac function. Previous studies have demonstrated the MPI to be a summary representation of both left ventricular systolic and diastolic function, with prognostic value in patients with coronary artery disease as well as symptomatic heart failure. Ninety patients with either established coronary artery disease (CAD) or CAD risk factors underwent routine treadmill exercise stress testing with two-dimensional Doppler echocardiography using the standard Bruce protocol. Both resting and stress MPI values were measured for all 90 patients. Using a normal MPI cutoff of ≤ 0.47, the prevalence of an abnormal resting MPI was 72/90 (80%) and the prevalence of an abnormal stress MPI was 48/90 (53.3%). The average resting MPI for the cohort was 0.636 with a standard deviation of 0.182; the average stress MPI was 0.530 with a standard deviation of 0.250. The P value from a one-tailed dependent t-test was < .05. We postulate that these findings reflect that the MPI (Tei) index assessed during exercise may be a sensitive indicator of occult coronary disease in an at-risk group, independent of wall motion assessment.
Modified personal interviews: resurrecting reliable personal interviews for admissions?
Hanson, Mark D; Kulasegaram, Kulamakan Mahan; Woods, Nicole N; Fechtig, Lindsey; Anderson, Geoff
2012-10-01
Traditional admissions personal interviews provide flexible faculty-student interactions but are plagued by low inter-interview reliability. Axelson and Kreiter (2009) retrospectively showed that multiple independent sampling (MIS) may improve the reliability of personal interviews; thus, the authors incorporated MIS into the admissions process for medical students applying to the University of Toronto's Leadership Education and Development Program (LEAD). They examined the reliability and resource demands of this modified personal interview (MPI) format. In 2010-2011, LEAD candidates submitted written applications, which were used to screen for participation in the MPI process. Selected candidates completed four brief (10-12 minute) independent MPIs, each with a different interviewer. The authors blueprinted MPI questions to (i.e., aligned them with) leadership attributes, and interviewers assessed candidates' eligibility on a five-point Likert-type scale. The authors analyzed inter-interview reliability using generalizability theory. Sixteen candidates submitted applications; 10 proceeded to the MPI stage. Reliability of the written application components was 0.75. The MPI process had an overall inter-interview reliability of 0.79. The correlation between the written application and MPI scores was 0.49. A decision study showed acceptable reliability of 0.74 with only three MPIs scored using one global rating. Furthermore, a traditional admissions interview format would take 66% more time than the MPI format. The MPI format, used during the LEAD admissions process, achieved high reliability with minimal faculty resources. The MPI format's reliability and effective resource use were made possible through MIS and the employment of expert interviewers. MPIs may be useful for other admissions tasks.
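The decision-study trade-off reported above can be approximated with the Spearman-Brown prophecy formula, a simplification of the one-facet generalizability design the authors used (the function below is an illustrative sketch, not the paper's analysis code):

```python
def spearman_brown(reliability_n: float, n: int, m: int) -> float:
    """Project the reliability of a composite of m parallel interviews
    from the observed reliability of an n-interview composite."""
    # Step down to the implied single-interview reliability, then back up.
    r1 = reliability_n / (n - (n - 1) * reliability_n)
    return m * r1 / (1 + (m - 1) * r1)

# From the observed 0.79 with four MPIs, three MPIs project to ~0.74,
# consistent with the decision study reported in the abstract:
print(round(spearman_brown(0.79, 4, 3), 2))  # → 0.74
```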
Design analysis of an MPI human functional brain scanner
Mason, Erica E.; Cooley, Clarissa Z.; Cauley, Stephen F.; Griswold, Mark A.; Conolly, Steven M.; Wald, Lawrence L.
2017-01-01
MPI’s high sensitivity makes it a promising modality for imaging brain function. Functional contrast is proposed based on blood SPION concentration changes due to Cerebral Blood Volume (CBV) increases during activation, a mechanism utilized in fMRI studies. MPI offers the potential for a direct and more sensitive measure of SPION concentration, and thus CBV, than fMRI. As such, fMPI could surpass fMRI in sensitivity, enhancing the scientific and clinical value of functional imaging. As human-sized MPI systems have not been attempted, we assess the technical challenges of scaling MPI from rodent to human brain. We use a full-system MPI simulator to test arbitrary hardware designs and encoding practices, and we examine tradeoffs imposed by constraints that arise when scaling to human size as well as safety constraints (PNS and central nervous system stimulation) not considered in animal scanners, thereby estimating spatial resolutions and sensitivities achievable with current technology. Using a projection FFL MPI system, we examine coil hardware options and their implications for sensitivity and spatial resolution. We estimate that an fMPI brain scanner is feasible, although with reduced sensitivity (20×) and spatial resolution (5×) compared to existing rodent systems. Nonetheless, it retains sufficient sensitivity and spatial resolution to make it an attractive future instrument for studying the human brain; additional technical innovations can result in further improvements. PMID:28752130
Lee, M-Y; Won, H-S; Jeon, E-J; Yoon, H C; Choi, J Y; Hong, S J; Kim, M-J
2014-06-01
To evaluate the reproducibility of measurement of the fetal left modified myocardial performance index (Mod-MPI) determined using a novel automated system. This was a prospective study of 116 ultrasound examinations from 110 normal singleton pregnancies at 12 + 1 to 37 + 1 weeks' gestation. Two experienced operators each measured the left Mod-MPI twice manually and twice automatically using the Auto Mod-MPI system. Intra- and interoperator reproducibility were assessed using intraclass correlation coefficients (ICCs) and the manual and automated measurements obtained by the more experienced operator were compared using Bland-Altman plots and ICCs. Both operators successfully measured the left Mod-MPI in all cases using the Auto Mod-MPI system. For both operators, intraoperator reproducibility was higher when performing automated measurements (ICC = 0.967 and 0.962 for Operators 1 and 2, respectively) than when performing manual measurements (ICC = 0.857 and 0.856 for Operators 1 and 2, respectively). Interoperator agreement was also better for automated than for manual measurements (ICC = 0.930 vs 0.723, respectively). There was good agreement between the automated and manual values measured by the more experienced operator. The Auto Mod-MPI system is a reliable technique for measuring fetal left Mod-MPI and demonstrates excellent reproducibility. Copyright © 2013 ISUOG. Published by John Wiley & Sons Ltd.
Tejani, Furqan H; Thompson, Randall C; Iskandrian, Ami E; McNutt, Bruce E; Franks, Billy
2011-02-01
Caffeine attenuates the coronary hyperemic response to adenosine by competitive A2A receptor blockade. This study aims to determine whether oral caffeine administration compromises diagnostic accuracy in patients undergoing vasodilator stress myocardial perfusion imaging (MPI) with regadenoson, a selective adenosine A2A agonist. This multicenter, randomized, double-blind, placebo-controlled, parallel-group study includes patients with suspected coronary artery disease who regularly consume caffeine. Each participant undergoes three SPECT MPI studies: a rest study on day 1 (MPI-1); a regadenoson stress study on day 3 (MPI-2); and a regadenoson stress study on day 5 with double-blind administration of oral caffeine 200 or 400 mg or placebo capsules (MPI-3; n = 90 per arm). Only participants with ≥ 1 reversible defect on the second MPI study undergo the subsequent stress MPI test. The primary endpoint is the difference in the number of reversible defects between the two stress tests using a 17-segment model. Pharmacokinetic/pharmacodynamic analyses will evaluate the effect of caffeine on the regadenoson exposure-response relationship. Safety will also be assessed. The results of this study will show whether the consumption of caffeine equivalent to 2-4 cups of coffee prior to an MPI study with regadenoson affects the diagnostic validity of stress testing (ClinicalTrials.gov number, NCT00826280).
Fernandes, José Maria G; Rivera, Ivan Romero; de Oliveira Romão, Benício; Mendonça, Maria Alayde; Vasconcelos, Miriam Lira Castro; Carvalho, Antônio Carlos; Campos, Orlando; De Paola, Angelo Amato V; Moisés, Valdir A
2009-09-01
The Doppler-derived myocardial performance index (MPI) has been used in the evaluation of left ventricular (LV) function in several diseases. In patients with isolated diastolic dysfunction, the diagnostic utility of this index remains unclear. The aim of this study was to determine the diagnostic utility of MPI in patients with systemic hypertension, impaired LV relaxation, and normal ejection fraction. Thirty hypertensive patients with impaired LV relaxation were compared to 30 control subjects. MPI and its components, isovolumetric relaxation time (IRT), isovolumetric contraction time (ICT), and ejection time (ET), were measured from LV outflow and mitral inflow Doppler velocity profiles. MPI was higher in patients than in control subjects (0.45 +/- 0.13 vs 0.37 +/- 0.07; P < .0029). The increase in MPI was due to the prolongation of IRT without significant change in ICT and ET. An MPI cutoff value of ≥ 0.40 identified impaired LV relaxation with a sensitivity of 63% and specificity of 70%, while an IRT > 94 ms had a sensitivity of 67% and specificity of 80%. Multivariate analysis identified relative wall thickness, mitral early filling wave velocity (E), and systolic myocardial velocity (Sm) as independent predictors of MPI in patients with hypertension. MPI was increased in patients with hypertension, diastolic dysfunction, and normal ejection fraction, but was not superior to IRT for detecting impaired LV relaxation.
Yao, Zhiming; Zhu, Hui; Li, Wenchan; Chen, Congxia; Wang, Hua; Shi, Lei; Zhang, Wenjie
2017-04-01
We investigated the cardiac risk stratification value of adenosine triphosphate stress myocardial perfusion imaging (ATP-MPI) in patients aged 70 years and older with suspected coronary artery disease (CAD). We identified a series of 415 consecutive patients aged 70 years and older with suspected CAD who had undergone ATP-MPI with 99mTc-MIBI. The presence of a fixed and/or reversible perfusion defect was considered an abnormal MPI. Follow-up was available in 399 patients (96.1%) over 3.45 ± 1.71 years, after excluding 16 patients who underwent early coronary revascularization <60 days after MPI. Major adverse cardiac events (MACE), including cardiac death, nonfatal infarction, and late coronary revascularization, were recorded. One hundred twenty-five (31.3%) patients had abnormal MPI and the remainder had normal MPI. A multivariable analysis using Cox regression demonstrated that abnormal MPI was independently associated with MACE (hazard ratio 19.50, 95% confidence interval 5.91-64.31, P < .001). Patients with a summed stress score (SSS) > 8 had a significantly higher cumulative MACE rate than patients with SSS ≤ 8 (37.8% vs 5.2%, respectively, P < .001). The Kaplan-Meier cumulative MACE-free survival in patients with abnormal MPI (57.0%) was significantly lower than that in patients with normal MPI (89.6%), P < .0001. Among patients with SSS > 8, the Kaplan-Meier cumulative MACE-free survival was 36.9% in patients ≥80 years old and 49.5% in patients 70-79 years old, P < .05. However, among patients with SSS ≤ 8, there was no difference between the Kaplan-Meier cumulative MACE-free survivals of these two age groups. ATP-MPI data are useful for the prediction of major adverse cardiac events in patients aged 70 years and older with suspected CAD.
PETSc Users Manual Revision 3.7
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, Satish; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
PETSc Users Manual Revision 3.8
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balay, S.; Abhyankar, S.; Adams, M.
This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication.
Magnetic Particle Imaging (MPI) for NMR and MRI researchers
NASA Astrophysics Data System (ADS)
Saritas, Emine U.; Goodwill, Patrick W.; Croft, Laura R.; Konkle, Justin J.; Lu, Kuan; Zheng, Bo; Conolly, Steven M.
2013-04-01
Magnetic Particle Imaging (MPI) is a new tracer imaging modality that is gaining significant interest from NMR and MRI researchers. While the physics of MPI differ substantially from MRI, it employs hardware and imaging concepts that are familiar to MRI researchers, such as magnetic excitation and detection, pulse sequences, and relaxation effects. Furthermore, MPI employs the same superparamagnetic iron oxide (SPIO) contrast agents that are sometimes used for MR angiography and are often used for MRI cell tracking studies. These SPIOs are much safer for humans than iodine or gadolinium, especially for Chronic Kidney Disease (CKD) patients. The weak kidneys of CKD patients cannot safely excrete iodine or gadolinium, leading to increased morbidity and mortality after iodinated X-ray or CT angiograms, or after gadolinium-MRA studies. Iron oxides, on the other hand, are processed in the liver, and have been shown to be safe even for CKD patients. Unlike the “black blood” contrast generated by SPIOs in MRI due to increased T2∗ dephasing, SPIOs in MPI generate positive, “bright blood” contrast. With this ideal contrast, even prototype MPI scanners can already achieve fast, high-sensitivity, and high-contrast angiograms with millimeter-scale resolutions in phantoms and in animals. Moreover, MPI shows great potential for an exciting array of applications, including stem cell tracking in vivo, first-pass contrast studies to diagnose or stage cancer, and inflammation imaging in vivo. So far, only a handful of prototype small-animal MPI scanners have been constructed worldwide. Hence, MPI is open to great advances, especially in hardware, pulse sequence, and nanoparticle improvements, with the potential to revolutionize the biomedical imaging field.
Martin, Wade H; Xian, Hong; Chandiramani, Pooja; Bainter, Emily; Klein, Andrew J P
2015-08-01
No data exist comparing outcome prediction from arm exercise vs pharmacologic myocardial perfusion imaging (MPI) stress test variables in patients unable to perform treadmill exercise. In this retrospective study, 2,173 consecutive lower-extremity disabled veterans aged 65.4 ± 11.0 years (mean ± SD) underwent either pharmacologic MPI (1,730 patients) or arm exercise stress tests (443 patients) with MPI (n = 253) or electrocardiography alone (n = 190) between 1997 and 2002. Cox multivariate regression models and reclassification analysis by integrated discrimination improvement (IDI) were used to characterize stress test and MPI predictors of cardiovascular mortality at ≥10-year follow-up after inclusion of significant demographic, clinical, and other variables. Cardiovascular death occurred in 561 pharmacologic MPI and 102 arm exercise participants. Multivariate-adjusted cardiovascular mortality was predicted by arm exercise resting metabolic equivalents (hazard ratio [HR] 0.52, 95% CI 0.39-0.69, P < .001), 1-minute heart rate recovery (HR 0.61, 95% CI 0.44-0.86, P < .001), and pharmacologic and arm exercise delta (peak - rest) heart rate (both P < .001). Only an abnormal arm exercise MPI prognosticated cardiovascular death by multivariate Cox analysis (HR 1.98, 95% CI 1.04-3.77, P < .05). Arm exercise MPI defect number, type, and size provided IDI over covariates for prediction of cardiovascular mortality (IDI = 0.074-0.097); only pharmacologic defect size prognosticated cardiovascular mortality (IDI = 0.022). Arm exercise capacity, heart rate recovery, and pharmacologic and arm exercise heart rate responses are robust predictors of cardiovascular mortality. Arm exercise MPI results are equivalent and possibly superior to pharmacologic MPI for cardiovascular mortality prediction in patients unable to perform treadmill exercise. Published by Elsevier Inc.
Forberg, Jakob L; Hilmersson, Catarina E; Carlsson, Marcus; Arheden, Håkan; Björk, Jonas; Hjalte, Krister; Ekelund, Ulf
2009-01-01
Background Previous studies from the USA have shown that acute nuclear myocardial perfusion imaging (MPI) in low risk emergency department (ED) patients with suspected acute coronary syndrome (ACS) can be of clinical value. The aim of this study was to evaluate the utility and hospital economics of acute MPI in Swedish ED patients with suspected ACS. Methods We included 40 patients (mean age 55 ± 2 years, 50% women) who were admitted from the ED at Lund University Hospital for chest pain suspicious of ACS, and who had a normal or non-ischemic ECG and no previous myocardial infarction. All patients underwent MPI from the ED, and the results were analyzed only after patient discharge. The current diagnostic practice of admitting the included patients for observation and further evaluation was compared to a theoretical "MPI strategy", where patients with a normal MPI test would have been discharged home from the ED. Results Twenty-seven patients had normal MPI results, and none of them had ACS. MPI thus had a negative predictive value for ACS of 100%. With the MPI strategy, 2/3 of the patients would thus have been discharged from the ED, resulting in a reduction of total hospital cost by some 270 EUR and of bed occupancy by 0.8 days per investigated patient. Conclusion Our findings in a Swedish ED support the results of larger American trials that acute MPI has the potential to safely reduce the number of admissions and decrease overall costs for low-risk ED patients with suspected ACS. PMID:19545365
Development and Initial Validation of the Multicultural Personality Inventory (MPI).
Ponterotto, Joseph G; Fietzer, Alexander W; Fingerhut, Esther C; Woerner, Scott; Stack, Lauren; Magaldi-Dopman, Danielle; Rust, Jonathan; Nakao, Gen; Tsai, Yu-Ting; Black, Natasha; Alba, Renaldo; Desai, Miraj; Frazier, Chantel; LaRue, Alyse; Liao, Pei-Wen
2014-01-01
Two studies summarize the development and initial validation of the Multicultural Personality Inventory (MPI). In Study 1, the 115-item prototype MPI was administered to 415 university students, where exploratory factor analysis resulted in a 70-item, 7-factor model. In Study 2, the 70-item MPI and theoretically related companion instruments were administered to a multisite sample of 576 university students. Confirmatory factor analysis found the 7-factor structure to be a relatively good fit to the data (Comparative Fit Index = .954; root mean square error of approximation = .057), and MPI factors predicted variance in criterion variables above and beyond the variance accounted for by broad personality traits (i.e., the Big Five). Study limitations and directions for further validation research are specified.
NASA Technical Reports Server (NTRS)
Pedretti, Kevin T.; Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)
1997-01-01
A variety of different network technologies and topologies are currently being evaluated as part of the Whitney Project. This paper reports on the implementation and performance of a Fast Ethernet network configured in a 4x4 2D torus topology in a testbed cluster of 'commodity' Pentium Pro PCs. Several benchmarks were used for performance evaluation: an MPI point-to-point message passing benchmark, an MPI collective communication benchmark, and the NAS Parallel Benchmarks version 2.2 (NPB2). Our results show that for point-to-point communication on an unloaded network, the hub and 1-hop routes on the torus have about the same bandwidth and latency. However, the bandwidth decreases and the latency increases on the torus for each additional route hop. Collective communication benchmarks show that the torus provides roughly four times more aggregate bandwidth and eight times faster MPI barrier synchronizations than a hub-based network for 16-processor systems. Finally, the SOAPBOX benchmarks, which simulate real-world CFD applications, generally demonstrated substantially better performance on the torus than on the hub. In the few cases where the hub was faster, the difference was negligible. In total, our experimental results lead to the conclusion that for Fast Ethernet networks, the torus topology has better performance and scales better than a hub-based network.
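The per-hop latency growth reported above follows from the torus distance metric; a small sketch of the minimal hop count on an n x n 2D torus (illustrative, not the benchmark code):

```python
def torus_hops(n, src, dst):
    """Minimal number of hops between two nodes of an n x n 2D torus,
    where each dimension wraps around."""
    (r1, c1), (r2, c2) = src, dst
    dr = min(abs(r1 - r2), n - abs(r1 - r2))  # shorter way around rows
    dc = min(abs(c1 - c2), n - abs(c1 - c2))  # shorter way around columns
    return dr + dc

# On the 4x4 torus of the testbed the worst case is 2 + 2 = 4 hops,
# while a hub reaches every node in a single (shared) hop:
print(max(torus_hops(4, (0, 0), (r, c)) for r in range(4) for c in range(4)))  # → 4
```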
Multi-GPU hybrid programming accelerated three-dimensional phase-field model in binary alloy
NASA Astrophysics Data System (ADS)
Zhu, Changsheng; Liu, Jieqiong; Zhu, Mingfang; Feng, Li
2018-03-01
In the simulation of dendritic growth, computational efficiency and attainable problem scales strongly influence the usefulness of the three-dimensional phase-field model. Seeking high-performance calculation methods to improve computational efficiency and to expand problem scales is therefore of great significance to research on the microstructure of materials. A high-performance calculation method based on an MPI+CUDA hybrid programming model is introduced. Multiple GPUs are used to implement quantitative numerical simulations of a three-dimensional phase-field model of a binary alloy under the condition of coupled multi-physical processes. The acceleration effect of different GPU node counts on different calculation scales is explored. On the foundation of this multi-GPU calculation model, two optimization schemes, non-blocking communication and overlap of MPI communication with GPU computation, are proposed, and their results are compared with the basic multi-GPU model. The calculation results show that the multi-GPU calculation model improves the computational efficiency of the three-dimensional phase-field simulation considerably, achieving a 13x speedup over a single GPU, and the problem scale has been expanded to 8193. Both optimization schemes are shown to be feasible; the overlap of MPI communication with GPU computation performs better, achieving a further 1.7x speedup over the basic multi-GPU model when 21 GPUs are used.
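The overlap optimization follows the standard pattern: post the halo exchange, update the interior points that need no remote data while it is in flight, then finish the boundary. A single-process Python sketch of that control flow (a background thread stands in for MPI_Isend/MPI_Irecv, and a 1D three-point average stands in for the phase-field stencil; all names are illustrative):

```python
import threading

def exchange_halo(field, halo):
    # Stand-in for a non-blocking halo exchange; with one process the
    # periodic "remote" ghost cells come from the same array.
    halo["left"] = field[-1]     # ghost to the left of cell 0
    halo["right"] = field[0]     # ghost to the right of the last cell

def step_overlapped(field):
    halo = {}
    t = threading.Thread(target=exchange_halo, args=(field, halo))
    t.start()                                       # post the exchange
    interior = [(field[i - 1] + field[i] + field[i + 1]) / 3
                for i in range(1, len(field) - 1)]  # needs no halo data
    t.join()                                        # like MPI_Waitall
    first = (halo["left"] + field[0] + field[1]) / 3
    last = (field[-2] + field[-1] + halo["right"]) / 3
    return [first] + interior + [last]

print(step_overlapped([1.0, 2.0, 3.0, 4.0]))
```

In the multi-GPU code the interior update runs as a CUDA kernel concurrently with the MPI transfers; the correctness argument is the same: only points independent of the halo are touched before the wait.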
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gyllenhaal, J.
CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong scaling performance results for 4, 8, or 16 threads are of the most interest. Suggested weak scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on a subset of node hardware where it is most possible to have low OpenMP overheads (e.g., within one coherence domain or NUMA domain).
NASA Astrophysics Data System (ADS)
Behrens, Jörg; Hanke, Moritz; Jahns, Thomas
2014-05-01
In this talk we present a way to facilitate efficient use of MPI communication for developers of climate models. Exploiting the performance potential of today's highly parallel supercomputers with real-world simulations is a complex task. This is partly caused by the low-level nature of the MPI communication library, which is the dominant communication tool, at least for inter-node communication. In order to manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI datatypes. As a solution to this complexity and performance problem we propose the communication library YAXT. This library is built on top of MPI and takes high-level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated in order to reduce latency costs. Examples are given which demonstrate the simplicity and the performance gains for selected climate applications.
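The core idea — deriving who must send what to whom from two decomposition descriptions — can be sketched in a few lines. This mirrors the concept only; the names do not follow the actual YAXT API:

```python
from collections import defaultdict

def exchange_map(src_owner, dst_owner):
    """src_owner[i]/dst_owner[i]: owning rank of global index i in the
    source and target decompositions; returns per-(sender, receiver)
    lists of indices that must travel."""
    sends = defaultdict(list)
    for idx, (s, d) in enumerate(zip(src_owner, dst_owner)):
        if s != d:                 # data that is already local needs no message
            sends[(s, d)].append(idx)
    return dict(sends)

# Eight grid points redistributed from a block decomposition to a
# round-robin decomposition on two ranks:
src = [0, 0, 0, 0, 1, 1, 1, 1]
dst = [0, 1, 0, 1, 0, 1, 0, 1]
print(exchange_map(src, dst))  # → {(0, 1): [1, 3], (1, 0): [4, 6]}
```

A library like YAXT builds such a map once, then reuses it for every exchange, packing all indices of one (sender, receiver) pair into a single aggregated message.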
Dashti, Noor H; Abidin, Rufika S; Sainsbury, Frank
2018-05-22
Bioinspired self-sorting and self-assembling systems using engineered versions of natural protein cages are being developed for biocatalysis and therapeutic delivery. The packaging and intracellular delivery of guest proteins is of particular interest for both in vitro and in vivo cell engineering. However, there is a lack of bionanotechnology platforms that combine programmable guest protein encapsidation with efficient intracellular uptake. We report a minimal peptide anchor for in vivo self-sorting of cargo-linked capsomeres of murine polyomavirus (MPyV) that enables controlled encapsidation of guest proteins by in vitro self-assembly. Using Förster resonance energy transfer, we demonstrate the flexibility in this system to support coencapsidation of multiple proteins. Complementing these ensemble measurements with single-particle analysis by super-resolution microscopy shows that the stochastic nature of coencapsidation is an overriding principle. This has implications for the design and deployment of both native and engineered self-sorting encapsulation systems and for the assembly of infectious virions. Taking advantage of the encoded affinity for sialic acids ubiquitously displayed on the surface of mammalian cells, we demonstrate the ability of self-assembled MPyV virus-like particles to mediate efficient delivery of guest proteins to the cytosol of primary human cells. This platform for programmable coencapsidation and efficient cytosolic delivery of complementary biomolecules therefore has enormous potential in cell engineering.
Eriksson, Mathilda; Andreasson, Kalle; Weidmann, Joachim; Lundberg, Kajsa; Tegerstedt, Karin
2011-01-01
Virus-like particles (VLPs) consist of capsid proteins from viruses and have been shown to be usable as carriers of protein and peptide antigens for immune therapy. In this study, we have produced and assayed murine polyomavirus (MPyV) VLPs carrying the entire human Prostate Specific Antigen (PSA) (PSA-MPyVLPs) for their potential use for immune therapy in a mouse model system. BALB/c mice immunized with PSA-MPyVLPs were only marginally protected against outgrowth of a PSA-expressing tumor. To improve protection, PSA-MPyVLPs were co-injected with adjuvant CpG, either alone or loaded onto murine dendritic cells (DCs). Immunization with PSA-MPyVLPs loaded onto DCs in the presence of CpG was shown to efficiently protect mice from tumor outgrowth. In addition, cellular and humoral immune responses after immunization were examined. PSA-specific CD4+ and CD8+ cells were demonstrated, but no PSA-specific IgG antibodies. Vaccination with DCs loaded with PSA-MPyVLPs induced an eight-fold lower titre of anti-VLP antibodies than vaccination with PSA-MPyVLPs alone. In conclusion, immunization of BALB/c mice with PSA-MPyVLPs, loaded onto DCs and co-injected with CpG, induces an efficient PSA-specific tumor protective immune response, including both CD4+ and CD8+ cells with a low induction of anti-VLP antibodies. PMID:21858228
New NAS Parallel Benchmarks Results
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)
1997-01-01
NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Pilotto, Alberto; Ferrucci, Luigi; Scarcelli, Carlo; Niro, Valeria; Di Mario, Francesco; Seripa, Davide; Andriulli, Angelo; Leandro, Gioacchino; Franceschi, Marilisa
2007-01-01
The potential usefulness of standardized comprehensive geriatric assessment (CGA) in evaluating treatment and follow-up of older patients with upper gastrointestinal bleeding is unknown. To evaluate the usefulness of the CGA as a 2-year mortality multidimensional prognostic index (MPI) in older patients hospitalized for upper gastrointestinal bleeding. Patients aged ≥65 years consecutively hospitalized for acute upper gastrointestinal bleeding were included. Diagnosis of bleeding was based on clinical and endoscopic features. All patients underwent a CGA that included six standardized scales, i.e., Activities of Daily Living (ADL), Instrumental Activities of Daily Living (IADL), Short Portable Mental Status Questionnaire (SPMSQ), Mini Nutritional Assessment (MNA), Exton-Smith Score (ESS) and Cumulative Illness Rating Scale (CIRS), as well as information on medication history and cohabitation, for a total of 63 items. An MPI was calculated from the integrated total scores and expressed as MPI 1 = low risk, MPI 2 = moderate risk, and MPI 3 = severe risk. The predictive value of the MPI for mortality over a 24-month follow-up was calculated. 36 elderly patients (M 16/F 20, mean age 82.8 ± 7.9 years, range 70-101 years) were included in the study. A significant difference in mean age was observed between males and females (M 80.1 ± 4.8 vs. F 84.9 ± 9.3 years; p < 0.05). The causes of upper gastrointestinal bleeding were duodenal ulcer in 38.8%, gastric ulcer in 22.2%, and erosive gastritis in 16.6% of the patients, while 16.6% had gastrointestinal bleeding of unknown origin. The overall 2-year mortality rate was 30.5%. 18 patients (50%) were classified as having a low-risk MPI (mean value 0.18 ± 0.09), 12 (33.3%) as having a moderate-risk MPI (mean value 0.48 ± 0.08) and 6 (16.6%) as having a severe-risk MPI (mean value 0.83 ± 0.06).
Higher MPI grades were significantly associated with higher mortality (grade 1 = 12.5%, grade 2 = 41.6%, grade 3 = 83.3%; p = 0.001). Adjusting for age and sex, the prognostic efficacy of MPI for mortality was confirmed and highly significant (odds ratio 10.47, 95% CI 2.04-53.6). CGA is a useful tool for calculating a MPI that significantly predicts the risk of 2-year mortality in older patients with upper gastrointestinal bleeding. Copyright 2007 S. Karger AG, Basel.
Singapore Students' Performance on Australian and Singapore Assessment Items
ERIC Educational Resources Information Center
Ho, Siew Yin; Lowrie, Tom
2012-01-01
This study describes Singapore students' (N = 607) performance on a recently developed Mathematics Processing Instrument (MPI). The MPI comprised tasks sourced from Australia's NAPLAN and Singapore's PSLE. In addition, the MPI had a corresponding question which encouraged students to describe how they solved the respective tasks. In particular,…
Zhou, Xinyi Y; Tay, Zhi Wei; Chandrasekharan, Prashant; Yu, Elaine Y; Hensley, Daniel W; Orendorff, Ryan; Jeffris, Kenneth E; Mai, David; Zheng, Bo; Goodwill, Patrick W; Conolly, Steven M
2018-05-10
Magnetic particle imaging (MPI) is an emerging ionizing radiation-free biomedical tracer imaging technique that directly images the intense magnetization of superparamagnetic iron oxide nanoparticles (SPIOs). MPI offers ideal image contrast because MPI shows zero signal from background tissues. Moreover, there is zero attenuation of the signal with depth in tissue, allowing for imaging deep inside the body quantitatively at any location. Recent work has demonstrated the potential of MPI for robust, sensitive vascular imaging and cell tracking with high contrast and dose-limited sensitivity comparable to nuclear medicine. To foster future applications in MPI, this new biomedical imaging field is welcoming researchers with expertise in imaging physics, magnetic nanoparticle synthesis and functionalization, nanoscale physics, and small animal imaging applications. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Pablico-Lansigan, Michele H.; Situ, Shu F.; Samia, Anna Cristina S.
2013-05-01
Magnetic particle imaging (MPI) is an emerging biomedical imaging technology that allows the direct quantitative mapping of the spatial distribution of superparamagnetic iron oxide nanoparticles. MPI's increased sensitivity and short image acquisition times foster the creation of tomographic images with high temporal and spatial resolution. The contrast and sensitivity of MPI is envisioned to transcend those of other medical imaging modalities presently used, such as magnetic resonance imaging (MRI), X-ray scans, ultrasound, computed tomography (CT), positron emission tomography (PET) and single photon emission computed tomography (SPECT). In this review, we present an overview of the recent advances in the rapidly developing field of MPI. We begin with a basic introduction of the fundamentals of MPI, followed by some highlights over the past decade of the evolution of strategies and approaches used to improve this new imaging technique. We also examine the optimization of iron oxide nanoparticle tracers used for imaging, underscoring the importance of size homogeneity and surface engineering. Finally, we present some future research directions for MPI, emphasizing the novel and exciting opportunities that it offers as an important tool for real-time in vivo monitoring. All these opportunities and capabilities that MPI presents are now seen as potential breakthrough innovations in timely disease diagnosis, implant monitoring, and image-guided therapeutics.
Comparative accuracy of supine-only and combined supine-prone myocardial perfusion imaging in men.
Taasan, Vicente; Wokhlu, Anita; Taasan, Michael V; Dusaj, Raman S; Mehta, Ajay; Kraft, Steven; Winchester, David; Wymer, David
2016-12-01
Combined supine-prone myocardial perfusion imaging (CSP MPI) has been shown to reduce attenuation artifact in comparison to supine-only (SU) MPI in mixed-gender populations with varying risk for coronary artery disease (CAD), often where patients served as their own controls. However, there is limited direct comparison of these imaging strategies in men. 934 male patients underwent CSP or SU MPI. Diagnostic certainty of interpretation was compared. Within the cohort, 116 were referred for left heart catheterization (LHC) to assess for CAD. Sensitivity, specificity, and area under the curve (AUC) were compared with additional analysis based on body mass index (BMI). 597 patients completed the SU protocol and 337 patients completed the CSP protocol. Equivocal studies were seen more frequently in the SU group (13%) than in the CSP group (4%, P < .001). At catheterization, the specificity for CSP MPI of 70% was higher than 40% for SU MPI (P = .032). The CSP AUC (0.80 ± 0.06) was significantly larger than SU AUC (0.57 ± 0.05, P = .004). CSP specificity was significantly higher in obese patients. CSP MPI increases diagnostic certainty and improves test accuracy for CAD detection in men with CAD risk factors, especially obese patients, compared to SU MPI.
NASA Astrophysics Data System (ADS)
Sloan, Gregory James
The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD) and nanofluid thermal energy storage (TES), as well as in other fields where experiments are necessary but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality as its serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a master-worker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid phases.
Challenges at Petascale for Pseudo-Spectral Methods on Spheres (A Last Hurrah?)
NASA Technical Reports Server (NTRS)
Clune, Thomas
2011-01-01
Conclusions: a) Proper software abstractions should enable rapid exploration of platform-specific optimizations and tradeoffs. b) Pseudo-spectral methods are marginally viable for at least some classes of petascale problems; i.e., a GPU-based machine with good bisection bandwidth would be best. c) Scalability at exascale is possible, but the necessary resolution will make the algorithm prohibitively expensive. Efficient implementations of realistic global transposes are intricate and tedious in MPI. PS at petascale requires exploration of a variety of strategies for spreading local and remote communications. PGAS allows far simpler implementation and thus rapid exploration of variants.
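The intricacy referred to here is the pack/exchange/unpack choreography of the distributed transpose. The sketch below simulates it in pure Python, with plain lists standing in for MPI ranks and for MPI_Alltoall; the square block decomposition (n = p × nloc) is an illustrative assumption:

```python
def transpose_distributed(blocks):
    """Distributed transpose of an n x n matrix, n = p * nloc.

    blocks[r] holds the nloc contiguous rows owned by rank r; the result
    gives each rank the corresponding rows of the transpose. Plain lists
    stand in for ranks and for MPI_Alltoall (illustrative sketch only).
    """
    p = len(blocks)
    nloc = len(blocks[0])
    # Phase 1 (pack): rank i cuts each of its rows into p column sub-blocks.
    send = [[[row[j * nloc:(j + 1) * nloc] for row in blocks[i]]
             for j in range(p)] for i in range(p)]
    # Phase 2 (exchange): the all-to-all -- recv[j][i] = send[i][j].
    recv = [[send[i][j] for i in range(p)] for j in range(p)]
    # Phase 3 (unpack): local transpose and reassembly on every rank.
    out = []
    for j in range(p):
        rows_j = []
        for c in range(nloc):  # local column -> row of the transpose
            rows_j.append([recv[j][i][r][c]
                           for i in range(p) for r in range(nloc)])
        out.append(rows_j)
    return out
```

In real MPI the three phases become derived-datatype packing, an MPI_Alltoall(v), and local reshuffling, each of which must agree on layout conventions across all ranks, hence the tedium noted above.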
NASA Astrophysics Data System (ADS)
Destefano, Anthony; Heerikhuisen, Jacob
2015-04-01
Fully 3D particle simulations can be computationally and memory expensive, especially when high-resolution grid cells are required. The problem becomes further complicated when parallelization is needed. In this work we focus on computational methods to address these difficulties. Hilbert curves are used to map the 3D particle space to the 1D contiguous memory space. This organization minimizes cache misses on the GPU and yields a sorted structure equivalent to an octal tree, which is attractive for adaptive mesh implementations due to its logarithmic search time. Implementations using the Message Passing Interface (MPI) library and NVIDIA's parallel computing platform CUDA will be compared, as MPI is commonly used on server nodes with many CPUs. We will also compare static grid structures with adaptive mesh structures. The physical test bed will simulate heavy interstellar atoms interacting with a background plasma, the heliosphere, computed by a fully consistent coupled MHD/kinetic particle code. It is known that charge exchange is an important factor in space plasmas; specifically, it modifies the structure of the heliosphere itself. We would like to thank the Alabama Supercomputer Authority for the use of their computational resources.
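For concreteness, the index mapping at the heart of this ordering can be stated compactly. Below is the classic bit-twiddling formulation of the Hilbert index in Python, reduced to 2D for brevity (the work described above uses its 3D analogue):

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along the Hilbert curve filling a
    2**order x 2**order grid (classic bit-twiddling formulation)."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate the quadrant so segments connect
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# order 1 visits the four cells in the familiar "U" order:
# (0,0) -> 0, (0,1) -> 1, (1,1) -> 2, (1,0) -> 3
```

Sorting particles by this key produces the contiguous, octree-compatible memory layout described above, with spatially neighboring cells tending to stay close in memory.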
Determination of the optimal atrioventricular interval in sick sinus syndrome during DDD pacing.
Kato, Masaya; Dote, Keigo; Sasaki, Shota; Goto, Kenji; Takemoto, Hiroaki; Habara, Seiji; Hasegawa, Daiji; Matsuda, Osamu
2005-09-01
Although the AAI pacing mode has been shown to be electromechanically superior to the DDD pacing mode in sick sinus syndrome (SSS), there is evidence suggesting that during AAI pacing the presence of a natural ventricular activation pattern is not enough for hemodynamic benefit to occur. The myocardial performance index (MPI) is a simply measurable Doppler-derived index of combined systolic and diastolic myocardial performance. The aim of this study was to investigate whether the AAI pacing mode is electromechanically superior to the DDD mode in patients with SSS by using the Doppler-derived MPI. Thirty-nine SSS patients with dual-chamber pacing devices were evaluated by using Doppler echocardiography in AAI mode and DDD mode. The optimal atrioventricular (AV) interval in DDD mode was determined and the atrial stimulus-R interval was measured in AAI mode. The ratio of the atrial stimulus-R interval to the optimal AV interval was defined as the relative AV interval (rAVI) and the ratio of MPI in AAI mode to that in DDD mode was defined as the relative MPI (rMPI). The rMPI was significantly correlated with the atrial stimulus-R interval and the rAVI (r = 0.57, P = 0.0002, and r = 0.67, P < 0.0001, respectively). A cutoff point of 1.73 for rAVI provided optimum sensitivity and specificity for rMPI >1 based on receiver operating characteristic curves. Even though the intrinsic AV conduction is moderately prolonged, some SSS patients with dual-chamber pacing devices benefit from ventricular pacing with an optimal AV interval. MPI is useful to determine the optimal pacing mode in acute experiments.
Victor, Bart; Blevins, Meridith; Green, Ann F.; Ndatimana, Elisée; González-Calvo, Lázaro; Fischer, Edward F.; Vergara, Alfredo E.; Vermund, Sten H.; Olupona, Omo; Moon, Troy D.
2014-01-01
Background Poverty is a multidimensional phenomenon and unidimensional measurements have proven inadequate to the challenge of assessing its dynamics. The dynamics between poverty and public health intervention are among the most difficult yet important problems faced in development. We sought to demonstrate how multidimensional poverty measures can be utilized in the evaluation of public health interventions; and to create geospatial maps of poverty deprivation to aid implementers in prioritizing program planning. Methods Survey teams interviewed a representative sample of 3,749 female heads of household in 259 enumeration areas across Zambézia in August-September 2010. We estimated a multidimensional poverty index, which can be disaggregated into context-specific indicators. We produced an MPI comprising 3 dimensions and 11 weighted indicators selected from the survey. Households were identified as “poor” if they were deprived in >33% of indicators. Our MPI is an adjusted headcount, calculated by multiplying the proportion identified as poor (headcount) and the poverty gap (average deprivation). Geospatial visualizations of poverty deprivation were created as a contextual baseline for future evaluation. Results In our rural (96%) and urban (4%) interviewees, the 33% deprivation cut-off suggested 58.2% of households were poor (29.3% of urban vs. 59.5% of rural). Among the poor, households experienced an average deprivation of 46%; thus the MPI/adjusted headcount is 0.27 (= 0.58 × 0.46). Of households where a local language was the primary language, 58.6% were considered poor versus Portuguese-speaking households where 73.5% were considered non-poor. Living standard is the dominant deprivation, followed by health, and then education.
Conclusions Multidimensional poverty measurement can be integrated into program design for public health interventions, and geospatial visualization helps examine the impact of intervention deployment within the context of distinct poverty conditions. Both permit program implementers to focus resources and critically explore linkages between poverty and its social determinants, thus deriving useful findings for evidence-based planning. PMID:25268951
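The adjusted-headcount arithmetic reported above (MPI = H × A ≈ 0.582 × 0.46 ≈ 0.27) can be sketched directly; the binary deprivation matrix and equal weights below are illustrative stand-ins for the survey's 11 weighted indicators:

```python
def adjusted_headcount(deprivations, weights, cutoff=1 / 3):
    """Alkire-Foster style MPI = H x A.

    deprivations[h][k] = 1 if household h is deprived in indicator k;
    weights sum to 1. A household is poor when its weighted deprivation
    score exceeds `cutoff`.
    """
    scores = [sum(w * d for w, d in zip(weights, row)) for row in deprivations]
    poor = [s for s in scores if s > cutoff]
    if not poor:
        return 0.0
    H = len(poor) / len(scores)   # headcount ratio
    A = sum(poor) / len(poor)     # average intensity among the poor
    return H * A

# Four households, four equally weighted indicators; the households with
# scores 1.0 and 0.5 are poor, so H = 0.5, A = 0.75, MPI = 0.375.
rows = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
```

With the study's figures, H = 0.582 and A = 0.46 reproduce the reported adjusted headcount of 0.27.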
Starmans, Lucas W. E.; Burdinski, Dirk; Haex, Nicole P. M.; Moonen, Rik P. M.; Strijkers, Gustav J.; Nicolay, Klaas; Grüll, Holger
2013-01-01
Background Iron oxide nanoparticles (IONs) are a promising nanoplatform for contrast-enhanced MRI. Recently, magnetic particle imaging (MPI) was introduced as a new imaging modality, which is able to directly visualize magnetic particles and could serve as a more sensitive and quantitative alternative to MRI. However, MPI requires magnetic particles with specific magnetic properties for optimal use. Current commercially available iron oxide formulations perform suboptimally in MPI, which is triggering research into optimized synthesis strategies. Most synthesis procedures aim at size control of iron oxide nanoparticles rather than control over the magnetic properties. In this study, we report on the synthesis, characterization and application of a novel ION platform for sensitive MPI and MRI. Methods and Results IONs were synthesized using a thermal-decomposition method and subsequently phase-transferred by encapsulation into lipidic micelles (ION-Micelles). Next, the material and magnetic properties of the ION-Micelles were analyzed. Most notably, vibrating sample magnetometry measurements showed that the effective magnetic core size of the IONs is 16 nm. In addition, magnetic particle spectrometry (MPS) measurements were performed. MPS is essentially zero-dimensional MPI and therefore allows probing the potential of iron oxide formulations for MPI. ION-Micelles induced up to 200 times higher signal in MPS measurements than commercially available iron oxide formulations (Endorem, Resovist and Sinerem) and thus likely allow for significantly more sensitive MPI. In addition, the potential of the ION-Micelle platform for molecular MPI and MRI was showcased by MPS and MRI measurements of fibrin-binding peptide functionalized ION-Micelles (FibPep-ION-Micelles) bound to blood clots. 
Conclusions The presented data underlines the potential of the ION-Micelle nanoplatform for sensitive (molecular) MPI and warrants further investigation of the FibPep-ION-Micelle platform for in vivo, non-invasive imaging of fibrin in preclinical disease models of thrombus-related pathologies and atherosclerosis. PMID:23437371
Large Scale Frequent Pattern Mining using MPI One-Sided Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Agarwal, Khushbu
In this paper, we propose a work-stealing runtime, the Library for Work Stealing (LibWS), using the MPI one-sided model for designing a scalable FP-Growth, the de facto standard frequent pattern mining algorithm, on large-scale systems. LibWS provides locality-efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data-exchange phase, which reduces the communication complexity from the state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attribute ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (87% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides a 38x communication speedup on 4096 cores.
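The complexity claim is easy to make concrete. Reading the scheme as a two-level exchange in which each process talks to f group representatives plus the p/f members of its own group (our illustrative reading of the cost model, not the paper's actual algorithm):

```python
def messages_flat(p):
    """Baseline all-to-all style exchange: O(p) partners per process."""
    return p

def messages_grouped(p, f):
    """Two-level exchange: f inter-group partners plus p / f intra-group
    partners per process, i.e. the O(f + p/f) cost quoted above."""
    return f + p // f

# At the paper's scale: p = 4096, f = 64 -> 128 partners instead of
# 4096, a 32x reduction (the same order as the reported 38x speedup).
```

The sum f + p/f is minimized near f ≈ √p, so even a moderate number of frequent attribute ids sharply cuts the exchange cost.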
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tessier, Francois; Vishwanath, Venkatram
2017-11-28
Reading and writing data efficiently from different tiers of storage is necessary for most scientific simulations to achieve good performance at scale. Many software solutions have been developed to decrease the I/O bottleneck. One well-known strategy, in the context of collective I/O operations, is the two-phase I/O scheme. This strategy consists of selecting a subset of processes to aggregate contiguous pieces of data before performing reads/writes. In our previous work, we implemented the two-phase I/O scheme with an MPI-based topology-aware algorithm. Our algorithm showed very good performance at scale compared to standard I/O libraries such as POSIX I/O and MPI I/O. However, the algorithm had several limitations that hindered satisfying reproducibility of our experiments. In this paper, we extend our work by 1) identifying the obstacles we face in reproducing our experiments and 2) finding solutions that reduce the unpredictability of our results.
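A minimal simulation shows the mechanics of the two-phase scheme: scattered (offset, data) pieces are first routed to the aggregator that owns their contiguous file region, and each aggregator then performs one large write. The names and region layout below are an illustrative sketch, not the authors' implementation:

```python
def two_phase_write(chunks, n_aggregators):
    """Simulate a two-phase collective write.

    chunks: (file_offset, data) pieces contributed by all processes.
    Phase 1 routes every byte to the aggregator owning its contiguous
    file region; phase 2 has each aggregator emit one large write.
    """
    total = sum(len(data) for _, data in chunks)
    stride = -(-total // n_aggregators)        # ceil(total / n_aggregators)
    regions = [bytearray(stride) for _ in range(n_aggregators)]
    for offset, data in chunks:                # phase 1: redistribution
        for i, byte in enumerate(data):
            pos = offset + i
            regions[pos // stride][pos % stride] = byte
    file_image = b"".join(regions)[:total]     # phase 2: contiguous writes
    return regions, file_image
```

The payoff is that the file system sees n_aggregators large contiguous writes instead of many small interleaved ones; a topology-aware variant would additionally pick aggregators close to their data producers.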
A novel artificial neural network method for biomedical prediction based on matrix pseudo-inversion.
Cai, Binghuang; Jiang, Xia
2014-04-01
Biomedical prediction based on clinical and genome-wide data has become increasingly important in disease diagnosis and classification. To solve the prediction problem in an effective manner for the improvement of clinical care, we develop a novel Artificial Neural Network (ANN) method based on Matrix Pseudo-Inversion (MPI) for use in biomedical applications. The MPI-ANN is constructed as a three-layer (i.e., input, hidden, and output layers) feed-forward neural network, and the weights connecting the hidden and output layers are directly determined based on MPI without a lengthy learning iteration. The LASSO (Least Absolute Shrinkage and Selection Operator) method is also presented for comparative purposes. Single Nucleotide Polymorphism (SNP) simulated data and real breast cancer data are employed to validate the performance of the MPI-ANN method via 5-fold cross validation. Experimental results demonstrate the efficacy of the developed MPI-ANN for disease classification and prediction, in view of the significantly superior accuracy (i.e., the rate of correct predictions), as compared with LASSO. The results based on the real breast cancer data also show that the MPI-ANN has better performance than other machine learning methods (including support vector machine (SVM), logistic regression (LR), and an iterative ANN). In addition, experiments demonstrate that our MPI-ANN could be used for bio-marker selection as well. Copyright © 2013 Elsevier Inc. All rights reserved.
Iskandar, Aline; Limone, Brendan; Parker, Matthew W; Perugini, Andrew; Kim, Hyejin; Jones, Charles; Calamari, Brian; Coleman, Craig I; Heller, Gary V
2013-02-01
It remains controversial whether the diagnostic accuracy of single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) is different in men as compared to women. We performed a meta-analysis to investigate gender differences of SPECT MPI for the diagnosis of CAD (≥50% stenosis). Two investigators independently performed a systematic review of the MEDLINE and EMBASE databases from inception through January 2012 for English-language studies determining the diagnostic accuracy of SPECT MPI. We included prospective studies that compared SPECT MPI with conventional coronary angiography which provided sufficient data to calculate gender-specific true and false positives and negatives. Data from studies evaluating <20 patients of one gender were excluded. Bivariate meta-analysis was used to create summary receiver operating curves. Twenty-six studies met inclusion criteria, representing 1,148 women and 1,142 men. Bivariate meta-analysis yielded a mean sensitivity and specificity of 84.2% (95% confidence interval [CI] 78.7%-88.6%) and 78.7% (CI 70.0%-85.3%) for SPECT MPI in women and 89.1% (CI 84.0%-92.7%) and 71.2% (CI 60.8%-79.8%) for SPECT MPI in men. There was no significant difference in the sensitivity (P = .15) or specificity (P = .23) between male and female subjects. In a bivariate meta-analysis of the available literature, the diagnostic accuracy of SPECT MPI is similar for both men and women.
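For readers re-deriving such pooled estimates, sensitivity and specificity come straight from each study's gender-specific 2×2 table; the counts below are invented to echo the pooled point estimates, not taken from any included study:

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """2x2 table -> (sensitivity, specificity)."""
    sensitivity = tp / (tp + fn)   # diseased patients correctly detected
    specificity = tn / (tn + fp)   # disease-free patients correctly cleared
    return sensitivity, specificity

# Hypothetical female cohort of 200 (100 with CAD at angiography):
# tp=84, fn=16, tn=79, fp=21 -> (0.84, 0.79), close to the pooled
# 84.2% sensitivity and 78.7% specificity reported above.
```

A bivariate meta-analysis then pools these paired per-study estimates while modeling their correlation, rather than averaging sensitivity and specificity separately.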
Direct comparison of rest and adenosine stress myocardial perfusion CT with rest and stress SPECT
Okada, David R.; Ghoshhajra, Brian B.; Blankstein, Ron; Rocha-Filho, Jose A.; Shturman, Leonid D.; Rogers, Ian S.; Bezerra, Hiram G.; Sarwar, Ammar; Gewirtz, Henry; Hoffmann, Udo; Mamuya, Wilfred S.; Brady, Thomas J.; Cury, Ricardo C.
2010-01-01
Introduction We have recently described a technique for assessing myocardial perfusion using adenosine-mediated stress imaging (CTP) with dual source computed tomography. SPECT myocardial perfusion imaging (SPECT-MPI) is a widely utilized and extensively validated method for assessing myocardial perfusion. The aim of this study was to determine the level of agreement between CTP and SPECT-MPI at rest and under stress on a per-segment, per-vessel, and per-patient basis. Methods Forty-seven consecutive patients underwent CTP and SPECT-MPI. Perfusion images were interpreted using the 17 segment AHA model and were scored on a 0 (normal) to 3 (abnormal) scale. Summed rest and stress scores were calculated for each vascular territory and patient by adding corresponding segmental scores. Results On a per-segment basis (n = 799), CTP and SPECT-MPI demonstrated excellent correlation: Goodman-Kruskall γ = .59 (P < .0001) for stress and .75 (P < .0001) for rest. On a per-vessel basis (n = 141), CTP and SPECT-MPI summed scores demonstrated good correlation: Pearson r = .56 (P < .0001) for stress and .66 (P < .0001) for rest. On a per-patient basis (n = 47), CTP and SPECT-MPI demonstrated good correlation: Pearson r = .60 (P < .0001) for stress and .76 (P < .0001) for rest. Conclusions CTP compares favorably with SPECT-MPI for detection, extent, and severity of myocardial perfusion defects at rest and stress. PMID:19936863
Pilotto, Alberto; Polidori, Maria Cristina; Veronese, Nicola; Panza, Francesco; Arboretti Giancristofaro, Rosa; Pilotto, Andrea; Daragjati, Julia; Carrozzo, Eleonora; Prete, Camilla; Gallina, Pietro; Padovani, Alessandro; Maggi, Stefania
2018-02-01
To evaluate whether treatment with antidementia drugs is associated with reduced mortality in older patients with different mortality risk at baseline. Retrospective. Community-dwelling. A total of 6818 older people who underwent a Standardized Multidimensional Assessment Schedule for Adults and Aged Persons (SVaMA) evaluation to determine accessibility to homecare services or nursing home admission from 2005 to 2013 in the Padova Health District, Italy, were included. Mortality risk at baseline was calculated by the Multidimensional Prognostic Index (MPI), based on information collected with the SVaMA. Participants were categorized as having mild (MPI-SVaMA-1), moderate (MPI-SVaMA-2), or high (MPI-SVaMA-3) mortality risk. Propensity score-adjusted hazard ratios (HR) of 2-year mortality were calculated according to antidementia drug treatment. Patients treated with antidementia drugs had a significantly lower risk of death than untreated patients (HR 0.82, 95% confidence interval [CI] 0.73-0.92, and HR 0.56, 95% CI 0.49-0.65, for patients treated for less than 2 years and for more than 2 years, respectively). After dividing patients according to their MPI-SVaMA grade, antidementia treatment was significantly associated with reduced mortality in the MPI-SVaMA-1 mild-risk (HR 0.71; 95% CI 0.54-0.92) and MPI-SVaMA-2 moderate-risk (HR 0.61; 95% CI 0.40-0.91, matched sample) groups, but not in the MPI-SVaMA-3 high-risk group. This large community-dwelling patient study suggests that antidementia drugs might contribute to increased survival in older adults with dementia with lower mortality risk. Copyright © 2017 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.
Cardiovascular outcomes after pharmacologic stress myocardial perfusion imaging.
Lee, Douglas S; Husain, Mansoor; Wang, Xuesong; Austin, Peter C; Iwanochko, Robert M
2016-04-01
While pharmacologic stress single photon emission computed tomography myocardial perfusion imaging (SPECT-MPI) is used for noninvasive evaluation of patients who are unable to perform treadmill exercise, its impact on net reclassification improvement (NRI) of prognosis is unknown. We evaluated the prognostic value of pharmacologic stress MPI for prediction of cardiovascular death or non-fatal myocardial infarction (MI) within 1 year at a single-center, university-based laboratory. We examined continuous and categorical NRI of pharmacologic SPECT-MPI for prediction of outcomes beyond clinical factors alone. Six thousand two hundred forty patients (median age 66 years [IQR 56-74], 3466 men) were studied and followed for 5963 person-years. SPECT-MPI variables associated with increased risk of cardiovascular death or non-fatal MI included summed stress score, stress ST-shift, and post-stress resting left ventricular ejection fraction ≤50%. Compared to a clinical model which included age, sex, cardiovascular disease, risk factors, and medications, model χ² (210.5 vs. 281.9, P < .001) and c-statistic (0.74 vs. 0.78, P < .001) were significantly increased by addition of SPECT-MPI predictors (summed stress score, stress ST-shift and stress resting left ventricular ejection fraction). SPECT-MPI predictors increased continuous NRI by 49.4% (P < .001), reclassifying 66.5% of patients as lower risk and 32.8% as higher risk of cardiovascular death or non-fatal MI. Addition of MPI predictors to clinical factors using risk categories, defined as <1%, 1% to 3%, and >3% annualized risk of cardiovascular death or non-fatal MI, yielded a 15.0% improvement in NRI (95% CI 7.6%-27.6%, P < .001). Pharmacologic stress MPI substantially improved net reclassification of cardiovascular death or MI risk beyond that afforded by clinical factors. Copyright © 2016 Elsevier Inc. All rights reserved.
Paz, Yehuda; Morgenstern, Rachelle; Weinberg, Richard; Chiles, Mariana; Bhatti, Navdeep; Ali, Ziad; Mohan, Sumit; Bokhari, Sabahat
2017-12-01
Cardiovascular disease is the leading cause of death in patients with end-stage renal disease (ESRD) and often goes undetected. Abnormal coronary flow reserve (CFR), which predicts increased risk of cardiac death, may be present in patients with ESRD without other evidence of coronary artery disease (CAD). We prospectively studied 131 patients who had rest and dipyridamole pharmacologic stress N-13 ammonia positron emission tomography myocardial perfusion imaging (PET MPI) for kidney transplant evaluation. Thirty-four patients also had left heart catheterization. Abnormal PET MPI was defined as qualitative ischemia or infarct, stress electrocardiogram ischemia, or transient ischemic dilation. CFR was calculated as the ratio of stress to rest coronary blood flow. Global CFR < 2 was defined as abnormal. Of 131 patients who had PET MPI (66% male, 55.6 ± 12.1 years), 30% (39 of 131) had abnormal PET MPI and 59% (77 of 131) had abnormal CFR. In a subset of 34 patients who had left heart catheterization (66% male, 61.0 ± 12.1 years), 68% (23 of 34) had abnormal CFR on PET MPI, and 68% (23 of 34) had ≥70% obstruction on left heart catheterization. Abnormal CFR was not significantly associated with abnormal PET MPI (p = 0.13) or obstructive CAD on left heart catheterization (p = 0.26). In conclusion, in the first prospective study of PET MPI in patients with ESRD, abnormal CFR is highly prevalent and is independent of abnormal findings on PET MPI or obstructive CAD on left heart catheterization. Copyright © 2017 Elsevier Inc. All rights reserved.
Dobutamine stress myocardial perfusion imaging: 8-year outcomes in patients with diabetes mellitus.
Boiten, Hendrik J; van Domburg, Ron T; Valkema, Roelf; Zijlstra, Felix; Schinkel, Arend F L
2016-08-01
Many studies have examined the prognostic value of myocardial perfusion imaging (MPI) using single-photon emission computed tomography (SPECT) for the prediction of short- to medium-term outcomes. However, the long-term prognostic value of MPI in patients with diabetes mellitus remains unclear. Therefore, this study assessed the long-term prognostic value of MPI in a high-risk cohort of patients with diabetes mellitus. A high-risk cohort of 207 patients with diabetes mellitus who were unable to undergo exercise testing underwent dobutamine stress MPI. Follow-up was successful in 206 patients; 12 patients were excluded due to early revascularization. The current data are based on the remaining 194 patients. Follow-up end points were all-cause mortality, cardiac mortality, and nonfatal myocardial infarction. Kaplan-Meier survival curves were constructed, and univariable and multivariable analyses were performed to identify predictors of long-term outcome. During a mean follow-up of 8.1 ± 5.9 years, 134 (69%) patients died, of which 68 (35%) deaths were due to cardiac causes. Nonfatal myocardial infarction occurred in 24 patients (12%), and late (>60 days) coronary revascularization was performed in 61 (13%) patients. Survival analysis showed that MPI provided optimal risk stratification up to 4 years after testing. After that period, the outcome was comparable in patients with normal and abnormal MPI. Multivariable analyses showed that MPI provided incremental prognostic value up to 4 years after testing. In high-risk patients with diabetes mellitus, dobutamine MPI provides incremental prognostic information in addition to clinical data for a 4-year period after testing. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2016. For permissions please email: journals.permissions@oup.com.
Pilotto, Alberto; Addante, Filomena; D'Onofrio, Grazia; Sancarlo, Daniele; Ferrucci, Luigi
2009-01-01
The Comprehensive Geriatric Assessment (CGA) is a multidimensional, usually interdisciplinary, diagnostic process intended to determine an elderly person's medical, psychosocial, and functional capacity and problems with the objective of developing an overall plan for treatment and short- and long-term follow-up. The potential usefulness of the CGA in evaluating treatment and follow-up of older patients with gastroenterological disorders is unknown. In this paper we report the efficacy of a Multidimensional Prognostic Index (MPI), calculated from information collected by a standardized CGA, in predicting mortality risk in older patients hospitalized with upper gastrointestinal bleeding and liver cirrhosis. Patients underwent a CGA that included six standardized scales, i.e., Activities of Daily Living (ADL), Instrumental Activities of Daily Living (IADL), Short-Portable Mental Status Questionnaire (SPMSQ), Mini-Nutritional Assessment (MNA), Exton-Smith Score (ESS) and Comorbidity Index Rating Scale (CIRS), as well as information on medication history and cohabitation, for a total of 63 items. The MPI was calculated from the integrated total scores and expressed as MPI 1=low risk, MPI 2=moderate risk and MPI 3=severe risk of mortality. Higher MPI values were significantly associated with higher short- and long-term mortality in older patients with both upper gastrointestinal bleeding and liver cirrhosis. A close agreement was found between the estimated mortality by MPI and the observed mortality. Moreover, MPI seems to have a greater discriminatory power than organ-specific prognostic indices such as the Rockall and Blatchford scores (in upper gastrointestinal bleeding patients) and the Child-Pugh score (in liver cirrhosis patients). All these findings support the concept that a multidimensional approach may be appropriate for the evaluation of older patients with gastroenterological disorders, as has been reported for patients with other pathological conditions.
Myocardial perfusion imaging with PET
Nakazato, Ryo; Berman, Daniel S; Alexanderson, Erick; Slomka, Piotr
2013-01-01
PET-myocardial perfusion imaging (MPI) allows accurate measurement of myocardial perfusion, absolute myocardial blood flow and function at stress and rest in a single study session performed in approximately 30 min. Various PET tracers are available for MPI, with rubidium-82 and nitrogen-13 ammonia the most commonly used. In addition, a new fluorine-18-based PET-MPI tracer is currently being evaluated. Relative quantification of PET perfusion images shows very high diagnostic accuracy for detection of obstructive coronary artery disease. Dynamic myocardial blood flow analysis has demonstrated additional prognostic value beyond relative perfusion imaging. Patient radiation dose can be reduced and image quality can be improved with the latest advances in PET/CT equipment. Simultaneous assessment of both anatomy and perfusion by hybrid PET/CT can result in improved diagnostic accuracy. Compared with SPECT-MPI, PET-MPI provides higher diagnostic accuracy, using lower radiation doses during a shorter examination time period for the detection of coronary artery disease. PMID:23671459
Multiphoton imaging with high peak power VECSELs
NASA Astrophysics Data System (ADS)
Mirkhanov, Shamil; Quarterman, Adrian H.; Swift, Samuel; Praveen, Bavishna B.; Smyth, Conor J. C.; Wilcox, Keith G.
2016-03-01
Multiphoton imaging (MPI) has become one of the key non-invasive light microscopy techniques. This technique allows deep tissue imaging with high resolution and less photo-damage than conventional confocal microscopy. MPI is a type of laser-scanning microscopy that employs localized nonlinear excitation, so that fluorescence is excited only within the scanned focal volume. For many years, Ti:sapphire femtosecond lasers have been the leading light sources for MPI applications. However, recent developments in laser sources and new types of fluorophores indicate that longer wavelength excitation could be a good alternative for these applications. Mode-locked VECSELs have the potential to be low-cost, compact light sources for MPI systems, with the additional advantage of broad wavelength coverage through use of different semiconductor material systems. Here, we use a femtosecond fiber laser to investigate the effect that average power and repetition rate have on MPI image quality, to allow us to optimize our mode-locked VECSELs for MPI.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murase, Kenya, E-mail: murase@sahs.med.osaka-u.ac.jp; Song, Ruixiao; Hiratsuka, Samu
We investigated the feasibility of visualizing blood coagulation using a system for magnetic particle imaging (MPI). A magnetic field-free line is generated using two opposing neodymium magnets and transverse images are reconstructed from the third-harmonic signals received by a gradiometer coil, using the maximum likelihood-expectation maximization algorithm. Our MPI system was used to image the blood coagulation induced by adding CaCl₂ to whole sheep blood mixed with magnetic nanoparticles (MNPs). The "MPI value" was defined as the pixel value of the transverse image reconstructed from the third-harmonic signals. MPI values were significantly smaller for coagulated blood samples than those without coagulation. We confirmed the rationale of these results by calculating the third-harmonic signals for the measured viscosities of samples, with an assumption that the magnetization and particle size distribution of MNPs obey the Langevin equation and log-normal distribution, respectively. We concluded that MPI can be useful for visualizing blood coagulation.
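The harmonic calculation the authors describe (a Langevin magnetization curve driven by a sinusoidal field, with the third harmonic extracted) can be illustrated numerically. This is a minimal sketch in normalized units; the field amplitude and sample count are arbitrary illustrative values, not the experimental parameters of the study:

```python
import math

def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x, with a series expansion near 0."""
    if abs(x) < 1e-4:
        return x / 3.0 - x ** 3 / 45.0
    return 1.0 / math.tanh(x) - 1.0 / x

def harmonic_amplitude(n, h0=5.0, samples=4096):
    """Magnitude of the n-th harmonic of the Langevin response to a sine drive,
    computed by a direct DFT over one drive period (normalized units)."""
    re = im = 0.0
    for k in range(samples):
        t = k / samples
        m = langevin(h0 * math.sin(2 * math.pi * t))
        re += m * math.cos(2 * math.pi * n * t)
        im += m * math.sin(2 * math.pi * n * t)
    return 2.0 * math.hypot(re, im) / samples
```

Because the Langevin curve is odd, only odd harmonics appear; the third harmonic used for image reconstruction is nonzero whenever the drive pushes the particles toward saturation.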
Facilitating Co-Design for Extreme-Scale Systems Through Lightweight Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engelmann, Christian; Lauer, Frank
This work focuses on tools for investigating algorithm performance at extreme scale with millions of concurrent threads and for evaluating the impact of future architecture choices to facilitate the co-design of high-performance computing (HPC) architectures and applications. The approach focuses on lightweight simulation of extreme-scale HPC systems with the needed amount of accuracy. The prototype presented in this paper is able to provide this capability using a parallel discrete event simulation (PDES), such that a Message Passing Interface (MPI) application can be executed at extreme scale, and its performance properties can be evaluated. The results of an initial prototype are encouraging as a simple 'hello world' MPI program could be scaled up to 1,048,576 virtual MPI processes on a four-node cluster, and the performance properties of two MPI programs could be evaluated at up to 16,384 virtual MPI processes on the same system.
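The virtual-process idea can be sketched in a few lines: an event queue advances virtual time while simulated MPI ranks forward a broadcast along a binomial tree, so collective completion time can be measured without any real hardware. A toy discrete-event sketch (not the paper's simulator; the unit hop cost is an illustrative placeholder):

```python
import heapq

def simulate_bcast(nranks, hop_cost=1.0):
    """Discrete-event simulation of a binomial-tree broadcast over `nranks`
    virtual MPI ranks. Each rank, once it holds the message, forwards it to
    peers at doubling distances, one send per time step. Returns the virtual
    time at which the last rank receives the message."""
    pq = [(0.0, 0)]                       # (virtual receive time, rank)
    recv_time = {}
    while pq:
        t, rank = heapq.heappop(pq)
        if rank in recv_time:
            continue
        recv_time[rank] = t
        d = 1
        while d <= rank:                  # smallest power of two above rank
            d *= 2
        step = 1
        while rank + d < nranks:          # serialized sends to tree children
            heapq.heappush(pq, (t + step * hop_cost, rank + d))
            d *= 2
            step += 1
    return max(recv_time.values())
```

Even a million virtual ranks only cost about a million queue events, which hints at why lightweight simulation scales so far on a small cluster.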
Evaluating and extending user-level fault tolerance in MPI applications
Laguna, Ignacio; Richards, David F.; Gamblin, Todd; ...
2016-01-11
The user-level failure mitigation (ULFM) interface has been proposed to provide fault-tolerant semantics in the Message Passing Interface (MPI). Previous work presented performance evaluations of ULFM; yet questions related to its programmability and applicability, especially to non-trivial, bulk synchronous applications, remain unanswered. In this article, we present our experiences using ULFM in a case study with a large, highly scalable, bulk synchronous molecular dynamics application to shed light on the advantages and difficulties of this interface to program fault-tolerant MPI applications. We found that, although ULFM is suitable for master–worker applications, it provides few benefits for more common bulk-synchronous MPI applications. Furthermore, to address these limitations, we introduce a new, simpler fault-tolerant interface for complex, bulk synchronous MPI programs with better applicability and support than ULFM for application-level recovery mechanisms, such as global rollback.
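The global-rollback recovery style the article favors for bulk-synchronous codes can be illustrated with a toy loop: all "ranks" periodically checkpoint a consistent state, and any failure rolls everyone back to the last checkpoint. This is an illustrative Python sketch of the recovery pattern, not the proposed interface; all names and parameters are ours:

```python
import copy
import random

def run_with_rollback(steps, checkpoint_every=10, fail_prob=0.05, seed=1):
    """Bulk-synchronous loop with application-level global rollback:
    checkpoint the whole state every few iterations; on a (simulated)
    failure, restore the last checkpoint and resume. Because checkpoints
    are globally consistent, the final result is unaffected by failures."""
    rng = random.Random(seed)
    state = {"iter": 0, "acc": 0.0}
    checkpoint = copy.deepcopy(state)
    while state["iter"] < steps:
        if state["iter"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)      # global checkpoint
        if rng.random() < fail_prob:               # simulated rank failure
            state = copy.deepcopy(checkpoint)      # global rollback
            continue
        state["acc"] += state["iter"]              # one synchronous step
        state["iter"] += 1
    return state["acc"]
```

Failures cost repeated work but never corrupt the answer, which is the property that makes this simpler than per-rank communicator repair.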
A Convex Formulation for Magnetic Particle Imaging X-Space Reconstruction.
Konkle, Justin J; Goodwill, Patrick W; Hensley, Daniel W; Orendorff, Ryan D; Lustig, Michael; Conolly, Steven M
2015-01-01
Magnetic Particle Imaging (MPI) is an emerging imaging modality with exceptional promise for clinical applications in rapid angiography, cell therapy tracking, cancer imaging, and inflammation imaging. Recent publications have demonstrated quantitative MPI across rat-sized fields of view with x-space reconstruction methods. Critical to any medical imaging technology is the reliability and accuracy of image reconstruction. Because the average value of the MPI signal is lost during direct-feedthrough signal filtering, MPI reconstruction algorithms must recover this zero-frequency value. Prior x-space MPI recovery techniques were limited to 1D approaches which could introduce artifacts when reconstructing a 3D image. In this paper, we formulate x-space reconstruction as a 3D convex optimization problem and apply robust a priori knowledge of image smoothness and non-negativity to reduce non-physical banding and haze artifacts. We conclude with a discussion of the powerful extensibility of the presented formulation for future applications.
NASA Astrophysics Data System (ADS)
Keselman, Paul; Yu, Elaine Y.; Zhou, Xinyi Y.; Goodwill, Patrick W.; Chandrasekharan, Prashant; Ferguson, R. Matthew; Khandhar, Amit P.; Kemp, Scott J.; Krishnan, Kannan M.; Zheng, Bo; Conolly, Steven M.
2017-05-01
Magnetic particle imaging (MPI) is an emerging tracer-based medical imaging modality that images non-radioactive, kidney-safe superparamagnetic iron oxide (SPIO) tracers. MPI offers quantitative, high-contrast and high-SNR images, so MPI has exceptional promise for applications such as cell tracking, angiography, brain perfusion, cancer detection, traumatic brain injury and pulmonary imaging. In assessing MPI's utility for the applications mentioned above, it is important to be able to assess tracer short-term biodistribution as well as long-term clearance from the body. Here, we describe the biodistribution and clearance for two commonly used tracers in MPI: Ferucarbotran (Meito Sangyo Co., Japan) and LS-008 (LodeSpin Labs, Seattle, WA). We successfully demonstrate that 3D MPI is able to quantitatively assess short-term biodistribution, as well as long-term tracking and clearance of these tracers in vivo.
Reincarnation of Streaming Applications
2009-10-01
Introduction to Phase-Resolving Wave Modeling with FUNWAVE
2015-07-01
Boussinesq wave models have become a useful tool for modeling surface wave transformation from deep water to the swash zone, as well as wave-induced... overlapping area of ghost cells, three rows deep, as required by the fourth-order MUSCL-TVD scheme. MPI with nonblocking communication was used to... implemented. (ERDC/CHL CHETN-I-87, July 2015)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A. T.
2014-12-26
Avalaunch implements a tree-based process launcher. It first bootstraps itself on to a set of compute nodes by launching children processes, which immediately connect back to the parent process to acquire the info needed to launch their own children. Once the tree is established, user processes are started by broadcasting commands and application binaries through the tree. All communication flows over high-performance network protocols via spawnnet. The goal is to start MPI jobs having hundreds of thousands of processes within seconds.
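The tree-based fan-out behind such a launcher is easy to sketch: with fan-out k, each process starts k children, so the number of sequential launch waves grows only logarithmically with the process count. An illustrative Python model (function names and the k-ary numbering scheme are ours, not Avalaunch's actual topology):

```python
def children(rank, nprocs, degree=2):
    """Children of `rank` in a k-ary launch tree with the given fan-out."""
    first = rank * degree + 1
    return list(range(first, min(first + degree, nprocs)))

def launch_rounds(nprocs, degree=2):
    """Number of sequential launch waves needed to bootstrap the whole tree,
    assuming every running process launches its children in one wave."""
    rounds, frontier, started = 0, [0], 1
    while started < nprocs:
        nxt = []
        for r in frontier:
            kids = children(r, nprocs, degree)
            started += len(kids)
            nxt.extend(kids)
        frontier = nxt
        rounds += 1
    return rounds
```

With a binary tree, 65,536 processes need only 16 waves; raising the fan-out trades per-node launch work for even fewer waves.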
Using Modules with MPICH-G2 (and "Loose Ends")
NASA Technical Reports Server (NTRS)
Chang, Johnny; Thigpen, William W. (Technical Monitor)
2002-01-01
A new approach to running complex, distributed MPI jobs using the MPICH-G2 library is described. This approach allows the user to switch between different versions of compilers, system libraries, MPI libraries, etc. via the "module" command. The key idea is a departure from the prescribed "(jobtype=mpi)" approach to running distributed MPI jobs. The new method requires the user to provide a script that will be run as the "executable" with the "(jobtype=single)" RSL attribute. The major advantage of the proposed method is to enable users to decide in their own script what modules, environment, etc. they would like to have in running their job.
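The wrapper approach described above can be sketched as a short script; treat this as an illustrative config fragment — the modules init path, module names, `$NPROCS`, and the executable name are hypothetical placeholders, not values prescribed by the paper:

```shell
#!/bin/sh
# wrapper.sh -- submitted as the "executable" with the RSL attribute
# (jobtype=single) instead of (jobtype=mpi). Because this script runs
# before MPI starts, the user controls the environment per machine.
. /usr/local/modules/init/sh          # make the `module` command available (path illustrative)
module purge
module load gcc mpich-g2              # hypothetical module names; pick per-machine versions
exec mpirun -np "$NPROCS" ./solver    # launch the actual MPI executable
```

Since each site's copy of the script loads its own compiler and MPI modules, a heterogeneous distributed job no longer depends on one prescribed environment.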
A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hursey, Joshua J; Naughton, III, Thomas J; Vallee, Geoffroy R
The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. MPI does not provide standardized fault-tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.
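The core idea — piggybacking the commit decision on a tree reduction so agreement costs no extra messages — can be shown with a toy model: each rank contributes a local success flag, the flags are AND-reduced up a binary tree, and the decision is broadcast back down. An illustrative Python sketch (no real fault detection; a failed rank is modeled as `None` and simply forces abort):

```python
def agree(flags):
    """Toy log-depth agreement: AND-reduce local success flags up a binary
    tree (the 'vote' phase of two-phase commit folded into a reduction),
    then broadcast the decision so every rank learns the same outcome.
    `None` models a failed rank, which counts as an abort vote."""
    def reduce_tree(lo, hi):
        if hi - lo == 1:
            return flags[lo] is True
        mid = (lo + hi) // 2
        return reduce_tree(lo, mid) and reduce_tree(mid, hi)

    decision = reduce_tree(0, len(flags))
    return [decision] * len(flags)       # broadcast phase: uniform decision
```

The tree gives O(log n) depth, and because the vote rides on the reduction itself, no separate round of commit messages is needed — the property the paper's algorithm provides with real failure handling.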
New Trends in Radionuclide Myocardial Perfusion Imaging
Hung, Guang-Uei; Wang, Yuh-Feng; Su, Hung-Yi; Hsieh, Te-Chun; Ko, Chi-Lun; Yen, Ruoh-Fang
2016-01-01
Radionuclide myocardial perfusion imaging (MPI) with single photon emission computed tomography (SPECT) has been widely used clinically as one of the major functional imaging modalities for patients with coronary artery disease (CAD) for decades. Ample evidence has supported the use of MPI as a useful and important tool in the diagnosis, risk stratification and treatment planning for CAD. As in the United States, MPI has become the most frequently used imaging modality among all nuclear medicine tests in Taiwan. However, it should be acknowledged that MPI SPECT does have its limitations. These include false-positive results due to certain artifacts, false-negative results due to balanced ischemia, complexity and adverse reactions arising from current pharmacological stressors, the time-consuming nature of the imaging procedure, lack of blood flow quantitation, and relatively high radiation exposure. The purpose of this article was to review recent trends in nuclear cardiology, including the utilization of positron emission tomography (PET) for MPI, new stressors, new SPECT cameras with higher resolution and higher sensitivity, dynamic SPECT protocols for blood flow quantitation, new software for phase analysis in the evaluation of LV dyssynchrony, and measures for reducing the radiation exposure of MPI. PMID:27122946
Relaxation-based viscosity mapping for magnetic particle imaging.
Utkur, M; Muslu, Y; Saritas, E U
2017-05-07
Magnetic particle imaging (MPI) has been shown to provide remarkable contrast for imaging applications such as angiography, stem cell tracking, and cancer imaging. Recently, there is growing interest in the functional imaging capabilities of MPI, where 'color MPI' techniques have explored separating different nanoparticles, which could potentially be used to distinguish nanoparticles in different states or environments. Viscosity mapping is a promising functional imaging application for MPI, as increased viscosity levels in vivo have been associated with numerous diseases such as hypertension, atherosclerosis, and cancer. In this work, we propose a viscosity mapping technique for MPI through the estimation of the relaxation time constant of the nanoparticles. Importantly, the proposed time constant estimation scheme does not require any prior information regarding the nanoparticles. We validate this method with extensive experiments in an in-house magnetic particle spectroscopy (MPS) setup at four different frequencies (between 250 Hz and 10.8 kHz) and at three different field strengths (between 5 mT and 15 mT) for viscosities ranging between 0.89 mPa·s and 15.33 mPa·s. Our results demonstrate the viscosity mapping ability of MPI in the biologically relevant viscosity range.
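A relaxation time constant can be estimated from decay data without knowing anything about the particles beforehand, which is the spirit of the approach above. A minimal sketch, assuming a simple exponential decay s(t) = A·exp(-t/τ) and a log-linear least-squares fit (the article's estimator operates on MPS harmonic data, not a bare decay, so this is illustrative only):

```python
import math

def estimate_tau(times, signal):
    """Estimate the relaxation time constant tau from samples of
    s(t) = A * exp(-t / tau): fit a line to log s(t); tau = -1/slope.
    Requires strictly positive signal samples."""
    ys = [math.log(s) for s in signal]
    n = len(times)
    mx = sum(times) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(times, ys)) \
        / sum((x - mx) ** 2 for x in times)
    return -1.0 / slope
```

No amplitude, particle size, or field parameter enters the fit — only the measured decay — mirroring the "no prior information" property claimed for the proposed scheme.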
Development of training modules for magnetic particle inspection
NASA Astrophysics Data System (ADS)
Kosaka, Daigo; Eisenmann, David J.; Enyart, Darrel; Nakagawa, Norio; Lo, Chester; Orman, David
2015-03-01
Magnetic particle inspection (MPI) is a nondestructive evaluation technique used with ferromagnetic materials. Although the application of this method may appear straightforward, MPI combines the complicated nature of electromagnetics, metallurgical material effects, fluid-particle motion dynamics, and physiological human factors into a single inspection. To fully appreciate industry specifications such as ASTM E-1444, users should develop a basic understanding of the many factors that are involved in MPI. We have developed a series of MPI training modules that are aimed at addressing this requirement. The modules not only offer qualitative explanations, but also show quantitative explanations in terms of measurement and numerical simulation data in many instances. There are five modules in all. Module #1 shows the characteristics of waveforms and magnetizing methods, allowing MPI practitioners to make an optimum choice of waveform and magnetizing method. Module #2 explains how material properties relate to magnetic characteristics. Module #3 relates the strength of the excitation field, or the flux leakage from a crack, to the detectability of that crack by MPI. Module #4 shows how specimen status may influence defect detection. Module #5 shows the effects of particle properties on defect detection.
Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization
NASA Astrophysics Data System (ADS)
Berg, Bernd A.; Wu, Hao
2012-10-01
We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors. Program summary Program title: STMC2LSU3MPI Catalogue identifier: AEMJ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 26666 No. of bytes in distributed program, including test data, etc.: 233126 Distribution format: tar.gz Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions. Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions. Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0. Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions. Number of processors used: 2 to 11664. RAM: 200 megabytes per process.
Classification: 11.5. Nature of problem: Physics of pure SU(3) Quantum Field Theory (QFT). This is relevant for our understanding of Quantum Chromodynamics (QCD). It includes the glueball spectrum, topological properties and the deconfining phase transition of pure SU(3) QFT. For instance, Relativistic Heavy Ion Collision (RHIC) experiments at the Brookhaven National Laboratory provide evidence that quarks confined in hadrons undergo at high enough temperature and pressure a transition into a Quark-Gluon Plasma (QGP). Investigations of its thermodynamics in pure SU(3) QFT are of interest. Solution method: Markov Chain Monte Carlo (MCMC) simulations of SU(3) Lattice Gauge Theory (LGT) with the Wilson action. This is a regularization of pure SU(3) QFT on a hypercubic lattice, which allows approaching the continuum SU(3) QFT by means of Finite Size Scaling (FSS) studies. Specifically, we provide updating routines for the Cabibbo-Marinari heatbath with and without checkerboard parallelization. While the first is suitable for pedagogical purposes and small scale projects, the latter allows for efficient parallel processing. Targeting the geometry of RHIC experiments, we have implemented a Double-Layered Torus (DLT) lattice geometry, which has previously not been used in LGT MCMC simulations and enables inside and outside layers at distinct temperatures, the lower-temperature layer acting as the outside boundary for the higher-temperature layer, where the deconfinement transition goes on. Restrictions: The checkerboard partition of the lattice makes the development of measurement programs more tedious than is the case for an unpartitioned lattice. Presently, only one measurement routine for Polyakov loops is provided. Unusual features: We provide three different versions for the send/receive function of the MPI library, which work for different operating system + compiler + MPI combinations.
This involves activating the correct row in the last three rows of our latmpi.par parameter file. The underlying reason is distinct buffer conventions. Running time: For a typical run using an Intel i7 processor, it takes (1.8-6) E-06 seconds to update one link of the lattice, depending on the compiler used. For example, if we do a simulation on a small (4 × 8³) DLT lattice with a statistics of 221 sweeps (i.e., update the two lattice layers of 4 × (4 × 8³) links each 221 times), the total CPU time needed can be 2 × 4 × (4 × 8³) × 221 × 6 E-06 seconds = 1.7 minutes, where 2 is the number of lattice layers, 4 the number of dimensions, 4 × 8³ the lattice size, 221 the number of update sweeps, and 6 E-06 s the average time to update one link variable. If we divide the job into 8 parallel processes, then the real time is (for negligible communication overhead) 1.7 mins / 8 = 0.2 mins.
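The checkerboard partition central to the parallelization above can be sketched independently of the Fortran code: color each lattice site by the parity of its coordinate sum, so that all sites of one color have neighbors only of the other color and can be heatbath-updated concurrently. A Python illustration (the function names and the generic cost formula are ours; the cost helper just multiplies out a link count, as in the running-time example's style):

```python
from itertools import product

def checkerboard(dims):
    """Split the sites of a hypercubic lattice with extents `dims` into
    even/odd sublattices by coordinate-sum parity. Nearest neighbors always
    have opposite parity, so each color can be updated in parallel."""
    even, odd = [], []
    for site in product(*(range(d) for d in dims)):
        (even if sum(site) % 2 == 0 else odd).append(site)
    return even, odd

def update_cost_seconds(layers, volume, ndims, sweeps, t_link):
    """Serial CPU-time estimate: ndims links per site, `volume` sites per
    layer, `layers` layers, `sweeps` full update sweeps."""
    return layers * ndims * volume * sweeps * t_link
```

Dividing the work over p processes divides the wall time by roughly p, as in the abstract, provided the checkerboard boundary exchange stays cheap relative to the link updates.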
Cazet, Aurélie; Charest, Jonathan; Bennett, Daniel C; Sambrooks, Cecilia Lopez; Contessa, Joseph N
2014-01-01
Asparagine-linked glycosylation is an endoplasmic reticulum co- and post-translational modification that enables the transit and function of receptor tyrosine kinase (RTK) glycoproteins. To gain insight into the regulatory role of glycosylation enzymes on RTK function, we investigated shRNA and siRNA knockdown of mannose phosphate isomerase (MPI), an enzyme required for mature glycan precursor biosynthesis. Loss of MPI activity reduced phosphorylation of FGFR family receptors in U-251 and SKMG-3 malignant glioma cell lines and also resulted in significant decreases in FRS2, Akt, and MAPK signaling. However, MPI knockdown did not affect ligand-induced activation or signaling of EGFR or MET RTKs, suggesting that FGFRs are more susceptible to MPI inhibition. The reductions in FGFR signaling were not caused by loss of FGF ligands or receptors, but instead were caused by interference with receptor dimerization. Investigations into the cellular consequences of MPI knockdown showed that cellular programs driven by FGFR signaling, and integral to the clinical progression of malignant glioma, were impaired. In addition to a blockade of cellular migration, MPI knockdown also significantly reduced glioma cell clonogenic survival following ionizing radiation. Therefore our results suggest that targeted inhibition of enzymes required for cell surface receptor glycosylation can be manipulated to produce discrete and limited consequences for critical client glycoproteins expressed by tumor cells. Furthermore, this work identifies MPI as a potential enzymatic target for disrupting cell surface receptor-dependent survival signaling and as a novel approach for therapeutic radiosensitization.
The Particle Accelerator Simulation Code PyORBIT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorlov, Timofey V; Holmes, Jeffrey A; Cousineau, Sarah M
2015-01-01
The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code, which was developed for the Spallation Neutron Source accelerator at Oak Ridge National Laboratory. The PyORBIT code has a two-level structure: the upper level uses the Python programming language to control the flow of intensive calculations performed by the lower-level code implemented in C++. The parallel capabilities are based on MPI communications. PyORBIT is an open source code accessible to the public through the Google Open Source Projects Hosting service.
Eddy current-shielded x-space relaxometer for sensitive magnetic nanoparticle characterization
Bauer, L. M.; Hensley, D. W.; Zheng, B.; Tay, Z. W.; Goodwill, P. W.; Griswold, M. A.; Conolly, S. M.
2016-01-01
The development of magnetic particle imaging (MPI) has created a need for optimized magnetic nanoparticles. Magnetic particle relaxometry is an excellent tool for characterizing potential tracers for MPI. In this paper, we describe the design and construction of a high-throughput tabletop relaxometer that is able to make sensitive measurements of MPI tracers without the need for a dedicated shield room. PMID:27250472
[Peritonitis in diverticulitis: the Bern concept].
Seiler, C A; Brügger, L; Maurer, C A; Renzulli, P; Büchler, M W
1998-01-01
The colon is the most frequent origin of diffuse peritonitis, and diverticular perforation is in turn the most common source of spontaneous secondary peritonitis. This paper focuses first on the treatment of peritonitis and secondly on the strategies of source control in peritonitis, with special emphasis on the tactics (primary anastomosis vs. Hartmann procedure with colostomy) of surgical source control. Prospective analysis of 404 patients suffering from peritonitis (11/93-2/98), treated with a uniform treatment concept including early operation, source control and extensive intraoperative lavage (20 to 30 liters) as a standard procedure; other treatment measures were added "on demand" for special indications only. Peritonitis was graded with the Mannheim Peritonitis Index (MPI). Tactics of source control in peritonitis due to diverticulitis were chosen according to the "general condition", respectively the MPI, of the patient. The 404 patients had an average MPI of 19 (0-35) in "local" peritonitis and of 26 (11-43) in "diffuse" peritonitis. The colon as a source of peritonitis resulted in an MPI of 16 (0-33) in "local" and 27 (11-43) in "diffuse" peritonitis. Of 181 patients suffering from diverticulitis, 144 needed an operation, and in 78 (54%) peritonitis was present. Forty-six percent (36) of these patients suffered from "local" and 54% (42) from "diffuse" peritonitis. Resection with primary anastomosis was performed in 26% (20/78), whereas in 74% (58/78) of the patients a Hartmann procedure with colostomy was performed; the corresponding MPI was 16 (0-28) vs. 23 (16-27). The analysis of complications and mortality based on the MPI showed a decent discrimination potential for primary anastomosis vs. Hartmann procedure: morbidity 35% vs. 41%; reoperation 5% vs. 5%; mortality 0% vs. 14%. In the case of peritonitis due to diverticulitis, the treatment of peritonitis comes first.
Thanks to advances in intensive care and improved anti-inflammatory care, a more conservative surgical concept is nowadays accepted. In the case of diverticulitis, the MPI is helpful in choosing between primary anastomosis and the Hartmann procedure with colostomy for source control. The MPI incorporates the "general condition" of the patient into the tactical decision on how to attain source control.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
ERIC Educational Resources Information Center
Vincent, Claudia; Tobin, Tary; Van Ryzin, Mark
2017-01-01
The Native Community strongly recommends integrating Native language and culture (NLC) into reading instruction to improve outcomes for American Indian/Alaska Native (AI/AN) students. However, little is known about the extent to which recommended practices are used and what might facilitate their implementation. The National Indian Education Study…
Improved field free line magnetic particle imaging using saddle coils.
Erbe, Marlitt; Sattel, Timo F; Buzug, Thorsten M
2013-12-01
Magnetic particle imaging (MPI) is a novel tracer-based imaging method detecting the distribution of superparamagnetic iron oxide (SPIO) nanoparticles in vivo in three dimensions and in real time. Conventionally, MPI uses the signal emitted by SPIO tracer material located at a field free point (FFP). To increase the sensitivity of MPI, however, an alternative encoding scheme collecting the particle signal along a field free line (FFL) was proposed. To provide the magnetic fields needed for line imaging in MPI, a scanner setup that is highly efficient with respect to electrical power consumption is needed. At the same time, the scanner needs to provide high magnetic field homogeneity along the FFL as well as parallel to its alignment, to prevent the appearance of artifacts when the efficient Radon-based reconstruction methods arising for a line encoding scheme are used. This work presents a dynamic FFL scanner setup for MPI that outperforms all previously presented setups in electrical power consumption as well as magnetic field quality.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2017-01-01
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
NASA Astrophysics Data System (ADS)
Chen, M.; Wei, S.
2016-12-01
The serious damage to Mexico City caused by the 1985 Michoacan earthquake 400 km away indicates that urban areas may be affected by remote earthquakes. To assess the earthquake risk imposed on urban areas by distant earthquakes, we developed a hybrid Frequency-Wavenumber (FK) and Finite Difference (FD) code implemented with MPI, since the computation of seismic wave propagation from a distant earthquake using a single numerical method (e.g., Finite Difference, Finite Element or Spectral Element) is very expensive. In our approach, we compute the incident wave field (ud) at the boundaries of the excitation box, which surrounds the local structure, using a parallelized FK method (Zhu and Rivera, 2002), and compute the total wave field (u) within the excitation box using a parallelized 2D FD method. We apply a perfectly matched layer (PML) absorbing condition to the diffracted wave field (u-ud). Compared to the previous Generalized Ray Theory and Finite Difference (Wen and Helmberger, 1998), Frequency Wavenumber and Spectral Element (Tong et al., 2014), and Direct Solution Method and Spectral Element hybrid methods (Monteiller et al., 2013), our absorbing boundary condition dramatically suppresses the numerical noise. The MPI implementation of our method can greatly speed up the calculation. In addition, our hybrid method has a potential use in high resolution array imaging similar to Tong et al. (2014).
Performance of OVERFLOW-D Applications based on Hybrid and MPI Paradigms on IBM Power4 System
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biegel, Bryan (Technical Monitor)
2002-01-01
This report briefly discusses our preliminary performance experiments with parallel versions of OVERFLOW-D applications. These applications are based on MPI and hybrid paradigms on the IBM Power4 system here at the NAS Division. This work is part of an effort to determine the suitability of the system and its parallel libraries (MPI/OpenMP) for specific scientific computing objectives.
: A Scalable and Transparent System for Simulating MPI Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2010-01-01
The system is scalable and transparent, allowing experimentation with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features include repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source code is available. The set of supported source-code interfaces is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, the system has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source-code form. Low slowdowns are observed, owing to the purely discrete-event style of execution and to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, the system has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.
Assessment of Fetal Myocardial Performance Index in Women with Placenta Previa.
Zhang, Na; Sun, Lijuan; Zhang, Lina; Li, Zhen; Han, Jijing; Wu, Qingqing
2017-12-15
BACKGROUND This study investigated whether fetuses of placenta previa pregnancies have cardiac dysfunction by use of a modified myocardial performance index (Mod-MPI). MATERIAL AND METHODS A prospective cross-sectional study was conducted including 178 fetuses at 28-40 weeks of gestation. Eighty-nine fetuses of mothers with placenta previa and without pregnancy complications were recruited (placenta previa group) and matched with 89 fetuses of mothers with normal pregnancies (control group). Fetal cardiac function parameters and perinatal outcomes as well as the Mod-MPI were compared between the 2 groups. RESULTS The median Mod-MPI was significantly increased in fetuses of mothers with placenta previa compared with controls (0.47±0.05 vs. 0.45±0.05; P<0.01). Among fetuses of mothers with or without placenta previa, the Mod-MPI was significantly higher in the incomplete placenta previa group compared with the complete placenta previa group and control group (P<0.01). An increased Mod-MPI in placenta previa pregnancies was independently associated with fetal cord pH <7.2 (odds ratio, 4.8; 95% confidence interval, 0.98-23.54; P=0.003). CONCLUSIONS There is impairment of fetal cardiac function in pregnancies with placenta previa. An increased MPI was independently associated with adverse perinatal outcomes to some extent in the placenta previa pregnancies.
Qinghua, Zhao; Jipeng, Li; Yongxing, Zhang; He, Liang; Xuepeng, Wang; Peng, Yan; Xiaofeng, Wu
2015-04-07
To employ three-dimensional finite element modeling and biomechanical simulation to evaluate the stability and stress conduction of two postoperative internal fixation models, multilevel posterior instrumentation (MPI) and MPI with anterior instrumentation (MPAI), after en bloc resection of a cervicothoracic vertebral tumor. Mimics software and computed tomography (CT) images were used to establish a three-dimensional (3D) model of vertebrae C5-T2, and C7 en bloc vertebral resection was simulated for the MPI and MPAI models. The data and images were then imported into the ANSYS finite element system, and a 20 N distributed load (simulating body weight) and a 1 N·m torque at the neutral point were applied to simulate vertebral displacement and stress conduction and distribution in the different motion modes, i.e., flexion, extension, lateral bending and rotation. Indicating better stability, the displacement of the two adjacent vertebral bodies in the MPI and MPAI models was less than that of the intact vertebral model, with no significant differences between the two models. As for reduction of the stress shielding effect, however, MPI was slightly better than MPAI. From a biomechanical point of view, both internal instrumentations with en bloc resection of a cervicothoracic tumor may achieve excellent stability, with no significant differences between them; but with its better stress conduction, MPI is more advantageous for postoperative reconstruction.
Zheng, Bo; von See, Marc P.; Yu, Elaine; Gunel, Beliz; Lu, Kuan; Vazin, Tandis; Schaffer, David V.; Goodwill, Patrick W.; Conolly, Steven M.
2016-01-01
Stem cell therapies have enormous potential for treating many debilitating diseases, including heart failure, stroke and traumatic brain injury. For maximal efficacy, these therapies require targeted cell delivery to specific tissues followed by successful cell engraftment. However, targeted delivery remains an open challenge. As one example, it is common for intravenous deliveries of mesenchymal stem cells (MSCs) to become entrapped in lung microvasculature instead of the target tissue. Hence, a robust, quantitative imaging method would be essential for developing efficacious cell therapies. Here we show that Magnetic Particle Imaging (MPI), a novel technique that directly images iron-oxide nanoparticle-tagged cells, can longitudinally monitor and quantify MSC administration in vivo. MPI offers near-ideal image contrast, depth penetration, and robustness; these properties make MPI both ultra-sensitive and linearly quantitative. Here, we imaged, for the first time, the dynamic trafficking of intravenous MSC administrations using MPI. Our results indicate that labeled MSC injections are immediately entrapped in lung tissue and then clear to the liver within one day, whereas standard iron oxide particle (Resovist) injections are immediately taken up by liver and spleen. Longitudinal MPI-CT imaging also indicated a clearance half-life of MSC iron oxide labels in the liver at 4.6 days. Finally, our ex vivo MPI biodistribution measurements of iron in liver, spleen, heart, and lungs after injection showed excellent agreement (R2 = 0.943) with measurements from induction coupled plasma spectrometry. These results demonstrate that MPI offers strong utility for noninvasively imaging and quantifying the systemic distribution of cell therapies and other therapeutic agents. PMID:26909106
Min, James K; Hasegawa, James T; Machacz, Susanne F; O'Day, Ken
2016-02-01
This study compared costs and clinical outcomes of invasive versus non-invasive diagnostic evaluations for patients with suspected in-stent restenosis (ISR) after percutaneous coronary intervention. We developed a decision model to compare 2-year diagnosis-related costs for patients who presented with suspected ISR and were evaluated by: (1) invasive coronary angiography (ICA); (2) a non-invasive stress testing strategy of myocardial perfusion imaging (MPI) with referral to ICA based on MPI; (3) a coronary CT angiography-based testing strategy with referral to ICA based on CCTA. Costs were modeled from the payer's perspective using 2014 Medicare rates. 56% of patients underwent follow-up diagnostic testing over 2 years. Compared to ICA, MPI (98.6%) and CCTA (98.1%) exhibited lower rates of correct diagnoses. Non-invasive strategies were associated with reduced referrals to ICA and reduced costs compared to an ICA-based strategy, with diagnostic costs lower for CCTA than MPI. Overall 2-year costs were highest for ICA for both metallic and BVS stents ($1656 and $1656, respectively) when compared to MPI ($1444 and $1411) and CCTA. CCTA costs differed based upon stent size and type, and were highest for metallic stents >3.0 mm, followed by metallic stents <3.0 mm, BVS <3.0 mm and BVS >3.0 mm ($1466 vs. $1242 vs. $855 vs. $490, respectively). MPI for suspected ISR results in lower costs and rates of complications than invasive strategies using ICA while maintaining high diagnostic performance. Depending upon stent size and type, CCTA results in lower costs than MPI.
Einstein, Andrew J.; Weiner, Shepard D.; Bernheim, Adam; Kulon, Michal; Bokhari, Sabahat; Johnson, Lynne L.; Moses, Jeffrey W.; Balter, Stephen
2013-01-01
Context Myocardial perfusion imaging (MPI) is the single medical test with the highest radiation burden to the US population. While many patients undergoing MPI receive repeat MPI testing, or additional procedures involving ionizing radiation, no data are available characterizing their total longitudinal radiation burden and relating radiation burden with reasons for testing. Objective To characterize procedure counts, cumulative estimated effective doses of radiation, and clinical indications, for patients undergoing MPI. Design, Setting, Patients Retrospective cohort study evaluating, for 1097 consecutive patients undergoing index MPI during the first 100 days of 2006 at Columbia University Medical Center, all preceding medical imaging procedures involving ionizing radiation undergone beginning October 1988, and all subsequent procedures through June 2008, at that center. Main Outcome Measures Cumulative estimated effective dose of radiation, number of procedures involving radiation, and indications for testing. Results Patients underwent a median (interquartile range, mean) of 15 (6–32, 23.9) procedures involving radiation exposure; 4 (2–8, 6.5) were high-dose (≥3 mSv, i.e. one year's background radiation), including 1 (1–2, 1.8) MPI studies per patient. 31% of patients received cumulative estimated effective dose from all medical sources >100mSv. Multiple MPIs were performed in 39% of patients, for whom cumulative estimated effective dose was 121 (81–189, 149) mSv. Men and whites had higher cumulative estimated effective doses, and there was a trend towards men being more likely to undergo multiple MPIs than women (40.8% vs. 36.6%, Odds ratio 1.29, 95% confidence interval 0.98–1.69). Over 80% of initial and 90% of repeat MPI exams were performed in patients with known cardiac disease or symptoms consistent with it. 
Conclusion In this institution, multiple testing with MPI was very common, and in many patients associated with very high cumulative estimated doses of radiation. PMID:21078807
Meybeck, Michel; Horowitz, A.J.; Grosbois, C.
2004-01-01
Spatial patterns (1994-2001) and temporal trends (1980-2000) for particulate-associated metals at key stations in the Seine River Basin have been determined using a new metal pollution index (MPI). The MPI is based on the concentrations of Cd, Cu, Hg, Pb and Zn, normalized to background levels calculated for each particulate matter sample from four fractions (clays and other aluminosilicates, carbonates, organic matter, and quartz). The background levels ascribed to each fraction were determined from a specific set of samples collected from relatively pristine areas in the upper Seine basin and validated on prehistoric samples. The unitless MPI is designed to vary from 0 for pristine samples to 100 for samples extremely impacted by human activities, and to permit the assessment and mapping of trends in general metal contamination. Throughout the Seine basin, MPI values currently range from 1 to 40, but values exceeding 100 have been found in periurban streams and the Eure tributary. Based on the spatial distribution of the MPI, the Seine River Basin displays a wide range of anthropogenic impacts linked to variations in population density, stream order, wastewater discharges and industrial activities. Correlations between the MPI and other trace elements indicate that anthropogenic impacts also strongly affect the concentrations of Ag, Sb, and P, marginally affect the concentrations of Ba, Ni, and Cr, and appear to have little effect on the concentrations of Li, Be, V, Co, and the major elements. Temporal MPI trends can also be reconstructed from past regulatory surveys. In the early 1980s, MPI values were 2-5 times higher than today at most locations, particularly downstream of Greater Paris, where the index reached levels as high as 250 (now 40), a value characteristic of present-day Paris urban sewage. The exceptional contamination of the Seine basin has been gradually improving over the last 20 years but remains very high. © 2004 Elsevier B.V. All rights reserved.
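The abstract describes the pollution index only qualitatively (metal concentrations normalized to background, with 0 for pristine samples). The actual formula is not given, so the following is a hypothetical sketch, not Meybeck et al.'s definition: the background values are invented for illustration, the mean-excess-enrichment form is an assumption, and the scale is open-ended rather than capped at 100.

```python
# Hypothetical sketch of a background-normalized metal pollution index in
# the spirit described above. The five metals match the abstract; the
# background concentrations (mg/kg) and the index formula itself are
# ASSUMED for illustration only -- the real MPI formula is not given here.

BACKGROUND = {"Cd": 0.2, "Cu": 15.0, "Hg": 0.03, "Pb": 20.0, "Zn": 60.0}

def pollution_index(sample: dict) -> float:
    """Mean excess enrichment over background; 0 for a pristine sample."""
    excess = [(sample[m] / BACKGROUND[m]) - 1.0 for m in BACKGROUND]
    return max(0.0, sum(excess) / len(excess))

pristine = dict(BACKGROUND)  # concentrations exactly at background
urban = {"Cd": 2.0, "Cu": 90.0, "Hg": 0.3, "Pb": 200.0, "Zn": 480.0}

print(pollution_index(pristine))           # 0.0 by construction
print(round(pollution_index(urban), 1))    # elevated for enriched sample
```

Whatever its exact form, an index of this kind lets heterogeneous samples be compared on one scale, which is what makes the basin-wide mapping and 20-year trend reconstruction described above possible.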
Yıldırım Poyraz, Nilüfer; Özdemir, Elif; Poyraz, Barış Mustafa; Kandemir, Zuhal; Keskin, Mutlay; Türkölmez, Şeyda
2014-01-01
Objective: The aim of this study was to investigate the relationship between patient characteristics and adenosine-related side-effects during stress myocardial perfusion imaging (MPI). The effect of the presence of adenosine-related side-effects on the diagnostic value of MPI with an integrated SPECT/CT system for coronary artery disease (CAD) was also assessed. Methods: A total of 281 patients (109 M, 172 F; mean age 62.6±10 years) who underwent the standard adenosine stress protocol for MPI were included in this study. All symptoms during adenosine infusion were scored according to severity and duration. To estimate the diagnostic value of adenosine MPI with the integrated SPECT/CT system, coronary angiography (CAG) or clinical follow-up was used as the gold standard. Results: A total of 173 patients (61.6%) experienced adenosine-related side-effects (group 1); flushing, dyspnea, and chest pain were the most common. The other 108 patients completed the pharmacologic stress (PS) test without any side-effects (group 2). Test tolerability was similar in patients with cardiovascular or airway disease and in the others; however, dyspnea was observed significantly more often in patients with mild airway disease. Body mass index (BMI) ≥30 kg/m2 and age ≤45 years were independent predictors of side-effects. The diagnostic value of MPI was similar in both groups. The sensitivity of adenosine MPI SPECT/CT was calculated to be 86%, the specificity 94%, and the diagnostic accuracy 92% for the diagnosis of CAD. Conclusion: Adenosine MPI is a feasible and well tolerated method in patients who are not suitable for an exercise stress test, as well as in patients with cardiopulmonary disease. Although age ≤45 years and BMI ≥30 kg/m2 are positive predictors of adenosine-related side-effects, the diagnostic value of adenosine MPI SPECT/CT is not affected by their presence. PMID:25541932
MPI investigation for 40G NRZ link with low-RL cable assemblies
NASA Astrophysics Data System (ADS)
Satake, Toshiaki; Berdinskikh, Tatiana; Thongdaeng, Rutsuda; Faysanyo, Pitak; Gurreri, Michael
2017-01-01
Bit error ratio (BER) dependence on received power was studied for 40 Gb/s NRZ short optical fiber transmission, including a series of four low return loss (RL 21 dB) and low insertion loss (IL 0.1 dB) connections. The calculated power penalty (PP) was 0.15 dB for a BER of 10-11. Although the fiber length was within the DFB laser's coherence length of 100 m and the multi-path interference (MPI) value was 34.3 dB, no BER power penalty was observed. No MPI-induced PP appeared, probably because the polarization of the signal pulses was not aligned for optical interference, indicating that NRZ systems have a high resistance to MPI.
Shelat, Vishal G; Ahmed, Saleem; Chia, Clement L K; Cheah, Yee Lee
2015-02-01
Application of minimal access surgery in acute care surgery is limited for various reasons. Laparoscopic omental patch repair (LOPR) for perforated peptic ulcer (PPU) is safe and feasible but not widely implemented. We report our early experience with LOPR, with emphasis on strict selection criteria. This is a descriptive study of all patients operated on for PPU at academic university-affiliated institutes from December 2010 to February 2012. All patients who underwent LOPR were included as the study population and their records were studied. Perioperative outcomes, Boey score, Mannheim Peritonitis Index (MPI), and physiologic and operative severity score for the enumeration of mortality and morbidity (POSSUM) were calculated. All data were tabulated in a Microsoft Excel spreadsheet and analyzed using Stata Version 8.x (StataCorp, College Station, TX, USA). Fourteen of the 45 patients operated on for PPU underwent LOPR. Mean age was 46 years (range 22-87 years). Twelve patients (86%) had a Boey score of 0, and all patients had MPI < 21 (mean MPI = 14). The predicted POSSUM morbidity and mortality were 36% and 7%, respectively. Mean ulcer size was 5 mm (range 2-10 mm), mean operating time was 100 minutes (range 70-123 minutes), and mean length of hospital stay was 4 days (range 3-6 days). There was no morbidity or mortality pertaining to LOPR. LOPR should be offered by acute care surgical teams when local expertise is available. This can optimize patient outcomes when strict selection criteria are applied.
FLY MPI-2: a parallel tree code for LSS
NASA Astrophysics Data System (ADS)
Becciani, U.; Comparato, M.; Antonuccio-Delogu, V.
2006-04-01
New version program summary
Program title: FLY 3.1
Catalogue identifier: ADSC_v2_0
Licensing provisions: yes
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
No. of lines in distributed program, including test data, etc.: 158 172
No. of bytes in distributed program, including test data, etc.: 4 719 953
Distribution format: tar.gz
Programming language: Fortran 90, C
Computer: Beowulf cluster, PC, MPP systems
Operating system: Linux, Aix
RAM: 100M words
Catalogue identifier of previous version: ADSC_v1_0
Journal reference of previous version: Comput. Phys. Comm. 155 (2003) 159
Does the new version supersede the previous version?: yes
Nature of problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force.
Solution method: FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986).
Reasons for the new version: The new version of FLY is implemented using the MPI-2 standard; the distributed version 3.1 was developed using the MPICH2 library on a PC Linux cluster. Today FLY's performance allows us to count it among the most powerful parallel codes for tree N-body simulations. Another important new feature is the availability of an interface with hydrodynamical Paramesh-based codes. Simulations must follow a box large enough to accurately represent the power spectrum of fluctuations on very large scales, so that we may hope to compare them meaningfully with real data. The number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible.
The idea of building an interface between two codes that have different and complementary cosmological tasks allows us to execute complex cosmological simulations with FLY, specialized for DM evolution, together with a code specialized for hydrodynamical components that uses a Paramesh block structure. Summary of revisions: The parallel communication scheme was totally changed; the new version adopts the MPICH2 library, so FLY can now be executed on all Unix systems having an MPI-2 standard library. The main data structure is declared in a module procedure of FLY (the fly_h.F90 routine). FLY creates an MPI window object for one-sided communication for each of the shared arrays, with a call like the following: CALL MPI_WIN_CREATE(POS, SIZE, REAL8, MPI_INFO_NULL, MPI_COMM_WORLD, WIN_POS, IERR). The following main window objects are created: win_pos, win_vel, win_acc (particle positions, velocities and accelerations) and win_pos_cell, win_mass_cell, win_quad, win_subp, win_grouping (cell positions, masses, quadrupole moments, tree structure and grouping cells). Other windows are created for dynamic load balance and global counters. Restrictions: The program uses the leapfrog integrator scheme, but this can be changed by the user. Unusual features: FLY uses the MPI-2 standard; the MPICH2 library was adopted on Linux systems. To run this version of FLY, the working directory must be shared among all the processors that execute FLY. Additional comments: Full documentation for the program is included in the distribution in the form of a README file, a User Guide and a Reference manuscript. Running time: An IBM Linux Cluster 1350 at Cineca (512 nodes with 2 processors per node and 2 GB RAM per processor) was used for performance tests. Processor type: Intel Xeon Pentium IV 3.0 GHz with 512 KB cache (128 nodes have Nocona processors). Internal network: Myricom LAN Card "C" Version and "D" Version. Operating system: Linux SuSE SLES 8. The code was compiled using the mpif90 compiler version 8.1 with basic optimization options, in order to obtain performance figures that can be compared with other generic clusters.
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is aimed at astrophysical purposes, where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes; however, no open source, parallel implementation is available to handle the large numbers of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI, and the VT is computed using the Qhull library. The domain decomposition takes into account consistent boundary computation between tasks and includes periodic conditions. In addition, the code computes the neighbors list, Voronoi density, Voronoi cell volume, and density gradient for each particle, as well as densities on a regular grid. The code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
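To make the per-particle quantities concrete, here is a deliberately simple serial sketch of one of them, the Voronoi cell volume, in a periodic box. This is not PARAVT's algorithm (which builds the exact tessellation with Qhull under MPI); it is a brute-force Monte Carlo illustration of the concept only.

```python
# Serial, brute-force illustration of a quantity PARAVT computes in
# parallel: the Voronoi cell volume of each particle. Volumes are
# estimated by Monte Carlo: random points in a periodic unit box are
# assigned to their nearest particle (minimum-image convention), and
# each cell's volume is the fraction of sample points it captures.
import random

def min_image_dist2(a, b):
    """Squared distance in a periodic unit box (minimum-image convention)."""
    d2 = 0.0
    for x, y in zip(a, b):
        d = abs(x - y)
        d = min(d, 1.0 - d)   # wrap around the periodic boundary
        d2 += d * d
    return d2

def voronoi_volumes(particles, n_samples=20000, seed=1):
    """Estimate each particle's Voronoi cell volume (unit box)."""
    rng = random.Random(seed)
    counts = [0] * len(particles)
    for _ in range(n_samples):
        p = (rng.random(), rng.random(), rng.random())
        nearest = min(range(len(particles)),
                      key=lambda i: min_image_dist2(p, particles[i]))
        counts[nearest] += 1
    return [c / n_samples for c in counts]  # fractions of the unit volume

particles = [(0.1, 0.2, 0.3), (0.6, 0.7, 0.1), (0.9, 0.4, 0.8)]
vols = voronoi_volumes(particles)
print(vols)  # the three cell volumes sum to 1 (the whole box)
```

The inverse of each cell volume is the Voronoi density estimate mentioned in the abstract; the exact, scalable computation of these quantities across MPI tasks, with consistent boundaries between subdomains, is what PARAVT provides.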
Cury, Alexandre Ferreira; Bonilha, Andre; Saraiva, Roberto; Campos, Orlando; Carvalho, Antonio Carlos C; De Paola, Angelo Amato V; Fischer, Claudio; Tucci, Paulo Ferreira; Moises, Valdir Ambrosio
2005-05-01
The aim of the study was to analyze the myocardial performance index (MPI), its relationship with the standard variables of systolic and diastolic function, and the influence of its time intervals in an experimental model of female rats with myocardial infarction (MI). Forty-one Wistar female rats underwent surgery to induce MI. Six weeks later, Doppler echocardiography was performed to assess infarct size (IS, %), fractional area change (FAC, %), biplane Simpson ejection fraction (EF), the E/A ratio of mitral inflow, and the MPI and its time intervals: isovolumetric contraction time (IVCT, ms), isovolumetric relaxation time (IVRT, ms), and ejection time (ET, ms); MPI = (IVCT + IVRT)/ET. EF and FAC were progressively lower in rats with small, medium and large-size MI (P < .001). The E/A ratio was higher only in rats with large-size MI (6.25 +/- 2.69; P < .001). MPI was not different between control rats and rats with small-size MI (0.37 +/- 0.03 vs 0.34 +/- 0.06, P = .87), but differed between large and medium-size MI (0.69 +/- 0.08 vs 0.47 +/- 0.07; P < .001) and between these two groups compared to small-size MI. MPI correlated with IS (r = 0.85; P < .001), EF (r = -0.86; P < .001), FAC (r = -0.77; P < .001) and the E/A ratio (r = 0.77; P < .001, non-linear). IVCT was longer in large-size MI compared to medium-size MI (31.87 +/- 7.99 vs 15.92 +/- 5.88; P < .001) and correlated with IS (r = 0.85; P < .001) and MPI (r = 0.92; P < .001). ET was shorter only in large-size MI (81.07 +/- 7.23; P < .001), and correlated with IS (r = -0.70; P < .001) and MPI (r = -0.85; P < .001). IVRT was shorter only in large-size compared to medium-size MI (24.40 +/- 5.38 vs 29.69 +/- 5.92; P = .037), had a borderline correlation with MPI (r = 0.34; P = .0534) and no correlation with IS (r = 0.26; P = .144). The MPI increased with IS, correlated inversely with systolic function parameters and had a non-linear relationship with diastolic function.
These changes were due to the increase of IVCT and a decrease of ET, without significant influence of IVRT.
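The index above is the Tei index, MPI = (IVCT + IVRT)/ET. As a quick arithmetic check (ours, not part of the study), plugging in the mean large-MI intervals reported above reproduces the reported large-MI MPI:

```python
def mpi(ivct_ms, ivrt_ms, et_ms):
    """Myocardial performance (Tei) index: (IVCT + IVRT) / ET."""
    return (ivct_ms + ivrt_ms) / et_ms

# mean large-MI intervals from the abstract:
# IVCT 31.87 ms, IVRT 24.40 ms, ET 81.07 ms
value = mpi(31.87, 24.40, 81.07)   # ~0.69, matching the reported large-MI MPI
```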
Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse- and fine-grain parallelism: the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single-system-image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
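The two-level decomposition described above can be illustrated with a small scheduling sketch. This is plain Python with invented helper names, standing in for the MPI rank assignment (coarse grain, blocks) and OpenMP thread schedule (fine grain, loop iterations) the paper describes:

```python
def assign_blocks(num_blocks, num_ranks):
    """Round-robin grid blocks to MPI ranks (coarse-grain parallelism).
    In an MPI-only code, ranks beyond num_blocks would sit idle; the
    hybrid model instead adds threads inside each block."""
    return {r: [b for b in range(num_blocks) if b % num_ranks == r]
            for r in range(num_ranks)}

def split_iterations(n, num_threads):
    """Static OpenMP-style schedule: contiguous chunks of loop
    iterations, one chunk per thread (fine-grain parallelism)."""
    base, extra = divmod(n, num_threads)
    chunks, start = [], 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

With 6 blocks on 4 ranks, two ranks get two blocks each; the thread-level split then balances work inside every block regardless of the block count.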
GPU acceleration of Eulerian-Lagrangian particle-laden turbulent flow simulations
NASA Astrophysics Data System (ADS)
Richter, David; Sweet, James; Thain, Douglas
2017-11-01
The Lagrangian point-particle approximation is a popular numerical technique for representing dispersed phases whose properties can substantially deviate from the local fluid. In many cases, particularly in the limit of one-way coupled systems, large numbers of particles are desired; this may be either because many physical particles are present (e.g. LES of an entire cloud), or because the use of many particles increases statistical convergence (e.g. high-order statistics). Solving the trajectories of very large numbers of particles can be problematic in traditional MPI implementations, however, and this study reports the benefits of using graphical processing units (GPUs) to integrate the particle equations of motion while preserving the original MPI version of the Eulerian flow solver. It is found that GPU acceleration becomes cost effective around one million particles, and performance enhancements of up to 15x can be achieved when O(108) particles are computed on the GPU rather than the CPU cluster. Optimizations and limitations will be discussed, as will prospects for expanding to two- and four-way coupled systems. ONR Grant No. N00014-16-1-2472.
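The per-particle work offloaded to the GPU is essentially an independent update of each particle's equation of motion. A minimal sketch, assuming simple Stokes drag dv/dt = (u - v)/tau_p and forward-Euler integration (our simplification, not the paper's scheme):

```python
def advance_particles(velocities, u, tau_p, dt, steps):
    """Relax each particle velocity toward the local fluid velocity u
    with response time tau_p. Every particle is independent, which is
    what makes the one-way-coupled update embarrassingly parallel."""
    for _ in range(steps):
        velocities = [v + dt * (u - v) / tau_p for v in velocities]
    return velocities

# integrating for one response time approaches u by a factor (1 - 1/e)
vels = advance_particles([0.0], u=1.0, tau_p=1.0, dt=0.01, steps=100)
```

Because each update touches only that particle's state plus interpolated fluid quantities, O(10^8) such updates map naturally onto GPU threads while the Eulerian solver stays in MPI.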
Taqueti, Viviany R.; Di Carli, Marcelo F.
2018-01-01
Over the last several decades, radionuclide myocardial perfusion imaging (MPI) with single photon emission tomography and positron emission tomography has been a mainstay for the evaluation of patients with known or suspected coronary artery disease (CAD). More recently, technical advances in separate and complementary imaging modalities including coronary computed tomography angiography, computed tomography perfusion, cardiac magnetic resonance imaging, and contrast stress echocardiography have expanded the toolbox of diagnostic testing for cardiac patients. While the growth of available technologies has heralded an exciting era of multimodality cardiovascular imaging, coordinated and dispassionate utilization of these techniques is needed to implement the right test for the right patient at the right time, a promise of “precision medicine.” In this article, we review the maturing role of MPI in the current era of multimodality cardiovascular imaging, particularly in the context of recent advances in myocardial blood flow quantitation, and as applied to the evaluation of patients with known or suspected CAD. PMID:25770849
2011-01-01
Background Rapeseed is an emerging and promising source of dietary protein for human nutrition and health. We previously found that rapeseed protein displayed atypical nutritional properties in humans, characterized by low bioavailability and a high postprandial biological value. The objective of the present study was to investigate the metabolic fate of a rapeseed protein isolate (RPI) and its effect on protein fractional synthesis rates (FSR) in various tissues when compared to a milk protein isolate (MPI). Methods Rats (n = 48) were given an RPI or MPI meal, either for the first time or after 2-week adaptation to an MPI- or RPI-based diet. They were divided into two groups for measuring the fed-state tissue FSR 2 h after the meal (using a flooding dose of 13C-valine) and the postprandial distribution of dietary N at 5 h (using 15N-labeled meals). Results RPI and MPI led to similar FSR and dietary nitrogen (N) losses (ileal and deamination losses of 4% and 12% of the meal, respectively). By contrast, dietary N incorporation was significantly higher in the intestinal mucosa and liver (+36% and +16%, respectively) and lower in skin (-24%) after RPI than after MPI. Conclusions Although RPI and MPI led to the same overall level of postprandial dietary N retention in rats (in line with our findings in humans), this global response conceals marked qualitative differences at the tissue level regarding dietary N accretion. The fact that FSR did not differ between the groups, however, suggests a differential modulation of proteolysis after RPI or MPI ingestion, or other mechanisms that warrant further study. PMID:21787407
Abdelwahab, N A; Morsy, E M H
2018-03-01
TiO2/Fe3O4, TiO2/Fe3O4/chitosan and methylpyrazolone-functionalized TiO2/Fe3O4/chitosan (MPyTMChi) were successfully prepared. The chemical structure of the prepared materials was confirmed by FT-IR spectra, XRD, SEM and TEM. The BET surface area increased from 2.4 to 3.1 m2/g, Eg decreased from 2.58 to 2.25 eV, and greater quenching of the PL emission spectra was observed upon functionalization of TMChi by MPy. Moreover, high Ti and oxygen percentages were detected by EDX. The magnetization value (Ms) reached 21 emu/g for MPyTMChi. MPyTMChi showed an enhanced photocatalytic degradation rate of methylene blue (MB) dye under visible light irradiation (99.8% after 40 min) compared with TiO2/Fe3O4 (96.7% after 100 min) and TMChi (98.9% after 60 min). The photocatalytic degradation of MB dye on MPyTMChi follows apparent pseudo-first-order kinetics according to the Langmuir-Hinshelwood (L-H) model, with a k_app value of 0.089 min^-1. Active-species trapping experiments revealed that h+ and O2^- played the main role in the photodegradation of MB dye, while OH quenching did not greatly affect the photodegradation rate. Additionally, MPyTMChi can be efficiently reused for six repetitive cycles. MPyTMChi showed higher antimicrobial activity against gram-positive and gram-negative bacterial and fungal strains, with the largest inhibition zone observed for gram-positive bacteria. Copyright © 2017 Elsevier B.V. All rights reserved.
Goyal, Parag; Kim, Jiwon; Feher, Attila; Ma, Claudia L.; Gurevich, Sergey; Veal, David R.; Szulc, Massimiliano; Wong, Franklin J.; Ratcliffe, Mark B.; Levine, Robert A.; Devereux, Richard B.; Weinsaft, Jonathan W.
2015-01-01
Objective Ischemic mitral regurgitation (MR) is common, but its response to percutaneous coronary intervention (PCI) is poorly understood. This study tested the utility of myocardial perfusion imaging (MPI) for stratification of MR response to PCI. Methods MPI and echo were performed among patients undergoing PCI. MPI was used to assess stress/rest myocardial perfusion. MR was assessed via echo (performed pre- and post-PCI). Results 317 patients with abnormal myocardial perfusion on MPI underwent echo 25±39 days prior to PCI. MR was present in 52%, among whom 24% had advanced (≥moderate) MR. MR was associated with LV chamber dilation on MPI and echo (both p<0.001). Magnitude of global LV perfusion deficits increased in relation to MR severity (p<0.01). Perfusion differences were greatest for global summed rest scores, which were 1.6-fold higher among patients with advanced MR vs. those with mild MR (p=0.004), and 2.4-fold higher vs. those without MR (p<0.001). In multivariate analysis, advanced MR was associated with fixed perfusion defect size on MPI (OR 1.16 per segment [CI 1.002–1.34], p=0.046) independent of LV volume (OR 1.10 per 10 ml [CI 1.04–1.17], p=0.002). Follow-up via echo (1.0±0.6 years) demonstrated MR to decrease (≥1 grade) in 31% of patients, and to increase in 12%. Patients with increased MR after PCI had more severe inferior perfusion defects on baseline MPI (p=0.028), whereas defects in other distributions and LV volumes were similar (p=NS). Conclusions Extent and distribution of SPECT-evidenced myocardial perfusion defects impact the MR response to revascularization. Increased magnitude of inferior fixed perfusion defects predicts post-PCI progression of MR. PMID:26049923
Cationic ionene as an n-dopant agent of poly(3,4-ethylenedioxythiophene).
Saborío, Maricruz G; Bertran, Oscar; Lanzalaco, Sonia; Häring, Marleen; Díaz Díaz, David; Estrany, Francesc; Alemán, Carlos
2018-04-18
We report the reduction of poly(3,4-ethylenedioxythiophene) (PEDOT) films with a cationic 1,4-diazabicyclo[2.2.2]octane-based ionene bearing N,N'-(meta-phenylene)dibenzamide linkages (mPI). Our main goal is to obtain n-doped PEDOT using a polymeric dopant agent rather than the small conventional tetramethylammonium (TMA), as is usual. This has been achieved using a three-step process, with each step individually optimized: (1) preparation of p-doped (oxidized) PEDOT at a constant potential of +1.40 V in acetonitrile with LiClO4 as the electrolyte; (2) dedoping of oxidized PEDOT using a fixed potential of -1.30 V in water; and (3) redoping of dedoped PEDOT by applying a reduction potential of -1.10 V in water with mPI. The resulting films display the globular appearance typically observed for PEDOT, with mPI structured in separate phases forming nanospheres or ultrathin sheets. This organization, which has been supported by atomistic molecular dynamics simulations, resembles the nanosegregated phase distribution observed for PEDOT p-doped with poly(styrenesulfonate). Furthermore, the doping level achieved using mPI as the doping agent is comparable to that achieved using TMA, even though the ionene provides distinctive properties to the conducting polymer. For example, films redoped with mPI are much more hydrophilic than the oxidized ones, whereas films redoped with TMA are hydrophobic. Similarly, films redoped with mPI exhibit the highest thermal stability, while those redoped with TMA show thermal stability intermediate between that of the mPI-redoped films and that of dedoped PEDOT. Overall, the incorporation of an mPI polycation as the n-dopant into PEDOT has important advantages for modulating the properties of this emblematic conducting polymer.
Smit, Jeff M; Koning, Gerhard; van Rosendael, Alexander R; Dibbets-Schneider, Petra; Mertens, Bart J; Jukema, J Wouter; Delgado, Victoria; Reiber, Johan H C; Bax, Jeroen J; Scholte, Arthur J
2017-10-01
A new method has been developed to calculate fractional flow reserve (FFR) from invasive coronary angiography, the so-called "contrast-flow quantitative flow ratio (cQFR)". Recently, cQFR was compared to invasive FFR in intermediate coronary lesions showing an overall diagnostic accuracy of 85%. The purpose of this study was to investigate the relationship between cQFR and myocardial ischemia assessed by single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI). Patients who underwent SPECT MPI and coronary angiography within 3 months were included. The cQFR computation was performed offline, using dedicated software. The cQFR computation was based on 3-dimensional quantitative coronary angiography (QCA) and computational fluid dynamics. The standard 17-segment model was used to determine the vascular territories. Myocardial ischemia was defined as a summed difference score ≥2 in a vascular territory. A cQFR of ≤0.80 was considered abnormal. Two hundred and twenty-four coronary arteries were analysed in 85 patients. Overall accuracy of cQFR to detect ischemia on SPECT MPI was 90%. In multivariable analysis, cQFR was independently associated with ischemia on SPECT MPI (OR per 0.01 decrease of cQFR: 1.10; 95% CI 1.04-1.18, p = 0.002), whereas clinical and QCA parameters were not. Furthermore, cQFR showed incremental value for the detection of ischemia compared to clinical and QCA parameters (global chi square 48.7 to 62.6; p <0.001). A good relationship between cQFR and SPECT MPI was found. cQFR was independently associated with ischemia on SPECT MPI and showed incremental value to detect ischemia compared to clinical and QCA parameters.
Safety and efficacy of Regadenoson in myocardial perfusion imaging (MPI) stress tests: A review
NASA Astrophysics Data System (ADS)
Ahmed, Ambereen
2018-02-01
Myocardial perfusion imaging (MPI) tests are often used to help diagnose coronary artery disease (CAD). The tests usually involve applying stress, such as hard physical exercise together with administration of vasodilators, to the patients. To date, many of these tests use non-selective A2A adenosine receptor agonists, which can be associated with highly undesirable and life-threatening side effects such as chest pain, dyspnea, severe bronchoconstriction and atrioventricular conduction anomalies. Regadenoson is a relatively new, highly selective A2A adenosine receptor agonist suitable for use in MPI tests; it exhibits far fewer adverse side effects and, unlike other testing agents, can be used without excessive concomitant exercise. In addition, the dose of regadenoson required does not depend on patient weight or renal impairment, and it can be administered rapidly by i.v. injection. Regadenoson use in MPI testing thus has potential as a simplified, relatively safe, time-saving and cost-effective method for helping diagnose CAD. The present study reviews several articles on the safety, efficacy, and suitability of regadenoson in MPI testing for CAD. Overall, the combined studies demonstrate that use of regadenoson in conjunction with low-level exercise in MPI is a highly efficient and relatively safe test for CAD, especially for patients with more severely compromised health.
Geometry planning and image registration in magnetic particle imaging using bimodal fiducial markers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Werner, F., E-mail: f.werner@uke.de; Hofmann, M.; Them, K.
Purpose: Magnetic particle imaging (MPI) is a quantitative imaging modality that allows the distribution of superparamagnetic nanoparticles to be visualized. Compared to other imaging techniques like x-ray radiography, computed tomography (CT), and magnetic resonance imaging (MRI), MPI only provides a signal from the administered tracer, but no additional morphological information, which complicates geometry planning and the interpretation of MP images. The purpose of the authors' study was to develop bimodal fiducial markers that can be visualized by MPI and MRI in order to create MP–MR fusion images. Methods: An arrangement of three bimodal fiducial markers was developed and used in a combined MRI/MPI phantom and also during in vivo experiments in order to investigate its suitability for geometry planning and image fusion. An algorithm for automated marker extraction in both MR and MP images and rigid registration was established. Results: The developed bimodal fiducial markers can be visualized by MRI and MPI and allow for geometry planning as well as automated registration and fusion of MR–MP images. Conclusions: To date, exact positioning of the object to be imaged within the field of view (FOV) and the assignment of reconstructed MPI signals to corresponding morphological regions has been difficult. The developed bimodal fiducial markers and the automated image registration algorithm help to overcome these difficulties.
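The rigid registration step from matched fiducial markers has a closed-form (Procrustes) solution. Here is an illustrative 2D sketch with our own naming, a simplification of the 3D MP–MR problem the authors solve:

```python
import math

def rigid_register(src, dst):
    """Recover the rotation angle and translation mapping matched 2D
    fiducial points src -> dst: subtract centroids, then obtain the
    angle from the summed cross (sin) and dot (cos) terms."""
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        num += ax * by - ay * bx    # cross terms -> sin(theta)
        den += ax * bx + ay * by    # dot terms   -> cos(theta)
    theta = math.atan2(num, den)
    # translation: t = centroid(dst) - R * centroid(src)
    tx = cdx - (csx * math.cos(theta) - csy * math.sin(theta))
    ty = cdy - (csx * math.sin(theta) + csy * math.cos(theta))
    return theta, (tx, ty)
```

Three non-collinear markers, as used by the authors, are enough to determine the transform; extra markers would simply over-determine the same least-squares fit.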
Assessment of Fetal Myocardial Performance Index in Women with Placenta Previa
Zhang, Na; Sun, Lijuan; Zhang, Lina; Li, Zhen; Han, Jijing; Wu, Qingqing
2017-01-01
Background This study investigated whether fetuses of placenta previa pregnancies have cardiac dysfunction by use of a modified myocardial performance index (Mod-MPI). Material/Methods A prospective cross-sectional study was conducted including 178 fetuses at 28–40 weeks of gestation. Eighty-nine fetuses of mothers with placenta previa and without pregnancy complications were recruited (placenta previa group) and matched with 89 fetuses of mothers with normal pregnancies (control group). Fetal cardiac function parameters and perinatal outcomes as well as the Mod-MPI were compared between the 2 groups. Results The median Mod-MPI was significantly increased in fetuses of mothers with placenta previa compared with controls (0.47±0.05 vs. 0.45±0.05; P<0.01). Among fetuses of mothers with or without placenta previa, the Mod-MPI was significantly higher in the incomplete placenta previa group compared with the complete placenta previa group and control group (P<0.01). An increased Mod-MPI in placenta previa pregnancies was independently associated with fetal cord pH <7.2 (odds ratio, 4.8; 95% confidence interval, 0.98–23.54; P=0.003). Conclusions There is impairment of fetal cardiac function in pregnancies with placenta previa. An increased MPI was independently associated with adverse perinatal outcomes to some extent in the placenta previa pregnancies. PMID:29242496
Ito, Mikio; Noguchi, Hidenori; Ikeda, Katsuyoshi; Uosaki, Kohei
2010-04-07
Effects of metal substrate on the bonding nature of isocyanide group of two aryl isocyanides, 1,4-phenylene diisocyanide (PDI) and 4-methylphenyl isocyanide (MPI), and tilt angle of MPI were examined by measuring sum frequency generation (SFG) spectra of the self-assembled monolayers (SAMs) of these molecules on Au, Pt, Ag, and Pd surfaces. The SFG peaks due to "metal bonded" and "free"-NC groups were resolved by comparing the SFG spectra of PDI with IR spectra obtained by DFT calculations and previous results of vibrational spectroscopy. Based on the peak positions of the "metal bonded"-NC, it is clarified that while PDI and MPI were adsorbed at top sites on Au, Ag, and Pt surfaces, they adsorbed at bridge sites on the Pd surface. The tilt angles of MPI were determined from the intensity ratio between the SFG peaks of C-H symmetric and asymmetric stretching vibrational modes of the CH(3) group. The tilt angles of the MPI SAMs were in the order of Pt < Pd < Ag < Au, reflecting the bonding nature between the -NC group and the substrate atoms.
First in vivo magnetic particle imaging of lung perfusion in rats
NASA Astrophysics Data System (ADS)
Zhou, Xinyi Y.; Jeffris, Kenneth E.; Yu, Elaine Y.; Zheng, Bo; Goodwill, Patrick W.; Nahid, Payam; Conolly, Steven M.
2017-05-01
Pulmonary embolism (PE), along with the closely related condition of deep vein thrombosis, affect an estimated 600 000 patients in the US per year. Untreated, PE carries a mortality rate of 30%. Because many patients experience mild or non-specific symptoms, imaging studies are necessary for definitive diagnosis of PE. Iodinated CT pulmonary angiography is recommended for most patients, while nuclear medicine-based ventilation/perfusion (V/Q) scans are reserved for patients in whom the use of iodine is contraindicated. Magnetic particle imaging (MPI) is an emerging tracer imaging modality with high image contrast (no tissue background signal) and sensitivity to superparamagnetic iron oxide (SPIO) tracer. Importantly, unlike CT or nuclear medicine, MPI uses no ionizing radiation. Further, MPI is not derived from magnetic resonance imaging (MRI); MPI directly images SPIO tracers via their strong electronic magnetization, enabling deep imaging of anatomy including within the lungs, which is very challenging with MRI. Here, the first high-contrast in vivo MPI lung perfusion images of rats are shown using a novel lung perfusion agent, MAA-SPIOs.
Kim, Jeonghyo; Lee, Kil-Soo; Kim, Eun Bee; Paik, Seungwha; Chang, Chulhun L; Park, Tae Jung; Kim, Hwa-Jung; Lee, Jaebeom
2017-10-15
Tuberculosis (TB) is an often neglected epidemic disease that has yet to be controlled by contemporary techniques of medicine and biotechnology. In this study, a nanoscale sensing system, referred to as magnetophoretic immunoassay (MPI), was designed to capture culture filtrate protein (CFP)-10 antigens effectively using two different types of nanoparticles (NPs). Two specific monoclonal antibodies against the CFP-10 antigen were used, with gold NPs for signaling and magnetic particles for separation. The results were carefully compared with those obtained using the commercial mycobacteria growth indicator tube (MGIT) test via two sequential clinical tests (with ca. 260 clinical samples). The sensing linearity of the MPI spanned the pico- to micromolar range and the detection limit was 0.3 pM. With clinical samples, the MPI showed robust and reliable sensing while monitoring Mycobacterium tuberculosis (MTB) growth, with a monitoring time (3-10 days) comparable to that of the MGIT test. Furthermore, the MPI distinguished false-positive samples from MGIT-positive samples, probably containing non-tuberculous mycobacteria. Thus, MPI shows promise in early TB diagnosis. Copyright © 2017 Elsevier B.V. All rights reserved.
Shaikh, Ayaz Hussain; Hanif, Bashir; Siddiqui, Adeel M; Shahab, Hunaina; Qazi, Hammad Ali; Mujtaba, Iqbal
2010-04-01
To determine the association of prolonged ST segment depression after an exercise test with severity of coronary artery disease. A cross-sectional study of 100 consecutive patients referred to the cardiology laboratory for stress myocardial perfusion imaging (MPI), conducted between April and August 2008. All selected patients were monitored until their ST segment depression recovered to baseline. ST segment recovery time was categorized as less or more than 5 minutes. Subsequent gated SPECT-MPI was performed and stratified according to severity of perfusion defect. The association between post-exercise ST segment depression recovery time (<5 minutes and >5 minutes) and severity of perfusion defect on MPI was then determined. The mean age of the patients was 57.12 +/- 9.0 years. The results showed a statistically insignificant association (p > 0.05) between ST segment recovery times of <5 minutes and >5 minutes and low-, intermediate- or high-risk MPI. Our findings suggest that the commonly used cut-off level reported in the literature for prolonged post-exercise ST segment depression (>5 minutes into the recovery phase) does not correlate with severity of ischaemia based on MPI results.
NASA Technical Reports Server (NTRS)
Katz, Daniel
2004-01-01
PVM Wrapper is a software library that makes it possible for code that utilizes the Parallel Virtual Machine (PVM) software library to run using the Message-Passing Interface (MPI) software library, without needing to rewrite the entire code. PVM and MPI are the two most common software libraries used for applications that involve passing of messages among parallel computers. Since about 1996, MPI has been the de facto standard. Codes written when PVM was popular often feature patterns of {"initsend," "pack," "send"} and {"receive," "unpack"} calls. In many cases, these calls are not contiguous, and one set of calls may even span multiple subroutines. These characteristics make it difficult to obtain equivalent functionality via a single MPI "send" call. Because PVM Wrapper is written to run with MPI-1.2, some PVM functions are not permitted and must be replaced - a task that requires some programming expertise. The "pvm_spawn" and "pvm_parent" function calls are not replaced, but a programmer can use "mpirun" and knowledge of the ranks of parent and child tasks with supplied macroinstructions to enable execution of codes that use "pvm_spawn" and "pvm_parent."
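The pack/send pattern described above can be made concrete with a toy buffer class (hypothetical names, plain Python; the real wrapper does this in C against the MPI library). The point is that the message only becomes a single contiguous payload at send time, which is why pack calls scattered across subroutines are awkward to map onto one MPI "send":

```python
class SendBuffer:
    """Toy analogue of PVM's deferred-send message buffer."""
    def __init__(self):
        self.parts = []          # pvm_initsend: start an empty buffer

    def pack(self, values):
        # pvm_pk* analogue: append data, possibly from many
        # different subroutines, before anything is sent
        self.parts.append(list(values))

    def flush(self):
        # pvm_send analogue: only now does one contiguous message exist
        message = [v for part in self.parts for v in part]
        self.parts = []
        return message
```

A wrapper library must track this accumulating state between calls, whereas a native MPI port would restructure the code to build the whole message before a single send.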
Creating a Parallel Version of VisIt for Microsoft Windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitlock, B J; Biagas, K S; Rawson, P L
2011-12-07
VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers, from modest desktops up to massively parallel clusters. VisIt comprises a set of cooperating programs. All programs can be run locally or in client/server mode, in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes that coordinate using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release.
We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.
Hines, Michael; Kumar, Sameer; Schürmann, Felix
2011-01-01
For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4·10^10 connections (K is 1024, M is 1024^2, and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend, due to the high overhead of initiating a spike communication. The two best performing methods - the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast, and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores - had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in the number of cells that fire on each processor in the interval between synchronizations. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will ultimately be limited by imbalance in incoming processor spikes between synchronization intervals.
Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect neural net architecture but be randomly distributed so that sets of cells which are burst firing together should be on different processors with their targets on as large a set of processors as possible.
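The difference between the Allgather and Multisend patterns compared above can be sketched without MPI at all (plain Python, our own toy model): an Allgather delivers every rank's spikes to every rank, while a multisend delivers them only to the ranks that actually host target cells:

```python
def allgather_exchange(spikes_per_rank):
    """MPI_Allgather-style exchange: every rank ends up holding the
    concatenation of all ranks' spikes, whether it needs them or not."""
    everything = [s for spikes in spikes_per_rank for s in spikes]
    return [list(everything) for _ in spikes_per_rank]

def multisend_exchange(spikes_per_rank, targets):
    """Multisend-style exchange: rank `src` sends its spikes only to the
    ranks listed in targets[src], i.e. ranks hosting its target cells."""
    nranks = len(spikes_per_rank)
    inbox = [[] for _ in range(nranks)]
    for src, spikes in enumerate(spikes_per_rank):
        for dst in targets[src]:
            inbox[dst].extend(spikes)
    return inbox
```

The multisend's message volume scales with actual connectivity rather than with the number of ranks, which is why its overhead, not its bandwidth, dominates at scale.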
Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H
2014-05-28
Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N^3) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well-documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well.
Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
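The reduction of a generalized symmetric eigenproblem to a standard one, which libraries such as ELPA and ScaLAPACK perform internally before tridiagonalization, can be sketched with dense NumPy operations. This is an illustration of the mathematics only, not of ELPA's actual Fortran/C API or its blocked parallel algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random symmetric A and symmetric positive-definite B.
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)

# Generalized problem A x = lambda B x, reduced to a standard one
# via the Cholesky factor B = L L^T:  (L^-1 A L^-T) y = lambda y,  x = L^-T y.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
C = Linv @ A @ Linv.T
C = (C + C.T) / 2            # symmetrize against round-off
w, Y = np.linalg.eigh(C)     # dense symmetric solver, O(n^3) work
X = Linv.T @ Y               # back-transform the eigenvectors

# Check the residual of the original generalized problem.
res = np.linalg.norm(A @ X - B @ X @ np.diag(w))
assert res < 1e-8
```

The O(n^3) cost of `eigh` on the reduced matrix is the scaling bottleneck the abstract refers to; ELPA's contribution is making exactly these steps efficient on distributed-memory machines.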
A study of patients with spinal disease using Maudsley Personality Inventory.
Kasai, Yuichi; Takegami, Kenji; Uchida, Atsumasa
2004-02-01
We administered the Maudsley Personality Inventory (MPI) preoperatively to 303 patients with spinal diseases about to undergo surgery. Patients younger than 20 years, patients previously treated in the Department of Psychiatry, and patients with poor postoperative results were excluded. Patients with N-scores (neuroticism scale) of 39 points or greater or L-scores (lie scale) of 26 points or greater were regarded as "abnormal." Based on clinical definitions we identified 24 "problem patients" during the course and categorized them as "Unsatisfied," "Indecisive," "Doctor shoppers," or "Distrustful." Preoperative MPI categorized 26 patients as abnormal; 22 patients categorized as abnormal became problem patients (p<0.001). MPI sensitivity and specificity were 84.6% and 99.3%, respectively. Preoperative administration of the MPI to patients with spinal disease was found to be useful in detecting problem patients.
Efficiently passing messages in distributed spiking neural network simulation.
Thibeault, Corey M; Minkovich, Kirill; O'Brien, Michael J; Harris, Frederick C; Srinivasa, Narayan
2013-01-01
Efficiently passing spiking messages in a neural model is an important aspect of high-performance simulation. As the scale of networks has increased, so has the size of the computing systems required to simulate them, and the information exchange among these resources has become more of an impediment to performance. In this paper we explore spike message passing using different mechanisms provided by the Message Passing Interface (MPI). A specific implementation, MVAPICH, designed for high-performance clusters with InfiniBand hardware is employed. The focus is on providing information about these mechanisms for users of commodity high-performance spiking simulators. In addition, a novel hybrid method for spike exchange was implemented and benchmarked.
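The per-timestep exchange pattern such simulators build on top of MPI can be sketched in plain Python. The exchange is simulated in-process here; the function name and data layout are illustrative, and a real distributed run would issue collective or point-to-point MPI calls (e.g. `MPI_Alltoallv`, available in Python via mpi4py) in place of the loop:

```python
# Sketch of an alltoallv-style spike exchange between simulation ranks.
def exchange_spikes(outboxes):
    """outboxes[src][dst] -> list of (neuron_id, spike_time) tuples.
    Returns inboxes[dst]: every spike addressed to that rank."""
    n = len(outboxes)
    inboxes = [[] for _ in range(n)]
    for src in range(n):
        for dst in range(n):
            inboxes[dst].extend(outboxes[src][dst])
    return inboxes

# Rank 0 fires neuron 7 toward ranks 1 and 2; rank 1 fires neuron 3 toward rank 0.
out = [
    {0: [], 1: [(7, 0.5)], 2: [(7, 0.5)]},
    {0: [(3, 0.6)], 1: [], 2: []},
    {0: [], 1: [], 2: []},
]
inboxes = exchange_spikes(out)
assert inboxes[0] == [(3, 0.6)]
assert inboxes[1] == [(7, 0.5)] and inboxes[2] == [(7, 0.5)]
```

The choice between collective and point-to-point mechanisms for this step is exactly the trade-off the paper benchmarks: a collective touches every rank each timestep, whereas targeted sends scale with the number of ranks that actually received spikes.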
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Azad, Ariful; Ballard, Grey; Buluc, Aydin; ...
2016-11-08
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdös-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
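The sequential kernel underlying SpGEMM — Gustavson's row-by-row algorithm with a sparse accumulator — can be sketched as follows. This is a pure-Python illustration of the local computation only, not of the distributed 3D formulation the paper implements:

```python
def spgemm(A, B):
    """Gustavson's algorithm on dict-of-dicts sparse matrices:
    A[i][k] and B[k][j] hold the nonzeros; returns C = A @ B."""
    C = {}
    for i, row in A.items():
        acc = {}                          # sparse accumulator for row i of C
        for k, a_ik in row.items():       # each nonzero of row i of A ...
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj   # ... scales row k of B
        if acc:
            C[i] = acc
    return C

A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: -1.0}}
C = spgemm(A, B)
assert C == {0: {1: 2.0}, 1: {0: 15.0}}
```

The work is proportional to the number of scalar multiplications (flops), not to n^2, which is why the communication needed to gather the right rows of B onto each process dominates the cost at scale.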
32 CFR 637.8 - Identification of MPI.
Code of Federal Regulations, 2010 CFR
2010-07-01
... CRIMINAL INVESTIGATIONS MILITARY POLICE INVESTIGATION Investigations § 637.8 Identification of MPI. (a... referring to themselves as “INVESTIGATOR.” When signing military police records the title “Military Police...
Specification of Fenix MPI Fault Tolerance library version 1.0.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamble, Marc; Van Der Wijngaart, Rob; Teranishi, Keita
This document provides a specification of Fenix, a software library compatible with the Message Passing Interface (MPI) to support fault recovery without application shutdown. The library consists of two modules. The first, termed process recovery, restores an application to a consistent state after it has suffered a loss of one or more MPI processes (ranks). The second specifies functions the user can invoke to store application data in Fenix-managed redundant storage, and to retrieve it from that storage after process recovery.
Simultaneous monitoring technique for ASE and MPI noises in distributed Raman Amplified Systems.
Choi, H Y; Jun, S B; Shin, S K; Chung, Y C
2007-07-09
We develop a new technique for simultaneously monitoring the amplified spontaneous emission (ASE) and multi-path interference (MPI) noises in distributed Raman amplified (DRA) systems. This technique utilizes the fact that the degree of polarization (DOP) of the MPI noise is 1/9, while the ASE noise is unpolarized. The results show that the proposed technique can accurately monitor both of these noises regardless of the bit rates, modulation formats, and optical signal-to-noise ratio (OSNR) levels of the signals.
Kaul, Michael Gerhard; Mummert, Tobias; Jung, Caroline; Salamon, Johannes; Khandhar, Amit P; Ferguson, R Matthew; Kemp, Scott J; Ittrich, Harald; Krishnan, Kannan M; Adam, Gerhard; Knopp, Tobias
2017-05-07
Optimizing tracers for individual imaging techniques is an active field of research. The purpose of this study was to perform in vitro and in vivo magnetic particle imaging (MPI) measurements using a new monodisperse and size-optimized tracer, LS-008, and to compare it with the performance of Resovist, the standard MPI tracer. Magnetic particle spectroscopy (MPS) and in vitro MPI measurements were performed with varying concentrations and amounts of tracer in a phantom. In vivo studies were carried out in healthy FVB mice. The first group (n = 3) received 60 µl LS-008 (87 mM) and the second (n = 3) diluted Resovist of the same concentration and volume. Tracer injections were performed with a syringe pump during a dynamic MPI scan. For anatomic referencing, MRI was applied before the MPI measurements. In both MPS examinations and in vitro MPI experiments, LS-008 showed better sensitivity and spatial resolution than Resovist. In vivo, both tracers visualized the propagation of the bolus through the inferior vena cava. MPI with LS-008 showed fewer temporal fluctuation artifacts, and the pulsation of blood due to the respiratory and cardiac cycle was detectable. With LS-008 the aorta was distinguishable from the caval vein, whereas with Resovist this failed. A liver vessel and a vessel structure leading cranially could only be observed with LS-008, not with Resovist. Besides these structural advantages, the two tracers showed very different blood half-lives: 88 min for LS-008, whereas Resovist showed fast liver accumulation and a half-life of 13 min. Only with LS-008 was the perfusion fraction in liver and kidney measurable. MPI for angiography can be significantly improved by applying more effective tracers. LS-008 shows a clear improvement in delineation while resolving a larger number of vessels in comparison to Resovist. In terms of both quality and quantity, LS-008 is therefore clearly favorable for angiographic and perfusion studies.
Verra, Martin L; Angst, Felix; Staal, J Bart; Brioschi, Roberto; Lehmann, Susanne; Aeschlimann, André; de Bie, Rob A
2011-06-30
Patients with non-specific back pain are not a homogeneous group but heterogeneous with regard to their bio-psycho-social impairments. This study examined a sample of 173 highly disabled patients with chronic back pain to find out how three subgroups based on the Multidimensional Pain Inventory (MPI) differed in their response to an inpatient pain management program. Subgroup classification was conducted by cluster analysis using MPI subscale scores at entry into the program. At program entry and at discharge after four weeks, participants completed the MPI, the MOS Short Form-36 (SF-36), the Hospital Anxiety and Depression Scale (HADS), and the Coping Strategies Questionnaire (CSQ). Pairwise analyses of the score changes in these outcomes across the three MPI subgroups were performed using the Mann–Whitney U test. Cluster analysis identified three MPI subgroups in this highly disabled sample: a dysfunctional subgroup, an interpersonally distressed subgroup, and an adaptive copers subgroup. The dysfunctional subgroup (29% of the sample) showed the highest level of depression on SF-36 mental health (33.4 ± 13.9), the interpersonally distressed subgroup (35% of the sample) a moderate level of depression (46.8 ± 20.4), and the adaptive copers subgroup (32% of the sample) the lowest level of depression (57.8 ± 19.1). Significant differences in pain reduction and improvement of mental health and coping were observed across the three MPI subgroups; the effect sizes for MPI pain reduction were 0.84 (0.44-1.24) for the dysfunctional subgroup, 1.22 (0.86-1.58) for the adaptive copers subgroup, and 0.53 (0.24-0.81) for the interpersonally distressed subgroup (p = 0.006 for pairwise comparison). Significant score changes between subgroups concerning activities and physical functioning could not be identified. MPI subgroup classification showed significant differences in score changes for pain, mental health and coping.
These findings underscore the importance of assessing individual differences to understand how patients adjust to chronic back pain.
Ge, Zhongming; Feng, Yan; Muthupalani, Sureshkumar; Eurell, Laura Lemke; Taylor, Nancy S.; Whary, Mark T.; Fox, James G.
2011-01-01
To investigate how different enterohepatic Helicobacter species (EHS) influence Helicobacter pylori gastric pathology, C57BL/6 mice were infected with Helicobacter hepaticus or Helicobacter muridarum, followed by H. pylori infection 2 weeks later. Compared to H. pylori-infected mice, mice infected with H. muridarum and H. pylori (HmHp mice) developed significantly lower histopathologic activity index (HAI) scores (P < 0.0001) at 6 and 11 months postinoculation (MPI). However, mice infected with H. hepaticus and H. pylori (HhHp mice) developed more severe gastric pathology at 6 MPI (P = 0.01), with a HAI at 11 MPI (P = 0.8) similar to that of H. pylori-infected mice. H. muridarum-mediated attenuation of gastritis in coinfected mice was associated with significant downregulation of proinflammatory Th1 (interleukin-1 beta [Il-1β], gamma interferon [Ifn-γ], and tumor necrosis factor-alpha [Tnf-α]) cytokines at both time points and Th17 (Il-17A) cytokine mRNA levels at 6 MPI in murine stomachs compared to those of H. pylori-infected mice (P < 0.01). Coinfection with H. hepaticus also suppressed H. pylori-induced elevation of gastric Th1 cytokines Ifn-γ and Tnf-α (P < 0.0001) but increased Th17 cytokine mRNA levels (P = 0.028) at 6 MPI. Furthermore, mRNA levels of Il-17A were positively correlated with the severity of helicobacter-induced gastric pathology (HhHp>H. pylori>HmHp) (at 6 MPI, r2 = 0.92, P < 0.0001; at 11 MPI, r2 = 0.82, P < 0.002). Despite disparate effects on gastritis, colonization levels of gastric H. pylori were increased in HhHp mice (at 6 MPI) and HmHp mice (at both time points) compared to those in mono-H. pylori-infected mice. These data suggest that despite consistent downregulation of Th1 responses, EHS coinfection either attenuated or promoted the severity of H. pylori-induced gastric pathology in C57BL/6 mice. This modulation was related to the variable effects of EHS on gastric interleukin 17 (IL-17) responses to H. pylori infection. PMID:21788386
NASA Astrophysics Data System (ADS)
Kaul, Michael Gerhard; Mummert, Tobias; Jung, Caroline; Salamon, Johannes; Khandhar, Amit P.; Ferguson, R. Matthew; Kemp, Scott J.; Ittrich, Harald; Krishnan, Kannan M.; Adam, Gerhard; Knopp, Tobias
2017-05-01
Optimizing tracers for individual imaging techniques is an active field of research. The purpose of this study was to perform in vitro and in vivo magnetic particle imaging (MPI) measurements using a new monodisperse and size-optimized tracer, LS-008, and to compare it with the performance of Resovist, the standard MPI tracer. Magnetic particle spectroscopy (MPS) and in vitro MPI measurements were performed with varying concentrations and amounts of tracer in a phantom. In vivo studies were carried out in healthy FVB mice. The first group (n = 3) received 60 µl LS-008 (87 mM) and the second (n = 3) diluted Resovist of the same concentration and volume. Tracer injections were performed with a syringe pump during a dynamic MPI scan. For anatomic referencing, MRI was applied before the MPI measurements. In both MPS examinations and in vitro MPI experiments, LS-008 showed better sensitivity and spatial resolution than Resovist. In vivo, both tracers visualized the propagation of the bolus through the inferior vena cava. MPI with LS-008 showed fewer temporal fluctuation artifacts, and the pulsation of blood due to the respiratory and cardiac cycle was detectable. With LS-008 the aorta was distinguishable from the caval vein, whereas with Resovist this failed. A liver vessel and a vessel structure leading cranially could only be observed with LS-008, not with Resovist. Besides these structural advantages, the two tracers showed very different blood half-lives: 88 min for LS-008, whereas Resovist showed fast liver accumulation and a half-life of 13 min. Only with LS-008 was the perfusion fraction in liver and kidney measurable. MPI for angiography can be significantly improved by applying more effective tracers. LS-008 shows a clear improvement in delineation while resolving a larger number of vessels in comparison to Resovist. In terms of both quality and quantity, LS-008 is therefore clearly favorable for angiographic and perfusion studies.
NASA Astrophysics Data System (ADS)
Somavarapu, Dhathri H.
This thesis proposes a new parallel-computing genetic algorithm framework for designing fuel-optimal trajectories for interplanetary spacecraft missions. The framework can capture the deep search space of the problem with the use of a fixed chromosome structure and the hidden-genes concept, can explore a diverse set of candidate solutions with the use of adaptive and twin-space crowding techniques, and can execute on any high-performance computing (HPC) platform through the adoption of the portable Message Passing Interface (MPI) standard. The algorithm is implemented in C++ with the use of the MPICH implementation of the MPI standard. The algorithm uses a patched-conic approach with two-body dynamics assumptions. New procedures are developed for determining trajectories in the V-infinity-leveraging legs of the flight from the launch and non-launch planets, and in the deep-space-maneuver legs of the flight from the launch and non-launch planets. The chromosome structure maintains the time of flight as a free parameter within certain boundaries. The fitness (cost) function of the algorithm uses only the mission Delta V and does not include time of flight. The optimization is conducted with two variations of the minimum mission gravity-assist sequence, 4-gravity-assist and 3-gravity-assist, with a maximum of 5 gravity assists allowed in both cases. The optimal trajectories discovered using the framework in both cases demonstrate the success of this framework.
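The hidden-genes idea — a fixed-length chromosome in which inactive slots are masked out of the cost while still being inherited — can be sketched as follows. All names and the per-leg delta-V stub are hypothetical, standing in for the thesis's patched-conic machinery:

```python
import random
random.seed(1)

MAX_GA = 5  # the fixed chromosome always reserves room for 5 gravity assists

def random_chromosome():
    # Each slot holds (planet index, flyby epoch in days); a parallel mask
    # marks which slots are "hidden" (inactive) for shorter assist sequences.
    genes = [(random.randint(2, 5), random.uniform(0, 3000)) for _ in range(MAX_GA)]
    n_active = random.randint(3, MAX_GA)      # 3-, 4-, or 5-assist variants
    mask = [i < n_active for i in range(MAX_GA)]
    return genes, mask

def cost(genes, mask, dv_of_leg):
    # Fitness uses mission delta-V only (time of flight stays a free
    # parameter); hidden genes carry genetic material through crossover
    # but contribute nothing to the cost.
    return sum(dv_of_leg(g) for g, on in zip(genes, mask) if on)

# Hypothetical per-leg delta-V model, standing in for a Lambert solver.
dv_stub = lambda gene: 0.1 * gene[0]

genes, mask = random_chromosome()
assert len(genes) == MAX_GA and 3 <= sum(mask) <= MAX_GA
print(cost(genes, mask, dv_stub))
```

Because every chromosome has the same length regardless of how many assists are active, standard crossover and mutation operators apply unchanged, which is what makes the variable-sequence search tractable in a fixed-structure GA.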
Varsos, Constantinos; Patkos, Theodore; Pavloudi, Christina; Gougousis, Alexandros; Ijaz, Umer Zeeshan; Filiopoulou, Irene; Pattakos, Nikolaos; Vanden Berghe, Edward; Fernández-Guerra, Antonio; Faulwetter, Sarah; Chatzinikolaou, Eva; Pafilis, Evangelos; Bekiari, Chryssoula; Doerr, Martin; Arvanitidis, Christos
2016-01-01
Abstract Background Parallel data manipulation using R has previously been addressed by members of the R community, however most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from the expert ecologist/microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving performance of commonly used R scripts becomes increasingly difficult especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user. New information In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data – Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated offering connections to dataframe like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources. 
The RvLab runs on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/ PMID:27932907
Varsos, Constantinos; Patkos, Theodore; Oulas, Anastasis; Pavloudi, Christina; Gougousis, Alexandros; Ijaz, Umer Zeeshan; Filiopoulou, Irene; Pattakos, Nikolaos; Vanden Berghe, Edward; Fernández-Guerra, Antonio; Faulwetter, Sarah; Chatzinikolaou, Eva; Pafilis, Evangelos; Bekiari, Chryssoula; Doerr, Martin; Arvanitidis, Christos
2016-01-01
Parallel data manipulation using R has previously been addressed by members of the R community, however most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from the expert ecologist/microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving performance of commonly used R scripts becomes increasingly difficult especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user. In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data - Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated offering connections to dataframe like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources. 
The RvLab runs on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/.
Regional native plant strategies
Wendell G. Hassell
1999-01-01
Because of increasing public interest in native plants, regional groups have been cooperating to develop native species. The Federal Native Plants Initiative was formed in 1994 to coordinate and encourage the development and use of native plants. The program they developed includes public involvement, organizational structure, technical work groups, implementation...
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.
González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil
2016-12-15
MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems, as well as a reference manual, are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Myocardial Performance Index for Patients with Overt and Subclinical Hypothyroidism.
Karabulut, Aziz; Doğan, Abdullah; Tuzcu, Alpaslan Kemal
2017-05-25
BACKGROUND Hypothyroidism has several effects on the cardiovascular system. The global myocardial performance index (MPI) is used in assessment of both left ventricular (LV) systolic and diastolic function. We compared MPI in hypothyroidism patients vs. normal control subjects. MATERIAL AND METHODS Eighty-two hypothyroid patients were divided into 2 groups: a subclinical hypothyroid (SH) group (n=50) and an overt hypothyroid (OH) group (n=32). The healthy control group (CG) consisted of 37 subjects. TSH, FT3, FT4, anti-TPO, anti-TG, insulin, lipid values, and fasting glucose levels were studied. All patients underwent an echocardiographic examination. Myocardial performance indexes were assessed and standard echocardiographic examinations were investigated. RESULTS Mean MPI in the OH, SH, and control groups was 0.53±0.06, 0.51±0.05, and 0.44±0.75, respectively. MPI was increased in the OH and SH groups in comparison to CG (p<0.001, p<0.001, respectively). CONCLUSIONS The MPI value was significantly higher in hypothyroid patients in comparison to the control group, showing that regression in global left ventricular function is an important echocardiographic finding. Future studies are required to determine the effects of this finding on long-term cardiovascular outcomes.
Relaxation-based viscosity mapping for magnetic particle imaging
NASA Astrophysics Data System (ADS)
Utkur, M.; Muslu, Y.; Saritas, E. U.
2017-05-01
Magnetic particle imaging (MPI) has been shown to provide remarkable contrast for imaging applications such as angiography, stem cell tracking, and cancer imaging. Recently, there is growing interest in the functional imaging capabilities of MPI, where ‘color MPI’ techniques have explored separating different nanoparticles, which could potentially be used to distinguish nanoparticles in different states or environments. Viscosity mapping is a promising functional imaging application for MPI, as increased viscosity levels in vivo have been associated with numerous diseases such as hypertension, atherosclerosis, and cancer. In this work, we propose a viscosity mapping technique for MPI through the estimation of the relaxation time constant of the nanoparticles. Importantly, the proposed time constant estimation scheme does not require any prior information regarding the nanoparticles. We validate this method with extensive experiments in an in-house magnetic particle spectroscopy (MPS) setup at four different frequencies (between 250 Hz and 10.8 kHz) and at three different field strengths (between 5 mT and 15 mT) for viscosities ranging between 0.89 mPa·s and 15.33 mPa·s. Our results demonstrate the viscosity mapping ability of MPI in the biologically relevant viscosity range.
Comparative Implementation of High Performance Computing for Power System Dynamic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng
Dynamic simulation for transient stability assessment is one of the most important, but intensive, computations for power system planning and operation. Present commercial software is mainly designed for sequential computation to run a single simulation, which is very time consuming with a single processor. The application of High Performance Computing (HPC) to dynamic simulations is very promising in accelerating the computing process by parallelizing its kernel algorithms while maintaining the same level of computation accuracy. This paper describes the comparative implementation of four parallel dynamic simulation schemes in two state-of-the-art HPC environments: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). These implementations serve to match the application with dedicated multi-processor computing hardware and maximize the utilization and benefits of HPC during the development process.
32 CFR 637.2 - Use of MPI and DAC Detectives/Investigators.
Code of Federal Regulations, 2010 CFR
2010-07-01
.../investigators may be employed in joint MPI/USACIDC drug suppression teams; however, the conduct of such... and DAC detectives/investigators may also be utilized to make controlled buys of suspected controlled...
76 FR 18865 - Airworthiness Directives; Bell Helicopter Textron, Inc. Model 212 Helicopters
Federal Register 2010, 2011, 2012, 2013, 2014
2011-04-06
... also requires performing a magnetic particle inspection (MPI) on fittings with certain serial numbers... expanding the applicability to require performing a magnetic particle inspection (MPI) for a crack on the...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi
Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while the performance is comparable to that of the MPI version. Because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version that uses the global-view programming model to compute the field and the local-view programming model to compute the movement of particles. The performance of the hybrid-view version is degraded by 20% compared with the original MPI version, but it facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...
2016-06-01
Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while the performance is comparable to that of the MPI version. Because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version that uses the global-view programming model to compute the field and the local-view programming model to compute the movement of particles. The performance of the hybrid-view version is degraded by 20% compared with the original MPI version, but it facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.
3D streamers simulation in a pin to plane configuration using massively parallel computing
NASA Astrophysics Data System (ADS)
Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.
2018-03-01
This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids, and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI-parallelized code that solves the classical first-order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million mesh cells distributed over 1000 cores using the MPI procedures is about 30 min per simulated nanosecond, regardless of the number of streamers.
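The red-black (R&B) SOR iteration referred to above can be sketched serially — a minimal NumPy version on a small 2D grid, assuming unit grid spacing and zero boundary values. The paper's implementation decomposes the domain across MPI ranks, which the two-color sweep makes possible because each color reads only values of the other color:

```python
import numpy as np

def rb_sor_poisson(rhs, omega=1.8, tol=1e-8, max_iter=10_000):
    """Red-black SOR for the 2D Poisson problem -lap(u) = rhs
    (unit grid spacing, u = 0 on the boundary)."""
    u = np.zeros_like(rhs)
    for _ in range(max_iter):
        diff = 0.0
        for color in (0, 1):                  # red half-sweep, then black
            for i in range(1, u.shape[0] - 1):
                for j in range(1, u.shape[1] - 1):
                    if (i + j) % 2 != color:
                        continue
                    # Gauss-Seidel value from the four neighbors, then
                    # over-relax toward it with factor omega (1 < omega < 2).
                    gs = 0.25 * (u[i-1, j] + u[i+1, j]
                                 + u[i, j-1] + u[i, j+1] + rhs[i, j])
                    diff = max(diff, abs(gs - u[i, j]))
                    u[i, j] = (1 - omega) * u[i, j] + omega * gs
        if diff < tol:
            break
    return u

rhs = np.zeros((17, 17)); rhs[8, 8] = 1.0     # point source
u = rb_sor_poisson(rhs)
# The interior 5-point Laplacian of the solution should reproduce rhs.
lap = 4*u[1:-1, 1:-1] - u[:-2, 1:-1] - u[2:, 1:-1] - u[1:-1, :-2] - u[1:-1, 2:]
assert np.max(np.abs(lap - rhs[1:-1, 1:-1])) < 1e-6
```

Because every red point depends only on black neighbors and vice versa, each half-sweep is embarrassingly parallel; an MPI version only needs a halo exchange of boundary rows between the two half-sweeps.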
Shelat, Vishal G.; Ahmed, Saleem; Chia, Clement L. K.; Cheah, Yee Lee
2015-01-01
Application of minimal access surgery in acute care surgery is limited due to various reasons. Laparoscopic omental patch repair (LOPR) for perforated peptic ulcer (PPU) surgery is safe and feasible but not widely implemented. We report our early experience of LOPR with emphasis on strict selection criteria. This is a descriptive study of all patients operated on for PPU at academic university-affiliated institutes from December 2010 to February 2012. All patients who underwent LOPR were included as the study population and their records were studied. Perioperative outcomes, Boey score, Mannheim Peritonitis Index (MPI), and physiologic and operative severity scores for enumeration of mortality and morbidity (POSSUM) were calculated. All the data were tabulated in a Microsoft Excel spreadsheet and analyzed using Stata Version 8.x (StataCorp, College Station, TX, USA). Fourteen patients had LOPR out of a total of 45 patients operated on for PPU. Mean age was 46 years (range 22−87 years). Twelve patients (86%) had a Boey score of 0 and all patients had MPI < 21 (mean MPI = 14). The predicted POSSUM morbidity and mortality were 36% and 7%, respectively. Mean ulcer size was 5 mm (range 2−10 mm), mean operating time was 100 minutes (range 70−123 minutes) and mean length of hospital stay was 4 days (range 3−6 days). There was no morbidity or mortality pertaining to LOPR. LOPR should be offered by acute care surgical teams when local expertise is available. This can optimize patient outcomes when strict selection criteria are applied. PMID:25692444
MPI-IO: A Parallel File I/O Interface for MPI Version 0.3
NASA Technical Reports Server (NTRS)
Corbett, Peter; Feitelson, Dror; Hsu, Yarsun; Prost, Jean-Pierre; Snir, Marc; Fineberg, Sam; Nitzberg, Bill; Traversat, Bernard; Wong, Parkson
1995-01-01
Thanks to MPI [9], writing portable message passing parallel programs is almost a reality. One of the remaining problems is file I/O. Although parallel file systems support similar interfaces, the lack of a standard makes developing a truly portable program impossible. Further, the closest thing to a standard, the UNIX file interface, is ill-suited to parallel computing. Working together, IBM Research and NASA Ames have drafted MPI-IO, a proposal to address the portable parallel I/O problem. In a nutshell, this proposal is based on the idea that I/O can be modeled as message passing: writing to a file is like sending a message, and reading from a file is like receiving a message. MPI-IO intends to leverage the relatively wide acceptance of the MPI interface in order to create a similar I/O interface. The above approach can be materialized in different ways. The current proposal represents the result of extensive discussions (and arguments), but is by no means finished. Many changes can be expected as additional participants join the effort to define an interface for portable I/O. This document is organized as follows. The remainder of this section includes a discussion of some issues that have shaped the style of the interface. Section 2 presents an overview of MPI-IO as it is currently defined. It specifies what the interface currently supports and states what would need to be added to the current proposal to make the interface more complete and robust. The next seven sections contain the interface definition itself. Section 3 presents definitions and conventions. Section 4 contains functions for file control, most notably open. Section 5 includes functions for independent I/O, both blocking and nonblocking. Section 6 includes functions for collective I/O, both blocking and nonblocking. Section 7 presents functions to support system-maintained file pointers and shared file pointers.
Section 8 presents constructors that can be used to define useful filetypes (the role of filetypes is explained in Section 2 below). Section 9 presents how the error handling mechanism of MPI is supported by the MPI-IO interface. All this is followed by a set of appendices, which contain information about issues that have not been totally resolved yet, and about design considerations. The reader can find there the motivation behind some of our design choices. More information on this would definitely be welcome and will be included in a further release of this document. The first appendix contains a description of MPI-IO's 'hints' structure which is used when opening a file. Appendix B is a discussion of various issues in the support for file pointers. Appendix C explains what we mean in talking about atomic access. Appendix D provides detailed examples of filetype constructors, and Appendix E contains a collection of arguments for and against various design decisions.
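The filetype mechanism is what makes the file-as-message analogy concrete: a displacement plus a filetype with "holes" determines exactly which bytes of a shared file a process sees. A small plain-Python sketch of that offset arithmetic follows; it is not the MPI-IO API itself, and the round-robin layout and names are illustrative assumptions.

```python
def visible_offsets(rank, nprocs, etype_size, blocklen, nblocks):
    """Byte offsets one process sees under a hypothetical strided
    filetype: the file is tiled in rounds of nprocs * blocklen etypes,
    and process `rank` owns the rank-th block of each round. The
    displacement skips the blocks of lower-ranked processes."""
    stride = nprocs * blocklen              # etypes per round
    disp = rank * blocklen * etype_size     # per-process displacement
    offsets = []
    for b in range(nblocks):
        start = disp + b * stride * etype_size
        offsets.extend(start + k * etype_size for k in range(blocklen))
    return offsets
```

With complementary displacements, the processes' views tile the file with no gaps and no overlaps, which is the property collective I/O exploits.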
Doyle, Mark; Pohost, Gerald M; Bairey Merz, C Noel; Shaw, Leslee J; Sopko, George; Rogers, William J; Sharaf, Barry L; Pepine, Carl J; Thompson, Diane V; Rayarao, Geetha; Tauxe, Lindsey; Kelsey, Sheryl F; Biederman, Robert W W
2016-10-01
We introduce an algorithmic approach to optimize diagnostic and prognostic value of gated cardiac single photon emission computed tomography (SPECT) and magnetic resonance (MR) myocardial perfusion imaging (MPI) modalities in women with suspected myocardial ischemia. The novel approach: bio-informatics assessment schema (BIAS) forms a mathematical model utilizing MPI data and cardiac metrics generated by one modality to predict the MPI status of another modality. The model identifies cardiac features that either enhance or mask the image-based evidence of ischemia. For each patient, the BIAS model value is used to set an appropriate threshold for the detection of ischemia. Women (n=130), with symptoms and signs of suspected myocardial ischemia, underwent MPI assessment for regional perfusion defects using two different modalities: gated SPECT and MR. To determine perfusion status, MR data were evaluated qualitatively (MRIQL) and semi-quantitatively (MRISQ) while SPECT data were evaluated using conventional clinical criteria. Evaluators were masked to results of the alternate modality. These MPI status readings were designated "original". Two regression models designated "BIAS" models were generated to model MPI status obtained with one modality (e.g., MRI) compared with a second modality (e.g., SPECT), but importantly, the BIAS models did not include the primary Original MPI reading of the predicting modality. Instead, the BIAS models included auxiliary measurements like left ventricular chamber volumes and myocardial wall thickness. For each modality, the BIAS model was used to set a progressive threshold for interpretation of MPI status. Women were then followed for 38±14 months for the development of a first major adverse cardiovascular event [MACE: CV death, nonfatal myocardial infarction (MI) or hospitalization for heart failure].
Original and BIAS-augmented perfusion status were compared in their ability to detect coronary artery disease (CAD) and for prediction of MACE. Adverse events occurred in 14 (11%) women and CAD was present in 13 (10%). There was a positive correlation of maximum coronary artery stenosis and BIAS score for MRI and SPECT (P<0.001). Receiver operator characteristic (ROC) analysis was conducted and showed an increase in the area under the curve of the BIAS-augmented MPI interpretation of MACE vs. the original for MRISQ (0.78 vs. 0.54), MRIQL (0.78 vs. 0.64), SPECT (0.82 vs. 0.63) and the average of the three readings (0.80±0.02 vs. 0.60±0.05, P<0.05). Increasing values of the BIAS score generated by both MRI and SPECT corresponded to the increasing prevalence of CAD and MACE. The BIAS-augmented detection of ischemia better predicted MACE compared with the Original reading for the MPI data for both MRI and SPECT.
Nudi, Francesco; Schillaci, Orazio; Di Belardino, Natale; Versaci, Francesco; Tomai, Fabrizio; Pinto, Annamaria; Neri, Giandomenico; Procaccini, Enrica; Nudi, Alessandro; Frati, Giacomo; Biondi-Zoccai, Giuseppe
2017-10-15
The definition, presentation, and management of myocardial infarction (MI) have changed substantially in the last decade. Whether these changes have impacted the presence, severity, and localization of necrosis at myocardial perfusion imaging (MPI) has not been appraised to date. Subjects undergoing MPI and reporting a history of clinical MI were shortlisted. We focused on the presence, severity, and localization of necrosis at MPI with a retrospective single-center analysis. A total of 10,476 patients were included, distinguishing 5 groups according to the period in which myocardial perfusion scintigraphy had been performed (2004 to 2005, 2006 to 2007, 2008 to 2009, 2010 to 2011, 2012 to 2013). Trend analysis showed over time a significant worsening in baseline features (e.g., age, diabetes mellitus, and Q waves at electrocardiogram), whereas medical therapy and revascularization were offered with increasing frequency. Over the years, there was also a lower prevalence of normal MPI (from 16.8% to 13.6%) and ischemic MPI (from 35.6% to 32.8%), and a higher prevalence of ischemic and necrotic MPI (from 12.0% to 12.7%) or solely necrotic MPI (from 35.7% to 40.9%, p <0.001). Yet the prevalence of severe ischemia decreased over time from 11.4% to 2.0%, with a similar trend for moderate ischemia (from 15.9% to 11.8%, p <0.001). Similarly sobering results were found for the prevalence of severe necrosis (from 19.8% to 8.2%) and moderate necrosis (from 8.5% to 7.8%, p = 0.028). These trends were largely confirmed at regional level and after propensity score matching. In conclusion, the outlook of stable patients with previous MI has substantially improved in the last decade, with a decrease in the severity of residual myocardial ischemia and necrosis, despite an apparent worsening in baseline features. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Dieckhoff, J.; Kaul, M. G.; Mummert, T.; Jung, C.; Salamon, J.; Adam, G.; Knopp, T.; Ludwig, F.; Balceris, C.; Ittrich, H.
2017-05-01
Magnetic particle imaging (MPI) facilitates the rapid determination of 3D in vivo magnetic nanoparticle distributions. In this work, liver MPI following intravenous injections of ferucarbotran (Resovist®) was studied. The image reconstruction was based on a calibration measurement, the so-called system function. The application of an enhanced system function sample reflecting the particle mobility and aggregation status of ferucarbotran resulted in significantly improved image reconstructions. The finding was supported by characterizations of different ferucarbotran compositions with the magnetorelaxometry and magnetic particle spectroscopy technique. For instance, similar results were obtained between ferucarbotran embedded in freeze-dried mannitol sugar and liver tissue harvested after a ferucarbotran injection. In addition, the combination of multiple shifted measurement patches for a joint reconstruction of the MPI data enlarged the field of view and noticeably increased the coverage of the liver by MPI on magnetic resonance images.
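System-function reconstruction reduces, at its core, to a regularized linear inverse problem: the calibration yields a system matrix S mapping voxel concentrations to measured frequency components, and reconstruction solves S c = u for the concentration c. A minimal NumPy sketch of that linear step follows; the plain Tikhonov regularization and the random matrix in the test are illustrative assumptions, not the scanner-specific system function or regularization used in the study.

```python
import numpy as np

def reconstruct(S, u, lam=1e-3):
    """Tikhonov-regularized least-squares solve of S c = u.
    S: system matrix (frequency components x voxels), u: measurement,
    lam: regularization weight damping noise-amplifying components."""
    A = S.conj().T @ S + lam * np.eye(S.shape[1])
    return np.linalg.solve(A, S.conj().T @ u)
```

The paper's point, in these terms, is that a calibration sample whose mobility matches the in vivo tracer yields an S that actually describes the measurement, which matters more than the solver itself.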
Toxoplasma gondii strain-dependent effects on mouse behaviour.
Kannan, Geetha; Moldovan, Krisztina; Xiao, Jian-Chun; Yolken, Robert H; Jones-Brando, Lorraine; Pletnikov, Mikhail V
2010-06-01
Toxoplasma gondii reportedly manipulates rodent behaviour to increase transmission to its definitive feline host. We compared the effects of mouse infection by two Type II strains of T. gondii, Prugniaud (PRU) and ME49, on attraction to cat odour, locomotor activity, anxiety, sensorimotor gating, and spatial working and recognition memory 2 months post-infection (mpi). Attraction to cat odour was reassessed 7 mpi. At 2 mpi, mice infected with either strain exhibited significantly more attraction to cat odour than uninfected animals did, but only PRU-infected mice exhibited this behaviour 7 mpi. PRU-infected mice had significantly greater body weights and hyperactivity, while ME49-infected mice exhibited impaired spatial working memory. No differences in parasite antibody titres were seen between PRU- and ME49-infected mice. The present data suggest the effect of T. gondii infection on mouse behaviour is parasite strain-dependent.
Architecture and method for a burst buffer using flash technology
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing-bung
2016-03-15
A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, and solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion, and the solid-state storage node asynchronously migrates the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writes the checkpoint data to the magnetic disk storage in a sequential fashion.
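The strided shared-file write followed by sequential migration can be sketched in a few lines. Plain Python stands in for the solid-state node's file system interface here; the block size and round-robin layout are illustrative assumptions.

```python
import io

def strided_checkpoint(chunks, blocklen):
    """N processes write one shared checkpoint in a strided fashion:
    process r owns blocks r, r+N, r+2N, ... of the shared file.
    `chunks[r]` is the checkpoint data of process r."""
    nprocs = len(chunks)
    nblocks = len(chunks[0]) // blocklen
    shared = bytearray(nprocs * nblocks * blocklen)
    for rank, data in enumerate(chunks):
        for b in range(nblocks):
            off = (b * nprocs + rank) * blocklen
            shared[off:off + blocklen] = data[b * blocklen:(b + 1) * blocklen]
    return bytes(shared)

def migrate_sequential(shared, out):
    """Asynchronous migration drains the shared file to disk storage
    in a single sequential pass, the access pattern disks prefer."""
    out.write(shared)
```

The point of the design is the decoupling: compute nodes get the fast, concurrent strided write absorbed by flash, while the magnetic disks only ever see the sequential stream.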
Zhang, Lijun; Song, Xiantao; Dong, Li; Li, Jianan; Dou, Ruiyu; Fan, Zhanming; An, Jing; Li, Debiao
2018-04-30
The purpose of the work was to evaluate the incremental diagnostic value of free-breathing, contrast-enhanced, whole-heart, 3 T cardiovascular magnetic resonance coronary angiography (CE-MRCA) to stress/rest myocardial perfusion imaging (MPI) and late gadolinium enhancement (LGE) imaging for detecting coronary artery disease (CAD). Fifty-one patients with suspected CAD underwent a comprehensive cardiovascular magnetic resonance (CMR) examination (CE-MRCA, MPI, and LGE). The additive diagnostic value of MRCA to MPI and LGE was evaluated using invasive x-ray coronary angiography (XA) as the standard for defining functionally significant CAD (≥ 50% stenosis in vessels > 2 mm in diameter). 90.2% (46/51) patients (54.0 ± 11.5 years; 71.7% men) completed CE-MRCA successfully. On per-patient basis, compared to MPI/LGE alone or MPI alone, the addition of MRCA resulted in higher sensitivity (100% vs. 76.5%, p < 0.01), no change in specificity (58.3% vs. 66.7%, p = 0.6), and higher accuracy (89.1% vs 73.9%, p < 0.01) for CAD detection (prevalence = 73.9%). Compared to LGE alone, the addition of CE-MRCA resulted in higher sensitivity (97.1% vs. 41.2%, p < 0.01), inferior specificity (83.3% vs. 91.7%, p = 0.02), and higher diagnostic accuracy (93.5% vs. 54.3%, p < 0.01). The inclusion of successful free-breathing, whole-heart, 3 T CE-MRCA significantly improved the sensitivity and diagnostic accuracy as compared to MPI and LGE alone for CAD detection.
Parallelization of PANDA discrete ordinates code using spatial decomposition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humbert, P.
2006-07-01
We present the parallel method, based on spatial domain decomposition, implemented in the 2D and 3D versions of the discrete ordinates code PANDA. The spatial mesh is orthogonal and the spatial domain decomposition is Cartesian. For 3D problems a 3D Cartesian domain topology is created and the parallel method is based on a domain diagonal plane ordered sweep algorithm. The parallel efficiency of the method is improved by direction and octant pipelining. The implementation of the algorithm is straightforward using MPI blocking point-to-point communications. The efficiency of the method is illustrated by an application to the 3D-Ext C5G7 benchmark of the OECD/NEA. (authors)
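The diagonal-plane ordered sweep rests on a simple observation: for a sweep from the corner subdomain (0,0,0), all subdomains on the same diagonal plane i+j+k = d depend only on planes < d, so they can be processed concurrently. A small sketch of that ordering follows; it captures the scheduling idea only, not PANDA's implementation.

```python
from itertools import product

def diagonal_planes(px, py, pz):
    """Group the subdomains of a px*py*pz Cartesian decomposition into
    diagonal planes for a corner-to-corner sweep. Plane d holds every
    domain with i+j+k == d; domains within one plane have no mutual
    dependencies, so they sweep in parallel while successive planes
    form the pipeline stages."""
    planes = [[] for _ in range(px + py + pz - 2)]
    for i, j, k in product(range(px), range(py), range(pz)):
        planes[i + j + k].append((i, j, k))
    return planes
```

Pipelining directions and octants, as the abstract notes, keeps the early and late planes (where few domains are active) from idling the remaining processors.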
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-30
... Leased Workers From Echelon Service Company, Sun Associated Industries, Inc., MPI Consultants LLC...-site leased workers from Echelon Service Company, Sun Associated Industries, Inc., MPI Consultants LLC...
Shojaeifard, Maryam; Ghaedian, Tahereh; Yaghoobi, Nahid; Malek, Hadi; Firoozabadi, Hasan; Bitarafan-Rajabi, Ahmad; Haghjoo, Majid; Amin, Ahmad; Azizian, Nasrin; Rastgou, Feridoon
2015-01-01
Background: Gated single-photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI) is known as a feasible tool for the measurement of left ventricular ejection fraction (EF) and volumes, which are of great importance in the management and follow-up of patients with coronary artery diseases. However, considering the technical shortcomings of SPECT in the presence of perfusion defect, the accuracy of this method in heart failure patients is still controversial. Objectives: The aim of the present study was to compare echocardiographically-derived left ventricular dimension and function data with those from gated SPECT MPI in heart failure patients. Patients and Methods: Forty-one patients with severely reduced left ventricular systolic function (EF ≤ 35%) who were referred for gated SPECT MPI were prospectively enrolled. Quantification of EF, end-diastolic volume (EDV), and end-systolic volume (ESV) was performed by using the quantitative gated SPECT (QGS) (QGS, version 0.4, May 2009) and Emory Cardiac Toolbox (ECTb) (ECTb, revision 1.0, copyright 2007) software packages. EF, EDV, and ESV were also measured with two-dimensional echocardiography within 3 days after MPI. Results: A good correlation was found between echocardiographically-derived EF, EDV, and ESV and the values derived using QGS (r = 0.67, r = 0.78, and r = 0.80 for EF, EDV, and ESV, respectively; P < 0.001) and ECTb (r = 0.68, 0.79, and r = 0.80 for EF, EDV, and ESV, respectively; P < 0.001). However, Bland-Altman plots indicated significantly different mean values for EF, 11.4 and 20.9 using QGS and ECTb, respectively, as compared with echocardiography. ECTb-derived EDV was also significantly higher than the EDV measured with echocardiography and QGS. The highest correlation between echocardiography and gated SPECT MPI was found for ESV.
Conclusions: Gated SPECT MPI has a good correlation with echocardiography for the measurement of left ventricular EF, EDV, and ESV in patients with severe heart failure. However, the absolute values of these functional parameters from echocardiography and gated SPECT MPI measured with different software packages should not be used interchangeably. PMID:26889455
Application of the Low-dose One-stop-shop Cardiac CT Protocol with Third-generation Dual-source CT.
Lin, Lu; Wang, Yining; Yi, Yan; Cao, Jian; Kong, Lingyan; Qian, Hao; Zhang, Hongzhi; Wu, Wei; Wang, Yun; Jin, Zhengyu
2017-02-20
Objective To evaluate the feasibility of a low-dose one-stop-shop cardiac CT imaging protocol with third-generation dual-source CT (DSCT). Methods Twenty-three coronary artery disease (CAD) patients were prospectively enrolled between March and September 2016. All patients underwent an ATP stress dynamic myocardial perfusion imaging (MPI) (data acquired with prospective ECG triggering during end systole by table shuttle mode in 32 seconds) at 70 kV combined with prospectively ECG-triggered high-pitch coronary artery angiography (CCTA) on a third-generation DSCT system. Myocardial blood flow (MBF) was quantified and compared between perfusion-normal and perfusion-abnormal myocardial segments based on the AHA 17-segment model. CCTA images were evaluated qualitatively based on the SCCT 18-segment model and the effective dose (ED) was calculated. In patients with subsequent catheter coronary angiography (CCA) as reference, the diagnostic performance of MPI (for per-vessel ≥50% and ≥70% stenosis) and CCTA (for ≥50% stenosis) was assessed. Results Of 23 patients who had completed the examination of ATP stress MPI plus CCTA, 12 patients received follow-up CCA. At ATP stress MPI, 77 segments (19.7%) in 13 patients (56.5%) had perfusion abnormalities. The MBF values of hypo-perfused myocardial segments decreased significantly compared with normal segments [(93±22) ml/(100 ml·min) vs. (147±27) ml/(100 ml·min); t=15.978, P=0.000]. At CCTA, 93.9% (308/328) of the coronary segments had diagnostic image quality. With CCA as the reference standard, the per-vessel and per-segment sensitivity, specificity, and accuracy of CCTA for stenosis ≥50% were 94.1%, 93.5%, and 93.7% and 90.9%, 97.8%, and 96.8%, and the per-vessel sensitivity, specificity, and accuracy of ATP stress MPI for stenosis ≥50% and ≥70% were 68.7%, 100%, and 89.5% and 91.7%, 100%, and 97.9%. The total ED of MPI and CCTA was (3.9±1.3) mSv [MPI: (3.5±1.2) mSv, CCTA: (0.3±0.1) mSv].
Conclusion The third-generation DSCT stress dynamic MPI at 70 kV combined with prospectively ECG-triggered high-pitch CCTA is a feasible and reliable tool for clinical diagnosis, with remarkably reduced radiation dose.
LMC: Logarithmantic Monte Carlo
NASA Astrophysics Data System (ADS)
Mantz, Adam B.
2017-06-01
LMC is a Markov Chain Monte Carlo engine in Python that implements adaptive Metropolis-Hastings and slice sampling, as well as the affine-invariant method of Goodman & Weare, in a flexible framework. It can be used for simple problems, but the main use case is problems where expensive likelihood evaluations are provided by less flexible third-party software, which benefit from parallelization across many nodes at the sampling level. The parallel/adaptive methods use communication through MPI, or alternatively by writing/reading files, and mostly follow the approaches pioneered by CosmoMC (ascl:1106.025).
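A plain random-walk Metropolis step is the building block that LMC's adaptive machinery elaborates on. The sketch below is a generic illustration, not LMC's code: LMC additionally adapts the proposal on the fly and can parallelize the expensive log-posterior evaluations over MPI, and the function names and step size here are illustrative.

```python
import math
import random

def metropolis(logpost, x0, step, nsteps, seed=0):
    """Minimal 1D random-walk Metropolis sampler. `logpost` returns
    the log posterior density (up to a constant); proposals are
    Gaussian perturbations of width `step`, accepted with probability
    min(1, exp(logpost(x') - logpost(x)))."""
    rng = random.Random(seed)
    x, lp = x0, logpost(x0)
    chain = []
    for _ in range(nsteps):
        xp = x + rng.gauss(0.0, step)
        lpp = logpost(xp)
        if lpp >= lp or rng.random() < math.exp(lpp - lp):
            x, lp = xp, lpp  # accept the proposal
        chain.append(x)
    return chain
```

When each `logpost` call is an expensive third-party computation, as in LMC's main use case, the sampler's wall time is dominated by those calls, which is why parallelizing them pays off.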
pcircle - A Suite of Scalable Parallel File System Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
WANG, FEIYI
2015-10-01
Most software related to file systems is written for conventional local file systems; it is serial and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on top of ubiquitous MPI in cluster computing environments and the "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, and integrity checking.
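The work-stealing pattern can be illustrated with a sequential simulation: each worker drains its own queue and, when idle, steals from the busiest peer. This is a sketch of the scheduling idea only; pcircle implements it over MPI, with directory-tree traversal producing the tasks dynamically.

```python
from collections import deque

def work_steal(tasks, nworkers):
    """Sequentially simulate work stealing. All tasks start on worker
    0 (a deliberately unbalanced assignment, like one rank walking a
    directory tree); idle workers steal from the front of the busiest
    peer's deque while owners pop from the back."""
    queues = [deque() for _ in range(nworkers)]
    queues[0].extend(tasks)
    done = [[] for _ in range(nworkers)]
    remaining = len(tasks)
    while remaining:
        for w in range(nworkers):
            if not queues[w]:  # idle: steal one task from busiest peer
                victim = max(range(nworkers), key=lambda v: len(queues[v]))
                if queues[victim]:
                    queues[w].append(queues[victim].popleft())
            if queues[w]:
                done[w].append(queues[w].pop())
                remaining -= 1
    return done
```

Stealing from the opposite end of the owner's deque is the classic trick that keeps contention low; it also tends to migrate large unexplored subtrees rather than single files.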
DOE Office of Scientific and Technical Information (OSTI.GOV)
Universal Common Communication Substrate (UCCS) is a low-level communication substrate that exposes high-performance communication primitives while providing network interoperability. It is intended to support multiple upper layer protocols (ULPs) or programming models including SHMEM, UPC, Titanium, Co-Array Fortran, Global Arrays, MPI, GASNet, and file I/O. It provides various communication operations including one-sided and two-sided point-to-point, collectives, and remote atomic operations. In addition to operations for ULPs, it provides an out-of-band communication channel typically required to wire up communication libraries.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergen, Ben; Moss, Nicholas; Charest, Marc Robert Joseph
FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. Current support includes multi-dimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI also introduces a functional programming model with control, execution, and data abstractions that are consistent with both MPI and state-of-the-art task-based runtimes such as Legion and Charm++. The FleCSI abstraction layer provides the developer with insulation from the underlying runtime, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. The control and execution models in FleCSI also provide formal nomenclature for describing poorly understood concepts like kernels and tasks.
gpuPOM: a GPU-based Princeton Ocean Model
NASA Astrophysics Data System (ADS)
Xu, S.; Huang, X.; Zhang, Y.; Fu, H.; Oey, L.-Y.; Xu, F.; Yang, G.
2014-11-01
Rapid advances in the performance of the graphics processing unit (GPU) have made the GPU a compelling solution for a series of scientific applications. However, most existing GPU acceleration work for climate models ports only certain hot spots, and can achieve only limited speedup for the entire model. In this work, we take the mpiPOM (a parallel version of the Princeton Ocean Model) as our starting point, and design and implement a GPU-based Princeton Ocean Model. By carefully considering the architectural features of state-of-the-art GPU devices, we rewrite the full mpiPOM model from the original Fortran version into a new Compute Unified Device Architecture C (CUDA-C) version. We apply several acceleration methods to further improve the performance of gpuPOM, including optimizing memory access on a single GPU, overlapping communication and boundary operations among multiple GPUs, and overlapping input/output (I/O) between the Central Processing Unit (CPU) and the GPU. Our experimental results indicate that the performance of the gpuPOM on a workstation containing 4 GPUs is comparable to a powerful cluster with 408 CPU cores, while reducing energy consumption by a factor of 6.8.
Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...
2017-04-05
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.
Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias
2011-01-01
Future multiscale and multiphysics models must use the power of high performance computing (HPC) systems to enable research into human disease, translational medical science, and treatment. Previously we showed that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message passing processes (e.g. the message passing interface (MPI)) with multithreading (e.g. OpenMP, POSIX pthreads). The objective of this work is to compare the performance of such hybrid programming models when applied to the simulation of a lightweight multiscale cardiac model. Our results show that the hybrid models do not perform favourably when compared to an implementation using only MPI, which is in contrast to our results using complex physiological models. Thus, with regard to lightweight multiscale cardiac models, the user may not need to increase programming complexity by using a hybrid programming approach. However, considering that model complexity will increase, as will HPC system size in both node count and number of cores per node, it is still foreseeable that we will achieve faster-than-real-time multiscale cardiac simulations on these systems using hybrid programming models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Basu, Protonu; Williams, Samuel; Van Straalen, Brian
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.
Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias;
2006-01-01
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.
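What a point-to-point benchmark such as IMB PingPong effectively probes is often summarized by the alpha-beta (latency/bandwidth) cost model. The sketch below is illustrative only; the latency and bandwidth values are assumed round numbers, not measurements from any of the systems in the paper.

```python
# Hedged sketch of the alpha-beta model behind ping-pong benchmarks.
# alpha = startup latency (s), beta = per-byte time (s/byte); both are
# illustrative assumptions, not measured values.

def pingpong_time(msg_bytes, alpha=2e-6, beta=1.0 / 1.5e9):
    """Predicted one-way transfer time for a message of msg_bytes."""
    return alpha + msg_bytes * beta

def effective_bandwidth(msg_bytes, alpha=2e-6, beta=1.0 / 1.5e9):
    """Bytes/s actually delivered, including the startup cost."""
    return msg_bytes / pingpong_time(msg_bytes, alpha, beta)

# Small messages are latency-bound; large ones approach the link rate.
assert effective_bandwidth(8) < effective_bandwidth(1 << 20)
assert effective_bandwidth(1 << 20) < 1.5e9
```

This is why benchmark suites sweep message sizes: the crossover between the latency-dominated and bandwidth-dominated regimes differs per network.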
Normal Databases for the Relative Quantification of Myocardial Perfusion
Rubeaux, Mathieu; Xu, Yuan; Germano, Guido; Berman, Daniel S.; Slomka, Piotr J.
2016-01-01
Purpose of review Myocardial perfusion imaging (MPI) with SPECT is performed clinically worldwide to detect and monitor coronary artery disease (CAD). MPI allows an objective quantification of myocardial perfusion at stress and rest. This established technique relies on normal databases to compare patient scans against reference normal limits. In this review, we aim to introduce the process of MPI quantification with normal databases and describe the associated perfusion quantitative measures that are used. Recent findings New equipment and new software reconstruction algorithms have been introduced which require the development of new normal limits. The appearance and regional count variations of a normal MPI scan may differ between these new scanners and standard Anger cameras. Therefore, these new systems may require the determination of new normal limits to achieve optimal accuracy in relative myocardial perfusion quantification. Accurate diagnostic and prognostic results rivaling those obtained by expert readers can be obtained by this widely used technique. Summary Throughout this review, we emphasize the importance of the different normal databases and the need for specific databases relative to distinct imaging procedures. Use of appropriate normal limits allows optimal quantification of MPI by taking into account subtle image differences due to the hardware and software used, and the population studied. PMID:28138354
NASA Astrophysics Data System (ADS)
Raju, S. G.; Hariharan, Krishnan S.; Park, Da-Hye; Kang, HyoRang; Kolake, Subramanya Mayya
2015-10-01
Molecular dynamics (MD) simulations of ternary polymer electrolyte - ionic liquid mixtures are conducted using an all-atom model. N-alkyl-N-methylpyrrolidinium bis(trifluoromethylsulfonyl)imide ([CnMPy][TFSI], n = 1, 3, 6, 9) and polyethylene oxide (PEO) are used. Microscopic structure, energetics and dynamics of ionic liquid (IL) in these ternary mixtures are studied. Properties of these four pure IL are also calculated and compared to those in ternary mixtures. Interaction between pyrrolidinium cation and TFSI is stronger and there is a larger propensity for ion-pair formation in ternary mixtures. Unlike the case in imidazolium IL, near neighbor structural correlation between TFSI reduces with increase in chain length on cation in both pure IL and ternary mixtures. Using spatial density maps, regions where PEO and TFSI interact with pyrrolidinium cation are identified. Oxygens of PEO are above and below the pyrrolidinium ring and away from the bulky alkyl groups whereas TFSI is present close to the nitrogen atom of CnMPy. In pure IL, the diffusion coefficient (D) of C3MPy is larger than that of TFSI, but D of C9MPy and C6MPy are larger than that of TFSI. The reasons for alkyl chain dependent phenomena are explored.
Fenix, A Fault Tolerant Programming Framework for MPI Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gamell, Marc; Teranishi, Keita; Valenzuela, Eric
2016-10-05
Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.
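The spare-process idea in the abstract can be illustrated with a toy model: a pool of active ranks and a pool of spares, where a failed rank's slot is refilled from the spares. This is a plain-Python illustration only; the class and method names are hypothetical and do not reflect the real Fenix C/MPI API.

```python
# Illustrative sketch only: mimics Fenix's spare-process recovery idea in
# plain Python. RankPool and on_failure are invented names, not Fenix API.

class RankPool:
    def __init__(self, n_active, n_spare):
        self.active = list(range(n_active))
        self.spares = list(range(n_active, n_active + n_spare))

    def on_failure(self, rank):
        """Replace a failed active rank with a spare, if one remains."""
        if not self.spares:
            raise RuntimeError("no spare processes left")
        idx = self.active.index(rank)
        self.active[idx] = self.spares.pop(0)

pool = RankPool(n_active=4, n_spare=2)
pool.on_failure(2)  # rank 2 fails; spare rank 4 takes its communicator slot
assert pool.active == [0, 1, 4, 3]
assert pool.spares == [5]
```

In the real framework, the surviving and spare processes additionally rebuild the MPI communicator and restore application state from checkpoints, which this sketch omits.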
GANDALF - Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids
NASA Astrophysics Data System (ADS)
Hubber, D. A.; Rosotti, G. P.; Booth, R. A.
2018-01-01
GANDALF is a new hydrodynamics and N-body dynamics code designed for investigating planet formation, star formation and star cluster problems. GANDALF is written in C++, parallelized with both OpenMP and MPI and contains a Python library for analysis and visualization. The code has been written with a fully object-oriented approach to easily allow user-defined implementations of physics modules or other algorithms. The code currently contains implementations of smoothed particle hydrodynamics, meshless finite-volume and collisional N-body schemes, but can easily be adapted to include additional particle schemes. We present in this paper the details of its implementation, results from the test suite, serial and parallel performance results and discuss the planned future development. The code is freely available as an open source project on the code-hosting website GitHub at https://github.com/gandalfcode/gandalf and is available under the GPLv2 license.
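At the heart of any smoothed particle hydrodynamics scheme is a compact-support smoothing kernel. A common choice is the M4 cubic spline, sketched below; GANDALF supports several kernels, so this particular form is shown only as a representative example.

```python
import math

# Minimal sketch of the M4 cubic-spline SPH kernel in 3D (a common choice;
# not necessarily the exact kernel GANDALF uses by default).

def w_cubic(r, h):
    """3D cubic-spline smoothing kernel W(r, h) with support radius 2h."""
    sigma = 1.0 / (math.pi * h ** 3)   # 3D normalization constant
    q = r / h
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0

# The kernel decreases monotonically and vanishes beyond r = 2h.
assert w_cubic(0.0, 1.0) > w_cubic(0.5, 1.0) > w_cubic(1.5, 1.0)
assert w_cubic(2.0, 1.0) == 0.0
```

Compact support is what makes SPH neighbour searches (and hence the OpenMP/MPI parallelization of the particle loops) tractable: each particle only interacts with particles within 2h.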
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Efficient Tracing for On-the-Fly Space-Time Displays in a Debugger for Message Passing Programs
NASA Technical Reports Server (NTRS)
Hood, Robert; Matthews, Gregory
2001-01-01
In this work we describe the implementation of a practical mechanism for collecting and displaying trace information in a debugger for message passing programs. We introduce a trace format that is highly compressible while still providing information adequate for debugging purposes. We make the mechanism convenient for users to access by incorporating the trace collection in a set of wrappers for the MPI (message passing interface) communication library. We implement several debugger operations that use the trace display: consistent stoplines, undo, and rollback. They all are implemented using controlled replay, which executes at full speed in target processes until the appropriate position in the computation is reached. They provide convenient mechanisms for getting to places in the execution where the full power of a state-based debugger can be brought to bear on isolating communication errors.
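One reason message traces compress so well is that event timestamps are monotonically increasing, so storing first-order differences produces many small, repetitive integers. The sketch below shows that idea with delta encoding plus a general-purpose compressor; it is an illustration of the principle, not the trace format the paper defines.

```python
# Hedged sketch: delta-encode monotone event timestamps, then compress.
# This illustrates why trace formats compress well; it is not the paper's format.
import struct
import zlib

def encode(timestamps):
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    return zlib.compress(struct.pack(f"{len(deltas)}q", *deltas))

def decode(blob, n):
    deltas = struct.unpack(f"{n}q", zlib.decompress(blob))
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

ts = [1000 + 10 * i for i in range(1000)]   # regularly spaced send events
blob = encode(ts)
assert decode(blob, len(ts)) == ts          # lossless round trip
assert len(blob) < 8 * len(ts)              # far smaller than raw 64-bit ints
```

In a debugger wrapper library, such records would be emitted from inside the MPI wrapper functions so that collection stays cheap enough to leave enabled.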
Noe, Timothy D; Kaufman, Carol E; Kaufmann, L Jeanne; Brooks, Elizabeth; Shore, Jay H
2014-09-01
We conducted an exploratory study to determine what organizational characteristics predict the provision of culturally competent services for American Indian and Alaska Native (AI/AN) veterans in Department of Veterans Affairs (VA) health facilities. In 2011 to 2012, we adapted the Organizational Readiness to Change Assessment (ORCA) for a survey of 27 VA facilities in the Western Region to assess organizational readiness and capacity to adopt and implement native-specific services and to profile the availability of AI/AN veteran programs and interest in and resources for such programs. Several ORCA subscales (Program Needs, Leader's Practices, and Communication) statistically significantly predicted whether VA staff perceived that their facilities were meeting the needs of AI/AN veterans. However, none predicted greater implementation of native-specific services. Our findings may aid in developing strategies for adopting and implementing promising native-specific programs and services for AI/AN veterans, and may be generalizable for other veteran groups.
Kaufman, Carol E.; Kaufmann, L. Jeanne; Brooks, Elizabeth; Shore, Jay H.
2014-01-01
Objectives. We conducted an exploratory study to determine what organizational characteristics predict the provision of culturally competent services for American Indian and Alaska Native (AI/AN) veterans in Department of Veterans Affairs (VA) health facilities. Methods. In 2011 to 2012, we adapted the Organizational Readiness to Change Assessment (ORCA) for a survey of 27 VA facilities in the Western Region to assess organizational readiness and capacity to adopt and implement native-specific services and to profile the availability of AI/AN veteran programs and interest in and resources for such programs. Results. Several ORCA subscales (Program Needs, Leader’s Practices, and Communication) statistically significantly predicted whether VA staff perceived that their facilities were meeting the needs of AI/AN veterans. However, none predicted greater implementation of native-specific services. Conclusions. Our findings may aid in developing strategies for adopting and implementing promising native-specific programs and services for AI/AN veterans, and may be generalizable for other veteran groups. PMID:25100420
Al-Mallah, Mouaz H; Pascual, Thomas N B; Mercuri, Mathew; Vitola, João V; Karthikeyan, Ganesan; Better, Nathan; Dondi, Maurizio; Paez, Diana; Einstein, Andrew J
2018-05-15
There is growing concern about radiation exposure from nuclear myocardial perfusion imaging (MPI), particularly among younger patients who are more prone to develop untoward effects of ionizing radiation, and hence US and European professional society guidelines recommend age as a consideration in weighing radiation risk from MPI. We aimed to determine how patient radiation doses from MPI vary across age groups in a large contemporary international cohort. Data were collected as part of a global cross-sectional study of centers performing MPI coordinated by the International Atomic Energy Agency (IAEA). Sites provided information on each MPI study completed during a single week in March-April 2013. We compared across age groups laboratory adherence to pre-specified radiation-related best practices, radiation effective dose (ED; a whole-body measure reflecting the amount of radiation to each organ and its relative sensitivity to radiation's deleterious effects), and the proportion of patients with ED ≤ 9 mSv, a target level specified in guidelines. Among 7911 patients undergoing MPI in 308 laboratories in 65 countries, mean ED was 10.0 ± 4.5 mSv with slightly higher exposure among younger age groups (trend p value < 0.001). There was no difference in the proportion of patients with ED ≤ 9 mSv across age groups, or in adherence to best practices based on the median age of patients in a laboratory. In contemporary nuclear cardiology practice, the age of the patient appears not to impact protocol selection and radiation dose, contrary to professional society guidelines. Copyright © 2018. Published by Elsevier B.V.
Calibration free beam hardening correction for cardiac CT perfusion imaging
NASA Astrophysics Data System (ADS)
Levi, Jacob; Fahmi, Rachid; Eck, Brendan L.; Fares, Anas; Wu, Hao; Vembar, Mani; Dhanantwari, Amar; Bezerra, Hiram G.; Wilson, David L.
2016-03-01
Myocardial perfusion imaging using CT (MPI-CT) and coronary CTA have the potential to make CT an ideal noninvasive gate-keeper for invasive coronary angiography. However, beam hardening artifacts (BHA) prevent accurate blood flow calculation in MPI-CT. BH Correction (BHC) methods require either energy-sensitive CT, not widely available, or typically a calibration-based method. We developed a calibration-free, automatic BHC (ABHC) method suitable for MPI-CT. The algorithm works with any BHC method and iteratively determines model parameters using a proposed BHA-specific cost function. In this work, we use the polynomial BHC extended to three materials. The image is segmented into soft tissue, bone, and iodine images, based on mean HU and temporal enhancement. Forward projections of bone and iodine images are obtained, and in each iteration polynomial correction is applied. Corrections are then back projected and combined to obtain the current iteration's BHC image. This process is iterated until the cost is minimized. We evaluate the algorithm on simulated and physical phantom images and on preclinical MPI-CT data. The scans were obtained on a prototype spectral detector CT (SDCT) scanner (Philips Healthcare). Mono-energetic reconstructed images were used as the reference. In the simulated phantom, BH streak artifacts were reduced from 12 +/- 2 HU to 1 +/- 1 HU and cupping was reduced by 81%. Similarly, in the physical phantom, BH streak artifacts were reduced from 48 +/- 6 HU to 1 +/- 5 HU and cupping was reduced by 86%. In preclinical MPI-CT images, BHA was reduced from 28 +/- 6 HU to less than 4 +/- 4 HU at peak enhancement. Results suggest that the algorithm can be used to reduce BHA in conventional CT and improve MPI-CT accuracy.
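The core of a polynomial beam-hardening correction is a linearization: the measured polychromatic projection is a concave function of the true path length, and a low-order polynomial in the measurement is applied to straighten it. The sketch below uses invented coefficients purely to show the mechanism; the paper's ABHC method fits such coefficients automatically by minimizing its artifact-specific cost.

```python
# Illustrative sketch of polynomial beam-hardening linearization.
# Both the simulated detector response and the correction coefficient
# are assumptions for demonstration, not values from the paper.

def harden(t):
    """Simulated polychromatic measurement: concave in true path length t."""
    return t - 0.05 * t ** 2

def correct(p, c2=0.05):
    """Second-order polynomial correction p + c2*p^2 (coefficient assumed)."""
    return p + c2 * p ** 2

t = 2.0
p = harden(t)                              # under-reads the true path length
assert p < t
assert abs(correct(p) - t) < abs(p - t)    # correction moves p toward t
```

In the full algorithm this correction is applied to the forward-projected bone and iodine images each iteration, back projected, and recombined until the cost function stops improving.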
Asih, Sali; Mayer, Tom G; Williams, Mark; Choi, Yun Hee; Gatchel, Robert J
2015-12-01
The objectives of this study were: (1) to assess whether Multidimensional Pain Inventory (MPI) profiles predicted differential responses to a functional restoration program (FRP) in chronic disabling occupational musculoskeletal disorder (CDOMD) patients; (2) to examine whether coping style improves following FRP; and (3) to determine whether discharge MPI profiles predict discharge psychosocial and 1-year socioeconomic outcomes. Consecutive CDOMD patients (N=716) were classified into Adaptive Coper (AC, n=209), Interpersonally Distressed (ID, n=154), Dysfunctional (DYS, n=310), and Anomalous (n=43) using the MPI, and reclassified at discharge. Profiles were compared on psychosocial measures and 1-year socioeconomic outcomes. An intent-to-treat sample analyzed the effect of drop-outs on treatment responsiveness. The MPI classification significantly predicted program completion (P=0.001), although the intent-to-treat analyses found no significant effects of drop-out on treatment responsiveness. There was a significant increase in the number of patients who became AC or Anomalous at FRP discharge and a decrease in those who were ID or DYS. Patients who changed or remained as DYS at FRP discharge reported the highest levels of pain, disability, and depression. No significant interaction effect was found between MPI group and time for pain intensity or disability. All groups improved on psychosocial measures at discharge. DYS patients had decreased work retention and greater health care utilization at 1 year. An FRP was clinically effective for CDOMD patients regardless of initial MPI profiles. The FRP modified profiles, with patients changing from negative to positive profiles. Discharge DYS were more likely to have poor 1-year outcomes. Those classified as Anomalous had a good prognosis for functional recovery similar to ACs.
The effect of patient anxiety and depression on motion during myocardial perfusion SPECT imaging.
Lyra, Vassiliki; Kallergi, Maria; Rizos, Emmanouil; Lamprakopoulos, Georgios; Chatziioannou, Sofia N
2016-08-22
Patient motion during myocardial perfusion SPECT imaging (MPI) may be triggered by a patient's physical and/or psychological discomfort. The aim of this study was to investigate the impact of state anxiety (patient's reaction to exam-related stress), trait anxiety (patient's personality characteristic) and depression on patient motion during MPI. All patients that underwent MPI in our department in a six-month period were prospectively enrolled. One hundred eighty-three patients (45 females; 138 males) filled in the State-Trait Anxiety Inventory (STAI) and the Beck Depression Inventory (BDI), along with a short questionnaire regarding their age, height and weight, level of education in years, occupation, and marital status. Cardiovascular and other co-morbidity factors were also evaluated. Through inspection of raw data on cinematic display, the presence or absence of patient motion was registered and classified into mild, moderate and severe, for both phases involved in image acquisition. The correlation of patient motion in the stress and delay phases of MPI and each of the other variables was investigated and the corresponding Pearson's coefficients of association were calculated. The anxiety-motion (r = 0.43, P < 0.0001) and depression-motion (r = 0.32, P < 0.0001) correlation results were moderately strong and statistically significant for the female but not the male patients. All the other variables did not demonstrate any association with motion in MPI, except a weak correlation between age and motion in females (r = 0.23, P < 0.001). The relationship between anxiety-motion and depression-motion identified in female patients represents the first supporting evidence of psychological discomfort as predisposing factor for patient motion during MPI.
NASA Astrophysics Data System (ADS)
Dhavalikar, Rohan; Rinaldi, Carlos
2016-12-01
Magnetic nanoparticles in alternating magnetic fields (AMFs) transfer some of the field's energy to their surroundings in the form of heat, a property that has attracted significant attention for use in cancer treatment through hyperthermia and in developing magnetic drug carriers that can be actuated to release their cargo externally using magnetic fields. To date, most work in this field has focused on the use of AMFs that actuate heat release by nanoparticles over large regions, without the ability to select specific nanoparticle-loaded regions for heating while leaving other nanoparticle-loaded regions unaffected. In parallel, magnetic particle imaging (MPI) has emerged as a promising approach to image the distribution of magnetic nanoparticle tracers in vivo, with sub-millimeter spatial resolution. The underlying principle in MPI is the application of a selection magnetic field gradient, which defines a small region of low bias field, superimposed with an AMF (of lower frequency and amplitude than those normally used to actuate heating by the nanoparticles) to obtain a signal which is proportional to the concentration of particles in the region of low bias field. Here we extend previous models for estimating the energy dissipation rates of magnetic nanoparticles in uniform AMFs to provide theoretical predictions of how the selection magnetic field gradient used in MPI can be used to selectively actuate heating by magnetic nanoparticles in the low bias field region of the selection magnetic field gradient. Theoretical predictions are given for the spatial decay in energy dissipation rate under magnetic field gradients representative of those that can be achieved with current MPI technology. These results underscore the potential of combining MPI and higher amplitude/frequency actuation AMFs to achieve selective magnetic fluid hyperthermia (MFH) guided by MPI.
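The selective-heating argument rests on the equilibrium magnetization of superparamagnetic particles, commonly described by the Langevin function: particles near the field-free point of the selection gradient can respond to the drive field, while particles sitting in a strong bias field are saturated and respond weakly. The sketch below shows the Langevin function and its saturation behavior; it is a qualitative illustration, not the paper's full dynamic model.

```python
import math

# Sketch of the Langevin equilibrium magnetization L(x) = coth(x) - 1/x,
# a qualitative stand-in for the paper's energy-dissipation model.

def langevin(x):
    if abs(x) < 1e-6:
        return x / 3.0            # small-argument limit avoids 0/0
    return 1.0 / math.tanh(x) - 1.0 / x

# Magnetization grows with field but saturates; far from the field-free
# region of an MPI selection gradient, particles are near saturation and
# contribute little additional response to the drive field.
assert langevin(0.1) < langevin(1.0) < langevin(10.0) < 1.0
assert 1.0 - langevin(10.0) < 0.2     # near saturation under a strong bias
```

The spatial selectivity in the paper comes from exactly this effect: the gradient sets a position-dependent bias field, so only the low-bias region dissipates appreciable heat.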
Korosoglou, G; Hansen, A; Bekeredjian, R; Filusch, A; Hardt, S; Wolf, D; Schellberg, D; Katus, H A; Kuecherer, H
2006-03-01
To evaluate whether myocardial parametric imaging (MPI) is superior to visual assessment for the evaluation of myocardial viability. Myocardial contrast echocardiography (MCE) was assessed in 11 pigs before, during, and after left anterior descending coronary artery occlusion and in 32 patients with ischaemic heart disease by using intravenous SonoVue administration. In experimental studies perfusion defect area assessment by MPI was compared with visually guided perfusion defect planimetry. Histological assessment of necrotic tissue was the standard reference. In clinical studies viability was assessed on a segmental level by (1) visual analysis of myocardial opacification; (2) quantitative estimation of myocardial blood flow in regions of interest; and (3) MPI. Functional recovery between three and six months after revascularisation was the standard reference. In experimental studies, compared with visually guided perfusion defect planimetry, planimetric assessment of infarct size by MPI correlated more significantly with histology (r2 = 0.92 versus r2 = 0.56) and had a lower intraobserver variability (4% v 15%, p < 0.05). In clinical studies, MPI had higher specificity (66% v 43%, p < 0.05) than visual MCE and good accuracy (81%) for viability detection. It was less time consuming (3.4 (1.6) v 9.2 (2.4) minutes per image, p < 0.05) than quantitative blood flow estimation by regions of interest and increased the agreement between observers interpreting myocardial perfusion (kappa = 0.87 v kappa = 0.75, p < 0.05). MPI is useful for the evaluation of myocardial viability both in animals and in patients. It is less time consuming than quantification analysis by regions of interest and less observer dependent than visual analysis. Thus, strategies incorporating this technique may be valuable for the evaluation of myocardial viability in clinical routine.
Weinsaft, Jonathan W; Manoushagian, Shant J; Patel, Taral; Shakoor, Aqsa; Kim, Robert J; Mirchandani, Sunil; Lin, Fay; Wong, Franklin J; Szulc, Massimiliano; Okin, Peter M; Kligfield, Paul D; Min, James K
2009-01-01
To assess the utility of stress electrocardiography (ECG) for identifying the presence and severity of obstructive coronary artery disease (CAD) defined by coronary computed tomographic angiography (CCTA) among patients with normal nuclear myocardial perfusion imaging (MPI). The study population comprised 119 consecutive patients with normal MPI who also underwent CCTA (interval 3.5+/-3.8 months). Stress ECG was performed at the time of MPI. CCTA and MPI were interpreted using established scoring systems, and CCTA was used to define the presence and extent of CAD, which was quantified by a coronary artery jeopardy score. Within this population, 28 patients (24%) had obstructive CAD identified by CCTA. The most common CAD pattern was single-vessel CAD (61%), although proximal vessel involvement was present in 46% of patients. Patients with CAD were nearly three times more likely to have positive standard test responses (1 mm ST-segment deviation) than patients with patent coronary arteries (36 vs. 13%, P=0.007). In multivariate analysis, a positive ST-segment test response was an independent marker for CAD (odds ratio: 2.02, confidence interval: 1.09-3.78, P=0.03) even after adjustment for a composite of clinical cardiac risk factors (odds ratio: 1.85, confidence interval: 1.05-3.23, P=0.03). Despite uniformly normal MPI, mean coronary jeopardy score was three-fold higher among patients with positive compared to those with negative ST-segment response to exercise or dobutamine stress (1.9+/-2.7 vs. 0.5+/-1.4, P=0.03). Stress-induced ST-segment deviation is an independent marker for obstructive CAD among patients with normal MPI. A positive stress ECG identifies patients with a greater anatomic extent of CAD as quantified by coronary jeopardy score.
Kosonen, Jukka; Kulmala, Juha-Pekka; Müller, Erich; Avela, Janne
2017-03-21
Anti-pronation orthoses, like medially posted insoles (MPI), have traditionally been used to treat various lower limb problems. Yet, we know surprisingly little about their effects on overall foot motion and lower limb mechanics across walking and running, which represent highly different loading conditions. To address this issue, multi-segment foot and lower limb mechanics were examined among 11 overpronating men with normal (NORM) and MPI insoles during walking (self-selected speed 1.70±0.19m/s vs 1.72±0.20m/s, respectively) and running (4.04±0.17m/s vs 4.10±0.13m/s, respectively). The kinematic results showed that MPI reduced the peak forefoot eversion movement with respect to both hindfoot and tibia across walking and running when compared to NORM (p<0.05-0.01). No differences were found in hindfoot eversion between conditions. The kinetic results showed no insole effects in walking, but during running MPI shifted the center of pressure medially under the foot (p<0.01), leading to an increase in frontal plane moments at the hip (p<0.05) and knee (p<0.05) joints and a reduction at the ankle joint (p<0.05). These findings indicate that MPI primarily controlled the forefoot motion across walking and running. While the kinetic response to MPI was more pronounced in running than walking, kinematic effects were essentially similar across both modes. This suggests that despite the higher loads placed upon the lower limb during running, there is no need for stiffer insoles to achieve a reduction in forefoot motion similar to that in walking. Copyright © 2017 Elsevier Ltd. All rights reserved.
Magnetic particle imaging for in vivo blood flow velocity measurements in mice
NASA Astrophysics Data System (ADS)
Kaul, Michael G.; Salamon, Johannes; Knopp, Tobias; Ittrich, Harald; Adam, Gerhard; Weller, Horst; Jung, Caroline
2018-03-01
Magnetic particle imaging (MPI) is a new imaging technology. It is a potential candidate to be used for angiographic purposes, to study perfusion and cell migration. The aim of this work was to measure velocities of the flowing blood in the inferior vena cava of mice, using MPI, and to evaluate it in comparison with magnetic resonance imaging (MRI). A phantom mimicking the flow within the inferior vena cava with velocities of up to 21 cm s⁻¹ was used for the evaluation of the applied analysis techniques. Time–density and distance–density analyses for bolus tracking were performed to calculate flow velocities. These findings were compared with the calibrated velocities set by a flow pump, and it can be concluded that velocities of up to 21 cm s⁻¹ can be measured by MPI. A time–density analysis using an arrival time estimation algorithm showed the best agreement with the preset velocities. In vivo measurements were performed in healthy FVB mice (n = 10). MRI experiments were performed using phase contrast (PC) for velocity mapping. For MPI measurements, a standardized injection of a superparamagnetic iron oxide tracer was applied. In vivo MPI data were evaluated by a time–density analysis and compared to PC MRI. A Bland–Altman analysis revealed good agreement between the in vivo velocities acquired by MRI of 4.0 ± 1.5 cm s⁻¹ and those measured by MPI of 4.8 ± 1.1 cm s⁻¹. Magnetic particle imaging is a new tool with which to measure and quantify flow velocities. It is fast, radiation-free, and produces 3D images. It therefore offers the potential for vascular imaging.
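The time-density bolus-tracking idea reduces to estimating the tracer's arrival time at two positions a known distance apart and dividing distance by the time lag. The sketch below uses synthetic ramp-shaped enhancement curves and an assumed 5 cm probe separation; none of the numbers come from the study.

```python
# Hedged sketch of time-density bolus tracking with a threshold-based
# arrival-time estimator. Curves and distances are synthetic assumptions.

def arrival_time(times, signal, threshold=0.5):
    """First time the signal crosses `threshold` of its peak value."""
    peak = max(signal)
    for t, s in zip(times, signal):
        if s >= threshold * peak:
            return t
    raise ValueError("bolus never arrives")

times = [0.1 * i for i in range(100)]                          # seconds
up   = [max(0.0, min(1.0, (t - 1.0) / 0.5)) for t in times]    # upstream
down = [max(0.0, min(1.0, (t - 2.0) / 0.5)) for t in times]    # downstream

dt = arrival_time(times, down) - arrival_time(times, up)
velocity = 5.0 / dt        # probes assumed 5 cm apart -> cm/s
assert abs(velocity - 5.0) < 0.5
```

A distance-density analysis inverts the same relationship, tracking how far the bolus front has travelled at fixed times rather than when it passes fixed positions.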
Development of Modeling and Simulation for Magnetic Particle Inspection Using Finite Elements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Jun-Youl
2003-01-01
Magnetic particle inspection (MPI) is a widely used nondestructive inspection method for aerospace applications essentially limited to experiment-based approaches. The analysis of MPI characteristics that affect sensitivity and reliability contributes not only reductions in inspection design cost and time but also improvement of analysis of experimental data. Magnetic particles are easily attracted toward a high magnetic field gradient. Selection of a magnetic field source, which produces a magnetic field gradient large enough to detect a defect in a test sample or component, is an important factor in magnetic particle inspection. In this work a finite element method (FEM) has been employed for numerical calculation of the MPI simulation technique. The FEM method is known to be suitable for complicated geometries such as defects in samples. This thesis describes the research that is aimed at providing a quantitative scientific basis for magnetic particle inspection. A new FEM solver for MPI simulation has been developed in this research for not only nonlinear reversible permeability materials but also irreversible hysteresis materials that are described by the Jiles-Atherton model. The material is assumed to have isotropic ferromagnetic properties in this research (i.e., the magnetic properties of the material are identical in all directions in a single crystal). In the research, with a direct current field mode, an MPI situation has been simulated to measure the estimated volume of magnetic particles around defect sites before and after removing any external current fields. Currently, this new MPI simulation package is limited to solving problems with the single current source from either a solenoid or an axial directional current rod.
Applications Performance Under MPL and MPI on NAS IBM SP2
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
On July 5, 1994, an IBM Scalable POWER parallel System (IBM SP2) with 64 nodes was installed at the Numerical Aerodynamic Simulation (NAS) Facility. Each node of the NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz which can perform four floating point operations per clock with a peak performance of 266 Mflop/s. By the end of 1994, the 64 nodes of the IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. A basic understanding of the architectural details of the RS 6000/590 will help application scientists in porting, optimizing, and tuning codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of the two floating point units, and data cache optimization of the RS 6000/590 are illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using the native Message Passing Library (MPL) and the Message Passing Interface (MPI) library available on the IBM SP2 is illustrated. In particular, we will present the performance of the Fast Fourier Transform (FFT) kernel from the NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of the Fortran BLAS 2 and BLAS 3 routines, e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and optimized Fortran DGEMM runs at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.
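The DAXPY kernel cited in the abstract (y <- a*x + y) performs two floating point operations per element, which is how its Mflop/s rate is computed from a timed run. The sketch below shows the operation and the flop accounting only; the pure-Python form obviously cannot reproduce the quad-word-load tuning done on the POWER2 hardware.

```python
# Sketch of the DAXPY kernel (y <- a*x + y) and its Mflop/s accounting.
# The timing value below is a made-up example, not a measured number.

def daxpy(a, x, y):
    """Return a*x + y elementwise: 1 multiply + 1 add per element."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def mflops(n, seconds):
    """DAXPY performs 2*n flops; convert a timed run to Mflop/s."""
    return 2.0 * n / seconds / 1e6

assert daxpy(2.0, [1.0, 2.0], [3.0, 4.0]) == [5.0, 8.0]
# A 1M-element DAXPY finishing in ~11.43 ms corresponds to ~175 Mflop/s,
# the rate quoted for the tuned Fortran version.
assert abs(mflops(1_000_000, 0.01143) - 175.0) < 1.0
```

DGEMM's higher quoted rate follows the same accounting with 2*m*n*k flops, and benefits far more from cache blocking since it reuses each operand many times.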
libSRES: a C library for stochastic ranking evolution strategy for parameter estimation.
Ji, Xinglai; Xu, Ying
2006-01-01
Estimation of kinetic parameters in a biochemical pathway or network is a common problem in systems studies of biological processes. We have implemented a C library, named libSRES, to facilitate fast implementation of computer software for the study of non-linear biochemical pathways. The library implements a (mu, lambda)-ES evolutionary optimization algorithm that uses stochastic ranking as its constraint-handling technique. Considering the amount of computing time a parameter-estimation problem might require, an MPI version of libSRES is provided for parallel execution, along with a simple user interface. libSRES is freely available and can be used directly in any C program as a library function. We have extensively tested the performance of libSRES on various pathway parameter-estimation problems and found its performance to be satisfactory. The source code (in C) is free for academic users at http://csbl.bmb.uga.edu/~jix/science/libSRES/
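The stochastic-ranking step that libSRES uses for constraint handling can be sketched as follows. This is a minimal illustration of the Runarsson-Yao bubble-sort-style ranking, not libSRES's actual C implementation; the function names are mine:

```python
import random

def stochastic_rank(population, f, phi, pf=0.45, rng=random):
    """Return indices of `population` in stochastic-ranking order:
    with probability pf, or when both neighbours are feasible
    (phi == 0), adjacent pairs are compared by the objective f;
    otherwise by the constraint-violation penalty phi."""
    idx = list(range(len(population)))
    for _ in range(len(idx)):                 # at most N sweeps
        swapped = False
        for i in range(len(idx) - 1):
            a, b = idx[i], idx[i + 1]
            both_feasible = phi(population[a]) == 0 and phi(population[b]) == 0
            if both_feasible or rng.random() < pf:
                out_of_order = f(population[a]) > f(population[b])
            else:
                out_of_order = phi(population[a]) > phi(population[b])
            if out_of_order:
                idx[i], idx[i + 1] = b, a
                swapped = True
        if not swapped:                       # early exit, as in bubble sort
            break
    return idx
```

With an all-feasible population the ranking reduces deterministically to sorting by the objective, e.g. `stochastic_rank([3.0, 1.0, 2.0], lambda x: x * x, lambda x: 0.0)` ranks the smallest objective first.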
NASA Astrophysics Data System (ADS)
Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar
2017-12-01
We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian (ALE) scheme. The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea-ice forecasting and geophysical process studies over geographical domains of several million square kilometres, such as the Arctic region.
Accelerating list management for MPI.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hemmert, K. Scott; Rodrigues, Arun F.; Underwood, Keith Douglas
2005-07-01
The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a message is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero-length-queue case, with dramatic improvements in the presence of long queues. Similarly, throughput is increased by up to 10% in the zero-length-queue case and by nearly 100% in the presence of queues of 30 messages.
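The posted-receive traversal that this hardware unit offloads can be illustrated with a toy model. The class and names below are hypothetical; the sketch shows only the matching semantics MPI mandates (posting order, source/tag wildcards), which force the linear scan whose cost the paper attacks:

```python
from collections import deque

ANY = -1  # stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards

class PostedReceiveQueue:
    """Toy model of the posted-receive list an MPI network-interface
    processor must traverse: receives are matched in posting order
    against the (source, tag) of each incoming message."""

    def __init__(self):
        self._queue = deque()

    def post(self, source, tag, buf_id):
        """List insertion: append a posted receive."""
        self._queue.append((source, tag, buf_id))

    def match(self, msg_source, msg_tag):
        """List traversal: linear scan whose cost grows with queue
        depth -- exactly what the hardware list manager offloads."""
        for entry in list(self._queue):
            source, tag, buf_id = entry
            if source in (ANY, msg_source) and tag in (ANY, msg_tag):
                self._queue.remove(entry)   # list deletion on match
                return buf_id
        return None  # no match: message would join the unexpected queue
```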
Porting the AVS/Express scientific visualization software to Cray XT4.
Leaver, George W; Turner, Martin J; Perrin, James S; Mummery, Paul M; Withers, Philip J
2011-08-28
Remote scientific visualization, where rendering services are provided by larger scale systems than are available on the desktop, is becoming increasingly important as dataset sizes increase beyond the capabilities of desktop workstations. Uptake of such services relies on access to suitable visualization applications and the ability to view the resulting visualization in a convenient form. We consider five rules from the e-Science community to meet these goals with the porting of a commercial visualization package to a large-scale system. The application uses message-passing interface (MPI) to distribute data among data processing and rendering processes. The use of MPI in such an interactive application is not compatible with restrictions imposed by the Cray system being considered. We present details, and performance analysis, of a new MPI proxy method that allows the application to run within the Cray environment yet still support MPI communication required by the application. Example use cases from materials science are considered.
Andreu, Yolanda; Galdon, Maria J; Durá, Estrella; Ferrando, Maite; Pascual, Juan; Turk, Dennis C; Jiménez, Yolanda; Poveda, Rafael
2006-01-01
Background This paper seeks to analyse the psychometric and structural properties of the Multidimensional Pain Inventory (MPI) in a sample of temporomandibular disorder patients. Methods The internal consistency of the scales was obtained. Confirmatory Factor Analysis was carried out to test the MPI structure section by section in a sample of 114 temporomandibular disorder patients. Results Nearly all scales obtained good reliability indexes. The original structure could not be totally confirmed. However, with a few adjustments we obtained a satisfactory structural model of the MPI which was slightly different from the original: certain items and the Self control scale were eliminated; in two cases, two original scales were grouped in one factor, Solicitous and Distracting responses on the one hand, and Social activities and Away from home activities, on the other. Conclusion The MPI has been demonstrated to be a reliable tool for the assessment of pain in temporomandibular disorder patients. Some divergences to be taken into account have been clarified. PMID:17169143
Projection x-space magnetic particle imaging.
Goodwill, Patrick W; Konkle, Justin J; Zheng, Bo; Saritas, Emine U; Conolly, Steven M
2012-05-01
Projection magnetic particle imaging (MPI) can improve imaging speed by over 100-fold over traditional 3-D MPI. In this work, we derive the 2-D x-space signal equation and 2-D image equation, and introduce the concepts of signal fading and resolution loss for a projection MPI imager. We then describe the design and construction of an x-space projection MPI scanner with a field gradient of 2.35 T/m across the magnet's 10 cm free bore. The system has an expected resolution of 3.5 × 8.0 mm using Resovist tracer, and an experimental resolution of 3.8 × 8.4 mm. The system images 2.5 cm × 5.0 cm partial fields of view (FOVs) at 10 frames/s, and acquires a full field of view of 10 cm × 5.0 cm in 4 s. We conclude by imaging a resolution phantom, a complex "Cal" phantom, and mice injected with Resovist tracer, and experimentally confirm the theoretically predicted x-space spatial resolution.
Dhavalikar, R; Hensley, D; Maldonado-Camargo, L; Croft, L R; Ceron, S; Goodwill, P W; Conolly, S M; Rinaldi, C
2016-08-03
Magnetic Particle Imaging (MPI) is an emerging tomographic imaging technology that detects magnetic nanoparticle tracers by exploiting their non-linear magnetization properties. In order to predict the behavior of nanoparticles in an imager, it is possible to use a non-imaging MPI relaxometer or spectrometer to characterize the behavior of nanoparticles in a controlled setting. In this paper we explore the use of ferrohydrodynamic magnetization equations for predicting the response of particles in an MPI relaxometer. These include a magnetization equation developed by Shliomis (Sh), which has a constant relaxation time, and a magnetization equation with a field-dependent relaxation time developed by Martsenyuk, Raikher, and Shliomis (MRSh). We compare the predictions from these models with measurements and with predictions based on the Langevin function, which assumes an instantaneous magnetization response of the nanoparticles. The results show good qualitative and quantitative agreement between the ferrohydrodynamic models and the measurements without the use of fitting parameters, and provide further evidence of the potential of ferrohydrodynamic modeling in MPI.
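For reference, the instantaneous (Langevin) response and the constant-relaxation-time Shliomis model mentioned above take roughly the following forms. This is a simplified sketch for a quiescent sample; the full Sh and MRSh equations include flow and field-dependent relaxation terms omitted here:

```latex
% Equilibrium (Langevin) magnetization, assumed instantaneous in the
% Langevin-function model (M_s: saturation magnetization, m: particle
% magnetic moment, H: applied field):
M_{\mathrm{eq}}(H) = M_s\,L(\xi), \qquad
L(\xi) = \coth\xi - \frac{1}{\xi}, \qquad
\xi = \frac{\mu_0 m H}{k_B T}

% Shliomis (Sh) model: relaxation toward equilibrium with a single
% constant relaxation time \tau (advection and spin terms dropped):
\frac{d\mathbf{M}}{dt} = -\frac{1}{\tau}\left(\mathbf{M} - \mathbf{M}_{\mathrm{eq}}\right)
```

The MRSh variant replaces the constant \(\tau\) with field-dependent parallel and perpendicular relaxation times, which is what allows it to track the non-linear drive fields used in MPI.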
Gayed, Isis; Gohar, Salman; Liao, Zhongxing; McAleer, Mary; Bassett, Roland; Yusuf, Syed Wamique
2009-06-01
This study aims to identify the clinical implications of myocardial perfusion defects after chemoradiation therapy (CRT) in patients with esophageal and lung cancer. We retrospectively compared myocardial perfusion imaging (MPI) results before and after CRT in 16 patients with esophageal cancer and 24 patients with lung cancer. New MPI defects in the radiation therapy (RT) fields were considered related to RT. Follow-up to evaluate for cardiac complications and their relation with the results of MPI was performed. Statistical analysis identified predictors of cardiac morbidities. Eleven females and twenty-nine males at a mean age of 66.7 years were included. Five patients (31%) with esophageal cancer and seven patients (29%) with lung cancer developed myocardial ischemia in the RT field at mean intervals of 7.0 and 8.4 months after RT. The patients were followed up for mean intervals of 15 and 23 months in the esophageal and lung cancer groups, respectively. Seven patients in each of the esophageal (44%) and lung (29%) cancer groups (P = 0.5) developed cardiac complications, of which one patient with esophageal cancer died of complete heart block. Six of the fourteen patients (43%) with cardiac complications had new ischemia on MPI after CRT, of whom only one developed angina. The remaining eight patients with cardiac complications had normal MPI results. MPI result was not a statistically significant predictor of future cardiac complications after CRT. A history of congestive heart failure (CHF) (P = 0.003) or arrhythmia (P = 0.003) is a significant predictor of cardiac morbidity after CRT in univariate analysis, but both were marginal predictors in multivariate analysis (P = 0.06 for each). Cardiac complications after CRT are more common in esophageal than lung cancer patients, but the difference is not statistically significant.
MPI abnormalities are frequently seen after CRT but are not predictive of future cardiac complications. A history of arrhythmia or CHF is significantly associated with cardiac complications after CRT.
Performance Improvements of the CYCOFOS Flow Model
NASA Astrophysics Data System (ADS)
Radhakrishnan, Hari; Moulitsas, Irene; Syrakos, Alexandros; Zodiatis, George; Nikolaides, Andreas; Hayes, Daniel; Georgiou, Georgios C.
2013-04-01
The CYCOFOS-Cyprus Coastal Ocean Forecasting and Observing System has been operational since early 2002, providing daily sea current, temperature, salinity and sea level forecasting data for the next 4 and 10 days to end-users in the Levantine Basin, necessary for operational applications in marine safety, particularly oil spill and floating object predictions. The CYCOFOS flow model, like most of the coastal and sub-regional operational hydrodynamic forecasting systems of the MONGOOS-Mediterranean Oceanographic Network for the Global Ocean Observing System, is based on the POM-Princeton Ocean Model. CYCOFOS is nested within the MyOcean Mediterranean regional forecasting data and uses SKIRON and ECMWF for surface forcing. The increasing demand for ever higher resolution data to serve coastal and offshore downstream applications motivated the parallelization of the CYCOFOS POM model. This development was carried out within the framework of the IPcycofos project, funded by the Cyprus Research Promotion Foundation. Parallel processing provides a viable solution to satisfy these demands without sacrificing accuracy or omitting any physical phenomena. Prior to the IPcycofos project, there have been several attempts to parallelize the POM, for example MP-POM. These existing parallel codes rely on specific, outdated hardware architectures and associated software. The objective of the IPcycofos project is to produce an operational parallel version of the CYCOFOS POM code that replicates the results of the serial version of the POM code used in CYCOFOS. The parallelization of the CYCOFOS POM model uses the Message Passing Interface (MPI), implemented on commodity computing clusters running open-source software and not depending on any specialized vendor hardware. The parallel CYCOFOS POM code is constructed in a modular fashion, allowing a fast, re-locatable, downscaled implementation.
The MPI implementation takes advantage of the Cartesian nature of the POM mesh, using the built-in functionality of MPI routines to split the mesh along longitude and latitude among the processors according to a weighting scheme. Each processor works on its part of the model following domain decomposition techniques. The new parallel CYCOFOS POM code has been benchmarked against the serial POM version of CYCOFOS for speed, accuracy, and resolution, and the results are more than satisfactory. Even with a higher-resolution CYCOFOS Levantine model domain, the forecasts need much less time than the coarser serial CYCOFOS POM version, with identical accuracy.
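The rank-to-subdomain mapping described above can be sketched as follows. This is a uniform (unweighted) illustration of Cartesian splitting, not the actual CYCOFOS weighting scheme, and the helper names are hypothetical; a real MPI code would obtain the same mapping from MPI_Cart_create and MPI_Cart_coords:

```python
def split_extent(n, parts, index):
    """Half-open index range of a 1-D axis of length n owned by
    part `index` of `parts`, with any remainder spread over the
    leading parts (uniform splitting; the CYCOFOS scheme is weighted)."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    size = base + (1 if index < extra else 0)
    return start, start + size

def subdomain(rank, px, py, nlon, nlat):
    """Map a linear MPI rank onto a (px x py) process grid and return
    its (longitude, latitude) index ranges of the Cartesian POM mesh."""
    ix, iy = rank % px, rank // px   # grid coordinates of this rank
    return split_extent(nlon, px, ix), split_extent(nlat, py, iy)
```

For a 10 x 6 mesh on a 2 x 2 process grid, rank 0 owns longitude indices [0, 5) and latitude indices [0, 3), and rank 3 owns [5, 10) x [3, 6), so the four subdomains tile the mesh without overlap.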
DOE Office of Scientific and Technical Information (OSTI.GOV)
Golovanov, Georgy
The thesis is devoted to the study of processes with multiple parton interactions (MPI) in ppbar collisions collected by the D0 detector at the Fermilab Tevatron collider at sqrt(s) = 1.96 TeV. The study includes measurements of the MPI event fraction and the effective cross section, a process-independent parameter related to the effective interaction region inside the nucleon. The measurements are done using events with a photon and three hadronic jets in the final state. The measured effective cross section is used to estimate the background from MPI to WH production at the Tevatron energy.
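The effective cross section measured above enters the standard double-parton (DP) scattering ansatz; for two hard subprocesses A and B it is commonly written as (a textbook form, not quoted from the thesis):

```latex
\sigma^{AB}_{\mathrm{DP}} \;=\; \frac{m}{2}\,
\frac{\sigma_A\,\sigma_B}{\sigma_{\mathrm{eff}}},
\qquad
m = \begin{cases} 1, & A = B \text{ (indistinguishable subprocesses)} \\
                  2, & A \neq B \end{cases}
```

A smaller \(\sigma_{\mathrm{eff}}\) means partons are packed into a smaller effective transverse area, making a second hard scatter more likely, which is why the parameter characterizes the interaction region inside the nucleon.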
A Generator-Produced Gallium-68 Radiopharmaceutical for PET Imaging of Myocardial Perfusion
Sharma, Vijay; Sivapackiam, Jothilingam; Harpstrite, Scott E.; Prior, Julie L.; Gu, Hannah; Rath, Nigam P.; Piwnica-Worms, David
2014-01-01
Lipophilic cationic technetium-99m complexes are widely used for myocardial perfusion imaging (MPI). However, inherent uncertainties in the supply chain of molybdenum-99, the parent isotope required for manufacturing 99Mo/99mTc generators, intensify the need for discovery of novel MPI agents incorporating alternative radionuclides. Recently, germanium/gallium (Ge/Ga) generators capable of producing high-quality 68Ga, an isotope with excellent emission characteristics for clinical PET imaging, have emerged. Herein, we report a novel 68Ga-complex identified through mechanism-based cell screening that holds promise as a generator-produced radiopharmaceutical for PET MPI. PMID:25353349
Addison, Daniel; Singh, Vinita; Okyere-Asante, K; Okafor, Henry
2014-01-01
Patients presenting with chest pain and evidence of functional ischemia by myocardial perfusion imaging (MPI), but lacking commensurate angiographic disease, pose a diagnostic and therapeutic dilemma. They are often dismissed as having 'false-positive MPI'. Moreover, a majority of the available long-term outcome data for this presentation has been derived from homogeneous female populations. In this study, we sought to evaluate the long-term outcomes of this presentation in a multiethnic, male-predominant cohort. We retrospectively identified 47 patients who presented to our institution between 2002 and 2005 with chest pain and evidence of ischemia on MPI, but with no significant angiographic disease on subsequent cardiac catheterization (cases). The occurrence of adverse cardiovascular outcomes (chest pain, congestive heart failure, acute myocardial infarction and stroke) post-index coronary angiogram was tracked. Similar data were collected for 37 patients who also presented with chest pain but had normal MPI over the same period (controls). Overall average follow-up was over 22 months. Fifty-three percent (26/47) of the cases had one or more of the adverse outcomes as compared with 22% (8/37) of controls (P < 0.01). Of these, 13 (50.0%) and 3 (37.5%) were males, respectively. Ischemia on MPI is predictive of long-term adverse cardiovascular outcomes despite normal ('false-negative') coronary angiography. This appears to be gender-neutral.
Advances in PET myocardial perfusion imaging: F-18 labeled tracers.
Rischpler, Christoph; Park, Min-Jae; Fung, George S K; Javadi, Mehrbod; Tsui, Benjamin M W; Higuchi, Takahiro
2012-01-01
Coronary artery disease and its related cardiac disorders represent the most common cause of death in the USA and Western world. Despite advancements in treatment and accompanying improvements in outcome with current diagnostic and therapeutic modalities, it is the correct assignment of these diagnostic techniques and treatment options which are crucial. From a diagnostic standpoint, SPECT myocardial perfusion imaging (MPI) using traditional radiotracers like thallium-201 chloride, Tc-99m sestamibi or Tc-99m tetrofosmin is the most utilized imaging technique. However, PET MPI using N-13 ammonia, rubidium-82 chloride or O-15 water is increasing in availability and usage as a result of the growing number of medical centers with new-generation PET/CT systems taking advantage of the superior imaging properties of PET over SPECT. The routine clinical use of PET MPI is still limited, in part because of the short half-life of conventional PET MPI tracers. The disadvantages of these conventional PET tracers include expensive onsite production and inconvenient on-scanner tracer administration making them unsuitable for physical exercise stress imaging. Recently, two F-18 labeled radiotracers with longer radioactive half-lives than conventional PET imaging agents have been introduced. These are flurpiridaz F 18 (formerly known as F-18 BMS747158-02) and F-18 fluorobenzyltriphenylphosphonium. These longer half-life F-18 labeled perfusion tracers can overcome the production and protocol limitations of currently used radiotracers for PET MPI.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gowda, Giri; Sagurthi, Someswar Rao; Savithri, H. S.
2008-02-01
The cloning, expression, purification, crystallization and preliminary X-ray crystallographic studies of mannose 6-phosphate isomerase from S. typhimurium are reported. Mannose 6-phosphate isomerase (MPI; EC 5.3.1.8) catalyzes the reversible isomerization of d-mannose 6-phosphate (M6P) and d-fructose 6-phosphate (F6P). In the eukaryotes and prokaryotes investigated to date, the enzyme has been reported to play a crucial role in d-mannose metabolism and supply of the activated mannose donor guanosine diphosphate d-mannose (GDP-d-mannose). In the present study, MPI was cloned from Salmonella typhimurium, overexpressed in Escherichia coli and purified using Ni–NTA affinity column chromatography. Purified MPI crystallized in space group P2₁2₁2₁, with unit-cell parameters a = 36.03, b = 92.2, c = 111.01 Å. A data set extending to 1.66 Å resolution was collected with 98.8% completeness using an image-plate detector system mounted on a rotating-anode X-ray generator. The asymmetric unit of the crystal cell was compatible with the presence of a monomer of MPI. A preliminary structure solution of the enzyme has been obtained by molecular replacement using Candida albicans MPI as the phasing model and the program Phaser. Further refinement and model building are in progress.
DeCicco, Anthony E; Sokil, Alexis B; Marhefka, Gregary D; Reist, Kirk; Hansen, Christopher L
2015-04-01
Obesity is not only associated with an increased risk of coronary artery disease, but also decreases the accuracy of many diagnostic modalities pertinent to this disease. Advances in myocardial perfusion imaging (MPI) have somewhat mitigated the effects of obesity, although the feasibility of MPI in the super-obese (defined as a BMI > 50) is currently untested. We undertook this study to assess the practicality of MPI in the super-obese using a multi-headed solid-state gamma camera with attenuation correction. We retrospectively identified consecutive super-obese patients referred for MPI at our institution. The images were interpreted by 3 blinded, experienced readers and graded for quality and diagnosis; the readers also subjectively evaluated the contribution of attenuation correction. Clinical follow-up was obtained from review of medical records. 72 consecutive super-obese patients were included. Their BMI ranged from 50 to 67 (55.7 ± 5.1). Stress image quality was considered good or excellent in 45 (63%), satisfactory in 24 (33%), poor in 3 (4%), and uninterpretable in 0 patients. Rest images were considered good or excellent in 34 (49%), satisfactory in 23 (33%), poor in 13 (19%), and uninterpretable in 0 patients. Attenuation correction changed the interpretation in 34 (47%) of studies. MPI is feasible and provides acceptable image quality for super-obese patients, although this may be camera and protocol dependent.
Nakatani, Akiho; Li, Xuan; Miyamoto, Junki; Igarashi, Miki; Watanabe, Hitoshi; Sutou, Asuka; Watanabe, Keita; Motoyama, Takayasu; Tachibana, Nobuhiko; Kohno, Mitsutaka; Inoue, Hiroshi; Kimura, Ikuo
2018-07-02
The 8S globulin-rich mung bean protein isolate (MPI) suppresses hepatic lipogenesis in rodent models and reduces fasting plasma glucose and insulin levels in obese adults. However, its effects on mitigating high fat diet (HFD)-induced obesity and the mechanism underlying these effects remain to be elucidated. Herein, we examined the metabolic phenotype, intestinal bile acid (BA) pool, and gut microbiota of conventionally raised (CONV-R) male C57BL/6 mice and germ-free (GF) mice that were randomized to receive either regular HFD or HFD in which the dairy protein was replaced with MPI. MPI intake significantly reduced HFD-induced weight gain and adipose tissue accumulation, and attenuated hepatic steatosis. Enhanced secretion of intestinal glucagon-like peptide-1 (GLP-1) and an enlarged cecal and fecal BA pool with a dramatically elevated secondary/primary BA ratio were observed in mice that had consumed MPI. These effects were abolished in GF mice, indicating that they depend on the presence of the microbiota. As revealed by 16S rRNA gene sequence analysis, MPI intake also elicited dramatic changes in the gut microbiome, such as an expansion of taxa belonging to the phylum Bacteroidetes along with a reduced abundance of the Firmicutes. Copyright © 2018 Elsevier Inc. All rights reserved.
Ortiz, Javier U; Torres, Ximena; Eixarch, Elisenda; Bennasar, Mar; Cruz-Lemini, Monica; Gómez, Olga; Lobmaier, Silvia M; Martínez, Josep M; Gratacós, Eduard; Crispi, Fatima
2018-01-19
To evaluate left myocardial performance index (MPI) and time intervals in fetuses with twin-to-twin transfusion syndrome (TTTS) before and after laser surgery. Fifty-one fetal pairs with TTTS and 47 uncomplicated monochorionic twin pairs were included. Left ventricular isovolumetric contraction time (ICT), ejection time (ET), and isovolumetric relaxation time (IRT) were measured using conventional Doppler. Recipients showed prolonged ICT (46 ± 12 vs. 31 ± 8 vs. 30 ± 5 ms; p < 0.001) and IRT (51 ± 9 vs. 43 ± 8 vs. 43 ± 5 ms; p < 0.001) and higher MPI (0.57 ± 0.12 vs. 0.47 ± 0.09 vs. 0.44 ± 0.05; p < 0.001) than donors and controls. Donors showed shorter ET than recipients and controls (157 ± 12 vs. 169 ± 10 vs. 168 ± 10 ms; p < 0.001) and higher MPI than controls (0.47 ± 0.09 vs. 0.44 ± 0.05; p = 0.006). Preoperative MPI changes were observed in all TTTS stages. Time intervals partially improved after surgery. Donor and recipient twins had higher MPI due to different changes in the time intervals, possibly reflecting the state of hypovolemia in the donor and hypervolemia and pressure overload in the recipient. © 2018 S. Karger AG, Basel.
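The reported indices are consistent with the conventional (Tei) definition of the myocardial performance index, MPI = (ICT + IRT) / ET, as a quick check with the mean time intervals above shows. The helper name is mine; small discrepancies (e.g. controls, ~0.43 vs the reported 0.44) arise because these are cohort means:

```python
def mpi_index(ict_ms, irt_ms, et_ms):
    """Tei (myocardial performance) index: the two isovolumetric
    intervals divided by the ejection time, all in milliseconds.
    Assumes the conventional definition MPI = (ICT + IRT) / ET."""
    return (ict_ms + irt_ms) / et_ms

# Mean intervals reported above reproduce the reported indices:
print(round(mpi_index(46, 51, 169), 2))  # recipients -> 0.57
print(round(mpi_index(31, 43, 157), 2))  # donors     -> 0.47
print(round(mpi_index(30, 43, 168), 2))  # controls   -> 0.43 (reported 0.44)
```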
NASA Astrophysics Data System (ADS)
Al-Refaie, Ahmed F.; Tennyson, Jonathan
2017-12-01
Construction and diagonalization of the Hamiltonian matrix is the rate-limiting step in most low-energy electron - molecule collision calculations. Tennyson (1996) implemented a novel algorithm for Hamiltonian construction which took advantage of the structure of the wavefunction in such calculations. This algorithm is re-engineered to make use of modern computer architectures and the use of appropriate diagonalizers is considered. Test calculations demonstrate that significant speed-ups can be gained using multiple CPUs. This opens the way to calculations which consider higher collision energies, larger molecules and / or more target states. The methodology, which is implemented as part of the UK molecular R-matrix codes (UKRMol and UKRMol+) can also be used for studies of bound molecular Rydberg states, photoionization and positron-molecule collisions.
CFD Analysis and Design Optimization Using Parallel Computers
NASA Technical Reports Server (NTRS)
Martinelli, Luigi; Alonso, Juan Jose; Jameson, Antony; Reuther, James
1997-01-01
A versatile and efficient multi-block method is presented for the simulation of both steady and unsteady flow, as well as aerodynamic design optimization of complete aircraft configurations. The compressible Euler and Reynolds Averaged Navier-Stokes (RANS) equations are discretized using a high resolution scheme on body-fitted structured meshes. An efficient multigrid implicit scheme is implemented for time-accurate flow calculations. Optimum aerodynamic shape design is achieved at very low cost using an adjoint formulation. The method is implemented on parallel computing systems using the MPI message passing interface standard to ensure portability. The results demonstrate that, by combining highly efficient algorithms with parallel computing, it is possible to perform detailed steady and unsteady analysis as well as automatic design for complex configurations using the present generation of parallel computers.
A practical guide to replica-exchange Wang-Landau simulations
NASA Astrophysics Data System (ADS)
Vogel, Thomas; Li, Ying Wai; Landau, David P.
2018-04-01
This paper is based on a series of tutorial lectures about the replica-exchange Wang-Landau (REWL) method given at the IX Brazilian Meeting on Simulational Physics (BMSP 2017). It provides a practical guide for the implementation of the method. A complete example code for a model system is available online. In this paper, we discuss the main parallel features of this code after a brief introduction to the REWL algorithm. The tutorial section is mainly directed at users who have written a single-walker Wang–Landau program already but might have just taken their first steps in parallel programming using the Message Passing Interface (MPI). In the last section, we answer “frequently asked questions” from users about the implementation of REWL for different scientific problems.
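The configurational swap between walkers in overlapping energy windows, the central move of REWL, is accepted with a probability built from the two walkers' density-of-states estimates. A minimal sketch follows (the function name is hypothetical; working in ln g, as the method does, avoids overflow):

```python
import math

def rewl_swap_prob(ln_g_i, ln_g_j, E_i, E_j):
    """REWL acceptance probability for swapping the configurations at
    energies E_i and E_j between walkers i and j, where ln_g_i and
    ln_g_j map energy -> current ln g(E) estimate. Both windows must
    contain both energies (i.e. the windows overlap):
        P = min(1, [g_i(E_i)/g_i(E_j)] * [g_j(E_j)/g_j(E_i)])."""
    ln_p = (ln_g_i[E_i] - ln_g_i[E_j]) + (ln_g_j[E_j] - ln_g_j[E_i])
    return min(1.0, math.exp(min(0.0, ln_p)))
```

In the parallel code, each walker would evaluate its own two ln g terms and exchange them with its partner via an MPI point-to-point message before drawing the acceptance random number.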
An Approach Using Parallel Architecture to Storage DICOM Images in Distributed File System
NASA Astrophysics Data System (ADS)
Soares, Tiago S.; Prado, Thiago C.; Dantas, M. A. R.; de Macedo, Douglas D. J.; Bauer, Michael A.
2012-02-01
Telemedicine is a very important area in the medical field that is expanding daily, motivated by many researchers interested in improving medical applications. In Brazil, development began in 2005 in the State of Santa Catarina on a server called CyclopsDCMServer, whose purpose is to employ HDF for the manipulation of medical images (DICOM) over a distributed file system. Since then, many studies have been initiated in order to seek better performance. Our approach adds a parallel implementation of I/O operations to this server, since HDF version 5 provides a feature essential for our work: support for parallel I/O based upon the MPI paradigm. Early experiments using four parallel nodes show good performance when compared to the serial HDF implementation in the CyclopsDCMServer.
Climate Projections over Mediterranean Basin under RCP8.5 and RCP4.5 emission scenarios
NASA Astrophysics Data System (ADS)
Ilhan, Asli; Ünal, Yurdanur S.
2017-04-01
Istanbul Technical University, Department of Meteorology. In this study, 50 km resolution downscaled results of two different Earth System Models (ESMs), HadGEM2-ES and MPI-ESM, with the regional climate model RegCM are used to estimate present and future climate conditions over the Mediterranean Basin. The purpose of this study is to compare the projections of the two ESMs under Representative Concentration Pathways 4.5 (RCP4.5) and 8.5 (RCP8.5) over the region of interest, seasonally and annually, at 50 km resolution. Temperature and precipitation parameters for the reference period (1971-2000) and the future (2015-2100) are analyzed. The average temperature and total precipitation distributions of each downscaled ESM simulation were compared with observation data (Climate Research Unit, CRU) to explore the capability of each model to represent the current climate. According to the reference-period values of CRU, HadGEM2-ES, and MPI-ESM, both models are warmer and wetter than observations and have positive temperature biases only around the Caspian Sea and positive precipitation biases over Eastern and Central Europe. The future projections (2015 to 2100) of the HadGEM2-ES and MPI-ESM-MR simulations under the RCP4.5 and RCP8.5 emission scenarios are compared with the reference period (1971 to 2000) and analyzed for temperature and precipitation parameters. The downscaled HadGEM2-ES forced by the RCP8.5 scenario produces higher temperatures than the MPI-ESM-MR. The reasons for this warming may be the sensitivity of HadGEM2-ES to greenhouse gases and the high radiative forcing (+8.5 W/m2). On the other hand, MPI-ESM produces more precipitation than HadGEM2-ES. In order to analyze regional responses of the climate model chains, five main regions are selected: Turkey, Central Europe, Western Europe, Eastern Europe, and North Africa.
The average biases of the HadGEM2-ES+RegCM and MPI-ESM-MR+RegCM model chains are also calculated for temperature and precipitation variables, and future expectations in each region are discussed under the RCP4.5 and RCP8.5 scenarios. According to the regional analysis, North Africa is the warmest region for HadGEM2-ES and MPI-ESM-MR, and Central Europe warms similarly to North Africa in the MPI-ESM-MR coupled simulations under both RCPs. In addition, Eastern Europe is expected to be the wettest region in both models and both emission scenarios. On the other hand, the driest conditions are expected over Western Europe for MPI-ESM-MR and over Turkey for HadGEM2-ES under both RCPs.
ERIC Educational Resources Information Center
Burhansstipanov, Linda, Comp.; Barry, Kathleen Cooleen, Comp.
This directory provides information on cancer education materials that have been developed specifically for American Indians and Alaska Natives. The goal is to develop and implement culturally appropriate cancer prevention and control programs for Native Americans. The directory includes a matrix of cancer education materials that identifies…
Recruitment and retention of Alaska natives into nursing (RRANN).
DeLapp, Tina; Hautman, Mary Ann; Anderson, Mary Sue
2008-07-01
In recognition of the severe underrepresentation of Alaska Natives in the Alaska RN workforce, the University of Alaska Anchorage School of Nursing implemented Project RRANN (Recruitment and Retention of Alaska Natives into Nursing) to recruit Alaska Natives into a nursing career and to facilitate their success in the nursing programs. Activities that created connections and facilitated student success were implemented. Connection-creating activities included establishing community partnerships, sponsoring a dormitory wing, hosting social and professionally related events, and offering stipends. Success facilitation activities included intensive academic advising, tutoring, and mentoring. The effectiveness of Project RRANN is evident in the 66 Alaska Native/American Indian students admitted to the clinical major since 1998, when Project RRANN was initiated; of those, 70% have completed the major and become licensed, and 23% continue to pursue program completion.
78 FR 9793 - Airworthiness Directives; Bell Helicopter Textron Helicopters
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-12
...-numbered main rotor hub inboard strap fittings (fittings). This AD requires magnetic particle inspecting..., data, or views. We also invite comments relating to the economic, environmental, energy, or federalism..., perform a magnetic particle inspection (MPI) of each fitting for a crack. If an MPI was already performed...
Students' Performance on a Symmetry Task
ERIC Educational Resources Information Center
Ho, Siew Yin; Logan, Tracy
2013-01-01
This paper describes Singapore and Australian Grade 6 students' (n=1,187) performance on a symmetry task in a recently developed Mathematics Processing Instrument (MPI). The MPI comprised tasks sourced from Australia and Singapore's national assessments, NAPLAN and PSLE. Only half of the cohort solved the item successfully. It is possible that…
WinHPC System Programming | High-Performance Computing | NREL
WinHPC System Programming: learn how to build and run an MPI (Message Passing Interface) application, including where the header (mpi.h) and library (msmpi.lib) are located. To build from the command line, run... Start > Intel Software Development Tools > Intel C++ Compiler Professional... > C++ Build Environment for applications running
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-16
... Leased Workers From Echelon Service Company, Sun Associated Industries, INC., MPI Consultants LLC... International, including on-site leased workers from Echelon Service Company, Sun Associated Industries, Inc... Company, Sun Associated Industries, Inc., MPI Consultants LLC, Alliance Engineering, Inc., Washington...
Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
2016-11-15
A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Some improvements on the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.
Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; ...
2013-07-18
The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Finally, our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.
NASA Astrophysics Data System (ADS)
Jarecka, D.; Arabas, S.; Fijalkowski, M.; Gaynor, A.
2012-04-01
The language of choice for numerical modelling in geoscience has long been Fortran. A choice of a particular language and coding paradigm comes with a different set of tradeoffs, such as those between performance, ease of use (and ease of abuse), code clarity, maintainability and reusability, availability of open source compilers, debugging tools, adequate external libraries and parallelisation mechanisms. The availability of trained personnel and the scale and activeness of the developer community are of importance as well. We present a short comparison study aimed at identification and quantification of these tradeoffs for a particular example of an object-oriented implementation of a parallel 2D-advection-equation solver in Python/NumPy, C++/Blitz++ and modern Fortran. The main angles of comparison are complexity of implementation, performance of various compilers or interpreters, and characterisation of the "added value" gained by a particular choice of the language. The choice of the numerical problem is dictated by the aim to make the comparison useful and meaningful to geoscientists. Python is chosen as a language that traditionally is associated with ease of use, elegant syntax but limited performance. C++ is chosen for its traditional association with high performance but even higher complexity and syntax obscurity. Fortran is included in the comparison for its widespread use in geoscience, often attributed to its performance. We confront the validity of these traditional views. We point out how the usability of a particular language in geoscience depends on the characteristics of the language itself and the availability of pre-existing software libraries (e.g. NumPy, SciPy, PyNGL, PyNIO, MPI4Py for Python and Blitz++, Boost.Units, Boost.MPI for C++).
Having in mind the limited complexity of the considered numerical problem, we present a tentative comparison of performance of the three implementations with different open source compilers including CPython and PyPy, Clang++ and GNU g++, and GNU gfortran.
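The kind of solver being compared can be illustrated with a minimal sketch. The following is a hedged, one-dimensional upwind advection step in Python/NumPy (the study's solver is 2D and object-oriented; this reduced form and the function name are illustrative assumptions, not the authors' code):

```python
import numpy as np

def upwind_step(psi, courant):
    # First-order upwind update for du/dt + c du/dx = 0 (c > 0), periodic domain
    return psi - courant * (psi - np.roll(psi, 1))

# Advect a top-hat profile once around a 64-cell periodic domain
psi0 = np.zeros(64)
psi0[10:20] = 1.0
psi = psi0.copy()
for _ in range(64):  # with Courant number 1 the scheme shifts one cell per step
    psi = upwind_step(psi, 1.0)
```

With a Courant number of 1 the upwind update reduces to an exact one-cell shift, so after 64 steps the profile returns to its initial state; smaller Courant numbers introduce the numerical diffusion characteristic of first-order upwinding.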
Development of mpi_EPIC model for global agroecosystem modeling
Kang, Shujiang; Wang, Dali; Jeff A. Nichols; ...
2014-12-31
Models that address policy-maker concerns about multi-scale effects of food and bioenergy production systems are computationally demanding. We integrated the message passing interface algorithm into the process-based EPIC model to accelerate computation of ecosystem effects. Simulation performance was further enhanced by applying the Vampir framework. When this enhanced mpi_EPIC model was tested, total execution time for a global 30-year simulation of a switchgrass cropping system was shortened to less than 0.5 hours on a supercomputer. The results illustrate that mpi_EPIC using parallel design can balance simulation workloads and facilitate large-scale, high-resolution analysis of agricultural production systems, management alternatives and environmental effects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lim, Hyun; Loiseau, Julien
FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI provides a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. FleCSI currently supports multi-dimensional mesh topology, geometry, and adjacency information, as well as n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI introduces a functional programming model with control, execution, and data abstractions that are consistent both with MPI and with state-of-the-art, task-based runtimes such as Legion and Charm++. The abstraction layer insulates developers from the underlying runtime, while allowing support for multiple runtime systems including conventional models like asynchronous MPI. The intent is to provide developers with a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to future architectures and runtimes. FleCSI's control and execution models provide formal nomenclature for describing poorly understood concepts such as kernels and tasks. FleCSI's data model provides a low-buy-in approach that makes it an attractive option for many application projects, as developers are not locked into particular layouts or data structure representations. FleCSI currently provides a parallel but not distributed implementation of binary, quad-, and oct-tree topologies. This implementation is based on space-filling-curve domain decomposition (the Morton order). The current FleCSI version requires the implementation of a driver and a specialization driver. The role of the specialization driver is to provide the data distribution. This feature is not complete in the FleCSI code, and we provide it.
The next step will be to incorporate it directly from FleCSPH into FleCSI once we reach a good level of performance. The driver then represents the general execution of the solver without concern for data locality and communications. As FleCSI is a code under development, its structure may change in the future, and we track these changes in FleCSPH.
Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items
ERIC Educational Resources Information Center
Penfield, Randall D.
2006-01-01
This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…
Optimising the Parallelisation of OpenFOAM Simulations
2014-06-01
UNCLASSIFIED Optimising the Parallelisation of OpenFOAM Simulations Shannon Keough Maritime Division Defence Science and Technology Organisation DSTO-TR-2987 ABSTRACT The OpenFOAM computational fluid dynamics toolbox allows parallel computation of...performance of a given high performance computing cluster with several OpenFOAM cases, running using a combination of MPI libraries and corresponding MPI
32 CFR 637.2 - Use of MPI and DAC Detectives/Investigators.
Code of Federal Regulations, 2011 CFR
2011-07-01
... employed in the following investigations: (a) Offenses for which the maximum punishment listed in the Table of Maximum Punishment, Manual for Courts-Martial, United States, 2002 is confinement for 1 year or... MPI. The same punishment criteria apply. (b) Property-related offenses when the value is less than $1...
32 CFR 637.2 - Use of MPI and DAC Detectives/Investigators.
Code of Federal Regulations, 2014 CFR
2014-07-01
... employed in the following investigations: (a) Offenses for which the maximum punishment listed in the Table of Maximum Punishment, Manual for Courts-Martial, United States, 2002 is confinement for 1 year or... MPI. The same punishment criteria apply. (b) Property-related offenses when the value is less than $1...
32 CFR 637.2 - Use of MPI and DAC Detectives/Investigators.
Code of Federal Regulations, 2013 CFR
2013-07-01
... employed in the following investigations: (a) Offenses for which the maximum punishment listed in the Table of Maximum Punishment, Manual for Courts-Martial, United States, 2002 is confinement for 1 year or... MPI. The same punishment criteria apply. (b) Property-related offenses when the value is less than $1...
32 CFR 637.2 - Use of MPI and DAC Detectives/Investigators.
Code of Federal Regulations, 2012 CFR
2012-07-01
... employed in the following investigations: (a) Offenses for which the maximum punishment listed in the Table of Maximum Punishment, Manual for Courts-Martial, United States, 2002 is confinement for 1 year or... MPI. The same punishment criteria apply. (b) Property-related offenses when the value is less than $1...
Tardif, Keith D; Rogers, Aaron; Cassiano, Jared; Roth, Bruce L; Cimbora, Daniel M; McKinnon, Rena; Peterson, Ashley; Douce, Thomas B; Robinson, Rosann; Dorweiler, Irene; Davis, Thaylon; Hess, Mark A; Ostanin, Kirill; Papac, Damon I; Baichwal, Vijay; McAlexander, Ian; Willardsen, J Adam; Saunders, Michael; Christophe, Hoarau; Kumar, D Vijay; Wettstein, Daniel A; Carlson, Robert O; Williams, Brandi L
2011-12-01
Mps1 is a dual specificity protein kinase that is essential for the bipolar attachment of chromosomes to the mitotic spindle and for maintaining the spindle assembly checkpoint until all chromosomes are properly attached. Mps1 is expressed at high levels during mitosis and is abundantly expressed in cancer cells. Disruption of Mps1 function induces aneuploidy and cell death. We report the identification of MPI-0479605, a potent and selective ATP competitive inhibitor of Mps1. Cells treated with MPI-0479605 undergo aberrant mitosis, resulting in aneuploidy and formation of micronuclei. In cells with wild-type p53, this promotes the induction of a postmitotic checkpoint characterized by the ATM- and RAD3-related-dependent activation of the p53-p21 pathway. In both wild-type and p53 mutant cells lines, there is a growth arrest and inhibition of DNA synthesis. Subsequently, cells undergo mitotic catastrophe and/or an apoptotic response. In xenograft models, MPI-0479605 inhibits tumor growth, suggesting that drugs targeting Mps1 may have utility as novel cancer therapeutics.
Dhavalikar, R; Hensley, D; Maldonado-Camargo, L; Croft, L R; Ceron, S; Goodwill, P W; Conolly, S M
2016-01-01
Magnetic Particle Imaging (MPI) is an emerging tomographic imaging technology that detects magnetic nanoparticle tracers by exploiting their non-linear magnetization properties. In order to predict the behavior of nanoparticles in an imager, it is possible to use a non-imaging MPI relaxometer or spectrometer to characterize the behavior of nanoparticles in a controlled setting. In this paper we explore the use of ferrohydrodynamic magnetization equations for predicting the response of particles in an MPI relaxometer. These include a magnetization equation developed by Shliomis (Sh) which has a constant relaxation time and a magnetization equation which uses a field-dependent relaxation time developed by Martsenyuk, Raikher and Shliomis (MRSh). We compare the predictions from these models with measurements and with the predictions based on the Langevin function that assumes instantaneous magnetization response of the nanoparticles. The results show good qualitative and quantitative agreement between the ferrohydrodynamic models and the measurements without the use of fitting parameters and provide further evidence of the potential of ferrohydrodynamic modeling in MPI. PMID:27867219
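The Langevin function invoked above has a simple closed form. As a hedged illustration (plain Python/NumPy, not the authors' code), the instantaneous-magnetization model it describes can be sketched as:

```python
import numpy as np

def langevin(x):
    # L(x) = coth(x) - 1/x: equilibrium (instantaneous) magnetization response
    # of a superparamagnetic tracer; the small-x series L(x) ~ x/3 avoids the
    # 0/0 indeterminacy at the origin.
    x = np.asarray(x, dtype=float)
    small = np.abs(x) < 1e-4
    safe = np.where(small, 1.0, x)  # placeholder to keep 1/x finite
    return np.where(small, x / 3.0, 1.0 / np.tanh(safe) - 1.0 / safe)
```

The ferrohydrodynamic models compared in the abstract (Sh and MRSh) add relaxation dynamics on top of this curve; the Langevin prediction is the limit in which the nanoparticle magnetization responds instantaneously to the applied field.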
Shehzad, Danish; Bozkuş, Zeki
2016-01-01
The increasing complexity of neuronal network models has driven efforts to make the NEURON simulation environment more efficient. Computational neuroscientists divide the equations into subnets across multiple processors to achieve better hardware performance. On parallel machines, interprocessor spike exchange consumes a large share of the overall simulation time for neuronal networks. NEURON uses the Message Passing Interface (MPI) for communication between processors, and the MPI_Allgather collective is exercised for spike exchange after each interval across distributed-memory systems. Increasing the number of processors improves concurrency and performance, but it adversely affects MPI_Allgather, increasing the communication time between processors. This necessitates an improved communication methodology to decrease the spike-exchange time over distributed-memory systems. This work improves on the MPI_Allgather method using Remote Memory Access (RMA), moving from two-sided to one-sided communication; a recursive doubling mechanism achieves efficient communication between the processors in precise steps. The approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for simulating large neuronal network models.
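The recursive doubling pattern at the heart of the improved exchange can be illustrated without MPI. The following pure-Python sketch (an illustration assuming a power-of-two process count; not the authors' implementation) simulates how every rank accumulates all blocks in log2(P) rounds:

```python
def recursive_doubling_allgather(blocks):
    # Simulate the recursive-doubling allgather schedule: P ranks (P a power
    # of two) each start with one block; in round k, rank r exchanges its
    # entire gathered set with peer r XOR 2**k, so log2(P) rounds suffice.
    P = len(blocks)
    data = [{r: blocks[r]} for r in range(P)]  # per-rank gathered blocks
    rounds, step = 0, 1
    while step < P:
        for rank in range(P):
            peer = rank ^ step
            if peer > rank:                     # handle each disjoint pair once
                merged = {**data[rank], **data[peer]}
                data[rank] = merged
                data[peer] = dict(merged)
        step *= 2
        rounds += 1
    return data, rounds
```

In the actual optimization described above, each pairwise exchange would be realized with one-sided RMA operations (e.g., MPI_Put into a pre-exposed window) rather than matched send/receive pairs, which is what removes the two-sided synchronization cost.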
Multipath interference test method for distributed amplifiers
NASA Astrophysics Data System (ADS)
Okada, Takahiro; Aida, Kazuo
2005-12-01
A method for testing distributed amplifiers is presented; the multipath interference (MPI) is detected as a beat spectrum between the multipath signal and the direct signal using a binary frequency-shift keying (FSK) test signal. The lightwave source is composed of a DFB-LD that is directly modulated by a pulse stream passing through an equalizer, and emits an FSK signal with a frequency deviation of about 430 MHz at a repetition rate of 80-100 kHz. The receiver consists of a photo-diode and an electrical spectrum analyzer (ESA). The base-band power spectrum peak, which appears at the frequency of the FSK frequency deviation, can be converted to an amount of MPI using a calibration chart. The test method has improved the minimum detectable MPI to as low as -70 dB, compared with -50 dB for the conventional test method. The detailed design and performance of the proposed method are discussed, including the MPI simulator for the calibration procedure, computer simulations evaluating the error caused by the FSK repetition rate and the length of the fiber under test, and experiments on single-mode fibers and a distributed Raman amplifier.
Glauber gluons and multiple parton interactions
NASA Astrophysics Data System (ADS)
Gaunt, Jonathan R.
2014-07-01
We show that for hadronic transverse energy E_T in hadron-hadron collisions, the classic Collins-Soper-Sterman (CSS) argument for the cancellation of Glauber gluons breaks down at the level of two Glauber gluons exchanged between the spectators. Through an argument that relates the diagrams with these Glauber gluons to events containing additional soft scatterings, we suggest that this failure of the CSS cancellation actually corresponds to a failure of the `standard' factorisation formula with hard, soft and collinear functions to describe E_T at leading power. This is because the observable receives a leading power contribution from multiple parton interaction (or spectator-spectator Glauber) processes. We also suggest that the same argument can be used to show that a whole class of observables, which we refer to as MPI sensitive observables, do not obey the standard factorisation at leading power. MPI sensitive observables are observables whose distributions in hadron-hadron collisions are disrupted strongly by the presence of multiple parton interactions (MPI) in the event. Examples of further MPI sensitive observables include the beam thrust B^+_{a,b} and transverse thrust.
Sánchez-Ayala, Alfonso; Vilanova, Larissa Soares Reis; Costa, Marina Abrantes; Farias-Neto, Arcelino
2014-01-01
The aim of this study was to evaluate the reproducibility of the condensation silicone Optosil Comfort® as an artificial test food for masticatory performance evaluation. Twenty dentate subjects with mean age of 23.3±0.7 years were selected. Masticatory performance was evaluated using the simple (MPI), the double (IME) and the multiple sieve methods. Trials were carried out five times by three examiners: three times by the first, and once by the second and third examiners. Friedman's test was used to find the differences among time trials. Reproducibility was determined by the intra-class correlation (ICC) test (α=0.05). No differences among time trials were found, except for MPI-4 mm (p=0.022) from the first examiner results. The intra-examiner reproducibility (ICC) of almost all data was high (ICC≥0.92, p<0.001), being moderate only for MPI-0.50 mm (ICC=0.89, p<0.001). The inter-examiner reproducibility was high (ICC>0.93, p<0.001) for all results. For the multiple sieve method, the average mean of absolute difference from repeated measurements were lower than 1 mm. This trend was observed only from MPI-0.50 to MPI-1.4 for the single sieve method, and from IME-0.71/0.50 to IME-1.40/1.00 for the double sieve method. The results suggest that regardless of the method used, the reproducibility of Optosil Comfort® is high.
Chen, Chun; Li, Dianfu; Miao, Changqing; Feng, Jianlin; Zhou, Yanli; Cao, Kejiang; Lloyd, Michael S; Chen, Ji
2012-07-01
The purpose of this study was to evaluate left ventricular (LV) mechanical dyssynchrony in patients with Wolff-Parkinson-White (WPW) syndrome pre- and post-radiofrequency catheter ablation (RFA) using phase analysis of gated single photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI). Forty-five WPW patients were enrolled and had gated SPECT MPI pre- and 2-3 days post-RFA. Electrophysiological study (EPS) was used to locate accessory pathways (APs) and categorize the patients according to the AP locations (septal, left and right free wall). Electrocardiography (ECG) was performed pre- and post-RFA to confirm successful elimination of the APs. Phase analysis of gated SPECT MPI was used to assess LV dyssynchrony pre- and post-RFA. Among the 45 patients, 3 had gating errors, and thus 42 had SPECT phase analysis. Twenty-two patients (52.4%) had baseline LV dyssynchrony. Baseline LV dyssynchrony was more prominent in the patients with septal APs than in the patients with left or right APs (p < 0.05). RFA improved LV synchrony in the entire cohort and in the patients with septal APs (p < 0.01). Phase analysis of gated SPECT MPI demonstrated that LV mechanical dyssynchrony can be present in patients with WPW syndrome. Septal APs result in the greatest degree of LV mechanical dyssynchrony and afford the most benefit after RFA. This study supports further investigation in the relationship between electrical and mechanical activation using EPS and phase analysis of gated SPECT MPI.
The Native Comic Book Project: native youth making comics and healthy decisions.
Montgomery, Michelle; Manuelito, Brenda; Nass, Carrie; Chock, Tami; Buchwald, Dedra
2012-04-01
American Indians and Alaska Natives have traditionally used stories and drawings to positively influence the well-being of their communities. The objective of this study was to describe the development of a curriculum that trains Native youth leaders to plan, write, and design original comic books to enhance healthy decision making. Project staff developed the Native Comic Book Project by adapting Dr. Michael Bitz's Comic Book Project to incorporate Native comic book art, Native storytelling, and decision-making skills. After conducting five train-the-trainer sessions for Native youth, staff were invited by youth participants to implement the full curriculum as a pilot test at one tribal community site in the Pacific Northwest. Implementation was accompanied by surveys and weekly participant observations and was followed by an interactive meeting to assess youth engagement, determine project acceptability, and solicit suggestions for curriculum changes. Six youths aged 12 to 15 (average age = 14) participated in the Native Comic Book Project. Youth participants stated that they liked the project and gained knowledge of the harmful effects of commercial tobacco use but wanted better integration of comic book creation, decision making, and Native storytelling themes. Previous health-related comic book projects did not recruit youth as active producers of content. This curriculum shows promise as a culturally appropriate intervention to help Native youth adopt healthy decision-making skills and healthy behaviors by creating their own comic books.
Korosoglou, G; Hansen, A; Bekeredjian, R; Filusch, A; Hardt, S; Wolf, D; Schellberg, D; Katus, H A; Kuecherer, H
2006-01-01
Objective To evaluate whether myocardial parametric imaging (MPI) is superior to visual assessment for the evaluation of myocardial viability. Methods and results Myocardial contrast echocardiography (MCE) was assessed in 11 pigs before, during, and after left anterior descending coronary artery occlusion and in 32 patients with ischaemic heart disease by using intravenous SonoVue administration. In experimental studies perfusion defect area assessment by MPI was compared with visually guided perfusion defect planimetry. Histological assessment of necrotic tissue was the standard reference. In clinical studies viability was assessed on a segmental level by (1) visual analysis of myocardial opacification; (2) quantitative estimation of myocardial blood flow in regions of interest; and (3) MPI. Functional recovery between three and six months after revascularisation was the standard reference. In experimental studies, compared with visually guided perfusion defect planimetry, planimetric assessment of infarct size by MPI correlated more significantly with histology (r² = 0.92 versus r² = 0.56) and had a lower intraobserver variability (4% v 15%, p < 0.05). In clinical studies, MPI had higher specificity (66% v 43%, p < 0.05) than visual MCE and good accuracy (81%) for viability detection. It was less time consuming (3.4 (1.6) v 9.2 (2.4) minutes per image, p < 0.05) than quantitative blood flow estimation by regions of interest and increased the agreement between observers interpreting myocardial perfusion (κ = 0.87 v κ = 0.75, p < 0.05). Conclusion MPI is useful for the evaluation of myocardial viability both in animals and in patients. It is less time consuming than quantification analysis by regions of interest and less observer dependent than visual analysis. Thus, strategies incorporating this technique may be valuable for the evaluation of myocardial viability in clinical routine. PMID:15939722
Mineccia, Michela; Zimmitti, Giuseppe; Ribero, Dario; Giraldi, Francesco; Bertolino, Franco; Brambilla, Romeo; Ferrero, Alessandro
2016-01-01
Fecal peritonitis due to colorectal perforation is a dramatic event characterized by high mortality. Our study aims at determining how results of sigmoid resection (eventually extended to upper rectum) for colorectal perforation with fecal peritonitis changed in recent years and which factors affected eventual changes. Seventy-four patients were operated on at our institution (2005-2014) for colorectal perforation with fecal peritonitis and were divided into two numerically equal groups (operated on before (ERA1-group) and after (ERA2-group) May 2010). Mannheim Peritonitis Index (MPI) was calculated for each patient. Characteristics of the two groups were compared. Predictors of postoperative outcomes were identified. Postoperative overall complications, major complications, and mortality occurred in 59%, 28%, and 18% of cases, respectively, and were less frequent in ERA2-group (51%, 16%, and 8%, respectively), compared to ERA1-group (68%, 41%, and 27%, respectively; p = .155, .02, and .032, respectively). Such results paralleled lower MPI values in ERA2-group, compared to ERA1-group (23(16-39) vs. 28(21-43), p = .006). Using receiver operating characteristic analysis, the best cut-off value of MPI for predicting postoperative complications and mortality was 28.5. MPI>28 was the only independent predictor of postoperative overall (p = .009, OR = 4.491) and major complications (p < .001, OR = 23.182) and was independently associated with a higher risk of mortality (p = .016, OR = 13.444), as was duration of preoperative peritonitis longer than 24 h (p = .045, OR = 17.099). Results of surgery for colorectal perforation with fecal peritonitis have improved over time, matching a concurrent decrease of MPI values and a better preoperative patient management. MPI value may help in selecting patients benefitting from surgical treatment. Copyright © 2015 IJS Publishing Group Limited. Published by Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.
2017-10-20
Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5–2698v3 processors.
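The kernel at the core of that refactoring, tall-and-skinny matrix multiplication, can be sketched in a hedged way. This NumPy fragment (illustrative only; the function name and blocking are assumptions, not NWChem code) accumulates C = AᵀB over row blocks, mirroring the per-rank partial products that an MPI reduction would sum:

```python
import numpy as np

def tall_skinny_atb(A, B, nblocks):
    # Accumulate C = A.T @ B over contiguous row blocks of the tall matrices;
    # in an MPI run each rank would own one row block and an Allreduce would
    # sum the small (k x n) partial products.
    m, k = A.shape
    bounds = np.linspace(0, m, nblocks + 1).astype(int)
    C = np.zeros((k, B.shape[1]))
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        C += A[lo:hi].T @ B[lo:hi]  # local partial product
    return C
```

The design point is that each partial product is small (columns × columns), so only a tiny matrix is communicated regardless of how tall the inputs are.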
ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems
Expósito, Roberto R.
2018-01-01
Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search of interesting biclusters on binary datasets, which are very popular on different fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which has been proved accurate by several studies, especially on scenarios that result on many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/. PMID:29608567
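The pattern-grouping idea can be illustrated with Python integer bitsets. This is a loose sketch of a BiBit-style seed step (names and details are assumptions, not the tool's actual algorithm): every pair of rows is ANDed to form a candidate column pattern, and all rows containing that pattern are collected into a bicluster candidate.

```python
from itertools import combinations

def bibit_style_seeds(rows, min_cols=2):
    # rows: list of ints, each bit marking a 1 in that row of a binary matrix.
    # AND-ing two rows yields a candidate column pattern; gather every row
    # that contains the pattern. (Loose sketch of a BiBit-style seed step.)
    seen = {}
    for i, j in combinations(range(len(rows)), 2):
        pattern = rows[i] & rows[j]
        if bin(pattern).count("1") < min_cols or pattern in seen:
            continue
        seen[pattern] = [r for r, row in enumerate(rows)
                         if row & pattern == pattern]
    return seen

rows = [0b1110, 0b0111, 0b1111, 0b1000]
seeds = bibit_style_seeds(rows)
```

Encoding each row as a machine word is what makes the approach fast: the pairwise AND and the containment test are single bitwise operations, which is also what maps naturally onto the threads and MPI processes mentioned in the abstract.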
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems.
González-Domínguez, Jorge; Expósito, Roberto R
2018-01-01
Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search for interesting biclusters in binary datasets, which are very popular in fields such as genetics, marketing, and text mining. It is based on the state-of-the-art sequential Java tool BiBit, which several studies have shown to be accurate, especially in scenarios that produce many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/. PMID:29608567
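The pattern-grouping step described above lends itself to bitwise arithmetic: each binary row becomes an integer bitmask, and ANDing two rows yields a candidate column pattern whose supporting rows form a bicluster seed. A minimal sketch of that idea (function names are ours, not ParBiBit's API):

```python
def row_to_bits(row):
    """Encode a binary row (list of 0/1) as an integer bitmask."""
    bits = 0
    for j, v in enumerate(row):
        if v:
            bits |= 1 << j
    return bits

def bibit_seed(matrix, i, k, min_cols=2):
    """BiBit-style seed: AND rows i and k, then gather every row that
    contains the resulting column pattern. Returns (rows, pattern), or
    None if the pattern spans fewer than min_cols columns."""
    pattern = row_to_bits(matrix[i]) & row_to_bits(matrix[k])
    if bin(pattern).count("1") < min_cols:
        return None
    rows = [r for r, row in enumerate(matrix)
            if row_to_bits(row) & pattern == pattern]
    return rows, pattern

M = [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 1, 1, 1],
     [1, 1, 0, 0]]
# Rows 0 and 1 share columns {0, 1}; rows 0, 1, and 3 all contain them,
# so bibit_seed(M, 0, 1) reports the bicluster ([0, 1, 3], 0b0011).
```

Because the AND and containment tests are word-level machine operations, seeds over many columns cost a handful of instructions each, which is what makes the approach attractive to parallelize across threads and MPI ranks.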
NASA Astrophysics Data System (ADS)
Kar, Somnath; Choudhury, Subikash; Muhuri, Sanjib; Ghosh, Premomoy
2017-01-01
Satisfactory description of data by hydrodynamics-motivated models, as reported recently by experimental collaborations at the LHC, confirms "collectivity" in high-multiplicity proton-proton (pp) collisions. Notwithstanding this, a detailed study of high-multiplicity pp data in other approaches or models is essential for a better understanding of the specific phenomenon. In this study, the focus is on a pQCD-inspired multiparton interaction (MPI) model, including a color reconnection (CR) scheme, as implemented in the Monte Carlo code PYTHIA8 tune 4C. The MPI model with color reconnection reproduces the dependence of the mean transverse momentum ⟨pT⟩ on the charged-particle multiplicity Nch in pp collisions at the LHC, providing an alternate explanation for the signature of "hydrodynamic collectivity" in pp data. It is, therefore, worth exploring how this model responds to other related features of high-multiplicity pp events. This comparative study with recent experimental results demonstrates the limitations of the model in explaining some of the prominent features of the final-state charged particles up to the intermediate-pT (pT < 2.0 GeV/c) range in high-multiplicity pp events.
Ben-Haim, Simona; Kacperski, Krzysztof; Hain, Sharon; Van Gramberg, Dean; Hutton, Brian F; Erlandsson, Kjell; Sharir, Tali; Roth, Nathaniel; Waddington, Wendy A; Berman, Daniel S; Ell, Peter J
2010-08-01
We compared simultaneous dual-radionuclide (DR) stress and rest myocardial perfusion imaging (MPI) with a novel solid-state cardiac camera and a conventional SPECT camera with separate stress and rest acquisitions. Of 27 consecutive patients recruited, 24 (64.5+/-11.8 years of age, 16 men) were injected with 74 MBq of (201)Tl (rest) and 250 MBq of (99m)Tc-MIBI (stress). Conventional MPI acquisition times for stress and rest are 21 min and 16 min, respectively. Rest (201)Tl for 6 min and simultaneous DR 15-min list mode gated scans were performed on a D-SPECT cardiac scanner. In 11 patients DR D-SPECT was performed first and in 13 patients conventional stress (99m)Tc-MIBI SPECT imaging was performed followed by DR D-SPECT. The DR D-SPECT data were processed using a spill-over and scatter correction method. DR D-SPECT images were compared with rest (201)Tl D-SPECT and with conventional SPECT images by visual analysis employing the 17-segment model and a five-point scale (0 normal, 4 absent) to calculate the summed stress and rest scores. Image quality was assessed on a four-point scale (1 poor, 4 very good) and gut activity was assessed on a four-point scale (0 none, 3 high). Conventional MPI studies were abnormal at stress in 17 patients and at rest in 9 patients. In the 17 abnormal stress studies DR D-SPECT MPI showed 113 abnormal segments and conventional MPI showed 93 abnormal segments. In the nine abnormal rest studies DR D-SPECT showed 45 abnormal segments and conventional MPI showed 48 abnormal segments. The summed stress and rest scores on conventional SPECT and DR D-SPECT were highly correlated (r=0.9790 and 0.9694, respectively). The summed scores of rest (201)Tl D-SPECT and DR D-SPECT were also highly correlated (r=0.9968, p<0.0001 for all). In six patients stress perfusion defects were significantly larger on stress DR D-SPECT images, and five of these patients were imaged earlier by D-SPECT than by conventional SPECT.
Fast and high-quality simultaneous DR MPI is feasible with D-SPECT in a single imaging session with comparable diagnostic performance and image quality to conventional SPECT and to a separate rest (201)Tl D-SPECT acquisition.
Lyngholm, Ann Marie; Pedersen, Begitte H; Petersen, Lars J
2008-09-01
Intestinal activity at the inferior myocardial wall represents an issue for assessment of myocardial perfusion imaging (MPI) with 99mTc-labelled tracers. The aim of this study was to investigate the effect of time and food on upper abdominal activity in 99mTc-tetrofosmin MPI. The study population consisted of 152 consecutive patients referred for routine MPI. All patients underwent 2-day stress-rest 99mTc-tetrofosmin single-photon emission computed tomography MPI. Before stress testing, patients were randomized in a factorial design to four different regimens. Group A: early scan (image acquisition initiated within 15 min after injection of the tracer) and no food; group B: early scan and food (two pieces of white bread with butter and a minimum of 450 ml of water); group C: late scan (image acquisition 30-60 min after injection of the tracer) and no food; and group D: late scan with food. Patients underwent a standard bicycle exercise or pharmacological stress test. The degree of upper abdominal activity was evaluated by trained observers blinded to the randomization code. The primary endpoint was the proportion of accepted scans in the intention-to-treat population in stress MPI. The results showed a statistically significant impact of both time and food on upper abdominal activity: the acceptance rate improved from 55% in group A to 100% in group D. An early scan reduced the acceptance rate by 30% versus a late scan (hazard ratio 0.70, 95% confidence interval 0.58-0.84; P < 0.0001), whereas the addition of food improved the success rate versus no food by 27% (hazard ratio 1.27, 95% confidence interval 1.07-1.51; P = 0.006). No significant interaction between food and time was observed. An analysis of accepted scans according to the actual scan time and food consumption confirmed the findings of the intention-to-treat analysis. In addition, similar findings were seen in the 116 of 152 patients with a rest MPI (success rate of 53% in group A vs. 96% in group D). A combination of solid food and water administered after injection of the tracer and delayed image acquisition led to a significant and clinically relevant decrease of interfering upper abdominal activity in 99mTc-tetrofosmin MPI.
Assimilating soil moisture into an Earth System Model
NASA Astrophysics Data System (ADS)
Stacke, Tobias; Hagemann, Stefan
2017-04-01
Several modelling studies have reported potential impacts of soil moisture anomalies on regional climate. In particular for short prediction periods, perturbations of the soil moisture state may result in significant alteration of surface temperature in the following season. However, it is not yet clear whether soil moisture anomalies also affect climate on larger temporal and spatial scales. In an earlier study, we showed that soil moisture anomalies can persist for several seasons in the deeper soil layers of a land surface model. Additionally, those anomalies can influence root zone moisture, in particular during exceptionally dry or wet periods. Thus, one prerequisite for predictability, namely the existence of long-term memory, is evident for simulated soil moisture and might be exploited to improve climate predictions. The second prerequisite is the sensitivity of the climate system to soil moisture. In order to investigate this sensitivity in decadal simulations, we implemented a soil moisture assimilation scheme into the Max Planck Institute for Meteorology's Earth System Model (MPI-ESM). The assimilation scheme is based on a simple nudging algorithm and updates the surface soil moisture state once per day. Our experiments use the MPI-ESM configuration that includes model components for the interactive simulation of atmosphere, land, and ocean. Artificial assimilation data are created from a control simulation to nudge the MPI-ESM towards predominantly dry and wet states. First analyses focus on the impact of the assimilation on land surface variables and reveal distinct differences in the long-term mean values between wet- and dry-state simulations. Precipitation, evapotranspiration, and runoff are larger in the wet state than in the dry state, resulting in an increased moisture transport from the land to the atmosphere and ocean. Consequently, surface temperatures in the wet-state simulations are lower by more than one kelvin. In terms of spatial pattern, the largest differences between the two simulations are seen for continental areas, while regions with a maritime climate are least sensitive to soil moisture assimilation.
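A simple nudging scheme of the kind described above reduces, in its most basic form, to a daily relaxation of the model state toward the assimilation data. A minimal sketch under assumed notation (the MPI-ESM implementation is of course more involved):

```python
def nudge(state, target, timescale, dt=1.0):
    """One Newtonian-relaxation (nudging) step:
        x_new = x + (dt / timescale) * (target - x)
    'timescale' controls how strongly the model is pulled toward the
    assimilation data; dt is the update interval (e.g. one day)."""
    return state + dt / timescale * (target - state)

# Repeated application converges geometrically on the target:
# soil moisture 0.2 nudged toward 0.4 with timescale 2 days
# moves halfway (to 0.3) after one daily update.
```

With a short timescale the model state is pinned to the data; with a long one the assimilation only weakly constrains the model's own dynamics, which is the trade-off such experiments tune.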
Implementation of a cardiac PET stress program: comparison of outcomes to the preceding SPECT era.
Knight, Stacey; Min, David B; Le, Viet T; Meredith, Kent G; Dhar, Ritesh; Biswas, Santanu; Jensen, Kurt R; Mason, Steven M; Ethington, Jon-David; Lappe, Donald L; Muhlestein, Joseph B; Anderson, Jeffrey L; Knowlton, Kirk U
2018-05-03
Cardiac positron emission tomography (PET) is more accurate than single photon emission computed tomography (SPECT) at identifying coronary artery disease (CAD); however, the 2 modalities have not been thoroughly compared in a real-world setting. We conducted a retrospective analysis of 60-day catheterization outcomes and 1-year major adverse cardiovascular events (MACE) after the transition from a SPECT- to a PET-based myocardial perfusion imaging (MPI) program. MPI patients at Intermountain Medical Center from January 2011-December 2012 (the SPECT era, n = 6,777) and January 2014-December 2015 (the PET era, n = 7,817) were studied. Outcomes studied were 60-day coronary angiography, high-grade obstructive CAD, left main/severe 3-vessel disease, revascularization, and 1-year MACE-revascularization (MACE-revasc; death, myocardial infarction [MI], or revascularization >60 days). Patients were 64 ± 13 years old; 54% were male and 90% were of European descent; and 57% represented a screening population (no prior MI, revascularization, or CAD). During the PET era, compared with the SPECT era, a higher percentage of patients underwent coronary angiography (13.2% vs. 9.7%, P < 0.0001), had high-grade obstructive CAD (10.5% vs. 6.9%, P < 0.0001), had left main or severe 3-vessel disease (3.0% vs. 2.3%, P = 0.012), and had coronary revascularization (56.7% vs. 47.1%, P = 0.0001). Similar catheterization outcomes were seen when restricted to the screening population. There was no difference in 1-year MACE-revasc (PET [5.8%] vs. SPECT [5.3%], P = 0.31). The PET-based MPI program resulted in improved identification of patients with high-grade obstructive CAD, as well as a larger percentage of revascularization, thus resulting in fewer patients undergoing coronary angiography without revascularization. This observational study was funded using internal departmental funds.
Impact of physical permafrost processes on hydrological change
NASA Astrophysics Data System (ADS)
Hagemann, Stefan; Blome, Tanja; Beer, Christian; Ekici, Altug
2015-04-01
Permafrost or perennially frozen ground is an important part of the terrestrial cryosphere; roughly one quarter of Earth's land surface is underlain by permafrost. As it is a thermal phenomenon, its characteristics are highly dependent on climatic factors. The currently observed warming, which is projected to persist during the coming decades due to anthropogenic CO2 input, certainly affects the vast permafrost areas of the high northern latitudes. The quantification of these effects, however, is scientifically still an open question. This is partly due to the complexity of the system, where several feedbacks interact between land and atmosphere, sometimes counterbalancing each other. Moreover, until recently, many global circulation models (GCMs) and Earth system models (ESMs) lacked an adequate representation of permafrost physics in their land surface schemes. Within the European Union FP7 project PAGE21, the land surface scheme JSBACH of the Max Planck Institute for Meteorology ESM (MPI-ESM) has been equipped with a representation of the physical processes relevant for permafrost studies. These processes include the effects of freezing and thawing of soil water on both the energy and water cycles, thermal properties depending on soil water and ice contents, and soil moisture movement being influenced by the presence of soil ice. In the present study, we analyse how these permafrost-relevant processes affect projected hydrological changes over northern hemisphere high-latitude land areas. For this analysis, the atmosphere-land part of MPI-ESM, ECHAM6-JSBACH, is driven by prescribed SST and sea ice in an AMIP2-type setup with and without the newly implemented permafrost processes. Observed SST and sea ice for 1979-1999 are used to consider induced changes in the simulated hydrological cycle. In addition, simulated SST and sea ice are taken from an MPI-ESM simulation conducted for CMIP5 following the RCP8.5 scenario. The corresponding simulations with ECHAM6-JSBACH are used to assess differences in projected hydrological changes induced by the permafrost-relevant processes.
Performance Analysis of and Tool Support for Transactional Memory on BG/Q
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schindewolf, M
2011-12-08
Martin Schindewolf worked during his internship at the Lawrence Livermore National Laboratory (LLNL) under the guidance of Martin Schulz at the Computer Science Group of the Center for Applied Scientific Computing. We studied the performance of the TM subsystem of BG/Q as well as the possibilities for tool support for TM. To study the performance, we ran CLOMP-TM, a benchmark designed to quantify the overhead of OpenMP and compare different synchronization primitives. To advance CLOMP-TM, we added Message Passing Interface (MPI) routines for hybrid parallelization. This makes it possible to run multiple MPI tasks, each running OpenMP, on one node. With these enhancements, a beneficial ratio of MPI tasks to OpenMP threads is determined. Further, the synchronization primitives are ranked as a function of the application characteristics. To demonstrate the usefulness of these results, we investigated a real Monte Carlo simulation called the Monte Carlo Benchmark (MCB). Applying the lessons learned yields the best task-to-thread ratio, and we were able to tune the synchronization by transactifying the MCB. We also developed tools that capture the performance of the TM run time system and present it to the application's developer. The performance of the TM run time system relies on its built-in statistics. These tools use the Blue Gene Performance Monitoring (BGPM) interface to correlate the statistics from the TM run time system with performance counter values. This combination provides detailed insight into the run time behavior of the application and makes it possible to track down the causes of degraded performance. One tool separates the performance counters into three categories: Successful Speculation, Unsuccessful Speculation, and No Speculation. All of the tools are crafted around IBM's xlc compiler for C and C++ and have been run and tested on a Q32 early access system.
SPITFIRE within the MPI Earth system model: Model development and evaluation
NASA Astrophysics Data System (ADS)
Lasslop, Gitta; Thonicke, Kirsten; Kloster, Silvia
2014-09-01
Quantification of the role of fire within the Earth system requires an adequate representation of fire as a climate-controlled process within an Earth system model. To be able to address questions on the interaction between fire and the Earth system, we implemented the mechanistic fire model SPITFIRE in JSBACH, the land surface model of the MPI Earth system model. Here, we document the model implementation as well as model modifications. We evaluate our model results by comparing the simulation to the GFED version 3 satellite-based data set. In addition, we assess the sensitivity of the model to the meteorological forcing and to the spatial variability of a number of fire-relevant model parameters. A first comparison of model results with burned area observations showed a strong correlation of the residuals with wind speed. Further analysis revealed that the response of the fire spread to wind speed was too strong for application on the global scale. Therefore, we developed an improved parametrization to account for this effect. The evaluation of the improved model shows that it is able to capture the global gradients and the seasonality of burned area. Some areas of model-data mismatch can be explained by differences in vegetation cover compared to observations. We achieve benchmarking scores comparable to other state-of-the-art fire models. The global total burned area is sensitive to the meteorological forcing. Adjustment of parameters leads to similar model results for both forcing data sets with respect to spatial and seasonal patterns. This article was corrected on 29 SEP 2014. See the end of the full text for details.
The influence of hunger on meal to pellet intervals in barred owls
Duke, G.E.; Fuller, M.R.; Huberty, B.J.
1980-01-01
1. Barred owls fed at a sub-maintenance (SM) level had significantly (P < 0.01) longer meal to pellet intervals (MPI)/g eaten/kg body weight (BW) than those fed at an above maintenance (AM) level; MPI/g per kg for owls fed at a maintenance (M) level was intermediate but significantly (P < 0.01) different from both SM and AM. 2. During SM feeding, MPI/g per kg gradually increased. 3. The proportion of a meal occurring in a pellet was less in "hungry" owls whether losing weight (SM) or gaining (AM) as compared to owls maintaining their normal body weight (M). 4. SM fed owls appear to be able to increase digestion time as well as thoroughness of digestion.
Contaminant studies in the Sierra Nevadas
Sparling, Don; Fellers, Gary M.
2002-01-01
A Posteriori Error Bounds for the Empirical Interpolation Method
2010-03-18
For the parameters (x̄1, x̄2) ≡ μ ∈ D_II ≡ [0.4, 0.6]² and fixed α = 0.1, the results are similar to the single-parameter case (Fig. 2). 1. Introduction … and denote the set of all distinct multi-indices β of dimension P of length I by M_P^I. The cardinality of M_P^I is given by card(M_P^I) = (P+I−1 choose I) … operations, and we compute the interpolation errors ‖F^(β)(·; τ) − F_M^(β)(·; τ)‖_{L∞(Ω)}, 0 < |β| < p − 1, for all τ ∈ Φ, in O(n_Φ M N) Σ_{j=0}^{p−1} card(M_P^j) …
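The multi-index count in the snippet above is the standard stars-and-bars identity; in clean notation (the cost expression is our reconstruction of the truncated fragment, under the snippet's own symbols):

```latex
\operatorname{card}\bigl(M_P^I\bigr) \;=\; \binom{P+I-1}{I},
\qquad
\text{cost} \;\sim\; O\!\left(n_\Phi\, M\, N\right)\,
\sum_{j=0}^{p-1}\operatorname{card}\bigl(M_P^j\bigr).
```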
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
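The shared-memory half of such a hybrid scheme avoids high-level locking by giving each thread private storage on the hot path and synchronizing only at a final merge. A stdlib-only sketch of that pattern (not SKIRT's actual code; the photon "walk" is a deterministic stand-in):

```python
import threading

def shoot_photons(n_photons, n_threads=4):
    """Toy Monte Carlo tally: each thread accumulates into its own
    private counter array, so no locks are taken on the hot path;
    partial results are merged exactly once, after the threads join."""
    n_bins = 8
    partials = [[0] * n_bins for _ in range(n_threads)]

    def worker(tid, count):
        local = partials[tid]          # thread-private tally, lock-free
        for i in range(count):
            # Deterministic stand-in for a random photon path: hash to a bin.
            local[(tid * 2654435761 + i * 40503) % n_bins] += 1

    threads = [threading.Thread(target=worker,
                                args=(t, n_photons // n_threads))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Single-threaded merge: the only synchronization point.
    return [sum(col) for col in zip(*partials)]
```

Real lock-free codes like the one described use atomic operations instead of private copies when memory is shared, but the design goal is the same: keep the hot loop free of contended locks.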
Near-Body Grid Adaption for Overset Grids
NASA Technical Reports Server (NTRS)
Buning, Pieter G.; Pulliam, Thomas H.
2016-01-01
A solution adaption capability for curvilinear near-body grids has been implemented in the OVERFLOW overset grid computational fluid dynamics code. The approach follows closely that used for the Cartesian off-body grids, but inserts refined grids in the computational space of original near-body grids. Refined curvilinear grids are generated using parametric cubic interpolation, with one-sided biasing based on curvature and stretching ratio of the original grid. Sensor functions, grid marking, and solution interpolation tasks are implemented in the same fashion as for off-body grids. A goal-oriented procedure, based on largest error first, is included for controlling growth rate and maximum size of the adapted grid system. The adaption process is almost entirely parallelized using MPI, resulting in a capability suitable for viscous, moving body simulations. Two- and three-dimensional examples are presented.
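Refining a grid line by parametric cubic interpolation, as described above, can be illustrated in 1D with a Catmull-Rom cubic (helper names are invented; OVERFLOW's one-sided biasing by curvature and stretching ratio is omitted):

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Parametric cubic (Catmull-Rom) interpolation between p1 and p2,
    using p0 and p3 as outer support points; t in [0, 1]."""
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

def refine_line(xs):
    """Insert a cubically interpolated midpoint into every interval of a
    1D grid line (end intervals padded by linear extrapolation)."""
    pad = [2 * xs[0] - xs[1]] + xs + [2 * xs[-1] - xs[-2]]
    out = []
    for i in range(1, len(pad) - 2):
        out.append(pad[i])
        out.append(catmull_rom(pad[i - 1], pad[i], pad[i + 1], pad[i + 2], 0.5))
    out.append(xs[-1])
    return out

# On uniform data the cubic reproduces the exact midpoints:
# refine_line([0, 1, 2, 3]) -> [0, 0.5, 1, 1.5, 2, 2.5, 3]
```

On a curved grid line the cubic keeps the refined points on a smooth curve through the originals, which is why a parametric cubic is preferred over simple midpoint averaging when inserting refined curvilinear grids.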
Viscoelastic Finite Difference Modeling Using Graphics Processing Units
NASA Astrophysics Data System (ADS)
Fabien-Ouellet, G.; Gloaguen, E.; Giroux, B.
2014-12-01
Full waveform seismic modeling requires a huge amount of computing power that still challenges today's technology. This limits the applicability of powerful processing approaches in seismic exploration like full-waveform inversion. This paper explores the use of Graphics Processing Units (GPUs) to compute a time-based finite-difference solution to the viscoelastic wave equation. The aim is to investigate whether adopting GPU technology can significantly reduce the computing time of simulations. The code presented herein is based on the freely accessible 2D software of Bohlen (2002), provided under the GNU General Public License (GPL). This implementation is based on a second-order centred-difference scheme to approximate time derivatives, and staggered-grid schemes with centred differences of order 2, 4, 6, 8, and 12 for spatial derivatives. The code is fully parallel and is written using the Message Passing Interface (MPI), and it thus supports simulations of vast seismic models on a cluster of CPUs. To port the code of Bohlen (2002) to GPUs, the OpenCL framework was chosen for its ability to work on both CPUs and GPUs and its adoption by most GPU manufacturers. In our implementation, OpenCL works in conjunction with MPI, which allows computations on a cluster of GPUs for large-scale model simulations. We tested our code for model sizes between 100² and 6000² elements. Comparison shows a decrease in computation time of more than two orders of magnitude between the GPU implementation run on an AMD Radeon HD 7950 and the CPU implementation run on a 2.26 GHz Intel Xeon Quad-Core. The speed-up varies depending on the order of the finite-difference approximation and generally increases for higher orders. Increasing speed-ups are also obtained for increasing model size, which can be explained by the kernel overheads and the delays introduced by memory transfers to and from the GPU through the PCI-E bus, which dominate at small model sizes.
Those tests indicate that the GPU memory size and the slow memory transfers are the limiting factors of our GPU implementation. Those results show the benefits of using GPUs instead of CPUs for time based finite-difference seismic simulations. The reductions in computation time and in hardware costs are significant and open the door for new approaches in seismic inversion.
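The time-stepping kernel at the heart of such codes is compact. Below is a 1D elastic (not viscoelastic) velocity-stress update on a staggered grid with second-order centred differences; all parameters are illustrative, and the boundary treatment is deliberately naive:

```python
def staggered_step(v, s, dt, dx, rho, mu):
    """One leapfrog update of the 1D velocity-stress wave equations
        rho * dv/dt = ds/dx,    ds/dt = mu * dv/dx
    on a staggered grid: v lives on the n integer nodes, s on the
    n-1 half nodes between them; end velocities are held fixed."""
    n = len(v)                        # s must have length n - 1
    for i in range(1, n - 1):         # interior velocity nodes
        v[i] += dt / (rho * dx) * (s[i] - s[i - 1])
    for i in range(n - 1):            # stress on half nodes
        s[i] += dt * mu / dx * (v[i + 1] - v[i])
    return v, s
```

The staggering is what makes the centred differences line up: each stress gradient is centred on a velocity node and vice versa, giving second-order accuracy without averaging. Production codes add higher-order spatial stencils, memory variables for viscoelasticity, and MPI halo exchanges around these two loops.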
Implementation of polyatomic MCTDHF capability
NASA Astrophysics Data System (ADS)
Haxton, Daniel; Jones, Jeremiah; Rescigno, Thomas; McCurdy, C. William; Ibrahim, Khaled; Williams, Sam; Vecharynski, Eugene; Rouet, Francois-Henry; Li, Xiaoye; Yang, Chao
2015-05-01
The implementation of the Multiconfiguration Time-Dependent Hartree-Fock method for polyatomic molecules using a cartesian product grid of sinc basis functions will be discussed. The focus will be on two key components of the method: first, the use of a resolution-of-the-identity approximation; second, the use of established techniques for triple Toeplitz matrix algebra using fast Fourier transform over distributed memory architectures (MPI 3D FFT). The scaling of two-electron matrix element transformations is converted from O(N⁴) to O(N log N) by including these components. Here N = n³, with n the number of points on a side. We test the preliminary implementation by calculating absorption spectra of small hydrocarbons, using approximately 16-512 points on a side. This work is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under the Early Career program, and by the offices of BES and Advanced Scientific Computing Research, under the SciDAC program.
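The O(N⁴) → O(N log N) conversion rests on the convolution theorem: a Toeplitz operator embeds in a circulant one, and a circulant matrix is diagonalized by the FFT. A 1D, stdlib-only sketch (radix-2 FFT, so lengths must be powers of two; this is an illustration, not the paper's distributed 3D implementation):

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of 2.
    The inverse variant omits the 1/n normalisation (applied by callers)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def circulant_matvec(c, x):
    """y = C x for the circulant matrix C with first column c, computed
    in O(N log N) via the convolution theorem: C = F^-1 diag(F c) F."""
    n = len(c)
    C, X = fft(c), fft(x)
    y = fft([Ck * Xk for Ck, Xk in zip(C, X)], inverse=True)
    return [v / n for v in y]       # apply the deferred 1/n normalisation
```

The direct product costs O(N²) multiplications; the FFT route costs three transforms plus a pointwise product, which is the same trick the abstract applies dimension-by-dimension for its triple Toeplitz algebra.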
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
The Native Comic Book Project: Native Youth Making Comics and Healthy Decisions
Montgomery, Michelle; Manuelito, Brenda; Nass, Carrie; Chock, Tami; Buchwald, Dedra
2015-01-01
Background American Indians and Alaska Natives have traditionally used stories and drawings to positively influence the well-being of their communities. Objectives The objective of this study was to describe the development of a curriculum that trains Native youth leaders to plan, write, and design original comic books to enhance healthy decision making. Methods Project staff developed the Native Comic Book Project by adapting Dr. Michael Bitz’s Comic Book Project to incorporate Native comic book art, Native storytelling, and decision-making skills. After conducting five train-the-trainer sessions for Native youth, staff were invited by youth participants to implement the full curriculum as a pilot test at one tribal community site in the Pacific Northwest. Implementation was accompanied by surveys and weekly participant observations and was followed by an interactive meeting to assess youth engagement, determine project acceptability, and solicit suggestions for curriculum changes. Results Six youths aged 12 to 15 (average age = 14) participated in the Native Comic Book Project. Youth participants stated that they liked the project and gained knowledge of the harmful effects of commercial tobacco use but wanted better integration of comic book creation, decision making, and Native storytelling themes. Conclusion Previous health-related comic book projects did not recruit youth as active producers of content. This curriculum shows promise as a culturally appropriate intervention to help Native youth adopt healthy decision-making skills and healthy behaviors by creating their own comic books. PMID:22259070
Evaluation of the Alaska Native Science & Engineering Program (ANSEP). Research Report
ERIC Educational Resources Information Center
Bernstein, Hamutal; Martin, Carlos; Eyster, Lauren; Anderson, Theresa; Owen, Stephanie; Martin-Caughey, Amanda
2015-01-01
The Urban Institute conducted an implementation and participant-outcomes evaluation of the Alaska Native Science & Engineering Program (ANSEP). ANSEP is a multi-stage initiative designed to prepare and support Alaska Native students from middle school through graduate school to succeed in science, technology, engineering, and math (STEM)…
Murase, Kenya; Konishi, Takashi; Takeuchi, Yuki; Takata, Hiroshige; Saito, Shigeyoshi
2013-07-01
Our purpose in this study was to investigate the behavior of signal harmonics in magnetic particle imaging (MPI) by experimental and simulation studies. In the experimental studies, we made an apparatus for MPI in which both a drive magnetic field (DMF) and a selection magnetic field (SMF) were generated with a Maxwell coil pair. The MPI signals from magnetic nanoparticles (MNPs) were detected with a solenoid coil. The odd- and even-numbered harmonics were calculated by Fourier transformation with or without background subtraction. The particle size of the MNPs was measured by transmission electron microscopy (TEM), dynamic light-scattering, and X-ray diffraction methods. In the simulation studies, the magnetization and particle size distribution of MNPs were assumed to obey the Langevin theory of paramagnetism and a log-normal distribution, respectively. The odd- and even-numbered harmonics were calculated by Fourier transformation under various conditions of DMF and SMF and for three different particle sizes. The behavior of the harmonics largely depended on the size of the MNPs. When we used the particle size obtained from the TEM image, the simulation results were most similar to the experimental results. The similarity between the experimental and simulation results for the even-numbered harmonics was better than that for the odd-numbered harmonics. This was considered to be due to the fact that the odd-numbered harmonics were more sensitive to background subtraction than were the even-numbered harmonics. This study will be useful for a better understanding, optimization, and development of MPI and for designing MNPs appropriate for MPI.
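The simulation setup described — Langevin magnetization driven by a sinusoidal DMF, with a static SMF offset — can be reproduced in a few lines; the parameters below are illustrative, not those of the paper. Even harmonics vanish without the offset and appear once it breaks the odd symmetry of the response:

```python
import math

def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x, with L(0) = 0."""
    if abs(x) < 1e-6:
        return x / 3.0                  # series limit near zero
    return 1.0 / math.tanh(x) - 1.0 / x

def harmonic_amplitude(k, drive=5.0, offset=0.0, samples=4096):
    """Magnitude of the k-th Fourier harmonic of the MNP response
    M(t) = L(drive * sin(wt) + offset) over one drive period, by
    direct numerical integration. 'drive' plays the role of the DMF
    amplitude and 'offset' that of the static SMF (dimensionless)."""
    re = im = 0.0
    for n in range(samples):
        t = 2.0 * math.pi * n / samples
        m = langevin(drive * math.sin(t) + offset)
        re += m * math.cos(k * t)
        im += m * math.sin(k * t)
    return 2.0 * math.hypot(re, im) / samples

# With offset = 0 the response obeys M(t + T/2) = -M(t), so all even
# harmonics cancel; a nonzero offset makes them reappear -- the effect
# the abstract exploits by comparing odd- and even-numbered harmonics.
```

This mirrors the paper's observation qualitatively: the even-numbered harmonics carry the offset-field information, while the odd ones are dominated by the saturating nonlinearity itself.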
NASA Astrophysics Data System (ADS)
Çatıkkaş, Berna; Aktan, Ebru; Yalçın, Ergin
2016-08-01
This work deals with the optimized molecular structure, vibrational spectra, nonlinear optical (NLO), and frontier molecular orbital (FMO) properties of 1-Methyl-2-phenyl-3-(1,3,4-thiadiazol-2-yldiazenyl)-1H-indole (MPI) from quantum chemical calculations. The Fourier transform infrared (FT-MIR and FT-FIR) and Raman spectra of MPI were recorded in the regions 4000-400 cm⁻¹ and 400-30 cm⁻¹, and 3200-92 cm⁻¹, respectively. The analysis and complete vibrational assignments of the fundamental modes of the MPI molecule were carried out using the observed FT-IR and FT-Raman data and the Total Energy Distribution (TED) calculated according to the Scaled Quantum Mechanics procedure. The calculated geometrical parameters of the MPI molecule are in agreement with the values obtained from XRD studies, and the difference between the scaled and observed wavenumbers for most of the fundamentals is very small. ¹H NMR and ¹³C NMR chemical shift values, the HOMO-LUMO energy gap, and the molecular electrostatic potential (MEP) were investigated using density functional theory (B3LYP) methods. UV/Visible spectra, maximum absorption wavelengths (λmax), and oscillator strengths in chloroform, methanol, and DMSO solvation, in combination with different basis sets, were calculated using time-dependent density functional theory (TD-DFT). Additionally, the predicted nonlinear optical (NLO) properties of MPI are considerably greater than those of urea at the B3LYP/6-31++G(d,p) level.
Zucoloto, Miriane Lucindo; Maroco, João; Duarte Bonini Campos, Juliana Alvares
2015-01-01
To evaluate the psychometric properties of the Multidimensional Pain Inventory (MPI) in a Brazilian sample of patients with orofacial pain. A total of 1,925 adult patients, who sought dental care in the School of Dentistry of São Paulo State University's Araraquara campus, were invited to participate; 62.5% (n=1,203) agreed to participate. Of these, 436 presented with orofacial pain and were included. The mean age was 39.9 (SD=13.6) years and 74.5% were female. Confirmatory factor analysis was conducted using χ²/df, the comparative fit index, the goodness of fit index, and the root mean square error of approximation as indices of goodness of fit. Convergent validity was estimated by the average variance extracted and composite reliability, and internal consistency by Cronbach's alpha standardized coefficient (α). The stability of the models was tested in independent samples (test and validation; dental pain and orofacial pain). Factorial invariance was estimated by multigroup analysis (Δχ²). Factorial validity, convergent validity, and internal consistency were adequate in all three parts of the MPI. To achieve an adequate fit for Part 1, item 15 had to be deleted (λ=0.13). Discriminant validity was compromised between the factors "activities outside the home" and "social activities" of Part 3 of the MPI in the total sample, the validation sample, and in patients with dental pain and with orofacial pain. Strong invariance between the different subsamples was detected for all three parts of the MPI. The MPI produced valid, reliable, and stable data for pain assessment among Brazilian patients with orofacial pain.
Zhou, Yanli; Faber, Tracy L.; Patel, Zenic; Folks, Russell D.; Cheung, Alice A.; Garcia, Ernest V.; Soman, Prem; Li, Dianfu; Cao, Kejiang; Chen, Ji
2013-01-01
Objective Left ventricular (LV) function and dyssynchrony parameters measured from serial gated single-photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI) using blinded processing had a poorer repeatability than when manual side-by-side processing was used. The objective of this study was to validate whether an automatic alignment tool can reduce the variability of LV function and dyssynchrony parameters in serial gated SPECT MPI. Methods Thirty patients who had undergone serial gated SPECT MPI were prospectively enrolled in this study. Thirty minutes after the first acquisition, each patient was repositioned and a gated SPECT MPI image was reacquired. The two data sets were first processed blinded from each other by the same technologist in different weeks. These processed data were then realigned by the automatic tool, and manual side-by-side processing was carried out. All processing methods used standard iterative reconstruction and Butterworth filtering. The Emory Cardiac Toolbox was used to measure the LV function and dyssynchrony parameters. Results The automatic tool failed in one patient, who had a large, severe scar in the inferobasal wall. In the remaining 29 patients, the repeatability of the LV function and dyssynchrony parameters after automatic alignment was significantly improved from blinded processing and was comparable to manual side-by-side processing. Conclusion The automatic alignment tool can be an alternative method to manual side-by-side processing to improve the repeatability of LV function and dyssynchrony measurements by serial gated SPECT MPI. PMID:23211996
Furuhashi, Tatsuhiko; Moroi, Masao; Joki, Nobuhiko; Hase, Hiroki; Masai, Hirofumi; Kunimasa, Taeko; Fukuda, Hiroshi; Sugi, Kaoru
2013-02-01
Pretest probability of coronary artery disease (CAD) facilitates diagnosis and risk stratification of CAD. Stress myocardial perfusion imaging (MPI) and chronic kidney disease (CKD) are established major predictors of cardiovascular events. However, the role of CKD in assessing pretest probability of CAD has been unclear. This study evaluated whether CKD adds predictive value for cardiovascular events, taking pretest probability into account, in patients who underwent stress MPI. Patients with no history of CAD underwent stress MPI (n = 310; male = 166; age = 70; CKD = 111; low/intermediate/high pretest probability = 17/194/99) and were followed for 24 months. Cardiovascular events included cardiac death and nonfatal acute coronary syndrome. Cardiovascular events occurred in 15 of the 310 patients (4.8%), but in none of those with low pretest probability, a group that included 2 CKD patients. In patients with intermediate to high pretest probability (n = 293), multivariate Cox regression analysis identified only CKD [hazard ratio (HR) = 4.88; P = 0.022] and the summed stress score of stress MPI (HR = 1.50; P < 0.001) as independent and significant predictors of cardiovascular events. In patients with intermediate to high pretest probability, CKD and stress MPI are thus independent predictors of cardiovascular events after consideration of the pretest probability of CAD in patients with no history of CAD. In assessing pretest probability of CAD, CKD might be an important factor for assessing future cardiovascular prognosis.
Chen, Chun; Miao, Changqing; Feng, Jianlin; Zhou, Yanli; Cao, Kejiang; Lloyd, Michael S.; Chen, Ji
2013-01-01
Purpose The purpose of this study was to evaluate left ventricular (LV) mechanical dyssynchrony in patients with Wolff-Parkinson-White (WPW) syndrome pre- and post-radiofrequency catheter ablation (RFA) using phase analysis of gated single photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI). Methods Forty-five WPW patients were enrolled and had gated SPECT MPI pre- and 2–3 days post-RFA. Electrophysiological study (EPS) was used to locate accessory pathways (APs) and categorize the patients according to the AP locations (septal, left and right free wall). Electrocardiography (ECG) was performed pre- and post-RFA to confirm successful elimination of the APs. Phase analysis of gated SPECT MPI was used to assess LV dyssynchrony pre- and post-RFA. Results Among the 45 patients, 3 had gating errors, and thus 42 had SPECT phase analysis. Twenty-two patients (52.4 %) had baseline LV dyssynchrony. Baseline LV dyssynchrony was more prominent in the patients with septal APs than in the patients with left or right APs (p<0.05). RFA improved LV synchrony in the entire cohort and in the patients with septal APs (p<0.01). Conclusion Phase analysis of gated SPECT MPI demonstrated that LV mechanical dyssynchrony can be present in patients with WPW syndrome. Septal APs result in the greatest degree of LV mechanical dyssynchrony and afford the most benefit after RFA. This study supports further investigation in the relationship between electrical and mechanical activation using EPS and phase analysis of gated SPECT MPI. PMID:22532253
2012-03-01
on the standard Navy Handgun Qualification Course. Results partially supported the hypotheses. The simulation group showed greater improvement in MPI than the...
NASA Technical Reports Server (NTRS)
Hockney, George; Lee, Seungwon
2008-01-01
A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI), an unofficial de-facto standard language-independent application programming interface for message passing on a parallel computer, while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without needing to rewrite those programs.
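The embarrassingly parallel dispatch pattern described above can be sketched as follows. This is an illustrative sketch, not PyPele's actual API: the names are hypothetical, and a real implementation would call an mpi4py-style `comm.allgather` on a genuine communicator instead of the single-process stand-in used here so the logic can run without an MPI launcher.

```python
class FakeComm:
    """Single-process stand-in for an MPI communicator (hypothetical
    helper), letting the dispatch logic run without mpirun."""
    def Get_size(self):
        return 1
    def Get_rank(self):
        return 0

def run_tasks(comm, tasks, work):
    """Embarrassingly parallel dispatch: rank r computes task i whenever
    i % size == r, then all partial result maps are combined. With a real
    communicator the combine step would be comm.allgather(local)."""
    size, rank = comm.Get_size(), comm.Get_rank()
    local = {i: work(t) for i, t in enumerate(tasks) if i % size == rank}
    gathered = [local]            # single-process fallback for allgather
    merged = {}
    for part in gathered:
        merged.update(part)
    return [merged[i] for i in range(len(tasks))]
```

Because the tasks are independent, preserving the user interface while swapping SSH for MPI only changes how `local` results travel between processes, not how applications are written.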
Meinel, Felix G; Schoepf, U Joseph; Townsend, Jacob C; Flowers, Brian A; Geyer, Lucas L; Ebersberger, Ullrich; Krazinski, Aleksander W; Kunz, Wolfgang G; Thierfelder, Kolja M; Baker, Deborah W; Khan, Ashan M; Fernandes, Valerian L; O'Brien, Terrence X
2018-06-15
We aimed to determine the diagnostic yield and accuracy of coronary CT angiography (CCTA) in patients referred for invasive coronary angiography (ICA) based on clinical concern for coronary artery disease (CAD) and an abnormal nuclear stress myocardial perfusion imaging (MPI) study. We enrolled 100 patients (84 male, mean age 59.6 ± 8.9 years) with an abnormal MPI study and subsequent referral for ICA. Each patient underwent CCTA prior to ICA. We analyzed the prevalence of potentially obstructive CAD (≥50% stenosis) on CCTA and calculated the diagnostic accuracy of ≥50% stenosis on CCTA for the detection of clinically significant CAD on ICA (defined as any ≥70% stenosis or ≥50% left main stenosis). On CCTA, 54 patients had at least one ≥50% stenosis. With ICA, 45 patients demonstrated clinically significant CAD. A positive CCTA had 100% sensitivity and 84% specificity with a 100% negative predictive value and 83% positive predictive value for clinically significant CAD on a per patient basis in MPI positive symptomatic patients. In conclusion, almost half (48%) of patients with suspected CAD and an abnormal MPI study demonstrate no obstructive CAD on CCTA.
Message Passing vs. Shared Address Space on a Cluster of SMPs
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak
2000-01-01
The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has made them an attractive platform for high-end scientific computing. Currently, message passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message passing has been standardized with MPI, and is the most common and mature programming approach. However, message-passing code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and programming effort required for, six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared-memory programming, due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications; however, on certain classes of problems, SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
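One classic algorithm behind MPI collectives such as MPI_Allreduce is recursive doubling. The sketch below simulates the communication pattern in plain Python over a list of per-rank values; it is an illustration only, not MPI code, and not the new cluster-specific algorithms the paper proposes. A real implementation exchanges messages between processes in each round.

```python
def allreduce_recursive_doubling(values, op=lambda a, b: a + b):
    """Simulate recursive-doubling allreduce: values[r] is rank r's input;
    in round k every rank combines with partner r XOR 2**k. Power-of-two
    rank counts only, for brevity."""
    p = len(values)
    assert p & (p - 1) == 0, "power-of-two number of ranks assumed"
    vals = list(values)
    k = 1
    while k < p:
        vals = [op(vals[r], vals[r ^ k]) for r in range(p)]
        k <<= 1
    return vals
```

Each of the log2(P) rounds pairs rank r with rank r XOR 2^k, so every rank holds the full reduction after logarithmically many exchange steps; tuning how these rounds map onto intra-node versus inter-node links is exactly where SMP-cluster-specific optimizations come in.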
Taupitz, Matthias; Ariza de Schellenberger, Angela; Kosch, Olaf; Eberbeck, Dietmar; Wagner, Susanne; Trahms, Lutz; Hamm, Bernd; Schnorr, Jörg
2018-01-01
Synthesis of novel magnetic multicore particles (MCP) in the nano range involves alkaline precipitation of iron(II) chloride in the presence of atmospheric oxygen. This step yields green rust, which is oxidized to obtain magnetic nanoparticles that probably consist of a magnetite/maghemite mixed phase. Final growth and annealing at 90°C in the presence of a large excess of carboxymethyl dextran gives the MCP very promising magnetic properties for magnetic particle imaging (MPI), an emerging medical imaging modality, and for magnetic resonance imaging (MRI). The magnetic nanoparticles are biocompatible and thus potential candidates for future biomedical applications such as cardiovascular imaging, sentinel lymph node mapping in cancer patients, and stem cell tracking. The new MCP that we introduce here have three times higher magnetic particle spectroscopy (MPS) performance at lower and middle harmonics and five times higher MPS signal strength at higher harmonics compared with Resovist®. In addition, the new MCP also have improved in vivo MPI performance compared to Resovist®, and here we report the first in vivo MPI investigation of this new generation of magnetic nanoparticles. PMID:29300729
A communication library for the parallelization of air quality models on structured grids
NASA Astrophysics Data System (ADS)
Miehe, Philipp; Sandu, Adrian; Carmichael, Gregory R.; Tang, Youhua; Dăescu, Dacian
PAQMSG is an MPI-based, Fortran 90 communication library for the parallelization of air quality models (AQMs) on structured grids. It consists of distribution, gathering and repartitioning routines for different domain decompositions implementing a master-worker strategy. The library is architecture and application independent and includes optimization strategies for different architectures. This paper presents the library from a user perspective. Results are shown from the parallelization of STEM-III on Beowulf clusters. The PAQMSG library is available on the web. The communication routines are easy to use, and should allow for an immediate parallelization of existing AQMs. PAQMSG can also be used for constructing new models.
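The distribution half of such a library starts from a partition of the structured grid among workers. A minimal sketch of one such decomposition (an illustration only; PAQMSG's actual routines and signatures are not described here) splits n grid columns into contiguous, near-equal blocks:

```python
def block_ranges(n, nproc):
    """Split n grid columns into nproc contiguous half-open ranges whose
    sizes differ by at most one -- a typical master-worker distribution
    for structured grids."""
    base, extra = divmod(n, nproc)
    ranges, start = [], 0
    for p in range(nproc):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

The master would scatter the columns of each range to the corresponding worker and gather them back after the transport or chemistry step; repartitioning between decompositions amounts to computing a second set of ranges and exchanging the difference.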
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moryakov, A. V., E-mail: sailor@orc.ru
2016-12-15
An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computation on supercomputers employing the MPI (Message Passing Interface) standard for data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is simple and can also treat nonlinear problems by correcting the operator according to the solution obtained in the previous iteration.
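The polynomial-series idea can be illustrated on the scalar test problem y' = a·y, y(0) = y0: Picard iteration builds the truncated series solution coefficient by coefficient. For brevity this sketch uses the plain monomial basis rather than the orthogonal basis on [0, 1] that EDELWEISS employs, which is harmless at low order:

```python
def picard_series(a, y0, terms=15):
    """Coefficients c[0..terms-1] of the truncated series solution of
    y' = a*y, y(0) = y0, built by Picard iteration:
    y_{k+1}(t) = y0 + integral_0^t a*y_k(s) ds."""
    c = [y0] + [0.0] * (terms - 1)
    for _ in range(terms):
        # integrate term by term: t^k -> t^(k+1)/(k+1), then add y0
        c = [y0] + [a * c[k] / (k + 1) for k in range(terms - 1)]
    return c

def eval_series(c, t):
    """Evaluate the polynomial with coefficients c at time t."""
    return sum(ck * t ** k for k, ck in enumerate(c))
```

After enough iterations the coefficients converge to y0·a^k/k!, the Taylor series of the exact solution; an orthogonal basis serves the same role with much better conditioning at high order.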
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers
NASA Technical Reports Server (NTRS)
Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.
2014-01-01
This paper explores the implementation of MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double-cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing
NASA Astrophysics Data System (ADS)
Decyk, V. K.; Dauger, D. E.
We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
NASA Technical Reports Server (NTRS)
Jones, Terry; Mark, Richard; Martin, Jeanne; May, John; Pierce, Elsie; Stanberry, Linda
1996-01-01
This paper describes an implementation of the proposed MPI-IO (Message Passing Interface - Input/Output) standard for parallel I/O. Our system uses third-party transfer to move data over an external network between the processors where it is used and the I/O devices where it resides. Data travels directly from source to destination, without the need for shuffling it among processors or funneling it through a central node. Our distributed server model lets multiple compute nodes share the burden of coordinating data transfers. The system is built on the High Performance Storage System (HPSS), and a prototype version runs on a Meiko CS-2 parallel computer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perez, R. Navarro; Schunck, N.; Lasseri, R. -D.
Here, we describe the new version 3.00 of the code hfbtho that solves the nuclear Hartree–Fock (HF) or Hartree–Fock–Bogolyubov (HFB) problem by using the cylindrical transformed deformed harmonic oscillator basis. In the new version, we have implemented the following features: (i) the full Gogny force in both particle–hole and particle–particle channels, (ii) the calculation of the nuclear collective inertia in the perturbative cranking approximation, (iii) the calculation of fission fragment charge, mass and deformations based on the determination of the neck, (iv) the regularization of zero-range pairing forces, (v) the calculation of localization functions, and (vi) an MPI interface for large-scale mass table calculations.
NASA Astrophysics Data System (ADS)
Petibon, Yoann; Guehl, Nicolas J.; Reese, Timothy G.; Ebrahimi, Behzad; Normandin, Marc D.; Shoup, Timothy M.; Alpert, Nathaniel M.; El Fakhri, Georges; Ouyang, Jinsong
2017-01-01
PET is an established modality for myocardial perfusion imaging (MPI) which enables quantification of absolute myocardial blood flow (MBF) using dynamic imaging and kinetic modeling. However, heart motion and partial volume effects (PVE) significantly limit the spatial resolution and quantitative accuracy of PET MPI. Simultaneous PET-MR offers a solution to the motion problem in PET by enabling MR-based motion correction of PET data. The aim of this study was to develop a motion and PVE correction methodology for PET MPI using simultaneous PET-MR, and to assess its impact on both static and dynamic PET MPI using 18F-Flurpiridaz, a novel 18F-labeled perfusion tracer. Two dynamic 18F-Flurpiridaz MPI scans were performed on healthy pigs using a PET-MR scanner. Cardiac motion was tracked using a dedicated tagged-MRI (tMR) sequence. Motion fields were estimated using non-rigid registration of tMR images and used to calculate motion-dependent attenuation maps. Motion correction of PET data was achieved by incorporating tMR-based motion fields and motion-dependent attenuation coefficients into image reconstruction. Dynamic and static PET datasets were created for each scan. Each dataset was reconstructed as (i) Ungated, (ii) Gated (end-diastolic phase), and (iii) Motion-Corrected (MoCo), each without and with point spread function (PSF) modeling for PVE correction. Myocardium-to-blood concentration ratios (MBR) and apparent wall thickness were calculated to assess image quality for static MPI. For dynamic MPI, segment- and voxel-wise MBF values were estimated by non-linear fitting of a 2-tissue compartment model to tissue time-activity-curves. MoCo and Gating respectively decreased mean apparent wall thickness by 15.1% and 14.4% and increased MBR by 20.3% and 13.6% compared to Ungated images (P < 0.01). 
Combined motion and PSF correction (MoCo-PSF) yielded 30.9% (15.7%) lower wall thickness and 82.2% (20.5%) higher MBR compared to Ungated data reconstructed without (with) PSF modeling (P < 0.01). For dynamic PET, mean MBF across all segments were comparable for MoCo (0.72 ± 0.21 ml/min/ml) and Gating (0.69 ± 0.18 ml/min/ml). Ungated data yielded significantly lower mean MBF (0.59 ± 0.16 ml/min/ml). Mean MBF for MoCo-PSF was 0.80 ± 0.22 ml/min/ml, which was 37.9% (25.0%) higher than that obtained from Ungated data without (with) PSF correction (P < 0.01). The developed methodology holds promise to improve the image quality and sensitivity of PET MPI studies performed using PET-MR.
ERIC Educational Resources Information Center
Bernstein, Hamutal; Martin, Carlos; Eyster, Lauren; Anderson, Theresa; Owen, Stephanie; Martin-Caughey, Amanda
2015-01-01
The Urban Institute conducted an implementation and participant-outcomes evaluation of the Alaska Native Science & Engineering Program (ANSEP). ANSEP is a multi-stage initiative designed to prepare and support Alaska Native students from middle school through graduate school to succeed in science, technology, engineering, and math (STEM)…
Myocardial perfusion imaging in patients with a recent, normal exercise test.
Bovin, Ann; Klausen, Ib C; Petersen, Lars J
2013-03-26
To investigate the added value of myocardial perfusion scintigraphy imaging (MPI) in consecutive patients with suspected coronary artery disease (CAD) and a recent, normal exercise electrocardiography (ECG). This study was a retrospective analysis of consecutive patients referred for MPI during a 2-year period (2006-2007) at one clinic. All eligible patients were suspected of suffering from CAD and had performed a satisfactory bicycle exercise test (i.e., peak heart rate > 85% of the expected, age-predicted maximum) within 6 mo of referral; their exercise ECG had no signs of ischemia, there was no exercise-limiting angina, and no cardiac events occurred between the exercise test and referral. The patients subsequently underwent a standard 2-day, stress-rest exercise MPI. Ischemia was defined based on visual scoring supported by quantitative segmental analysis (i.e., summed stress score > 3). The results of cardiac catheterization were analyzed, and clinical follow-up was performed by review of electronic medical files. A total of 56 patients fulfilled the eligibility criteria. Most patients had a low or intermediate ATPIII pre-test risk of CAD (6 patients had a high pre-test risk). The referral exercise test showed a mean Duke score of 5 (range: 2 to 11), which translated to a low post-exercise risk in 66% and an intermediate risk in 34%. A total of seven patients were reported with ischemia by MPI. Three of these patients had high ATPIII pre-test risk scores. Six of these seven patients underwent cardiac catheterization, which showed significant stenosis in one patient with a high pre-test risk of CAD, and indeterminate lesions in three patients (two of whom had high pre-test risk scores). With MPI as a gatekeeper for catheterization, no significant epicardial stenosis was observed in any of the 50 patients (0%, 95% confidence interval 0.0 to 7.1) with a low to intermediate pre-test risk of CAD and a negative exercise test.
No cardiac events occurred in any patient within a median follow-up period of > 1200 d. The added diagnostic value of MPI in patients with a low or intermediate risk of CAD and a recent, normal exercise test is marginal.
ERIC Educational Resources Information Center
Beaulieu, David
2008-01-01
This article traces the history of policy development in Native American education from the second term of President William J. Clinton and his signing of Executive Order 13096 of August 6, 1998 on American Indian/Alaska Native education, through the passage and implementation of the No Child Left Behind (NCLB) Act and initial consideration of its…
Accelerating atomistic calculations of quantum energy eigenstates on graphic cards
NASA Astrophysics Data System (ADS)
Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.
2014-10-01
Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and minimizes host-to-device data transfers. Furthermore, parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
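The Lanczos recurrence itself is short. The sketch below is a plain-Python, CPU-only illustration of the basic method; the paper's CUDA kernels, multi-GPU MPI distribution and restart logic are omitted, and a small Sturm-sequence bisection stands in for a proper tridiagonal eigensolver.

```python
import math
import random

def lanczos_tridiag(matvec, n, m, seed=1):
    """m-step Lanczos recurrence: returns the diagonal (alpha) and
    off-diagonal (beta) of the tridiagonal matrix T approximating A."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    nrm = math.sqrt(sum(x * x for x in v))
    v = [x / nrm for x in v]
    v_prev = [0.0] * n
    alpha, beta = [], []
    b = 0.0
    for _ in range(m):
        w = matvec(v)
        a = sum(x * y for x, y in zip(w, v))
        alpha.append(a)
        w = [wi - a * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        b = math.sqrt(sum(x * x for x in w))
        if b < 1e-12:          # Krylov space exhausted
            break
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta

def max_eig_tridiag(alpha, beta, tol=1e-10):
    """Largest eigenvalue of symmetric tridiagonal T by Sturm-sequence
    bisection: n_below(x) counts eigenvalues below x via the signs of
    the leading-principal-minor recurrence."""
    def n_below(x):
        cnt, d = 0, 1.0
        for i, a in enumerate(alpha):
            b2 = beta[i - 1] ** 2 if i > 0 else 0.0
            d = a - x - (b2 / d if d != 0.0 else b2 / 1e-300)
            if d < 0.0:
                cnt += 1
        return cnt
    r = max(abs(a) for a in alpha) + 2.0 * max(beta, default=0.0)
    lo, hi = -r, r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_below(mid) == len(alpha):   # all eigenvalues below mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

The extreme Ritz values of the small matrix T converge quickly to the extreme eigenvalues of the full matrix, which is why only the matrix-vector products need to run on the GPU in a large tight-binding calculation.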
A parallel time integrator for noisy nonlinear oscillatory systems
NASA Astrophysics Data System (ADS)
Subber, Waad; Sarkar, Abhijit
2018-06-01
In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy nonlinear dynamical systems. Specifically, we formulate a parallel algorithm to generate the sample paths of nonlinear oscillators defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of the Wiener process in SDEs causes difficulties in the direct application of numerical integration techniques for ODEs, including the parareal algorithm. The parallel implementation of the algorithm involves two SDE solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits the Message Passing Interface (MPI).
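The deterministic skeleton of the parareal method is easy to state. The sketch below applies it to the linear test problem y' = λy; the paper's Wiener-process handling and Duffing dynamics are omitted, the coarse and fine propagators are both explicit Euler differing only in step size, and in a distributed run the fine sweeps would be farmed out one time slice per MPI rank.

```python
def parareal(lmbda, y0, t_end, n_coarse, n_fine, iters):
    """Parareal for the test problem y' = lmbda*y on [0, t_end]:
    coarse prediction + parallelizable fine sweeps + serial correction."""
    dt = t_end / n_coarse

    def coarse(y):                     # one explicit-Euler step per slice
        return y * (1.0 + lmbda * dt)

    def fine(y):                       # n_fine Euler sub-steps per slice
        h = dt / n_fine
        for _ in range(n_fine):
            y = y * (1.0 + lmbda * h)
        return y

    U = [y0]                           # coarse prediction of slice boundaries
    for _ in range(n_coarse):
        U.append(coarse(U[-1]))
    for _ in range(iters):
        F = [fine(U[k]) for k in range(n_coarse)]      # parallel in practice
        G_old = [coarse(U[k]) for k in range(n_coarse)]
        new = [y0]
        for k in range(n_coarse):
            # parareal update: G(new) + F(old) - G(old)
            new.append(coarse(new[-1]) + F[k] - G_old[k])
        U = new
    return U[-1]
```

After k corrections the first k slices exactly match the serial fine solution, so n_coarse iterations reproduce it completely; in practice far fewer iterations suffice, which is where the parallel speedup comes from.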
Unstructured Adaptive Grid Computations on an Array of SMPs
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Pramanick, Ira; Sohn, Andrew; Simon, Horst D.
1996-01-01
Dynamic load balancing is necessary for parallel adaptive methods to solve unsteady CFD problems on unstructured grids. We have presented such a dynamic load-balancing framework, called JOVE, in this paper. Results on a four-POWERnode POWER CHALLENGEarray demonstrated that load balancing gives significant performance improvements over no load balancing for such adaptive computations. The parallel speedup of JOVE, implemented using MPI on the POWER CHALLENGEarray, was significant, being as high as 31 for 32 processors. An implementation of JOVE that exploits an 'array of SMPs' architecture was also studied; this hybrid JOVE outperformed flat JOVE by up to 28% on the meshes and adaption models tested. With large, realistic meshes and actual flow-solver and adaption phases incorporated into JOVE, hybrid JOVE can be expected to yield significant advantages over flat JOVE, especially as the number of processors is increased, thus demonstrating the scalability of the array-of-SMPs architecture.
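A minimal flavor of the rebalancing step is the classic longest-processing-time (LPT) greedy heuristic: after adaption changes per-cell costs, hand the heaviest work items to the currently least-loaded processor. This is a generic stand-in for illustration, not JOVE's actual algorithm, which is not described in the abstract.

```python
import heapq

def lpt_assign(weights, nproc):
    """Longest-processing-time greedy: each item, heaviest first, goes to
    the currently least-loaded processor. Returns a (load, proc_id, items)
    tuple per processor, ordered by processor id."""
    heap = [(0, p, []) for p in range(nproc)]
    heapq.heapify(heap)
    for w in sorted(weights, reverse=True):
        load, p, items = heapq.heappop(heap)   # least-loaded processor
        items.append(w)
        heapq.heappush(heap, (load + w, p, items))
    return sorted(heap, key=lambda t: t[1])
```

LPT is guaranteed to stay within 4/3 of the optimal maximum load, usually adequate to keep processors busy between adaption steps; a production framework additionally weighs data-migration cost against the imbalance removed.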
ImgLib2--generic image processing in Java.
Pietzsch, Tobias; Preibisch, Stephan; Tomancák, Pavel; Saalfeld, Stephan
2012-11-15
ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing. It aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory. Algorithms can be implemented for classes of pixel types and generic access patterns by which they become independent of the specific dimensionality, pixel type and data representation. ImgLib2 illustrates that an elegant high-level programming interface can be achieved without sacrificing performance. It provides efficient implementations of common data types, storage layouts and algorithms. It is the data model underlying ImageJ2, the KNIME Image Processing toolbox and an increasing number of Fiji-Plugins. ImgLib2 is licensed under BSD. Documentation and source code are available at http://imglib2.net and in a public repository at https://github.com/imagej/imglib. Supplementary data are available at Bioinformatics Online. saalfeld@mpi-cbg.de
A Comparison of Three Programming Models for Adaptive Applications
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak; Kwak, Dochan (Technical Monitor)
2000-01-01
We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on state-of-the-art multiprocessor machines. The basic parallel algorithms needed for the different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program-orchestration level, which often leads to performance gains. However, it may also suffer from the poor spatial locality of physically distributed shared data on large numbers of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates a more balanced result for our application.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hepburn, I.; De Schutter, E., E-mail: erik@oist.jp; Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space in either a discrete-time or discrete-space framework, which in recent years has led to the development of parallel methods that can take advantage of the power of modern supercomputers. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator-splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations, from simple diffusion models to realistic biological models, and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
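The operator-splitting idea itself can be shown on a deterministic toy problem: advance diffusion and reaction as separate sub-steps within each time step. This sketch uses a 1-D periodic grid with explicit finite differences and linear decay; it illustrates only the splitting structure, not the stochastic simulation operators on irregular meshes that the paper actually parallelizes.

```python
import math

def split_step(u, dt, d, rate, dx):
    """One Lie (sequential) operator-splitting step for u_t = d*u_xx - rate*u
    on a 1-D periodic grid: explicit diffusion sub-step, then exact decay."""
    n = len(u)
    lap = [u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n] for i in range(n)]
    u = [ui + dt * d * li / dx ** 2 for ui, li in zip(u, lap)]
    decay = math.exp(-rate * dt)       # reaction sub-step integrated exactly
    return [ui * decay for ui in u]
```

Because the diffusion sub-step conserves total mass and the decay sub-step scales it by exp(-rate*dt), the splitting is easy to validate in this toy setting; the partitioning question the paper raises is about how the diffusion operator's mesh is distributed across MPI ranks.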
NASA Astrophysics Data System (ADS)
Bonavita, M.; Torrisi, L.
2005-03-01
A new data assimilation system has been designed and implemented at the National Center for Aeronautic Meteorology and Climatology of the Italian Air Force (CNMCA) in order to improve its operational numerical weather prediction capabilities and provide more accurate guidance to operational forecasters. The system, which is undergoing testing before operational use, is based on an “observation space” version of the 3D-VAR method for the objective analysis component, and on the High Resolution Regional Model (HRM) of the Deutscher Wetterdienst (DWD) for the prognostic component. Notable features of the system include a completely parallel (MPI+OMP) implementation of the solution of analysis equations by a preconditioned conjugate gradient descent method; correlation functions in spherical geometry with thermal wind constraint between mass and wind field; derivation of the objective analysis parameters from a statistical analysis of the innovation increments.
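The solver mentioned above is a standard building block. Below is a generic textbook preconditioned conjugate gradient for a symmetric positive-definite system Ax = b; this is an illustration only, not CNMCA's parallel MPI+OMP implementation, and `precond` is assumed to apply the inverse preconditioner to a vector.

```python
def pcg(A, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for SPD A (dense row lists);
    'precond(r)' returns M^-1 r for the chosen preconditioner M."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                        # residual for the x = 0 start
    z = precond(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(a * pi for a, pi in zip(row, p)) for row in A]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x
```

In an observation-space 3D-VAR the system matrix is typically of the form HBH^T + R, and a good preconditioner cuts the number of iterations, which dominates the cost of each analysis.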
Strong scaling of general-purpose molecular dynamics simulations on GPUs
NASA Astrophysics Data System (ADS)
Glaser, Jens; Nguyen, Trung Dac; Anderson, Joshua A.; Lui, Pak; Spiga, Filippo; Millan, Jaime A.; Morse, David C.; Glotzer, Sharon C.
2015-07-01
We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, 2013). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5 ×.
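The autotuning idea, measuring each candidate kernel launch configuration and keeping the fastest, can be sketched generically (hypothetical code, not HOOMD-blue's tuner; in a real tuner the benchmark callback would time an actual GPU kernel launch):

```c
/* Hypothetical autotuner sketch: pick, from candidate launch block
 * sizes, the one with the smallest measured cost. `bench` stands in
 * for timing a kernel launch at that block size. */
int autotune_block_size(const int *candidates, int n,
                        double (*bench)(int block_size)) {
    int best = candidates[0];
    double best_cost = bench(candidates[0]);
    for (int i = 1; i < n; i++) {
        double cost = bench(candidates[i]);
        if (cost < best_cost) { best_cost = cost; best = candidates[i]; }
    }
    return best;
}

/* toy cost model for demonstration only: fastest at block size 192 */
double demo_cost(int b) { return b > 192 ? b - 192 : 192 - b; }
```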
Li, J; Guo, L-X; Zeng, H; Han, X-B
2009-06-01
A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different numbers of processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of the sea surface for different polarizations and the bistatic scattering from a sea surface at a large incident angle and a large wind speed.
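The kernel that each MPI process applies to its slice of the grid between boundary exchanges is the Yee leapfrog update; in one dimension it can be sketched in normalized units (an assumed free-space form with Courant number 1, not the paper's UPML code):

```c
/* Minimal 1-D FDTD sketch in normalized units (free space, Courant
 * number 1). In the parallel algorithm, each rank runs this update
 * on its slice of the grid and then exchanges the boundary cells
 * with its neighbors. */
void fdtd_step(double *ez, double *hy, int nx) {
    for (int i = 0; i < nx - 1; i++)
        hy[i] += ez[i + 1] - ez[i];   /* H update (scaled units) */
    for (int i = 1; i < nx; i++)
        ez[i] += hy[i] - hy[i - 1];   /* E update */
}
```

With this "magic time step" an impulse injected at one cell spreads exactly one cell per step, a convenient sanity check on the serial kernel before parallelizing it.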
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch-switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve with a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation, using Open Multi-Processing (OpenMP) on a shared-memory platform and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
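The shared-memory variant amounts to parallelizing the integration loop over the machines' state variables; a hypothetical forward-Euler step for uncoupled swing-type equations can be sketched with OpenMP (the paper's solver and network model are far more elaborate, and all names here are illustrative):

```c
/* Hypothetical sketch: one forward-Euler step of n uncoupled
 * generator swing oscillators, parallelized with OpenMP.
 * delta = rotor angle, omega = speed deviation, p = net accelerating
 * power (illustrative names, not the paper's formulation).
 * Compile with -fopenmp; without it the pragma is ignored and the
 * loop runs serially with the same result. */
void swing_euler_step(int n, double *delta, double *omega,
                      const double *p, double damping, double dt) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double domega = p[i] - damping * omega[i];
        delta[i] += dt * omega[i];   /* uses omega before the update */
        omega[i] += dt * domega;
    }
}
```

An MPI version of the same step would instead distribute index ranges of i across ranks and exchange coupling terms explicitly, which is the architectural difference the paper compares.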
NASA Astrophysics Data System (ADS)
Kracher, Daniela; Manzini, Elisa; Reick, Christian H.; Schultz, Martin; Stein, Olaf
2014-05-01
Climate change is driven by an increasing release of anthropogenic greenhouse gases (GHGs) such as carbon dioxide and nitrous oxide (N2O). Besides fossil fuel burning, land use change and land management are also anthropogenic sources of GHGs. Especially inputs of reactive nitrogen via fertilizer and deposition lead to enhanced emissions of N2O. One effect of a drastic future increase in surface temperature is a modification of atmospheric circulation, e.g. an accelerated Brewer-Dobson circulation affecting the exchange between troposphere and stratosphere. N2O is inert in the troposphere and is destroyed only in the stratosphere. Thus, changes in atmospheric circulation, especially changes in the exchange between troposphere and stratosphere, will affect the atmospheric transport, decay, and distribution of N2O. In our study we assess the impact of global warming on atmospheric circulation and the implied effects on the distribution and lifetime of atmospheric N2O. As terrestrial N2O emissions are largely determined by inputs of reactive nitrogen, whose location is determined by human choice, we examine in particular the importance of latitudinal source regions of N2O for its global distribution. For this purpose we apply the Max Planck Institute Earth System Model, MPI-ESM. MPI-ESM consists of the atmospheric general circulation model ECHAM, the land surface model JSBACH, and MPIOM/HAMOCC representing ocean circulation and ocean biogeochemistry. Prognostic atmospheric N2O concentrations in MPI-ESM are determined by land N2O emissions, ocean N2O exchange and atmospheric tracer transport. As stratospheric chemistry is not explicitly represented in MPI-ESM, stratospheric decay rates of N2O are prescribed from a MACC MOZART simulation.
Response properties of the refractory auditory nerve fiber.
Miller, C A; Abbas, P J; Robinson, B K
2001-09-01
The refractory characteristics of auditory nerve fibers limit their ability to accurately encode temporal information. Therefore, they are relevant to the design of cochlear prostheses. It is also possible that the refractory property could be exploited by prosthetic devices to improve information transfer, as refractoriness may enhance the nerve's stochastic properties. Furthermore, refractory data are needed for the development of accurate computational models of auditory nerve fibers. We applied a two-pulse forward-masking paradigm to a feline model of the human auditory nerve to assess refractory properties of single fibers. Each fiber was driven to refractoriness by a single (masker) current pulse delivered intracochlearly. Properties of firing efficiency, latency, jitter, spike amplitude, and relative spread (a measure of dynamic range and stochasticity) were examined by exciting fibers with a second (probe) pulse and systematically varying the masker-probe interval (MPI). Responses to monophasic cathodic current pulses were analyzed. We estimated the mean absolute refractory period to be about 330 μs and the mean recovery time constant to be about 410 μs. A significant proportion of fibers (13 of 34) responded to the probe pulse with MPIs as short as 500 μs. Spike amplitude decreased with decreasing MPI, a finding relevant to the development of computational nerve-fiber models, interpretation of gross evoked potentials, and models of more central neural processing. A small mean decrement in spike jitter was noted at small MPI values. Some trends (such as spike latency vs MPI) varied across fibers, suggesting that sites of excitation varied across fibers. Relative spread was found to increase with decreasing MPI values, providing direct evidence that stochastic properties of fibers are altered under conditions of refractoriness.
Effect of balloon mitral valvotomy on left ventricular function in rheumatic mitral stenosis.
Rajesh, Gopalan Nair; Sreekumar, Pradeep; Haridasan, Vellani; Sajeev, C G; Bastian, Cicy; Vinayakumar, D; Kadermuneer, P; Mathew, Dolly; George, Biju; Krishnan, M N
Mitral stenosis (MS) is found to produce left ventricular (LV) dysfunction in some studies. We sought to study the left ventricular function in patients with rheumatic MS undergoing balloon mitral valvotomy (BMV). Ours is the first study to analyze the effect of BMV on mitral annular plane systolic excursion (MAPSE), and to quantify the prevalence of longitudinal left ventricular dysfunction in rheumatic MS. In this prospective cohort study, we included 43 patients with severe rheumatic mitral stenosis undergoing BMV. They were compared to twenty controls whose distribution of age and gender was similar to that of the patients. The parameters compared were LV ejection fraction (EF) by modified Simpson's method, mitral annular systolic velocity (MASV), MAPSE, mitral annular early diastolic velocity (E'), and myocardial performance index (MPI). These parameters were reassessed immediately following BMV and 3 months after the procedure. MASV, MAPSE, E', and EF were significantly lower and MPI was higher in the mitral stenosis group compared to controls. Impaired longitudinal LV function was present in 77% of the study group. MAPSE and EF did not show significant change after BMV while MPI, MASV, and E' improved significantly. MASV and E' showed improvement immediately after BMV, while MPI decreased only at 3 months follow-up. There were significantly lower mitral annular motion parameters including MAPSE in patients with rheumatic mitral stenosis. Those with atrial fibrillation had higher MPI. Immediately after BMV, there was improvement in LV long axis function with a gradual improvement in global LV function. There was no significant change of MAPSE after BMV. Copyright © 2015 Cardiological Society of India. Published by Elsevier B.V. All rights reserved.
Besli, Feyzullah; Basar, Cengiz; Kecebas, Mesut; Turker, Yasin
2015-03-01
This study evaluated the response to electrical cardioversion (EC) and the effect on the myocardial performance index (MPI) in patients with persistent and long-standing persistent atrial fibrillation (AF). We enrolled 103 patients (mean age 69.6 ± 8.9 years, 40.7% males) with a diagnosis of persistent and long-standing persistent AF. EC was applied to all patients after one g of amiodarone administration. Echocardiographic findings before EC were compared in patients with successful versus unsuccessful cardioversions and in patients with maintained sinus rhythm (SR) versus those with AF recurrence at the end of the first month. We also compared echocardiographic data before EC versus at the end of the first month in the same patients with maintained SR. SR was achieved in 72.8% of patients and was continued at the end of the first month in 69.3% of the patients. The MPI value of all patients was found to be 0.73 ± 0.21. The size of the left atrium was determined to be an independent predictor of the maintenance of SR at 1 month. In subgroup analyses, when we compared echocardiographic findings before EC and at the end of the first month in patients with maintained SR, the MPI (0.66 ± 0.14 vs 0.56 ± 0.09, p < 0.001) values were significantly decreased. Our study is the first to show impairment of the MPI, which is an indicator of systolic and diastolic function, in patients with persistent and long-standing persistent AF and improvement of the MPI after successful EC.
Bouyoucef, Salah E; Mercuri, Mathew; Pascual, Thomas N; Allam, Adel H; Vangu, Mboyo; Vitola, João V; Better, Nathan; Karthikeyan, Ganesan; Mahmarian, John J; Rehani, Madan M; Kashyap, Ravi; Dondi, Maurizio; Paez, Diana; Einstein, Andrew J
While nuclear myocardial perfusion imaging (MPI) offers many benefits to patients with known or suspected cardiovascular disease, concerns exist regarding radiation-associated health effects. Little is known regarding MPI practice in Africa. We sought to characterise radiation doses and the use of MPI best practices that could minimise radiation in African nuclear cardiology laboratories, and compare these to practice worldwide. Demographics and clinical characteristics were collected for a consecutive sample of 348 patients from 12 laboratories in six African countries over a one-week period from March to April 2013. Radiation effective dose (ED) was estimated for each patient. A quality index (QI) enumerating adherence to eight best practices, identified a priori by an IAEA expert panel, was calculated for each laboratory. We compared these metrics with those from 7 563 patients from 296 laboratories outside Africa. Median (interquartile range) patient ED in Africa was similar to that of the rest of the world [9.1 (5.1-15.6) vs 10.3 mSv (6.8-12.6), p = 0.14], although a larger proportion of African patients received a low ED, ≤ 9 mSv targeted in societal recommendations (49.7 vs 38.2%, p < 0.001). Best-practice adherence was higher among African laboratories (QI score: 6.3 ± 1.2 vs 5.4 ± 1.3, p = 0.013). However, median ED varied significantly among African laboratories (range: 2.0-16.3 mSv; p < 0.0001) and QI range was 4-8. Patient radiation dose from MPI in Africa was similar to that in the rest of the world, and adherence to best practices was relatively high in African laboratories. Nevertheless there remain opportunities to further reduce radiation exposure to African patients from MPI.
Doukky, Rami; Hayes, Kathleen; Frogge, Nathan; Nazir, Noreen T; Collado, Fareed M; Williams, Kim A
2015-05-01
The impact of health insurance carrier and socioeconomic status (SES) on adherence to appropriate use criteria (AUC) for radionuclide myocardial perfusion imaging (MPI) is unknown. We hypothesized that the health insurance carrier's prior authorization and the patient's SES impact adherence to AUC for MPI in a fee-for-service setting. We conducted a prospective cohort study of 1511 consecutive patients who underwent outpatient MPI in a multi-site, office-based, fee-for-service setting. The patients were stratified according to the 2009 AUC into appropriate/uncertain appropriateness and inappropriate use groups. Insurance status was categorized as Medicare (does not require prior authorization) vs commercial (requires prior authorization). Socioeconomic status was determined by the median household income in the ZIP code of residence. The proportion of patients with Medicare was 33% vs 67% with commercial insurance. The rate of inappropriate use was higher among patients with commercial insurance vs Medicare (55% vs 24%; P < 0.001); this difference was not significant after adjusting for confounders known to impact AUC determination (odds ratio: 1.06, 95% confidence interval: 0.62-1.82, P = 0.82). The mean annual household income in the residential areas of patients with inappropriate use as compared to those with appropriate/uncertain use was $72 000 ± 21 000 vs $68 000 ± 20 000, respectively (P < 0.001). After adjusting for covariates known to impact AUC determination, SES (top vs bottom quartile income area) was not independently predictive of inappropriate MPI use (odds ratio: 0.9, 95% confidence interval: 0.53-1.52, P = 0.69). Insurance carriers' prior authorization and SES do not seem to play a significant role in determining physicians' adherence to AUC for MPI. © 2015 Wiley Periodicals, Inc.
Zhang, Li; Liu, Zhe; Hu, Ke-You; Tian, Qing-Bao; Wei, Ling-Ge; Zhao, Zhe; Shen, Hong-Rui; Hu, Jing
2015-01-01
Early detection of muscular dystrophy (MD)-associated cardiomyopathy is important because early medical treatment may slow cardiac remodeling and attenuate symptoms of cardiac dysfunction; however, no sensitive and standard diagnostic method for MD at an earlier stage has been well-recognized. Thus, the aim of this study was to test the early diagnostic value of technetium 99m-methoxyisobutylisonitrile ((99)Tc(m)-MIBI) gated myocardial perfusion imaging (G-MPI) for MD. Ninety-one patients underwent (99)Tc(m)-MIBI G-MPI examinations when they were diagnosed with Duchenne muscular dystrophy (DMD) (n=77) or Becker muscular dystrophy (BMD; n=14). (99)Tc(m)-MIBI G-MPI examinations were repeated in 43 DMD patients who received steroid treatments for 2 years as a follow-up examination. Myocardial defects were observed in nearly every segment of the left ventricular wall in both DMD and BMD patients compared with controls, especially in the inferior walls and the apices, by using (99)Tc(m)-MIBI G-MPI. Cardiac wall movement impairment significantly correlated with age in the DMD and BMD groups (rs = 0.534 [P<0.05] and rs = 0.784 [P<0.05], respectively). Intermittent intravenous doses of glucocorticoids and continuation with oral steroid treatments significantly improved myocardial function in DMD patients (P<0.05), but not in BMD patients. (99)Tc(m)-MIBI G-MPI is a sensitive and safe approach for early evaluation of cardiomyopathy in patients with DMD or BMD, and can serve as a candidate method for the evaluation of progression, prognosis, and assessment of the effect of glucocorticoid treatment in these patients.
Bouyoucef, Salah E; Mercuri, Mathew; Einstein, Andrew J; Pascual, Thomas NB; Kashyap, Ravi; Dondi, Maurizio; Paez, Diana; Allam, Adel H; Vangu, Mboyo; Vitola, João V; Better, Nathan; Karthikeyan, Ganesan; Mahmarian, John J; Rehani, Madan M; Einstein, Andrew J
2017-01-01
Objective: While nuclear myocardial perfusion imaging (MPI) offers many benefits to patients with known or suspected cardiovascular disease, concerns exist regarding radiation-associated health effects. Little is known regarding MPI practice in Africa. We sought to characterise radiation doses and the use of MPI best practices that could minimise radiation in African nuclear cardiology laboratories, and compare these to practice worldwide. Methods: Demographics and clinical characteristics were collected for a consecutive sample of 348 patients from 12 laboratories in six African countries over a one-week period from March to April 2013. Radiation effective dose (ED) was estimated for each patient. A quality index (QI) enumerating adherence to eight best practices, identified a priori by an IAEA expert panel, was calculated for each laboratory. We compared these metrics with those from 7 563 patients from 296 laboratories outside Africa. Results: Median (interquartile range) patient ED in Africa was similar to that of the rest of the world [9.1 (5.1–15.6) vs 10.3 mSv (6.8–12.6), p = 0.14], although a larger proportion of African patients received a low ED, ≤ 9 mSv targeted in societal recommendations (49.7 vs 38.2%, p < 0.001). Best-practice adherence was higher among African laboratories (QI score: 6.3 ± 1.2 vs 5.4 ± 1.3, p = 0.013). However, median ED varied significantly among African laboratories (range: 2.0–16.3 mSv; p < 0.0001) and QI range was 4–8. Conclusion: Patient radiation dose from MPI in Africa was similar to that in the rest of the world, and adherence to best practices was relatively high in African laboratories. Nevertheless there remain opportunities to further reduce radiation exposure to African patients from MPI. PMID:28906538
Rastgou, Fereydoon; Shojaeifard, Maryam; Amin, Ahmad; Ghaedian, Tahereh; Firoozabadi, Hasan; Malek, Hadi; Yaghoobi, Nahid; Bitarafan-Rajabi, Ahmad; Haghjoo, Majid; Amouzadeh, Hedieh; Barati, Hossein
2014-12-01
Recently, the phase analysis of gated single-photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI) has become feasible via several software packages for the evaluation of left ventricular mechanical dyssynchrony. We compared two quantitative software packages, quantitative gated SPECT (QGS) and Emory cardiac toolbox (ECTb), with tissue Doppler imaging (TDI) as the conventional method for the evaluation of left ventricular mechanical dyssynchrony. Thirty-one patients with severe heart failure (ejection fraction ≤35%) and regular heart rhythm, who referred for gated-SPECT MPI, were enrolled. TDI was performed within 3 days after MPI. Dyssynchrony parameters derived from gated-SPECT MPI were analyzed by QGS and ECTb and were compared with the Yu index and septal-lateral wall delay measured by TDI. QGS and ECTb showed a good correlation for assessment of phase histogram bandwidth (PHB) and phase standard deviation (PSD) (r = 0.664 and r = 0.731, P < .001, respectively). However, the mean value of PHB and PSD by ECTb was significantly higher than that of QGS. No significant correlation was found between ECTb and QGS and the Yu index. Nevertheless, PHB, PSD, and entropy derived from QGS revealed a significant (r = 0.424, r = 0.478, r = 0.543, respectively; P < .02) correlation with septal-lateral wall delay. Despite a good correlation between QGS and ECTb software packages, different normal cut-off values of PSD and PHB should be defined for each software package. There was only a modest correlation between phase analysis of gated-SPECT MPI and TDI data, especially in the population of heart failure patients with both narrow and wide QRS complex.
NASA Astrophysics Data System (ADS)
Schürmann, Gregor J.; Kaminski, Thomas; Köstler, Christoph; Carvalhais, Nuno; Voßbeck, Michael; Kattge, Jens; Giering, Ralf; Rödenbeck, Christian; Heimann, Martin; Zaehle, Sönke
2016-09-01
We describe the Max Planck Institute Carbon Cycle Data Assimilation System (MPI-CCDAS) built around the tangent-linear version of the JSBACH land-surface scheme, which is part of the MPI-Earth System Model v1. The simulated phenology and net land carbon balance were constrained by globally distributed observations of the fraction of absorbed photosynthetically active radiation (FAPAR, using the TIP-FAPAR product) and atmospheric CO2 at a global set of monitoring stations for the years 2005 to 2009. When constrained by FAPAR observations alone, the system successfully, and computationally efficiently, improved simulated growing-season average FAPAR, as well as its seasonality in the northern extra-tropics. When constrained by atmospheric CO2 observations alone, global net and gross carbon fluxes were improved, despite a tendency of the system to underestimate tropical productivity. Assimilating both data streams jointly allowed the MPI-CCDAS to match both observations (TIP-FAPAR and atmospheric CO2) equally well as the single data stream assimilation cases, thereby increasing the overall appropriateness of the simulated biosphere dynamics and underlying parameter values. Our study thus demonstrates the value of multiple-data-stream assimilation for the simulation of terrestrial biosphere dynamics. It further highlights the potential role of remote sensing data, here the TIP-FAPAR product, in stabilising the strongly underdetermined atmospheric inversion problem posed by atmospheric transport and CO2 observations alone. Notwithstanding these advances, the constraint of the observations on regional gross and net CO2 flux patterns on the MPI-CCDAS is limited through the coarse-scale parametrisation of the biosphere model. We expect improvement through a refined initialisation strategy and inclusion of further biosphere observations as constraints.
Verra, Martin L; Angst, Felix; Brioschi, Roberto; Lehmann, Susanne; Keefe, Francis J; Staal, J Bart; de Bie, Rob A; Aeschlimann, André
2009-01-01
INTRODUCTION: The present study aimed to replicate and validate the empirically derived subgroup classification based on the Multidimensional Pain Inventory (MPI) in a sample of highly disabled fibromyalgia (FM) patients. Second, it examined how the identified subgroups differed in their response to an intensive, interdisciplinary inpatient pain management program. METHODS: Participants were 118 persons with FM who experienced persistent pain and were disabled. Subgroup classification was conducted by cluster analysis using MPI subscale scores at entry to the program. At program entry and discharge, participants completed the MPI, Medical Outcomes Study Short Form-36, Hospital Anxiety and Depression Scale and Coping Strategies Questionnaire. RESULTS: Cluster analysis identified three subgroups in the highly disabled sample that were similar to those described by other studies using less disabled samples of FM. The dysfunctional subgroup (DYS; 36% of the sample) showed the highest level of depression, the interpersonally distressed subgroup (ID; 24%) showed a modest level of depression and the adaptive copers subgroup (AC; 38%) showed the lowest depression scores in the MPI (negative mood), Medical Outcomes Study Short Form-36 (mental health), Hospital Anxiety and Depression Scale (depression) and Coping Strategies Questionnaire (catastrophizing). Significant differences in treatment outcome were observed among the three subgroups in terms of reduction of pain severity (as assessed using the MPI). The effect sizes were 1.42 for DYS, 1.32 for AC and 0.62 for ID (P=0.004 for pairwise comparison of ID-AC and P=0.018 for ID-DYS). DISCUSSION: These findings underscore the importance of assessing individuals’ differences in how they adjust to FM. PMID:20011715
Amer, Hamid; Niaz, Khalid; Hatazawa, Jun; Gasmelseed, Ahmed; Samiri, Hussain Al; Al Othman, Maram; Hammad, Mai Al
2017-11-01
We sought to determine the prognostic importance of adenosine-induced ischemic ECG changes in patients with normal single-photon emission computed tomography myocardial perfusion images (MPI). We carried out a retrospective analysis of 765 patients undergoing adenosine MPI between January 2013 and January 2015. Patients with baseline ECG abnormalities and/or abnormal scans were excluded. Overall, 67 (8.7%) patients had ischemic ECG changes during adenosine infusion in the form of ST depression of 1 mm or more. Of these, 29 [43% (3.8% of all patients)] had normal MPI (positive ECG group). An age-matched and sex-matched group of 108 patients with normal MPI without ECG changes served as control participants (negative ECG group). During a mean follow-up duration of 33.3±6.1 months, patients in the positive ECG group did not have significantly more adverse cardiac events than those in the negative ECG group. One (0.9%) patient in the negative ECG group had a nonfatal myocardial infarction (0.7% annual event rate after a negative MPI). Also in this group, two (1.8%) patients were admitted with a diagnosis of CAD, which was subsequently ruled out by angiography. A fourth patient in the negative ECG group was admitted because of heart failure that proved to be secondary to a pulmonary cause and not CAD. Only one patient in the positive ECG group was admitted with suspected CAD, which was ruled out by coronary angiography. Patients with normal myocardial perfusion scintigraphy in whom ST-segment depression develops during adenosine stress testing appear to have no increased risk for future cardiac events compared with similar patients without ECG evidence of ischemia.
Magnetic particle imaging: from proof of principle to preclinical applications
NASA Astrophysics Data System (ADS)
Knopp, T.; Gdaniec, N.; Möddel, M.
2017-07-01
Tomographic imaging has become a mandatory tool for the diagnosis of a majority of diseases in clinical routine. Since each method has its pros and cons, a variety of them is regularly used in clinics to satisfy all application needs. Magnetic particle imaging (MPI) is a relatively new tomographic imaging technique that images magnetic nanoparticles with a high spatiotemporal resolution in a quantitative way, and in turn is highly suited for vascular and targeted imaging. MPI was introduced in 2005 and now enters the preclinical research phase, where medical researchers get access to this new technology and exploit its potential under physiological conditions. Within this paper, we review the development of MPI since its introduction in 2005. Besides an in-depth description of the basic principles, we provide detailed discussions on imaging sequences, reconstruction algorithms, scanner instrumentation and potential medical applications.
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.
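The core idea behind user-level threading, contexts that are created, yielded, and scheduled entirely in user space with no kernel involvement, can be sketched with POSIX ucontext (a hypothetical illustration of the concept only, not the Argobots API):

```c
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical user-level thread (ULT) sketch: each thread gets its
 * own stack and context; yielding and scheduling are plain user-space
 * swapcontext() calls. Stacks are leaked; this is a sketch only. */
#define MAX_ULT 8
#define ULT_STACK (64 * 1024)

static ucontext_t sched_ctx, ult_ctx[MAX_ULT];
static void (*ult_fn[MAX_ULT])(void);
static int ult_done[MAX_ULT], n_ult = 0, current = -1;

static void trampoline(void) {      /* runs the task, marks completion */
    ult_fn[current]();
    ult_done[current] = 1;          /* uc_link returns us to scheduler */
}

void ult_create(void (*fn)(void)) {
    int i = n_ult++;
    ult_fn[i] = fn;
    ult_done[i] = 0;
    getcontext(&ult_ctx[i]);
    ult_ctx[i].uc_stack.ss_sp = malloc(ULT_STACK);
    ult_ctx[i].uc_stack.ss_size = ULT_STACK;
    ult_ctx[i].uc_link = &sched_ctx;
    makecontext(&ult_ctx[i], trampoline, 0);
}

void ult_yield(void) {              /* cooperative yield to scheduler */
    swapcontext(&ult_ctx[current], &sched_ctx);
}

void ult_run_all(void) {            /* naive round-robin scheduler */
    int remaining = n_ult;
    while (remaining > 0)
        for (int i = 0; i < n_ult; i++) {
            if (ult_done[i]) continue;
            current = i;
            swapcontext(&sched_ctx, &ult_ctx[i]);
            if (ult_done[i]) remaining--;
        }
}

/* two demo tasks that interleave via explicit yields */
static int ult_trace[4], ult_trace_n = 0;
static void task_a(void) { ult_trace[ult_trace_n++] = 1; ult_yield(); ult_trace[ult_trace_n++] = 1; }
static void task_b(void) { ult_trace[ult_trace_n++] = 2; ult_yield(); ult_trace[ult_trace_n++] = 2; }
```

Because a "context switch" here is just a function-call-cost swapcontext, such threads are far cheaper than OS threads, which is the cost argument the abstract makes; Argobots layers work-stealing pools, tasklets, and synchronization on top of this basic mechanism.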
Argobots: A Lightweight Low-Level Threading and Tasking Framework
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...
2017-10-24
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
Development of small scale cluster computer for numerical analysis
NASA Astrophysics Data System (ADS)
Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.
2017-09-01
In this study, two personal computers were successfully networked together to form a small-scale cluster. Each of the processors involved is a quad-core processor, giving the cluster eight cores in total. The cluster runs the Ubuntu 14.04 LINUX environment with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test verified that the computers could pass the required information without any problem, using a simple MPI "Hello" program written in C. Additionally, a performance test was done to show that the cluster's computational performance is much better than that of a single-CPU computer. In this performance test, four runs were done with the same code using a single node, 2 processors, 4 processors, and 8 processors. The results show that with additional processors, the time required to solve the problem decreases; the calculation time is roughly halved each time the number of processors is doubled. To conclude, we successfully developed a small-scale cluster computer using common hardware that is capable of higher computing power than a single-CPU machine, which can be beneficial for research requiring high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
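The performance test's observation, solve time roughly halving each time the processor count doubles, corresponds to ideal strong scaling, conventionally quantified as speedup and parallel efficiency (a small helper sketch, not code from the study):

```c
/* Strong-scaling metrics for the cluster's performance test:
 * speedup S(p) = T1/Tp and parallel efficiency E(p) = S(p)/p.
 * "Time halves when processors double" means E(p) stays at 1
 * (ideal scaling); E(p) < 1 reveals communication or serial
 * overheads as more processors are added. */
double speedup(double t1, double tp)           { return t1 / tp; }
double efficiency(double t1, double tp, int p) { return (t1 / tp) / p; }
```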
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
Adaptation of Consultation Planning for Native American and Latina Women with Breast Cancer
ERIC Educational Resources Information Center
Belkora, Jeffrey; Franklin, Lauren; O'Donnell, Sara; Ohnemus, Julie; Stacey, Dawn
2009-01-01
Context: Resource centers in rural, underserved areas are implementing Consultation Planning (CP) to help women with breast cancer create a question list before a doctor visit. Purpose: To identify changes needed for acceptable delivery of CP to rural Native Americans and Latinas. Methods: We interviewed and surveyed 27 Native American and Latino…
Study of Special Populations: Native American Students with Disabilities. Chapter 7.
ERIC Educational Resources Information Center
Office of Special Education and Rehabilitative Services (ED), Washington, DC. Div. of Innovation and Development.
As one of a series of papers on the unique needs of special populations with disabilities, this chapter of the 16th annual report on the implementation of the Individuals with Disabilities Education Act (IDEA) reviews the literature on provision of services to Native American students with disabilities. Native American students with disabilities…
Military Interoperable Digital Hospital Testbed
2012-07-01
Subject outcome measures include blood pressure, waist circumference, weight, body mass index (BMI), body fat, HDL cholesterol, triglycerides, glucose ... master patient index (MPI), 625 duplicate chest x-rays and CT scans of the head between sending and receiving institution (taken within 0-7 days) were ... Index (MPI) software. The report included chest x-rays (CPT 71010 and 71020) and CT scans of the head (CPT 70450) for the stated time periods.
GSD-1G and MPI-DING Reference Glasses for In Situ and Bulk Isotopic Determination
Jochum, K.P.; Wilson, S.A.; Abouchami, W.; Amini, M.; Chmeleff, J.; Eisenhauer, A.; Hegner, E.; Iaccheri, L.M.; Kieffer, B.; Krause, J.; McDonough, W.F.; Mertz-Kraus, R.; Raczek, I.; Rudnick, R.L.; Scholz, Donna K.; Steinhoefel, G.; Stoll, B.; Stracke, A.; Tonarini, S.; Weis, D.; Weis, U.; Woodhead, J.D.
2011-01-01
This paper contains the results of an extensive isotopic study of United States Geological Survey GSD-1G and MPI-DING reference glasses. Thirteen different laboratories were involved using high-precision bulk (TIMS, MC-ICP-MS) and microanalytical (LA-MC-ICP-MS, LA-ICP-MS) techniques. Detailed studies were performed to demonstrate the large-scale and small-scale homogeneity of the reference glasses. Together with previously published isotopic data from ten other laboratories, preliminary reference and information values as well as their uncertainties at the 95% confidence level were determined for H, O, Li, B, Si, Ca, Sr, Nd, Hf, Pb, Th and U isotopes using the recommendations of the International Association of Geoanalysts for certification of reference materials. Our results indicate that GSD-1G and the MPI-DING glasses are suitable reference materials for microanalytical and bulk analytical purposes. © 2010 The Authors. Geostandards and Geoanalytical Research © 2010 International Association of Geoanalysts.
NASA Astrophysics Data System (ADS)
Tomitaka, Asahi; Arami, Hamed; Gandhi, Sonu; Krishnan, Kannan M.
2015-10-01
Magnetic Particle Imaging (MPI) is a new real-time imaging modality, which promises high tracer mass sensitivity and spatial resolution directly generated from iron oxide nanoparticles. In this study, monodisperse iron oxide nanoparticles with median core diameters ranging from 14 to 26 nm were synthesized and their surface was conjugated with lactoferrin to convert them into brain glioma targeting agents. The conjugation was confirmed with the increase of the hydrodynamic diameters, change of zeta potential, and Bradford assay. Magnetic particle spectrometry (MPS), performed to evaluate the MPI performance of these nanoparticles, showed no change in signal after lactoferrin conjugation to nanoparticles for all core diameters, suggesting that the MPI signal is dominated by Néel relaxation and thus independent of hydrodynamic size difference or presence of coating molecules before and after conjugations. For this range of core sizes (14-26 nm), both MPS signal intensity and spatial resolution improved with increasing core diameter of nanoparticles. The lactoferrin conjugated iron oxide nanoparticles (Lf-IONPs) showed specific cellular internalization into C6 cells with a 5-fold increase in MPS signal compared to IONPs without lactoferrin, both after 24 h incubation. These results suggest that Lf-IONPs can be used as tracers for targeted brain glioma imaging using MPI.
NASA Astrophysics Data System (ADS)
Rahmer, J.; Antonelli, A.; Sfara, C.; Tiemann, B.; Gleich, B.; Magnani, M.; Weizenecker, J.; Borgert, J.
2013-06-01
Magnetic particle imaging (MPI) is a new medical imaging approach that is based on the nonlinear magnetization response of super-paramagnetic iron oxide nanoparticles (SPIOs) injected into the blood stream. To date, real-time MPI of the bolus passage of an approved MRI SPIO contrast agent injected into the tail vein of living mice has been demonstrated. However, nanoparticles are rapidly removed from the blood stream by the mononuclear phagocyte system. Therefore, imaging applications for long-term monitoring require the repeated administration of bolus injections, which complicates quantitative comparisons due to the temporal variations in concentration. Encapsulation of SPIOs into red blood cells (RBCs) has been suggested to increase the blood circulation time of nanoparticles. This work presents first evidence that SPIO-loaded RBCs can be imaged in the blood pool of mice several hours after injection using MPI. This finding is supported by magnetic particle spectroscopy performed to quantify the iron concentration in blood samples extracted from the mice 3 and 24 h after injection of SPIO-loaded RBCs. Based on these results, new MPI applications can be envisioned, such as permanent 3D real-time visualization of the vessel tree during interventional procedures, bleeding monitoring after stroke, or long-term monitoring and treatment control of cardiovascular diseases.
Levy, Andrew E; Shah, Nishant R; Matheny, Michael E; Reeves, Ruth M; Gobbel, Glenn T; Bradley, Steven M
2018-04-25
Reporting standards promote clarity and consistency of stress myocardial perfusion imaging (MPI) reports, but do not require an assessment of post-test risk. Natural Language Processing (NLP) tools could potentially help estimate this risk, yet it is unknown whether reports contain adequate descriptive data to use NLP. Among VA patients who underwent stress MPI and coronary angiography between January 1, 2009 and December 31, 2011, 99 stress test reports were randomly selected for analysis. Two reviewers independently categorized each report for the presence of critical data elements essential to describing post-test ischemic risk. Few stress MPI reports provided a formal assessment of post-test risk within the impression section (3%) or the entire document (4%). In most cases, risk was determinable by combining critical data elements (74% impression, 98% whole). If ischemic risk was not determinable (25% impression, 2% whole), inadequate description of systolic function (9% impression, 1% whole) and inadequate description of ischemia (5% impression, 1% whole) were most commonly implicated. Post-test ischemic risk was determinable but rarely reported in this sample of stress MPI reports. This supports the potential use of NLP to help clarify risk. Further study of NLP in this context is needed.
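The kind of keyword-based screening an NLP tool might apply to such reports can be sketched as follows. This is a toy illustration only: the element names, phrase patterns, and report text are invented for the example, not taken from the VA reports or the study's NLP pipeline.

```python
import re

# Hypothetical patterns for two critical data elements mentioned in the
# abstract: description of ischemia and of systolic function.
CRITICAL_ELEMENTS = {
    "ischemia": re.compile(r"reversible defect|ischemi\w+", re.IGNORECASE),
    "systolic_function": re.compile(r"ejection fraction|systolic function",
                                    re.IGNORECASE),
}

def elements_present(report_text):
    # flag which critical data elements a report section describes
    return {name: bool(rx.search(report_text))
            for name, rx in CRITICAL_ELEMENTS.items()}

# invented impression text for illustration
impression = ("Moderate reversible defect of the inferior wall. "
              "Left ventricular ejection fraction 55%.")
print(elements_present(impression))
```

A production system would need far richer patterns (negation handling, section parsing), which is precisely the kind of gap the study's findings motivate investigating.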
Recent developments in bend-insensitive and ultra-bend-insensitive fibers
NASA Astrophysics Data System (ADS)
Boivin, David; de Montmorillon, Louis-Anne; Provost, Lionel; Montaigne, Nelly; Gooijer, Frans; Aldea, Eugen; Jensma, Jaap; Sillard, Pierre
2010-02-01
Designed to overcome the limitations encountered under extreme bending conditions, Bend- and Ultra-Bend-Insensitive Fibers (BIFs and UBIFs) appear as ideal solutions for use in FTTH networks and in components, pigtails or patch-cords for ever more demanding applications such as military or sensing. Recently, however, questions have been raised concerning the Multi-Path-Interference (MPI) levels in these fibers. Indeed, they are potentially subject to interference between the fundamental mode and the higher-order mode that is also bend resistant. This MPI is generated by discrete discontinuities such as staples, bends and splices/connections that occur on distance scales comparable to the laser coherence length. In this paper, we will demonstrate the high MPI tolerance of all-solid single-trench-assisted BIFs and UBIFs. We will present the first comprehensive study combining theoretical and experimental points of view to quantify the impact of fusion splices on coherent MPI. To be complete, results for mechanical splices will also be reported. Finally, we will show how the single-trench-assisted concept combined with the versatile PCVD process allows tight control of the distributions of fiber characteristics. Such control is needed to mass-produce BIFs and to meet the more stringent specifications of the UBIFs.
NASA Astrophysics Data System (ADS)
Nuraini, Lutviasari; Prifiharni, Siska; Priyotomo, Gadang; Sundjono; Gunawan, Hadi; Purawiardi, Ibrahim
2018-05-01
The performance of carbon steel, galvanized steel, and aluminium after one month of exposure in the atmospheric coastal area of Limbangan and Karangsong Beach, West Java, Indonesia was evaluated. The corrosion rate was determined by the weight-loss method, and the morphology of the steel after exposure was observed by Scanning Electron Microscopy (SEM) with Energy Dispersive X-Ray Analysis (EDX). The sites were monitored to determine the chloride content in the marine atmosphere. The corrosion products formed on the carbon steel were then characterized by X-ray diffraction (XRD). The results showed aggressive corrosion at Karangsong Beach, where the corrosion rates of carbon steel, galvanized steel, and aluminium were 38.514, 4.7860, and 0.5181 mpy, respectively, while at Limbangan Beach the corrosion rates of carbon steel, galvanized steel, and aluminium were 3.339, 0.219, and 0.166 mpy, respectively. The chloride content was found to be the main factor influencing the atmospheric corrosion process in this area: the chloride accumulated at Karangsong and Limbangan was 497 mg/m2.day and 117 mg/m2.day, respectively. XRD analysis of each carbon steel sample revealed a complex mixture of iron oxide phases.
High frequency QRS ECG predicts ischemic defects during myocardial perfusion imaging
NASA Technical Reports Server (NTRS)
2004-01-01
Changes in high frequency QRS components of the electrocardiogram (HF QRS ECG) (150-250 Hz) are more sensitive than changes in conventional ST segments for detecting myocardial ischemia. We investigated the accuracy of 12-lead HF QRS ECG in detecting ischemia during adenosine tetrofosmin myocardial perfusion imaging (MPI). 12-lead HF QRS ECG recordings were obtained from 45 patients before and during adenosine technetium-99 tetrofosmin MPI tests. Before the adenosine infusions, recordings of HF QRS were analyzed according to a morphological score that incorporated the number, type and location of reduced amplitude zones (RAZs) present in the 12 leads. During the adenosine infusions, recordings of HF QRS were analyzed according to the maximum percentage changes (in both the positive and negative directions) that occurred in root mean square (RMS) voltage amplitudes within the 12 leads. The best set of prospective HF QRS criteria had a sensitivity of 94% and a specificity of 83% for correctly identifying the MPI result. The sensitivity of simultaneous ST segment changes (18%) was significantly lower than that of any individual HF QRS criterion (P < 0.001). Analysis of 12-lead HF QRS ECG is highly sensitive and specific for detecting ischemic perfusion defects during adenosine MPI stress tests and significantly more sensitive than analysis of conventional ST segments.
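The per-lead RMS-amplitude analysis described above can be sketched as a small computation. This is an illustrative example on synthetic data, not the study's software; the waveform and the 20% amplitude drop under stress are invented.

```python
import math

def rms(samples):
    # root-mean-square amplitude of a sampled waveform
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_percent_change(baseline, stress):
    # signed percentage change in RMS voltage amplitude between
    # a baseline recording and a recording during infusion
    return 100.0 * (rms(stress) - rms(baseline)) / rms(baseline)

# synthetic "lead": one cycle of a sine wave whose amplitude
# drops by 20% in the hypothetical stress recording
baseline = [math.sin(2 * math.pi * k / 100) for k in range(100)]
stress = [0.8 * s for s in baseline]
print(round(rms_percent_change(baseline, stress), 1))
```

Because RMS scales linearly with amplitude, the computed change for this synthetic lead is -20%; in the study, the maximum such change across the 12 leads (in either direction) fed the classification criteria.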
High frequency QRS ECG predicts ischemic defects during myocardial perfusion imaging
NASA Technical Reports Server (NTRS)
Rahman, Atiar
2006-01-01
Background: Changes in high frequency QRS components of the electrocardiogram (HF QRS ECG) (150-250 Hz) are more sensitive than changes in conventional ST segments for detecting myocardial ischemia. We investigated the accuracy of 12-lead HF QRS ECG in detecting ischemia during adenosine tetrofosmin myocardial perfusion imaging (MPI). Methods and Results: 12-lead HF QRS ECG recordings were obtained from 45 patients before and during adenosine technetium-99 tetrofosmin MPI tests. Before the adenosine infusions, recordings of HF QRS were analyzed according to a morphological score that incorporated the number, type and location of reduced amplitude zones (RAZs) present in the 12 leads. During the adenosine infusions, recordings of HF QRS were analyzed according to the maximum percentage changes (in both the positive and negative directions) that occurred in root mean square (RMS) voltage amplitudes within the 12 leads. The best set of prospective HF QRS criteria had a sensitivity of 94% and a specificity of 83% for correctly identifying the MPI result. The sensitivity of simultaneous ST segment changes (18%) was significantly lower than that of any individual HF QRS criterion (P<0.001). Conclusions: Analysis of 12-lead HF QRS ECG is highly sensitive and specific for detecting ischemic perfusion defects during adenosine MPI stress tests and significantly more sensitive than analysis of conventional ST segments.
The Experience of Non-Native English-Speaking Students in Academic Libraries in the United States.
ERIC Educational Resources Information Center
Onwuegbuzie, Anthony J.; Jiao, Qun G.; Daley, Christine E.
This study compared native and non-native English-speaking university students with respect to frequency of library usage and reasons for using the library, as well as differences between these groups with respect to levels of library anxiety. Findings were intended to be used in the planning and implementation of library services for…
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)
2002-01-01
This report describes a two-level parallelization of a Computational Fluid Dynamics (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory machines and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine are reported using medium and large scale test problems.
Indirect spectrophotometric determination of trace cyanide with cationic porphyrins.
Ishii, H; Kohata, K
1991-05-01
Three highly sensitive methods for the determination of cyanide have been developed, based on the fact that the complexation of silver ions with three cationic porphyrins, 5,10,15,20-tetrakis-(1-methyl-2-pyridinio)porphine [T(2-MPy)P], 5,10,15,20-tetrakis(1-methyl-3-pyridinio)porphine [T(3-MPy)P] and 5,10,15,20-tetrakis(1-methyl-4-pyridinio)porphine [T(4-MPy)P], in alkaline media is inhibited by cyanide and the decrease in absorbance of the silver(II) complex is proportional to the cyanide concentration. Sensitivities of the procedures developed are 0.133, 0.126 and 0.234 ng/cm², respectively, for an absorbance of 0.001. Cadmium(II), copper(II), mercury(II), zinc(II), iodide and sulfide interfere with the cyanide determination. One of the proposed methods was applied to the determination of cyanide in waste-water samples, with satisfactory results.
Kieffer, Philip J; Williams, Jarred M; Shepard, Molly K; Giguère, Steeve; Epstein, Kira L
2018-01-01
The objectives of the study were to: i) determine baseline microvascular perfusion indices (MPI) and assess their repeatability in healthy horses under general anesthesia, and ii) compare the MPIs of 3 microvascular beds (oral mucosa, colonic serosa, and rectal mucosa). Healthy adult horses were anesthetized and sidestream dark field microscopy was used to collect video loops of the oral mucosa, rectal mucosa, and colonic serosa under normotensive conditions without cardiovascular support drugs; videos were later analyzed to produce MPIs. Baseline MPI values were determined for each site, which included the total vessel density (TVD), perfused vessel density (PVD), portion perfused vessels (PPV), and microcirculatory flow index (MFI). Differences in MPIs between microvascular beds were not statistically significant. Repeatability of the measurements varied for each MPI. In particular, the site of sampling had a profound effect on the repeatability of the PPV measurements and should be considered in future studies.
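The density-style indices named above (TVD, PVD, PPV) can be illustrated with a small sketch. The formulas and measurement values are simplified, hypothetical stand-ins for the definitions used by sidestream dark field analysis software, shown only to make the relationships between the indices concrete.

```python
def perfusion_indices(total_vessel_length_mm,
                      perfused_vessel_length_mm,
                      image_area_mm2):
    # Simplified, illustrative definitions:
    #   TVD: total vessel length per unit image area
    #   PVD: perfused vessel length per unit image area
    #   PPV: percentage of total vessel length that is perfused
    tvd = total_vessel_length_mm / image_area_mm2
    pvd = perfused_vessel_length_mm / image_area_mm2
    ppv = 100.0 * perfused_vessel_length_mm / total_vessel_length_mm
    return {"TVD": tvd, "PVD": pvd, "PPV": ppv}

# hypothetical measurements from one video loop
print(perfusion_indices(20.0, 18.0, 1.0))
```

Note that PPV is a ratio of the two underlying measurements, which is one reason sampling-site variation can affect its repeatability differently from TVD and PVD.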
Review of progress in magnetic particle inspection
NASA Astrophysics Data System (ADS)
Eisenmann, David J.; Enyart, Darrel; Lo, Chester; Brasche, Lisa
2014-02-01
Magnetic particle inspection (MPI) has been widely utilized for decades, and sees considerable use in the aerospace industry with a majority of the steel parts being inspected with MPI at some point in the lifecycle. Typical aircraft locations inspected are landing gear, engine components, attachment hardware, and doors. In spite of its numerous applications the method remains poorly understood, and there are many aspects of that method which would benefit from in-depth study. This shortcoming is due to the fact that MPI combines the complicated nature of electromagnetics, metallurgical material effects, fluid-particle motion dynamics, and physiological human factors into a single inspection. To promote understanding of the intricate method issues that affect sensitivity, or to assist with the revision of industry specifications and standards, research studies will be prioritized through the guidance of a panel of industry experts, using an approach which has worked successfully in the past to guide fluorescent penetrant inspection (FPI) research efforts.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton III, Thomas J
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.
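The logarithmic scaling of gossip cycles with system size can be seen in a toy simulation. This is a sketch of the general push-pull gossip idea, not the paper's algorithms; the process counts, seed, and exchange rule are invented for illustration.

```python
import random

def gossip_consensus_cycles(n_procs, seed=42):
    # Each process starts knowing only one local "failure report";
    # in every cycle each process exchanges its known set with one
    # randomly chosen peer (push-pull). Returns the number of cycles
    # until every process knows the full list (global consensus).
    rng = random.Random(seed)
    knowledge = [{i} for i in range(n_procs)]
    target = set(range(n_procs))
    cycles = 0
    while any(k != target for k in knowledge):
        cycles += 1
        for i in range(n_procs):
            j = rng.randrange(n_procs)          # random gossip partner
            merged = knowledge[i] | knowledge[j]
            knowledge[i] = merged
            knowledge[j] = set(merged)          # symmetric exchange
    return cycles

for n in (16, 64, 256):
    print(n, gossip_consensus_cycles(n))
```

Even in this simplified setting, the cycle count grows far more slowly than the process count, which is the scaling property the paper's algorithms exploit.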
Gittelsohn, Joel; Evans, Marguerite; Helitzer, Deborah; Anliker, Jean; Story, Mary; Metcalfe, Lauve; Davis, Sally; Cloud, Patty Iron
2016-01-01
This paper describes how formative research was developed and implemented to produce obesity prevention interventions among school children in six different Native American nations that are part of the Pathways study. The formative assessment work presented here was unique in several ways: (1) it represents the first time formative research methods have been applied across multiple Native American tribes; (2) it is holistic, including data collection from parents, children, teachers, administrators and community leaders; and (3) it was developed by a multi-disciplinary group, including substantial input from Native American collaborators. The paper describes the process of developing the different units of the protocol, how data collection was implemented and how analyses were structured around the identification of risk behaviors. An emphasis is placed on describing which units of the formative assessment protocol were most effective and which were less effective. PMID:10181023
CoFFEE: Corrections For Formation Energy and Eigenvalues for charged defect simulations
NASA Astrophysics Data System (ADS)
Naik, Mit H.; Jain, Manish
2018-05-01
Charged point defects in materials are widely studied using Density Functional Theory (DFT) packages with periodic boundary conditions. The formation energy and defect level computed from these simulations need to be corrected to remove the contributions from the spurious long-range interaction between the defect and its periodic images. To this effect, the CoFFEE code implements the Freysoldt-Neugebauer-Van de Walle (FNV) correction scheme. The corrections can be applied to charged defects in a complete range of material shapes and size: bulk, slab (or two-dimensional), wires and nanoribbons. The code is written in Python and features MPI parallelization and optimizations using the Cython package for slow steps.
Sethuraman, Arun; Hey, Jody
2015-01-01
IMa2 and related programs are used to study the divergence of closely related species and of populations within species. These methods are based on the sampling of genealogies using MCMC, and they can proceed quite slowly for larger data sets. We describe a parallel implementation, called IMa2p, that provides a nearly linear increase in genealogy sampling rate with the number of processors in use. IMa2p is written in C++ using OpenMPI, and scales well for demographic analyses of a large number of loci and populations, which are difficult to study using the serial version of the program. PMID:26059786
Eigensolver for a Sparse, Large Hermitian Matrix
NASA Technical Reports Server (NTRS)
Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris
2003-01-01
A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
Clusters of SMP (Symmetric Multi-Processor) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is best depends on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
NASA Astrophysics Data System (ADS)
Kim, Jeong-Gyu; Kim, Woong-Tae; Ostriker, Eve C.; Skinner, M. Aaron
2017-12-01
We present an implementation of an adaptive ray-tracing (ART) module in the Athena hydrodynamics code that accurately and efficiently handles the radiative transfer involving multiple point sources on a three-dimensional Cartesian grid. We adopt a recently proposed parallel algorithm that uses nonblocking, asynchronous MPI communications to accelerate transport of rays across the computational domain. We validate our implementation through several standard test problems, including the propagation of radiation in vacuum and the expansions of various types of H II regions. Additionally, scaling tests show that the cost of a full ray trace per source remains comparable to that of the hydrodynamics update on up to ~10^3 processors. To demonstrate application of our ART implementation, we perform a simulation of star cluster formation in a marginally bound, turbulent cloud, finding that its star formation efficiency is 12% when both radiation pressure forces and photoionization by UV radiation are treated. We directly compare the radiation forces computed from the ART scheme with those from the M1 closure relation. Although the ART and M1 schemes yield similar results on large scales, the latter is unable to resolve the radiation field accurately near individual point sources.
Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth
NASA Astrophysics Data System (ADS)
Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.
2014-03-01
An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance, and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in 50% reduction of memory and computational time requirements. The presented solidification model shows a very good scalability up to centimeter size domains, including more than ten million dendrites. Catalogue identifier: AEQZ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, UK Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 29,767 No. of bytes in distributed program, including test data, etc.: 3,131,367 Distribution format: tar.gz Programming language: Fortran 90. Computer: Linux PC and clusters. Operating system: Linux. Has the code been vectorized or parallelized?: Yes. Program is parallelized using MPI. 
Number of processors used: 1-50,000 RAM: Memory requirements depend on the grid size Classification: 6.5, 7.7. External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/) Nature of problem: Dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection. Solution method: The lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena. The cellular automaton technique is deployed to track the solid/liquid interface. Restrictions: Heat transfer is calculated uncoupled from the fluid flow. Thermal diffusivity is constant. Unusual features: Novel technique, utilizing periodic duplication of a pre-grown “incubation” domain, is applied for the scaleup test. Running time: Running time varies from minutes to days depending on the domain size and number of computational cores.
A numerical differentiation library exploiting parallel architectures
NASA Astrophysics Data System (ADS)
Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.
2009-08-01
We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h^2), and O(h^4), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems, and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summary Program title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. 
of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
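The library itself is distributed as FORTRAN-77/C; as a hedged illustration of the underlying technique rather than the NDL interface, the O(h²) central-difference gradient and a Hessian approximated from gradient values can be sketched in a few lines (function names here are invented for this example):

```python
def grad_central(f, x, h=1e-5):
    """O(h^2) central-difference gradient of f at point x (list of floats)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def hessian_from_gradient(grad, x, h=1e-4):
    """Approximate the Hessian by forward-differencing the gradient."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    g0 = grad(x)
    for j in range(n):
        xj = list(x)
        xj[j] += h
        gj = grad(xj)
        for i in range(n):
            H[i][j] = (gj[i] - g0[i]) / h
    # symmetrize, since floating-point noise breaks exact symmetry
    for i in range(n):
        for j in range(i + 1, n):
            H[i][j] = H[j][i] = 0.5 * (H[i][j] + H[j][i])
    return H
```

Because each derivative component uses an independent set of function evaluations, both loops parallelize trivially, which is what the OpenMP and MPI versions of the library exploit.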
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed aiming at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. 
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with the MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Effects of business-as-usual anthropogenic emissions on air quality
NASA Astrophysics Data System (ADS)
Pozzer, A.; Zimmermann, P.; Doering, U. M.; van Aardenne, J.; Tost, H.; Dentener, F.; Janssens-Maenhout, G.; Lelieveld, J.
2012-08-01
The atmospheric chemistry general circulation model EMAC has been used to estimate the impact of anthropogenic emission changes on global and regional air quality in recent and future years (2005, 2010, 2025 and 2050). The emission scenario assumes that population and economic growth largely determine energy and food consumption and consequent pollution sources with the current technologies ("business as usual"). This scenario is chosen to show the effects of not implementing legislation to prevent additional climate change and growing air pollution, other than what is in place for the base year 2005, representing a pessimistic (but plausible) future. By comparing with recent observations, it is shown that the model reproduces the main features of regional air pollution distributions, though with some imprecisions inherent to the coarse horizontal resolution (~100 km) and simplified bottom-up emission input. To identify possible future hotspots of poor air quality, a multi-pollutant index (MPI), suited for global model output, has been applied. It appears that East and South Asia and the Middle East represent such hotspots due to very high pollutant concentrations, while a general increase of MPIs is observed in all populated regions in the Northern Hemisphere. In East Asia a range of pollutant gases and fine particulate matter (PM2.5) is projected to reach very high levels from 2005 onward, while in South Asia air pollution, including ozone, will grow rapidly towards the middle of the century. Around the Persian Gulf, where natural PM2.5 concentrations are already high (desert dust), ozone levels are expected to increase strongly. The population weighted MPI (PW-MPI), which combines demographic and pollutant concentration projections, shows that a rapidly increasing number of people worldwide will experience reduced air quality during the first half of the 21st century. 
Following this business-as-usual scenario, it is projected that air quality for the global average citizen in 2050 would be almost comparable to that for the average citizen in East Asia in the year 2005, which underscores the need to pursue emission reductions.
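The exact formulation of the paper's multi-pollutant index is not reproduced in this abstract. As a schematic sketch only, assuming an index built from threshold-normalized pollutant concentrations and a population-weighted aggregate over regions (all names, thresholds, and values below are hypothetical):

```python
def multi_pollutant_index(conc, thresholds):
    """Schematic MPI: mean of pollutant concentrations normalized by
    assumed air-quality thresholds; values above 1 indicate that the
    thresholds are exceeded on average."""
    return sum(conc[p] / thresholds[p] for p in thresholds) / len(thresholds)

def population_weighted(indices, population):
    """Schematic PW-MPI: regional indices averaged with regional
    population as weights, so densely populated regions dominate."""
    total = sum(population)
    return sum(i * p for i, p in zip(indices, population)) / total
```

With hypothetical regional indices [0.8, 2.4] and populations [3, 1], the population weighting pulls the aggregate toward the more populous (here cleaner) region.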
An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Hargrove, Paul H.; Iancu, Costin
The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, 3) per node and per core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics respectively. Our results indicate that relaxing the message delivery order can improve performance up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take advantage of message reordering. Enforcing the ordering semantics of two-sided communication comes with a performance penalty. Furthermore, we argue that exposing out-of-order delivery at the application level is required for the next-generation programming models. Any ordering constraints in the language specifications reduce communication performance for small messages and increase the number of active cores required for peak throughput.
Smiljanić, Rajka; Bradlow, Ann R.
2011-01-01
This study investigated how native language background interacts with speaking style adaptations in determining levels of speech intelligibility. The aim was to explore whether native and high proficiency non-native listeners benefit similarly from native and non-native clear speech adjustments. The sentence-in-noise perception results revealed that fluent non-native listeners gained a large clear speech benefit from native clear speech modifications. Furthermore, proficient non-native talkers in this study implemented conversational-to-clear speaking style modifications in their second language (L2) that resulted in significant intelligibility gain for both native and non-native listeners. The results of the accentedness ratings obtained for native and non-native conversational and clear speech sentences showed that while intelligibility was improved, the presence of foreign accent remained constant in both speaking styles. This suggests that objective intelligibility and subjective accentedness are two independent dimensions of non-native speech. Overall, these results provide strong evidence that greater experience in L2 processing leads to improved intelligibility in both production and perception domains. These results also demonstrated that speaking style adaptations along with less signal distortion can contribute significantly towards successful native and non-native interactions. PMID:22225056
NASA Astrophysics Data System (ADS)
Kracher, D.; Manzini, E.; Reick, C. H.; Schultz, M. G.; Stein, O.
2014-12-01
Greenhouse gas induced climate change will modify the physical conditions of the atmosphere. One of the projected changes is an acceleration of the Brewer-Dobson circulation in the stratosphere, as has been shown in many model studies. This change in the stratospheric circulation consequently affects the transport and distribution of atmospheric components such as N2O. Since N2O is involved in ozone destruction, a modified distribution of N2O can be of importance for ozone chemistry. N2O is inert in the troposphere and decays only in the stratosphere. Thus, changes in the exchange between troposphere and stratosphere can also affect the stratospheric sink of N2O, and consequently its atmospheric lifetime. N2O is a potent greenhouse gas with a global warming potential of currently approximately 300 CO2-equivalents in a 100-year perspective. A faster decay in atmospheric N2O mixing ratios, i.e. a decreased atmospheric lifetime of N2O, will also reduce its global warming potential. In order to assess the impact of climate change on atmospheric circulation and implied effects on the distribution and lifetime of atmospheric N2O, we apply the Max Planck Institute Earth System Model, MPI-ESM. MPI-ESM consists of the atmospheric general circulation model ECHAM, the land surface model JSBACH, and MPIOM/HAMOCC representing ocean circulation and ocean biogeochemistry. Prognostic atmospheric N2O concentrations in MPI-ESM are determined by land N2O emissions, ocean-atmosphere N2O exchange and atmospheric tracer transport. As stratospheric chemistry is not explicitly represented in MPI-ESM, stratospheric decay rates of N2O are prescribed from a MACC MOZART simulation. Increasing surface temperatures and CO2 concentrations in the stratosphere impact atmospheric circulation differently. Thus, we conduct a series of transient runs with the atmospheric model of MPI-ESM to isolate different factors governing a shift in atmospheric circulation. 
From those transient simulations we diagnose decreasing tropospheric N2O concentrations, increased transport of N2O from the troposphere to the stratosphere, and increasing stratospheric decay of N2O, leading to a reduction in the atmospheric lifetime of N2O, depending on the evolution of climate change.
NASA Astrophysics Data System (ADS)
Schuster, Mareike; Thürkow, Markus; Weiher, Stefan; Kirchner, Ingo; Ulbrich, Uwe; Will, Andreas
2016-04-01
A general bias of global atmosphere-ocean models, and also of the MPI-ESM, is an under-representation of the high latitude cyclone activity and an overestimation of the mid latitude cyclone activity in the North Atlantic, thus representing the extra-tropical storm track as too zonal. We will show that this effect can be antagonized by applying an atmospheric Two-Way Coupling (TWC). In this study we present a newly developed Two-Way Coupled model system, which is based on the MPI-ESM, and show that it is able to capture the mean storm track location more accurately. It also influences the sub-decadal deterministic predictability of extra-tropical cyclones and shows significantly enhanced skill compared to the "uncoupled" MPI-ESM standalone system. This study evaluates a set of hindcast experiments performed with said Two-Way Coupled model system. The regional model COSMO CLM is Two-Way Coupled to the atmosphere of the global Max Planck Institute Earth System Model (MPI-ESM) and therefore integrates and exchanges the state of the atmosphere every 10 minutes (MPI-TWC-ESM). In the coupled source region (North Atlantic), mesoscale processes which are relevant for the formation and early-stage development of cyclones are expected to be better represented, and therefore to influence the large scale dynamics of the target region (Europe). The database covers 102 "uncoupled" years and 102 Two-Way Coupled years of the recent climate (1960-2010). Results are validated against the ERA-Interim reanalysis. Besides the climatological point of view, the design of this single model ensemble allows for an analysis of the predictability of the first and second lead years of the hindcasts. As a first step to understanding the improved predictability of cyclones, we will show a detailed analysis of climatologies for specific cyclone categories, sorted by season and region. Especially for cyclones affecting Europe, the TWC is capable of counteracting the AOGCM's biases in the North Atlantic. 
Also, cyclones which are generated in the northern North Atlantic and the Labrador Sea are strongly underestimated in the "uncoupled" MPI-ESM; for the latter region the TWC can compensate for this shortcoming. In the Northern Hemisphere annual mean statistics the TWC does not change the distribution of the strength of cyclones, but it changes the distribution of the lifetime of cyclones.
A portable MPI-based parallel vector template library
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.
1995-01-01
This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
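A hedged, single-process sketch of the three components described above (a generic collection class, generic algorithms over collections, and algebraic combining functions); the class and method names are invented for this illustration, and no actual MPI distribution across address spaces is shown:

```python
from functools import reduce

class ParVector:
    """Toy stand-in for the paper's generic collection class.
    A real implementation would shard the data across MPI ranks;
    here everything lives in a single address space."""

    def __init__(self, data):
        self._data = list(data)

    def map(self, fn):
        # elementwise transform: one generic algorithm over the collection
        return ParVector(fn(x) for x in self._data)

    def reduce(self, combine, identity):
        # `combine` plays the role of the generic algebraic combining
        # function; it must be associative so that per-shard partial
        # results can be merged in any grouping on a parallel machine
        return reduce(combine, self._data, identity)

# sum of squares of 1..5, expressed as map followed by reduce
v = ParVector(range(1, 6))
total = v.map(lambda x: x * x).reduce(lambda a, b: a + b, 0)
```

The restricted programming model the paper argues for shows up here in the requirement that the combining function be associative: only then is the result independent of how elements are partitioned across processors.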
Global magnetohydrodynamic simulations on multiple GPUs
NASA Astrophysics Data System (ADS)
Wong, Un-Hong; Wong, Hon-Cheng; Ma, Yonghui
2014-01-01
Global magnetohydrodynamic (MHD) models play a major role in investigating the solar wind-magnetosphere interaction. However, the huge computational requirement of global MHD simulations is also the main problem that needs to be solved. With the recent development of modern graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), it is possible to perform global MHD simulations in a more efficient manner. In this paper, we present a global MHD simulator on multiple GPUs using CUDA 4.0 with GPUDirect 2.0. Our implementation is based on the modified leapfrog scheme, which is a combination of the leapfrog scheme and the two-step Lax-Wendroff scheme. GPUDirect 2.0 is used in our implementation to drive multiple GPUs. All data transfers and kernel processing are managed with the CUDA 4.0 API instead of using MPI or OpenMP. Performance measurements are made on a multi-GPU system with eight NVIDIA Tesla M2050 (Fermi architecture) graphics cards. These measurements show that our multi-GPU implementation achieves a peak performance of 97.36 GFLOPS in double precision.
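The simulator's modified leapfrog scheme itself is not reproduced here; as a hedged serial sketch of one of its ingredients, the two-step (Richtmyer) Lax-Wendroff scheme for 1-D linear advection can be written as follows (all names and parameters are illustrative, and the CUDA/multi-GPU machinery is omitted entirely):

```python
import numpy as np

def lax_wendroff_step(u, a, dt, dx):
    """One two-step (Richtmyer) Lax-Wendroff update for u_t + a*u_x = 0
    on a periodic 1-D grid."""
    # step 1: provisional half-step values at cell faces i+1/2
    u_r = np.roll(u, -1)  # u[i+1]
    u_half = 0.5 * (u + u_r) - 0.5 * a * dt / dx * (u_r - u)
    # step 2: corrector using the half-step face values; conservative
    # form, so the total of u is preserved exactly on a periodic grid
    return u - a * dt / dx * (u_half - np.roll(u_half, 1))

# advect a Gaussian pulse once around a periodic unit domain
n, a = 100, 1.0
dx = 1.0 / n
dt = 0.5 * dx / a               # CFL number 0.5
x = np.arange(n) * dx
u = np.exp(-100 * (x - 0.5) ** 2)
for _ in range(2 * n):          # total time 2n*dt = 1.0, one full period
    u = lax_wendroff_step(u, a, dt, dx)
```

After one period the pulse returns near its starting position with only slight numerical dispersion, which is the property that makes schemes of this family attractive for MHD transport steps.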
Davide Rassati; Massimo Faccoli; Lorenzo Marini; Robert A. Haack; Andrea Battisti; Edoardo Petrucco Toffolo
2015-01-01
Non-native wood-boring beetles (Coleoptera) represent one of the most commonly intercepted groups of insects at ports worldwide. The development of early detection methods is a crucial step when implementing rapid response programs so that non-native wood-boring beetles can be quickly detected and a timely action plan can be produced. However, due to the limited...
Advances in Parallel Computing and Databases for Digital Pathology in Cancer Research
2016-11-13
these technologies and how we have used them in the past. We are interested in learning more about the needs of clinical pathologists as we continue to...such as image processing and correlation. Further, High Performance Computing (HPC) paradigms such as the Message Passing Interface (MPI) have been...Defense for Research and Engineering. such as pMatlab [4], or bcMPI [5] can significantly reduce the need for deep knowledge of parallel computing. In
Ishihara, Masaru; Onoguchi, Masahisa; Taniguchi, Yasuyo; Shibutani, Takayuki
2017-12-01
The aim of this study was to clarify the differences in thallium-201-chloride (thallium-201) myocardial perfusion imaging (MPI) scans evaluated by conventional Anger-type single-photon emission computed tomography (conventional SPECT) versus cadmium-zinc-telluride SPECT (CZT SPECT) imaging in normal databases for different ethnic groups. MPI scans from 81 consecutive Japanese patients were examined using conventional SPECT and CZT SPECT and analyzed with the pre-installed quantitative perfusion SPECT (QPS) software. We compared the summed stress score (SSS), summed rest score (SRS), and summed difference score (SDS) for the two SPECT devices. For a normal MPI reference, we usually use Japanese databases for MPI created by the Japanese Society of Nuclear Medicine, which can be used with conventional SPECT but not with CZT SPECT. In this study, we used new Japanese normal databases constructed in our institution to compare conventional and CZT SPECT. Compared with conventional SPECT, CZT SPECT showed lower SSS (p < 0.001), SRS (p = 0.001), and SDS (p = 0.189) using the pre-installed SPECT database. In contrast, CZT SPECT showed no significant difference from conventional SPECT in QPS analysis using the normal databases from our institution. Myocardial perfusion analyses by CZT SPECT should be evaluated using normal databases based on the ethnic group being evaluated.
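For readers unfamiliar with the scores compared above: assuming the standard semiquantitative convention (per-segment tracer uptake scored from 0 = normal to 4 = absent, typically over a 17-segment model), the summed scores are related as SDS = SSS - SRS. A minimal sketch with hypothetical patient data:

```python
def summed_scores(stress, rest):
    """Summed stress/rest/difference scores from per-segment uptake
    scores (0 = normal ... 4 = absent), standard 17-segment model."""
    assert len(stress) == len(rest)
    sss = sum(stress)           # summed stress score
    srs = sum(rest)             # summed rest score
    sds = sss - srs             # summed difference score: reversibility
    return sss, srs, sds

# hypothetical patient: a largely reversible defect in three segments
stress = [0] * 14 + [3, 2, 2]
rest = [0] * 14 + [1, 0, 0]
```

A high SDS relative to SRS indicates ischemia that recovers at rest, whereas matched stress and rest defects (SDS near zero) suggest scar.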
Dynamic CT perfusion imaging of the myocardium: a technical note on improvement of image quality.
Muenzel, Daniela; Kabus, Sven; Gramer, Bettina; Leber, Vivian; Vembar, Mani; Schmitt, Holger; Wildgruber, Moritz; Fingerle, Alexander A; Rummeny, Ernst J; Huber, Armin; Noël, Peter B
2013-01-01
To improve image and diagnostic quality in dynamic CT myocardial perfusion imaging (MPI) by using motion compensation and a spatio-temporal filter. Dynamic CT MPI was performed using a 256-slice multidetector computed tomography (MDCT) scanner. Data from two different patients, one with and one without myocardial perfusion defects, were evaluated to illustrate potential improvements for MPI (institutional review board approved). Three datasets for each patient were generated: (i) original data, (ii) motion compensated data, and (iii) motion compensated data with spatio-temporal filtering performed. In addition to the visual assessment of the tomographic slices, noise and contrast-to-noise ratio (CNR) were measured for all data. Perfusion analysis was performed using time-density curves with regions of interest (ROI) placed in normal and hypoperfused myocardium. Precision in the definition of normal and hypoperfused areas was determined in corresponding coloured perfusion maps. The use of motion compensation followed by spatio-temporal filtering resulted in better alignment of the cardiac volumes over time, leading to more consistent perfusion quantification and improved detection of the extent of perfusion defects. Additionally, image noise was reduced by 78.5%, with CNR improvements by a factor of 4.7. The average effective radiation dose estimate was 7.1±1.1 mSv. The use of motion compensation and spatio-temporal smoothing will result in improved quantification of dynamic CT MPI using a latest generation CT scanner.