Characterizing Task-Based OpenMP Programs
Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats
2015-01-01
Programmers struggle to understand the performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely-used Barcelona OpenMP Tasks Suite. By using our method to characterize task-based performance, programmers can tune performance faster and understand performance tradeoffs more effectively than with existing tools. PMID:25860023
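To make the task-based style concrete, here is a minimal C sketch in the spirit of the Fibonacci kernel from the Barcelona OpenMP Tasks Suite. It is not the suite's code or the authors' tooling; the function names and the `cutoff` parameter are hypothetical. The cutoff is one simple knob for trading exposed task parallelism against per-task overhead: below it, recursion continues serially instead of spawning tasks. Without `-fopenmp` the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>

/* Recursive task body (hypothetical example, not BOTS source). */
static long fib_task(int n, int cutoff)
{
    assert(n >= 0);
    if (n < 2)
        return n;
    /* Below the cutoff, recurse serially to avoid creating tiny tasks. */
    if (n < cutoff)
        return fib_task(n - 1, cutoff) + fib_task(n - 2, cutoff);

    long x, y;
    /* Two child tasks; x and y live on this task's stack, so they are shared. */
    #pragma omp task shared(x)
    x = fib_task(n - 1, cutoff);
    #pragma omp task shared(y)
    y = fib_task(n - 2, cutoff);
    /* Wait for both children before reading x and y. */
    #pragma omp taskwait
    return x + y;
}

/* One thread creates the root task; the whole team executes the task tree. */
long fib_parallel(int n, int cutoff)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n, cutoff);
    return result;
}
```

For example, `fib_parallel(20, 10)` returns 6765 for any thread count; raising the cutoff produces fewer, larger tasks.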
A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, C; Quinlan, D; Panas, T
2010-01-25
OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers for optimal performance and to target modern hardware architectures. A variety of extensible and robust research compilers are key to OpenMP's sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran, using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE's internal representation to handle all of the OpenMP 3.0 constructs and facilitate their manipulation. Since OpenMP research is often complicated by the tight coupling of the compiler translations and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries. This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni and present comparative performance results against other OpenMP compilers.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...
2013-01-01
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance by up to 3X compared to the Intel OpenMP task scheduler.
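Read literally, the definition of work time inflation above is a simple difference. As a sketch with hypothetical symbols (not the authors' notation), let $W_t$ be the time thread $t$ spends doing useful work in a $P$-thread run and $W_{\text{seq}}$ the time the same work takes in a sequential run:

```latex
\Delta W \;=\; \sum_{t=1}^{P} W_t \;-\; W_{\text{seq}}
```

With made-up numbers: if four threads each spend 30 s on work that takes 100 s sequentially, then $\Delta W = 4 \times 30 - 100 = 20$ s, so 20% of the sequential work time is lost to inflation, separately from any idleness or scheduling overhead.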
OpenMP 4.5 Validation and Verification Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pophale, Swaroop S; Bernholdt, David E; Hernandez, Oscar R
2017-12-15
OpenMP, a directive-based programming API, introduces directives for accelerator devices that programmers are starting to use more frequently in production codes. To make sure OpenMP directives work correctly across architectures, it is critical to have a mechanism that tests an implementation's conformance to the OpenMP standard. This testing process can uncover ambiguities in the OpenMP specification, which helps compiler developers and users make better use of the standard. We address the need for such a mechanism through our validation and verification test suite, which focuses on the offload directives available in OpenMP 4.5.
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
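CAPO's generated code is not reproduced here. As a hand-written illustration (not CAPO output), the loop below shows the variable classes named above expressed as OpenMP clauses: the arrays `a` and `b` are shared, the temporary `t` is private, `sum` is a `+` reduction, and the loop index `i` is the induction variable, which OpenMP makes private automatically. `default(none)` forces every variable's scope to be stated explicitly, which is exactly the information an automatic tool has to infer. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns sum over i of (scale * a[i]) * b[i]. */
double scaled_dot(const double *a, const double *b, size_t n, double scale)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;  /* reduction variable */
    double t;          /* per-iteration temporary: must be private */
    long i;            /* induction variable: predetermined private */

    #pragma omp parallel for default(none) shared(a, b, n, scale) private(t) reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        t = scale * a[i];
        sum += t * b[i];
    }
    return sum;
}
```

With `a = {1, 2, 3}`, `b = {4, 5, 6}` and `scale = 2.0`, the call returns 64.0.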
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
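The NanosCompiler thread-group extensions are not standard OpenMP, so this sketch uses only standard nested `parallel` regions to show the two-level structure: an outer level over zones and an inner level over the points of each zone. The function name and the contiguous zone-major data layout are assumptions for illustration. Whether the inner regions actually get extra threads depends on the runtime's nesting settings (for example `OMP_MAX_ACTIVE_LEVELS`); the result is the same either way.

```c
#include <assert.h>

/* out[z] = sum of the npts values belonging to zone z (hypothetical layout:
   zone z occupies data[z*npts .. z*npts + npts - 1]). */
void zone_sums(const double *data, int nzones, int npts, double *out)
{
    assert(nzones >= 0 && npts >= 0);

    /* Outer level: zones are distributed over a team of (up to) 2 threads. */
    #pragma omp parallel for num_threads(2)
    for (int z = 0; z < nzones; z++) {
        double s = 0.0;  /* private to the outer iteration */

        /* Inner level: points of one zone, reduced into s. */
        #pragma omp parallel for num_threads(2) reduction(+:s)
        for (int p = 0; p < npts; p++)
            s += data[(long)z * npts + p];

        out[z] = s;
    }
}
```

For `data = {1, 2, 3, 4, 5, 6}` with two zones of three points, `out` becomes `{6, 15}`.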
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2016-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
Experiences using OpenMP based on Computer Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland
2003-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
NASA Technical Reports Server (NTRS)
Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)
2002-01-01
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
Support of Multidimensional Parallelism in the OpenMP Programming Model
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele
2003-01-01
OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
NASA Technical Reports Server (NTRS)
Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele
2004-01-01
In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Atzeni, Simone; Ahn, Dong; Gopalakrishnan, Ganesh
2017-01-12
Archer is built on top of the LLVM/Clang compilers that support OpenMP. It applies static and dynamic analysis techniques to detect data races in OpenMP programs while incurring very low runtime and memory overhead. Static analyses identify data-race-free OpenMP regions and exclude them from runtime analysis, which is performed by ThreadSanitizer, included in LLVM/Clang.
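A minimal C illustration of the kind of defect Archer targets (hypothetical code, not from the paper). The loop in the comment updates a shared counter without synchronization, which a ThreadSanitizer-based tool would report as a race when the program is built with OpenMP enabled; the live function removes the race with a `reduction` clause, so each thread counts privately and the runtime combines the partial counts.

```c
#include <assert.h>

/* Racy version, shown only as a comment:

       int count = 0;
       #pragma omp parallel for
       for (int i = 0; i < n; i++)
           if (v[i] > threshold)
               count++;      (unsynchronized read-modify-write on shared count)
*/

/* Race-free version: count how many elements exceed threshold. */
int count_above(const int *v, int n, int threshold)
{
    assert(n >= 0);
    int count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; i++)
        if (v[i] > threshold)
            count++;
    return count;
}
```

For `v = {1, 5, 7, 2, 9}` and a threshold of 4, the function returns 3 regardless of the number of threads.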
Toward Enhancing OpenMP's Work-Sharing Directives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, B M; Huang, L; Jin, H
2006-05-17
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
Automatic Multilevel Parallelization Using OpenMP
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)
2002-01-01
In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gyllenhaal, J.
CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default, but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong scaling performance results for 4, 8, or 16 threads are of the most interest. Suggested weak scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on a subset of node hardware where low OpenMP overheads are most achievable (e.g., within one coherence domain or NUMA domain).
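CLOMP is far more configurable than this, but the basic idea of measuring OpenMP overhead can be sketched in a few lines of C: time many empty `parallel` regions and divide by the repetition count. This is a simplified, hypothetical microbenchmark, not CLOMP. It uses `omp_get_wtime()` when compiled with OpenMP and falls back to `clock()` otherwise, in which case the pragmas are ignored and the measured overhead is essentially zero.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds under OpenMP; processor time in a serial build. */
static double seconds_now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Average cost, in seconds, of entering and leaving an empty parallel region. */
double parallel_region_overhead(int reps)
{
    assert(reps > 0);

    /* Warm-up, so one-time thread creation is not charged to the loop. */
    #pragma omp parallel
    { }

    double t0 = seconds_now();
    for (int r = 0; r < reps; r++) {
        #pragma omp parallel
        { }
    }
    double t1 = seconds_now();
    return (t1 - t0) / reps;
}
```

Running it with different `OMP_NUM_THREADS` values, and with threads pinned inside one NUMA domain versus spread across domains, is one way to see the kind of overhead differences the abstract describes.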
On a model of three-dimensional bursting and its parallel implementation
NASA Astrophysics Data System (ADS)
Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.
2008-04-01
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method on equally-spaced grids that is second-order accurate in both space and time. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barbara Chapman
OpenMP was not well recognized at the beginning of the project, around 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. In recent years, however, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standards organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gyllenhaal, J.; Bronevetsky, G.
2007-05-25
CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading (like NUMA memory layouts, memory contention, cache effects, etc.) in order to influence future system design. Current best-in-class implementations of OpenMP have overheads at least ten times larger than is required by many of our applications for effective use of OpenMP. This benchmark shows the significant negative performance impact of these relatively large overheads and of other thread effects. The CLOMP benchmark is highly configurable to allow a variety of problem sizes and threading effects to be studied, and it carefully checks its results to catch many common threading errors. This benchmark is expected to be included as part of the Sequoia Benchmark suite for the Sequoia procurement.
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores
NASA Astrophysics Data System (ADS)
Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei
We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.
A Programming Model Performance Study Using the NAS Parallel Benchmarks
Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...
2010-01-01
Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.
A multi-threaded version of MCFM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campbell, John M.; Ellis, R. Keith; Giele, Walter T.
We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We then modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Furthermore, we took special care so that the results of the Monte Carlo integration were independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.
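The MCFM/VEGAS changes themselves are not shown in the abstract. This is a hypothetical C sketch of one common way to make a parallel sum independent of the thread count: split the samples into a fixed number of chunks (`NCHUNKS`, an assumption), compute each chunk's partial sum in parallel, and then combine the partials serially in chunk order. An OpenMP `reduction` clause does not specify the combination order, so floating-point totals can differ slightly between thread counts. The midpoint-rule integrand here merely stands in for event evaluation.

```c
#include <assert.h>

#define NCHUNKS 64  /* fixed chunk count; deliberately independent of thread count */

/* Example integrand used in the usage note below. */
double square(double x) { return x * x; }

/* Midpoint-rule estimate of the integral of f over [0,1] with n points. */
double integrate_deterministic(double (*f)(double), long n)
{
    assert(n > 0);
    double partial[NCHUNKS];

    /* Each chunk's partial sum is computed independently, possibly by
       different threads; the iteration order inside a chunk is fixed. */
    #pragma omp parallel for schedule(static)
    for (int c = 0; c < NCHUNKS; c++) {
        long lo = n * c / NCHUNKS;
        long hi = n * (c + 1) / NCHUNKS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += f((i + 0.5) / (double)n);
        partial[c] = s;
    }

    /* Serial combination in chunk order: same rounding for any thread count. */
    double total = 0.0;
    for (int c = 0; c < NCHUNKS; c++)
        total += partial[c];
    return total / (double)n;
}
```

`integrate_deterministic(square, 1000)` gives a value just below 1/3 and returns the same bits whether it runs on one thread or many.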
Early Experiences Writing Performance Portable OpenMP 4 Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Hernandez, Oscar R
In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the first-touch policy used in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
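For readers unfamiliar with the first-touch approach mentioned above, here is a hypothetical C sketch. On many NUMA operating systems a page is physically placed near the thread that first writes it, so initializing data in parallel with the same `schedule(static)` as the compute loop tends to keep each thread's portion of the array in local memory. Page placement is decided by the OS, not by OpenMP, so treat this as a heuristic; the function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles and touch them in parallel (static schedule).
   Returns NULL on allocation failure. */
double *alloc_first_touch(long n)
{
    assert(n > 0);
    double *x = malloc((size_t)n * sizeof *x);
    if (x == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        x[i] = 0.0;  /* the first write typically determines page placement */
    return x;
}

/* y += a * x, using the same static schedule as the initialization, so each
   thread tends to work on the pages it touched first. */
void axpy(double *y, const double *x, double a, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

A serial initialization loop would instead place all pages near the master thread, which is the locality problem that both first touch and explicit data mapping try to address.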
Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mishra, Alok; Li, Lingda; Kong, Martin
The latest OpenMP standard offers automatic device offloading capabilities which facilitate GPU programming. Despite this, there remain many challenges. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for unified memory space. In such systems, CPU and GPU can access each other's memory transparently, that is, the data movement is managed automatically by the underlying system software and hardware. Memory oversubscription is also possible in these systems. However, there is a significant lack of knowledge about how this mechanism will perform, and how programmers should use it. We have modified several benchmark codes in the Rodinia benchmark suite to study the behavior of OpenMP accelerator extensions and have used them to explore the impact of unified memory in an OpenMP context. We also modified the open source LLVM compiler to allow OpenMP programs to exploit unified memory. The results of our evaluation reveal that, while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers from significant overhead when GPU memory is oversubscribed for benchmarks with a large amount of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.
2003-01-01
Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP
NASA Technical Reports Server (NTRS)
Norton, C.; Balsara, D.
1999-01-01
Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programming can be performed in an object-oriented way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.
OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms
Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...
2016-09-21
In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalues/eigenvectors of the graph Laplacian, and this is a self-contained module that can be used in conjunction with other graph-Laplacian-based methods such as spectral clustering. We use performance tools to collect the hotspots and memory access of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.
Shrimankar, D. D.; Sathe, S. R.
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas MPI programs can span multiple SMP nodes at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the communication overhead in OpenMP loops. Our results are of interest for a large variety of input data files. We have developed our own load-balancing and cache-optimization techniques for the message passing model. Our experimental results show that these techniques give the best performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
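Because this abstract emphasizes speedup and efficiency, the standard definitions are worth stating (generic symbols, not necessarily the authors'): with $T_1$ the run time on one core and $T_p$ the run time on $p$ cores,

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

For instance, hypothetical times of $T_1 = 120$ s and $T_4 = 40$ s give $S_4 = 3$ and $E_4 = 0.75$. Speedup is a ratio of times and therefore has no time unit.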
OpenMP performance for benchmark 2D shallow water equations using LBM
NASA Astrophysics Data System (ADS)
Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry
2018-03-01
The shallow water equations, commonly referred to as the Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically using several methods, such as the Lattice Boltzmann method (LBM), SIMPLE-like methods, the Finite Difference Method, Godunov-type methods, and the Finite Volume Method. In this paper, the shallow water equations are approximated using the LBM (an approach known as LABSWE), and the simulation is parallelized using OpenMP. To evaluate the performance of the parallel algorithm with 2 and 4 threads, ten different grid sizes Lx × Ly are examined. The results show that using OpenMP reduces the computational time for solving LABSWE. For instance, for a grid size of 1000 × 500, the reported speedup figures for 2 and 4 threads are 93.54 s and 333.243 s, respectively.
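For reference, the conventional definitions of speedup and parallel efficiency are given below. Here $T_1$ is the run time with one thread and $T_p$ is the run time with $p$ threads. Speedup defined this way is a dimensionless ratio.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}
```

As a purely illustrative example with made-up timings, $T_1 = 100$ s and $T_4 = 30$ s give $S_4 \approx 3.33$ and $E_4 \approx 0.83$.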
NASA Astrophysics Data System (ADS)
Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav
2017-10-01
In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and the parallelization approaches used in them. Our approach includes analysis of each module's functionality, identification of source code segments suitable for parallelization, and proper application of OpenMP parallelization code to create efficient threads that process the subtasks. We document the efficiency of the solutions using airborne laser scanning data representing the land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speed on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from the original modules. The presented approach demonstrated the simplicity and efficiency of parallelizing open-source GRASS GIS modules using OpenMP, leading to increased performance of this geospatial software on standard multi-core computers.
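Raster operations in which each output cell depends only on a read-only neighborhood of input cells are typical of code segments that parallelize cleanly with OpenMP. The sketch below is hypothetical. It is not taken from v.surf.rst, r.sun or r.sim.water, and `slope_components` is an invented name. It computes central-difference gradient components of an elevation grid and splits the rows across threads.

```c
#include <stddef.h>

/* Hypothetical raster layout: row-major array of rows * cols cells.
   Interior cells receive gradient components; border cells are untouched. */
void slope_components(const double *z, double *gx, double *gy,
                      int rows, int cols, double cellsize) {
    /* Each output cell reads only its neighbors in z and writes only its own
       entry in gx and gy, so rows can go to different threads without locks. */
    #pragma omp parallel for schedule(dynamic, 8)
    for (int r = 1; r < rows - 1; r++) {
        for (int c = 1; c < cols - 1; c++) {
            size_t k = (size_t)r * cols + c;
            gx[k] = (z[k + 1] - z[k - 1]) / (2.0 * cellsize);        /* d/dx */
            gy[k] = (z[k + cols] - z[k - cols]) / (2.0 * cellsize);  /* d/dy */
        }
    }
}
```

`schedule(dynamic, 8)` is one option when per-row work varies, as it can in modules whose per-cell cost is data dependent. For uniform work like this, `schedule(static)` would serve equally well.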
A numerical differentiation library exploiting parallel architectures
NASA Astrophysics Data System (ADS)
Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.
2009-08-01
We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered, resulting in corresponding formulas that are accurate to order $O(h)$, $O(h^2)$, and $O(h^4)$, $h$ being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used when bound constraints are imposed on the variables. The Hessian may be approximated either from function values or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters as well as modern multi-core systems. Owing to the independent character of the derivative computations, the speedup scales almost linearly with the number of available processors/cores.
Program summary
Program title: NDL (Numerical Differentiation Library)
Catalogue identifier: AEDG_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 73 030
No. of bytes in distributed program, including test data, etc.: 630 876
Distribution format: tar.gz
Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP
Computer: Distributed systems (clusters), shared memory systems
Operating system: Linux, Solaris
Has the code been vectorised or parallelized?: Yes
RAM: The library uses O(N) internal storage, N being the dimension of the problem
Classification: 4.9, 4.14, 6.5
Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization and the solution of nonlinear systems. A parallel implementation that exploits systems with multiple CPUs is very important for large-scale and computationally expensive problems.
Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries.
Restrictions: The library uses only double precision arithmetic.
Unusual features: The software takes bound constraints into account, in the sense that only feasible points are used to evaluate the derivatives, and, given the desired level of accuracy, the proper formula is automatically employed.
Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
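Each partial derivative is computed independently, so the gradient loop distributes naturally across threads. Below is a minimal sketch in C, assuming a thread-safe objective function. The names `central_gradient`, `objective_fn` and `sample_quadratic` are hypothetical and do not reflect NDL's actual interface. The sketch uses the central-difference formula $\partial f/\partial x_i \approx [f(x + h e_i) - f(x - h e_i)]/(2h)$, which is accurate to $O(h^2)$.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical objective signature; f must be safe to call concurrently. */
typedef double (*objective_fn)(const double *x, size_t n);

/* Example objective: f(x) = sum_i (i + 1) * x_i^2, gradient 2 (i + 1) x_i. */
double sample_quadratic(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += (double)(i + 1) * x[i] * x[i];
    return s;
}

/* O(h^2) central-difference gradient. Returns 0 on success, -1 if a thread
   could not allocate its work vector (affected entries are then unset). */
int central_gradient(objective_fn f, const double *x, size_t n,
                     double h, double *grad) {
    if (n == 0) return 0;
    int failed = 0;
    #pragma omp parallel
    {
        /* Thread-private copy of x, perturbed one component at a time. */
        double *xt = malloc(n * sizeof *xt);
        if (xt == NULL) {
            #pragma omp atomic write
            failed = 1;
        } else {
            memcpy(xt, x, n * sizeof *xt);
        }
        #pragma omp for
        for (long i = 0; i < (long)n; i++) {
            if (xt == NULL) continue;
            double xi = xt[i];
            xt[i] = xi + h;
            double fp = f(xt, n);
            xt[i] = xi - h;
            double fm = f(xt, n);
            xt[i] = xi;                        /* restore before the next component */
            grad[i] = (fp - fm) / (2.0 * h);
        }
        free(xt);
    }
    return failed ? -1 : 0;
}
```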
Multilevel Parallelization of AutoDock 4.2.
Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P
2011-04-28
Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales nearly linearly on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs and, when combined with MPI parallelization, can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
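A common way to multithread a genetic algorithm is to evaluate the fitness of every individual in parallel and keep selection serial. The C sketch below illustrates that pattern only. `toy_energy` is a hypothetical stand-in for a docking energy evaluation, and the sketch does not claim to reproduce how mpAD4 parallelizes the Lamarckian GA.

```c
#include <stddef.h>

/* Hypothetical stand-in for an expensive, thread-safe energy evaluation. */
static double toy_energy(double x) { return (x - 3.0) * (x - 3.0); }

/* Evaluates all pop (> 0) individuals in parallel, then returns the index of
   the lowest-energy one, found serially. */
size_t evaluate_population(const double *genes, double *energy, size_t pop) {
    /* Each iteration writes only energy[i], so no synchronization is needed.
       schedule(dynamic) helps when evaluation cost varies per individual. */
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)pop; i++)
        energy[i] = toy_energy(genes[i]);

    size_t best = 0;
    for (size_t i = 1; i < pop; i++)
        if (energy[i] < energy[best]) best = i;
    return best;
}
```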
BLESS 2: accurate, memory-efficient and fast error correction method.
Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei; Ma, Jian; Chen, Deming
2016-08-01
The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory footprint. The new version, called BLESS 2, has an error correction algorithm that is more accurate than that of BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools and was found to be the fastest when executed on two computing nodes using MPI, with each node containing twelve cores. BLESS 2 also showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Freely available at https://sourceforge.net/projects/bless-ec (contact: dchen@illinois.edu). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
MPI, HPF or OpenMP: A Study with the NAS Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1999-01-01
Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and would be simplified even further if automated by parallelizing tools and compilers. The definitions of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards have offered great opportunities in this respect. Both provide simple and clear interfaces to languages like FORTRAN and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Their performance will be compared with the MPI implementation. The pros and cons of the different approaches will be discussed, along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of these techniques to realistic aerospace applications will be presented.
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor
NASA Astrophysics Data System (ADS)
Hristov, Ivan; Goranov, Goran; Hristova, Radoslava
2018-02-01
We consider a standing wave simulation, interesting from a computational point of view, obtained by solving coupled 2D perturbed Sine-Gordon equations. We develop an OpenMP implementation that exploits both thread-level and SIMD-level parallelism. We test the OpenMP program on two energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.
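Combining thread-level and SIMD-level parallelism in OpenMP usually means a `parallel for` on an outer loop and a `simd` directive on the innermost one. The sketch below applies this pattern to a five-point Laplacian, the spatial operator that a 2D Sine-Gordon time step would need. The function name and grid layout are assumptions, and the sine and time-stepping terms are omitted.

```c
#include <stddef.h>

/* Five-point Laplacian of u on an nx-by-ny row-major grid with spacing h.
   Interior points only; border entries of lap are left untouched. */
void laplacian_5pt(const double *restrict u, double *restrict lap,
                   int nx, int ny, double h) {
    const double inv_h2 = 1.0 / (h * h);
    /* Thread level: rows are divided among threads. */
    #pragma omp parallel for
    for (int j = 1; j < ny - 1; j++) {
        const double *row = u + (size_t)j * nx;
        double *out = lap + (size_t)j * nx;
        /* SIMD level: iterations within a row are independent. */
        #pragma omp simd
        for (int i = 1; i < nx - 1; i++)
            out[i] = (row[i - 1] + row[i + 1] + row[i - nx] + row[i + nx]
                      - 4.0 * row[i]) * inv_h2;
    }
}
```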
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA, with the aim of significantly shortening the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved.
The Research of the Parallel Computing Development from the Angle of Cloud Computing
NASA Astrophysics Data System (ADS)
Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun
2017-10-01
Cloud computing grew out of parallel computing, distributed computing and grid computing, and its development has brought parallel computing into people's everyday lives. First, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Second, it analyzes the principles, advantages and disadvantages of OpenMP, MPI and MapReduce, respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the perspective of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
Testing New Programming Paradigms with NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim at scaling up to thousands of processors on both distributed and shared memory systems. Developing parallel programs for these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress of the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3.
Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. To overcome the lack of an HPF performance model and to guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The figure on the next page compares the performance of six HPF and OpenMP benchmarks with their MPI counterparts for the Class-A problem size. These results were obtained on an SGI Origin2000 (195 MHz). The OpenMP and MPI codes were compiled with MIPSpro-f77 compiler 7.2.1, and the HPF programs with the PGI pghpf-2.4.3 compiler with MPI interface.
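The difference between outer-most and inner loop parallelization can be illustrated with a simple loop nest. The NPB codes themselves are Fortran, and `matmul_outer` is a hypothetical example. This C sketch places the directive on the outermost loop, so each thread receives whole rows of the result. Placing it on the innermost `j` loop instead would fork and join a team once per `(i, k)` pair, giving far smaller grains.

```c
#include <stddef.h>

/* C = A * B for row-major n-by-n matrices. One parallel region covers the
   whole nest; each thread computes complete rows of C (coarse granularity). */
void matmul_outer(const double *A, const double *B, double *C, int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) C[(size_t)i * n + j] = 0.0;
        for (int k = 0; k < n; k++) {
            double aik = A[(size_t)i * n + k];
            for (int j = 0; j < n; j++)      /* inner loops stay serial per thread */
                C[(size_t)i * n + j] += aik * B[(size_t)k * n + j];
        }
    }
}
```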
What Scientific Applications can Benefit from Hardware Transactional Memory?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schindewolf, M; Bihari, B; Gyllenhaal, J
2012-06-04
Achieving efficient and correct synchronization of multiple threads is a difficult and error-prone task even at small scale. As we march towards extreme-scale computing, it will be even more challenging when the resulting application is supposed to utilize millions of cores efficiently. Transactional Memory (TM) is a promising technique to ease the burden on the programmer. However, it has only recently become available on commercial hardware, in the new Blue Gene/Q system, so its real benefit for realistic applications has not yet been studied. This paper presents the first performance results of TM embedded into OpenMP on a prototype system of BG/Q and characterizes code properties that will likely lead to benefits when augmented with TM primitives. First, we study the influence of thread count, environment variables and memory layout on TM performance and identify code properties that will yield performance gains with TM. Second, we evaluate the combination of OpenMP with multiple synchronization primitives on top of MPI to determine suitable task-to-thread ratios per node. Finally, we condense our findings into a set of best practices. These are applied to a Monte Carlo Benchmark (MCB) and a Smoothed Particle Hydrodynamics method. In both cases an optimized TM version, executed with 64 threads on one node, outperforms a simple TM implementation. MCB with optimized TM yields a speedup of 27.45 over baseline.
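Transactional memory targets fine-grained updates to shared data that rarely conflict. The sketch below does not use TM, and the BG/Q TM extensions are not shown. Instead it contrasts two standard OpenMP alternatives for such an update on a hypothetical histogram. The first uses an `atomic` on every increment. The second uses an OpenMP 4.5 array-section `reduction`, which gives each thread a private copy that is merged at the end. Keys are assumed to be non-negative.

```c
#include <stddef.h>
#include <string.h>

/* Shared bins; every increment is synchronized with an atomic update. */
void histogram_atomic(const int *keys, long n, long *bins, int nbins) {
    memset(bins, 0, (size_t)nbins * sizeof *bins);
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        int b = keys[i] % nbins;          /* keys assumed non-negative */
        #pragma omp atomic
        bins[b]++;
    }
}

/* OpenMP 4.5 array-section reduction: each thread counts into a private,
   zero-initialized copy of bins[0:nbins]; copies are summed at loop end. */
void histogram_reduction(const int *keys, long n, long *bins, int nbins) {
    memset(bins, 0, (size_t)nbins * sizeof *bins);
    #pragma omp parallel for reduction(+ : bins[0:nbins])
    for (long i = 0; i < n; i++)
        bins[keys[i] % nbins]++;
}
```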
Effective Vectorization with OpenMP 4.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham
This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines, which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher-level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C/C++ and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for producing optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that code often cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
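The OpenMP SIMD directives discussed above can be sketched as follows. `#pragma omp simd` asserts to the compiler that loop iterations may be executed in SIMD lanes; the compiler does not verify this. `reduction(+:sum)` permits the partial sums to be reordered. `declare simd` asks for a vector variant of a function that can be called from SIMD loops. The function names are hypothetical.

```c
/* Request a vector variant of this helper; `a` is the same in all lanes. */
#pragma omp declare simd uniform(a) notinbranch
static double axpy_elem(double a, double x, double y) { return a * x + y; }

/* y[i] = a * x[i] + y[i]; restrict tells the compiler x and y do not alias. */
void simd_axpy(int n, double a, const double *restrict x, double *restrict y) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = axpy_elem(a, x[i], y[i]);
}

/* Dot product; the reduction clause lets each lane keep a partial sum,
   so the floating-point summation order may differ from the serial loop. */
double simd_dot(int n, const double *x, const double *y) {
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```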
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology
NASA Astrophysics Data System (ADS)
Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.
2009-04-01
Currently, the main problem in tsunami modeling is the low speed of calculations, which is unacceptable for operational warning services. Existing algorithms for numerical modeling of the hydrodynamic processes of tsunami waves were developed without taking advantage of the capabilities of modern computer facilities. Parallel algorithms offer an opportunity to accelerate the calculations considerably. We discuss here a new approach to parallelizing a tsunami modeling code using OpenMP technology (for multiprocessing systems with shared memory). Nowadays, multiprocessing systems are easily accessible to everyone, and the cost of using such systems is much lower than the cost of clusters. This also allows programmers to apply multithreaded algorithms on researchers' desktop computers. Another important advantage of this approach is the shared-memory mechanism: there is no need to send data over slow networks (for example, Ethernet). All memory is common to all computing processes, which results in almost linear scalability of the program. In the new version of NAMI DANCE, OpenMP technology and a multithreaded algorithm provide an 80% gain in speed compared with the single-thread version on a dual-processor unit, and a 320% gain was attained on a four-core PC processor unit. Thus, it was possible to considerably reduce the calculation time on scientific workstations (desktops) without completely changing the program and user interfaces. Further modernization, using OpenMP, of the algorithms for preparing initial data and processing results appears reasonable. The final version of NAMI DANCE, with its increased computational speed, can be used not only for research purposes but also in real-time Tsunami Warning Systems.
Parallel processing implementation for the coupled transport of photons and electrons using OpenMP
NASA Astrophysics Data System (ADS)
Doerner, Edgardo
2016-05-01
In this work, the use of OpenMP to implement parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform that enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in Intel Parallel Studio XE 2015 (XE2015). The performance of this new implementation was studied on a desktop PC with a multi-core CPU, taking the performance of the original platform as a reference. The results were satisfactory, both in terms of scalability and parallelization efficiency.
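A recurring issue when parallelizing Monte Carlo transport with OpenMP is giving each history its own random stream. The C sketch below uses a toy absorption model, not EGSnrc physics, and hypothetical function names. Each history seeds a splitmix64 generator from its own index, so the tally is the same for any thread count or schedule. The transmitted count is combined with a `reduction`. For this model the expected transmission is $(1 - p_{abs})^{L}$, for example $0.9^{10} \approx 0.3487$ for 10 layers with $p_{abs} = 0.1$.

```c
#include <stdint.h>

/* splitmix64 generator step. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *state) {
    return (double)(splitmix64(state) >> 11) * 0x1.0p-53;
}

/* Toy model: a photon crosses `layers` slabs and is absorbed in each with
   probability p_abs. Returns the transmitted fraction of `histories`. */
double mc_transmission(long histories, int layers, double p_abs, uint64_t seed) {
    long transmitted = 0;
    #pragma omp parallel for schedule(static) reduction(+:transmitted)
    for (long h = 0; h < histories; h++) {
        /* Per-history stream: hash (seed, h) into a starting state, so the
           result does not depend on which thread runs this history. */
        uint64_t mix = seed ^ (uint64_t)h;
        uint64_t state = splitmix64(&mix);
        int alive = 1;
        for (int l = 0; l < layers && alive; l++)
            if (uniform01(&state) < p_abs) alive = 0;   /* absorbed in layer l */
        transmitted += alive;
    }
    return histories > 0 ? (double)transmitted / (double)histories : 0.0;
}
```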
Issues Identified During September 2016 IBM OpenMP 4.5 Hackathon
DOE Office of Scientific and Technical Information (OSTI.GOV)
Richards, David F.
In September 2016, IBM hosted an OpenMP 4.5 Hackathon at the TJ Watson Research Center. Teams from LLNL, ORNL, SNL, LANL, and LBNL attended the event. As with the 2015 hackathon, IBM produced an extremely useful and successful event with unmatched support from the compiler team, applications staff, and facilities. Approximately 24 IBM staff supported the 4-day hackathon and spent significant time in the 4-6 weeks beforehand preparing the environment and becoming familiar with the applications. This hackathon was also the first event to feature the LLVM and XL C/C++ and Fortran compilers. This report records many of the issues encountered by the LLNL teams during the hackathon.
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability limitations have affected the take-up of this programming model. Significant progress has been made in hardware and software technologies, and as a result the performance of parallel programs using compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work demonstrates not only the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
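The loop categories that a parallelization tool distinguishes can be illustrated with three small C loops. These are hypothetical examples, not output of the Greenwich toolkit. The independent loop and the reduction receive OpenMP directives. The loop-carried recurrence is left serial, because a plain `parallel for` would give wrong results.

```c
/* 1. Independent iterations: each writes only its own element of a. */
void loop_independent_scale(double *a, const double *b, double s, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) a[i] = s * b[i];
}

/* 2. Reduction: iterations combine into one scalar with an associative op. */
double loop_reduction_sum(const double *a, int n) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) total += a[i];
    return total;
}

/* 3. Loop-carried dependence: iteration i reads the value written by i-1,
   so this prefix sum is kept serial. */
void loop_recurrence_prefix(double *a, int n) {
    for (int i = 1; i < n; i++) a[i] += a[i - 1];
}
```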
Parallel protein secondary structure prediction based on neural networks.
Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi
2004-01-01
Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and ternary classifiers for protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. The hydrophobicity matrix, the orthogonal matrix, BLOSUM62 and the PSSM (position-specific scoring matrix) are each tested separately as encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not Helix (~H) for DBNN achieves a prediction accuracy of 87% when the PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of the best other prediction methods. The good test results for binary classifiers open a new approach to protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthreads and OpenMP are employed to parallelize DBNN on the hyperthreading-enabled Intel architecture. On the 4-processor shared memory architecture, the speedup is 4.9 with 16 Pthreads and 4 with 16 OpenMP threads. The speedups of both OpenMP and Pthreads are superior to those reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for the Intel architecture is efficient for parallel biological algorithms.
Numerical modeling of exciton-polariton Bose-Einstein condensate in a microcavity
NASA Astrophysics Data System (ADS)
Voronych, Oksana; Buraczewski, Adam; Matuszewski, Michał; Stobińska, Magdalena
2017-06-01
A novel, optimized numerical method for modeling an exciton-polariton superfluid in a semiconductor microcavity was proposed. Exciton-polaritons are spin-carrying quasiparticles formed from photons strongly coupled to excitons. They possess unique properties that are interesting both for fundamental research and for numerous potential applications. However, their numerical modeling is challenging due to the structure of the nonlinear differential equations describing their evolution. In this paper, we propose to solve the equations with a modified 4th-order Runge-Kutta method, further optimized for efficient computation. The algorithms were implemented in the form of C++ programs fitted for parallel environments and utilizing vector instructions. The programs form the EPCGP suite, which has been used for theoretical investigation of exciton-polaritons.
Catalogue identifier: AFBQ_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBQ_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: BSD-3
No. of lines in distributed program, including test data, etc.: 2157
No. of bytes in distributed program, including test data, etc.: 498994
Distribution format: tar.gz
Programming language: C++ with OpenMP extensions (main numerical program), Python (helper scripts)
Computer: Modern PC (tested on AMD and Intel processors), HP BL2x220
Operating system: Unix/Linux and Windows
Has the code been vectorized or parallelized?: Yes (OpenMP)
RAM: 200 MB for a single run
Classification: 7, 7.7
Nature of problem: An exciton-polariton superfluid is a novel, interesting physical system that allows investigation of high-temperature Bose-Einstein condensation of exciton-polaritons, quasiparticles carrying spin. They have attracted a lot of attention due to their unique properties and potential applications in polariton-based optoelectronic integrated circuits. This is an out-of-equilibrium quantum system confined within a semiconductor microcavity. It is described by a set of nonlinear differential equations similar in spirit to the Gross-Pitaevskii (GP) equation, but their unique properties prevent the use of standard GP solving frameworks. An accurate and efficient numerical algorithm, together with optimized numerical software, is necessary for effective theoretical investigation of exciton-polaritons.
Solution method: A 4th-order Runge-Kutta method was employed to solve the set of differential equations describing exciton-polariton superfluids. The method was fitted to the exciton-polariton equations and further optimized. The C++ programs use OpenMP extensions and vector operations in order to fully utilize the computer hardware.
Running time: 6 h for 100 ps of evolution, depending on the values of parameters
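The structure of an OpenMP-parallel RK4 step can be sketched in C. This is a minimal illustration under strong assumptions. The right-hand side is a hypothetical independent linear decay $dy_k/dt = -r\,y_k$ rather than the coupled complex polariton equations, and `rk4_integrate` is an invented name. Each stage is a loop over grid points with no cross-iteration dependence, so it can be divided among threads.

```c
#include <stdlib.h>

/* Hypothetical right-hand side: dy_i/dt = -rate * y_i for every grid point. */
static void decay_rhs(const double *y, double *dydt, int n, double rate) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        dydt[i] = -rate * y[i];
}

/* Advances y (n points) by `steps` classical RK4 steps of size dt.
   Returns 0 on success, -1 on allocation failure. */
int rk4_integrate(double *y, int n, double rate, double dt, int steps) {
    if (n <= 0) return 0;
    double *buf = malloc(5 * (size_t)n * sizeof *buf);
    if (buf == NULL) return -1;
    double *k1 = buf, *k2 = buf + n, *k3 = buf + 2 * (size_t)n,
           *k4 = buf + 3 * (size_t)n, *tmp = buf + 4 * (size_t)n;

    for (int s = 0; s < steps; s++) {
        decay_rhs(y, k1, n, rate);
        #pragma omp parallel for
        for (int i = 0; i < n; i++) tmp[i] = y[i] + 0.5 * dt * k1[i];
        decay_rhs(tmp, k2, n, rate);
        #pragma omp parallel for
        for (int i = 0; i < n; i++) tmp[i] = y[i] + 0.5 * dt * k2[i];
        decay_rhs(tmp, k3, n, rate);
        #pragma omp parallel for
        for (int i = 0; i < n; i++) tmp[i] = y[i] + dt * k3[i];
        decay_rhs(tmp, k4, n, rate);
        /* Combine the stages: y += dt/6 * (k1 + 2 k2 + 2 k3 + k4). */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    }
    free(buf);
    return 0;
}
```

Each parallel loop here forks and joins separately. A production code would more likely enclose the whole time step in one parallel region with `omp for` worksharing loops.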
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grain process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general-purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To utilise parallel hardware more effectively, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND, and assesses the performance of the parallel method implemented in MCBEND.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hager, Robert, E-mail: rhager@pppl.gov; Yoon, E.S., E-mail: yoone@rpi.edu; Ku, S., E-mail: sku@pppl.gov
2016-06-15
Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. In this article, the non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. The finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. The collision operator's good weak and strong scaling behavior is shown.
Hager, Robert; Yoon, E. S.; Ku, S.; ...
2016-04-04
Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. The non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. Moreover, the finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. As a result, the collision operator's good weak and strong scaling behavior is shown.
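Nested OpenMP parallelism, mentioned in both abstracts above, can be sketched as an outer parallel loop whose iterations each open an inner parallel region. The example is hypothetical (a row sum stands in for per-vertex collision work) and uses only standard OpenMP calls. `omp_set_max_active_levels(2)` is included because the default number of active nesting levels is implementation-defined and inner regions are often serialized otherwise.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum each row of a row-major nrows x ncols matrix. The outer loop plays
   the role of independent mesh vertices; the inner reduction plays the
   role of the per-vertex work that is itself parallelized. */
void row_sums_nested(const double *a, int nrows, int ncols, double *out) {
#ifdef _OPENMP
    omp_set_max_active_levels(2);   /* permit two active levels of parallelism */
#endif
    #pragma omp parallel for num_threads(2)
    for (int r = 0; r < nrows; r++) {
        double s = 0.0;
        /* Inner parallel region: s is shared here and reduced across threads. */
        #pragma omp parallel for reduction(+:s) num_threads(2)
        for (int c = 0; c < ncols; c++) s += a[r * ncols + c];
        out[r] = s;
    }
}
```

Without OpenMP the pragmas are ignored and the function runs serially with the same result.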
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...
2017-10-24
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS
NASA Astrophysics Data System (ADS)
Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.
2018-03-01
Simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is presented. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Here, two computer architectures (AMD and Intel) are compared. The Intel architecture yields shorter CPU times than the AMD architecture. However, the AMD machine achieves slightly higher parallel efficiency than the Intel machine. For a simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. With the same number of particles, the efficiency is 50.09% on AMD and up to 49.42% on Intel.
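The abstract reports parallel efficiency without defining it; the usual definitions are speedup S = T_serial / T_parallel and efficiency E = S / p for p threads. A minimal sketch follows. The thread counts used in the study are not stated here, so the numbers in the usage note are purely hypothetical.

```c
/* Speedup of a parallel run relative to the serial run. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of threads used.
   E = 1.0 means perfect linear scaling. */
double efficiency(double t_serial, double t_parallel, int threads) {
    return speedup(t_serial, t_parallel) / threads;
}
```

For example, hypothetical timings of 100 s serial and 50 s on 4 threads give S = 2 and E = 0.5, i.e. halving the run time on four threads corresponds to 50% efficiency.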
Roofline model toolkit: A practical tool for architectural and program analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian
We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI), with OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
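The Roofline bound itself is simple to state: attainable performance is the lower of peak compute throughput and arithmetic intensity (FLOPs per byte moved) times peak memory bandwidth. The C sketch below encodes that bound. The function names, units, and the triad byte count (24 bytes per iteration, ignoring write-allocate traffic) are assumptions for illustration, not part of the Roofline Toolkit.

```c
/* Roofline bound in GFLOP/s: min(peak compute, intensity * bandwidth).
   flops_per_byte is the kernel's arithmetic intensity. */
double roofline_gflops(double peak_gflops, double peak_gbytes_per_s,
                       double flops_per_byte) {
    double memory_bound = flops_per_byte * peak_gbytes_per_s;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}

/* Intensity of a double-precision triad a[i] = b[i] + s * c[i]:
   2 FLOPs per 24 bytes (two 8-byte loads, one 8-byte store),
   ignoring write-allocate traffic. */
double triad_intensity(void) {
    return 2.0 / 24.0;
}
```

A kernel whose intensity lies left of the ridge point (peak compute divided by bandwidth) is memory-bound; the measured bandwidths and peaks from microbenchmarks replace the nominal specification values in practice.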
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, the performance of parallel programs with compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds that of some commercial tools.
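The core question such a tool must answer for each loop is whether its iterations are independent. A hypothetical pair of C loops illustrates the distinction; this is not CAPTools output. The first loop has no loop-carried dependence and can take an OpenMP directive; the second carries a dependence through `y[i - 1]` and must stay serial (or be rewritten, for example as a parallel scan).

```c
/* No loop-carried dependence: iteration i touches only x[i] and y[i],
   so a parallel-for directive is legal. */
void scale(const double *x, double *y, int n, double s) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) y[i] = s * x[i];
}

/* Loop-carried dependence: iteration i reads y[i - 1], written by
   iteration i - 1. Adding parallel-for here would create a data race,
   so a dependence analysis must leave this loop serial. */
void prefix_sum(const double *x, double *y, int n) {
    if (n <= 0) return;
    y[0] = x[0];
    for (int i = 1; i < n; i++) y[i] = y[i - 1] + x[i];
}
```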
NASA Astrophysics Data System (ADS)
Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.
2018-07-01
This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192³ grid points for the scalar field computed with 8192 XK7 nodes.
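The overlap pattern described above can be sketched with standard OpenMP 4.5 syntax: a `target` construct with `nowait` becomes a deferred target task, the host proceeds with independent work, and `taskwait` joins. The function and variable names are hypothetical and the kernel is a trivial scaling loop, not the CCD scheme. When no offload device is available, OpenMP executes the target region on the host.

```c
/* Launch a device kernel asynchronously, overlap it with host work,
   then wait for the device results before returning. */
void overlap_device_and_host(double *dev_data, int n, double s,
                             double *host_data, int m) {
    /* nowait turns the target region into a deferred target task. */
    #pragma omp target teams distribute parallel for map(tofrom: dev_data[0:n]) nowait
    for (int i = 0; i < n; i++) dev_data[i] *= s;

    /* Host work that does not touch dev_data may run concurrently
       (in the paper's setting, CPU computation and MPI communication). */
    for (int j = 0; j < m; j++) host_data[j] += 1.0;

    /* Wait for the target task; dev_data has been copied back after this. */
    #pragma omp taskwait
}
```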
Hierarchical resilience with lightweight threads.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wheeler, Kyle Bruce
2011-10-01
This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One trend in high-performance computing codes seems to be a move toward self-contained functions that mimic functional programming. Software designers are moving toward a model of software design where their core functions are specified in side-effect-free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be (whether that's the other side of the PCI bus or the other side of the network), do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).
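A minimal sketch of the input/output "barrier" idea, under stated assumptions: a task reads only its inputs and writes only its outputs, so it can run on a private copy and commit results only on success, which makes a failed attempt safe to re-execute. `run_with_restart` and the deliberately failing `flaky_square` are hypothetical and are not taken from any of the runtimes listed above.

```c
#include <string.h>

/* A side-effect-free task: reads in[0..n), writes out[0..n), returns 0 on success. */
typedef int (*task_fn)(const double *in, double *out, int n);

/* Run `task` on a private copy of its inputs and commit the outputs only
   if it succeeds; on failure, simply re-run it. Returns the number of
   attempts used, or -1 if every attempt failed. */
int run_with_restart(task_fn task, const double *in, double *out, int n,
                     double *in_copy, double *out_scratch, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        memcpy(in_copy, in, (size_t)n * sizeof *in);          /* input barrier */
        if (task(in_copy, out_scratch, n) == 0) {
            memcpy(out, out_scratch, (size_t)n * sizeof *out); /* output barrier */
            return attempt;
        }
        /* Failed attempt: partial results stay in out_scratch, never in out. */
    }
    return -1;
}

/* Hypothetical task that fails on its first call to mimic a transient fault. */
static int calls = 0;
int flaky_square(const double *in, double *out, int n) {
    if (calls++ == 0) { out[0] = -1.0; return 1; }  /* simulated failure */
    for (int i = 0; i < n; i++) out[i] = in[i] * in[i];
    return 0;
}
```

In a threaded runtime, each such call could be an independent lightweight task; the copies are what allow it to be restarted or duplicated elsewhere.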
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, the performance of parallel programs with compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based OpenMP parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
GPU Accelerated Browser for Neuroimaging Genomics.
Zigon, Bob; Li, Huang; Yao, Xiaohui; Fang, Shiaofen; Hasan, Mohammad Al; Yan, Jingwen; Moore, Jason H; Saykin, Andrew J; Shen, Li
2018-04-25
Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU-accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP) based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than the 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counterpart. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration.
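For reference, a per-SNP ANOVA test can be written as a one-way ANOVA with genotype groups 0, 1, 2 (minor-allele counts). The tests for different SNPs are independent, which is what makes them map well onto OpenMP or GPU threads. The C sketch below assumes SNP-major genotype storage and uses hypothetical function names; it is not the tool's code.

```c
#include <stddef.h>

/* One-way ANOVA F statistic for one SNP: phenotypes y[i] grouped by
   genotype g[i] in {0, 1, 2}. Returns 0.0 if fewer than two groups are
   present or the within-group sum of squares is zero. */
double snp_anova_f(const double *y, const int *g, int n) {
    double sum[3] = {0}, cnt[3] = {0}, total = 0.0;
    for (int i = 0; i < n; i++) {
        sum[g[i]] += y[i];
        cnt[g[i]] += 1.0;
        total += y[i];
    }
    double grand = total / n, ssb = 0.0, ssw = 0.0;
    int k = 0;                                   /* number of non-empty groups */
    for (int j = 0; j < 3; j++) {
        if (cnt[j] > 0) {
            double m = sum[j] / cnt[j];
            ssb += cnt[j] * (m - grand) * (m - grand);   /* between groups */
            k++;
        }
    }
    for (int i = 0; i < n; i++) {
        double m = sum[g[i]] / cnt[g[i]];
        ssw += (y[i] - m) * (y[i] - m);                  /* within groups */
    }
    if (k < 2 || n - k <= 0 || ssw == 0.0) return 0.0;
    return (ssb / (k - 1)) / (ssw / (n - k));
}

/* Independent tests over many SNPs; geno is SNP-major: geno[s * n + i]. */
void anova_all_snps(const double *y, const int *geno, int n, int nsnps, double *f) {
    #pragma omp parallel for schedule(static)
    for (int s = 0; s < nsnps; s++)
        f[s] = snp_anova_f(y, geno + (size_t)s * n, n);
}
```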
Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Jong, Wibe de
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism, coupled with reduced memory capacity, demands an altogether different approach. In this paper we explore augmenting two NWChem modules, the triples correction of CCSD(T) and Fock matrix construction, with OpenMP so that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. To emulate the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient to attain high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI/OpenMP hybrid implementations attain up to 65x better performance for the triples part of CCSD(T), due in large part to the fact that the limited on-card memory restricts the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
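A deep loop nest of the kind described can be threaded by collapsing its outer independent loops into one OpenMP iteration space. The sketch below uses a plain matrix-style contraction C[a][b] = Σ_k A[a][k]·B[k][b] as a stand-in; the CCSD(T) triples kernels have more indices, and the names here are hypothetical.

```c
/* Row-major contraction C[a][b] = sum_k A[a][k] * B[k][b].
   collapse(2) fuses the a and b loops into one iteration space of
   na * nb independent output elements, which exposes more parallel work
   than the outer loop alone when na is small. */
void contract(const double *A, const double *B, double *C,
              int na, int nb, int nk) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int a = 0; a < na; a++)
        for (int b = 0; b < nb; b++) {
            double s = 0.0;                      /* private per (a, b) */
            for (int k = 0; k < nk; k++) s += A[a * nk + k] * B[k * nb + b];
            C[a * nb + b] = s;
        }
}
```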
Data Race Benchmark Collection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua
2017-03-21
This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are labeled to show which do and do not have races, so that they can be used when testing and developing race detection tools.
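As an illustration of the kind of pattern such a suite labels, the C sketch below shows two race-free kernels, with the racy variant described in comments. An unsynchronized `sum += a[i]` across threads is a data race, fixed here with a `reduction` clause; concurrent histogram updates are fixed with `atomic`. These functions are hypothetical examples, not programs from the benchmark collection.

```c
/* Race-free sum. Without reduction(+:sum), every thread would perform an
   unsynchronized read-modify-write on the shared `sum`: a data race.
   The reduction gives each thread a private partial sum and combines them. */
long sum_race_free(const int *a, int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) sum += a[i];
    return sum;
}

/* Race-free histogram. Different iterations may hit the same bin, so the
   increment must be atomic; a plain bins[keys[i]]++ would race. */
void histogram(const int *keys, int n, int *bins) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        bins[keys[i]]++;
    }
}
```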
The MOLDY short-range molecular dynamics package
NASA Astrophysics Data System (ADS)
Ackland, G. J.; D'Mellow, K.; Daraszewicz, S. L.; Hepburn, D. J.; Uhrin, M.; Stratford, K.
2011-12-01
We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials, specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron-carbon alloy.
Program summary
Program title: MOLDY
Catalogue identifier: AEJU_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJU_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 2
No. of lines in distributed program, including test data, etc.: 382 881
No. of bytes in distributed program, including test data, etc.: 6 705 242
Distribution format: tar.gz
Programming language: Fortran 95/OpenMP
Computer: Any
Operating system: Any
Has the code been vectorised or parallelized?: Yes. OpenMP is required for parallel execution
RAM: 100 MB or more
Classification: 7.7
Nature of problem: MOLDY addresses the problem of many atoms (of order 10⁶) interacting via a classical interatomic potential on a timescale of microseconds.
It is designed for problems where statistics must be gathered over a number of equivalent runs, such as measuring thermodynamic properties, diffusion, radiation damage, fracture, twinning deformation, nucleation and growth of phase transitions, sputtering, etc. In the vast majority of materials, the interactions are non-pairwise, and the code must be able to deal with many-body forces.
Solution method: Molecular dynamics involves integrating Newton's equations of motion. MOLDY uses Verlet (for good energy conservation) or predictor-corrector (for accurate trajectories) algorithms. It is parallelised using OpenMP. It also includes a static minimisation routine to find the lowest energy structure. Boundary conditions for surfaces, clusters, and grain boundaries, a thermostat (Nosé), a barostat (Parrinello-Rahman), and externally applied strain are provided. The initial configuration can be either a repeated unit cell or have all atoms given explicitly. Initial velocities are generated internally, but it is also possible to specify the velocity of a particular atom. A wide range of interatomic force models are implemented, including embedded atom, Morse, and Lennard-Jones. Thus the program is especially well suited to calculations of metals.
Restrictions: The code is designed for short-ranged potentials, and there is no Ewald sum. Thus for long-range interactions, where all particles interact with all others, the order-N scaling will fail. Different interatomic potential forms require recompilation of the code.
Additional comments: There is a set of associated open-source analysis software for postprocessing and visualisation. This includes local crystal structure recognition and identification of topological defects.
Running time: A set of test modules for running time is provided. The code scales as order N. The parallelisation shows near-linear scaling with the number of processors in a shared memory environment.
A typical run of a few tens of nanometers for a few nanoseconds will run on a timescale of days on a multiprocessor desktop.
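The force-loop synchronisation issue can be sketched in C (MOLDY itself is Fortran 95). When each pair (i, j) is visited once and Newton's third law is applied, one thread may update `f[i]` while another thread updates the same entry as its `f[j]`, so both updates below are `atomic`. The 1-D harmonic spring potential, the cutoff handling, and the function name are assumptions for illustration; MOLDY may resolve the conflict differently.

```c
/* Pair forces in 1-D for V(r) = 0.5 * k * (r - r0)^2 when r < rc.
   Each pair is visited once; the force on j is fj and, by Newton's
   third law, the force on i is -fj. Because f[i] and f[j] may be
   updated concurrently by other threads, the updates are atomic. */
void pair_forces(const double *x, double *f, int n,
                 double k, double r0, double rc) {
    for (int i = 0; i < n; i++) f[i] = 0.0;

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            double d = x[j] - x[i];
            double r = d < 0.0 ? -d : d;
            if (r >= rc || r == 0.0) continue;        /* outside cutoff */
            double fj = -k * (r - r0) * (d / r);      /* force on particle j */
            #pragma omp atomic
            f[j] += fj;
            #pragma omp atomic
            f[i] -= fj;
        }
    }
}
```

Because every pair contributes equal and opposite forces, the net force sums to zero up to round-off, which is a convenient check on the parallel loop.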
A feasibility study on porting the community land model onto accelerators using OpenACC
Wang, Dali; Wu, Wei; Winkler, Frank; ...
2014-01-01
As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), the Arctic Terrestrial Simulator (ATS), etc.) become more and more complicated, we face enormous challenges in porting those applications onto hybrid computing architectures. OpenACC appears to be a very promising technology; therefore, we have conducted a feasibility analysis on porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Model (CESM). Specifically, we used an automatic function testing platform to extract a small computing kernel out of CLM, then applied this kernel in the actual CLM dataflow procedure, and investigated the strategy of data parallelization and the benefit of data movement provided by the current implementation of OpenACC. Even though it is a non-intensive kernel, on a single 16-core computing node the performance (based on the actual computation time using one GPU) of the OpenACC implementation is 2.3 times faster than that of the OpenMP implementation using a single OpenMP thread, but it is 2.8 times slower than the performance of the OpenMP implementation using 16 threads. On multiple nodes, the MPI_OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for us to look into the potential benefits of the “deep copy” capability and “routine” feature of OpenACC standards. In conclusion, we believe that our experience with the environmental model CLM can be beneficial to many other scientific research programs that are interested in porting their large-scale scientific codes using OpenACC onto high-end computers empowered by hybrid computing architectures.
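Reading the two reported ratios together gives the implied OpenMP strong-scaling figure on the 16-core node, assuming both ratios refer to the same kernel, problem size, and timing method. Writing T_1 and T_16 for the OpenMP times on 1 and 16 threads and T_ACC for the OpenACC time:

```latex
T_{1} = 2.3\,T_{\mathrm{ACC}}, \qquad T_{\mathrm{ACC}} = 2.8\,T_{16}
\;\Longrightarrow\;
S_{16} = \frac{T_{1}}{T_{16}} = 2.3 \times 2.8 \approx 6.4,
\qquad
E_{16} = \frac{S_{16}}{16} \approx 0.40
```

That is, the 16-thread OpenMP version runs roughly 6.4 times faster than the single-thread version, about 40% parallel efficiency, which is consistent with the abstract's description of a non-intensive kernel.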
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hornung, Richard D.; Hones, Holger E.
The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic kernels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline Sequential, RAJA Sequential, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations, and the loop body code for each kernel is identical across all variants. There are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. The actual computer instructions executed and how they run in parallel differ depending on the parallel programming model backend used and which optimizations are performed by the compiler used to build the Performance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor partners for defining requirements for new computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAL/Sierra machine (initial LLNL delivery expected in late 2017/early 2018) and the CORAL-2 procurement. The Suite will also be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.
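The idea of an identical loop body shared across variants can be sketched in plain C with a macro used by a Baseline Sequential and a Baseline OpenMP version of a DAXPY-like kernel. This does not use the RAJA API; the kernel, the macro, and the function names are hypothetical.

```c
/* Shared loop body: both variants expand exactly the same statement. */
#define DAXPY_BODY(i) y[i] += a * x[i]

/* Baseline Sequential variant. */
void daxpy_base_seq(double *y, const double *x, double a, int n) {
    for (int i = 0; i < n; i++) DAXPY_BODY(i);
}

/* Baseline OpenMP variant: same body, different execution policy. */
void daxpy_base_omp(double *y, const double *x, double a, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) DAXPY_BODY(i);
}
```

Keeping the body textually identical means any performance difference between variants comes from the execution back end and the compiler, not from the arithmetic.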
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP,¹ and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than compiler-managed loop-level parallelism implemented with OpenMP.
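The two strategies can be contrasted with a hypothetical C sketch. A 3x3 box filter stands in for a single Canny stage (Canny itself uses Gaussian smoothing, gradient computation, non-maximum suppression, and hysteresis thresholding), and the names are illustrative. Both versions call the same row worker, so their outputs are identical; only the unit of parallel work differs.

```c
/* Shared worker: 3x3 box filter over rows [r0, r1) of a w x h image.
   Border pixels are copied unchanged. Neighbouring rows just outside the
   strip (the halo) are read from the shared input image. */
static void box3_rows(const float *in, float *out, int w, int h, int r0, int r1) {
    for (int y = r0; y < r1; y++)
        for (int x = 0; x < w; x++) {
            if (y == 0 || y == h - 1 || x == 0 || x == w - 1) {
                out[y * w + x] = in[y * w + x];
                continue;
            }
            float s = 0.0f;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    s += in[(y + dy) * w + (x + dx)];
            out[y * w + x] = s / 9.0f;
        }
}

/* Loop-level parallelism: OpenMP distributes individual rows. */
void box3_loop_level(const float *in, float *out, int w, int h) {
    #pragma omp parallel for
    for (int y = 0; y < h; y++) box3_rows(in, out, w, h, y, y + 1);
}

/* Domain decomposition: the image is split into nstrips horizontal
   subimages, each processed independently as one unit of work. */
void box3_domain_decomp(const float *in, float *out, int w, int h, int nstrips) {
    #pragma omp parallel for schedule(static)
    for (int s = 0; s < nstrips; s++)
        box3_rows(in, out, w, h, s * h / nstrips, (s + 1) * h / nstrips);
}
```

Coarser strips reduce scheduling overhead and keep each thread's working set contiguous, which is one plausible reason hand-written decomposition can outperform fine-grained loop scheduling.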
Parameters analysis of a porous medium model for treatment with hyperthermia using OpenMP
NASA Astrophysics Data System (ADS)
Freitas Reis, Ruy; dos Santos Loureiro, Felipe; Lobosco, Marcelo
2015-09-01
Cancer is the second leading cause of death in the world, and treatments have been developed in an effort to address this global health problem. Hyperthermia is not a new technique, but its use in cancer treatment is still at an early stage of development. This treatment is based on heating the target area to a threshold temperature that causes cancerous cell necrosis and apoptosis. To simulate this phenomenon using magnetic nanoparticles in the treatment of a cancer under the skin, a three-dimensional porous medium model was adopted. This study presents a sensitivity analysis of model parameters such as the porosity and blood velocity. To ensure a second-order solution approach, a 7-point centered finite difference method was used for spatial discretization, while a predictor-corrector method was used for time evolution. Due to the massive computations required to find the solution of a three-dimensional model, this paper also presents a first attempt to improve performance using OpenMP, a parallel programming API.
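The 7-point centered stencil approximates the 3-D Laplacian with second-order accuracy: the sum of the six face neighbours minus six times the centre value, divided by h². A C sketch with an OpenMP-collapsed loop nest follows. The flattened indexing and the function name are assumptions, and the porous-medium source terms and the predictor-corrector time step are not shown.

```c
/* 7-point centered Laplacian on an nx x ny x nz grid with uniform
   spacing h, interior points only. Flattened index:
   c = (i * ny + j) * nz + k, so the i, j, k neighbours are at
   offsets ny*nz, nz and 1 respectively. */
void laplacian7(const double *u, double *lap, int nx, int ny, int nz, double h) {
    const double inv_h2 = 1.0 / (h * h);
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 1; i < nx - 1; i++)
        for (int j = 1; j < ny - 1; j++)
            for (int k = 1; k < nz - 1; k++) {
                int c = (i * ny + j) * nz + k;
                lap[c] = (u[c + ny * nz] + u[c - ny * nz]
                        + u[c + nz] + u[c - nz]
                        + u[c + 1] + u[c - 1]
                        - 6.0 * u[c]) * inv_h2;
            }
}
```

For a quadratic field such as u = i² + 2j² + 3k² on a unit grid, the stencil is exact and returns 2 + 4 + 6 = 12 at every interior point.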
Characterization of UMT2013 Performance on Advanced Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howell, Louis
2014-12-31
This paper presents part of a larger effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. The focus here is on UMT2013, a proxy implementation of deterministic transport for unstructured meshes. I present weak and strong MPI scaling results and studies of OpenMP efficiency on the Sequoia BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Preliminary tests that exploit NVRAM as extended memory on an Ivy Bridge machine designed for “Big Data” applications are also included.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw
The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in the ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations were not previously possible because the time-to-solution was more than a factor of 10 too long to complete one physics case in less than 5 days of wall-clock time. Frontier techniques are also used to dramatically improve scalability and time-to-solution. These include nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning to balance the computational work in particle pushing and in grid-related work. They also include scalable and accurate discretization algorithms for nonlinear Coulomb collisions and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs. Together these techniques enable the difficult kinetic ITER edge simulation on a present-day leadership-class computer.
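Nested OpenMP parallelism, one of the techniques listed above, lets each thread of an outer parallel region open its own inner team. The sketch below is a generic illustration and not XGC's particle pusher. The names `push_nested`, `NGROUPS` and `NPART` are hypothetical. It uses an outer loop over particle groups and an inner loop over the particles of each group, and both levels are parallelized.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NGROUPS 4   /* hypothetical number of particle groups */
#define NPART 1000  /* hypothetical number of particles per group */

/* Advance every particle by v*dt and return the sum of the new positions.
 * x and v each hold NGROUPS*NPART values, stored group by group. */
double push_nested(double *x, const double *v, double dt)
{
    double total = 0.0;
    assert(dt >= 0.0);
#ifdef _OPENMP
    omp_set_max_active_levels(2);  /* let the inner regions run in parallel too */
#endif
    /* Outer level: particle groups are spread across a small team. */
    #pragma omp parallel for num_threads(2) reduction(+:total)
    for (int g = 0; g < NGROUPS; ++g) {
        double group_sum = 0.0;
        /* Inner level: each outer thread opens its own team for its group. */
        #pragma omp parallel for num_threads(2) reduction(+:group_sum)
        for (int i = 0; i < NPART; ++i) {
            double *xi = &x[g * NPART + i];
            *xi += v[g * NPART + i] * dt;
            group_sum += *xi;
        }
        total += group_sum;
    }
    return total;
}
```

Compile with an OpenMP flag such as `-fopenmp` (GCC/Clang). Without the flag, the pragmas are ignored and the code runs serially. If only one active level is allowed, the inner regions simply execute with a single thread each. Whether nesting pays off therefore depends on the core count and the runtime.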
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bull, Jeffrey S.
This presentation describes how to build MCNP 6.2. MCNP® 6.2 can be compiled on Macs, PCs, and most Linux systems. It can also be built for parallel execution using both OpenMP and Message Passing Interface (MPI) methods. MCNP6 requires Fortran, C, and C++ compilers to build the code.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nielsen, Jens; D’Avezac, Mayeul; Hetherington, James
2013-12-14
Ab initio kinetic Monte Carlo (KMC) simulations have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These simulations necessitate detailed knowledge of the kinetics of the elementary reactions constituting the reaction mechanism, and of the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and in the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rates [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for simulating catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions, we introduce parallelization with OpenMP. We further benchmark our framework by simulating a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas accurate estimates require including long-range terms in the cluster expansion.
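The following sketch makes the cost argument concrete. It sums pairwise lateral interactions up to a chosen range on a periodic 1-D lattice, using an OpenMP reduction. It is hypothetical and is not Zacros code, which is Fortran 2003 and uses graph-based cluster expansions. Only pair terms appear; many-body cluster terms are omitted, and `lateral_energy` is an illustrative name. The work per occupied site grows with the interaction range, which is why long-range models benefit from parallelization.

```c
#include <assert.h>

/* Hypothetical pairwise lateral-interaction energy on a periodic 1-D lattice.
 * occ[i] is 1 if site i holds an adsorbate, 0 otherwise.
 * eps[d-1] is the energy of two adsorbates d sites apart (d = 1..range).
 * range = 1 is a first-nearest-neighbor-only model. */
double lateral_energy(const int *occ, int nsites, const double *eps, int range)
{
    double energy = 0.0;
    /* 2*range < nsites ensures each periodic pair is counted exactly once. */
    assert(range >= 1 && 2 * range < nsites);
    #pragma omp parallel for reduction(+:energy) schedule(static)
    for (int i = 0; i < nsites; ++i) {
        if (!occ[i])
            continue;
        /* Look only "forward" from site i so every pair is counted once. */
        for (int d = 1; d <= range; ++d)
            if (occ[(i + d) % nsites])
                energy += eps[d - 1];
    }
    return energy;
}
```

On a lattice where every other site is occupied, `range = 1` finds no interactions at all, while `range = 2` picks up the second-neighbor pairs. This is a toy illustration of how truncating at first neighbors can miss energy contributions.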
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang
2017-10-01
The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied for many years in various scientific and engineering optimization applications, such as hydrological model parameter calibration. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on an Intel Xeon multi-core CPU (using OpenMP and OpenCL) and an NVIDIA Tesla many-core GPU (using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration, albeit with the most complex source code. The parallel SCE-UA has bright prospects for real-world applications.
Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators
Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew
2014-01-01
Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as an Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An OpenMP implementation on the Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. 
Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950
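The claim that a few pragmas suffice can be illustrated on the diffusion part of a 2-D reaction-diffusion update. The sketch below is hypothetical. It omits the cardiac ionic (reaction) model entirely, holds the boundary values fixed, and uses the illustrative name `diffusion_step`. A single OpenMP directive parallelizes the 5-point stencil. An OpenACC version would typically place `#pragma acc parallel loop collapse(2)` on the same loop nest.

```c
#include <assert.h>

/* One explicit step of u_t = D * (u_xx + u_yy) on an nx-by-ny row-major grid
 * with spacing h: u_new = u + r * (5-point Laplacian), with r = D*dt/h^2.
 * Only interior points are written. The caller keeps u_new's boundary
 * entries at their fixed values. r <= 0.25 keeps this explicit scheme stable. */
void diffusion_step(const double *u, double *u_new, int nx, int ny, double r)
{
    assert(nx >= 3 && ny >= 3 && r > 0.0 && r <= 0.25);
    /* collapse(2) hands the whole interior (i, j) space to the thread team. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int i = 1; i < nx - 1; ++i)
        for (int j = 1; j < ny - 1; ++j) {
            int k = i * ny + j;
            double lap = u[k - ny] + u[k + ny] + u[k - 1] + u[k + 1] - 4.0 * u[k];
            u_new[k] = u[k] + r * lap;
        }
}
```

A uniform field is left unchanged because its Laplacian is zero. A point source loses a fraction 4r of its value to its four neighbors in one step.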
Performance Analysis of and Tool Support for Transactional Memory on BG/Q
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schindewolf, M
2011-12-08
Martin Schindewolf worked during his internship at the Lawrence Livermore National Laboratory (LLNL) under the guidance of Martin Schulz at the Computer Science Group of the Center for Applied Scientific Computing. We studied the performance of the TM subsystem of BG/Q and investigated possibilities for tool support for TM. To study the performance, we ran CLOMP-TM, a benchmark designed to quantify the overhead of OpenMP and to compare different synchronization primitives. To extend CLOMP-TM, we added Message Passing Interface (MPI) routines for a hybrid parallelization. This makes it possible to run multiple MPI tasks, each running OpenMP, on one node. With these enhancements, a beneficial ratio of MPI tasks to OpenMP threads is determined. Further, the synchronization primitives are ranked as a function of the application characteristics. To demonstrate the usefulness of these results, we investigate a real Monte Carlo simulation called Monte Carlo Benchmark (MCB). Applying the lessons learned yields the best task-to-thread ratio. We were also able to tune the synchronization by transactifying MCB. In addition, we develop tools that capture the performance of the TM run-time system and present it to the application developer. Performance measurement of the TM run-time system relies on its built-in statistics. These tools use the Blue Gene Performance Monitoring (BGPM) interface to correlate the statistics from the TM run-time system with performance counter values. This combination provides detailed insights into the run-time behavior of the application and makes it possible to track down the cause of degraded performance. Further, one tool has been implemented that separates the performance counters into three categories: Successful Speculation, Unsuccessful Speculation and No Speculation. All of the tools are crafted around IBM's xlc compiler for C and C++ and have been run and tested on a Q32 early access system.
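CLOMP-TM compares synchronization primitives. The toy functions below show three standard OpenMP ways to protect a shared update. They are hypothetical and are not CLOMP-TM code. They run from `critical`, which is the most general and typically the most expensive, through `atomic`, to a `reduction` that avoids per-update synchronization entirely. On BG/Q, a transactional-memory region would be a further option in place of the critical section. Its compiler-specific syntax is not shown here.

```c
#include <assert.h>

/* critical: updates are serialized through an unnamed critical section. */
long count_critical(long n)
{
    assert(n >= 0);
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp critical
        counter += 1;
    }
    return counter;
}

/* atomic: each update is a single hardware-supported atomic operation. */
long count_atomic(long n)
{
    assert(n >= 0);
    long counter = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp atomic
        counter += 1;
    }
    return counter;
}

/* reduction: each thread counts privately; partial counts are combined once. */
long count_reduction(long n)
{
    assert(n >= 0);
    long counter = 0;
    #pragma omp parallel for reduction(+:counter)
    for (long i = 0; i < n; ++i)
        counter += 1;
    return counter;
}
```

All three return the same count. Timing them at different thread counts gives a rough feel for the relative overheads that a benchmark like CLOMP-TM measures much more carefully.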
Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card
NASA Astrophysics Data System (ADS)
Jiang, Jinpeng; Zhu, Peimin
2018-05-01
Full waveform inversion (FWI) is a challenging procedure due to the high computational cost of the modeling, especially in the elastic case. The graphics processing unit (GPU) has become a popular device for high-performance computing (HPC). To reduce the long computation time, we design and implement GPU-based 2D elastic FWI (EFWI) in the time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of the relatively small global memory on the GPU, a boundary-saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion accelerates the convergence of the misfit function. A multiscale inversion strategy is included in the workflow to obtain accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling and about 12 times speedup in gradient calculation, compared with eight-core CPU implementations optimized with OpenMP. The accuracy of the GPU implementations is verified by comparison with the results obtained from the CPU implementations.
Kalantzis, Georgios; Tachibana, Hidenobu
2014-01-01
For microdosimetric calculations, event-by-event Monte Carlo (MC) methods are considered the most accurate. Their main shortcoming is their extensive computational time. In this work we present an event-by-event MC code for low-projectile-energy electron and proton tracks for accelerated microdosimetric MC simulations on a graphics processing unit (GPU). Additionally, a hybrid implementation scheme was realized by employing OpenMP and CUDA in such a way that both the GPU and the multi-core CPU were utilized simultaneously. The two implementation schemes were tested and compared with the sequential single-threaded MC code on the CPU. Performance was compared in terms of speedup for a set of benchmarking cases of electron and proton tracks. A maximum speedup of 67.2 was achieved for the GPU-based MC code, while the hybrid approach improved the speedup by up to a further 20%. The results indicate the capability of our CPU-GPU implementation for accelerated MC microdosimetric calculations of both electron and proton tracks without loss of accuracy. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Acoustic 3D modeling by the method of integral equations
NASA Astrophysics Data System (ADS)
Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.
2018-02-01
This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. Tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that the method could accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system of equations and also to parallelize across multiple sources. Practical examples and efficiency tests are presented as well.
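The integral-equation matrix is dense. For a homogeneous background sampled on a uniform grid, however, the Green's function depends only on the offset between cells, so the matrix is Toeplitz (block-Toeplitz in 2-D/3-D). This structure is what allows an FFT to compute the product in O(n log n) time. The sketch below shows only the simpler direct O(n^2) product for a 1-D Toeplitz matrix, with rows split across OpenMP threads. It does not reproduce the FFT-based product the paper uses. The function name and the kernel storage convention are assumptions.

```c
#include <assert.h>

/* y = G x for an n-by-n Toeplitz matrix with G[i][j] = g[i - j + n - 1].
 * g holds the 2n-1 kernel values for offsets -(n-1)..(n-1).
 * Rows are independent, so they are divided among the threads. */
void toeplitz_matvec(const double *g, const double *x, double *y, int n)
{
    assert(n > 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double acc = 0.0;
        for (int j = 0; j < n; ++j)
            acc += g[i - j + n - 1] * x[j];
        y[i] = acc;
    }
}
```

A kernel that is nonzero only at offset 0 reproduces x. A kernel that is nonzero only at offset +1 shifts x down by one position.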
fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.
Hung, Ling-Hong; Samudrala, Ram
2014-06-15
fast_protein_cluster is a fast, parallel and memory-efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom code using Streaming SIMD (Single Instruction Multiple Data) Extensions and Advanced Vector Extensions intrinsics accelerates CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
McClure, J. E.; Prins, J. F.; Miller, C. T.
2014-07-01
Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular "color" LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven-velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared-memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed using a GPU. The heterogeneous solution is demonstrated to provide a speedup of 2.6× compared to a multi-core CPU solution and 1.8× compared to a GPU solution, owing to concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media using a typical GPU-accelerated workstation.
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
NASA Astrophysics Data System (ADS)
Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.
2016-09-01
NASA Astrophysics Data System (ADS)
Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.
2016-04-01
We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, part of the AMBER v14 MD software package. The implementation makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload. In this paper we present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resulting performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms). It achieves a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation uses a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve performance on Intel Xeon processors while retaining simulation accuracy.
Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, C; Quinlan, D J; Willcock, J J
2008-12-12
Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has focused only on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored. The reason is a lack of research compilers that can readily recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure that preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.
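One of the kernels above is a domain-specific tree traversal. Independent of ROSE, a common way to express such a traversal in OpenMP is with tasks. The sketch below is written in C rather than C++ to match the other examples here, and `Node`, `tree_sum` and the helpers are hypothetical. It spawns a task for each left subtree, recurses into the right subtree itself, and waits for the task before combining the results.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    long value;
    struct Node *left, *right;
} Node;

/* Build a balanced tree holding the values lo..hi (inclusive). */
Node *tree_build(long lo, long hi)
{
    if (lo > hi) return NULL;
    long mid = lo + (hi - lo) / 2;
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = mid;
    n->left = tree_build(lo, mid - 1);
    n->right = tree_build(mid + 1, hi);
    return n;
}

void tree_free(Node *n)
{
    if (n == NULL) return;
    tree_free(n->left);
    tree_free(n->right);
    free(n);
}

/* Left subtree runs as a task (undeferred when there is no left child). */
static long tree_sum_rec(const Node *n)
{
    if (n == NULL) return 0;
    long left_sum = 0;
    #pragma omp task shared(left_sum) if(n->left != NULL)
    left_sum = tree_sum_rec(n->left);
    long right_sum = tree_sum_rec(n->right);
    #pragma omp taskwait   /* left_sum is ready after this point */
    return n->value + left_sum + right_sum;
}

/* One thread starts the traversal; the rest of the team executes tasks. */
long tree_sum(const Node *root)
{
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = tree_sum_rec(root);
    return total;
}
```

Real code would usually stop creating tasks below some depth, because very small tasks cost more to schedule than they save.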
NASA Astrophysics Data System (ADS)
Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.
2017-07-01
Multithreaded programming using OpenMP on a shared-memory architecture with hyperthreading technology allows resources to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, the speedup is limited by the number of threads a processor can execute. This matters especially for sequential algorithms containing nested loops whose number of outer-loop iterations exceeds the maximum number of threads a processor can execute. The thread distribution technique found previously could only be applied by high-level programmers. This paper develops a parallelization procedure for low-level programmers dealing with two-level nested loops in which the maximum number of threads a processor can execute is smaller than the number of outer-loop iterations. The procedure uses data preprocessing to determine which parallel region will produce the optimal speedup. The preprocessing considers the numbers of outer- and inner-loop iterations, the computation time required for each iteration, and the maximum number of threads a processor can execute.
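For a two-level loop nest, the basic choice is which iteration space to hand to the threads. The sketch below contrasts parallelizing only the outer loop with OpenMP's `collapse(2)`, which merges both loops into one iteration space. It is a generic illustration of the trade-off, not the preprocessing procedure the paper proposes. The function names and the loop body `i * j` are hypothetical.

```c
#include <assert.h>

/* Parallelize only the outer loop: each thread runs whole inner loops. */
long long nest_outer(int n_outer, int n_inner)
{
    assert(n_outer >= 0 && n_inner >= 0);
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n_outer; ++i)
        for (int j = 0; j < n_inner; ++j)
            sum += (long long)i * j;
    return sum;
}

/* Collapse both loops: the n_outer * n_inner iterations are split evenly. */
long long nest_collapsed(int n_outer, int n_inner)
{
    assert(n_outer >= 0 && n_inner >= 0);
    long long sum = 0;
    #pragma omp parallel for collapse(2) reduction(+:sum) schedule(static)
    for (int i = 0; i < n_outer; ++i)
        for (int j = 0; j < n_inner; ++j)
            sum += (long long)i * j;
    return sum;
}
```

Outer-loop parallelization keeps each thread's inner loop intact and works well when the outer trip count comfortably exceeds the thread count. Collapsing helps when the outer trip count is small or uneven relative to the thread count, at the cost of some index arithmetic.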
Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias
2011-10-01
Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message-passing processes [e.g., the message-passing interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably when compared to an implementation using only the MPI and, furthermore, that OpenMP in combination with the MPI provides a satisfactory compromise between performance and code complexity. Having the ability to use threads within MPI processes enables the sophisticated use of all processor cores for both computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than what was used in this study, we believe that faster than real-time multiscale cardiac simulations can be achieved on these systems.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program. The modifications add multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. They included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to add multithreaded parallelism explicitly to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading based on POSIX threads (pthreads; POSIX stands for Portable Operating System Interface) is also used in the diffraction propagation of light in MACOS. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.
2011-01-01
We parallelized the GeoClaw code on a single-level grid using OpenMP in March 2011 to meet the urgent need of simulating near-shore tsunami waves from the 2011 Tohoku event, and achieved over 75% of the potential speedup on an eight-core Dell Precision T7500 workstation [1]. After submitting that work to SC11, the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his Ph.D. thesis. In this paper, we show the complementary characteristics of the two approaches used in parallelizing GeoClaw. We also show the speedup obtained by combining the advantages of both approaches with adaptive mesh refinement (AMR), demonstrating that GeoClaw can run efficiently on many-core systems. We also present a novel simulation of the 2011 Tohoku tsunami waves inundating the Sendai airport and the Fukushima nuclear power plants, in which a finest grid spacing of 20 meters is achieved through 4-level AMR. This simulation yields quite good predictions of the wave heights and travel time of the tsunami waves. © 2011 IEEE.
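With AMR, grid patches vary widely in size, so a static split of patches across threads can leave some threads idle. The hypothetical sketch below shows a common remedy: a dynamic schedule over patches. The `Patch` type and the trivial per-cell update are placeholders, not GeoClaw's shallow-water solver, and this is not necessarily how either GeoClaw parallelization described above works.

```c
#include <assert.h>

typedef struct {
    int ncells;   /* cells in this patch; sizes differ under AMR */
    double *h;    /* hypothetical per-cell water depth */
} Patch;

/* Apply a placeholder update (h += dh) to every cell of every patch and
 * return the summed depth. schedule(dynamic, 1) gives patches to threads
 * one at a time as threads become free, instead of fixing the split up front. */
double update_patches(Patch *patches, int npatches, double dh)
{
    assert(npatches >= 0);
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int p = 0; p < npatches; ++p)
        for (int c = 0; c < patches[p].ncells; ++c) {
            patches[p].h[c] += dh;
            total += patches[p].h[c];
        }
    return total;
}
```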
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools, developed at the University of Greenwich, to generate OpenMP parallel codes for shared-memory architectures. This interactive toolkit transforms a serial Fortran application code into an equivalent parallel version in a small fraction of the time normally required for manual parallelization. We first discuss how loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using in-depth interprocedural analysis. The use of the toolkit on a number of application codes, ranging from benchmarks to real-world applications, is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs, as well as the good performance achievable on a large number of processors. The second part of the document serves as a reference for the parameters and the graphical user interface implemented in the toolkit. Finally, a set of tutorials is included for hands-on experience with this toolkit.
Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj
2012-07-03
We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases---Thread-Greedy CD and Coloring-Based CD---and give performance measurements for an OpenMP implementation of these.
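The Shotgun algorithm cited above updates several coordinates concurrently. The sketch below is a hypothetical, simplified Shotgun-style sweep for the Lasso objective 0.5 ||y - Xw||^2 + lambda ||w||_1. It is not the paper's Thread-Greedy or Coloring-Based variant. Each coordinate is updated by soft-thresholding, w_j <- S(w_j + x_j^T r / ||x_j||^2, lambda / ||x_j||^2), where r = y - Xw is the shared residual. Residual reads and writes use OpenMP atomics.

```c
#include <assert.h>

/* Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0). */
static double soft_threshold(double z, double t)
{
    if (z > t) return z - t;
    if (z < -t) return z + t;
    return 0.0;
}

/* One parallel sweep over all p coordinates. X is n-by-p, column-major.
 * On entry r must equal y - X w; it is kept up to date. */
void shotgun_sweep(const double *X, int n, int p, double lambda,
                   double *w, double *r)
{
    assert(n > 0 && p > 0 && lambda >= 0.0);
    #pragma omp parallel for schedule(static)
    for (int j = 0; j < p; ++j) {
        const double *xj = X + (long)j * n;
        double dot = 0.0, sq = 0.0;
        for (int i = 0; i < n; ++i) {
            double ri;
            #pragma omp atomic read
            ri = r[i];                 /* may already include other updates */
            dot += xj[i] * ri;
            sq += xj[i] * xj[i];
        }
        if (sq == 0.0)
            continue;                  /* all-zero column: leave w[j] as is */
        double w_new = soft_threshold(w[j] + dot / sq, lambda / sq);
        double delta = w_new - w[j];
        w[j] = w_new;
        for (int i = 0; i < n; ++i) {
            #pragma omp atomic
            r[i] -= xj[i] * delta;
        }
    }
}
```

With orthogonal columns (for example, X equal to the identity), the concurrent updates do not interact, and one sweep gives the exact solution S(y, lambda). With correlated columns, concurrent updates can interfere. Grouping coordinates so that interacting ones are not updated together is presumably what motivates the coloring-based variant.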
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-01-17
This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm introduced. It provides a matrix data type and an approximate matrix product that exhibits linear-scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or for parallel execution on shared-memory systems with an OpenMP-capable compiler.
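SpAMM recurses over a quadtree of matrix blocks and skips any sub-product whose norm bound falls below the tolerance. The bound comes from ||AB||_F <= ||A||_F ||B||_F. The simplified sketch below uses hypothetical names, recomputes block norms on the fly instead of storing them in the tree, and handles only power-of-two sizes. It adds OpenMP tasks so that each task owns one quadrant of C, which means no two tasks write the same entries. It is not this library's code.

```c
#include <assert.h>

/* Squared Frobenius norm of an n-by-n block with leading dimension ld. */
static double norm2(const double *A, int n, int ld)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            s += A[i * ld + j] * A[i * ld + j];
    return s;
}

/* C += A*B on n-by-n row-major blocks; skip if ||A||_F * ||B||_F < tol. */
static void spamm_rec(const double *A, const double *B, double *C,
                      int n, int ld, double tol)
{
    if (norm2(A, n, ld) * norm2(B, n, ld) < tol * tol)
        return;                              /* contribution is negligible */
    if (n <= 2) {                            /* leaf: plain dense product */
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)
                for (int j = 0; j < n; ++j)
                    C[i * ld + j] += A[i * ld + k] * B[k * ld + j];
        return;
    }
    int h = n / 2;
    const double *A11 = A, *A12 = A + h, *A21 = A + h * ld, *A22 = A + h * ld + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ld, *B22 = B + h * ld + h;
    double *C11 = C, *C12 = C + h, *C21 = C + h * ld, *C22 = C + h * ld + h;

    /* One task per quadrant of C: Cxy += Ax1*B1y + Ax2*B2y. */
    #pragma omp task
    { spamm_rec(A11, B11, C11, h, ld, tol); spamm_rec(A12, B21, C11, h, ld, tol); }
    #pragma omp task
    { spamm_rec(A11, B12, C12, h, ld, tol); spamm_rec(A12, B22, C12, h, ld, tol); }
    #pragma omp task
    { spamm_rec(A21, B11, C21, h, ld, tol); spamm_rec(A22, B21, C21, h, ld, tol); }
    #pragma omp task
    { spamm_rec(A21, B12, C22, h, ld, tol); spamm_rec(A22, B22, C22, h, ld, tol); }
    #pragma omp taskwait
}

/* Approximate C += A*B for n-by-n row-major matrices, n a power of two. */
void spamm(const double *A, const double *B, double *C, int n, double tol)
{
    assert(n > 0 && (n & (n - 1)) == 0 && tol >= 0.0);
    #pragma omp parallel
    #pragma omp single
    spamm_rec(A, B, C, n, n, tol);
}
```

With `tol = 0` nothing is skipped, and the result equals the ordinary product. Raising `tol` trades accuracy for work, which pays off when the matrices have decay.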
NASA Astrophysics Data System (ADS)
Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.
2017-11-01
Turbulent mixing at high Schmidt number is a multiscale problem that places demanding requirements on direct numerical simulations to resolve fluctuations down to the Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach in which velocity and scalar fields are computed by separate groups of parallel processes. The scalar field is computed with a combined compact finite difference (CCD) scheme on a finer grid, using a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for an 8192³ scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan, using GPUs as accelerators with the latest OpenMP 4.5 directives, giving a 2.7× speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term and high-spatial-resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also presents such problems. The long computation time limits applications of very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (the result is called SWATGP) to accelerate grid modeling at the HRU level. This parallel implementation takes better advantage of the computational power of a shared-memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km² watershed with 1 CPU and a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Turney, Raymond D.
2001-01-01
This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Because these benchmarks were written to measure performance on Computational Fluid Dynamics applications, they are assumed to be representative of the NAS workload. Since NAS runs both message-passing (MPI) and shared-memory, compiler-directive codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN 3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
2014-07-22
The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.
High Resolution Aerospace Applications using the NASA Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Aftosmis, Michael J.; Berger, Marsha
2005-01-01
This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier-Stokes solver and a fully automated inviscid flow package for cut-cell Cartesian grids. The complementary combination of these two simulation codes enables high-fidelity characterization of aerospace vehicle design performance over the entire flight envelope through extensive parametric analysis and detailed simulation of critical regions of the flight envelope. Both packages are industrial-level codes designed for complex geometry and incorporate customized multigrid solution algorithms. The performance of these codes on Columbia is examined using both MPI and OpenMP and using both the NUMAlink and InfiniBand interconnect fabrics. Numerical results demonstrate good scalability on up to 2016 CPUs using the NUMAlink4 interconnect, with measured computational rates in the vicinity of 3 TFLOP/s, while InfiniBand showed some performance degradation at high CPU counts, particularly with multigrid. Nonetheless, the results are encouraging enough to indicate that larger test cases using combined MPI/OpenMP communication should scale well on even more processors.
A Non-Equilibrium Sediment Transport Model for Coastal Inlets and Navigation Channels
2011-01-01
exchange of water, sediment, and nutrients between estuaries and the ocean. Because of the multiple interacting forces (waves, wind, tide, river...in parallel using OpenMP. The CMS takes advantage of the Surface-water Modeling System (SMS) interface for grid generation and model setup, as well...as for plotting and post-processing (Zundel, 2000). The circulation model in the CMS (called CMS-Flow) computes the unsteady water level and
NASA Astrophysics Data System (ADS)
Baba, J. S.; Koju, V.; John, D.
2015-03-01
The propagation of light in turbid media is an active area of research with relevance to numerous investigational fields, e.g., biomedical diagnostics and therapeutics. The statistical random-walk nature of photon propagation through turbid media is ideal for computation-based modeling and simulation. Ready access to supercomputing resources provides a means for attaining brute-force solutions to stochastic light-matter interactions entailing scattering, by facilitating timely propagation of a sufficient number (>10⁷) of photons while tracking characteristic parameters based on the incorporated physics of the problem. One such model, which works well for isotropic but fails for anisotropic scatter (the case for many biomedical sample scattering problems), is the diffusion approximation. In this report, we address this by utilizing Berry phase (BP) evolution as a means of capturing anisotropic scattering characteristics of samples at the shallower depths where the diffusion approximation fails. We extend the polarization-sensitive Monte Carlo method of Ramella-Roman et al. to include the computationally intensive tracking of photon trajectory in addition to polarization state at every scattering event. To speed up the computations, which entail the appropriate rotations of reference frames, the code was parallelized using OpenMP. The results reveal that BP is strongly correlated with photon penetration depth, thus raising the possibility of polarimetric depth-resolved characterization of highly scattering samples, e.g., biological tissues.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baba, Justin S; John, Dwayne O; Koju, Vijay
The propagation of light in turbid media is an active area of research with relevance to numerous investigational fields, e.g., biomedical diagnostics and therapeutics. The statistical random-walk nature of photon propagation through turbid media is ideal for computation-based modeling and simulation. Ready access to supercomputing resources provides a means for attaining brute-force solutions to stochastic light-matter interactions entailing scattering, by facilitating timely propagation of a sufficient number (>10 million) of photons while tracking characteristic parameters based on the incorporated physics of the problem. One such model, which works well for isotropic but fails for anisotropic scatter (the case for many biomedical sample scattering problems), is the diffusion approximation. In this report, we address this by utilizing Berry phase (BP) evolution as a means of capturing anisotropic scattering characteristics of samples at the shallower depths where the diffusion approximation fails. We extend the polarization-sensitive Monte Carlo method of Ramella-Roman et al. [1] to include the computationally intensive tracking of photon trajectory in addition to polarization state at every scattering event. To speed up the computations, which entail the appropriate rotations of reference frames, the code was parallelized using OpenMP. The results reveal that BP is strongly correlated with photon penetration depth, thus raising the possibility of polarimetric depth-resolved characterization of highly scattering samples, e.g., biological tissues.
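The random-walk simulation described in these two records parallelizes naturally because photons are independent of one another. The Python sketch below is a hypothetical illustration, not the authors' code: it tracks photons through a slab with isotropic scattering and absorption, splits the photon budget into chunks, and gives each chunk its own seeded random-number generator so the totals are reproducible regardless of how chunks are scheduled. Polarization and Berry-phase tracking are omitted, the optical parameters (`mu_s`, `mu_a`, `thickness`) are arbitrary assumptions, and a `concurrent.futures` thread pool stands in for OpenMP threads; under CPython's global interpreter lock it shows the decomposition rather than a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(n_photons, seed, mu_s=10.0, mu_a=0.1, thickness=0.1):
    """Random-walk photons through a slab 0 <= z <= thickness.

    mu_s and mu_a are assumed scattering and absorption coefficients
    (inverse length). Returns [reflected, transmitted, absorbed] counts.
    """
    rng = random.Random(seed)  # private generator: no shared RNG state between chunks
    mu_t = mu_s + mu_a
    counts = [0, 0, 0]
    for _ in range(n_photons):
        z, cos_theta = 0.0, 1.0  # launched at the surface, pointing into the slab
        while True:
            # Exponentially distributed free path; 1 - random() lies in (0, 1].
            z += -math.log(1.0 - rng.random()) / mu_t * cos_theta
            if z < 0.0:
                counts[0] += 1  # escaped back through the entry surface
                break
            if z > thickness:
                counts[1] += 1  # escaped through the far surface
                break
            if rng.random() < mu_a / mu_t:
                counts[2] += 1  # absorbed at this interaction
                break
            # Isotropic scattering: the new direction cosine is uniform on [-1, 1].
            cos_theta = 2.0 * rng.random() - 1.0
    return counts


def simulate(total, workers=4, base_seed=2015):
    """Split `total` photons into per-worker chunks with independent seeds."""
    sizes = [total // workers + (1 if w < total % workers else 0) for w in range(workers)]
    seeds = [base_seed + w for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run_chunk, sizes, seeds))
    return [sum(part[k] for part in parts) for k in range(3)]
```

For example, `reflected, transmitted, absorbed = simulate(40_000)` returns photon tallies whose sum is always the requested photon count; seeding per chunk (rather than per thread) is what keeps the result independent of scheduling.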
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
CCC7-119 Reactive Molecular Dynamics Simulations of Hot Spot Growth in Shocked Energetic Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, Aidan P.
2015-03-01
The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components. Sequoia provides the core count needed to run very large-scale simulations of up to 10 million atoms. Using an OpenMP-threaded implementation of the ReaxFF package in LAMMPS, we obtained good parallel efficiency running on 16k nodes of Sequoia with 1 hardware thread per core.
OpenMP Performance on the Columbia Supercomputer
NASA Technical Reports Server (NTRS)
Haoqiang, Jin; Hood, Robert
2005-01-01
This presentation discusses Columbia, a world-class supercomputer and one of the world's fastest, delivering 61 TFLOP/s (as of 10/20/04). It was conceived, designed, built, and deployed in just 120 days. Columbia is a 20-node supercomputer built on proven 512-processor nodes. It is the largest SGI system in the world, with over 10,000 Intel Itanium 2 processors, and it provides the largest node size incorporating commodity parts (512 processors) and the largest shared-memory environment (2048 processors); with 88% efficiency, it tops the scalar systems on the Top500 list.
OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms
2016-05-01
composed of hyperspectral video sequences recording the release of chemical plumes at the Dugway Proving Ground. We use the 329 frames of the...video. Each frame is a hyperspectral image with dimension 128 × 320 × 129, where 129 is the dimension of the channel of each pixel. The total number of...j=1 . Then we use the nested for-loop to calculate the values of WXY by the formula (1). We then put the corresponding value in an array which
Lattice QCD simulations using the OpenACC platform
NASA Astrophysics Data System (ADS)
Majumdar, Pushan
2016-10-01
In this article we explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive-based programming model for GPUs that avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high-level languages with OpenMP-like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on the Fermi and Kepler GPUs.
Rambrain - a library for virtually extending physical memory
NASA Astrophysics Data System (ADS)
Imgrund, Maximilian; Arth, Alexander
2017-08-01
We introduce Rambrain, a user-space library that manages the memory consumption of your code. Using Rambrain, you can overcommit memory beyond the size of the physical memory present in the system. Rambrain takes care of temporarily swapping data out to disk and can handle multiples of the physical memory size present. Rambrain is thread-safe, OpenMP- and MPI-compatible, and supports asynchronous I/O. The library was designed to require minimal changes to existing programs and to be easy to use.
NDL-v2.0: A new version of the numerical differentiation library for parallel architectures
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.
2014-07-01
We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first- and second-order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library is based on the lightweight OpenMP tasking model, allowing full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes.
Catalog identifier: AEDG_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 63036
No. of bytes in distributed program, including test data, etc.: 801872
Distribution format: tar.gz
Programming language: ANSI Fortran-77, ANSI C, Python.
Computer: Distributed systems (clusters), shared memory systems.
Operating system: Linux, Unix.
Has the code been vectorized or parallelized?: Yes.
RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N²) internal storage for Hessian calculations if a task throttling factor has not been set by the user.
Classification: 4.9, 4.14, 6.5.
Catalog identifier of previous version: AEDG_v1_0
Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 1404
Does the new version supersede the previous version?: Yes
Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large-scale and computationally expensive problems.
Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries.
Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1] by incorporating higher-order derivative information, as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters, as in [1, 4]. The runtime of the N-body-type problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library, as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Owing to the code restructuring, the MPI-parallel implementation (and, accordingly, the OpenMP-parallel one) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that its subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similar to the BARRIER and TASKWAIT directives of OpenMP. The new MPI implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX Threads and MPI programming interfaces. It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems.
Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added.
Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address spaces, and a user function must reside at the same virtual address. In addition, address space layout randomization should not be used for the application.
Unusual features: The software takes bound constraints into account, in the sense that only feasible points are used to evaluate the derivatives, and, given the desired level of accuracy, the proper formula is automatically employed.
Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP version with 2 threads, and 53 ms and 1.01 s for the MPI-parallel distribution using 2 threads and 2 processes, respectively, with the yield time for idle workers set to 10 ms.
References:
[1] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys. 137 (14).
[2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432.
[3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214.
[4] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816.
[5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236.
[6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492.
[7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
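The NDL record combines two ideas: a finite-difference step chosen to balance truncation against round-off error, and Hessian entries evaluated as independent tasks so that diagonal and off-diagonal work is not separated by a barrier. The Python sketch below illustrates both under stated assumptions; it is not NDL's interface. The step rule h = ε^(1/3)·max(1, |x|) for central first derivatives and h = ε^(1/4)·max(1, |x|) for second derivatives is the standard textbook heuristic, not necessarily the formula NDL uses, and a `concurrent.futures` thread pool stands in for OpenMP tasks. The base value f(x) is computed once and reused by every diagonal entry, loosely echoing the caching the record mentions.

```python
import sys
from concurrent.futures import ThreadPoolExecutor

EPS = sys.float_info.epsilon


def step(x, order):
    """Heuristic step: eps**(1/3) for first and eps**(1/4) for second derivatives."""
    power = 1.0 / 3.0 if order == 1 else 0.25
    return EPS ** power * max(1.0, abs(x))


def shifted(x, offsets):
    """Copy of point x with offsets {index: delta} applied."""
    y = list(x)
    for k, d in offsets.items():
        y[k] += d
    return y


def gradient(f, x):
    """Central-difference gradient of f at x."""
    g = []
    for i, xi in enumerate(x):
        h = step(xi, 1)
        g.append((f(shifted(x, {i: h})) - f(shifted(x, {i: -h}))) / (2.0 * h))
    return g


def hessian(f, x, workers=4):
    """Central-difference Hessian in which every entry is an independent task."""
    n = len(x)
    f0 = f(x)  # evaluated once, reused by all diagonal tasks
    h = [step(xi, 2) for xi in x]

    def diag(i):
        return (f(shifted(x, {i: h[i]})) - 2.0 * f0 + f(shifted(x, {i: -h[i]}))) / h[i] ** 2

    def off(i, j):
        signs = ((1, 1), (1, -1), (-1, 1), (-1, -1))
        s = [f(shifted(x, {i: a * h[i], j: b * h[j]})) for a, b in signs]
        return (s[0] - s[1] - s[2] + s[3]) / (4.0 * h[i] * h[j])

    H = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Diagonal and off-diagonal tasks are submitted together: no barrier between them.
        tasks = {(i, i): pool.submit(diag, i) for i in range(n)}
        tasks.update({(i, j): pool.submit(off, i, j) for i in range(n) for j in range(i + 1, n)})
        for (i, j), fut in tasks.items():
            H[i][j] = H[j][i] = fut.result()
    return H
```

For the quadratic f(v) = v₀² + 3v₀v₁ + 2v₁² at (1, 2), `gradient` should return values close to (8, 11) and `hessian` values close to [[2, 3], [3, 4]]. With pure-Python functions the thread pool gives little speedup; the pattern pays off when each evaluation is an expensive external simulation.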
NASA Astrophysics Data System (ADS)
Nouri-Borujerdi, Ali; Moazezi, Arash
2018-01-01
The current study investigates the conjugate heat transfer characteristics of laminar flow in a backward-facing step channel. All of the channel walls are insulated except the thick lower wall, which is held at a constant temperature. The upper wall includes an insulated obstacle perpendicular to the flow direction. The effects of obstacle height and location on the fluid flow and heat transfer are numerically explored for Reynolds numbers in the range 10 ≤ Re ≤ 300. The incompressible Navier-Stokes and thermal energy equations are solved simultaneously in the fluid region by an upwind compact finite difference scheme based on flux-difference splitting in conjunction with the artificial compressibility method. In the thick wall, the energy equation reduces to the Laplace equation. A multi-block approach is used to perform parallel computing and reduce the CPU time: each block is modeled separately and shares boundary conditions with its neighbors. The program was written in FORTRAN with the OpenMP API. The results show that the multi-block parallel computing method is a simple, robust, high-performance, and high-order-accurate scheme. Moreover, the results demonstrate that increasing the Reynolds number and obstacle height, as well as decreasing the horizontal distance between the obstacle and the step, improves heat transfer.
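The multi-block idea (each block solved separately, exchanging boundary values with its neighbors) can be sketched on the simplest equation the record mentions, the Laplace equation that governs conduction in the thick wall. The Python sketch below is a hypothetical 1D illustration, not the authors' FORTRAN/OpenMP code: the domain is split into blocks with one ghost node at each end, every block performs a Jacobi sweep independently (dispatched to a thread pool standing in for OpenMP threads), and edge values are then copied into the neighboring blocks' ghost nodes. The grid size, block count, and iteration count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_multiblock(n_interior=30, n_blocks=3, t_left=0.0, t_right=1.0, iters=4000):
    """Jacobi relaxation of T'' = 0 on [0, 1], split into blocks with ghost nodes."""
    if n_interior % n_blocks:
        raise ValueError("this sketch assumes n_interior is divisible by n_blocks")
    size = n_interior // n_blocks
    # Each block holds [left ghost, own nodes..., right ghost].
    blocks = [[0.0] * (size + 2) for _ in range(n_blocks)]
    blocks[0][0] = t_left     # physical boundary values sit in the outermost ghosts
    blocks[-1][-1] = t_right

    def relax(b):
        # One Jacobi sweep inside block b, using only its own nodes and ghosts.
        old = blocks[b][:]
        for i in range(1, size + 1):
            blocks[b][i] = 0.5 * (old[i - 1] + old[i + 1])

    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        for _ in range(iters):
            list(pool.map(relax, range(n_blocks)))  # blocks relax independently
            # Share boundary conditions: copy edge values into neighbors' ghosts.
            for b in range(n_blocks - 1):
                blocks[b][-1] = blocks[b + 1][1]
                blocks[b + 1][0] = blocks[b][-2]
    return [v for blk in blocks for v in blk[1:-1]]
```

Because each block reads its neighbors' values only through ghost nodes refreshed between sweeps, the iteration is equivalent to a global Jacobi iteration, and the interior temperatures approach the exact linear profile k/31 for node k. Production codes would use a faster smoother (or multigrid) than plain Jacobi.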
Characterization of Proxy Application Performance on Advanced Architectures. UMT2013, MCB, AMG2013
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howell, Louis H.; Gunney, Brian T.; Bhatele, Abhinav
2015-10-09
Three codes were tested at LLNL as part of a Tri-Lab effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. Teams from Sandia and Los Alamos tested proxy apps of their own. The focus in this report is on the LLNL codes UMT2013, MCB, and AMG2013. We present weak and strong MPI scaling results and studies of OpenMP efficiency on a large BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Results from three more speculative tests are also included: one that exploits NVRAM as extended memory, one that studies performance under a power bound, and one that illustrates the effects of changing the torus network mapping on BG/Q.
Finite-Difference Algorithm for Simulating 3D Electromagnetic Wavefields in Conductive Media
NASA Astrophysics Data System (ADS)
Aldridge, D. F.; Bartel, L. C.; Knox, H. A.
2013-12-01
Electromagnetic (EM) wavefields are routinely used in geophysical exploration for detection and characterization of subsurface geological formations of economic interest. Recorded EM signals depend strongly on the current conductivity of geologic media. Hence, they are particularly useful for inferring fluid content of saturated porous bodies. In order to enhance understanding of field-recorded data, we are developing a numerical algorithm for simulating three-dimensional (3D) EM wave propagation and diffusion in heterogeneous conductive materials. Maxwell's equations are combined with isotropic constitutive relations to obtain a set of six, coupled, first-order partial differential equations governing the electric and magnetic vectors. An advantage of this system is that it does not contain spatial derivatives of the three medium parameters electric permittivity, magnetic permeability, and current conductivity. Numerical solution methodology consists of explicit, time-domain finite-differencing on a 3D staggered rectangular grid. Temporal and spatial FD operators have order 2 and N, where N is user-selectable. We use an artificially-large electric permittivity to maximize the FD timestep, and thus reduce execution time. For the low frequencies typically used in geophysical exploration, accuracy is not unduly compromised. Grid boundary reflections are mitigated via convolutional perfectly matched layers (C-PMLs) imposed at the six grid flanks. A shared-memory-parallel code implementation via OpenMP directives enables rapid algorithm execution on a multi-thread computational platform. Good agreement is obtained in comparisons of numerically-generated data with reference solutions. EM wavefields are sourced via point current density and magnetic dipole vectors. Spatially-extended inductive sources (current carrying wire loops) are under development. 
We are particularly interested in accurate representation of high-conductivity sub-grid-scale features that are common in industrial environments (borehole casing, pipes, railroad tracks). Present efforts are oriented toward calculating the EM responses of these objects via a First Born Approximation approach. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the US Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
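The core of an explicit, staggered-grid time-domain scheme for conductive media can be shown in one dimension. The Python sketch below is a simplified stand-in, not the authors' 3D code: it uses second-order differences in both time and space (the record's spatial order N is user-selectable), normalized units, perfectly conducting end walls instead of C-PMLs, and a semi-implicit (time-averaged) treatment of the conduction term σE. It also illustrates why an artificially large permittivity helps: the 1D stability limit c·Δt ≤ Δz with c = 1/√(εμ) gives a maximum time step proportional to √ε. Grid size, pulse shape, and parameter values are assumptions.

```python
import math


def run_fdtd(sigma, eps=1.0, mu=1.0, n=200, dz=1.0, courant=0.5, steps=400):
    """1D Yee scheme for E_x / H_y in a uniform conductive medium.

    Returns the time step used and the final electromagnetic energy.
    """
    # Stability: c*dt <= dz with c = 1/sqrt(eps*mu), so a larger eps allows a larger dt.
    dt = courant * dz * math.sqrt(eps * mu)
    # Conduction current sigma*E averaged between time levels n and n+1.
    loss = sigma * dt / (2.0 * eps)
    ca = (1.0 - loss) / (1.0 + loss)
    cb = dt / (eps * dz) / (1.0 + loss)
    db = dt / (mu * dz)
    e = [math.exp(-((i - n // 2) / 8.0) ** 2) for i in range(n)]  # Gaussian E pulse at t = 0
    e[0] = e[-1] = 0.0  # perfect-conductor walls: tangential E vanishes
    h = [0.0] * (n - 1)  # h[i] sits between e[i] and e[i + 1] (staggered grid)
    for _ in range(steps):
        for i in range(n - 1):
            h[i] -= db * (e[i + 1] - e[i])
        for i in range(1, n - 1):  # e[0] and e[n - 1] are never updated
            e[i] = ca * e[i] - cb * (h[i] - h[i - 1])
    energy = 0.5 * dz * (eps * sum(v * v for v in e) + mu * sum(v * v for v in h))
    return dt, energy
```

Running `run_fdtd(0.0)` and `run_fdtd(0.05)` contrasts lossless propagation with conductive attenuation, while `run_fdtd(0.0, eps=100.0)` uses a time step ten times larger at the same Courant number. In a threaded 3D version, each of the two field-update loops is the natural target for an OpenMP parallel loop, since every point within one update depends only on the other field.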
Variational data assimilation system "INM RAS - Black Sea"
NASA Astrophysics Data System (ADS)
Parmuzin, Eugene; Agoshkov, Valery; Assovskiy, Maksim; Giniatulin, Sergey; Zakharova, Natalia; Kuimov, Grigory; Fomin, Vladimir
2013-04-01
Development of Informational-Computational Systems (ICS) for data assimilation procedures is a multidisciplinary problem. To study and solve such problems, one needs to apply modern results from different disciplines and recent developments in: mathematical modeling; the theory of adjoint equations and optimal control; inverse problems; numerical methods theory; numerical algebra; and scientific computing. These problems are studied at the Institute of Numerical Mathematics of the Russian Academy of Sciences (INM RAS) in ICS for personal computers (PC). Special problems and questions arise when effective ICS versions for PC are developed. These can be addressed by applying modern methods of numerical mathematics and by solving the "parallelism problem" using OpenMP technology and special linear algebra packages. In this work, the results of developing the PC-ICS "INM RAS - Black Sea" are presented. The following problems and questions are discussed: practical problems that can be studied with the ICS; parallelism problems and their solutions using OpenMP technology and the linear algebra packages used in ICS "INM RAS - Black Sea"; and the ICS interface. The results of testing ICS "INM RAS - Black Sea" are presented, and the efficiency of the technologies and methods applied is discussed. The work was supported by RFBR, grants No. 13-01-00753, 13-05-00715 and by The Ministry of education and science of Russian Federation, project 8291, project 11.519.11.1005.
Tycho 2: A Proxy Application for Kinetic Transport Sweeps
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrett, Charles Kristopher; Warsa, James S.
2016-09-14
Tycho 2 is a proxy application that implements discrete ordinates (SN) kinetic transport sweeps on unstructured, 3D, tetrahedral meshes. It has been designed to be small and require minimal dependencies to make collaboration and experimentation as easy as possible. Tycho 2 has been released as open source software. The software is currently in a beta release with plans for a stable release (version 1.0) before the end of the year. The code is parallelized via MPI across spatial cells and OpenMP across angles. Currently, several parallelization algorithms are implemented.
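Parallelizing "across angles" works because, for a fixed source (for example, within one source iteration), the transport sweep for each discrete direction is independent of the others. The Python sketch below illustrates this on a 1D slab rather than Tycho 2's unstructured tetrahedral meshes: each direction cosine μ is swept cell by cell in its direction of travel using the diamond-difference closure, and the directions are dispatched to a thread pool standing in for OpenMP threads. The slab geometry, the diamond-difference scheme, and all parameter values are assumptions for illustration, not details taken from Tycho 2.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def sweep(mu, sigma_t, dx, n_cells, psi_in, source=0.0):
    """Diamond-difference sweep for one direction cosine mu on a 1D slab.

    Cells are visited in the direction of travel; returns the exiting angular
    flux and the cell-average angular fluxes.
    """
    order = range(n_cells) if mu > 0 else range(n_cells - 1, -1, -1)
    tau = sigma_t * dx / abs(mu)  # optical thickness of one cell along this direction
    psi_edge = psi_in
    cell_avg = [0.0] * n_cells
    for i in order:
        # Cell balance |mu| (out - in) / dx + sigma_t * avg = q, with avg = (in + out) / 2.
        psi_out = ((1.0 - tau / 2.0) * psi_edge + source * dx / abs(mu)) / (1.0 + tau / 2.0)
        cell_avg[i] = 0.5 * (psi_edge + psi_out)
        psi_edge = psi_out
    return psi_edge, cell_avg


def sweep_all(mus, sigma_t=1.0, length=1.0, n_cells=100, psi_in=1.0, workers=4):
    """Sweep every direction independently; directions are the unit of parallel work."""
    dx = length / n_cells
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda mu: sweep(mu, sigma_t, dx, n_cells, psi_in), mus))
    return [exiting for exiting, _ in results]


def uncollided(mu, sigma_t=1.0, length=1.0, psi_in=1.0):
    """Analytic exiting flux for a pure absorber with no source, for comparison."""
    return psi_in * math.exp(-sigma_t * length / abs(mu))
```

For a pure absorber, `sweep_all([0.25, 0.5, 1.0, -0.5])` should closely match `uncollided` for each direction. On an unstructured mesh, each direction instead requires its own dependency-respecting cell ordering, which is what makes real sweep codes harder than this sketch.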
Parallelizing a peanut butter sandwich
NASA Astrophysics Data System (ADS)
Quenette, S. M.
2005-12-01
This poster aims to demonstrate, in a novel way, why contemporary computational code development seems hard to a geodynamics modeler (i.e., a non-computer-scientist). For example, to utilise contemporary computer hardware, parallelisation is required. But why do we choose the explicit approach (MPI) over an implicit one (OpenMP)? How does this relate to typical geodynamics codes? And do we face the same style of problems in everyday life? We aim to demonstrate that the little bit of complexity, forethought and effort is worth its while.
OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation
NASA Astrophysics Data System (ADS)
Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun
2017-11-01
We present an Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both the commercially licensed Intel Fortran compiler and the popular free, open-source GNU Fortran compiler. The programs are easy to use and include helpful comments for users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for new versions of the programs.
Program files doi: http://dx.doi.org/10.17632/y8zk3jgn84.2
Licensing provisions: Apache License 2.0
Programming language: OpenMP GNU and Intel Fortran 90.
Computer: Any multi-core personal computer or workstation with an appropriate OpenMP-capable Fortran compiler installed.
Number of processors used: All available CPU cores on the executing computer.
Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid. 204 (2016) 209.
Does the new version supersede the previous version?: Not completely. It supersedes the previous Fortran programs from both references above, but not the OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209.
Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with the commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for six different trap symmetries: axially and radially symmetric traps in 3d, circularly symmetric traps in 2d, fully isotropic (spherically symmetric) and fully anisotropic traps in 2d and 3d, as well as 1d traps, where no spatial symmetry is considered. Solution method: We employ the split-step Crank-Nicolson algorithm to discretize the time-dependent GP equation in space and time. The discretized equation is then solved by imaginary- or real-time propagation, employing adequately small space and time steps, to yield the solution of stationary and non-stationary problems, respectively. Reasons for the new version: Previously published Fortran programs [1,2] have now become popular tools [3] for solving the GP equation. These programs have been translated to the C programming language [4] and later extended to the more complex scenario of dipolar atoms [5]. Now virtually all computers have multi-core processors, and some have motherboards with more than one physical central processing unit (CPU), which may increase the number of available CPU cores on a single computer to several tens. The C programs have been adapted to run very fast on such modern multi-core computers, on general-purpose graphics processing units (GPGPU) with Nvidia CUDA, and on computer clusters using the Message Passing Interface (MPI) [6]. Nevertheless, the previously developed Fortran programs are also commonly used for scientific computation, and most of them use a single CPU core at a time on modern multi-core laptops, desktops, and workstations.
Unless the Fortran programs are made aware of, and capable of efficiently using, the available CPU cores, the solution of even a realistic dynamical 1d problem, not to mention the more complicated 2d and 3d problems, could be time consuming. Previously, we published auto-parallel Fortran programs [2] suitable for the Intel (but not the GNU) compiler for solving the GP equation. Hence, the need for a full OpenMP version of the Fortran programs to reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both the Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: The previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. Six different trap symmetries are considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, totaling 12 programs included in the BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. The present programs introduce a new input parameter, Number_of_Threads, which defines the number of CPU cores of the processor to be used in the calculation. If this parameter is set to 0, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for background system jobs. For example, on a machine with 20 CPU cores, such as the one we used for testing, it is advisable to use up to 19 CPU cores. However, the total number of used CPU cores can be divided among more than one job.
For instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, for a total of 19 used CPU cores on a 20-core computer. The Fortran source programs are located in the directory src and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. Examples of the produced output files can be found in the directory output, although some large density files are omitted to save space. The programs calculate the values of the actually used dimensionless nonlinearities from the physical input parameters, where the input parameters correspond to the same nonlinearity values as in the previously published programs [1], so that the output files of the old and new programs can be directly compared. The output files are named so that their contents can be easily identified, following the naming convention introduced in Ref. [2]. For example, a file whose name consists of the individual program's name followed by -out.txt is the general output file containing input data, time and space steps, nonlinearity, energy and chemical potential; it was named fort.7 in the old Fortran version of the programs [1]. A file whose name ends in -den.txt is the output file with the condensate density, which was named fort.3 and fort.4 in the old Fortran version [1] for the imaginary- and real-time propagation programs, respectively. Other possible density outputs, such as the initial density, are commented out in the programs to keep the set of output files simple, but users can uncomment and re-enable them if needed. In addition, there are output files for reduced (integrated) 1d and 2d densities for different programs. The real-time programs also produce an output file reporting the evolution of the root-mean-square sizes after a perturbation is introduced. The supplied real-time programs first solve the stationary GP equation and then calculate the dynamics.
As the imaginary-time programs are more accurate than the real-time programs for the solution of a stationary problem, one can first solve the stationary problem using the imaginary-time programs, adapt the real-time programs to read the pre-calculated wave function and then study the dynamics. In that case the parameter NSTP in the real-time programs should be set to zero and the space mesh and nonlinearity parameters should be identical in both programs. The reader is advised to consult our previous publication where a complete description of the output files is given [2]. A readme.txt file, included in the root directory, explains the procedure to compile and run the programs. We tested our programs on a workstation with two 10-core Intel Xeon E5-2650 v3 CPUs. The parameters used for testing are given in sample input files, provided in the corresponding directory together with the programs. In Table 1 we present wall-clock execution times for runs on 1, 6, and 19 CPU cores for programs compiled using Intel and GNU Fortran compilers. The corresponding columns "Intel speedup" and "GNU speedup" give the ratio of wall-clock execution times of runs on 1 and 19 CPU cores, and denote the actual measured speedup for 19 CPU cores. In all cases and for all numbers of CPU cores, although the GNU Fortran compiler gives excellent results, the Intel Fortran compiler turns out to be slightly faster. Note that during these tests we always ran only a single simulation on a workstation at a time, to avoid any possible interference issues. Therefore, the obtained wall-clock times are more reliable than the ones that could be measured with two or more jobs running simultaneously. We also studied the speedup of the programs as a function of the number of CPU cores used. The performance of the Intel and GNU Fortran compilers is illustrated in Fig. 1, where we plot the speedup and actual wall-clock times as functions of the number of CPU cores for 2d and 3d programs. 
We see that the speedup increases monotonically with the number of CPU cores in all cases and has large values (between 10 and 14 for 3d programs) for the maximal number of cores. This fully justifies the development of OpenMP programs, which enable much faster and more efficient solving of the GP equation. However, a slow saturation in the speedup with the further increase in the number of CPU cores is observed in all cases, as expected. The speedup tends to increase for programs in higher dimensions, as they become more complex and have to process more data. This is why the speedups of the supplied 2d and 3d programs are larger than those of 1d programs. Also, for a single program the speedup increases with the size of the spatial grid, i.e., with the number of spatial discretization points, since this increases the amount of calculations performed by the program. To demonstrate this, we tested the supplied real2d-th program and varied the number of spatial discretization points NX=NY from 20 to 1000. The measured speedup obtained when running this program on 19 CPU cores as a function of the number of discretization points is shown in Fig. 2. The speedup first increases rapidly with the number of discretization points and eventually saturates. Additional comments: Example inputs provided with the programs take less than 30 minutes to run on a workstation with two Intel Xeon E5-2650 v3 processors (2 QPI links, 10 CPU cores, 25 MB cache, 2.3 GHz).
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed: low-level vector (SIMD) processing, middle-level shared-memory parallel programming, and high-level distributed-memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared-memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open-source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran 2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran 2003 also supports interoperability with C, so implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python while still taking advantage of high-performing compiled languages. Parallel languages are still evolving, with interesting developments in Coarray Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
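The middle (shared-memory) level can be made concrete with charge deposition, the PIC step where many particles write to the same grid. The sketch below deposits charge onto a periodic 1d grid with linear cloud-in-cell weighting. It splits the particle list into chunks, and each chunk accumulates into a private copy of the grid before a final reduction. That is the pattern an OpenMP loop over chunks would use to avoid write races. It is plain Python with no real threads. The function names and chunk count are assumptions, not taken from the PICKSC skeleton codes.

```python
import math


def deposit_cic(positions, charge, nx, dx):
    """Linear (cloud-in-cell) charge deposition on a periodic 1d grid."""
    rho = [0.0] * nx
    for x in positions:
        s = x / dx
        i = int(math.floor(s))
        w = s - i                          # fractional distance past node i
        rho[i % nx] += charge * (1.0 - w)
        rho[(i + 1) % nx] += charge * w
    return rho


def deposit_chunked(positions, charge, nx, dx, n_chunks=4):
    """Same deposition, but each chunk of particles writes to a private grid
    (as each OpenMP thread would); the private grids are then summed."""
    size = (len(positions) + n_chunks - 1) // n_chunks
    partial = [deposit_cic(positions[c * size:(c + 1) * size], charge, nx, dx)
               for c in range(n_chunks)]
    return [sum(column) for column in zip(*partial)]
```

Private grids cost extra memory per thread, which connects to the per-level memory hierarchy mentioned above. The usual alternatives are atomic updates or sorting particles by cell so that threads own disjoint grid regions.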
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; de Jong, Wibe
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism, coupled with a reduced memory capacity, demands an altogether different approach. In this paper we explore augmenting two NWChem modules, the triples correction of CCSD(T) and Fock matrix construction, with OpenMP so that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. To proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient to attain high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of CCSD(T), due in large part to the fact that the limited on-card memory restricts the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
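Threading a contraction loop nest in this straightforward way mostly means splitting the outermost loop among threads, each writing a disjoint slice of the output. The minimal sketch below uses a hypothetical two-index contraction C[i][j] = Σ_k A[i][k]·B[k][j], not the actual triples kernel. Python's `concurrent.futures.ThreadPoolExecutor` stands in for an OpenMP `parallel for`. CPython threads will not speed up this pure-Python arithmetic; the point is the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def contract_block(a, b, c, rows):
    """Fill rows `rows` of C = A.B; each call writes only its own rows."""
    n_k, n_j = len(b), len(b[0])
    for i in rows:
        c[i] = [sum(a[i][k] * b[k][j] for k in range(n_k)) for j in range(n_j)]


def contract_parallel(a, b, n_workers=4):
    """Split the outermost loop (over i) into contiguous blocks, one per worker.
    Blocks are disjoint, so no locking is needed on the shared output."""
    n_i = len(a)
    c = [None] * n_i
    block = (n_i + n_workers - 1) // n_workers
    blocks = [range(w * block, min((w + 1) * block, n_i)) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(contract_block, a, b, c, rows) for rows in blocks]
        for f in futures:
            f.result()                     # re-raise any exception from a worker
    return c
```

The Fock-matrix case is harder because different integral batches update the same matrix elements. It therefore needs private accumulators or synchronization rather than disjoint output slices.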
NASA Astrophysics Data System (ADS)
Liu, Tianyu; Wolfe, Noah; Lin, Hui; Zieb, Kris; Ji, Wei; Caracappa, Peter; Carothers, Christopher; Xu, X. George
2017-09-01
This paper contains two parts revolving around Monte Carlo transport simulation on Intel Many Integrated Core coprocessors (MIC, also known as Xeon Phi). (1) MCNP 6.1 was recompiled into multithreading (OpenMP) and multiprocessing (MPI) forms, respectively, without modification to the source code. The new codes were tested on a 60-core 5110P MIC. The test case was FS7ONNi, a radiation shielding problem used in MCNP's verification and validation suite. Both codes were observed to run slower on the MIC than on a 6-core X5650 CPU, by a factor of 4 for the MPI code and, abnormally, 20 for the OpenMP code, and both exhibited limited strong scaling. (2) We have recently added a Constructive Solid Geometry (CSG) module to our ARCHER code to provide better support for geometry modelling in radiation shielding simulation. The functions of this module are frequently called in the particle random walk process. To identify the performance bottleneck we developed a CSG proxy application and profiled the code using the geometry data from FS7ONNi. The profiling data showed that the code was primarily memory latency bound on the MIC. This study suggests that despite the low initial porting effort, Monte Carlo codes do not naturally lend themselves to the MIC platform, much as with GPUs, and that the memory latency problem needs to be addressed in order to achieve a decent performance gain.
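CSG geometry answers the question "is this point inside the region?" by combining primitive solids with Boolean operations. That query runs at every step of a particle's random walk. The minimal sketch below classifies points this way. It is not ARCHER's module; the class names and the sphere and box primitives are assumptions. Each query walks a small tree of objects, an irregular, pointer-chasing access pattern of the kind that tends to be latency sensitive.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Sphere:
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return sum((a - b) ** 2 for a, b in zip(p, self.center)) <= self.radius ** 2


@dataclass(frozen=True)
class Box:
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(low <= a <= high for a, low, high in zip(p, self.lo, self.hi))


@dataclass(frozen=True)
class Union:
    left: object
    right: object

    def contains(self, p: Point) -> bool:
        return self.left.contains(p) or self.right.contains(p)


@dataclass(frozen=True)
class Intersection:
    left: object
    right: object

    def contains(self, p: Point) -> bool:
        return self.left.contains(p) and self.right.contains(p)


@dataclass(frozen=True)
class Difference:
    left: object
    right: object

    def contains(self, p: Point) -> bool:
        return self.left.contains(p) and not self.right.contains(p)


# Hypothetical shield: a spherical shell cut in half by a box.
shell = Difference(Sphere((0.0, 0.0, 0.0), 2.0), Sphere((0.0, 0.0, 0.0), 1.0))
half_shell = Intersection(shell, Box((0.0, -5.0, -5.0), (5.0, 5.0, 5.0)))
```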
Use of general purpose graphics processing units with MODFLOW
Hughes, Joseph D.; White, Jeremy T.
2013-01-01
To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after the head and flow convergence criteria are satisfied or a maximum number of iterations is exceeded. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates that GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time spent on memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates that GPGPU speedups exceed the parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
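Two core pieces of such a solver are a matrix in compressed sparse row (CSR) form and a preconditioned conjugate gradient iteration. The sketch below combines them. It uses only the Jacobi (diagonal) preconditioner from the list above and runs serially. Its stopping test is a hypothetical relative-residual criterion rather than MODFLOW's head and flow criteria. The matrix-vector product and vector updates are the loops that an OpenMP or GPGPU version would parallelize.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """y = A x for a matrix stored in CSR format."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def csr_diagonal(indptr, indices, data):
    diag = [0.0] * (len(indptr) - 1)
    for row in range(len(diag)):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    return diag


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG for a symmetric positive-definite CSR matrix."""
    inv_diag = [1.0 / d for d in csr_diagonal(indptr, indices, data)]
    x = [0.0] * len(b)
    r = list(b)                                  # r = b - A x, with x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]   # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def poisson_1d_csr(n):
    """Hypothetical test matrix: tridiag(-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data
```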
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm which has become ubiquitous, largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance. In this paper, we propose a novel distributed-memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively, potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm, the de facto sequential SVM software, which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as the UCI HIGGS Boson dataset and the Offending URL dataset.
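Sample elimination rests on the KKT conditions of the SVM dual. Consider a sample whose multiplier α_i is 0 and that lies outside the margin (y_i f(x_i) > 1). Consider also a sample whose α_i sits at the upper bound C and that lies inside the margin (y_i f(x_i) < 1). Both already satisfy their optimality conditions and are unlikely to change, so either can be dropped from the working set. The minimal sketch below applies that test in the spirit of libsvm-style shrinking, not the paper's distributed heuristics. The tolerance `tol` is a hypothetical knob standing in for aggressive versus conservative elimination.

```python
from typing import List, Sequence


def active_set(alphas: Sequence[float], labels: Sequence[int],
               decision: Sequence[float], c: float, tol: float = 1e-3) -> List[int]:
    """Return indices of samples kept in the working set.

    alphas: current dual coefficients, 0 <= alpha_i <= c
    labels: class labels y_i in {-1, +1}
    decision: current decision values f(x_i)
    A larger `tol` eliminates fewer samples (conservative);
    a smaller one eliminates more (aggressive).
    """
    keep = []
    for i, (a, y, f) in enumerate(zip(alphas, labels, decision)):
        margin = y * f
        if a <= 0.0 and margin > 1.0 + tol:
            continue      # non-support vector, safely outside the margin
        if a >= c and margin < 1.0 - tol:
            continue      # bounded support vector, inside the margin
        keep.append(i)
    return keep
```

An eliminated sample can later turn out to violate its conditions once the boundary moves. That is why a final check over all samples, or the synchronization step described above, is needed to preserve accuracy.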
Use Computer-Aided Tools to Parallelize Large CFD Applications
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Yan, J.
2000-01-01
Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progress in hardware architectures and the increasing complexity of real applications in recent years, the problem has become even more severe. Today, scalability and high performance mostly involve handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, they can produce erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a sufficiently thorough data dependence analysis. To overcome this deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization.
CAPO is aimed at taking advantage of the detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on the NAS Benchmarks and ARC3D have demonstrated good success with this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving the Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each one comprises from 50K to 100K lines of FORTRAN 77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies caused by the lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Because of the sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW code with several test cases in a single zone. The MPI data points for the small test case were taken from a hand-coded MPI version. As we can see, CAPO's version achieved an 18-fold speedup on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks support for parallelization at the multi-zone level.
Future work will emphasize the development of a methodology that works at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.
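The classic GCD test illustrates the kind of question a directive-insertion tool's dependence analysis must answer. It is a standard textbook test, not necessarily what CAPTools implements. Suppose a loop writes a(p*i + q) and reads a(r*i + s). Any dependence between the two references requires integers i1, i2 with p*i1 + q = r*i2 + s, which can exist only if gcd(p, r) divides s - q. When it does not, this pair of references cannot block an OpenMP directive on the loop. The test ignores loop bounds, so a "may depend" answer is conservative and does not prove a dependence.

```python
from math import gcd


def gcd_test(p: int, q: int, r: int, s: int) -> bool:
    """Return True if a[p*i + q] and a[r*i + s] MAY refer to the same element
    for some integer iterations, False if they provably never do.
    Loop bounds are ignored, so True only means 'dependence not ruled out'."""
    g = gcd(p, r)
    if g == 0:                 # both subscripts are constants
        return q == s
    return (s - q) % g == 0
```

For example, a loop that writes a(2*i) and reads a(2*i+1) touches only even and only odd elements, so `gcd_test(2, 0, 2, 1)` is False. A recurrence such as a(i) = a(i-1) + ... gives `gcd_test(1, 0, 1, -1)`, which is True, so the loop is left serial.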
HPC Profiling with the Sun Studio™ Performance Tools
NASA Astrophysics Data System (ADS)
Itzkowitz, Marty; Maruyama, Yukon
In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations were not possible before, because the time-to-solution fell short by more than a factor of 10 of the goal of less than 5 days of wall-clock time for one physics case. The frontier techniques employed include nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning.
Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; ...
2013-07-18
The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Finally, our optimized hybrid parallel implementation of GTC, which uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems and scales efficiently to tens of thousands of cores.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields, including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both the Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize the CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. A multicore implementation using OpenMP on an 8-core processor provides up to 7.7× speed-up. The GPU implementation using CUDA yielded substantial accelerations, ranging from 18.5× to 157× speed-up, once the thread-based and block-based approaches were combined in the analysis. The proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. The developed algorithms make DFC analyses more practical for multi-subject studies involving more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
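In its common form, sliding-window DFC computes a correlation between two regional time courses inside a window that slides along the scan. Each window is independent of the others, which is what makes the computation easy to spread across OpenMP threads or CUDA blocks. The serial sketch below shows that inner computation. It assumes Pearson correlation and hypothetical window-length and step parameters; the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sliding_window_fc(x: Sequence[float], y: Sequence[float],
                      window: int, step: int = 1) -> List[float]:
    """Correlation of x and y in each window [t, t + window).
    Windows are independent, so this loop is the natural unit to parallelize."""
    if len(x) != len(y):
        raise ValueError("time courses must have equal length")
    return [pearson(x[t:t + window], y[t:t + window])
            for t in range(0, len(x) - window + 1, step)]
```

With many brain regions, the same loop runs over all region pairs as well as all windows. This multiplies the independent work available to each thread or block.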
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.
2016-12-01
The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with an Hg transport module to investigate mercury pollution. In this study, we present our work on porting the GNAQPMS model to the Intel Xeon Phi processor Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL has new hardware features; in particular, it can be used as a standalone processor as well as a coprocessor with another CPU. Using the VTune tool, the high-overhead modules in the GNAQPMS model were identified, including CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These high-overhead modules were accelerated by optimizing the code and using new techniques of KNL. The following optimization measures were applied: (1) changing the pure MPI parallel mode to a hybrid MPI and OpenMP parallel mode; (2) vectorizing the code to use the 512-bit wide vector computation unit; (3) reducing unnecessary memory access and calculation; (4) reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ; (5) changing the global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased on both the CPU and KNL platforms: the single-node test showed that the optimized version has a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a one-socket KNL platform compared with the baseline version of the code, which means that KNL has a 1.29x speedup when compared with the two-socket CPU platform.
Ojeda-May, Pedro; Nam, Kwangho
2017-08-08
The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM message passing interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented, using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model, in which both task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an SN2 symmetric reaction in water. The MPI version outperformed existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35%, depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach, which displayed better scalability for a larger number of CPU cores (up to 64 CPUs in the tested systems).
Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.
Ruymgaart, A Peter; Elber, Ron
2012-11-13
We report Graphics Processing Unit (GPU) and OpenMP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss in detail the design of the code and we illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from a factor of 10 reported in our initial GPU implementation that did not include a water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints on all bonds, runs in parallel on multiple OpenMP cores or entirely on the GPU. It is based on a Conjugate Gradient solution for the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
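For reference, the classic iterative SHAKE update, which CG SHAKE reformulates, corrects one bond at a time. Each correction is applied along the previous (already constrained) bond vector, weighted by inverse mass, and the sweeps repeat until every bond has its target length. The minimal sketch below implements that classic bond relaxation scheme, not the paper's conjugate-gradient variant. The tolerance and iteration limit are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def shake(old: Sequence[Vec], new: List[Vec], masses: Sequence[float],
          bonds: Sequence[Tuple[int, int, float]],
          tol: float = 1e-10, max_iter: int = 1000) -> List[Vec]:
    """Adjust `new` positions in place so that each bond (i, j, d) has length d.
    Corrections are applied along the bond vector of the previous,
    already-constrained positions `old`, weighted by inverse mass."""
    for _ in range(max_iter):
        converged = True
        for i, j, d in bonds:
            s = [a - b for a, b in zip(new[i], new[j])]
            diff = d * d - sum(c * c for c in s)
            if abs(diff) > 2.0 * tol * d * d:
                converged = False
                r_old = [a - b for a, b in zip(old[i], old[j])]
                inv_mi, inv_mj = 1.0 / masses[i], 1.0 / masses[j]
                # Linearized Lagrange-multiplier estimate for this bond.
                g = diff / (2.0 * sum(a * b for a, b in zip(s, r_old)) * (inv_mi + inv_mj))
                for k in range(3):
                    new[i][k] += g * inv_mi * r_old[k]
                    new[j][k] -= g * inv_mj * r_old[k]
        if converged:
            return new
    raise RuntimeError("SHAKE did not converge")
```

Because bonds that share an atom are corrected one after another, this loop is inherently sequential within a molecule. That helps explain why recasting it as a linear solve, as CG SHAKE does, suits GPUs better.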
A Wideband Fast Multipole Method for the two-dimensional complex Helmholtz equation
NASA Astrophysics Data System (ADS)
Cho, Min Hyung; Cai, Wei
2010-12-01
A Wideband Fast Multipole Method (FMM) for the 2D Helmholtz equation is presented. It can quickly evaluate the interactions between N particles governed by the fundamental solution of the 2D complex Helmholtz equation for a wide range of complex wave numbers k, which was not easy with the original FMM due to the instability of the diagonalized conversion operator. This paper includes a description of the theoretical background, the FMM algorithm, the software structure, and some test runs. Program summary. Program title: 2D-WFMM. Catalogue identifier: AEHI_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 4636. No. of bytes in distributed program, including test data, etc.: 82 582. Distribution format: tar.gz. Programming language: C. Computer: Any. Operating system: Any operating system with gcc version 4.2 or newer. Has the code been vectorized or parallelized?: Multi-core processors with shared memory. RAM: Depending on the number of particles N and the wave number k. Classification: 4.8, 4.12. External routines: OpenMP (http://openmp.org/wp/). Nature of problem: Evaluate the interaction between N particles governed by the fundamental solution of the 2D Helmholtz equation with complex k. Solution method: Multilevel Fast Multipole Algorithm in a hierarchical quad-tree structure with a cutoff level, which combines a low-frequency method and a high-frequency method. Running time: Depending on the number of particles N, the wave number k, and the number of cores in the CPU. CPU time increases as N log N.
Ultrasonic geometrical characterization of periodically corrugated surfaces.
Liu, Jingfei; Declercq, Nico F
2013-04-01
Accurate characterization of the characteristic dimensions of a periodically corrugated surface using an ultrasonic imaging technique is investigated both theoretically and experimentally. The possibility of accurately characterizing the characteristic dimensions is discussed. The condition for accurate characterization and the quantitative relationship between the accuracy and its determining parameters are given. Strategies to avoid diffraction effects caused by the periodic nature of a corrugated surface are also discussed. Major causes of erroneous measurements are discussed theoretically and illustrated experimentally. A comparison between the presented results and optical measurements reveals acceptable agreement. This work realistically exposes the capability of the proposed ultrasonic technique to accurately characterize the lateral and vertical characteristic dimensions of corrugated surfaces. Both the theoretically developed general principles and the proposed practical techniques may serve as useful guidelines for peers. Copyright © 2012 Elsevier B.V. All rights reserved.
Heterogeneous Hardware Parallelism Review of the IN2P3 2016 Computing School
NASA Astrophysics Data System (ADS)
Lafage, Vincent
2017-11-01
Parallel and hybrid Monte Carlo computation. The Monte Carlo method is the main workhorse for computing particle physics observables. This paper provides an overview of various HPC technologies that can be used today: multicore (OpenMP, HPX) and manycore (OpenCL). The rewrite of a twenty-year-old Fortran 77 Monte Carlo code illustrates the various programming paradigms in use beyond language implementation. The problem of parallel random number generation is also addressed. We give a short report on the one-week school dedicated to these recent approaches, which took place at École Polytechnique in May 2016.
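The parallel random number generation problem mentioned above can be illustrated with a Monte Carlo estimate of π. The C sketch below is an illustrative assumption, not code from the school: instead of sharing one generator state between threads, it derives every random number from a counter passed through the SplitMix64 mixing function, so each sample's numbers depend only on its index and the seed. Because the hit count is an integer reduction, the estimate is identical for any number of threads and any schedule. The function names are hypothetical.

```c
#include <stdint.h>

/* SplitMix64 mixing step applied to a counter: a stateless hash that
   maps each 64-bit input to a well-mixed 64-bit output. */
static uint64_t splitmix64(uint64_t z) {
    z += 0x9E3779B97F4A7C15ull;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double to_unit(uint64_t v) {
    return (double)(v >> 11) * 0x1.0p-53;
}

/* Estimates pi from n points in the unit square. Sample i uses counters
   seed + 2i and seed + 2i + 1, so threads never share generator state. */
double mc_pi(long n, uint64_t seed) {
    long hits = 0;
    #pragma omp parallel for reduction(+:hits) schedule(static)
    for (long i = 0; i < n; ++i) {
        double x = to_unit(splitmix64(seed + 2u * (uint64_t)i));
        double y = to_unit(splitmix64(seed + 2u * (uint64_t)i + 1u));
        if (x * x + y * y < 1.0)
            ++hits;
    }
    return 4.0 * (double)hits / (double)n;
}
```

With n = 1,000,000 samples the statistical error is roughly 4·sqrt(p(1 − p)/n) ≈ 0.0016, where p = π/4. Counter-based generation trades the sequential state of a classic generator for reproducibility that does not depend on the thread count.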
On the Performance of an Algebraic Multigrid Solver on Multicore Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, A H; Schulz, M; Yang, U M
2010-04-29
Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model.
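Two of the techniques named above, thread pinning and correct memory association, can be approximated with standard OpenMP alone. Pinning is requested through the environment variables `OMP_PROC_BIND` (for example `close` or `spread`) and `OMP_PLACES` (for example `cores`), defined since OpenMP 4.0. Memory association on Linux typically relies on first-touch page placement, so data should be initialized by the same threads, with the same static schedule, that later compute on it. The C sketch below shows this generic idiom under those assumptions; it does not reproduce MCSup's interface, and the function names are hypothetical.

```c
#include <stdlib.h>

/* Allocates n doubles and initializes them inside a parallel loop. Under
   first-touch placement, each page is mapped on the NUMA node of the
   thread that first writes it. Returns NULL on allocation failure. */
double *alloc_first_touch(long n, double value) {
    double *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        a[i] = value;
    return a;
}

/* Compute kernel with the same static schedule and loop bounds, so that in
   practice each thread mostly reads pages it touched first. */
double scaled_sum(const double *a, long n, double s) {
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += s * a[i];
    return sum;
}
```

A run might combine this with `OMP_PROC_BIND=close` and `OMP_PLACES=cores`, so that threads stay on the cores whose local memory holds their pages.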
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to improve the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms, with the Message Passing Interface (MPI) for distributed-memory access and either Shared MPI (SMPI) or OpenMP for shared-memory access. Performance testing shows that MPI+SMPI yields the best execution performance while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X for MPI+OpenMP.
Parallel implementation of approximate atomistic models of the AMOEBA polarizable model
NASA Astrophysics Data System (ADS)
Demerdash, Omar; Head-Gordon, Teresa
2016-11-01
In this work we present a replicated-data hybrid OpenMP/MPI implementation of a hierarchical progression of approximate classical polarizable models that yields speedups of up to ∼10 compared with the standard OpenMP implementation of the exact parent AMOEBA polarizable model. In addition, our parallel implementation exhibits reasonable weak and strong scaling. The resulting parallel software will be useful for those interested in how molecular properties converge in the condensed phase with respect to the many-body expansion (MBE); it also provides a fruitful test bed for exploring different electrostatic embedding schemes and offers an interesting possibility for future exascale computing paradigms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messer, Bronson; Harris, James A; Parete-Koon, Suzanne T
We describe recent development work on the core-collapse supernova code CHIMERA. CHIMERA has consumed more than 100 million CPU-hours on Oak Ridge Leadership Computing Facility (OLCF) platforms in the past 3 years, ranking it among the most important applications at the OLCF. Most of the work described has focused on exploiting the multicore nature of the current platform (Jaguar), e.g., via multithreading with OpenMP. In addition, we have begun a major effort to harness the computational power of GPUs with CHIMERA. The impending upgrade of Jaguar to Titan, a 20+ PF machine with an NVIDIA GPU on many nodes, makes this work essential.
Performance Portability Strategies for Grid C++ Expression Templates
NASA Astrophysics Data System (ADS)
Boyle, Peter A.; Clark, M. A.; DeTar, Carleton; Lin, Meifeng; Rana, Verinder; Vaquero Avilés-Casco, Alejandro
2018-03-01
One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regard to the Grid GPU offloading strategies. We present both the successes and the issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experiments and performance results on GPUs with an SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.
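OpenMP 4.x GPU offloading, mentioned above, uses `target` constructs. As a hedged illustration only (plain C rather than Grid's C++ expression templates, and not the authors' code), the sketch below offloads a site-wise SU(3) × SU(3) product, the kind of streaming kernel named in the abstract, with `target teams distribute parallel for` and explicit `map` clauses. The function name and the row-major 9-entries-per-site layout are assumptions. Without an offload-capable compiler and device, the region executes on the host.

```c
#include <complex.h>

/* c[s] = a[s] * b[s] for n lattice sites, each holding a row-major 3x3
   complex matrix (9 entries per site). Inputs are copied to the device and
   the result is copied back; on systems without a device the loop runs on
   the host, and without OpenMP it runs serially. */
void su3_mult_sites(long n, const double complex *a, const double complex *b,
                    double complex *c) {
    #pragma omp target teams distribute parallel for \
        map(to: a[0:9*n], b[0:9*n]) map(from: c[0:9*n])
    for (long s = 0; s < n; ++s) {
        const double complex *as = a + 9 * s;
        const double complex *bs = b + 9 * s;
        double complex *cs = c + 9 * s;
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) {
                double complex acc = 0.0;
                for (int k = 0; k < 3; ++k)
                    acc += as[3 * i + k] * bs[3 * k + j];
                cs[3 * i + j] = acc;
            }
    }
}
```

Explicit `map` clauses make the host-device transfers visible; in a real code the lattice fields would usually stay resident on the device across kernels (for example with `target data`) to avoid repeated copies.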
3D-PDR: Three-dimensional photodissociation region code
NASA Astrophysics Data System (ADS)
Bisbas, T. G.; Bell, T. A.; Viti, S.; Yates, J.; Barlow, M. J.
2018-03-01
3D-PDR is a three-dimensional photodissociation region code written in Fortran. It uses the Sundials package (written in C) to solve the set of ordinary differential equations and it is the successor of the one-dimensional PDR code UCL_PDR (ascl:1303.004). Using the HEALpix ray-tracing scheme (ascl:1107.018), 3D-PDR solves a three-dimensional escape probability routine and evaluates the attenuation of the far-ultraviolet radiation in the PDR and the propagation of FIR/submm emission lines out of the PDR. The code is parallelized (OpenMP) and can be applied to 1D and 3D problems.
XaNSoNS: GPU-accelerated simulator of diffraction patterns of nanoparticles
NASA Astrophysics Data System (ADS)
Neverov, V. S.
XaNSoNS is open-source software with GPU support that simulates X-ray and neutron 1D (or 2D) diffraction patterns and pair-distribution functions (PDF) for amorphous or crystalline nanoparticles (up to ∼10^7 atoms) of heterogeneous structural content. Among the many structural parameters, the user may specify atomic displacements, site occupancies, molecular displacements and molecular rotations. The software uses general equations that are not specific to crystalline structures to calculate the scattering intensity. It supports four major standards of parallel computing: MPI, OpenMP, Nvidia CUDA and OpenCL, enabling it to run on various architectures, from CPU-based HPCs to consumer-level GPUs.
Research on bathymetry estimation by Worldview-2 based with the semi-analytical model
NASA Astrophysics Data System (ADS)
Sheng, L.; Bai, J.; Zhou, G.-W.; Zhao, Y.; Li, Y.-C.
2015-04-01
The South Sea Islands of China are far from the mainland; reefs account for more than 95% of the South Sea, and most reefs are scattered over disputed, sensitive areas. Accurate methods for obtaining reef bathymetry are therefore urgently needed. Commonly used methods, including sonar, airborne laser, and remote sensing estimation, are limited by the long distances, large areas, and sensitive locations involved. Remote sensing data provide an effective, contact-free way to estimate bathymetry over large areas through the relationship between spectral information and water depth. Targeting the water quality of the South China Sea, this paper develops a bathymetry estimation method that does not require measured water depths. First, a semi-analytical optimization model based on theoretical interpretation models is studied, with a genetic algorithm used to optimize the model. In addition, OpenMP parallel computing is introduced to greatly increase the speed of the semi-analytical optimization model. One island in the South China Sea is selected as the study area, and measured water depths are used to evaluate the accuracy of the bathymetry estimated from Worldview-2 multispectral images. The results show that the semi-analytical optimization model based on the genetic algorithm performs well in the study area, and the accuracy of the estimated bathymetry in 0-20 m shallow water is acceptable. The genetic-algorithm-based semi-analytical optimization model solves the problem of bathymetry estimation without water depth measurements. Overall, this paper provides a new bathymetry estimation method for sensitive reefs far from the mainland.
NASA Astrophysics Data System (ADS)
Zhao, G.; Liu, J.; Chen, B.; Guo, R.; Chen, L.
2017-12-01
Forward modeling of gravitational fields at large scales requires considering the curvature of the Earth and evaluating Newton's volume integral in spherical coordinates. To obtain fast and accurate gravitational effects for subsurface structures, the subsurface mass distribution is usually discretized into small spherical prisms (called tesseroids). The gravity fields of tesseroids are generally calculated numerically, and one commonly used numerical method is 3D Gauss-Legendre quadrature (GLQ). However, traditional GLQ integration suffers from low computational efficiency and relatively poor accuracy when the observation surface is close to the source region. We developed a fast, high-accuracy 3D GLQ integration based on the equivalence of kernel matrices, adaptive discretization, and parallelization using OpenMP. The kernel-matrix equivalence strategy increases efficiency and reduces memory consumption by calculating and storing identical matrix elements in each kernel matrix only once. The adaptive discretization strategy is used to improve accuracy. Numerical investigations show that the execution time of the proposed method is reduced by two orders of magnitude compared with the traditional method without these optimization strategies. High accuracy is also guaranteed no matter how close the computation points are to the source region. In addition, the algorithm dramatically reduces the memory requirement, by a factor of N compared with the traditional method, where N is the number of divisions of the source region in the longitudinal direction. This makes large-scale gravity forward modeling and inversion with fine discretization possible.
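For reference, plain 3D Gauss-Legendre quadrature, the baseline that the method above accelerates, replaces a volume integral over a box by a weighted sum over a tensor-product grid of nodes. The C sketch below uses the 3-point rule (nodes 0 and ±sqrt(3/5), weights 8/9 and 5/9), which is exact for polynomials up to degree 5 in each variable, and parallelizes over independent parameter sets standing in for observation points. It omits the tesseroid kernel, the kernel-matrix equivalence, and the adaptive discretization; the function names and the `monomial` check integrand are hypothetical.

```c
#include <stddef.h>

/* 3-point Gauss-Legendre nodes and weights on [-1, 1]. */
static const double glq_x[3] = {-0.7745966692414834, 0.0, 0.7745966692414834};
static const double glq_w[3] = {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0};

/* Integrand f(x, y, z; p), where p points to caller-defined parameters. */
typedef double (*integrand3)(double x, double y, double z, const double *p);

/* Tensor-product 3-point GLQ of f over box = {ax, bx, ay, by, az, bz}. */
double glq3(integrand3 f, const double *p, const double box[6]) {
    double hx = 0.5 * (box[1] - box[0]), cx = 0.5 * (box[1] + box[0]);
    double hy = 0.5 * (box[3] - box[2]), cy = 0.5 * (box[3] + box[2]);
    double hz = 0.5 * (box[5] - box[4]), cz = 0.5 * (box[5] + box[4]);
    double sum = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                sum += glq_w[i] * glq_w[j] * glq_w[k] *
                       f(cx + hx * glq_x[i], cy + hy * glq_x[j],
                         cz + hz * glq_x[k], p);
    return hx * hy * hz * sum;   /* Jacobian of the map from [-1, 1]^3 */
}

/* Evaluates the integral for npts parameter sets (each `stride` doubles
   long) in parallel; the evaluations are independent. */
void glq3_many(int npts, integrand3 f, const double *params, int stride,
               const double box[6], double *out) {
    #pragma omp parallel for schedule(static)
    for (int m = 0; m < npts; ++m)
        out[m] = glq3(f, params + (size_t)m * stride, box);
}

/* Check integrand x^a * y^b * z^c with non-negative integer exponents
   stored in p[0..2]. */
double monomial(double x, double y, double z, const double *p) {
    double v = 1.0;
    for (int e = 0; e < (int)p[0]; ++e) v *= x;
    for (int e = 0; e < (int)p[1]; ++e) v *= y;
    for (int e = 0; e < (int)p[2]; ++e) v *= z;
    return v;
}
```

For instance, integrating x²y²z² over the unit cube returns 1/27 to rounding accuracy. A real tesseroid kernel is not polynomial and becomes nearly singular near the source, which is why the abstract pairs GLQ with adaptive discretization.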
NASA Astrophysics Data System (ADS)
Tsai, Y. L.; Wu, T. R.; Lin, C. Y.; Chuang, M. H.; Lin, C. W.
2016-02-01
An ideal operational storm surge model should have the following features: (1) a large computational domain that covers the complete typhoon life cycle; (2) support for both parametric and atmospheric models; (3) the ability to calculate inundation areas for risk assessment; and (4) inclusion of tides for accurate inundation simulation. A literature review shows that few operational models achieve these goals with fast calculation, and most models have limited functions. In this paper, the well-developed COMCOT (COrnell Multi-grid Coupled of Tsunami Model) tsunami model is chosen as the kernel of a storm surge model that solves the nonlinear shallow water equations directly in both spherical and Cartesian coordinates. The complete evolution of a storm surge, including large-scale propagation and small-scale offshore run-up, can be simulated with a nested-grid scheme. The global tide model TPXO 7.2, established by Oregon State University, is coupled to provide astronomical boundary conditions. The atmospheric model WRF (Weather Research and Forecasting Model) is also coupled to provide meteorological fields. The high-efficiency thin-film method is adopted to evaluate storm surge inundation. Our in-house model has been optimized with OpenMP (Open Multi-Processing), making it 10 times faster than the original version and suitable as an early-warning storm surge model. In this study, a thorough simulation of the 2013 Typhoon Haiyan is performed. Detailed results in terms of surge propagation and high-resolution inundation areas will be presented at the Oceanic Science Meeting of 2016.
KITTEN Lightweight Kernel 0.1 Beta
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pedretti, Kevin; Levenhagen, Michael; Kelly, Suzanne
2007-12-12
The Kitten Lightweight Kernel is a simplified OS (operating system) kernel that is intended to manage a compute node's hardware resources. It provides a set of mechanisms to user-level applications for utilizing hardware resources (e.g., allocating memory, creating processes, accessing the network). Kitten is much simpler than general-purpose OS kernels, such as Linux or Windows, but includes all of the essential functionality needed to support HPC (high-performance computing) MPI, PGAS and OpenMP applications. Kitten provides unique capabilities such as physically contiguous application memory, transparent large page support, and noise-free tick-less operation, which enable HPC applications to obtain greater efficiency and scalability than with general-purpose OS kernels.
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many-core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors, starting with NVIDIA GPUs and, more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
The current trend in processor manufacturing focuses on multi-core architectures rather than increasing clock speed for performance improvement. Graphics processors have become commodity hardware that provides fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created a huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is better suited to SIMD-based GPU architectures. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA API-based programming.
2017-09-01
ERDC/CHL TR-17-15. Strategic Environmental Research and Development Program. Develop Accurate Methods for Characterizing and ... current environments. This research will provide more accurate methods for assessing contaminated sediment stability for many DoD and Environmental ... Executive Summary. Objective: The proposed research goal is to develop laboratory methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach that exploits two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, GPROF identifies a single parallelizable loop that accounts for over 97% of the total computational time. Adding a few lines of OpenMP compiler directives to this loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops are identified in 14 of the 174 HGC5 subroutines, which together require 99% of the execution time. As these loops are parallelized incrementally, scalability is found to be limited by a loop in which Cray PAT detects cache miss rates above 90%. After this loop is rewritten, a speedup similar to that of the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network, or on multiple compute nodes of a cluster as slaves, using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5, with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100 to 200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most existing groundwater model codes for many applications.
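The Levenberg-Marquardt Jacobian mentioned above needs one extra forward run per adjustable parameter with forward differences, and these runs are independent of one another. The source distributes them with MPI; the C sketch below shows the same independence with OpenMP threads instead, a substitution made here for brevity. `forward_model`, `fd_jacobian`, and the `toy_model` used for checking are hypothetical, the relative step of 1e-6 is an assumption, and the forward model must be thread-safe (no shared mutable state).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical forward-model signature: fills r[0..m-1] from p[0..n-1]. */
typedef void (*forward_model)(const double *p, int n, double *r, int m);

/* Forward-difference Jacobian J (m x n, row-major) around p, given
   r0 = f(p). Each column needs one independent model run, so columns are
   distributed over threads. Returns 0 on success, -1 on allocation failure. */
int fd_jacobian(forward_model f, const double *p, int n, const double *r0,
                int m, double *J) {
    int failed = 0;
    #pragma omp parallel for schedule(dynamic, 1) reduction(|:failed)
    for (int j = 0; j < n; ++j) {
        double *pj = malloc((size_t)n * sizeof *pj);   /* perturbed parameters */
        double *rj = malloc((size_t)m * sizeof *rj);   /* perturbed residuals */
        if (pj == NULL || rj == NULL) {
            free(pj);
            free(rj);
            failed = 1;
            continue;
        }
        memcpy(pj, p, (size_t)n * sizeof *pj);
        double mag = p[j] < 0.0 ? -p[j] : p[j];
        pj[j] += 1e-6 * (mag > 1.0 ? mag : 1.0);
        double h = pj[j] - p[j];                       /* step actually taken */
        f(pj, n, rj, m);
        for (int i = 0; i < m; ++i)
            J[(size_t)i * n + j] = (rj[i] - r0[i]) / h;
        free(pj);
        free(rj);
    }
    return failed ? -1 : 0;
}

/* Toy two-parameter model used only to exercise fd_jacobian. */
void toy_model(const double *p, int n, double *r, int m) {
    (void)n;
    (void)m;
    r[0] = p[0] * p[0] + p[1];
    r[1] = p[0] * p[1];
}
```

A central-difference variant would evaluate p ± h e_j for each parameter, doubling the number of model runs in exchange for a more accurate derivative.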
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. With fast imaging, it is also possible to track changes in tissue oxygenation in response to the oxygen content of the breathing air. However, this involves gigabytes of data for each 3D oximetric imaging experiment, which requires digital band-pass filtering and background noise subtraction followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared-memory processors to significantly reduce the computational burden of the filtration task. The results show that the parallel filtration code achieves a speedup factor of 46.66 relative to the equivalent serial MATLAB code. In addition, parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved for the reconstruction process and the oximetry computation, respectively, for a data set with 23 × 23 × 23 gradient steps. Execution times for both the serial and parallel implementations were measured for different data dimensions and are presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through the local internet (NIHnet). The experimental results demonstrate that parallel computing provides a source of high computational power for obtaining biophysical parameters from 3D EPR oximetric imaging almost in real time.
A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan
2010-01-01
Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach that exploits two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. First, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and for a field-scale coupled flow and transport model. In the first application, a single parallelizable loop is identified as consuming over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced about tenfold on a compute node with 16 cores. Performance is further improved by selectively parallelizing a few more loops. For the field-scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified as taking more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and by using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added to HGC5, with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, a number of compute nodes equal to the number of adjustable parameters (when the forward difference is used for the Jacobian approximation), or twice that number (if the central difference is used), reduces the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analysis, where thousands of compute nodes can be used efficiently.
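The coloring scheme described above removes write conflicts in finite element assembly: elements of the same color share no nodes, so they can be scattered into the global arrays concurrently, while the colors themselves are processed one after another. The C sketch below assumes a deliberately simple 1D mesh of two-node linear elements under a uniform load, where coloring even and odd elements is enough; an unstructured mesh such as HGC5's would need a graph-coloring step, which is not shown. The function name is hypothetical.

```c
/* Assembles the load vector f[0..ne] of a 1D mesh of ne two-node linear
   elements with uniform length h and uniform load q. Element e touches
   nodes e and e+1, so elements of equal parity never share a node:
   color 0 = even elements, color 1 = odd elements. Colors run one after
   another; within a color the scatter-add needs no atomics or locks. */
void assemble_load_colored(int ne, double h, double q, double *f) {
    for (int i = 0; i <= ne; ++i)
        f[i] = 0.0;
    for (int color = 0; color < 2; ++color) {
        #pragma omp parallel for schedule(static)
        for (int e = color; e < ne; e += 2) {
            double fe = 0.5 * q * h;   /* equal share per node for linear elements */
            f[e] += fe;
            f[e + 1] += fe;
        }
    }
}
```

With ne = 4, h = 0.25 and q = 2, each element contributes 0.25 to each of its nodes, giving the load vector [0.25, 0.5, 0.5, 0.5, 0.25].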
Liu, Yan-Chun; Xiao, Sa; Yang, Kun; Ling, Li; Sun, Zhi-Liang; Liu, Zhao-Ying
2017-06-01
This study reports an applicable analytical strategy for the comprehensive identification and structural characterization of target components from Gelsemium elegans using high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QqTOF MS), based on accurate mass databases combined with MS/MS spectra. The databases created include the accurate masses and elemental compositions of 204 components from Gelsemium and their structural data. Accurate MS and MS/MS spectra were acquired in data-dependent auto MS/MS mode, followed by extraction of the potential compounds from the LC-QqTOF MS raw data of the sample. These were then matched against the databases to search for target components in the sample. The structures of the detected components were tentatively characterized, for the first time, by manual interpretation of the accurate MS/MS spectra. A total of 57 components were successfully detected and structurally characterized from the crude extracts of G. elegans, although some isomers could not be differentiated. This analytical strategy is generic and efficient, avoids isolation and purification procedures, enables comprehensive structural characterization of the target components of Gelsemium, and should be widely applicable to complex mixtures derived from Gelsemium preparations. Copyright © 2017 John Wiley & Sons, Ltd.
[Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].
Furuta, Takuya; Sato, Tatsuhiko
2015-01-01
Time-consuming Monte Carlo dose calculations have become feasible owing to advances in computer technology. However, much of the recent progress is due to the emergence of multi-core high-performance computers, so parallel computing has become key to achieving good performance in software programs. The Monte Carlo simulation code PHITS provides two parallel computing functions: distributed-memory parallelization using the Message Passing Interface (MPI) protocols and shared-memory parallelization using Open Multi-Processing (OpenMP) directives. Users can choose between the two functions according to their needs. This paper explains the two functions along with their advantages and disadvantages. Some test applications are also provided to show their performance on a typical multi-core high-performance workstation.
Exact diagonalization of quantum lattice models on coprocessors
NASA Astrophysics Data System (ADS)
Siro, T.; Harju, A.
2016-10-01
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
Fast 3D Surface Extraction 2 pages (including abstract)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sewell, Christopher Meyer; Patchett, John M.; Ahrens, James P.
Ocean scientists searching for isosurfaces and/or thresholds of interest in high-resolution 3D datasets previously faced a tedious and time-consuming interactive exploration process. PISTON research and development activities are enabling ocean scientists to rapidly and interactively explore isosurfaces and thresholds in their large data sets using a simple slider with real-time calculation and visualization of these features. Ocean scientists can now visualize more features in less time, helping them gain a better understanding of the high-resolution data sets they work with on a daily basis. Isosurface timings (512^3 grid): VTK 7.7 s, Parallel VTK (48-core) 1.3 s, PISTON OpenMP (48-core) 0.2 s, PISTON CUDA (Quadro 6000) 0.1 s.
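As a hedged illustration of why isosurface extraction parallelizes well (this is not PISTON's implementation), the first step of a marching-cubes-style algorithm classifies each grid cell independently: a cell is crossed by the isosurface if its eight corner values do not all lie on the same side of the isovalue. The C sketch below counts such cells with an OpenMP reduction; triangle generation and the threshold filter are omitted, and the function name and x-fastest storage order are assumptions.

```c
#include <stddef.h>

/* Counts the cells of an nx*ny*nz point grid (values stored x-fastest)
   that the isosurface f = iso passes through, i.e. cells whose 8 corner
   values are not all on the same side of iso. */
long count_active_cells(const float *f, int nx, int ny, int nz, float iso) {
    long active = 0;
    #pragma omp parallel for collapse(2) reduction(+:active) schedule(static)
    for (int k = 0; k < nz - 1; ++k)
        for (int j = 0; j < ny - 1; ++j)
            for (int i = 0; i < nx - 1; ++i) {
                int below = 0;                  /* corners with f < iso */
                for (int c = 0; c < 8; ++c) {
                    int di = c & 1, dj = (c >> 1) & 1, dk = (c >> 2) & 1;
                    size_t idx = ((size_t)(k + dk) * ny + (size_t)(j + dj)) * nx
                                 + (size_t)(i + di);
                    below += f[idx] < iso;
                }
                if (below != 0 && below != 8)
                    ++active;
            }
    return active;
}
```

Because each cell is classified without reference to its neighbors' results, the same loop maps naturally onto threads or GPU data-parallel primitives.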
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of Re_τ = 180 has shown good agreement with existing data.
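ADI-type schemes reduce each implicit step to many independent tridiagonal solves along grid lines, which is where shared-memory parallelism and cache behavior meet. The C sketch below, an illustration rather than LESTool code, solves each line with the Thomas algorithm and distributes the lines over OpenMP threads; storing each line's right-hand side contiguously keeps every thread on unit-stride memory, one simple instance of the cache-oriented optimization discussed above. Coefficients shared by all lines, diagonal dominance (so no pivoting), and the function names are assumptions.

```c
#include <stdlib.h>

/* Thomas algorithm for a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i, i = 0..n-1
   (a[0] and c[n-1] are ignored). d is overwritten with the solution; cp is
   scratch of length n. Assumes diagonal dominance, so no pivoting. */
static void thomas(int n, const double *a, const double *b, const double *c,
                   double *d, double *cp) {
    cp[0] = c[0] / b[0];
    d[0] = d[0] / b[0];
    for (int i = 1; i < n; ++i) {                 /* forward elimination */
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = (i < n - 1) ? c[i] / m : 0.0;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    for (int i = n - 2; i >= 0; --i)              /* back substitution */
        d[i] -= cp[i] * d[i + 1];
}

/* One ADI sweep direction: solves nlines independent systems that share the
   coefficients a, b, c. Line l occupies rhs[l*n .. l*n+n-1] and is overwritten
   with its solution. Returns 0 on success, -1 if scratch allocation fails. */
int solve_lines(int nlines, int n, const double *a, const double *b,
                const double *c, double *rhs) {
    int failed = 0;
    #pragma omp parallel reduction(|:failed)
    {
        double *cp = malloc((size_t)n * sizeof *cp);   /* per-thread scratch */
        if (cp == NULL)
            failed = 1;
        #pragma omp for schedule(static)
        for (int l = 0; l < nlines; ++l)
            if (cp != NULL)
                thomas(n, a, b, c, rhs + (size_t)l * n, cp);
        free(cp);
    }
    return failed ? -1 : 0;
}
```

In a full ADI step the sweep in the other direction would either transpose the data or use a line-blocked layout, so that the solves there also run on contiguous memory.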
Quantitative Phase Microscopy for Accurate Characterization of Microlens Arrays
NASA Astrophysics Data System (ADS)
Grilli, Simonetta; Miccio, Lisa; Merola, Francesco; Finizio, Andrea; Paturzo, Melania; Coppola, Sara; Vespini, Veronica; Ferraro, Pietro
Microlens arrays are of fundamental importance in a wide variety of applications in optics and photonics. This chapter deals with an accurate digital holography-based characterization of both liquid and polymeric microlenses fabricated by an innovative pyro-electrowetting process. The actuation of liquid and polymeric films is obtained through pyroelectric charges generated in polar dielectric lithium niobate crystals.
A Modeling Method of Fluttering Leaves Based on Point Cloud
NASA Astrophysics Data System (ADS)
Tang, J.; Wang, Y.; Zhao, Y.; Hao, W.; Ning, X.; Lv, K.; Shi, Z.; Zhao, M.
2017-09-01
Leaves falling gently or fluttering are a common phenomenon in natural scenes. The realism of falling leaves plays an important part in the dynamic modeling of natural scenes, and falling-leaf models are widely applied in animation and virtual reality. In this paper, we propose a novel modeling method for fluttering leaves based on point clouds. According to the shape and weight of the leaves and the wind speed, three basic falling trajectories are defined: rotation falling, roll falling, and screw-roll falling. In addition, a parallel algorithm based on OpenMP is implemented to meet real-time requirements in practical applications. Experimental results demonstrate that the proposed method is amenable to the incorporation of a variety of desirable effects.
NASA Astrophysics Data System (ADS)
Greynolds, Alan W.
2013-09-01
Results from the GelOE optical engineering software are presented for the through-focus, monochromatic coherent and polychromatic incoherent imaging of a radial "star" target for equivalent t-number circular and Gaussian pupils. The FFT-based simulations are carried out using OpenMP threading on a multi-core desktop computer, with and without the aid of a many-core NVIDIA GPU accessing its cuFFT library. It is found that a custom FFT optimized for the 12-core host has similar performance to a simply implemented 256-core GPU FFT. A more sophisticated version of the latter but tuned to reduce overhead on a 448-core GPU is 20 to 28 times faster than a basic FFT implementation running on one CPU core.
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers
NASA Astrophysics Data System (ADS)
Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori
The OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores so that the OSCAR parallelizing compiler can generate parallel programs for various multicores from different vendors. The OSCAR API was developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restrictions for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers, since the OSCAR API is designed as a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip, RP2, by Renesas, Hitachi and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler by up to 3.3 times on the Power6 SMP server.
Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
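For context, both averaged speedups above correspond to the same parallel efficiency under the usual definition E_p = S_p / p (speedup divided by the number of cores). This is simple arithmetic on the reported numbers, not a figure stated by the authors:

```latex
E_p = \frac{S_p}{p}, \qquad
E_8^{\mathrm{Power5+}} = \frac{5.8}{8} = 0.725, \qquad
E_4^{\mathrm{RP2}} = \frac{2.9}{4} = 0.725
```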
NASA Astrophysics Data System (ADS)
van Reenen, Alexander; Gao, Yang; Bos, Arjen H.; de Jong, Arthur M.; Hulsen, Martien A.; den Toonder, Jaap M. J.; Prins, Menno W. J.
2013-07-01
The application of magnetic particles in biomedical research and in-vitro diagnostics requires accurate characterization of their magnetic properties, with single-particle resolution and good statistics. Here, we report intra-pair magnetophoresis as a method to accurately quantify the field-dependent magnetic moments of magnetic particles and to rapidly generate histograms of the magnetic moments with good statistics. We demonstrate our method with particles of different sizes and from different sources, with a measurement precision of a few percent. We expect that intra-pair magnetophoresis will be a powerful tool for the characterization and improvement of particles for the upcoming field of particle-based nanobiotechnology.
NASA Astrophysics Data System (ADS)
Sandalski, Stou
Smoothed particle hydrodynamics (SPH) is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU-accelerated smoothed particle hydrodynamics code for astrophysical simulations. The code is named
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time-consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive-based parallel language, the MPI function-based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format, and finally a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and only a small part of the matrix spectrum is needed to obtain converged results.
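The abstract mentions a coordinate (COO) format variant for real symmetric matrices but does not spell it out. A minimal sketch, assuming the variant stores only the upper triangle (i <= j) as (row, col, value) triplets and mirrors each off-diagonal entry during the matrix-vector product; the function name is hypothetical:

```python
def symmetric_coo_spmv(n, rows, cols, vals, x):
    """Compute y = A @ x for a real symmetric n x n matrix A.

    Assumed (hypothetical) storage: only entries with i <= j are kept as
    COO triplets, so each off-diagonal pair (i, j)/(j, i) is stored once.
    """
    y = [0.0] * n
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]          # contribution of the stored entry A[i][j]
        if i != j:
            y[j] += a * x[i]      # implicit mirrored entry A[j][i] == A[i][j]
    return y


# A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]] stored as its upper triangle.
y = symmetric_coo_spmv(3, [0, 0, 1, 1, 2], [0, 1, 1, 2, 2],
                       [4.0, 1.0, 3.0, 2.0, 5.0], [1.0, 2.0, 3.0])
```

Storing one triangle halves memory, but in a threaded product the mirrored write to `y[j]` can collide with another thread's write, so a parallel version has to privatize or otherwise guard those updates.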
NASA Astrophysics Data System (ADS)
Fehr, M.; Navarro, V.; Martin, L.; Fletcher, E.
2013-08-01
Space Situational Awareness[8] (SSA) is defined as the comprehensive knowledge, understanding and maintained awareness of the population of space objects, the space environment and existing threats and risks. As ESA's SSA Conjunction Prediction Service (CPS) requires the repetitive application of a processing algorithm against a data set of man-made space objects, it is crucial to exploit the highly parallelizable nature of this problem. Currently the CPS system makes use of OpenMP[7] for parallelization purposes using CPU threads, but only a GPU with its hundreds of cores can fully benefit from such high levels of parallelism. This paper presents the adaptation of several core algorithms[5] of the CPS for general-purpose computing on graphics processing units (GPGPU) using NVIDIA's Compute Unified Device Architecture (CUDA).
Parallel fast multipole boundary element method applied to computational homogenization
NASA Astrophysics Data System (ADS)
Ptaszny, Jacek
2018-01-01
In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problems are developed and applied to the computational homogenization of a solid containing spherical voids. The system of equations is solved using the GMRES iterative solver. The boundary of the body is discretized using quadrilateral serendipity elements with adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of the number of threads is examined.
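The thread assignment described above relies on splitting the starting tree nodes into disjoint sets of (approximately) equal size. A minimal sketch of one such rule, assuming contiguous blocks whose sizes differ by at most one; the exact rule used in the paper is not given, and `partition_equal` is a hypothetical name:

```python
def partition_equal(items, num_threads):
    """Split items into num_threads disjoint, contiguous sets whose sizes
    differ by at most one (similar in spirit to an OpenMP static schedule)."""
    q, r = divmod(len(items), num_threads)
    parts, start = [], 0
    for t in range(num_threads):
        size = q + (1 if t < r else 0)   # the first r threads take one extra node
        parts.append(items[start:start + size])
        start += size
    return parts


# Ten hypothetical tree-node ids distributed over four threads.
parts = partition_equal(list(range(10)), 4)
```

Equal set sizes balance the load only if every node carries similar work, which is exactly the assumption the abstract states.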
NASA Technical Reports Server (NTRS)
Clune, Tom
2014-01-01
This tutorial will introduce Fortran developers to unit testing and test-driven development (TDD) using pFUnit. As with other unit-testing frameworks, pFUnit simplifies the process of writing, collecting, and executing tests while providing clear diagnostic messages for failing tests. pFUnit specifically targets the development of scientific-technical software written in Fortran and includes customized features such as assertions for multi-dimensional arrays, distributed (MPI) and thread-based (OpenMP) parallelism, and flexible parameterized tests. These sessions will include numerous examples and hands-on exercises that gradually build in complexity. Attendees are expected to have working knowledge of F90, but familiarity with object-oriented syntax in F2003 and MPI will be of benefit for the more advanced examples. By the end of the tutorial the audience should feel comfortable applying pFUnit within their own development environment.
NASA Astrophysics Data System (ADS)
Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel
2012-10-01
In order to solve problems such as the ion coalescence and slow MHD shocks fully kinetically we developed a fully implicit 2D energy and charge conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.
Characterization of Alaskan HMA mixtures with the simple performance tester.
DOT National Transportation Integrated Search
2014-05-01
Material characterization provides basic and essential information for pavement design and the evaluation of hot mix asphalt (HMA). This study focused on the accurate characterization of an Alaskan HMA mixture using an asphalt mixture performance t...
Magnetic Field Generation and B-Dot Sensor Characterization in the High Frequency Band
2012-03-01
AFIT/GE/ENG/12-20. Abstract: Designing a high frequency (HF) ... large wavelengths in the HF range make it difficult to accurately estimate from which direction a magnetic field is emitting. Accurate direction-finding (DF) estimates are ... necessary for search and rescue operations and geolocating RF emitters of interest. The primary goal of this research is to characterize the
DOE Office of Scientific and Technical Information (OSTI.GOV)
Puzzarini, Cristina; Biczysko, Malgorzata; Bloino, Julien
2014-04-20
In an effort to provide an accurate spectroscopic characterization of oxirane, state-of-the-art computational methods and approaches have been employed to determine highly accurate fundamental vibrational frequencies and rotational parameters. Available experimental data were used to assess the reliability of our computations, and an average accuracy of 10 cm⁻¹ was found for fundamental transitions as well as overtones and combination bands. Moving to rotational spectroscopy, relative discrepancies of 0.1%, 2%-3%, and 3%-4% were observed for rotational, quartic, and sextic centrifugal-distortion constants, respectively. We are therefore confident that the highly accurate spectroscopic data provided herein can be useful for identification of oxirane in Titan's atmosphere and the assignment of unidentified infrared bands. Since oxirane was already observed in the interstellar medium and some astronomical objects are characterized by very high D/H ratios, we also considered the accurate determination of the spectroscopic parameters for the mono-deuterated species, oxirane-d1. For the latter, an empirical scaling procedure allowed us to improve our computed data and to provide predictions for rotational transitions with a relative accuracy of about 0.02% (i.e., an uncertainty of about 40 MHz for a transition lying at 200 GHz).
Moss, R; Zarebski, A; Dawson, P; McCAW, J M
2017-01-01
Accurate forecasting of seasonal influenza epidemics is of great concern to healthcare providers in temperate climates, since these epidemics vary substantially in their size, timing and duration from year to year, making it a challenge to deliver timely and proportionate responses. Previous studies have shown that Bayesian estimation techniques can accurately predict when an influenza epidemic will peak many weeks in advance, and we have previously tailored these methods for metropolitan Melbourne (Australia) and Google Flu Trends data. Here we extend these methods to clinical observation and laboratory-confirmation data for Melbourne, on the grounds that these data sources provide more accurate characterizations of influenza activity. We show that from each of these data sources we can accurately predict the timing of the epidemic peak 4-6 weeks in advance. We also show that making simultaneous use of multiple surveillance systems to improve forecast skill remains a fundamental challenge. Disparate systems provide complementary characterizations of disease activity, which may or may not be comparable, and it is unclear how a 'ground truth' for evaluating forecasts against these multiple characterizations might be defined. These findings are a significant step towards making optimal use of routine surveillance data for outbreak forecasting.
Tian, Ye; Schwieters, Charles D; Opella, Stanley J; Marassi, Francesca M
2017-01-01
Structure determination of proteins by NMR is unique in its ability to measure restraints, very accurately, in environments and under conditions that closely mimic those encountered in vivo. For example, advances in solid-state NMR methods enable structure determination of membrane proteins in detergent-free lipid bilayers, and of large soluble proteins prepared by sedimentation, while parallel advances in solution NMR methods and optimization of detergent-free lipid nanodiscs are rapidly pushing the envelope of the size limit for both soluble and membrane proteins. These experimental advantages, however, are partially squandered during structure calculation, because the commonly used force fields are purely repulsive and neglect solvation, Van der Waals forces and electrostatic energy. Here we describe a new force field, and updated energy functions, for protein structure calculations with EEFx implicit solvation, electrostatics, and Van der Waals Lennard-Jones forces, in the widely used program Xplor-NIH. The new force field is based primarily on CHARMM22, facilitating calculations with a wider range of biomolecules. The new EEFx energy function has been rewritten to enable OpenMP parallelism, and optimized to enhance computation efficiency. It implements solvation, electrostatics, and Van der Waals energy terms together, thus ensuring more consistent and efficient computation of the complete nonbonded energy lists. Updates in the related python module allow detailed analysis of the interaction energies and associated parameters. The new force field and energy function work with both soluble proteins and membrane proteins, including those with cofactors or engineered tags, and are very effective in situations where there are sparse experimental restraints. 
Results obtained for NMR-restrained calculations with a set of five soluble proteins and five membrane proteins show that structures calculated with EEFx have significant improvements in accuracy, precision, and conformation, and that structure refinement can be obtained by short relaxation with EEFx to obtain improvements in these key metrics. These developments broaden the range of biomolecular structures that can be calculated with high fidelity from NMR restraints.
Computation of Calcium Score with Dual Energy CT: A Phantom Study
Kumar, Vidhya; Min, James K.; He, Xin; Raman, Subha V.
2016-01-01
Dual energy computed tomography (DECT) improves material and tissue characterization compared to single energy CT (SECT); we sought to validate coronary calcium quantification in advancing cardiovascular DECT. In an anthropomorphic phantom, agreement between measurements was excellent, and Bland-Altman analysis demonstrated minimal bias. Compared to the known calcium mass for each phantom, calcium mass by DECT was highly accurate. Noncontrast DECT yields accurate calcium measures, and warrants consideration in cardiac protocols for additional tissue characterizations. PMID:27680414
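Bland-Altman analysis summarizes agreement between two measurement methods by the mean of their paired differences (the bias) and the 95% limits of agreement, bias ± 1.96 SD. A minimal sketch of that computation; the function name and the example values are hypothetical, not data from the study:

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, (lower, upper)) for paired measurements from two methods.

    bias is the mean difference a - b; the limits of agreement are
    bias +/- 1.96 times the sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample (n - 1) standard deviation
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical calcium-mass readings (mg) from two protocols on the same inserts.
bias, limits = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 11.0, 13.0, 15.0])
```

A "minimal bias" result means the mean difference is close to zero relative to the measurement scale, while narrow limits indicate that individual pairs also agree.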
Accurate mode characterization of two-mode optical fibers by in-fiber acousto-optics.
Alcusa-Sáez, E; Díez, A; Andrés, M V
2016-03-07
Acousto-optic interaction in optical fibers is exploited for the accurate and broadband characterization of two-mode optical fibers. Coupling between LP01 and LP1m modes is produced over a broad wavelength range. Differences in effective indices, group indices, and chromatic dispersions between the guided modes are obtained from experimental measurements. Additionally, we show that the technique is suitable for investigating the fine mode structure of LP modes, and some other intriguing features related to mode cut-off.
Nano-Scale Characterization of Al-Mg Nanocrystalline Alloys
NASA Astrophysics Data System (ADS)
Harvey, Evan; Ladani, Leila
Materials with nano-scale microstructure have become increasingly popular due to their substantially increased strength. The increase in strength resulting from decreasing grain size is described by the Hall-Petch equation. With increased interest in miniaturization of components, methods for the mechanical characterization of small volumes of material are necessary, because traditional means such as tensile testing become increasingly difficult with such small test specimens. This study seeks to characterize the elastic-plastic properties of nanocrystalline Al-5083 through nanoindentation and related data analysis techniques. By using nanoindentation, accurate predictions of the elastic modulus and hardness of the alloy were attained. Also, the employed data analysis model provided reasonable estimates of the plastic properties (strain-hardening exponent and yield stress), lending credibility to this procedure as an accurate, full mechanical characterization method.
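For reference, the Hall-Petch relation is commonly written as below, where σ_y is the yield stress, σ_0 a material constant (the friction stress), k_y the strengthening coefficient, and d the average grain diameter; the abstract does not report values of these constants for Al-5083. Halving d raises the grain-boundary term by a factor of √2.

```latex
\sigma_y = \sigma_0 + \frac{k_y}{\sqrt{d}}
```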
Ice-sheet modelling accelerated by graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek
2014-11-01
Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
Massively parallel sparse matrix function calculations with NTPoly
NASA Astrophysics Data System (ADS)
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
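The polynomial-expansion idea can be illustrated with a minimal dense sketch: approximate f(A) by a polynomial p(A) = c0 I + c1 A + ... + cd A^d and evaluate it with Horner's rule, which needs only d matrix products. Using a truncated Taylor series of exp is an assumed illustrative choice; NTPoly's actual expansions, sparse storage, and thresholding are not shown, and the helper names are hypothetical:

```python
import math


def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    m, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(len(A))]


def matrix_polynomial(coeffs, A):
    """Evaluate p(A) = sum_k coeffs[k] * A^k with Horner's rule:
    p = c_d I; then repeatedly p = p @ A + c_{k} I for k = d-1, ..., 0."""
    n = len(A)
    result = [[coeffs[-1] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in reversed(coeffs[:-1]):
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += c            # add c * I
    return result


# Truncated Taylor coefficients of exp(x): 1/k! for k = 0..15.
EXP_COEFFS = [1.0 / math.factorial(k) for k in range(16)]
E = matrix_polynomial(EXP_COEFFS, [[0.5, 0.0], [0.0, -1.0]])
```

With a sparse matrix type the same loop applies unchanged; linear-time behavior then depends on the products staying sparse.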
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shipman, Galen M.
These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.
Kernel optimization for short-range molecular dynamics
NASA Astrophysics Data System (ADS)
Hu, Changjun; Wang, Xianmeng; Li, Jianjiang; He, Xinfu; Li, Shigang; Feng, Yangde; Yang, Shaofeng; Bai, He
2017-02-01
To optimize short-range force computations in Molecular Dynamics (MD) simulations, multi-threading and SIMD optimizations are presented in this paper. For multi-threading, a Partition-and-Separate-Calculation (PSC) method is designed to avoid the write conflicts caused by using Newton's third law. Serial bottlenecks are eliminated with no additional memory usage. The method is implemented using the OpenMP model. Furthermore, the PSC method is employed on Intel Xeon Phi coprocessors in both native and offload modes. We also evaluate the performance of the PSC method under different thread affinities on the MIC architecture. For SIMD execution, we explain how the "if-clause" of the cutoff radius check influences the performance of the PSC method. The experimental results show that our PSC method is more efficient than some traditional methods. In double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.
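The write conflict mentioned above arises because a Newton's-third-law loop visits each pair once and updates both particles, so two threads can write the same `f[j]`. The sketch below is not the PSC method, whose details are not given here. It contrasts the serial half-pair loop with a conventional conflict-free baseline that gives each (simulated) thread a private force array and sums them afterwards. That baseline costs `num_threads * n` extra memory, the overhead PSC is reported to avoid. The 1D pair force is a hypothetical stand-in:

```python
def pair_force(xi, xj):
    """Hypothetical 1D pair force on particle i due to particle j."""
    return xj - xi


def forces_newton3(x, cutoff):
    """Serial half-pair loop: each pair (i < j) is visited once."""
    n = len(x)
    f = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if abs(x[j] - x[i]) < cutoff:
                fij = pair_force(x[i], x[j])
                f[i] += fij
                f[j] -= fij      # third-law update: the write that races when threaded
    return f


def forces_private_buffers(x, cutoff, num_threads):
    """Conflict-free baseline: each simulated thread owns a private force
    array for its cyclically assigned i values; buffers are summed at the end."""
    n = len(x)
    buffers = [[0.0] * n for _ in range(num_threads)]
    for t in range(num_threads):
        for i in range(t, n, num_threads):
            for j in range(i + 1, n):
                if abs(x[j] - x[i]) < cutoff:
                    fij = pair_force(x[i], x[j])
                    buffers[t][i] += fij
                    buffers[t][j] -= fij   # only touches thread t's own buffer
    return [sum(buf[k] for buf in buffers) for k in range(n)]


positions = [0.0, 0.5, 1.2, 2.0, 2.1, 3.5]
ref = forces_newton3(positions, cutoff=1.0)
alt = forces_private_buffers(positions, cutoff=1.0, num_threads=3)
```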
A Locality-Based Threading Algorithm for the Configuration-Interaction Method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; ...
2017-07-03
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring that applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
BCYCLIC: A parallel block tridiagonal matrix cyclic solver
NASA Astrophysics Data System (ADS)
Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.
2010-09-01
A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
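To make the cyclic reduction idea concrete, here is a minimal scalar sketch (1×1 blocks) for a tridiagonal system a[i]·x[i-1] + b[i]·x[i] + c[i]·x[i+1] = d[i]. Each odd-indexed equation eliminates its even neighbors, giving a half-size tridiagonal system that is solved recursively; even unknowns are then recovered by back-substitution. In a block solver such as BCYCLIC the divisions would become block LU solves; that, and the storage of factored blocks for later right-hand sides, is not shown. The recursion handles any n, not only powers of 2:

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    a: sub-diagonal (a[0] unused, should be 0), b: diagonal,
    c: super-diagonal (c[-1] unused, should be 0), d: right-hand side.
    Assumes the matrix is well conditioned (e.g. diagonally dominant).
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]
    odd = range(1, n, 2)
    ra, rb, rc, rd = [], [], [], []
    # Every iteration is independent: this is the step a threaded solver splits up.
    for i in odd:
        has_next = i + 1 < n
        alpha = -a[i] / b[i - 1]                      # eliminates x[i-1]
        gamma = -c[i] / b[i + 1] if has_next else 0.0  # eliminates x[i+1]
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + (gamma * a[i + 1] if has_next else 0.0))
        rc.append(gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] + alpha * d[i - 1] + (gamma * d[i + 1] if has_next else 0.0))
    x_odd = cyclic_reduction(ra, rb, rc, rd)          # half-size system
    x = [0.0] * n
    for k, i in enumerate(odd):
        x[i] = x_odd[k]
    for i in range(0, n, 2):                          # back-substitute even unknowns
        s = d[i]
        if i > 0:
            s -= a[i] * x[i - 1]
        if i + 1 < n:
            s -= c[i] * x[i + 1]
        x[i] = s / b[i]
    return x
```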
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...
2013-01-01
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
GANDALF - Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids
NASA Astrophysics Data System (ADS)
Hubber, D. A.; Rosotti, G. P.; Booth, R. A.
2018-01-01
GANDALF is a new hydrodynamics and N-body dynamics code designed for investigating planet formation, star formation and star cluster problems. GANDALF is written in C++, parallelized with both OpenMP and MPI, and contains a Python library for analysis and visualization. The code has been written with a fully object-oriented approach to easily allow user-defined implementations of physics modules or other algorithms. The code currently contains implementations of smoothed particle hydrodynamics, meshless finite-volume and collisional N-body schemes, but can easily be adapted to include additional particle schemes. We present in this paper the details of its implementation, results from the test suite, serial and parallel performance results and discuss the planned future development. The code is freely available as an open source project on the code-hosting website GitHub at https://github.com/gandalfcode/gandalf and is available under the GPLv2 license.
Parallel transformation of K-SVD solar image denoising algorithm
NASA Astrophysics Data System (ADS)
Liang, Youwen; Tian, Yu; Li, Mei
2017-02-01
The images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data parallelism model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance of the completed parallel algorithm are tested. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easy to port to other multi-core platforms.
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
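In SCE-UA, the sampled population is sorted by the calibration criterion and dealt into complexes so that complex k receives the points ranked k, k + p, k + 2p, and so on. Each complex then evolves independently until the next shuffle, which is what makes the method natural to parallelize. A minimal sketch of that partitioning step, assuming a criterion that is minimized; the function name is hypothetical and the evolution and shuffling steps are omitted:

```python
def partition_into_complexes(population, criterion, num_complexes):
    """Sort points best-first (lowest criterion) and deal them round-robin:
    complex k gets ranks k, k + p, k + 2p, ... with p = num_complexes."""
    ranked = sorted(population, key=criterion)
    return [ranked[k::num_complexes] for k in range(num_complexes)]


# Hypothetical parameter sets 0..7 where a larger value means a better fit.
complexes = partition_into_complexes(list(range(8)), lambda v: -v, 2)
```

Dealing points round-robin gives every complex a similar spread of good and poor points, so the independently evolved complexes (for example, one per thread or per GPU block) carry comparable workloads.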
Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU
NASA Astrophysics Data System (ADS)
Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang
2017-10-01
An imaging spectrometer can acquire a two-dimensional spatial image and a one-dimensional spectrum at the same time, which makes it highly useful in color and spectral measurements, true color image synthesis, military reconnaissance and so on. In order to achieve fast reconstruction of Fourier transform imaging spectrometer data, this paper designs an optimized reconstruction algorithm using OpenMP parallel computing technology, which was further applied to the processing for the HyperSpectral Imager of the Chinese 'HJ-1' satellite. The results show that the method based on multi-core parallel computing technology can make effective use of multi-core CPU hardware resources and significantly improve the efficiency of spectrum reconstruction processing. If the technology is applied to parallel computing on a workstation with more cores, it will be possible to complete real-time Fourier transform imaging spectrometer data processing on a single computer.
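The core of Fourier transform spectrum reconstruction is a Fourier transform of each pixel's interferogram, and pixels are independent, so the per-pixel loop is the natural target for an OpenMP `parallel for`. A minimal sketch under simplifying assumptions: a direct O(N²) DFT stands in for an FFT, and the apodization, phase correction, and calibration steps a real pipeline would need are omitted. Function names are hypothetical:

```python
import cmath
import math


def spectrum_from_interferogram(interferogram):
    """Magnitude spectrum (bins 0..N/2) of one pixel's interferogram via a direct DFT."""
    n = len(interferogram)
    return [abs(sum(interferogram[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def reconstruct_cube(cube):
    """cube[pixel] -> interferogram. Each pixel is independent, so this loop
    is the one a multi-core implementation would split across threads."""
    return [spectrum_from_interferogram(pixel) for pixel in cube]


# Synthetic pixel: a single cosine at 5 cycles over 64 samples.
pixel = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
spec = spectrum_from_interferogram(pixel)
```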
SToRM: A Model for 2D environmental hydraulics
Simões, Francisco J. M.
2017-01-01
A two-dimensional (depth-averaged) finite volume Godunov-type shallow water model developed for flow over complex topography is presented. The model, SToRM, is based on an unstructured cell-centered finite volume formulation and on nonlinear strong stability preserving Runge-Kutta time stepping schemes. The numerical discretization is founded on the classical and well established shallow water equations in hyperbolic conservative form, but the convective fluxes are calculated using auto-switching Riemann and diffusive numerical fluxes. Computational efficiency is achieved through a parallel implementation based on the OpenMP standard and the Fortran programming language. SToRM’s implementation within a graphical user interface is discussed. Field application of SToRM is illustrated by utilizing it to estimate peak flow discharges in a flooding event of the St. Vrain Creek in Colorado, U.S.A., in 2013, which reached 850 m³/s (~30,000 ft³/s) at the location of this study.
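To illustrate the ingredients named above (a conservative finite volume update, a diffusive numerical flux, and strong stability preserving Runge-Kutta time stepping), here is a heavily simplified 1D sketch. It assumes a uniform periodic grid, flat bottom, no wetting/drying, and uses only the Rusanov (local Lax-Friedrichs) flux; it is not SToRM's unstructured 2D scheme, its Riemann solver, or its switching logic:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def physical_flux(h, hu):
    """Flux of the 1D shallow water equations for the state (h, hu)."""
    u = hu / h
    return (hu, hu * u + 0.5 * G * h * h)


def rusanov_flux(left, right):
    """Diffusive (Rusanov / local Lax-Friedrichs) numerical flux at one interface."""
    (hL, huL), (hR, huR) = left, right
    fL, fR = physical_flux(hL, huL), physical_flux(hR, huR)
    s = max(abs(huL / hL) + math.sqrt(G * hL), abs(huR / hR) + math.sqrt(G * hR))
    return tuple(0.5 * (a + b) - 0.5 * s * (qr - ql)
                 for a, b, ql, qr in zip(fL, fR, left, right))


def residual(h, hu, dx):
    """-(F[i+1/2] - F[i-1/2]) / dx for every cell on a periodic grid."""
    n = len(h)
    dh, dhu = [0.0] * n, [0.0] * n
    for i in range(n):                 # interface between cell i and cell i+1
        j = (i + 1) % n
        fh, fhu = rusanov_flux((h[i], hu[i]), (h[j], hu[j]))
        dh[i] -= fh / dx
        dhu[i] -= fhu / dx
        dh[j] += fh / dx               # what leaves cell i enters cell j
        dhu[j] += fhu / dx
    return dh, dhu


def ssp_rk2_step(h, hu, dx, dt):
    """Two-stage SSP Runge-Kutta: u1 = u + dt L(u); u_new = (u + u1 + dt L(u1)) / 2."""
    rh, rhu = residual(h, hu, dx)
    h1 = [q + dt * r for q, r in zip(h, rh)]
    hu1 = [q + dt * r for q, r in zip(hu, rhu)]
    rh1, rhu1 = residual(h1, hu1, dx)
    h2 = [0.5 * q0 + 0.5 * (q1 + dt * r) for q0, q1, r in zip(h, h1, rh1)]
    hu2 = [0.5 * q0 + 0.5 * (q1 + dt * r) for q0, q1, r in zip(hu, hu1, rhu1)]
    return h2, hu2
```

Because each interface flux is added to one cell and subtracted from its neighbor, total water volume is conserved to rounding error; the time step must respect a CFL limit (here roughly dt · max wave speed / dx below 1).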
DOE Office of Scientific and Technical Information (OSTI.GOV)
Langer, Steven H.; Karlin, Ian; Marinak, Marty M.
HYDRA is used to simulate a variety of experiments carried out at the National Ignition Facility (NIF) [4] and other high energy density physics facilities. HYDRA has packages to simulate radiation transfer, atomic physics, hydrodynamics, laser propagation, and a number of other physics effects. HYDRA has over one million lines of code and includes both MPI and thread-level (OpenMP and pthreads) parallelism. This paper measures the performance characteristics of HYDRA using hardware counters on an IBM BlueGene/Q system. We report key ratios such as bytes/instruction and memory bandwidth for several different physics packages. The total number of bytes read and written per time step is also reported. We show that none of the packages which use significant time are memory bandwidth limited on a Blue Gene/Q. HYDRA currently issues very few SIMD instructions. The pressure on memory bandwidth will increase if high levels of SIMD instructions can be achieved.
Automation of Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2001-01-01
The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
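Data blocking (tiling) restructures loops so that a small tile of the source and destination arrays is processed together, keeping both in cache (or on the local node of a DSM machine) instead of striding across whole rows and columns. A minimal sketch of the loop structure for a matrix transpose; the block size is an arbitrary assumption, and Python lists will not show the cache benefit, which appears in compiled Fortran or C with contiguous arrays:

```python
def blocked_transpose(a, block=2):
    """Transpose a list-of-lists matrix tile by tile (block x block tiles).
    The outer two loops walk tiles; the inner two stay inside one tile."""
    rows, cols = len(a), len(a[0])
    out = [[None] * rows for _ in range(cols)]
    for ii in range(0, rows, block):
        for jj in range(0, cols, block):
            for i in range(ii, min(ii + block, rows)):
                for j in range(jj, min(jj + block, cols)):
                    out[j][i] = a[i][j]
    return out


t = blocked_transpose([[1, 2, 3], [4, 5, 6]], block=2)
```

The `min(...)` bounds handle edge tiles when the matrix size is not a multiple of the block size.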
LAMMPS strong scaling performance optimization on Blue Gene/Q
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coffman, Paul; Jiang, Wei; Romero, Nichols A.
2014-11-12
LAMMPS ("Large-scale Atomic/Molecular Massively Parallel Simulator") is an open-source molecular dynamics package from Sandia National Laboratories. Significant performance improvements in strong scaling and time-to-solution for this application on IBM's Blue Gene/Q have been achieved through computational optimizations of the OpenMP versions of the short-range Lennard-Jones term of the CHARMM force field and the long-range Coulombic interaction implemented with the PPPM (particle-particle particle-mesh) algorithm, enhanced by runtime parameter settings controlling thread utilization. Additionally, MPI communication performance improvements were made to the PPPM calculation by re-engineering the parallel 3D FFT to use MPICH collectives instead of point-to-point communication. Performance testing was done using an 8.4-million-atom simulation scaling up to 16 racks on the Mira system at the Argonne Leadership Computing Facility (ALCF). Speedups resulting from this effort were in some cases over 2x.
Model-independent partial wave analysis using a massively-parallel fitting framework
NASA Astrophysics Data System (ADS)
Sun, L.; Aoude, R.; dos Reis, A. C.; Sokoloff, M.
2017-10-01
The functionality of GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended to extract model-independent S-wave amplitudes in three-body decays such as D⁺ → h⁺h⁺h⁻. A full amplitude analysis is done in which the magnitudes and phases of the S-wave amplitudes are anchored at a finite number of m²(h⁺h⁻) control points, and a cubic spline is used to interpolate between these points. The amplitudes for P-wave and D-wave intermediate states are modeled as spin-dependent Breit-Wigner resonances. GooFit uses the Thrust library, with a CUDA backend for NVIDIA GPUs and an OpenMP backend for multithreading on conventional CPUs. Performance on a variety of platforms is compared. Executing on systems with GPUs is typically a few hundred times faster than executing the same algorithm on a single CPU.
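The spline interpolation between control points can be sketched in C with a natural cubic spline. This is not GooFit code (GooFit is built on Thrust); the natural boundary conditions and the function names are assumptions. Since the spline below is real-valued, a complex amplitude could, for example, be handled by interpolating its magnitude and phase as two separate splines over the m² control points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: compute the second derivatives m[0..n-1] of the
 * natural cubic spline through (x[i], y[i]), x strictly increasing, n >= 2.
 * Natural boundary conditions m[0] = m[n-1] = 0 are an assumption.
 * c is caller-provided scratch space of length n for the Thomas algorithm. */
void spline_init(const double *x, const double *y, size_t n, double *m, double *c)
{
    assert(n >= 2);
    m[0] = m[n - 1] = 0.0;
    c[0] = 0.0;
    /* Forward sweep over the tridiagonal system
     * h0*m[i-1] + 2*(h0 + h1)*m[i] + h1*m[i+1] = 6*(right slope - left slope). */
    for (size_t i = 1; i + 1 < n; ++i) {
        double h0 = x[i] - x[i - 1], h1 = x[i + 1] - x[i];
        double rhs = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0);
        double diag = 2.0 * (h0 + h1) - h0 * c[i - 1];
        c[i] = h1 / diag;
        m[i] = (rhs - h0 * m[i - 1]) / diag;
    }
    /* Back substitution; m[n-1] stays 0. */
    for (size_t i = n - 2; i >= 1; --i)
        m[i] -= c[i] * m[i + 1];
}

/* Evaluate the spline at t, assuming x[0] <= t <= x[n-1]. */
double spline_eval(const double *x, const double *y, const double *m, size_t n, double t)
{
    assert(n >= 2);
    size_t k = 0;                     /* interval [x[k], x[k+1]] holding t */
    while (k + 2 < n && t > x[k + 1])
        ++k;
    double h = x[k + 1] - x[k];
    double a = (x[k + 1] - t) / h, b = (t - x[k]) / h;
    return a * y[k] + b * y[k + 1]
         + ((a * a * a - a) * m[k] + (b * b * b - b) * m[k + 1]) * h * h / 6.0;
}
```

A linear search over intervals is adequate for the handful of control points used in such a fit; a bisection search would be preferable for many knots.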
GPU accelerated implementation of NCI calculations using promolecular density.
Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric
2017-05-30
The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive for describing ligand-protein binding. A custom implementation of NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three code versions is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.
Gregarious Data Re-structuring in a Many Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres
In this paper, we have developed a new methodology that takes into consideration the access patterns of a single parallel actor (e.g., a thread) as well as the access patterns of "grouped" parallel actors that share a resource (e.g., a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contributions of this paper include (a) collaborative data restructuring for group reuse and (b) a low-overhead transformation technique to improve access patterns and bring closely connected data elements together. Preliminary results on a many-core architecture, the Tilera TileGX, show promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine-grained runtimes (up to 16%) for selected kernels.
Proceeding On : Parallelisation Of Critical Code Passages In PHOENIX/3D
NASA Astrophysics Data System (ADS)
Arkenberg, Mario; Wichert, Viktoria; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach here, by introducing especially adapted, parallel numerical methods and correspondingly parallelising time-critical code passages. In the following, we present our work on PHOENIX/3D. While parallelisation is generally worthwhile, it requires revision of time-consuming subroutines with respect to separability of localised data and variables in order to determine the optimal approach. Of course, the same applies to the code structure. The importance of this ongoing work can be showcased by recently derived benchmark results, which were generated utilising MPI and OpenMP. Furthermore, the need for a careful and thorough choice of an adequate, machine-dependent setup is discussed.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It involves a large set of differential and algebraic equations, which is computationally intensive and challenging to solve with a single-processor dynamic simulation. High-performance computing (HPC) based parallel computing is a very promising technology for speeding up the computation and facilitating the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation: one using Open Multi-Processing (OpenMP) on a shared-memory platform, and one using the Message Passing Interface (MPI) on distributed-memory clusters. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulations is compared and demonstrated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.
High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application's behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions: memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT's OpenTuner auto-tuning framework to explore and recommend energy-optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60%, respectively, relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.
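One of the three tuning dimensions, OpenMP loop scheduling, can be exposed to an external auto-tuner by writing `schedule(runtime)`, which makes the loop take its schedule from the standard `OMP_SCHEDULE` environment variable. The sketch below is an illustration under that assumption, not code from Grappolo or HPCCG: a hypothetical irregular loop over a graph stored in compressed sparse row (CSR) form, whose per-iteration cost varies with vertex degree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical irregular kernel: sum the neighbour ids of every vertex of a
 * CSR graph (row_ptr has n_vertices + 1 entries). Work per iteration depends
 * on the vertex degree, so the best schedule is data dependent.
 * schedule(runtime) defers the static/dynamic/guided choice and chunk size
 * to OMP_SCHEDULE. Without OpenMP the pragma is ignored and the loop runs
 * serially with the same (integer, hence order-independent) result. */
long sum_neighbour_ids(const size_t *row_ptr, const long *col_idx, size_t n_vertices)
{
    assert(row_ptr[0] == 0);          /* CSR offsets start at zero */
    long total = 0;
    #pragma omp parallel for schedule(runtime) reduction(+ : total)
    for (size_t v = 0; v < n_vertices; ++v)
        for (size_t e = row_ptr[v]; e < row_ptr[v + 1]; ++e)
            total += col_idx[e];
    return total;
}
```

Running the same binary with, for example, `OMP_SCHEDULE="dynamic,64"` or `OMP_SCHEDULE="guided"` changes the schedule without recompiling, which is what allows a tuner to treat it as one more search parameter.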
Methods for characterizing convective cryoprobe heat transfer in ultrasound gel phantoms.
Etheridge, Michael L; Choi, Jeunghwan; Ramadhyani, Satish; Bischof, John C
2013-02-01
While cryosurgery has proven capable of treating a variety of conditions, it has met with some resistance among physicians, in part due to shortcomings in the ability to predict treatment outcomes. Here we attempt to address several key issues related to predictive modeling by demonstrating methods for accurately characterizing heat transfer from cryoprobes, report temperature dependent thermal properties for ultrasound gel (a convenient tissue phantom) down to cryogenic temperatures, and demonstrate the ability of convective exchange heat transfer boundary conditions to accurately describe freezing in the case of single and multiple interacting cryoprobe(s). Temperature dependent changes in the specific heat and thermal conductivity for ultrasound gel are reported down to -150 °C for the first time here and these data were used to accurately describe freezing in ultrasound gel in subsequent modeling. Freezing around a single and two interacting cryoprobe(s) was characterized in the ultrasound gel phantom by mapping the temperature in and around the "iceball" with carefully placed thermocouple arrays. These experimental data were fit with finite-element modeling in COMSOL Multiphysics, which was used to investigate the sensitivity and effectiveness of convective boundary conditions in describing heat transfer from the cryoprobes. Heat transfer at the probe tip was described in terms of a convective coefficient and the cryogen temperature. While model accuracy depended strongly on spatial (i.e., along the exchange surface) variation in the convective coefficient, it was much less sensitive to spatial and transient variations in the cryogen temperature parameter. 
The optimized fit, convective exchange conditions for the single-probe case also provided close agreement with the experimental data for the case of two interacting cryoprobes, suggesting that this basic characterization and modeling approach can be extended to accurately describe more complicated, multiprobe freezing geometries. Accurately characterizing cryoprobe behavior in phantoms requires detailed knowledge of the freezing medium's properties throughout the range of expected temperatures and an appropriate description of the heat transfer across the probe's exchange surfaces. Here we demonstrate that convective exchange boundary conditions provide an accurate and versatile description of heat transfer from cryoprobes, offering potential advantages over the traditional constant surface heat flux and constant surface temperature descriptions. In addition, although this study was conducted on Joule-Thomson type cryoprobes, the general methodologies should extend to any probe that is based on convective exchange with a cryogenic fluid.
Optimized multiple quantum MAS lineshape simulations in solid state NMR
NASA Astrophysics Data System (ADS)
Brouwer, William J.; Davis, Michael C.; Mueller, Karl T.
2009-10-01
The majority of nuclei available for study in solid state Nuclear Magnetic Resonance have half-integer spin I > 1/2, with a corresponding electric quadrupole moment. As such, they may couple with a surrounding electric field gradient. This effect introduces anisotropic line broadening to spectra, arising from distinct chemical species within polycrystalline solids. In Multiple Quantum Magic Angle Spinning (MQMAS) experiments, a second frequency dimension is created, devoid of quadrupolar anisotropy. As a result, the center of gravity of peaks in the high resolution dimension is a function of the isotropic second-order quadrupole shift and chemical shift alone. However, for complex materials, these parameters take on a stochastic nature due to structural and chemical disorder. Lineshapes may still overlap in the isotropic dimension, complicating assignment and interpretation. A distributed computational approach is presented here which permits simulation of the two-dimensional MQMAS spectrum, generated by random variates from model distributions of isotropic chemical and quadrupole shifts. Owing to the non-convex nature of the residual sum of squares (RSS) function between experimental and simulated spectra, simulated annealing is used to optimize the simulation parameters. In this manner, local chemical environments for disordered materials may be characterized, and via a re-sampling approach, error estimates for parameters produced.
Program summary
Program title: mqmasOPT
Catalogue identifier: AEEC_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEC_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3650
No. of bytes in distributed program, including test data, etc.: 73 853
Distribution format: tar.gz
Programming language: C, OCTAVE
Computer: UNIX/Linux
Operating system: UNIX/Linux
Has the code been vectorised or parallelized?: Yes
RAM: Example: (1597 powder angles) × (200 samples) × (81 F2 frequency points) × (31 F1 frequency points) = 3.5M, SMP AMD Opteron
Classification: 2.3
External routines: OCTAVE (http://www.gnu.org/software/octave/), GNU Scientific Library (http://www.gnu.org/software/gsl/), OPENMP (http://openmp.org/wp/)
Nature of problem: The optimal simulation and modeling of multiple quantum magic angle spinning NMR spectra for general systems, especially those with mild to significant disorder. The approach outlined and implemented in C and OCTAVE also produces model parameter error estimates.
Solution method: A model for each distinct chemical site is first proposed, for the individual contribution of crystallite orientations to the spectrum. This model is averaged over all powder angles [1], as well as over the (stochastic) parameters: isotropic chemical shift and quadrupole coupling constant. The latter is accomplished via sampling from a bivariate Gaussian distribution, using the Box-Muller algorithm to transform Sobol (quasi-)random numbers [2]. A simulated annealing optimization is performed, and finally the non-linear jackknife [3] is applied in developing model parameter error estimates.
Additional comments: The distribution contains a script, mqmasOpt.m, which runs in the OCTAVE language workspace.
Running time: Example: (1597 powder angles) × (200 samples) × (81 F2 frequency points) × (31 F1 frequency points) = 58.35 seconds, SMP AMD Opteron.
References:
[1] S.K. Zaremba, Annali di Matematica Pura ed Applicata 73 (1966) 293.
[2] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, 1992.
[3] T. Fox, D. Hinkley, K. Larntz, Technometrics 22 (1980) 29.
Waste Characterization Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vigil-Holterman, Luciana R.; Naranjo, Felicia Danielle
2016-02-02
This report discusses ways to classify waste as outlined by LANL. Waste Generators must make a waste determination and characterize regulated waste by appropriate analytical testing or use of acceptable knowledge (AK). Use of AK for characterization requires several source documents. Waste characterization documentation must be accurate, sufficient, and current (i.e., updated); relevant and traceable to the waste stream’s generation, characterization, and management; and not merely a list of information sources.
NASA Astrophysics Data System (ADS)
Dey, T.; Rodrigue, P.
2015-07-01
We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The voxels in the field of view (FoV) can be computed in parallel, and the 10³ to 10⁴ samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step determine the sum of the radiological path, taking attenuation into account. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware-specific intrinsic instructions (so-called `intrinsics') to allow manually optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, each with 8 cores at 2.6 to 3.3 GHz. Using a second Xeon Phi card, the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and host implementations.
The examination showed that a reasonable speedup of sensitivity map calculation could be achieved on the Xeon Phi either by a portable or a hardware specific implementation.
NASA Astrophysics Data System (ADS)
Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa
2017-08-01
The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecasting and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much better parallel scalability on both the CPU cluster and the KNL cluster, which scaled to 40 CPU nodes and 30 KNL nodes with parallel efficiencies of 70.4 % and 42.2 %, respectively.
A systematic approach to adnexal masses discovered on ultrasound: the ADNEx MR scoring system.
Sadowski, Elizabeth A; Robbins, Jessica B; Rockall, Andrea G; Thomassin-Naggara, Isabelle
2018-03-01
Adnexal lesions are a common occurrence in radiology practice and imaging plays a crucial role in triaging women appropriately. Current trends toward early detection and characterization have increased the need for accurate imaging assessment of adnexal lesions prior to treatment. Ultrasound is the first-line imaging modality for assessing adnexal lesions; however, approximately 20% of lesions are incompletely characterized after ultrasound evaluation. Secondary assessment with MR imaging using the ADNEx MR Scoring System has been demonstrated as highly accurate in the characterization of adnexal lesions and in excluding ovarian cancer. This review will address the role of MR imaging in further assessment of adnexal lesions discovered on US, and the utility of the ADNEx MR Scoring System.
Building phytochemical mass spec identification protocols and database libraries
USDA-ARS?s Scientific Manuscript database
An optimized single LC-MS evaluation that would accurately determine the elemental composition of as many as possible of the compounds present in an extract would greatly aid in the evaluation of plant tissues. For phytochemicals, we have used accurate mass analysis to quickly characterize the potential chemical formu...
Knowlton, Chris; Meliza, C Daniel; Margoliash, Daniel; Abarbanel, Henry D I
2014-06-01
Estimating the behavior of a network of neurons requires accurate models of the individual neurons along with accurate characterizations of the connections among them. Whereas for a single cell, measurements of the intracellular voltage are technically feasible and sufficient to characterize a useful model of its behavior, making sufficient numbers of simultaneous intracellular measurements to characterize even small networks is infeasible. This paper builds on prior work on single neurons to explore whether knowledge of the time of spiking of neurons in a network, once the nodes (neurons) have been characterized biophysically, can provide enough information to usefully constrain the functional architecture of the network: the existence of synaptic links among neurons and their strength. Using standardized voltage and synaptic gating variable waveforms associated with a spike, we demonstrate that the functional architecture of a small network of model neurons can be established.
High Spatial Resolution Commercial Satellite Imaging Product Characterization
NASA Technical Reports Server (NTRS)
Ryan, Robert E.; Pagnutti, Mary; Blonski, Slawomir; Ross, Kenton W.; Stanley, Thomas
2005-01-01
NASA Stennis Space Center's Remote Sensing group has been characterizing privately owned high spatial resolution multispectral imaging systems, such as IKONOS, QuickBird, and OrbView-3. Natural and man-made targets were used for spatial resolution, radiometric, and geopositional characterizations. Higher spatial resolution also introduces significant adjacency effects that must be accounted for to achieve accurate, reliable radiometry.
Accurate phylogenetic classification of DNA fragments based on sequence composition
DOE Office of Scientific and Technical Information (OSTI.GOV)
McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis
2006-05-01
Metagenome studies have retrieved vast amounts of sequence from a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome data sets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.
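A composition-based classifier needs a numerical representation of each fragment's composition. A common choice is a vector of overlapping k-mer (oligonucleotide) frequencies, which the C sketch below computes. The value of k, the handling of ambiguous bases and the normalization are assumptions for illustration, the function names are hypothetical, and the classifier trained on such vectors is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* 2-bit code for a nucleotide, or -1 for anything else (N, gaps, ...). */
static int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
    }
}

/* Hypothetical example: fill freq[0 .. 4^k - 1] with the relative
 * frequencies of all overlapping k-mers in seq (length len). k-mers that
 * contain a non-ACGT symbol are skipped. Returns the number of k-mers counted. */
size_t kmer_profile(const char *seq, size_t len, unsigned k, double *freq)
{
    assert(k >= 1 && k <= 12);           /* 4^12 entries is already 16M doubles */
    size_t dim = (size_t)1 << (2 * k);   /* 4^k possible k-mers */
    size_t mask = dim - 1, code = 0, counted = 0;
    unsigned valid = 0;                  /* consecutive ACGT symbols seen, capped at k */

    for (size_t i = 0; i < dim; ++i)
        freq[i] = 0.0;
    for (size_t i = 0; i < len; ++i) {
        int b = base_code(seq[i]);
        if (b < 0) {                     /* ambiguous symbol: restart the window */
            valid = 0;
            continue;
        }
        code = ((code << 2) | (size_t)b) & mask;  /* rolling index of the last k bases */
        if (valid < k)
            ++valid;
        if (valid == k) {
            freq[code] += 1.0;
            ++counted;
        }
    }
    if (counted > 0)
        for (size_t i = 0; i < dim; ++i)
            freq[i] /= (double)counted;
    return counted;
}
```

Normalizing counts to frequencies makes vectors from fragments of different lengths comparable, which matters when fragments may be as short as 1 kb.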
Implications of Weak Link Effects on Thermal Characteristics of Transition-Edge Sensors
NASA Technical Reports Server (NTRS)
Bailey, Catherine
2011-01-01
Weak link behavior in transition-edge sensor (TES) devices creates the need for a more careful characterization of a device's thermal characteristics through its transition. This is particularly true for small TESs where a small change in the measurement current results in large changes in temperature. A highly current-dependent transition shape makes accurate thermal characterization of the TES parameters through the transition challenging. To accurately interpret measurements, especially complex impedance, it is crucial to know the temperature-dependent thermal conductance, G(T), and heat capacity, C(T), at each point through the transition. We will present data illustrating these effects and discuss how we overcome the challenges that are present in accurately determining G and T from IV curves. We will also show how these weak link effects vary with TES size.
Evaluation of water-quality data and monitoring program for Lake Travis, near Austin, Texas
Rast, Walter; Slade, Raymond M.
1998-01-01
The multiple-comparison tests indicate that, for some constituents, a single sampling site for a constituent or property might adequately characterize the water quality of Lake Travis for that constituent or property. However, multiple sampling sites are required to provide information of sufficient temporal and spatial resolution to accurately evaluate other water-quality constituents for the reservoir. For example, the water-quality data from surface samples and from bottom samples indicate that nutrients (nitrogen, phosphorus) might require additional sampling sites for a more accurate characterization of their in-lake dynamics.
Steady-state low thermal resistance characterization apparatus: The bulk thermal tester
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burg, Brian R.; Kolly, Manuel; Blasakis, Nicolas
The reliability of microelectronic devices is largely dependent on electronic packaging, which includes heat removal. Appropriate packaging design therefore requires precise knowledge of the relevant material properties, including thermal resistance and thermal conductivity. Thin materials and high-conductivity layers make their thermal characterization challenging. A steady-state measurement technique is presented and evaluated for characterizing samples with a thermal resistance below 100 mm² K/W. It is based on the heat flow meter bar approach, made up of two copper blocks, and relies exclusively on temperature measurements from thermocouples. The importance of thermocouple calibration for obtaining accurate temperature readings is emphasized. An in-depth error analysis, based on Gaussian error propagation, is carried out. An error sensitivity analysis highlights the importance of precise knowledge of the thermal interface materials required for the measurements. Reference measurements on Mo samples reveal a measurement uncertainty in the range of 5%, and the most accurate measurements are obtained at high heat fluxes. Measurement techniques for homogeneous bulk samples, layered materials, and protruding cavity samples are discussed. Ultimately, a comprehensive overview of a steady-state thermal characterization technique is provided, evaluating the accuracy of measurements on samples with thermal resistances well below those accessible to state-of-the-art setups. Accurate characterization of materials used in heat removal applications, such as electronic packaging, will enable more efficient designs and ultimately contribute to energy savings.
Characterization of photomultiplier tubes with a realistic model through GPU-boosted simulation
NASA Astrophysics Data System (ADS)
Anthony, M.; Aprile, E.; Grandi, L.; Lin, Q.; Saldanha, R.
2018-02-01
The accurate characterization of a photomultiplier tube (PMT) is crucial in a wide variety of applications. However, current methods do not give fully accurate representations of the response of a PMT, especially at very low light levels. In this work, we present a new and more realistic model of the response of a PMT, called the cascade model, and use it to characterize two different PMTs at various voltages and light levels. The cascade model is shown to outperform the more common Gaussian model in almost all circumstances and to agree well with a newly introduced model-independent approach. The technical and computational challenges of this model are also presented, along with the employed solution of developing a robust GPU-based analysis framework for this and other non-analytical models.
Robotic and Multiaxial Testing for the Constitutive Characterization of Composites
John Michopoulos; Athanasios Iliopoulos; John Hermanson
2012-01-01
As wind energy production drives the manufacturing of wind turbine blades, the use of glass and carbon fiber composites as a material of choice continuously increases. Consequently, the needs for accurate structural design, for material qualification and certification, and for aging predictions further underline the need for accurate constitutive...
USDA-ARS?s Scientific Manuscript database
An optimized single run evaluation that would accurately determine the elemental composition of as many as possible of the compounds present in an extract would greatly aid in the evaluation of plant tissues. For phytochemicals, we have used accurate mass analysis to quickly characterize the potential chemical formula...
Bellili, A; Linguerri, R; Hochlaf, M; Puzzarini, C
2015-11-14
In an effort to provide an accurate structural and spectroscopic characterization of acetyl cyanide, its two enolic isomers and the corresponding cationic species, state-of-the-art computational methods and approaches have been employed. Coupled-cluster theory including single and double excitations together with a perturbative treatment of triples has been used as the starting point in composite schemes accounting for extrapolation to the complete basis-set limit as well as core-valence correlation effects, to determine highly accurate molecular structures, fundamental vibrational frequencies, and rotational parameters. The available experimental data for acetyl cyanide allowed us to assess the reliability of our computations: structural, energetic, and spectroscopic properties have been obtained with an overall accuracy of about, or better than, 0.001 Å, 2 kcal/mol, 1-10 MHz, and 11 cm⁻¹ for bond distances, adiabatic ionization potentials, rotational constants, and fundamental vibrational frequencies, respectively. We are therefore confident that the highly accurate spectroscopic data provided herein can be useful for guiding future experimental investigations and/or astronomical observations.
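The abstract does not state which extrapolation formula its composite schemes use. A common choice for correlation energies is the two-point inverse-cube form E(X) = E_CBS + A/X³, where X is the cardinal number of a correlation-consistent basis set (3 for triple-zeta, 4 for quadruple-zeta). Writing this for two basis sets and eliminating A gives E_CBS = (X_L³ E_L − X_S³ E_S)/(X_L³ − X_S³), which the C sketch below implements; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical example: two-point complete-basis-set extrapolation assuming
 * E(X) = E_CBS + A / X^3. x_small, x_large are basis-set cardinal numbers;
 * e_small, e_large the corresponding correlation energies (any unit). */
double cbs_extrapolate_x3(int x_small, double e_small, int x_large, double e_large)
{
    assert(x_small > 0 && x_large > x_small);
    double a = (double)x_large * x_large * x_large;   /* X_L^3 */
    double b = (double)x_small * x_small * x_small;   /* X_S^3 */
    return (a * e_large - b * e_small) / (a - b);
}
```

Because the model has only two unknowns, the extrapolation is exact whenever the energies follow the assumed X⁻³ form, and its error in practice comes from deviations from that form.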
NASA Astrophysics Data System (ADS)
McClanahan, James Patrick
Eddy Current Testing (ECT) is a Non-Destructive Examination (NDE) technique that is widely used in power generating plants (both nuclear and fossil) to test the integrity of heat exchanger (HX) and steam generator (SG) tubing. Specifically for this research, laboratory-generated, flawed tubing data were examined. The purpose of this dissertation is to develop and implement an automated method for the classification and an advanced characterization of defects in HX and SG tubing. These two improvements enhanced the robustness of characterization as compared to traditional bobbin-coil ECT data analysis methods. A more robust classification and characterization of the tube flaw in situ (while the SG is on-line but not when the plant is operating) should provide valuable information to the power industry. The following are the conclusions reached from this research. A feature extraction program acquiring relevant information from the mixed, absolute, and differential data was successfully implemented. The continuous wavelet transform (CWT) was utilized to extract more information from the mixed, complex differential data. Image processing techniques used to extract the information contained in the generated CWT classified the data with a high success rate. The data were accurately classified using the compressed feature vector and a Bayes classification system. An estimate of the upper bound for the probability of error, based on the Bhattacharyya distance, was successfully applied to the Bayesian classification. The classified data were separated according to flaw type (classification) to enhance characterization. The characterization routine used dedicated, flaw-type-specific artificial neural networks (ANNs), which made the characterization of the tube flaw more robust. The inclusion of outliers may help complete the feature space so that classification accuracy is increased.
Given that the eddy current test signals appear very similar, there may not be sufficient information to make an extremely accurate (>95%) classification or an advanced characterization using this system. A larger database is necessary for more accurate system learning.
Digital Mapping and Environmental Characterization of National Wild and Scenic River Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
McManamay, Ryan A; Bosnall, Peter; Hetrick, Shelaine L
2013-09-01
Spatially accurate geospatial information is required to support decision-making regarding sustainable future hydropower development. Under a memorandum of understanding among several federal agencies, a pilot study was conducted to map a subset of National Wild and Scenic Rivers (WSRs) at a higher resolution and provide a consistent methodology for mapping WSRs across the United States and across agency jurisdictions. A subset of rivers (segments falling under the jurisdiction of the National Park Service) was mapped at a high resolution using the National Hydrography Dataset (NHD). The spatial extent and representation of river segments mapped at NHD scale were compared with the prevailing geospatial coverage mapped at a coarser scale. Accurately digitized river segments were linked to environmental attribution datasets housed within the Oak Ridge National Laboratory's National Hydropower Asset Assessment Program database to characterize the environmental context of WSR segments. The results suggest that both the spatial scale of hydrography datasets and adherence to written policy descriptions are critical to accurately mapping WSRs. The environmental characterization provided information to deduce generalized trends in either the uniqueness or the commonness of environmental variables associated with WSRs. Although WSRs occur in a wide range of human-modified landscapes, environmental data layers suggest that they provide habitats important to terrestrial and aquatic organisms and recreation important to humans. Ultimately, the findings herein suggest a need for accurate, consistent mapping of the National WSRs across the agencies responsible for administering each river. Geospatial applications examining potential landscape and energy development require accurate sources of information, such as data layers that portray realistic spatial representations.
NASA Astrophysics Data System (ADS)
Foucher, Johann; Filippov, Pavel; Penzkofer, Christian; Irmer, Bernd; Schmidt, Sebastian W.
2013-04-01
Atomic force microscopy (AFM) is increasingly used in the semiconductor industry as a versatile monitoring tool for highly critical lithography and etching process steps. Applications range from inspecting the surface roughness of new materials, through accurate depth measurements, to determining critical dimension structures. Efforts to meet rapidly growing demands on measurement uncertainty and throughput increasingly shift attention to the AFM tip, which is the crucial link between the AFM tool and the sample being monitored. Consequently, to reach the AFM tool's full potential, the performance of the AFM tip has to be treated as a determining parameter. Currently available silicon AFM tips are generally limited by their diameter, radius, and sharpness, which considerably restricts AFM measurement capabilities in sub-30nm spaces. In addition, adequate characterization structures for accurately characterizing sub-25nm tip diameters are lacking. Here, we present and discuss a recently introduced AFM tip design (a T-shape-like design) with precise tip diameters down to 15nm and tip radii down to 5nm, fabricated from amorphous, high-density diamond-like carbon (HDC/DLC) using electron beam induced processing (EBIP). In addition to this design, we propose a new characterizer structure that allows accurate characterization and design control of sub-25nm tip diameters and sub-10nm tip edge radii. We demonstrate the potential advantages for the semiconductor industry of combining a small tip shape design (i.e., tip diameter and tip edge radius) with an advanced tip characterizer by measuring advanced lithography patterns.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singledecker, Steven J.; Jones, Scotty W.; Dorries, Alison M.
2012-07-01
In the coming fiscal years of potentially declining budgets, Department of Energy facilities such as the Los Alamos National Laboratory (LANL) will be looking to reduce the cost of radioactive waste characterization, management, and disposal processes. At the core of this cost reduction process will be choosing the most cost-effective, efficient, and accurate methods of radioactive waste characterization. Central to every radioactive waste management program is an effective and accurate waste characterization program. Choosing between methods can determine what is classified as low level radioactive waste (LLRW), transuranic waste (TRU), waste that can be disposed of under an Authorized Release Limit (ARL), industrial waste, and waste that can be disposed of in municipal landfills. The cost benefits of an accurate radioactive waste characterization program cannot be overstated. In addition, inaccurate characterization of radioactive waste can result in its incorrect classification, leading to higher disposal costs, Department of Transportation (DOT) violations, Notices of Violation (NOVs) from Federal and State regulatory agencies, waste rejection from disposal facilities, loss of operational capabilities, and loss of disposal options. Any one of these events could cause the program that mischaracterized the waste to lose its ability to perform its primary operational mission. Generators that produce radioactive waste have four characterization strategies at their disposal:
- Acceptable Knowledge/Process Knowledge (AK/PK);
- Indirect characterization using a software application or other dose-to-curie methodologies;
- Non-Destructive Analysis (NDA) tools such as gamma spectroscopy;
- Direct sampling (e.g., grab samples or Surface Contaminated Object smears) and laboratory analysis.
Each method has specific advantages and disadvantages.
This paper will evaluate each method, detailing those advantages and disadvantages, including:
- Cost-benefit analysis (basic materials costs, overall program operations costs, man-hours per sample analyzed, etc.);
- Radiation exposure As Low As Reasonably Achievable (ALARA) program considerations;
- Industrial health and safety risks;
- Overall analytical confidence level.
The concepts in this paper apply to any organization with significant radioactive waste characterization and management activities working within budget constraints and seeking to optimize its waste characterization strategies while reducing analytical costs. (authors)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...
2018-02-16
Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is the ability to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and in the field. Genomes can have radically different GC content, however, so accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have examined the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systematic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
Bayesian approach to analyzing holograms of colloidal particles.
Dimiduk, Thomas G; Manoharan, Vinothan N
2016-10-17
We demonstrate a Bayesian approach to tracking and characterizing colloidal particles from in-line digital holograms. We model the formation of the hologram using Lorenz-Mie theory. We then use a tempered Markov-chain Monte Carlo method to sample the posterior probability distributions of the model parameters: particle position, size, and refractive index. Compared to least-squares fitting, our approach allows us to more easily incorporate prior information about the parameters and to obtain more accurate uncertainties, which are critical for both particle tracking and characterization experiments. Our approach also eliminates the need to supply accurate initial guesses for the parameters, so it requires little tuning.
An Energy-Based Hysteresis Model for Magnetostrictive Transducers
NASA Technical Reports Server (NTRS)
Calkins, F. T.; Smith, R. C.; Flatau, A. B.
1997-01-01
This paper addresses the modeling of hysteresis in magnetostrictive transducers. This is considered in the context of control applications, which require an accurate characterization of the relation between input currents and the strains output by the transducer. This relation typically exhibits significant nonlinearities and hysteresis due to inherent properties of magnetostrictive materials. The characterization considered here is based on the Jiles-Atherton mean field model for ferromagnetic hysteresis in combination with a quadratic moment rotation model for magnetostriction. As demonstrated through comparison with experimental data, the magnetization model quantifies both major and minor loops very well under various operating conditions. The combined model can then be used to accurately characterize output strains at moderate drive levels. The advantages of this model lie in the small number (six) of required parameters and its flexibility across a variety of operating conditions.
ERIC Educational Resources Information Center
Center for Rural Pennsylvania, 2004
2004-01-01
Pennsylvania's rural areas are often characterized as having lower incomes and lower housing values than urban areas. This characterization is not universally accurate, however, since there are some impressive pockets of wealth within rural Pennsylvania. To highlight the diversity of wealth among Pennsylvania's rural municipalities, the Center for…
A CASE STUDY ILLUSTRATING THE IMPORTANCE OF ACCURATE SITE CHARACTERIZATION
Too frequently, researchers rely on incomplete site characterization data to determine the placement of the sampling wells. They forget that it is these sampling wells that will be used to evaluate the effectiveness of their research efforts. This case study illustrates the eff...
USDA-ARS's Scientific Manuscript database
Most hosts are concurrently or sequentially infected with multiple parasites; thus, fully understanding interactions between individual parasite species and their hosts depends on accurate characterization of the parasite community. For parasitic nematodes, non-invasive methods for obtaining quantita...
Importance of geologic characterization of potential low-level radioactive waste disposal sites
Weibel, C.P.; Berg, R.C.
1991-01-01
Using the example of the Geff Alternative Site in Wayne County, Illinois, for the disposal of low-level radioactive waste, this paper demonstrates, from a policy and public opinion perspective, the importance of accurately determining site stratigraphy. Complete and accurate characterization of geologic materials and determination of site stratigraphy at potential low-level waste disposal sites provide the framework for subsequent hydrologic and geochemical investigations. Proper geologic characterization is critical to determining long-term site stability and the extent of groundwater interactions between the site and its surroundings. Failure to adequately characterize site stratigraphy can lead to an incorrect evaluation of the geology of a site, which in turn may result in a lack of public confidence. A potential problem of lack of public confidence was alleviated as a result of the resolution and proper definition of the Geff Alternative Site stratigraphy. The integrity of the investigation was not questioned and public perception was not compromised. © 1991 Springer-Verlag New York Inc.
John Michopoulos; Athanasios Iliopoulos; John Hermanson
2012-01-01
As wind energy production drives the manufacturing of wind turbine blades, the utilization of glass and carbon fiber composites as a material of choice continuously increases. Consequently, the needs for accurate structural design and material qualification and certification as well as the needs for aging predictions further underline the need for accurate constitutive...
batman: BAsic Transit Model cAlculatioN in Python
NASA Astrophysics Data System (ADS)
Kreidberg, Laura
2015-11-01
I introduce batman, a Python package for modeling exoplanet transit light curves. The batman package supports calculation of light curves for any radially symmetric stellar limb darkening law, using a new integration algorithm for models that cannot be quickly calculated analytically. The code uses C extension modules to speed up model calculation and is parallelized with OpenMP. For a typical light curve with 100 data points in transit, batman can calculate one million quadratic limb-darkened models in 30 seconds with a single 1.7 GHz Intel Core i5 processor. The same calculation takes seven minutes using the four-parameter nonlinear limb darkening model (computed to 1 ppm accuracy). Maximum truncation error for integrated models is an input parameter that can be set as low as 0.001 ppm, ensuring that the community is prepared for the precise transit light curves we anticipate measuring with upcoming facilities. The batman package is open source and publicly available at https://github.com/lkreidberg/batman .
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doerfler, Douglas; Austin, Brian; Cook, Brandon
There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of that of a Xeon® core. In this paper, we examine the trade-offs between allocating enough cores to fully utilize the Aries high-speed network and dedicating cores to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.
Power and Performance Trade-offs for Space Time Adaptive Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino
Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core i7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP's computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large data sets without an increase in power requirements. The use of shared memory has a significant impact on the power requirement of the GPU. A balance between the use of shared memory and main memory access leads to improved performance in a typical STAP application.
High-Productivity Computing in Computational Physics Education
NASA Astrophysics Data System (ADS)
Tel-Zur, Guy
2011-03-01
We describe the development of a new course in Computational Physics at Ben-Gurion University. This elective course for 3rd-year undergraduates and MSc students is taught over one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should also deal with High-Productivity Computing. The traditional approach to teaching Computational Physics emphasizes "Correctness" and then "Accuracy"; we add "Performance." Along with topics in Mathematical Methods and case studies in Physics, the course devotes a significant amount of time to "Mini-Courses" on topics such as High-Throughput Computing (Condor), Parallel Programming (MPI and OpenMP), How to Build a Beowulf, Visualization, and Grid and Cloud Computing. The course does not intend to teach new physics or new mathematics; instead, it focuses on an integrated approach to solving problems, starting from the physics problem, the corresponding mathematical solution, and the numerical scheme, through writing efficient computer code, to analysis and visualization.
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics process large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility of performing in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries into parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). Compared with commercial database systems such as Virtuoso, our approach reduces the memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
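To make "parallel code annotated with OpenMP pragmas" concrete, here is a minimal sketch, not the tool's generated output: it evaluates a single SPARQL triple pattern such as `?s <p> <o>` by scanning an in-memory array of triples whose terms are assumed to be dictionary-encoded as integer IDs. The `WILDCARD` sentinel for an unbound variable and all names are hypothetical.

```c
#define WILDCARD (-1) /* hypothetical marker for an unbound variable such as ?s */

/* An RDF triple whose subject, predicate, and object are integer IDs. */
typedef struct { int s, p, o; } Triple;

static int term_matches(int pattern, int value)
{
    return pattern == WILDCARD || pattern == value;
}

/* Count triples matching the pattern (s, p, o). Each triple is tested
   independently, so the scan is an OpenMP parallel loop with a reduction;
   without OpenMP the pragma is ignored and the scan is serial. */
long count_matches(const Triple *triples, long n, int s, int p, int o)
{
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; ++i) {
        if (term_matches(s, triples[i].s) &&
            term_matches(p, triples[i].p) &&
            term_matches(o, triples[i].o))
            ++count;
    }
    return count;
}
```

A real query with several triple patterns must also join the variable bindings across patterns; the abstract's approach performs that by crawling the graph from matched nodes rather than by materializing large join tables, which is where its memory savings come from.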
spMC: an R-package for 3D lithological reconstructions based on spatial Markov chains
NASA Astrophysics Data System (ADS)
Sartore, Luca; Fabbri, Paolo; Gaetan, Carlo
2016-09-01
The paper presents the spatial Markov Chains (spMC) R package and a case study of subsoil simulation/prediction at a plain site in northeastern Italy. spMC is a fairly complete collection of advanced methods for data inspection; in addition, it implements Markov chain models to estimate experimental transition probabilities of categorical lithological data. Furthermore, simulation methods based on well-known prediction methods (such as indicator kriging and cokriging) were implemented in the spMC package. Other, more advanced simulation methods are also available, e.g., path methods and Bayesian procedures that exploit maximum entropy. Since the spMC package was developed for intensive geostatistical computations, part of the code is implemented for parallel computation via OpenMP constructs. A final analysis of computational efficiency compares the simulation/prediction algorithms using different numbers of CPU cores, considering the example data set of the case study included in the package.
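Estimating experimental transition probabilities is the starting point for the Markov chain models described above. A minimal sketch, assuming a single borehole log sampled at a regular spacing with lithologies coded 0..K-1 (spMC itself is an R package with more general estimators; this C function and its names are hypothetical), counts one-step transitions and normalizes each row.

```c
#define K 3 /* number of lithological categories (hypothetical) */

/* Estimate one-step transition probabilities P[i][j] = Pr(next = j | current = i)
   from a sequence of n category codes in 0..K-1 sampled at regular spacing.
   Rows for categories that never occur (except possibly last) are left at 0. */
void transition_matrix(const int *seq, int n, double P[K][K])
{
    int counts[K][K] = {{0}};
    for (int t = 0; t + 1 < n; ++t)
        counts[seq[t]][seq[t + 1]]++;
    for (int i = 0; i < K; ++i) {
        int row = 0;
        for (int j = 0; j < K; ++j)
            row += counts[i][j];
        for (int j = 0; j < K; ++j)
            P[i][j] = row > 0 ? (double)counts[i][j] / row : 0.0;
    }
}
```

For example, the log 0, 0, 1, 1, 1, 2, 0 yields P[0][0] = P[0][1] = 0.5, P[1][1] = 2/3, P[1][2] = 1/3, and P[2][0] = 1. Repeating the count at several lags gives the transition-probability curves that kriging-style simulation methods then use.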
NASA Technical Reports Server (NTRS)
Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2017-01-01
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up to date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, compared with only 11 for MPI+OpenMP.
Pythran: enabling static optimization of scientific Python programs
NASA Astrophysics Data System (ADS)
Guelton, Serge; Brunet, Pierrick; Amini, Mehdi; Merlini, Adrien; Corbillon, Xavier; Raynaud, Alan
2015-01-01
Pythran is an open source static compiler that turns modules written in a subset of the Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, Numpy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for Numpy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more so, but without the loss of backward compatibility.
A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia
NASA Astrophysics Data System (ADS)
Reis, R. F.; Loureiro, F. S.; Lobosco, M.
2014-03-01
Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of hyperthermia is to heat a specific region, such as a tumor, so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques, and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promising technique. In the present paper, the Pennes bioheat transfer equation is adopted to model thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. An explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups of around 35 were obtained on a 64-core machine.
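The standard form of the Pennes bioheat equation is ρc ∂T/∂t = ∇·(k∇T) + ω_b ρ_b c_b (T_a - T) + Q, where the perfusion term pulls tissue toward arterial temperature T_a and Q collects metabolic and external (here, nanoparticle) heating. As a sketch only, assuming constant properties, a uniform 2D grid with spacing h, and fixed-temperature boundaries (the paper's discretization details are not given here), one explicit finite difference step looks like this, with the independent grid rows split across OpenMP threads. Names are hypothetical.

```c
/* One explicit (forward-Euler, 5-point Laplacian) step of the 2D Pennes
   bioheat equation with constant properties:
     rho_c * dT/dt = k * lap(T) + perf * (Ta - T) + Q,  perf = wb * rhob * cb.
   T and Tn are nx*ny arrays in row-major order; Q holds the combined metabolic
   and nanoparticle heat source per cell. Boundary values are copied unchanged. */
void pennes_step(int nx, int ny, double h, double dt,
                 double k, double rho_c, double perf,
                 double Ta, const double *Q, const double *T, double *Tn)
{
    /* Each new value depends only on the old array T, so rows are independent. */
    #pragma omp parallel for
    for (int j = 0; j < ny; ++j) {
        for (int i = 0; i < nx; ++i) {
            int id = j * nx + i;
            if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) {
                Tn[id] = T[id]; /* fixed (Dirichlet) boundary */
                continue;
            }
            double lap = (T[id - 1] + T[id + 1] + T[id - nx] + T[id + nx]
                          - 4.0 * T[id]) / (h * h);
            Tn[id] = T[id] + dt / rho_c * (k * lap + perf * (Ta - T[id]) + Q[id]);
        }
    }
}
```

Being explicit, the scheme is only conditionally stable: neglecting perfusion, the time step must satisfy roughly dt ≤ ρc·h²/(4k). That restriction forces many small steps, which is one reason the computation is costly enough to justify parallelization.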
DOE Office of Scientific and Technical Information (OSTI.GOV)
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data by attempting to minimize the within-cluster sum of squared distances between data points and their cluster centers. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVIDIA's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than was previously possible.
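The k-means++ seeding rule is: pick the first seed uniformly at random, then pick each subsequent seed with probability proportional to D(x)², the squared distance from x to its nearest already-chosen seed. The sketch below (plain C, not the authors' C++ code; names are hypothetical and a small xorshift generator stands in for whatever random source they used) marks the per-point distance update, the step that is naturally data-parallel, with an OpenMP pragma.

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal xorshift64 generator returning a double in [0, 1); state must be nonzero. */
static double next_uniform(uint64_t *state)
{
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0);
}

/* Squared Euclidean distance between two dim-dimensional points. */
static double sq_dist(const double *a, const double *b, int dim)
{
    double s = 0.0;
    for (int d = 0; d < dim; ++d)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

/* k-means++ seeding over n points stored row-major in points[n*dim].
   Writes k point indices into seeds. Returns 0 on success, -1 on bad input
   or if fewer than k distinct points exist. */
int kmeanspp_seeds(const double *points, int n, int dim, int k,
                   uint64_t seed, int *seeds)
{
    if (k < 1 || k > n || seed == 0) return -1;
    double *d2 = malloc((size_t)n * sizeof *d2); /* D(x)^2 to the nearest seed */
    if (d2 == NULL) return -1;

    seeds[0] = (int)(next_uniform(&seed) * n);   /* first seed: uniform choice */
    for (int i = 0; i < n; ++i)
        d2[i] = sq_dist(&points[i * dim], &points[seeds[0] * dim], dim);

    for (int c = 1; c < k; ++c) {
        double total = 0.0;
        for (int i = 0; i < n; ++i) total += d2[i];
        if (total == 0.0) { free(d2); return -1; } /* every point is already a seed */

        /* Draw index i with probability d2[i] / total. */
        double target = next_uniform(&seed) * total;
        int chosen = -1, last_positive = -1;
        for (int i = 0; i < n; ++i) {
            if (d2[i] > 0.0) last_positive = i;
            target -= d2[i];
            if (target < 0.0) { chosen = i; break; }
        }
        if (chosen < 0) chosen = last_positive; /* guard against rounding */
        seeds[c] = chosen;

        /* Each point's nearest-seed distance is updated independently: this is
           the data-parallel step. Without OpenMP the pragma is ignored. */
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double d = sq_dist(&points[i * dim], &points[chosen * dim], dim);
            if (d < d2[i]) d2[i] = d;
        }
    }
    free(d2);
    return 0;
}
```

Already-chosen points have D(x)² = 0 and can never be drawn again, so the seeds are distinct. The weighted draw itself is a prefix-sum search, which is harder to parallelize than the distance update and is one reason seeding is less trivially parallel than a k-means iteration.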
Load Balancing Strategies for Multiphase Flows on Structured Grids
NASA Astrophysics Data System (ADS)
Olshefski, Kristopher; Owkes, Mark
2017-11-01
The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are designed only for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated, including brute-force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies is tested with a simple one-dimensional model prior to implementation in the three-dimensional NGA code. Current results show that load balancing will reduce computational time by at least 30%.
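One of the strategies named above is shared-memory load balancing with OpenMP. As a sketch under stated assumptions (the per-cell workload below is a hypothetical stand-in in which interface cells simply run more iterations; it is not the NGA code), OpenMP's `schedule(dynamic, chunk)` clause lets threads claim new chunks of cells as they finish, instead of receiving a fixed equal share as with `schedule(static)`.

```c
/* Hypothetical per-cell work: cells flagged as lying on the gas-liquid
   interface need many more iterations (a stand-in for interface treatment). */
static double cell_work(double value, int is_interface)
{
    int iters = is_interface ? 1000 : 10;
    double x = value;
    for (int it = 0; it < iters; ++it)
        x = 0.5 * x + 1.0; /* converges toward 2.0 */
    return x;
}

/* Process all cells. schedule(dynamic, 64) hands out chunks of 64 cells on
   demand, so threads that draw cheap single-phase cells keep taking work while
   others are busy with expensive interface cells. Without OpenMP the pragma is
   ignored and the loop runs serially. */
double process_cells(const double *values, const int *interface_flag, int n)
{
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += cell_work(values[i], interface_flag[i]);
    return total;
}
```

Dynamic scheduling adds some bookkeeping overhead per chunk, so the chunk size trades scheduling cost against balance; with static scheduling, a thread whose contiguous block happens to contain the interface would finish last while the others sit idle.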
ARES v2: new features and improved performance
NASA Astrophysics Data System (ADS)
Sousa, S. G.; Santos, N. C.; Adibekyan, V.; Delgado-Mena, E.; Israelian, G.
2015-05-01
Aims: We present a new upgraded version of ARES. The new version includes a series of new features, such as automatic radial velocity correction, fully automatic continuum determination, and an estimation of the errors for the equivalent widths. Methods: The automatic correction of the radial velocity is achieved with a simple cross-correlation function, and the automatic continuum determination, as well as the estimation of the errors, relies on a new approach to evaluating the spectral noise at the continuum level. Results: ARES v2 is fully compatible with its predecessor. We show that the fully automatic continuum determination is consistent with the previous methods applied for this task. It also shows a significant performance improvement thanks to the implementation of parallel computation using the OpenMP library. Automatic Routine for line Equivalent widths in stellar Spectra - ARES webpage: http://www.astro.up.pt/~sousasag/ares/
Based on observations made with ESO Telescopes at the La Silla Paranal Observatory under programme ID 075.D-0800(A).
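The radial velocity correction relies on a cross-correlation function (CCF). The sketch below is not ARES's implementation; it assumes the observed spectrum and a template are sampled on the same uniform grid in ln λ and have been inverted relative to the continuum (so absorption lines are positive peaks), and it finds only the best integer-pixel lag. On such a grid a shift of one pixel corresponds, non-relativistically, to v = c·Δln λ. Function names are hypothetical.

```c
/* Return the integer lag (pixels, within [-max_lag, max_lag]) that maximizes
   the cross-correlation sum_i obs[i] * tmpl[i - lag] over overlapping pixels.
   Both spectra are assumed continuum-inverted (e.g., 1 - normalized flux) and
   sampled on the same uniform ln(lambda) grid of n pixels. */
int best_ccf_lag(const double *obs, const double *tmpl, int n, int max_lag)
{
    int best_lag = 0;
    double best = -1e300;
    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        double ccf = 0.0;
        for (int i = 0; i < n; ++i) {
            int j = i - lag;
            if (j >= 0 && j < n)
                ccf += obs[i] * tmpl[j];
        }
        if (ccf > best) {
            best = ccf;
            best_lag = lag;
        }
    }
    return best_lag;
}

/* Convert a pixel lag on a grid with constant step dlnl in ln(lambda) to a
   radial velocity in km/s (non-relativistic approximation v = c * dln(lambda)). */
double lag_to_velocity_kms(int lag, double dlnl)
{
    const double c_kms = 299792.458;
    return c_kms * lag * dlnl;
}
```

A positive lag means the observed lines sit at longer wavelengths than the template (a redshift, receding star). Production codes typically refine the peak to sub-pixel precision, for example by fitting a Gaussian or parabola around the CCF maximum.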
GPU accelerated particle visualization with Splotch
NASA Astrophysics Data System (ADS)
Rivi, M.; Gheller, C.; Dykes, T.; Krokos, M.; Dolag, K.
2014-07-01
Splotch is a rendering algorithm for exploration and visual discovery in particle-based datasets coming from astronomical observations or numerical simulations. The strengths of the approach are production of high-quality imagery and support for very large-scale datasets through an effective mix of the OpenMP and MPI parallel programming paradigms. This article reports our experiences in re-designing Splotch to exploit emerging HPC architectures, which are increasingly populated with GPUs. A performance model is introduced to guide our re-factoring of Splotch. A number of parallelization issues are discussed, in particular relating to race conditions and workload balancing, towards achieving optimal performance. Our implementation was accomplished using the CUDA programming paradigm. Our strategy is founded on novel schemes achieving optimized data organization and classification of particles. We deploy a reference cosmological simulation to present performance results on acceleration gains and scalability. We finally outline our vision for future work, including possibilities for further optimizations and exploitation of hybrid systems and emerging accelerators.
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray
2015-12-01
Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; VanderWijngaart, Rob F.
2003-01-01
We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.
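The multi-zone pattern works like this: each zone is updated independently, and then boundary values are exchanged after every time step. The Python sketch below illustrates this pattern with a toy 1D explicit diffusion solver. It is an illustrative assumption, not the LU, BT, or SP kernels of NPB Multi-Zone, and the zone layout, ghost-cell scheme, and function names are hypothetical. At the start of each step, each zone's ghost cells hold its neighbours' current edge values. Because of this, stitching the zones back together reproduces a single-domain run exactly.

```python
def diffuse(u, alpha):
    """One explicit (FTCS) diffusion step; u[0] and u[-1] are ghost or fixed
    boundary cells and are carried over unchanged. Stable for alpha <= 0.5."""
    interior = [u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def exchange(zones):
    """Copy each zone's edge interior value into its neighbour's ghost cell."""
    for left, right in zip(zones, zones[1:]):
        left[-1] = right[1]
        right[0] = left[-2]


def run_multizone(values, nzones, alpha, steps):
    """Hypothetical multi-zone driver: split the domain into zones, update
    each zone independently, then exchange boundary values every step."""
    if len(values) % nzones:
        raise ValueError("domain length must be divisible by nzones")
    size = len(values) // nzones
    # Outer domain boundaries are held at 0.0; inner ghosts are set by exchange().
    zones = [[0.0] + list(values[z * size:(z + 1) * size]) + [0.0]
             for z in range(nzones)]
    exchange(zones)
    for _ in range(steps):
        zones = [diffuse(z, alpha) for z in zones]  # independent, parallelizable
        exchange(zones)
    return [v for z in zones for v in z[1:-1]]


def run_single(values, alpha, steps):
    """Reference: the same update applied to one undivided domain."""
    u = [0.0] + list(values) + [0.0]
    for _ in range(steps):
        u = diffuse(u, alpha)
    return u[1:-1]
```

In a hybrid version, the per-zone list comprehension is the coarse-grain loop that could be distributed across MPI processes or OpenMP threads. Loops inside `diffuse` would then supply the finer-grain level.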
Measuring PM and related air pollutants using low-cost sensors
Emerging air quality sensors may play a key role in better characterizing levels of air pollution in a variety of settings. There is a wide range of low-cost (< $500 US) sensors on the market, but few have been characterized. If accurate, this new generation of inexpensive sens...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, L; Du, X; Liu, T
Purpose: As a module of ARCHER (Accelerated Radiation-transport Computations in Heterogeneous EnviRonments), ARCHER_RT is designed for radiotherapy (RT) dose calculation. This paper describes the application of ARCHER_RT to patient-dependent TomoTherapy and patient-independent IMRT. It also conducts a 'fair' comparison of different GPUs and a multicore CPU. Methods: The source input used for patient-dependent TomoTherapy is a phase space file (PSF) generated from the optimized plan. For patient-independent IMRT, the open-field PSF is used for different cases. The intensity modulation is simulated by a fluence map. The GEANT4 code is used as the benchmark. DVH and gamma index tests are employed to evaluate the accuracy of the ARCHER_RT code. Some previous studies reported misleading speedups by comparing GPU code with serial CPU code. To perform a fairer comparison, we wrote multi-threaded code with OpenMP to fully exploit the computing potential of the CPU. The hardware involved in this study is a 6-core Intel E5-2620 CPU, 6 NVIDIA M2090 GPUs, a K20 GPU and a K40 GPU. Results: Dosimetric results from ARCHER_RT and GEANT4 show good agreement. The 2%/2mm gamma test pass rates for different clinical cases are 97.2% to 99.7%. A single M2090 GPU needs 50 to 79 seconds for the simulation to achieve a statistical error of 1% in the PTV. The K40 card is about 1.7 to 1.8 times faster than the M2090 card. Using 6 M2090 cards, the simulation can be finished in about 10 seconds. For comparison, the Intel E5-2620 needs 507 to 879 seconds for the same simulation. Conclusion: We successfully applied ARCHER_RT to TomoTherapy and patient-independent IMRT, and conducted a fair comparison between GPU and CPU performance. The ARCHER_RT code is both accurate and efficient and may be used towards clinical applications.
NASA Astrophysics Data System (ADS)
Park, Jong Ho; Park, Jung Jin; Park, O. Ok; Jin, Chang-Soo; Yang, Jung Hoon
2016-04-01
Because of the rise in renewable energy use, the redox flow battery (RFB) has attracted extensive attention as an energy storage system. Thus, many studies have focused on improving the performance of the felt electrodes used in RFBs. However, existing analysis cells are unsuitable for characterizing felt electrodes because of their complex 3-dimensional structure. Analysis is also greatly affected by the measurement conditions, viz. compression ratio, contact area, and contact strength between the felt and current collector. To address the growing need for practical analytical apparatus, we report a new analysis cell for accurate electrochemical characterization of felt electrodes under various conditions, and compare it with previous ones. In this cell, the measurement conditions can be exhaustively controlled with a compression supporter. The cell showed excellent reproducibility in cyclic voltammetry analysis and the results agreed well with actual RFB charge-discharge performance.
Characterization of Cloud Water-Content Distribution
NASA Technical Reports Server (NTRS)
Lee, Seungwon
2010-01-01
The development of realistic cloud parameterizations for climate models requires accurate characterizations of subgrid distributions of thermodynamic variables. To this end, a software tool was developed to characterize cloud water-content distributions in climate-model sub-grid scales. This software characterizes distributions of cloud water content with respect to cloud phase, cloud type, precipitation occurrence, and geo-location using CloudSat radar measurements. It uses a statistical method called maximum likelihood estimation to estimate the probability density function of the cloud water content.
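The sketch below illustrates maximum likelihood estimation of a water-content PDF in Python. It assumes a lognormal family, which is often used for positive, skewed quantities. This is an assumption: the abstract does not state which distribution the tool fits. For a lognormal, the MLE has a closed form: the mean and the (divide-by-n) standard deviation of the log-values. The category labels and function names are hypothetical stand-ins for grouping by cloud phase, cloud type, and so on.

```python
import math


def lognormal_mle(samples):
    """Closed-form MLE (mu, sigma) of a lognormal fit; samples must be > 0."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Fitted probability density at x > 0 (requires sigma > 0)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def fit_by_category(records):
    """records: iterable of (category, water_content) pairs, where the
    categories are hypothetical labels such as cloud phase; returns
    {category: (mu, sigma)}."""
    groups = {}
    for category, value in records:
        groups.setdefault(category, []).append(value)
    return {c: lognormal_mle(v) for c, v in groups.items()}
```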
Rastogi, L.; Dash, K.; Arunachalam, J.
2013-01-01
The quantitative analysis of glutathione (GSH) is important in different fields like medicine, biology, and biotechnology. Accurate quantitative measurements of this analyte have been hampered by the lack of well-characterized reference standards. The proposed procedure is intended to provide an accurate and definitive method for the quantitation of GSH for reference measurements. Measuring the stoichiometric sulfur content of purified GSH offers an approach to its quantitation, and calibration against an appropriately characterized reference material (CRM) for sulfur would provide a methodology for certifying the GSH quantity that is traceable to the SI (International System of Units). The inductively coupled plasma optical emission spectrometry (ICP-OES) approach negates the need for any sample digestion. For ion chromatography (IC) measurements, the sulfur content of the purified GSH is first quantitatively converted into sulfate ions by microwave-assisted UV digestion in the presence of hydrogen peroxide. The measurement of sulfur by ICP-OES and IC (as sulfate) using the “high performance” methodology could be useful for characterizing primary calibration standards and certified reference materials with low uncertainties. The relative expanded uncertainties (% U), expressed at the 95% confidence level, varied from 0.1% to 0.3% for ICP-OES analyses and from 0.2% to 1.2% for IC. The described methods are more suitable for characterizing primary calibration standards and certifying reference materials of GSH than for routine measurements. PMID:29403814
Fast and Accurate Circuit Design Automation through Hierarchical Model Switching.
Huynh, Linh; Tagkopoulos, Ilias
2015-08-21
In computer-aided biological design, the trifecta of characterized part libraries, accurate models and optimal design parameters is crucial for producing reliable designs. As the number of parts and model complexity increase, however, it becomes exponentially more difficult for any optimization method to search the solution space, hence creating a trade-off that hampers efficient design. To address this issue, we present a hierarchical computer-aided design architecture that uses a two-step approach for biological design. First, a simple model of low computational complexity is used to predict circuit behavior and assess candidate circuit branches through branch-and-bound methods. Then, a complex, nonlinear circuit model is used for a fine-grained search of the reduced solution space, thus achieving more accurate results. Evaluation with a benchmark of 11 circuits and a library of 102 experimental designs with known characterization parameters demonstrates a speed-up of 3 orders of magnitude when compared to other design methods that provide optimality guarantees.
Characterization of xenon ion and neutral interactions in a well-characterized experiment
NASA Astrophysics Data System (ADS)
Patino, Marlene I.; Wirz, Richard E.
2018-06-01
Interactions between fast ions and slow neutral atoms are commonly dominated by charge-exchange and momentum-exchange collisions, which are important to understanding and simulating the performance and behavior of many plasma devices. To investigate these interactions, this work developed a simple, well-characterized experiment that accurately measures the behavior of high energy xenon ions incident on a background of xenon neutral atoms. By using well-defined operating conditions and a simple geometry, these results serve as canonical data for the development and validation of plasma models and models of neutral beam sources that need to ensure accurate treatment of angular scattering distributions of charge-exchange and momentum-exchange ions and neutrals. The energies used in this study (~1.5 keV) are relevant to electric propulsion devices, and the results can be used to improve models of ion-neutral interactions in the plume. By comparing these results to both analytical and computational models of ion-neutral interactions, we discovered the importance of (1) accurately treating the differential cross-sections for momentum-exchange and charge-exchange collisions over a large range of neutral background pressures and (2) properly considering commonly overlooked interactions, such as ion-induced electron emission from nearby surfaces and neutral-neutral ionization collisions.
Ross, Charles W; Simonsick, William J; Bogusky, Michael J; Celikay, Recep W; Guare, James P; Newton, Randall C
2016-06-28
Ceramides are a central unit of all sphingolipids which have been identified as sites of biological recognition on cellular membranes mediating cell growth and differentiation. Several glycosphingolipids have been isolated, displaying immunomodulatory and anti-tumor activities. These molecules have generated considerable interest as potential vaccine adjuvants in humans. Accurate analyses of these and related sphingosine analogues are important for the characterization of structure, biological function, and metabolism. We report the complementary use of direct laser desorption ionization (DLDI), sheath flow electrospray ionization (ESI) Fourier transform ion cyclotron resonance mass spectrometry (FTICR MS) and high-field nuclear magnetic resonance (NMR) analysis for the rapid, accurate identification of hexacosanoylceramide and starting materials. DLDI does not require stringent sample preparation and yields representative ions. Sheath-flow ESI yields ions of the product and byproducts and was significantly better than monospray ESI due to improved compound solubility. Negative ion sheath flow ESI provided data on starting materials and products all in one acquisition, as hexacosanoic acid does not ionize efficiently when ceramides are present. NMR provided characterization of these lipid molecules, complementing the results obtained from MS analyses. NMR data were able to differentiate straight-chain from branched-chain alkyl groups, a distinction not easily obtained from mass spectrometry.
Medium Spatial Resolution Satellite Characterization
NASA Technical Reports Server (NTRS)
Stensaas, Greg
2007-01-01
This project provides characterization and calibration of aerial and satellite systems in support of quality acquisition and understanding of remote sensing data, and verifies and validates the associated data products with respect to ground and atmospheric truth so that accurate value-added science can be performed. The project also provides assessment of new remote sensing technologies.
Atomic force microscopy characterization of cellulose nanocrystals
Roya R. Lahiji; Xin Xu; Ronald Reifenberger; Arvind Raman; Alan Rudie; Robert J. Moon
2010-01-01
Cellulose nanocrystals (CNCs) are gaining interest as a “green” nanomaterial with superior mechanical and chemical properties for high-performance nanocomposite materials; however, there is a lack of accurate material property characterization of individual CNCs. Here, a detailed study of the topography, elastic and adhesive properties of individual wood-derived CNCs...
Characterizing dispersal patterns in a threatened seabird with limited genetic structure
Laurie A. Hall; Per J. Palsboll; Steven R. Beissinger; James T. Harvey; Martine Berube; Martin G. Raphael; Kim Nelson; Richard T. Golightly; Laura McFarlane-Tranquilla; Scott H. Newman; M. Zachariah Peery
2009-01-01
Genetic assignment methods provide an appealing approach for characterizing dispersal patterns on ecological time scales, but require sufficient genetic differentiation to accurately identify migrants and a large enough sample size of migrants to, for example, compare dispersal between sexes or age classes. We demonstrate that assignment methods can be rigorously used...
Gravity Field Characterization around Small Bodies
NASA Astrophysics Data System (ADS)
Takahashi, Yu
A small body rendezvous mission requires accurate gravity field characterization for safe, accurate navigation. However, current techniques of gravity field modeling around small bodies have not yet reached a satisfactory level. This thesis addresses how the current gravity field characterization process can be made more robust for future small body missions. First, we perform a covariance analysis around small bodies via multiple slow flybys. Flyby characterization requires less laborious scheduling than its orbit counterpart, while simultaneously reducing the risk of impact into the asteroid's surface. It will be shown that the level of initial characterization achievable with this approach is no less than that of the orbit approach. Next, we apply the same technique of gravity field characterization to estimate the spin state of 4179 Toutatis, a near-Earth asteroid close to a 4:1 resonance with the Earth. The data accumulated from 1992-2008 are processed in a least-squares filter to predict Toutatis' orientation during the 2012 apparition. The center-of-mass offset and the moments of inertia estimated from these data can be used to constrain the internal density distribution within the body. Then, the spin state estimation is developed into a generalized method to estimate the internal density distribution within a small body. The density distribution is estimated from the orbit determination solution of the gravitational coefficients. It will be shown that the surface gravity field reconstructed from the estimated density distribution yields higher accuracy than the conventional gravity field models. Finally, we will investigate two types of relatively unknown gravity fields, namely the interior gravity field and the interior spherical Bessel gravity field, in order to investigate how accurately the surface gravity field can be mapped out for proximity operations.
It will be shown that these formulations compute the surface gravity field with unprecedented accuracy for a well-chosen set of parametric settings, both regionally and globally.
Electrochemical thermodynamic measurement system
Reynier, Yvan [Meylan, FR]; Yazami, Rachid [Los Angeles, CA]; Fultz, Brent T [Pasadena, CA]
2009-09-29
The present invention provides systems and methods for accurately characterizing thermodynamic and materials properties of electrodes and electrochemical energy storage and conversion systems. Systems and methods of the present invention are configured for simultaneously collecting a suite of measurements characterizing a plurality of interconnected electrochemical and thermodynamic parameters relating to the electrode reaction state of advancement, voltage and temperature. Enhanced sensitivity provided by the present methods and systems combined with measurement conditions that reflect thermodynamically stabilized electrode conditions allow very accurate measurement of thermodynamic parameters, including state functions such as the Gibbs free energy, enthalpy and entropy of electrode/electrochemical cell reactions, that enable prediction of important performance attributes of electrode materials and electrochemical systems, such as the energy, power density, current rate and the cycle life of an electrochemical cell.
Puzzarini, Cristina; Ali, Ashraf; Biczysko, Malgorzata; Barone, Vincenzo
2014-09-10
An accurate spectroscopic characterization of protonated oxirane has been carried out by means of state-of-the-art computational methods and approaches. The calculated spectroscopic parameters from our recent computational investigation of oxirane together with the corresponding experimental data available were used to assess the accuracy of our predicted rotational and IR spectra of protonated oxirane. We found an accuracy of about 10 cm⁻¹ for vibrational transitions (fundamentals as well as overtones and combination bands) and, in relative terms, of 0.1% for rotational transitions. We are therefore confident that the spectroscopic data provided herein are a valuable support for the detection of protonated oxirane not only in Titan's atmosphere but also in the interstellar medium.
NASA Astrophysics Data System (ADS)
Oh, Kwang Jin; Kang, Ji Hoon; Myung, Hun Joo
2012-02-01
We have revised a general purpose parallel molecular dynamics simulation program, mm_par, using object-oriented programming. We parallelized the revised version using a hierarchical scheme in order to utilize more processors for a given system size. The benchmark results are presented here.
New version program summary
Program title: mm_par2.0
Catalogue identifier: ADXP_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXP_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 2 390 858
No. of bytes in distributed program, including test data, etc.: 25 068 310
Distribution format: tar.gz
Programming language: C++
Computer: Any system operated by Linux or Unix
Operating system: Linux
Classification: 7.7
External routines: We provide wrappers for FFTW [1], the Intel MKL library [2] FFT routine, Numerical Recipes [3] FFT, random number generator, and eigenvalue solver routines, the SPRNG [4] random number generator, the Mersenne Twister [5] random number generator, and a space-filling curve routine.
Catalogue identifier of previous version: ADXP_v1_0
Journal reference of previous version: Comput. Phys. Comm. 174 (2006) 560
Does the new version supersede the previous version?: Yes
Nature of problem: Structural, thermodynamic, and dynamical properties of fluids and solids from microscopic to mesoscopic scales.
Solution method: Molecular dynamics simulation in the NVE, NVT, and NPT ensembles, Langevin dynamics simulation, dissipative particle dynamics simulation.
Reasons for new version: First, object-oriented programming has been used, which is known to be open for extension and closed for modification. It is also known to be better for maintenance. Second, version 1.0 was based on atom decomposition and domain decomposition schemes [6] for parallelization. However, atom decomposition is not popular due to its poor scalability. The domain decomposition scheme scales better, but it still has a limitation in utilizing a large number of cores on recent petascale computers, due to the requirement that the domain size be larger than the potential cutoff distance. To go beyond this limitation, a hierarchical parallelization scheme has been adopted in this new version and implemented using MPI [7] and OPENMP [8].
Summary of revisions: (1) Object-oriented programming has been used. (2) A hierarchical parallelization scheme has been adopted. (3) The SPME routine has been fully parallelized with a parallel 3D FFT using a volumetric decomposition scheme [9]. K.J.O. thanks Mr. Seung Min Lee for useful discussion on programming and debugging.
Running time: Running time depends on the system size and methods used. For a test system containing a protein (PDB id: 5DHFR) with the CHARMM22 force field [10] and 7023 TIP3P [11] waters in a simulation box of dimensions 62.23 Å × 62.23 Å × 62.23 Å, the benchmark results are given in Fig. 1. Here the potential cutoff distance was set to 12 Å, and the switching function was applied from 10 Å for the force calculation in real space. For the SPME [12] calculation, K1, K2, and K3 were set to 64 and the interpolation order was set to 4. To do the fast Fourier transform, we used the Intel MKL library. All bonds including hydrogen atoms were constrained using the SHAKE/RATTLE algorithms [13,14]. The code was compiled using Intel compiler version 11.1 and mvapich2 version 1.5. Fig. 2 shows performance gains from using the CUDA-enabled version [15] of mm_par for the 5DHFR simulation in water on an Intel Core2Quad 2.83 GHz and a GeForce GTX 580. Even though mm_par2.0 has not yet been ported to the GPU, these performance data should be useful for estimating mm_par2.0 performance on the GPU.
Fig. 1. Timing results for 1000 MD steps. 1, 2, 4, and 8 in the figure denote the number of OPENMP threads.
Fig. 2. Timing results for 1000 MD steps from double precision simulation on CPU, single precision simulation on GPU, and double precision simulation on GPU.
Analysis and application of classification methods of complex carbonate reservoirs
NASA Astrophysics Data System (ADS)
Li, Xiongyan; Qin, Ruibao; Ping, Haitao; Wei, Dan; Liu, Xiaomei
2018-06-01
There are abundant carbonate reservoirs from the Cenozoic to the Mesozoic era in the Middle East. Due to variations in the sedimentary environment and diagenetic processes of carbonate reservoirs, several porosity types coexist in them. Because of the complex lithologies and pore types, as well as the impact of microfractures, the pore structure is very complicated, which makes it difficult to calculate reservoir parameters accurately. To evaluate carbonate reservoirs accurately, classification methods based on capillary pressure curves and on flow units are analyzed, building on an evaluation of the pore structure of carbonate reservoirs. Although carbonate reservoirs can be classified using capillary pressure curves, the relationship between porosity and permeability after this classification is not satisfactory. When classification is based on flow units, a high-precision functional relationship between porosity and permeability can be established. Therefore, carbonate reservoirs can be quantitatively evaluated based on the classification of flow units. In the dolomite reservoirs, the average absolute error of calculated permeability decreases from 15.13 to 7.44 mD. Similarly, the average absolute error of calculated permeability of limestone reservoirs is reduced from 20.33 to 7.37 mD. Reservoir parameters can be calculated accurately only when pore structures are accurately characterized and reservoir types are classified. Characterizing pore structures and classifying reservoir types are therefore very important for the accurate evaluation of complex carbonate reservoirs in the Middle East.
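One widely used way to define flow units is the flow zone indicator (FZI) of the hydraulic-flow-unit method. The abstract does not say which flow-unit definition the authors used, so treat this Python sketch as an assumption. It computes three quantities: the reservoir quality index RQI = 0.0314 * sqrt(k / phi) (k in mD, phi as a fraction, RQI in micrometres), the normalized porosity phi_z = phi / (1 - phi), and FZI = RQI / phi_z. It then assigns a sample to a flow unit using hypothetical FZI cut-offs. Finally, it inverts the definition to predict permeability from porosity within a unit.

```python
import math
from bisect import bisect_right

RQI_COEFF = 0.0314  # unit-conversion constant for k in mD and RQI in micrometres


def flow_zone_indicator(perm_md, porosity):
    """FZI (micrometres) from permeability (mD) and porosity (fraction)."""
    rqi = RQI_COEFF * math.sqrt(perm_md / porosity)  # reservoir quality index
    phi_z = porosity / (1.0 - porosity)              # normalized porosity
    return rqi / phi_z


def flow_unit(fzi, cutoffs):
    """Index of the flow unit containing fzi, given sorted (hypothetical)
    FZI cut-offs; returns 0 .. len(cutoffs)."""
    return bisect_right(cutoffs, fzi)


def permeability_from_fzi(porosity, fzi):
    """Inverse of the FZI definition: k = phi * (FZI * phi_z / 0.0314)^2,
    i.e. roughly 1014 * FZI^2 * phi^3 / (1 - phi)^2 in mD."""
    phi_z = porosity / (1.0 - porosity)
    return porosity * (fzi * phi_z / RQI_COEFF) ** 2
```

In this scheme, a unit's representative FZI (for example, the mean of its core samples) would be passed to `permeability_from_fzi` to obtain a porosity-permeability relation specific to that unit.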
NASA Astrophysics Data System (ADS)
Pedemonte, Stefano; Pierce, Larry; Van Leemput, Koen
2017-11-01
Measuring the depth-of-interaction (DOI) of gamma photons makes it possible to increase the resolution of emission imaging systems. Several design variants of DOI-sensitive detectors have been recently introduced to improve the performance of scanners for positron emission tomography (PET). However, the accurate characterization of the response of DOI detectors, necessary to accurately measure the DOI, remains an unsolved problem. At the current state of the art, numerical simulations are imprecise, while directly measuring the characteristics of DOI detectors experimentally is hindered by the impossibility of imposing the depth-of-interaction in an experimental set-up. In this article we introduce a machine learning approach for extracting accurate forward models of gamma imaging devices from simple pencil-beam measurements, using a nonlinear dimensionality reduction technique in combination with a finite mixture model. The method is purely data-driven, not requiring simulations, and is applicable to a wide range of detector types. The proposed method was evaluated both in a simulation study and with data acquired using a monolithic gamma camera designed for PET (the cMiCE detector), demonstrating the accurate recovery of the DOI characteristics. The combination of the proposed calibration technique with maximum a posteriori estimation of the coordinates of interaction provided a depth resolution of ≈1.14 mm for the simulated PET detector and ≈1.74 mm for the cMiCE detector. The software and experimental data are made available at http://occiput.mgh.harvard.edu/depthembedding/.
Koohbor, Behrad; Kidane, Addis; Lu, Wei-Yang
2016-06-27
As an optimum energy-absorbing material system, polymeric foams are needed to dissipate the kinetic energy of an impact, while maintaining the impact force transferred to the protected object at a low level. As a result, it is crucial to accurately characterize the load bearing and energy dissipation performance of foams at high strain rate loading conditions. There are certain challenges faced in the accurate measurement of the deformation response of foams due to their low mechanical impedance. In the present work, a non-parametric method is successfully implemented to enable the accurate assessment of the compressive constitutive response of rigid polymeric foams subjected to impact loading conditions. The method is based on stereovision high speed photography in conjunction with 3D digital image correlation, and allows for accurate evaluation of inertia stresses developed within the specimen during deformation time. In conclusion, full-field distributions of stress, strain and strain rate are used to extract the local constitutive response of the material at any given location along the specimen axis. In addition, the effective energy absorbed by the material is calculated. Finally, results obtained from the proposed non-parametric analysis are compared with data obtained from conventional test procedures.
Venkatesh, Sudhakar K.; Chandan, Vishal; Roberts, Lewis R.
2013-01-01
Liver masses present a relatively common clinical dilemma, particularly with the increasing use of various imaging modalities in the diagnosis of abdominal and other symptoms. The accurate and reliable determination of the nature of the liver mass is critical, not only to reassure individuals with benign lesions but also, and perhaps more importantly, to ensure that malignant lesions are diagnosed correctly. This avoids the devastating consequences of missed diagnosis and the delayed treatment of malignancy or the unnecessary treatment of benign lesions. With appropriate interpretation of the clinical history and physical examination, and the judicious use of laboratory and imaging studies, the majority of liver masses can be characterized noninvasively. Accurate characterization of liver masses by cross-sectional imaging is particularly dependent on an understanding of the unique phasic vascular perfusion of the liver and the characteristic behaviors of different lesions during multiphasic contrast imaging. When non-invasive characterization is indeterminate, a liver biopsy may be necessary for definitive diagnosis. Standard histologic examination is usually complemented by immunohistochemical analysis of protein biomarkers. Accurate diagnosis allows the appropriate selection of optimal management, which is frequently reassurance or intermittent follow up for benign masses. For malignant lesions or those at risk of malignant transformation, management depends on the tumor staging, the functional status of the uninvolved liver and technical surgical considerations. Unresectable metastatic masses require oncologic consultation and therapy. The efficient characterization and management of liver masses therefore requires a multidisciplinary collaboration between the gastroenterologist/hepatologist, radiologist, pathologist, hepatobiliary or transplant surgeon, and medical oncologist. PMID:24055987
N. Keca; N. B. Klopfenstein; M.-S. Kim; H. Solheim; S. Woodward
2014-01-01
Armillaria species have a global distribution and play variable ecological roles, including causing root disease of diverse forest, ornamental and horticultural trees. Accurate identification of Armillaria species is critical to understand their distribution and ecological roles. This work focused on characterizing an unidentified Armillaria isolate from a Serbian...
On the Design of Attitude-Heading Reference Systems Using the Allan Variance.
Hidalgo-Carrió, Javier; Arnold, Sascha; Poulakis, Pantelis
2016-04-01
The Allan variance is a method to characterize stochastic random processes. The technique was originally developed to characterize the stability of atomic clocks and has also been successfully applied to the characterization of inertial sensors. Inertial navigation systems (INS) can provide accurate results over short time intervals, but their accuracy tends to degrade rapidly over longer intervals. During the last decade, the performance of inertial sensors has significantly improved, particularly in terms of signal stability, mechanical robustness, and power consumption. The mass and volume of inertial sensors have also been significantly reduced, offering system-level design and accommodation advantages. This paper presents a complete methodology for the characterization and modeling of inertial sensors using the Allan variance, with direct application to navigation systems. Although the concept of sensor fusion is relatively straightforward, accurate characterization and sensor-information filtering are not trivial tasks, yet both are essential for good performance. A complete and reproducible methodology utilizing the Allan variance, including all the intermediate steps, is described. An end-to-end (E2E) process for sensor-error characterization and modeling up to the final integration in the sensor-fusion scheme is explained in detail. The strength of this approach is demonstrated with representative tests on novel, high-grade inertial sensors. Experimental navigation results are presented from two distinct robotic applications: a planetary exploration rover prototype and an autonomous underwater vehicle (AUV).
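The Allan variance computation at the core of such a methodology can be sketched in Python as follows. This is the textbook non-overlapping estimator, sigma^2(tau) = 1 / (2 (K - 1)) * sum_k (ybar_{k+1} - ybar_k)^2. Here ybar_k are the averages over K consecutive clusters of length tau = m * tau0. The paper may instead use the overlapping variant, and the function names are hypothetical. Plotting the square root of the variance against tau on log-log axes is what lets noise terms such as angle random walk and bias instability be identified from the slopes.

```python
def allan_variance(rate, tau0, m):
    """Non-overlapping Allan variance of a rate signal (e.g. gyro output)
    sampled every tau0 seconds, at averaging time tau = m * tau0."""
    k = len(rate) // m  # number of complete clusters
    if k < 2:
        raise ValueError("need at least two clusters of length m")
    means = [sum(rate[i * m:(i + 1) * m]) / m for i in range(k)]
    diffs = [(means[i + 1] - means[i]) ** 2 for i in range(k - 1)]
    return m * tau0, sum(diffs) / (2 * (k - 1))


def allan_curve(rate, tau0):
    """(tau, variance) pairs for cluster sizes m = 1, 2, 4, ... while at
    least two clusters remain."""
    curve, m = [], 1
    while len(rate) // m >= 2:
        curve.append(allan_variance(rate, tau0, m))
        m *= 2
    return curve
```

For real sensor logs, `rate` would be hours of static recordings at the sensor's output rate. Cluster sizes are often spaced logarithmically, as in `allan_curve`, to keep the computation cheap.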
Tip Characterization Method using Multi-feature Characterizer for CD-AFM
Orji, Ndubuisi G.; Itoh, Hiroshi; Wang, Chumei; Dixson, Ronald G.; Walecki, Peter S.; Schmidt, Sebastian W.; Irmer, Bernd
2016-01-01
In atomic force microscopy (AFM) metrology, the tip is a key source of uncertainty. Images taken with an AFM show a change in feature width and shape that depends on tip geometry. This geometric dilation is more pronounced when measuring features with high aspect ratios, and makes it difficult to obtain absolute dimensions. In order to accurately measure nanoscale features using an AFM, the tip dimensions should be known with a high degree of precision. We evaluate a new AFM tip characterizer and apply it to critical dimension AFM (CD-AFM) tips used for high-aspect-ratio features. The characterizer is made up of comb-shaped lines and spaces, and includes a series of gratings that could be used as an integrated nanoscale length reference. We also demonstrate a simulation method that could be used to specify what range of tip sizes and shapes the characterizer can measure. Our experiments show that, for non-re-entrant features, the results obtained with this characterizer are consistent to within 1 nm with the results obtained using widely accepted but slower methods that are common practice in CD-AFM metrology. A validation of the integrated length standard using displacement interferometry indicates a uniformity of better than 0.75%, suggesting that the sample could be used as a highly accurate and SI-traceable lateral scale for the whole evaluation process. PMID:26720439
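The geometric dilation described above is commonly modeled as a grayscale (morphological) dilation of the specimen by the tip: at each scan position the recorded height is the lowest apex height at which the tip still touches the surface. The 1-D sketch below assumes integer pixel offsets and a tip profile given as heights above the apex; the function name and data layout are hypothetical, and the characterizer's actual tip-reconstruction procedure (the inverse problem) is not shown.

```python
from typing import Dict, List, Sequence


def simulate_line_scan(surface: Sequence[float], tip: Dict[int, float]) -> List[float]:
    """Grayscale-dilation model of a 1-D AFM line scan.

    surface[i] is the specimen height at pixel i; tip[u] is the height of the
    tip profile above its apex at lateral offset u (the apex is tip[0] == 0).
    Recorded height at x: max over u of surface[x + u] - tip[u].
    """
    if tip.get(0) != 0:
        raise ValueError("tip profile must include its apex at offset 0 with height 0")
    n = len(surface)
    image = []
    for x in range(n):
        # Only tip offsets that land on the scanned line can make contact.
        image.append(max(surface[x + u] - h for u, h in tip.items() if 0 <= x + u < n))
    return image
```

For example, a one-pixel-wide line of height 5 scanned with a tip whose flanks rise 1 unit per pixel, `simulate_line_scan([0, 0, 0, 5, 0, 0, 0], {-1: 1, 0: 0, 1: 1})`, comes out three pixels wide as `[0, 0, 4, 5, 4, 0, 0]`. This is the width error that knowledge of the tip geometry is meant to remove.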
Liu, Zhao-Ying; Huang, Ling-Li; Chen, Dong-Mei; Dai, Meng-Hong; Tao, Yan-Fei; Wang, Yu-Lian; Yuan, Zong-Hui
2010-02-01
A method using electrospray ionization hybrid ion trap/time-of-flight mass spectrometry coupled with high-performance liquid chromatography (LC/MS-IT-TOF) was developed for the rapid characterization of in vitro metabolites of quinocetone. Metabolites formed in rat liver microsomes were separated on a VP-ODS column with gradient elution. Multiple scans of metabolites in MS and MS² modes and accurate mass measurements were performed automatically and simultaneously through data-dependent acquisition within a single 30-min analysis. Most measured mass errors were less than 10 ppm for both protonated molecules and fragment ions using external mass calibration. The elemental compositions of all fragment ions of quinocetone and its metabolites could be rapidly assigned based on the known compositional elements of the protonated molecules. The structures of the metabolites were elucidated by combining three techniques: agreement between the proposed structure, the accurate masses, and the elemental composition of ions in the mass spectra; comparison of changes in accurate molecular masses and fragment ions with those of the parent drug or metabolite; and the elemental compositions of lost mass numbers in proposed fragmentation pathways. Twenty-seven phase I metabolites were identified: 11 reduction metabolites, three direct hydroxylation metabolites, and 13 metabolites combining reduction and hydroxylation. All metabolites except the N-oxide reduction metabolite M6 are new metabolites of quinocetone that had not been reported previously. The ability to conduct expected biotransformation profiling via tandem mass spectrometry coupled with accurate mass measurement, all in a single experimental run, is one of the most attractive features of this methodology. The results indicate that the LC/MS-IT-TOF approach is rapid, efficient, and reliable for the structural characterization of drug metabolites.
Abadlia, L; Gasser, F; Khalouk, K; Mayoufi, M; Gasser, J G
2014-09-01
In this paper we describe an experimental setup designed to measure, simultaneously and very accurately, the resistivity and the absolute thermoelectric power (also called absolute thermopower or absolute Seebeck coefficient) of solid and liquid conductors/semiconductors over a wide range of temperatures (room temperature to 1600 K in the present work). A careful analysis of the existing experimental data allowed us to extend the absolute thermoelectric power scale of platinum to the range 0-1800 K with two new polynomial expressions. The experimental device is controlled by a LabVIEW program. A detailed description of the accurate dynamic measurement methodology is given in this paper. We measure the absolute thermoelectric power and the electrical resistivity and deduce the thermal conductivity with good accuracy using the relations between the three electronic transport coefficients, going beyond the classical Wiedemann-Franz law. We use this experimental setup and methodology to give new, very accurate results for pure copper, platinum, and nickel, especially at very high temperatures. However, measuring resistivity and absolute thermopower can be more than an end in itself. Resistivity characterizes the bulk of a material, while absolute thermoelectric power characterizes the material at the point where electrical contact is established with a couple of metallic elements (forming a thermocouple). In a forthcoming paper we will show that measuring resistivity and absolute thermoelectric power is an advantageous way to characterize phase changes, probably as well as DSC (if not better), since phase changes can easily be followed for several hours or days at constant temperature.
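The abstract deduces thermal conductivity from the measured transport coefficients and explicitly goes beyond the classical Wiedemann-Franz law; that refinement is not given here. For orientation only, the sketch below computes the classical baseline the authors improve on, the electronic thermal conductivity kappa = L0 * T / rho with the Sommerfeld Lorenz number L0 = (pi^2 / 3)(k_B / e)^2. The function name is hypothetical, and the thermopower-dependent correction is deliberately not modeled.

```python
import math

# Exact SI values of the constants (2019 SI redefinition).
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

# Sommerfeld value of the Lorenz number, about 2.443e-8 W*Ohm/K^2.
LORENZ_0 = (math.pi ** 2 / 3) * (K_B / E_CHARGE) ** 2


def wf_thermal_conductivity(resistivity_ohm_m: float, temperature_k: float,
                            lorenz: float = LORENZ_0) -> float:
    """Electronic thermal conductivity in W/(m*K) from the classical
    Wiedemann-Franz law, kappa = L * T / rho (baseline only)."""
    if resistivity_ohm_m <= 0 or temperature_k <= 0:
        raise ValueError("resistivity and temperature must be positive")
    return lorenz * temperature_k / resistivity_ohm_m
```

With a room-temperature resistivity of about 1.7e-8 Ohm*m (typical for copper), the baseline gives roughly 431 W/(m*K). Deviations of measured values from this estimate are what a more complete treatment of the transport coefficients addresses.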
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abadlia, L.; Mayoufi, M.; Gasser, F.
2014-09-15
In this paper we describe an experimental setup designed to measure, simultaneously and very accurately, the resistivity and the absolute thermoelectric power (also called absolute thermopower or absolute Seebeck coefficient) of solid and liquid conductors/semiconductors over a wide range of temperatures (room temperature to 1600 K in the present work). A careful analysis of the existing experimental data allowed us to extend the absolute thermoelectric power scale of platinum to the range 0-1800 K with two new polynomial expressions. The experimental device is controlled by a LabVIEW program. A detailed description of the accurate dynamic measurement methodology is given in this paper. We measure the absolute thermoelectric power and the electrical resistivity and deduce the thermal conductivity with good accuracy using the relations between the three electronic transport coefficients, going beyond the classical Wiedemann-Franz law. We use this experimental setup and methodology to give new, very accurate results for pure copper, platinum, and nickel, especially at very high temperatures. However, measuring resistivity and absolute thermopower can be more than an end in itself. Resistivity characterizes the bulk of a material, while absolute thermoelectric power characterizes the material at the point where electrical contact is established with a couple of metallic elements (forming a thermocouple). In a forthcoming paper we will show that measuring resistivity and absolute thermoelectric power is an advantageous way to characterize phase changes, probably as well as DSC (if not better), since phase changes can easily be followed for several hours or days at constant temperature.
Designing tools for oil exploration using nuclear modeling
NASA Astrophysics Data System (ADS)
Mauborgne, Marie-Laure; Allioli, Françoise; Manclossi, Mauro; Nicoletti, Luisa; Stoller, Chris; Evans, Mike
2017-09-01
When designing nuclear tools for oil exploration, one of the first steps is typically nuclear modeling for concept evaluation and initial characterization. Having an accurate model, including the availability of accurate cross sections, is essential to reduce or avoid time-consuming and costly design iterations. During tool response characterization, modeling is benchmarked against experimental data and then used to complement and expand the database, making it more detailed and extending it to measurement environments that are difficult or impossible to reproduce in the laboratory. We present comparisons of our modeling results obtained using the ENDF/B-VI and ENDF/B-VII cross section databases, focusing on the response to a few elements found in the tool, borehole, and subsurface formation. For neutron-induced inelastic and capture gamma-ray spectroscopy, major obstacles may be caused by missing or inaccurate cross sections for essential materials. We show examples of benchmarking modeling results against experimental data obtained during tool characterization and discuss the observed discrepancies.
Araneo, Rodolfo; Rinaldi, Antonio; Notargiacomo, Andrea; Bini, Fabiano; Pea, Marialilia; Celozzi, Salvatore; Marinozzi, Franco; Lovat, Giampiero
2014-12-08
Micro- and nano-scale materials and systems based on zinc oxide are expected to see explosive growth in applications in electronics and photonics, including nano-arrays of addressable optoelectronic devices and sensors, owing to their outstanding properties, including semiconductivity and a direct bandgap, piezoelectricity, pyroelectricity, and biocompatibility. Most applications are based on the cooperative and average response of a large number of ZnO micro/nanostructures. However, in order to assess the quality of the materials and their performance, it is essential to characterize and then accurately model the specific electrical and piezoelectric properties of single ZnO structures. In this paper, we report on focused-ion-beam-machined, high-aspect-ratio nanowires and their mechanical and electrical (by means of conductive atomic force microscopy) characterization. We then investigate the suitability of new power-law design concepts to accurately model the relevant electrical and mechanical size effects, whose existence has been emphasized in recent reviews.
NASA Astrophysics Data System (ADS)
Zhang, Kai; Yang, Fanlin; Zhang, Hande; Su, Dianpeng; Li, QianQian
2017-06-01
The correlation between seafloor morphological features and biological complexity has been identified in numerous recent studies. This research focused on the potential for accurate characterization of coral reefs based on high-resolution bathymetry from multiple sources. A standard deviation (STD) based method for quantitatively characterizing terrain complexity was developed that includes robust estimation to correct for irregular bathymetry and a calibration for the depth-dependent variability of measurement noise. Airborne lidar and shipborne sonar bathymetry measurements from Yuanzhi Island, South China Sea, were merged to generate seamless high-resolution coverage of coral bathymetry from the shoreline to deep water. The new algorithm was applied to the Yuanzhi Island surveys to generate maps of quantitative terrain complexity, which were then compared with in situ video observations of coral abundance. The terrain complexity parameter is significantly correlated with seafloor coral abundance, demonstrating the potential for accurately and efficiently mapping coral abundance through seafloor surveys, including combinations of surveys using different sensors.
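As a rough sketch of the core idea only, the code below computes an STD-based terrain-complexity map: the population standard deviation of depths in a square moving window around each grid cell, with the window clipped at the grid edges. The robust estimation and the depth-dependent noise calibration described in the abstract are not reproduced. The `remove_noise_floor` helper is a hypothetical stand-in that simply subtracts an assumed noise variance in quadrature, and all names and the window shape are assumptions.

```python
import math
import statistics
from typing import List, Sequence


def local_std_complexity(depth: Sequence[Sequence[float]], radius: int = 1) -> List[List[float]]:
    """Population standard deviation of depths in a (2*radius+1)^2 window
    around each cell; windows are clipped at the grid edges."""
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [depth[r][c]
                      for r in range(max(0, i - radius), min(rows, i + radius + 1))
                      for c in range(max(0, j - radius), min(cols, j + radius + 1))]
            out[i][j] = statistics.pstdev(window)
    return out


def remove_noise_floor(std: float, noise_std: float) -> float:
    """Hypothetical correction: subtract an assumed measurement-noise variance
    from the local variance, clamping at zero."""
    return math.sqrt(max(std ** 2 - noise_std ** 2, 0.0))
```

A flat seafloor yields zero everywhere, while rugged reef terrain raises the local spread. In the published method, the noise level that must be removed depends on depth and sensor, which is why it is calibrated rather than fixed.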
Araneo, Rodolfo; Rinaldi, Antonio; Notargiacomo, Andrea; Bini, Fabiano; Pea, Marialilia; Celozzi, Salvatore; Marinozzi, Franco; Lovat, Giampiero
2014-01-01
Micro- and nano-scale materials and systems based on zinc oxide are expected to see explosive growth in applications in electronics and photonics, including nano-arrays of addressable optoelectronic devices and sensors, owing to their outstanding properties, including semiconductivity and a direct bandgap, piezoelectricity, pyroelectricity, and biocompatibility. Most applications are based on the cooperative and average response of a large number of ZnO micro/nanostructures. However, in order to assess the quality of the materials and their performance, it is essential to characterize and then accurately model the specific electrical and piezoelectric properties of single ZnO structures. In this paper, we report on focused-ion-beam-machined, high-aspect-ratio nanowires and their mechanical and electrical (by means of conductive atomic force microscopy) characterization. We then investigate the suitability of new power-law design concepts to accurately model the relevant electrical and mechanical size effects, whose existence has been emphasized in recent reviews. PMID:25494351
Abnormal fronto-striatal activation as a marker of threshold and subthreshold Bulimia Nervosa.
Cyr, Marilyn; Yang, Xiao; Horga, Guillermo; Marsh, Rachel
2018-04-01
This study aimed to determine whether functional disturbances in fronto-striatal control circuits characterize adolescents with Bulimia Nervosa (BN) spectrum eating disorders regardless of clinical severity. fMRI was used to assess conflict-related brain activations during performance of a Simon task in two samples of adolescents with BN symptoms compared with healthy adolescents. The BN samples differed in the severity of their clinical presentation, illness duration, and age. Multi-voxel pattern analyses (MVPAs) based on machine learning were used to determine whether patterns of fronto-striatal activation characterized adolescents with BN spectrum disorders regardless of clinical severity, and whether accurate classification of less symptomatic adolescents (subthreshold BN; SBN) could be achieved based on patterns of activation in adolescents who met DSM-5 criteria for BN. MVPA classification analyses revealed that both BN and SBN adolescents could be accurately discriminated from healthy adolescents based on fronto-striatal activation. Notably, the patterns detected in more severely ill BN compared with healthy adolescents accurately discriminated less symptomatic SBN from healthy adolescents. Deficient activation of fronto-striatal circuits can characterize BN early in its course, when clinical presentations are less severe. This may point to circuit-based disturbances as a useful biomarker or risk factor for the disorder, and as a tool for understanding its developmental trajectory and for developing early interventions. © 2018 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, Kenneth W., E-mail: kenneth.allen@gtri.gatech.edu; Scott, Mark M.; Reid, David R.
In this work, we present a new X-band waveguide (WR90) measurement method that permits broadband characterization of the complex permittivity of material specimens with low dielectric loss tangents, with improved accuracy. An electrically long polypropylene specimen that partially fills the cross-section is inserted into the waveguide, and the transmission scattering parameter (S21) is measured. The extraction method relies on computational electromagnetic simulations, coupled with a genetic algorithm, to match the experimental S21 measurement. The sensitivity of the technique to sample length was explored by simulating specimen lengths from 2.54 to 15.24 cm in 2.54 cm increments. Analysis of our simulated data predicts that the technique will have the sensitivity to measure loss tangent values on the order of 10⁻³ for materials, such as polymers, with relatively low real permittivity values. The ability to accurately characterize low-loss dielectric specimens of polypropylene is demonstrated experimentally. The method was validated by excellent agreement with a free-space focused-beam system measurement of a polypropylene sheet. This technique provides the material measurement community with the ability to accurately extract the material properties of low-loss specimens over the entire X-band range, and it could easily be extended to other frequency bands.
NASA Technical Reports Server (NTRS)
Thurai, Merhala; Bringi, Viswanathan; Kennedy, Patrick; Notaros, Branislav; Gatlin, Patrick
2017-01-01
Accurate measurements of raindrop size distributions (DSDs), with particular emphasis on small and tiny drops, are presented. Measurements were conducted in two very different climate regions, namely Northern Colorado and Northern Alabama. Both datasets reveal a combination of (i) a drizzle mode for drop diameters less than 0.7 mm and (ii) a precipitation mode for larger diameters. Scattering calculations using the DSDs are performed at S and X bands and compared with radar observations for the first location. Our accurate DSDs will improve radar-based rain rate estimates as well as propagation predictions.
A High Order, Locally-Adaptive Method for the Navier-Stokes Equations
NASA Astrophysics Data System (ADS)
Chan, Daniel
1998-11-01
I have extended the FOSLS method of Cai, Manteuffel and McCormick (1997) and implemented it within the framework of a spectral element formulation using a Legendre polynomial basis. The FOSLS method solves the Navier-Stokes equations as a system of coupled first-order equations and provides the ellipticity needed for fast iterative matrix solvers such as multigrid to operate efficiently. Each element is treated as an object whose properties are self-contained. Only C^0 continuity is imposed across element interfaces; this design allows local grid refinement and coarsening without an elaborate data structure, since only information along element boundaries is needed. In the FORTRAN 90 programming environment, I maintain high computational efficiency by employing a hybrid parallel processing model: OpenMP directives provide loop-level parallelism within a shared-memory SMP, and the MPI protocol distributes elements across a cluster of SMPs connected via a commodity network. This talk will present timing results and a comparison with a second-order finite difference method.
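A spectral element method with a Legendre basis evaluates the polynomials P_0, ..., P_n on each element's reference interval [-1, 1]. As a minimal sketch, not the author's FORTRAN 90 code, these values can be generated with Bonnet's three-term recurrence, and an element-local (modal) expansion is then a weighted sum of them. The function names and the modal representation are assumptions. The FOSLS functional, quadrature, and C^0 interface coupling are not shown.

```python
from typing import List, Sequence


def legendre_values(n: int, x: float) -> List[float]:
    """Return [P_0(x), ..., P_n(x)] using Bonnet's recurrence
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)."""
    if n < 0:
        raise ValueError("degree must be non-negative")
    values = [1.0]
    if n >= 1:
        values.append(x)
    for k in range(1, n):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values


def eval_expansion(coeffs: Sequence[float], x: float) -> float:
    """Evaluate sum_k coeffs[k] * P_k(x) on the reference element [-1, 1]."""
    if not coeffs:
        return 0.0
    basis = legendre_values(len(coeffs) - 1, x)
    return sum(a * p for a, p in zip(coeffs, basis))
```

The recurrence costs O(n) per point and is numerically stable on [-1, 1], which is why it is the usual way to build such bases rather than expanding the polynomials into monomials.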
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erez, Mattan; Yelick, Katherine; Sarkar, Vivek
The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models and runtime systems to meet the challenges of Exascale computing. Our approach is to provide an efficient and scalable programming model that can be adapted to application needs through the use of dynamic runtime features and domain-specific languages for computational kernels. We address the following technical challenges: Programmability: a rich set of programming constructs based on a Hierarchical Partitioned Global Address Space (HPGAS) model, demonstrated in UPC++. Scalability: hierarchical locality control, lightweight communication (extended GASNet), and efficient synchronization mechanisms (Phasers). Performance Portability: just-in-time specialization (SEJITS) for generating hardware-specific code and scheduling libraries for domain-specific adaptive runtimes (Habanero). Energy Efficiency: communication-optimal code generation to optimize energy efficiency by reducing data movement. Resilience: Containment Domains for flexible, domain-specific resilience, using state capture mechanisms and lightweight, asynchronous recovery mechanisms. Interoperability: runtime and language interoperability with MPI and OpenMP to encourage broad adoption.
OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark
NASA Astrophysics Data System (ADS)
Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.
2015-02-01
We developed a practical method to accelerate execution of the Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler on the Ubuntu 12.04 LTS Linux platform, and the resulting package is named iOMP-SWAT in this study. The GNU utilities make, gprof, and diff were used to build the iOMP-SWAT package, profile memory usage, and verify that parallel and serial simulations produce identical output. Among 302 SWAT subroutines, the slowest routines were identified using GNU gprof and then modified using the Open Multi-Processing (OpenMP) library on an 8-core shared-memory system. In addition, a C wrapper function, cross-compiled with the original SWAT FORTRAN package, was used to rapidly set large arrays to zero. A universal speedup ratio of 2.3 was achieved using input data sets with a large number of hydrological response units. Although this study specifically focuses on accelerating a single SWAT run, the use of iOMP-SWAT for parameter calibration will significantly improve the performance of SWAT optimization.
Development of full wave code for modeling RF fields in hot non-uniform plasmas
NASA Astrophysics Data System (ADS)
Zhao, Liangji; Svidzinski, Vladimir; Spencer, Andrew; Kim, Jin-Soo
2016-10-01
FAR-TECH, Inc. is developing a full wave RF modeling code to model RF fields in fusion devices and in general plasma applications. As an important component of the code, an adaptive meshless technique is introduced to solve the wave equations; it resolves plasma resonances efficiently and adapts to the complexity of the antenna geometry and device boundary. The computational points are generated using either a point elimination method or a force balancing method based on the monitor function, which is calculated by solving the cold plasma dispersion equation locally. Another part of the code is the conductivity kernel calculation, used for modeling the nonlocal hot plasma dielectric response. The conductivity kernel is calculated on a coarse grid of test points and then interpolated linearly onto the computational points. All components of the code are parallelized using MPI and OpenMP to optimize execution speed and memory usage. The algorithm and the results of our numerical approach to solving 2-D wave equations in a tokamak geometry will be presented. Work is supported by the U.S. DOE SBIR program.
NASA Astrophysics Data System (ADS)
Menzel, R.; Paynter, D.; Jones, A. L.
2017-12-01
To keep computational cost relatively low, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high-resolution line-by-line radiative transfer models may soon approach that of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model under development at GFDL. Although originally designed specifically to exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended to include OpenMP in an effort to also target MIC architectures such as Intel's Xeon Phi effectively. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, part of CMIP6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures with analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.
Livermore Compiler Analysis Loop Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hornung, R. D.
2013-03-01
LCALS is designed to evaluate compiler optimizations and the performance of a variety of loop kernels and loop traversal software constructs. Some of the loop kernels are pulled directly from "Livermore Loops Coded in C", developed at LLNL (see item 11 below for details of earlier code versions). The older suites were used to evaluate the floating-point performance of hardware platforms prior to porting larger application codes. The LCALS suite is geared toward assessing C++ compiler optimizations and platform performance related to SIMD vectorization, OpenMP threading, and advanced C++ language features. LCALS contains 20 of the 24 loop kernels from the older Livermore Loops suites, plus various others representative of loops found in current production application codes at LLNL. The latter loops emphasize more diverse loop constructs and data access patterns than the others, such as multi-dimensional difference stencils. The loops are included in a configurable framework that allows control of compilation, loop sampling for execution timing, which loops are run, and their lengths. It generates timing statistics for analyzing and comparing variants of individual loops. It is also easy to add loops to the suite as desired.
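LCALS itself is a C++ framework. As a language-neutral illustration of the kind of comparison it automates, the Python sketch below times two variants of a single loop kernel and keeps the best of several repetitions. The kernel is the classic Livermore Loop 1 "hydro fragment". The function names and the best-of-N timing policy are hypothetical and do not reflect the LCALS API or the statistics it reports.

```python
import time
from typing import Callable, Dict, List


def hydro_fragment(x: List[float], y: List[float], z: List[float],
                   q: float, r: float, t: float) -> None:
    """Livermore Loop 1 (hydro fragment), explicit index loop:
    x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])."""
    for k in range(len(x)):
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])


def hydro_fragment_comprehension(x: List[float], y: List[float], z: List[float],
                                 q: float, r: float, t: float) -> None:
    """Same kernel written as a list comprehension (a second 'variant')."""
    x[:] = [q + y[k] * (r * z[k + 10] + t * z[k + 11]) for k in range(len(x))]


def time_variants(variants: Dict[str, Callable[[], None]], repeats: int = 5) -> Dict[str, float]:
    """Best-of-N wall-clock time in seconds for each named loop variant."""
    results = {}
    for name, run in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results
```

Checking that every variant produces the same output before comparing timings mirrors the suite's purpose of comparing variants of individual loops. Note that `z` must hold at least `len(x) + 11` elements.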
Parallelization of elliptic solver for solving 1D Boussinesq model
NASA Astrophysics Data System (ADS)
Tarwidi, D.; Adytia, D.
2018-03-01
In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, the propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against analytical solutions for the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both executed using 8 threads. Moreover, the best parallel efficiency is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
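Cyclic reduction solves a tridiagonal system by eliminating every other unknown. This leaves a tridiagonal system about half the size, which is reduced recursively, and the eliminated unknowns are then recovered by back substitution. The serial Python sketch below shows the algorithm (0-based indexing, even-indexed unknowns eliminated first). It assumes nonzero pivots, for example a diagonally dominant matrix as arises from many elliptic discretizations. It is not the authors' code. In a C or Fortran version, the two loops marked as independent are the natural targets for `#pragma omp parallel for` or `!$omp parallel do`.

```python
from typing import List, Sequence


def cyclic_reduction(a: Sequence[float], b: Sequence[float],
                     c: Sequence[float], d: Sequence[float]) -> List[float]:
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = c[-1] = 0.

    Assumes nonzero pivots (e.g. a diagonally dominant matrix).
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]
    ra, rb, rc, rd = [], [], [], []
    # Reduction: each odd row absorbs its even neighbours. The iterations are
    # independent of one another, so this is the loop a threaded version splits.
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        has_right = i + 1 < n
        gamma = -c[i] / b[i + 1] if has_right else 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + (gamma * a[i + 1] if has_right else 0.0))
        rc.append(gamma * c[i + 1] if has_right else 0.0)
        rd.append(d[i] + alpha * d[i - 1] + (gamma * d[i + 1] if has_right else 0.0))
    x_odd = cyclic_reduction(ra, rb, rc, rd)  # half-size tridiagonal system
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = x_odd[k]
    # Back substitution: each even unknown needs only its odd neighbours,
    # so these iterations are independent as well.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each level halves the system, so there are about log2(n) levels. This is why grid sizes that are powers of two, as in the experiments above, are convenient for the method, although the sketch also handles other sizes.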
Parallelization strategies for continuum-generalized method of moments on the multi-thread systems
NASA Astrophysics Data System (ADS)
Bustamam, A.; Handhika, T.; Ernastuti; Kerami, D.
2017-07-01
Continuum-Generalized Method of Moments (C-GMM) addresses a shortcoming of the Generalized Method of Moments (GMM), namely that GMM is not as efficient as the Maximum Likelihood estimator, by using a continuum of moment conditions in a GMM framework. However, this computation takes a very long time because the regularization parameter must be optimized. Unfortunately, these calculations are processed sequentially, whereas all modern computers now support hierarchical memory systems and hyperthreading technology, which allow parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. The parallel algorithm is then implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse- and fine-grain parallelism, the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single-system-image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding the processing rates possible on dedicated 16-CPU Cray C90 systems. This high level of performance is achieved via shared-memory-based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine- and coarse-grained levels, with communication latencies approximately 50-100 times lower than those of typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general and applicable to a large class of NASA CFD codes.
NASA Astrophysics Data System (ADS)
Hou, Zhenlong; Huang, Danian
2017-09-01
In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth weighting matrix, and other methods. To address the problems brought by big data in exploration, we present a parallel algorithm, together with its performance analysis, combining the Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) for Graphics Processing Unit (GPU) acceleration. Tests on a synthetic model and on real data from Vinton Dome yield improved results, showing that the improved inversion algorithm is effective and feasible. The parallel algorithm we designed performs better than the other CUDA-based versions, with a maximum speedup of more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are used to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is shown to be able to process larger-scale data, and the new analysis method is practical.
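The abstract does not define its multi-GPU speedup and efficiency, so the sketch below assumes the conventional definitions relative to a single-device baseline: S_p = T_1 / T_p and E_p = S_p / p. The function name and the example timings are hypothetical and are not results from the paper.

```python
from typing import Dict, Tuple


def speedup_and_efficiency(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map device count p -> (S_p, E_p), with S_p = T_1 / T_p and E_p = S_p / p.

    timings must include the single-device baseline under key 1.
    """
    if 1 not in timings:
        raise ValueError("baseline timing for one device (key 1) is required")
    t1 = timings[1]
    return {p: (t1 / tp, t1 / tp / p) for p, tp in sorted(timings.items())}
```

For hypothetical run times of 80 s, 40 s, and 25 s on 1, 2, and 4 GPUs, this gives speedups of 1.0, 2.0, and 3.2 and efficiencies of 1.0, 1.0, and 0.8. An efficiency that falls as p grows is the usual sign of communication or load-balance overhead limiting scalability.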
NASA Astrophysics Data System (ADS)
Hao, Qichen; Shao, Jingli; Cui, Yali; Zhang, Qiulan; Huang, Linxian
2018-05-01
An optimization approach is used for the operation of groundwater artificial recharge systems in an alluvial fan in Beijing, China. The optimization model incorporates a transient groundwater flow model, which allows for simulation of the groundwater response to artificial recharge. The facilities' operation with regard to recharge rates is formulated as a nonlinear programming problem to maximize the volume of surface water recharged into the aquifers under specific constraints. This optimization problem is solved by the parallel genetic algorithm (PGA) based on OpenMP, which could substantially reduce the computation time. To solve the PGA with constraints, the multiplicative penalty method is applied. In addition, the facilities' locations are implicitly determined on the basis of the results of the recharge-rate optimizations. Two scenarios are optimized and the optimal results indicate that the amount of water recharged into the aquifers will increase without exceeding the upper limits of the groundwater levels. Optimal operation of this artificial recharge system can also contribute to the more effective recovery of the groundwater storage capacity.
A hybrid parallel framework for the cellular Potts model simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximate and cannot be used for large-scale, complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving, and SMP systems are increasingly common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel CPM, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for large-scale simulations (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).
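The Monte Carlo lattice update is the step this framework parallelizes with OpenMP. A minimal serial sketch of a standard 2D CPM Metropolis sweep is shown below. It assumes a simple Hamiltonian (a uniform adhesion energy J between unlike neighbors plus a quadratic volume constraint) and periodic boundaries. The paper's exact Hamiltonian and its OpenMP partitioning are not given here, and all names are illustrative.

```python
import math
import random


def cpm_energy(lattice, J, lam, target_volume):
    """Adhesion J per unlike neighbor pair plus lam*(V - V_target)^2 per cell (ID 0 = medium)."""
    n = len(lattice)
    adhesion, volumes = 0.0, {}
    for i in range(n):
        for j in range(n):
            s = lattice[i][j]
            volumes[s] = volumes.get(s, 0) + 1
            for di, dj in ((1, 0), (0, 1)):  # right/down neighbor: each pair counted once, periodic
                if lattice[(i + di) % n][(j + dj) % n] != s:
                    adhesion += J
    volume_term = sum(lam * (v - target_volume) ** 2 for s, v in volumes.items() if s != 0)
    return adhesion + volume_term


def metropolis_sweep(lattice, J, lam, target_volume, temperature, rng):
    """One Monte Carlo sweep of n*n copy attempts; returns the energy after the sweep."""
    n = len(lattice)
    energy = cpm_energy(lattice, J, lam, target_volume)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        di, dj = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        old, new = lattice[i][j], lattice[(i + di) % n][(j + dj) % n]
        if old == new:
            continue
        lattice[i][j] = new  # propose copying the neighbor's cell ID into site (i, j)
        trial = cpm_energy(lattice, J, lam, target_volume)  # full recompute: simple but O(n^2)
        delta = trial - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = trial  # accept
        else:
            lattice[i][j] = old  # reject and restore
    return energy


rng = random.Random(0)
grid = [[0] * 8 for _ in range(8)]
for i in range(2, 5):
    for j in range(2, 5):
        grid[i][j] = 1  # cell 1: 3x3 block
    for j in range(5, 8):
        grid[i][j] = 2  # cell 2: adjacent 3x3 block
final_energy = metropolis_sweep(grid, J=2.0, lam=1.0, target_volume=9, temperature=4.0, rng=rng)
```

A production code would compute only the local energy change per copy attempt. Shared-memory versions typically split the lattice so that concurrent updates do not touch neighboring sites.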
Global magnetohydrodynamic simulations on multiple GPUs
NASA Astrophysics Data System (ADS)
Wong, Un-Hong; Wong, Hon-Cheng; Ma, Yonghui
2014-01-01
Global magnetohydrodynamic (MHD) models play a major role in investigating the solar wind-magnetosphere interaction. However, the huge computational requirement of global MHD simulations is the main problem that needs to be solved. With the recent development of modern graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), it is possible to perform global MHD simulations more efficiently. In this paper, we present a global MHD simulator on multiple GPUs using CUDA 4.0 with GPUDirect 2.0. Our implementation is based on the modified leapfrog scheme, which combines the leapfrog scheme with the two-step Lax-Wendroff scheme. GPUDirect 2.0 is used in our implementation to drive multiple GPUs. All data transfers and kernel processing are managed with the CUDA 4.0 API instead of MPI or OpenMP. Performance measurements were made on a multi-GPU system with eight NVIDIA Tesla M2050 (Fermi architecture) graphics cards. These measurements show that our multi-GPU implementation achieves a peak performance of 97.36 GFLOPS in double precision.
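To show how a leapfrog scheme can be combined with the two-step Lax-Wendroff scheme, here is a minimal sketch for 1D linear advection u_t + a u_x = 0 on a periodic grid. This scalar equation stands in for the MHD system. The restart interval `lw_every` is an assumption, and nothing here reflects the authors' CUDA kernels.

```python
import numpy as np


def lax_wendroff_step(u, c):
    """Two-step (Richtmyer) Lax-Wendroff step; c = a*dt/dx, periodic boundaries."""
    u_right = np.roll(u, -1)
    half = 0.5 * (u + u_right) - 0.5 * c * (u_right - u)  # u at (i+1/2, n+1/2)
    return u - c * (half - np.roll(half, 1))


def modified_leapfrog(u0, c, n_steps, lw_every=8):
    """Leapfrog stepping, with a Lax-Wendroff step every `lw_every` steps to damp the leapfrog computational mode."""
    u_prev = u0.copy()
    u_curr = lax_wendroff_step(u_prev, c)  # leapfrog needs two time levels to start
    for step in range(1, n_steps):
        if step % lw_every == 0:
            u_next = lax_wendroff_step(u_curr, c)
        else:
            # u_i^{n+1} = u_i^{n-1} - c (u_{i+1}^n - u_{i-1}^n)
            u_next = u_prev - c * (np.roll(u_curr, -1) - np.roll(u_curr, 1))
        u_prev, u_curr = u_curr, u_next
    return u_curr


x = np.linspace(0.0, 1.0, 200, endpoint=False)
u0 = np.exp(-((x - 0.5) / 0.05) ** 2)
u = modified_leapfrog(u0, c=0.5, n_steps=400)
print(u.sum(), u0.sum())  # both schemes conserve the total on a periodic grid
```

Leapfrog alone is non-dissipative and supports a spurious oscillating mode. The occasional Lax-Wendroff step adds a little dissipation to suppress it.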
Procacci, Piero
2016-06-27
We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with hybrid OpenMP/MPI (Open Multi-Processing/Message Passing Interface) multilevel parallelism, tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology aimed at evaluating the binding free energy in drug-receptor systems on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach on the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm make the code a potentially effective tool for second-generation high throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac .
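FS-DAM turns many fast nonequilibrium trajectories into a free energy estimate through their work values. As a generic illustration, not ORAC's actual estimator (which may rely on the Crooks theorem or other corrections), the sketch below shows two standard estimators. One is the Jarzynski exponential average, evaluated with a log-sum-exp for stability. The other is the second-order cumulant (Gaussian-work) approximation. The work values are hypothetical, in kcal/mol, with kT ≈ 0.596 kcal/mol at 300 K.

```python
import math
from statistics import fmean, pvariance


def jarzynski_free_energy(works, kT):
    """Delta G = -kT ln < exp(-W/kT) >, using a log-sum-exp to avoid overflow."""
    x = [-w / kT for w in works]
    m = max(x)
    log_mean = m + math.log(sum(math.exp(xi - m) for xi in x) / len(x))
    return -kT * log_mean


def gaussian_free_energy(works, kT):
    """Cumulant estimate <W> - var(W) / (2 kT), exact if the work distribution is Gaussian."""
    return fmean(works) - pvariance(works) / (2.0 * kT)


kT = 0.596  # kcal/mol at about 300 K
works = [10.2, 11.0, 9.7, 12.4, 10.9, 11.5]  # hypothetical work values from independent trajectories
print(jarzynski_free_energy(works, kT), gaussian_free_energy(works, kT))
```

Because every trajectory contributes one independent work value, the trajectories can be farmed out to MPI ranks with almost no communication. This matches the weak-scaling layer described above.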
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; an Mey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies, users face the challenge of choosing the programming paradigm best suited to the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is best depends on the nature of the given problem, the hardware architecture, and the available software. In this study we compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. Specifically, we compare the timings of different implementations of the same CFD benchmark application, all employing the same numerical algorithm, on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
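In a hybrid code, the work is split at two levels: first across MPI ranks, then across the OpenMP threads inside each rank. A minimal sketch of that two-level mapping for a 1D block decomposition is shown below. It is a deliberate simplification; the CFD benchmark's actual decomposition is not described here, and the function names are hypothetical.

```python
def block_range(n, parts, index):
    """Half-open [start, stop) of block `index` when n items are split into `parts` near-equal blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def hybrid_layout(n_points, n_ranks, threads_per_rank):
    """Map each (MPI rank, OpenMP thread) pair to the grid points it would own."""
    layout = {}
    for rank in range(n_ranks):
        r0, r1 = block_range(n_points, n_ranks, rank)  # outer level: distributed memory
        for tid in range(threads_per_rank):
            t0, t1 = block_range(r1 - r0, threads_per_rank, tid)  # inner level: shared memory
            layout[(rank, tid)] = (r0 + t0, r0 + t1)
    return layout


print(hybrid_layout(10, 2, 2))  # {(0, 0): (0, 3), (0, 1): (3, 5), (1, 0): (5, 8), (1, 1): (8, 10)}
```

On an SMP node with, say, 8 cores, pure MPI corresponds to 8 ranks with 1 thread each, while a hybrid run might use 2 ranks with 4 threads each. Only the outer-level boundaries require message passing.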
Evaluating the ASTER sensor for mapping and characterizing forest fire fuels in northern Idaho
Michael J. Falkowski; Paul Gessler; Penelope Morgan; Alistair M. S. Smith; Andrew T. Hudak
2004-01-01
Land managers need cost-effective methods for mapping and characterizing fire fuels quickly and accurately. The advent of sensors with increased spatial resolution may improve the accuracy and reduce the cost of fuels mapping. The objective of this research is to evaluate the accuracy and utility of imagery from the Advanced Spaceborne Thermal Emission and Reflection...
Chromatic-aberration diagnostic based on a spectrally resolved lateral-shearing interferometer
Bahk, Seung-Whan; Dorrer, Christopher; Roides, Rick G.; ...
2016-03-18
Here, a simple diagnostic characterizing one-dimensional chromatic aberrations in a broadband beam is introduced. A Ronchi grating placed in front of a spectrometer entrance slit provides spectrally coupled spatial phase information. The radial-group delay of a refractive system and the pulse-front delay of a wedged glass plate have been characterized accurately in a demonstration experiment.
Characterizing and mapping forest fire fuels using ASTER imagery and gradient modeling
Michael J. Falkowski; Paul E. Gessler; Penelope Morgan; Andrew T. Hudak; Alistair M. S. Smith
2005-01-01
Land managers need cost-effective methods for mapping and characterizing forest fuels quickly and accurately. The launch of satellite sensors with increased spatial resolution may improve the accuracy and reduce the cost of fuels mapping. The objective of this research is to evaluate the accuracy and utility of imagery from the advanced spaceborne thermal emission and...
A focal-spot diagnostic for on-shot characterization of high-energy petawatt lasers.
Bromage, J; Bahk, S-W; Irwin, D; Kwiatkowski, J; Pruyne, A; Millecchia, M; Moore, M; Zuegel, J D
2008-10-13
An on-shot focal-spot diagnostic for characterizing high-energy, petawatt-class laser systems is presented. Accurate measurements at full energy are demonstrated using high-resolution wavefront sensing in combination with techniques to calibrate on-shot measurements with low-power sample beams. Results are shown for full-energy activation shots of the OMEGA EP Laser System.
High strain-rate soft material characterization via inertial cavitation
NASA Astrophysics Data System (ADS)
Estrada, Jonathan B.; Barajas, Carlos; Henann, David L.; Johnsen, Eric; Franck, Christian
2018-03-01
Mechanical characterization of soft materials at high strain-rates is challenging due to their high compliance, slow wave speeds, and non-linear viscoelasticity. Yet, knowledge of their material behavior is paramount across a spectrum of biological and engineering applications from minimizing tissue damage in ultrasound and laser surgeries to diagnosing and mitigating impact injuries. To address this significant experimental hurdle and the need to accurately measure the viscoelastic properties of soft materials at high strain-rates (10^3 to 10^8 s^-1), we present a minimally invasive, local 3D microrheology technique based on inertial microcavitation. By combining high-speed time-lapse imaging with an appropriate theoretical cavitation framework, we demonstrate that this technique has the capability to accurately determine the general viscoelastic material properties of soft matter as compliant as a few kilopascals. Similar to commercial characterization algorithms, we provide the user with significant flexibility in evaluating several constitutive laws to determine the most appropriate physical model for the material under investigation. Given its straightforward implementation into most current microscopy setups, we anticipate that this technique can be easily adopted by anyone interested in characterizing soft material properties at high loading rates including hydrogels, tissues and various polymeric specimens.
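Cavitation-based rheometry fits a bubble-dynamics model to the measured radius history. The sketch below uses the simplest such model, the Rayleigh-Plesset equation with a Newtonian viscous liquid and a polytropic gas, integrated with fixed-step RK4. The paper's framework and its constitutive laws (for example, viscoelastic solids) are more elaborate. The parameter values are hypothetical and roughly water-like.

```python
def rp_rhs(r, v, rho, mu, sigma, p_inf, p_g0, r0, kappa):
    """Return (dR/dt, d2R/dt2) for rho*(R R'' + 1.5 R'^2) = p_wall - p_inf."""
    p_gas = p_g0 * (r0 / r) ** (3.0 * kappa)            # polytropic bubble contents
    p_wall = p_gas - 2.0 * sigma / r - 4.0 * mu * v / r  # liquid pressure at the bubble wall
    return v, ((p_wall - p_inf) / rho - 1.5 * v * v) / r


def simulate(r_start, t_end, dt, **params):
    """Fixed-step RK4 integration starting from rest; returns the radius history."""
    r, v = r_start, 0.0
    radii = [r]
    for _ in range(int(round(t_end / dt))):
        k1 = rp_rhs(r, v, **params)
        k2 = rp_rhs(r + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], **params)
        k3 = rp_rhs(r + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], **params)
        k4 = rp_rhs(r + dt * k3[0], v + dt * k3[1], **params)
        r += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        radii.append(r)
    return radii


# Hypothetical SI parameters; the gas pressure balances p_inf plus surface tension at R = r0
r0 = 10e-6
params = dict(rho=1000.0, mu=0.01, sigma=0.056, p_inf=101325.0,
              p_g0=101325.0 + 2 * 0.056 / r0, r0=r0, kappa=1.4)
radii = simulate(1.2 * r0, t_end=20e-6, dt=1e-9, **params)
print(min(radii) / r0, max(radii) / r0)
```

Fitting such a model to imaged R(t) data, for example by least squares over mu, is what links the observed oscillation damping to material properties.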
Characterization of structural connections using free and forced response test data
NASA Technical Reports Server (NTRS)
Lawrence, Charles; Huckelbridge, Arthur A.
1989-01-01
The accurate prediction of system dynamic response has often been limited by deficiencies in existing capabilities to characterize connections adequately. Connections between structural components are often mechanically complex and difficult to model accurately in analytical form. Improved analytical models for connections are needed to improve system dynamic predictions. A procedure for identifying physical connection properties from free and forced response test data is developed, then verified using a system having both a linear and a nonlinear connection. Connection properties are computed in terms of physical parameters so that the physical characteristics of the connections can be better understood, in addition to providing improved input for the system model. The identification procedure is applicable to multi-degree-of-freedom systems and does not require that the test data be measured directly at the connection locations.
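The core idea of expressing a connection through physical parameters can be shown on the simplest case. This is a single spring-damper connection to a known mass, identified from a forced-response receptance H(ω) = 1 / (k - mω² + icω). The NASA procedure handles multi-degree-of-freedom and nonlinear connections and does not need measurements at the connection, so this is only an illustrative sketch. The "test data" below are synthetic.

```python
def identify_connection(omegas, receptance, mass):
    """Least-squares k and c for H(w) = 1 / (k - m*w**2 + 1j*c*w), with the mass m assumed known."""
    k_sum, cw_sum, w2_sum = 0.0, 0.0, 0.0
    for w, h in zip(omegas, receptance):
        z = 1.0 / h                     # dynamic stiffness: k - m w^2 + i c w
        k_sum += z.real + mass * w * w  # each sample gives one estimate of k
        cw_sum += z.imag * w            # least squares for Im(z) = c * w
        w2_sum += w * w
    return k_sum / len(omegas), cw_sum / w2_sum


# Synthetic data from a hypothetical connection: k = 5000 N/m, c = 3 N s/m, m = 2 kg
m_true, k_true, c_true = 2.0, 5000.0, 3.0
ws = [float(w) for w in range(1, 101)]
H = [1.0 / (k_true - m_true * w * w + 1j * c_true * w) for w in ws]
print(identify_connection(ws, H, m_true))
```

Working with the inverse of the receptance makes both parameters appear linearly, so ordinary least squares suffices. With noisy measurements, the frequencies near resonance carry most of the information about c.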
Optimization of the incident wavelength in Mueller matrix imaging of cervical collagen
NASA Astrophysics Data System (ADS)
Chue-Sang, Joseph; Ramella-Roman, Jessica C.
2018-03-01
Mueller matrix polarimetry (MMP) can be used to determine optical anisotropy in birefringent materials. Many factors must be optimized to improve the quality of information collected from MMP of biological samples. As part of a study of pre-term birth (PTB) that relied on measuring the orientation and distribution of collagen in the cervix, we sought an optimal MMP wavelength for more accurate characterization of collagen in cervical tissue. To this end, we developed a multispectral Mueller matrix polarimeter and conducted experiments on ex-vivo porcine cervix samples preserved in paraffin. The Mueller matrices obtained with this system were decomposed to generate orientation and retardation images. Initial findings indicate that wavelengths below 560 nm offer a more accurate characterization of collagen anisotropy in the porcine cervix.
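Retardation and orientation images come from decomposing each pixel's Mueller matrix. The sketch below covers only the final step. It assumes the matrix has already been reduced to an ideal linear retarder; in practice a polar (Lu-Chipman) decomposition first removes depolarization and diattenuation. Retarder sign conventions differ between texts, and this sketch is only self-consistent with the matrix it builds.

```python
import math


def linear_retarder(delta, theta):
    """Mueller matrix of an ideal linear retarder (retardance delta, fast axis theta), one sign convention."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    cd, sd = math.cos(delta), math.sin(delta)
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, c * c + s * s * cd, c * s * (1 - cd), -s * sd],
        [0.0, c * s * (1 - cd), s * s + c * c * cd, c * sd],
        [0.0, s * sd, -c * sd, cd],
    ]


def retardance_and_orientation(m):
    """Recover (delta, theta) from the last row: m33 = cos(delta), m31 = sin(2t) sin(delta), m32 = -cos(2t) sin(delta)."""
    delta = math.acos(max(-1.0, min(1.0, m[3][3])))
    theta = 0.5 * math.atan2(m[3][1], -m[3][2])  # valid for 0 < delta < pi
    return delta, theta % math.pi


m = linear_retarder(1.0, math.radians(30.0))
print(retardance_and_orientation(m))
```

Applied pixel by pixel, delta gives the retardation image and theta gives the collagen fiber orientation image.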
Earthquake Ground Motion Selection
DOT National Transportation Integrated Search
2012-05-01
Nonlinear analyses of soils, structures, and soil-structure systems offer the potential for more accurate characterization of geotechnical and structural response under strong earthquake shaking. The increasing use of advanced performance-based desig...
NASA Astrophysics Data System (ADS)
Madankan, Reza
Worldwide, toxic material clouds emitted from sources such as industrial plants, vehicular traffic, and volcanic eruptions can contain chemical, biological, or radiological material. With the growing fear of natural, accidental, or deliberate release of toxic agents, there is tremendous interest in precise source characterization and in generating accurate hazard maps of toxic material dispersion for appropriate disaster management. In this dissertation, an end-to-end framework has been developed for probabilistic source characterization and forecasting of atmospheric release incidents. The proposed methodology consists of three major components, which are combined to perform source characterization and forecasting: Uncertainty Quantification, Optimal Information Collection, and Data Assimilation. Precise approximation of prior statistics is crucial to ensure the performance of the source characterization process. In this work, an efficient quadrature-based method has been used to quantify uncertainty in plume dispersion models that are subject to uncertain source parameters. In addition, a fast and accurate approach based on a combination of polynomial chaos theory and the method of quadrature points is used to approximate probabilistic hazard maps. Besides precise quantification of uncertainty, useful measurement data are also essential to ensure accurate source parameter estimation. The performance of source characterization is strongly affected by the orientation of the sensors used for data observation. Hence, a general framework has been developed for the optimal allocation of observation sensors to improve the performance of the source characterization process. The key goal of this framework is to optimally locate a set of mobile sensors such that better data are guaranteed to be measured. 
This is achieved by maximizing the mutual information between model predictions and observed data, given a set of kinetic constraints on the mobile sensors. Dynamic Programming has been used to solve the resulting optimal control problem. To complete the loop of the source characterization process, two estimation techniques, a minimum variance estimation framework and a Bayesian inference method, have been developed to fuse the model forecast with measurement data. Incomplete information about the distribution of the noise in measurement data is another major challenge in the source characterization of plume dispersion incidents. This frequently happens in data assimilation of atmospheric data using satellite imagery, because satellite imagery can be polluted with noise depending on weather conditions, clouds, humidity, etc. Unfortunately, there is no accurate procedure to quantify the error in recorded satellite data. Hence, using classical data assimilation methods in this situation is not straightforward. In this dissertation, the basic idea of a novel approach is proposed to tackle these types of real-world problems with more accuracy and robustness. A simple example demonstrating the real-world scenario is presented to validate the developed methodology.
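Quadrature-based uncertainty quantification replaces random sampling of an uncertain input with a small set of weighted nodes. A minimal one-parameter sketch using probabilists' Gauss-Hermite quadrature from NumPy is shown below. It assumes a Gaussian source parameter and a generic `model` callable. The dissertation's multidimensional quadrature and polynomial chaos machinery are not reproduced.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_hermite_moments(model, mean, std, n_points=10):
    """Mean and variance of model(X) for X ~ N(mean, std^2) via probabilists' Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_points)         # weight function exp(-x^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)  # normalize to the standard normal density
    values = np.array([model(mean + std * x) for x in nodes])
    m1 = float(np.sum(weights * values))
    m2 = float(np.sum(weights * values ** 2))
    return m1, m2 - m1 ** 2


# Toy check: for X ~ N(1, 2^2), E[X^2] = 5 and Var[X^2] = 48
print(gauss_hermite_moments(lambda x: x ** 2, 1.0, 2.0))
```

An n-point rule integrates polynomials up to degree 2n - 1 exactly, which is why a handful of model runs (here, plume simulations) can replace thousands of Monte Carlo samples when the response is smooth.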
Budischak, Sarah A; Hoberg, Eric P; Abrams, Art; Jolles, Anna E; Ezenwa, Vanessa O
2015-09-01
Most hosts are concurrently or sequentially infected with multiple parasites; thus, fully understanding interactions between individual parasite species and their hosts depends on accurate characterization of the parasite community. For parasitic nematodes, noninvasive methods for obtaining quantitative, species-specific infection data in wildlife are often unreliable. Consequently, characterization of gastrointestinal nematode communities of wild hosts has largely relied on lethal sampling to isolate and enumerate adult worms directly from the tissues of dead hosts. The necessity of lethal sampling severely restricts the host species that can be studied, the adequacy of sample sizes to assess diversity, the geographic scope of collections and the research questions that can be addressed. Focusing on gastrointestinal nematodes of wild African buffalo, we evaluated whether accurate characterization of nematode communities could be made using a noninvasive technique that combined conventional parasitological approaches with molecular barcoding. To establish the reliability of this new method, we compared estimates of gastrointestinal nematode abundance, prevalence, richness and community composition derived from lethal sampling with estimates derived from our noninvasive approach. Our noninvasive technique accurately estimated total and species-specific worm abundances, as well as worm prevalence and community composition when compared to the lethal sampling method. Importantly, the rate of parasite species discovery was similar for both methods, and only a modest number of barcoded larvae (n = 10) were needed to capture key aspects of parasite community composition. Overall, this new noninvasive strategy offers numerous advantages over lethal sampling methods for studying nematode-host interactions in wildlife and can readily be applied to a range of study systems. © 2015 John Wiley & Sons Ltd.
Determination of accurate vertical atmospheric profiles of extinction and turbulence
NASA Astrophysics Data System (ADS)
Hammel, Steve; Campbell, James; Hallenborg, Eric
2017-09-01
Our ability to generate an accurate vertical profile characterizing the atmosphere from the surface to a point above the boundary layer top is quite rudimentary. The region from a land or sea surface to an altitude of 3000 meters is dynamic and particularly important to the performance of many active optical systems. Accurate and agile instruments are necessary to provide measurements in various conditions, and models are needed to provide the framework and predictive capability necessary for system design and optimization. We introduce some of the path characterization instruments and describe the first work to calibrate and validate them. Along with verifying measurement accuracy, the tests must also establish each instrument's performance envelope. Measuring these profiles in the field is a problem, and we present a discussion of recent field test activity to address this issue. The Comprehensive Atmospheric Boundary Layer Extinction/Turbulence Resolution Analysis eXperiment (CABLE/TRAX) was conducted in late June 2017. The experiment had two distinct objectives: 1) a comparison test of various scintillometers and transmissometers on a homogeneous horizontal path; 2) a vertical profile experiment. In this paper we discuss only the vertical profiling effort, and we describe the instruments that generated data for vertical profiles of absorption, scattering, and turbulence. These three profiles are the core requirements for an accurate assessment of laser beam propagation.
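Transmissometers and scintillometers turn raw signals into extinction and turbulence strength through standard relations. A minimal sketch follows. It uses the Beer-Lambert law for extinction and the widely used large-aperture scintillometer relation Cn² = 1.12 σ²_lnI D^(7/3) L^(-3), which assumes weak scintillation, equal transmitter and receiver apertures, and a homogeneous path. The abstract does not specify the processing used for the CABLE/TRAX instruments, so treat these as illustrative.

```python
import math


def extinction_coefficient(transmittance, path_length_m):
    """Beer-Lambert: T = exp(-sigma * L), so sigma = -ln(T) / L, in 1/m."""
    return -math.log(transmittance) / path_length_m


def cn2_from_scintillation(var_log_intensity, aperture_m, path_length_m):
    """Large-aperture scintillometer relation Cn^2 = 1.12 * var(ln I) * D^(7/3) * L^(-3), in m^(-2/3)."""
    return 1.12 * var_log_intensity * aperture_m ** (7.0 / 3.0) * path_length_m ** -3.0


# Hypothetical horizontal-path readings
print(extinction_coefficient(0.82, 1000.0))
print(cn2_from_scintillation(0.01, 0.15, 1000.0))
```

The strong L^(-3) dependence means that path length errors dominate the Cn² uncertainty, which is one reason calibration on a known horizontal path precedes vertical profiling.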
Fast 2D FWI on a multi and many-cores workstation.
NASA Astrophysics Data System (ADS)
Thierry, Philippe; Donno, Daniela; Noble, Mark
2014-05-01
Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12-core E5-v2 x86-64 CPUs, we present an MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code that runs simultaneously on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is the ability to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils, or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each supporting up to 4 threads, this many-core device can be seen as a shared-memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can host several co-processors, turning the workstation into a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is simply recompiled to obtain a Xeon Phi x86 binary. This multi- and many-core configuration uses standard compilers and the associated MPI and math libraries under Linux; therefore, the cost of code development remains constant while computation time improves. We chose to implement the code in the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e., running only on the co-processor) thanks to the Linux ssh and NFS capabilities. The usual optimization care and SIMD vectorization are applied to ensure optimal performance and to analyze the application's performance and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (L-BFGS) optimization scheme for updating the model parameters. Parallelization is achieved through standard MPI distribution of shot gathers and OpenMP for domain decomposition within the co-processor. 
Taking advantage of the 16 GB of memory available on the co-processor, we are able to keep wavefields in memory and compute the gradient by cross-correlating the forward and back-propagated wavefields, as required by our time-domain FWI scheme, without heavy traffic on the I/O subsystem and PCIe bus. In this presentation we also review some simple methodologies for comparing expected performance with measured performance, in order to estimate the optimization effort before starting any major modification or rewrite of research codes. The key message is the ease of use and development of this hybrid configuration, which aims not at the absolute peak performance but at the optimal performance that ensures the best balance between geophysical and computer developments.
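The in-memory gradient computation described above can be illustrated in one dimension. The sketch below makes several assumptions: a constant-density acoustic wave equation, second-order finite differences, a single shot and receiver, and fixed (reflecting) boundaries. The full forward wavefield is stored, the data residual is back-propagated from the receiver, and the two wavefields are cross-correlated at zero lag. The overall sign and scaling depend on the adjoint convention, and all names and model values are hypothetical. This is not the authors' Fortran code.

```python
import numpy as np


def ricker(nt, dt, f0):
    t = np.arange(nt) * dt - 1.0 / f0
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)


def propagate(v, dx, dt, source, src_idx):
    """2nd-order FD for u_tt = v^2 u_xx with a point source; the whole (nt, nx) wavefield stays in memory."""
    nt, nx = len(source), len(v)
    u = np.zeros((nt, nx))
    c2 = (v * dt / dx) ** 2
    for it in range(1, nt - 1):
        lap = np.zeros(nx)
        lap[1:-1] = u[it, 2:] - 2.0 * u[it, 1:-1] + u[it, :-2]
        u[it + 1] = 2.0 * u[it] - u[it - 1] + c2 * lap
        u[it + 1, src_idx] += dt ** 2 * source[it]
    return u


def fwi_gradient(v_model, observed, dx, dt, source, src_idx, rec_idx):
    """Zero-lag cross-correlation of the forward wavefield's u_tt with the back-propagated residual."""
    u = propagate(v_model, dx, dt, source, src_idx)
    residual = u[:, rec_idx] - observed
    # Back-propagate: inject the time-reversed residual at the receiver, then undo the reversal
    adj = propagate(v_model, dx, dt, residual[::-1], rec_idx)[::-1]
    u_tt = np.zeros_like(u)
    u_tt[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dt ** 2
    # Gradient w.r.t. velocity via m = 1/v^2; sign/scale depend on the adjoint convention
    return -(2.0 / v_model ** 3) * np.sum(u_tt * adj, axis=0) * dt


nx, dx, nt, dt = 201, 10.0, 800, 1e-3
src = ricker(nt, dt, f0=10.0)
v_true = np.full(nx, 2000.0)
v_true[80:] = 2300.0  # hypothetical layered model
v_start = np.full(nx, 2000.0)
d_obs = propagate(v_true, dx, dt, src, src_idx=20)[:, 30]
g = fwi_gradient(v_start, d_obs, dx, dt, src, src_idx=20, rec_idx=30)
```

Storing `u` avoids recomputing or checkpointing the forward field during back-propagation. In 2D that storage is what the co-processor's 16 GB makes affordable. An optimizer such as L-BFGS would then use `g` to update the model.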
Real-Time Characterization of Special Nuclear Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walston, Sean; Candy, Jim; Chambers, Dave
2015-09-04
When confronting an item that may contain nuclear material, it is urgently necessary to determine its characteristics. Our goal is to provide accurate information with high confidence as rapidly as possible.
NASA Technical Reports Server (NTRS)
Saus, Joseph R.; DeLaat, John C.; Chang, Clarence T.; Vrnak, Daniel R.
2012-01-01
At the NASA Glenn Research Center, a characterization rig was designed and constructed to evaluate high-bandwidth liquid fuel modulation devices and determine their suitability for active combustion control research. The rig's design incorporates features that approximate the conditions a candidate device would encounter if it were installed on an actual combustion research rig. The dynamic performance measures obtained through testing in the rig are intended to be accurate indicators of expected performance in an actual combustion testing environment. To evaluate how well the characterization rig predicts fuel modulator dynamic performance, characterization rig data were compared with performance data for a fuel modulator candidate operating during combustion testing. Specifically, the nominal and off-nominal performance data for a magnetostrictive-actuated proportional fuel modulation valve are described. Valve performance data were collected with the characterization rig configured to emulate two different combustion rig fuel feed systems. Fuel mass flows and pressures, fuel feed line lengths, and fuel injector orifice size were approximated in the characterization rig. Valve performance data were also collected with the valve modulating the fuel into the two combustor rigs. Comparison of the predicted and actual valve performance data shows that when the valve is operated near its design condition, the characterization rig can appropriately predict the installed performance of the valve. Improvements to the characterization rig and accompanying modeling activities are underway to predict performance more accurately, especially for the devices under development to modulate fuel into the much smaller fuel injectors anticipated in future lean-burning, low-emissions aircraft engine combustors.
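Dynamic performance of a modulation valve is usually summarized by its gain and phase at each drive frequency. A minimal lock-in style sketch is shown below. It projects the command and the measured response onto sine and cosine at the drive frequency, which is equivalent to a single DFT bin, and assumes an integer number of periods in the record. The signals are synthetic and hypothetical; the abstract does not describe the rig's actual data reduction.

```python
import math


def gain_and_phase(t, command, response, freq_hz):
    """Gain and phase (rad) of `response` relative to `command` at one drive frequency."""
    w = 2.0 * math.pi * freq_hz

    def phasor(signal):
        re = sum(x * math.cos(w * ti) for ti, x in zip(t, signal))
        im = sum(x * math.sin(w * ti) for ti, x in zip(t, signal))
        return complex(re, -im)  # proportional to A*exp(i*phi) for A*cos(w t + phi)

    ratio = phasor(response) / phasor(command)
    return abs(ratio), math.atan2(ratio.imag, ratio.real)


# Hypothetical 100 Hz test: 1 s sampled at 10 kHz; the output lags by 0.3 rad at half amplitude
fs, f = 10_000, 100.0
t = [n / fs for n in range(fs)]
cmd = [math.sin(2 * math.pi * f * ti) for ti in t]
flow = [0.5 * math.sin(2 * math.pi * f * ti - 0.3) for ti in t]
print(gain_and_phase(t, cmd, flow, f))
```

Repeating this at a series of drive frequencies produces the Bode plot that can be compared between the characterization rig and the combustor installation.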
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack
In this session we show, in two case studies, how the roofline feature of Intel Advisor has been used to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for the Intel Knights Landing (KNL) architecture. We present the impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study the performance of large applications. This demonstrates an effective optimization strategy that has enabled these science applications to achieve speedups of up to 4.6 times and to prepare for future exascale architectures.
# Goal/Relevance of Session
The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It represents the performance of an application graphically in terms of operational intensity, i.e. the ratio of flops performed to bytes moved from memory, in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be tedious for the user to perform this analysis at the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task and to base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory-related versus computation-related optimization efforts and to identify performance bottlenecks effectively. A series of typical optimization techniques, namely cache blocking, structure refactoring, data alignment, and vectorization, will be addressed and illustrated by the kernel cases.
# Description of the codes
## XGC1
The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver, which allows it to accurately resolve the edge plasma of a magnetic fusion device. 
After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, partly due to indirect memory addressing and partly due to the low trip counts of the low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase the trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops identified by Advisor. The optimizations have improved the performance of the pushe kernel by 2x on Haswell processors and by 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori phase I Haswell node, and we expect to close this gap by reducing the memory footprint of compute-intensive routines to improve cache reuse.
## PICSAR
PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting MIC architectures, first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a Fortran stand-alone kernel for performance studies and benchmarks. An MPI domain decomposition is used between NUMA domains, and a tile decomposition (cache blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management. The so-called current deposition and field gathering steps of the PIC time loop are major hotspots and have been rewritten to enable more efficient vectorization. Particle communications between tiles and between MPI domains have been merged and parallelized. Altogether, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. 
Performance is similar between a node of Cori phase I and KNL at order 1, and better on KNL by a factor of 1.6 at order 3, for the considered test case (homogeneous thermal plasma).
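The roofline model described in this session reduces to one formula: attainable performance = min(peak compute, arithmetic intensity × memory bandwidth). A minimal sketch is shown below. The machine numbers are hypothetical and are not KNL or Haswell specifications; Intel Advisor measures the flops, bytes, and roofs automatically.

```python
def roofline(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Return (arithmetic intensity in flop/byte, attainable GFLOP/s, limiting roof) for a kernel."""
    intensity = flops / bytes_moved
    memory_roof = intensity * bandwidth_gbs
    attainable = min(peak_gflops, memory_roof)
    bound = "memory" if memory_roof < peak_gflops else "compute"
    return intensity, attainable, bound


# Hypothetical machine: 2000 GFLOP/s peak, 400 GB/s bandwidth (ridge point at 5 flop/byte)
print(roofline(1e9, 1e9, 2000.0, 400.0))   # (1.0, 400.0, 'memory')
print(roofline(1e10, 1e9, 2000.0, 400.0))  # (10.0, 2000.0, 'compute')
```

A memory-bound kernel gains most from cache blocking, structure refactoring, and data alignment, which raise effective intensity or bandwidth. A compute-bound kernel gains most from vectorization.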
Emissions & Measurements - Black Carbon
Emissions and Measurement (EM) research activities performed within the National Risk Management Research Lab (NRMRL) of EPA's Office of Research and Development (ORD) support measurement and laboratory analysis approaches to accurately characterize source emissions, and near sour...
Ab Initio Potential Energy Surfaces and Quantum Dynamics for Polyatomic Bimolecular Reactions.
Fu, Bina; Zhang, Dong H
2018-05-08
There has been great progress in the development of potential energy surfaces (PESs) and quantum dynamics calculations in the gas phase. The establishment of a fitting procedure for highly accurate PESs and new developments in quantum reactive scattering on reliable PESs allow accurate characterization of reaction dynamics beyond triatomic systems. This review presents recent developments in our group in constructing ab initio PESs based on neural networks and in time-dependent wave packet calculations for bimolecular reactions involving more than three atoms. It focuses on bimolecular reactions of current interest to the community, namely OH + H2, H + H2O, OH + CO, H + CH4, and Cl + CH4. Quantum mechanical characterization of these reactions uncovers interesting dynamical phenomena with an unprecedented level of sophistication and has greatly advanced our understanding of polyatomic reaction dynamics.
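Time-dependent wave packet methods propagate ψ(t) under the nuclear Hamiltonian on a grid. A minimal 1D sketch of one widely used propagator, second-order split-operator FFT (Strang splitting), is shown below. The review's multidimensional reactive-scattering codes and neural-network PESs are far more involved, and this is not presented as their implementation. Units with ħ = m = 1 are assumed.

```python
import numpy as np


def split_operator(psi, v, dx, dt, n_steps, mass=1.0, hbar=1.0):
    """Propagate psi under H = p^2/2m + V(x) with Strang splitting and FFTs on a periodic grid."""
    k = 2.0 * np.pi * np.fft.fftfreq(len(psi), d=dx)
    half_potential = np.exp(-0.5j * v * dt / hbar)                # exp(-i V dt / 2 hbar)
    kinetic = np.exp(-1j * hbar * k ** 2 * dt / (2.0 * mass))    # exp(-i T dt / hbar) in k-space
    for _ in range(n_steps):
        psi = half_potential * psi
        psi = np.fft.ifft(kinetic * np.fft.fft(psi))
        psi = half_potential * psi
    return psi


x = np.linspace(-50.0, 50.0, 1024, endpoint=False)
dx = x[1] - x[0]
psi0 = np.exp(-(x + 10.0) ** 2 / 2.0 + 2.0j * x)  # Gaussian packet with mean momentum k0 = 2
psi0 /= np.sqrt(np.sum(np.abs(psi0) ** 2) * dx)
psi = split_operator(psi0, np.zeros_like(x), dx, dt=0.01, n_steps=500)
mean_x = np.sum(x * np.abs(psi) ** 2) * dx  # free particle: center moves by hbar*k0*t/m = 10
```

Each factor is unitary, so the norm is conserved to round-off. For reactive scattering, an absorbing potential and flux analysis would be added to extract reaction probabilities.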
High-accurate optical fiber liquid level sensor
NASA Astrophysics Data System (ADS)
Sun, Dexing; Chen, Shouliu; Pan, Chao; Jin, Henghuan
1991-08-01
A highly accurate optical fiber liquid level sensor is presented. A single-chip microcomputer is used to process the signal and control the sensor. The sensor is intrinsically safe and explosion-proof, so it can be applied in any liquid level detection setting, especially in the oil and chemical industries. The theory and experiments on improving the measurement accuracy are described. Over a measurement range of 10 m, the relative error reaches 0.01%.
in the Saint Petersburg area. We use three random forest models, which differ in their use of past information, to predict a vessel's next port of visit...network where past information is used to more accurately predict the future state. The transitional probabilities change when predictor variables are...added that reach deeper into the past. Our findings suggest that successful prediction of the movement of a vessel depends on having accurate information on its recent history.
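The transitional probabilities mentioned above can be estimated directly from port-call sequences. A minimal first-order (Markov) sketch is shown below. It conditions only on the current port. The study's random forests use richer predictors that reach further into the past, and the port names here are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(voyages):
    """Estimate P(next port | current port) from observed port-call sequences."""
    counts = defaultdict(Counter)
    for ports in voyages:
        for current, nxt in zip(ports, ports[1:]):
            counts[current][nxt] += 1
    return {
        port: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for port, c in counts.items()
    }


voyages = [["Port A", "Port B", "Port C"], ["Port A", "Port B", "Port A"], ["Port B", "Port C"]]
print(transition_probabilities(voyages))
```

Conditioning on the last two or three ports (tuples as keys) is the simplest way to "reach deeper into the past". It sharpens the probabilities but needs more data per state.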
NASA Technical Reports Server (NTRS)
Wu, S. T.
1987-01-01
The goal for the SAMEX magnetograph's optical system is to accurately measure the polarization state of sunlight in a narrow spectral bandwidth over the field of view of an active region to make an accurate determination of the magnetic field in that region. The instrumental polarization is characterized. The optics and coatings were designed to minimize this spurious polarization introduced by foreoptics. The method developed to calculate the instrumental polarization of the SAMEX optics is described.
NASA Technical Reports Server (NTRS)
Perez, Christopher E.; Berg, Melanie D.; Friendlich, Mark R.
2011-01-01
The motivations for this work are to: (1) accurately characterize digital signal processor (DSP) core single-event effect (SEE) behavior; (2) test DSP cores across a large frequency range and under various input conditions; (3) isolate SEE analysis to the DSP cores alone; (4) interpret SEE analysis in terms of single-event upsets (SEUs) and single-event transients (SETs); and (5) provide flight missions with accurate estimates of DSP core error rates and error signatures.
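Error-rate estimates for flight missions start from measured event cross-sections. A minimal sketch of the standard bookkeeping follows: the cross-section is the event count divided by fluence, σ(LET) is fit with the four-parameter Weibull form often used for SEE data, and a constant-flux rate estimate is computed. Real on-orbit predictions integrate σ(LET) over the mission's LET spectrum with dedicated tools, and frequency dependence is not modeled here. All names and numbers are hypothetical.

```python
import math


def event_cross_section(n_events, fluence_cm2):
    """Measured cross-section sigma = N_events / fluence, in cm^2 per device."""
    return n_events / fluence_cm2


def weibull_cross_section(let, sigma_sat, let_threshold, width, shape):
    """Four-parameter Weibull fit of sigma(LET); zero at or below the threshold LET."""
    if let <= let_threshold:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - let_threshold) / width) ** shape)))


def error_rate(sigma_cm2, flux_cm2_s):
    """Expected events per second for a constant particle flux at a single LET (simplified)."""
    return sigma_cm2 * flux_cm2_s


sigma = event_cross_section(25, 1e7)  # hypothetical: 25 upsets at 1e7 particles/cm^2
print(sigma, error_rate(sigma, 1e-3))
```

Separating SEUs from SETs amounts to computing separate cross-sections for each error signature from the same exposure.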
Characterization of lipid films by an angle-interrogation surface plasmon resonance imaging device.
Liu, Linlin; Wang, Qiong; Yang, Zhong; Wang, Wangang; Hu, Ning; Luo, Hongyan; Liao, Yanjian; Zheng, Xiaolin; Yang, Jun
2015-04-01
Surface topographies of lipid films are important in analyzing the preparation of giant unilamellar vesicles (GUVs). To achieve accurate, high-throughput, and rapid analysis of lipid film surface topographies, a homemade SPR imaging device is constructed based on the classical Kretschmann configuration and angle interrogation. A mathematical model is developed to accurately describe the shift of the light path under different conditions and the resulting change of the illumination point on the CCD camera; using this calculation method, an SPR curve can be obtained for each sampling point. The experimental results show that the topographies of lipid films formed under distinct experimental conditions can be accurately characterized, and the resolution of lipid film thickness measurement may reach 0.05 nm. Compared with existing SPRi devices, which detect by monitoring changes in reflected-light intensity, this new SPRi system measures the change of the resonance angle over the entire sensing surface. It therefore offers detection accuracy as high as that of a traditional angle-interrogation SPR sensor, with a much wider detectable refractive-index range. Copyright © 2015 Elsevier B.V. All rights reserved.
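For readers implementing angle-interrogation SPRi, the core per-pixel step is locating the reflectivity minimum of each SPR curve. The sketch below is a generic three-point parabolic refinement, assuming each pixel's reflectivity has already been sampled at uniformly spaced incidence angles. It does not reproduce the authors' model of light-path shift and illumination-point change on the CCD, and the function name is hypothetical.

```python
def resonance_angle(angles, reflectivity):
    """Resonance angle of one pixel: minimum of the sampled SPR curve, refined by a
    parabola through the lowest sample and its two neighbors.
    Assumes uniformly spaced angles."""
    i = min(range(len(reflectivity)), key=reflectivity.__getitem__)
    if i == 0 or i == len(reflectivity) - 1:
        return angles[i]  # minimum on the edge of the scan: no refinement possible
    y0, y1, y2 = reflectivity[i - 1], reflectivity[i], reflectivity[i + 1]
    curvature = y0 - 2.0 * y1 + y2
    if curvature <= 0:
        return angles[i]  # flat neighborhood: keep the sampled minimum
    step = angles[i + 1] - angles[i]
    # Vertex of the parabola through (-1, y0), (0, y1), (1, y2), in units of `step`.
    return angles[i] + 0.5 * (y0 - y2) / curvature * step


# Synthetic dip centred at 70.23 degrees, sampled every 0.1 degrees.
angles = [70.0 + 0.1 * k for k in range(10)]
curve = [(a - 70.23) ** 2 + 0.05 for a in angles]
print(round(resonance_angle(angles, curve), 4))  # 70.23
```

Mapping this function over every pixel would yield a resonance-angle image; the refinement recovers sub-step angles without resampling the scan.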
NASA Astrophysics Data System (ADS)
Yuan, Wu; Kut, Carmen; Liang, Wenxuan; Li, Xingde
2017-03-01
Cancer is known to alter the local optical properties of tissues. OCT-based measurement of optical attenuation provides a quantitative way to differentiate cancer from non-cancer tissues efficiently. In particular, intraoperative quantitative OCT can provide direct, real-time visual guidance for accurately identifying cancer tissues, especially those without obvious structural layers, such as brain cancer. However, current methods are suboptimal for providing high-speed, accurate OCT attenuation mapping for intraoperative brain cancer detection. In this paper, we report a novel frequency-domain (FD) algorithm to enable robust and fast characterization of optical attenuation as derived from OCT intensity images. The performance of this FD algorithm was compared with traditional fitting methods by analyzing datasets containing images from freshly resected human brain cancer and from a silica phantom acquired by a 1310 nm swept-source OCT (SS-OCT) system. With a graphics processing unit (GPU)-based CUDA C/C++ implementation, this new attenuation mapping algorithm can offer robust and accurate quantitative interpretation of OCT images in real time during brain surgery.
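One common form of the "traditional fitting methods" against which the FD algorithm is benchmarked is a single-scattering exponential-decay fit along each A-line. A minimal sketch of that baseline (not the FD algorithm itself) follows, assuming linear-scale, noise-floor-subtracted intensities and ignoring confocal and sensitivity roll-off effects; the function name is hypothetical.

```python
import math


def attenuation_coefficient(depths_mm, intensities):
    """Least-squares fit of ln I(z) = ln A - 2*mu*z for one A-line.
    Returns mu in 1/mm (single-scattering model; factor 2 for the round trip)."""
    n = len(depths_mm)
    logs = [math.log(v) for v in intensities]
    z_mean = sum(depths_mm) / n
    y_mean = sum(logs) / n
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(depths_mm, logs))
    sxx = sum((z - z_mean) ** 2 for z in depths_mm)
    return -0.5 * sxy / sxx  # slope = -2*mu


depths = [0.01 * k for k in range(100)]  # 0 to 0.99 mm
a_line = [5.0 * math.exp(-2 * 1.5 * z) for z in depths]
print(round(attenuation_coefficient(depths, a_line), 6))  # 1.5
```

Repeating such a fit in a sliding window for every A-line is what makes fitting-based attenuation maps slow, which is the bottleneck the abstract's FD approach targets.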
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fernandez-Serra, Maria Victoria
2016-09-12
The research objective of this proposal is the computational modeling of the metal-electrolyte interface purely from first principles. The accurate calculation of the electrostatic potential at electrically biased metal-electrolyte interfaces is a current challenge for periodic “ab-initio” simulations. It is also an essential requisite for predicting the correspondence between the macroscopic voltage and the microscopic interfacial charge distribution in electrochemical fuel cells. This interfacial charge distribution is the result of the chemical bonding between solute and metal atoms, and therefore cannot be accurately calculated with semi-empirical classical force fields. The project aims to study in detail the structure and dynamics of aqueous electrolytes at metallic interfaces, taking into account the effect of the electrode potential. Another aim of the project is to produce an accurate method to simulate the water/metal interface. While both experimental and theoretical surface scientists have made much progress in understanding and characterizing atomistic structures and reactions at the solid/vacuum interface, the theoretical description of electrochemical interfaces still lags behind. One reason is that a complete and accurate first-principles description of both the liquid and the metal interfaces is still computationally too expensive and complex, since their characteristics are governed by the explicit atomic and electronic structure built at the interface in response to environmental conditions. This project will characterize in detail how different theoretical levels of modeling describe the metal/water interface. In particular, the role of van der Waals interactions will be carefully analyzed and prescriptions for performing accurate simulations will be produced.
a Protocol for High-Accuracy Theoretical Thermochemistry
NASA Astrophysics Data System (ADS)
Welch, Bradley; Dawes, Richard
2017-06-01
Theoretical studies of spectroscopy and reaction dynamics, including the necessary development of potential energy surfaces, rely on accurate thermochemical information. The Active Thermochemical Tables (ATcT) approach by Ruscic^{1} incorporates data for a large number of chemical species from a variety of sources (both experimental and theoretical) and derives a self-consistent network capable of making extremely accurate estimates of quantities such as temperature-dependent enthalpies of formation. The network provides rigorous uncertainties, and since the values do not rely on a single measurement or calculation, the provenance of each quantity is also obtained. To expand and improve the network, it is desirable to have a reliable protocol such as the HEAT approach^{2} for calculating accurate theoretical data. Here we present and benchmark an approach based on explicitly-correlated coupled-cluster theory and vibrational perturbation theory (VPT2). Methyldioxy and methyl hydroperoxide are important, well-characterized species in combustion processes and are the smallest members of a family of similar (ethyl-, propyl-based, etc.) compounds, about whose larger members much less is known. Accurate anharmonic frequencies are essential to accurately describe even the 0 K enthalpies of formation, but are especially important for finite temperature studies. Here we benchmark the spectroscopic and thermochemical accuracy of the approach, comparing with available data for the smallest systems, and comment on the outlook for larger systems that are less well-known and characterized. ^{1}B. Ruscic, Active Thermochemical Tables (ATcT) values based on ver. 1.118 of the Thermochemical Network (2015); available at ATcT.anl.gov ^{2}A. Tajti, P. G. Szalay, A. G. Császár, M. Kállay, J. Gauss, E. F. Valeev, B. A. Flowers, J. Vázquez, and J. F. Stanton. JCP 121, (2004): 11599.
Acoustic Characterization of Grass-cover Ground
2014-11-20
for noise and reverberation control. Examples of porous media are cements, ceramics, rocks, building insulation, foams and soil. Characterizing the...To perform the calibration of the tube an absorbing material with known acoustic properties is used. A sample of Melamine foam, 5 cm thick was used...system was calibrated using materials with known acoustic properties in order to confirm accurate measurement of the system. Melamine foam 5 cm (1.97 in
77 FR 52762 - Notice of Lodging of Consent Decree Pursuant to The Clean Water Act
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-30
... required projects; (2) more accurately characterize drainage basin overflows and propose mitigation measures; and (3) incorporates a milestone for completing upgrades to the City's treatment plant. Under the...
Computerized traffic data acquisition system, updated.
DOT National Transportation Integrated Search
1980-01-01
Although the parameters that characterize traffic flow have been established nationally for several years, it is only recently that technology has made accurate measurement of them economically feasible. This report describes a system that provides a...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, S; Hoffman, J; McNitt-Gray, M
Purpose: Iterative reconstruction methods show promise for improving image quality and lowering the dose in helical CT. We aim to develop a novel model-based reconstruction method that offers potential for dose reduction with reasonable computation speed and storage requirements for vendor-independent reconstruction from clinical data on a normal desktop computer. Methods: In 2012, Xu proposed reconstructing on rotating slices to exploit helical symmetry and reduce the storage requirements for the CT system matrix. Inspired by this concept, we have developed a novel reconstruction method incorporating the stored-system-matrix approach together with iterative coordinate-descent (ICD) optimization. A penalized-least-squares objective function with a quadratic penalty term is solved analytically voxel-by-voxel, sequentially iterating along the axial direction first, followed by the transaxial direction. Eight in-plane (transaxial) neighbors are used for the ICD algorithm. The forward problem is modeled via a unique approach that combines the principle of Joseph’s method with trilinear B-spline interpolation to enable accurate reconstruction with low storage requirements. Iterations are accelerated with multi-CPU OpenMP libraries. For preliminary evaluations, we reconstructed (1) a simulated 3D ellipse phantom and (2) an ACR accreditation phantom dataset exported from a clinical scanner (Definition AS, Siemens Healthcare). Image quality was evaluated in the resolution module. Results: Image quality was excellent for the ellipse phantom. For the ACR phantom, image quality was comparable to clinical reconstructions and reconstructions using open-source FreeCT-wFBP software. Also, we did not observe any deleterious impact associated with the utilization of rotating slices. The system matrix storage requirement was only 4.5GB, and reconstruction time was 50 seconds per iteration.
Conclusion: Our reconstruction method shows potential for furthering research in low-dose helical CT, in particular as part of our ongoing development of an acquisition/reconstruction pipeline for generating images under a wide range of conditions. Our algorithm will be made available open-source as “FreeCT-ICD”. NIH U01 CA181156; Disclosures (McNitt-Gray): Institutional research agreement, Siemens Healthcare; Past recipient, research grant support, Siemens Healthcare; Consultant, Toshiba America Medical Systems; Consultant, Samsung Electronics.
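The ICD step described above solves the penalized least-squares objective one voxel at a time in closed form. The sketch below shows that update on a dense toy system, assuming a 1-D neighborhood (two neighbors) instead of the 8 in-plane neighbors, and omitting rotating slices, the Joseph/B-spline forward model, the axial-then-transaxial ordering, and OpenMP. The function name is hypothetical.

```python
def icd_penalized_ls(A, y, beta, n_sweeps=50):
    """Iterative coordinate descent (ICD) for
        min_x ||y - A x||^2 + beta * sum_j (x_j - x_{j+1})^2
    with a 1-D neighborhood. Each voxel update is the exact (analytic) minimizer
    of the objective along that coordinate, holding all other voxels fixed."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(y)  # residual y - A x (x starts at zero)
    col_norm2 = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(n_sweeps):
        for j in range(n):
            nbrs = [k for k in (j - 1, j + 1) if 0 <= k < n]
            a_dot_r = sum(A[i][j] * r[i] for i in range(m))
            num = a_dot_r + col_norm2[j] * x[j] + beta * sum(x[k] for k in nbrs)
            den = col_norm2[j] + beta * len(nbrs)
            delta = num / den - x[j]
            for i in range(m):  # keep the residual consistent with the new x_j
                r[i] -= A[i][j] * delta
            x[j] += delta
    return x


# Identity system matrix with a quadratic smoothness penalty (beta = 1).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print([round(v, 6) for v in icd_penalized_ls(I3, [0.0, 0.0, 3.0], beta=1.0, n_sweeps=200)])
# [0.375, 0.75, 1.875]
```

Because each update is the exact minimizer along one coordinate, the objective never increases from sweep to sweep; with beta = 0 the iteration reduces to Gauss-Seidel on the normal equations.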
A simplified approach to characterizing a kilovoltage source spectrum for accurate dose computation.
Poirier, Yannick; Kouznetsov, Alexei; Tambasco, Mauro
2012-06-01
To investigate and validate the clinical feasibility of using half-value layer (HVL) and peak tube potential (kVp) for characterizing a kilovoltage (kV) source spectrum for the purpose of computing kV x-ray dose accrued from imaging procedures. To use this approach to characterize a Varian® On-Board Imager® (OBI) source and perform experimental validation of a novel in-house hybrid dose computation algorithm for kV x-rays. We characterized the spectrum of an imaging kV x-ray source using the HVL and the kVp as the sole beam quality identifiers using third-party freeware Spektr to generate the spectra. We studied the sensitivity of our dose computation algorithm to uncertainties in the beam's HVL and kVp by systematically varying these spectral parameters. To validate our approach experimentally, we characterized the spectrum of a Varian® OBI system by measuring the HVL using a Farmer-type Capintec ion chamber (0.06 cc) in air and compared dose calculations using our computationally validated in-house kV dose calculation code to measured percent depth-dose and transverse dose profiles for 80, 100, and 125 kVp open beams in a homogeneous phantom and a heterogeneous phantom comprising tissue, lung, and bone equivalent materials. The sensitivity analysis of the beam quality parameters (i.e., HVL, kVp, and field size) on dose computation accuracy shows that typical measurement uncertainties in the HVL and kVp (±0.2 mm Al and ±2 kVp, respectively) source characterization parameters lead to dose computation errors of less than 2%. Furthermore, for an open beam with no added filtration, HVL variations affect dose computation accuracy by less than 1% for a 125 kVp beam when field size is varied from 5 × 5 cm² to 40 × 40 cm². The central axis depth dose calculations and experimental measurements for the 80, 100, and 125 kVp energies agreed within 2% for the homogeneous and heterogeneous block phantoms, and agreement for the transverse dose profiles was within 6%.
The HVL and kVp are sufficient for characterizing a kV x-ray source spectrum for accurate dose computation. As these parameters can be easily and accurately measured, they provide for a clinically feasible approach to characterizing a kV energy spectrum to be used for patient specific x-ray dose computations. Furthermore, these results provide experimental validation of our novel hybrid dose computation algorithm. © 2012 American Association of Physicists in Medicine.
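Since the method treats HVL and kVp as the sole beam-quality identifiers, it can help to see how HVL follows from a given spectrum. The sketch below finds, by bisection, the filter thickness that halves the air kerma of a binned spectrum. It assumes the per-bin weights already include the air-kerma (energy-absorption) weighting and that attenuation coefficients are supplied by the user; it is not Spektr's interface or the authors' dose algorithm, and the function name and the two-bin values in the usage note are hypothetical.

```python
import math


def half_value_layer(kerma_weights, mu_per_mm, tol=1e-10):
    """Filter thickness (mm) that halves the air kerma of a polyenergetic beam.

    kerma_weights[i]: relative air-kerma contribution of energy bin i (no filter).
    mu_per_mm[i]:     linear attenuation coefficient of the filter in bin i (> 0).
    """
    total = sum(kerma_weights)

    def transmitted(t):
        return sum(w * math.exp(-mu * t)
                   for w, mu in zip(kerma_weights, mu_per_mm)) / total

    lo, hi = 0.0, 1.0
    while transmitted(hi) > 0.5:  # grow the bracket until it contains the HVL
        hi *= 2.0
    while hi - lo > tol:          # bisection: transmitted(t) decreases with t
        mid = 0.5 * (lo + hi)
        if transmitted(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Monoenergetic check: HVL = ln 2 / mu.
print(round(half_value_layer([1.0], [0.5]), 6))  # 1.386294
```

For a polyenergetic input such as half_value_layer([0.6, 0.4], [0.9, 0.3]), the result lies between the two single-bin values ln 2 / 0.9 and ln 2 / 0.3 mm. Matching a computed HVL against the measured one is one way a spectrum could be selected for a given kVp.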
Accurate phase measurements for thick spherical objects using optical quadrature microscopy
NASA Astrophysics Data System (ADS)
Warger, William C., II; DiMarzio, Charles A.
2009-02-01
In vitro fertilization (IVF) procedures have resulted in the birth of over three million babies since 1978. Yet the live birth rate in the United States was only 34% in 2005, with 32% of the successful pregnancies resulting in multiple births. These multiple pregnancies were directly attributed to the transfer of multiple embryos to increase the probability that a single, healthy embryo was included. Current viability markers used for IVF, such as the cell number, symmetry, size, and fragmentation, are analyzed qualitatively with differential interference contrast (DIC) microscopy. However, this method is not ideal for quantitative measures beyond the 8-cell stage of development because the cells overlap and obstruct the view within and below the cluster of cells. We have developed the phase-subtraction cell-counting method that uses the combination of DIC and optical quadrature microscopy (OQM) to count the number of cells accurately in live mouse embryos beyond the 8-cell stage. We have also created a preliminary analysis to measure the cell symmetry, size, and fragmentation quantitatively by analyzing the relative dry mass from the OQM image in conjunction with the phase-subtraction count. In this paper, we will discuss the characterization of OQM with respect to measuring the phase accurately for spherical samples that are much larger than the depth of field. Once fully characterized and verified with human embryos, this methodology could provide the means for a more accurate method to score embryo viability.
Characterization of in-flight performance of ion propulsion systems
NASA Astrophysics Data System (ADS)
Sovey, James S.; Rawlin, Vincent K.
1993-06-01
In-flight measurements of ion propulsion performance, ground test calibrations, and diagnostic performance measurements were reviewed. It was found that accelerometers provided the most accurate in-flight thrust measurements compared with four other methods that were surveyed. An experiment has also demonstrated that pre-flight alignment of the thrust vector was sufficiently accurate so that gimbal adjustments and use of attitude control thrusters were not required to counter disturbance torques caused by thrust vector misalignment. The effects of facility background pressure, facility enhanced charge-exchange reactions, and contamination on ground-based performance measurements are also discussed. Vacuum facility pressures for inert-gas ion thruster life tests and flight qualification tests will have to be less than 2 mPa to ensure accurate performance measurements.
Characterization of in-flight performance of ion propulsion systems
NASA Technical Reports Server (NTRS)
Sovey, James S.; Rawlin, Vincent K.
1993-01-01
In-flight measurements of ion propulsion performance, ground test calibrations, and diagnostic performance measurements were reviewed. It was found that accelerometers provided the most accurate in-flight thrust measurements compared with four other methods that were surveyed. An experiment has also demonstrated that pre-flight alignment of the thrust vector was sufficiently accurate so that gimbal adjustments and use of attitude control thrusters were not required to counter disturbance torques caused by thrust vector misalignment. The effects of facility background pressure, facility enhanced charge-exchange reactions, and contamination on ground-based performance measurements are also discussed. Vacuum facility pressures for inert-gas ion thruster life tests and flight qualification tests will have to be less than 2 mPa to ensure accurate performance measurements.
Improved maize reference genome with single-molecule technologies
USDA-ARS?s Scientific Manuscript database
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate elucidation of biological processes and support translation of research findings into improved and sustainable agricultural technolog...
A sun-tracking environmental chamber for the outdoor quantification of CPV modules
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faiman, David; Melnichak, Vladimir; Bokobza, Dov (E-mail: faiman@bgu.ac.il)
2014-09-26
The paper describes a sun-tracking environmental chamber and its associated fast electronics, devised for the accurate outdoor characterization of CPV cells, receivers, mono-modules, and modules. Some typical measurement results are presented.
Laboratory tests for hot-mix asphalt characterization in Virginia.
DOT National Transportation Integrated Search
2005-01-01
This project reviewed existing laboratory methods for accurately describing the constitutive behavior of the mixes used in the Commonwealth of Virginia. Indirect tensile (IDT) strength, resilient modulus, static creep in the IDT and uniaxial modes, f...
The use of LIDAR to characterize aircraft exhaust plumes
DOT National Transportation Integrated Search
2003-06-22
Aircraft emissions are a growing concern for the FAA, airports, and the community. U.S. and international air quality models were previously unable to accurately predict initial plume dispersion and the resulting pollutant concentrations because ...
WILDFIRE EMISSION MODELING: INTEGRATING BLUESKY AND SMOKE
Atmospheric chemical transport models are used to simulate historic meteorological episodes for developing air quality management strategies. Wildland fire emissions need to be characterized accurately to achieve these air quality management goals. The temporal and spatial esti...
NASA Astrophysics Data System (ADS)
Lin, Yuting; Thayer, Dave; Nalcioglu, Orhan; Gulsen, Gultekin
2011-10-01
We present a magnetic resonance (MR)-guided near-infrared dynamic contrast enhanced diffuse optical tomography (DCE-DOT) system for characterization of tumors using an optical contrast agent (ICG) and an MR contrast agent [Gd-diethylenetriaminepentaacetic acid (DTPA)] in a rat model. Both ICG and Gd-DTPA are injected and monitored simultaneously using a combined MRI-DOT system, resulting in accurate co-registration between the two imaging modalities. Fischer rats bearing R3230 breast tumors are imaged using this hybrid system. For the first time, enhancement kinetics of the exogenous contrast ICG is recovered from the DCE-DOT data using MR anatomical a priori information. As tumors grow, they undergo necrosis and the tissue transforms from viable to necrotic. The results show that the physiological changes between viable and necrotic tissue can be differentiated more accurately based on the ICG enhancement kinetics when MR anatomical information is utilized.
Ultrathin conformal devices for precise and continuous thermal characterization of human skin
Webb, R. Chad; Bonifas, Andrew P.; Behnaz, Alex; Zhang, Yihui; Yu, Ki Jun; Cheng, Huanyu; Shi, Mingxing; Bian, Zuguang; Liu, Zhuangjian; Kim, Yun-Soung; Yeo, Woon-Hong; Park, Jae Suk; Song, Jizhou; Li, Yuhang; Huang, Yonggang; Gorbach, Alexander M.; Rogers, John A.
2013-01-01
Precision thermometry of the skin can, together with other measurements, provide clinically relevant information about cardiovascular health, cognitive state, malignancy and many other important aspects of human physiology. Here, we introduce an ultrathin, compliant skin-like sensor/actuator technology that can pliably laminate onto the epidermis to provide continuous, accurate thermal characterizations that are unavailable with other methods. Examples include non-invasive spatial mapping of skin temperature with millikelvin precision, and simultaneous quantitative assessment of tissue thermal conductivity. Such devices can also be implemented in ways that reveal the time-dynamic influence of blood flow and perfusion on these properties. Experimental and theoretical studies establish the underlying principles of operation, and define engineering guidelines for device design. Evaluation of subtle variations in skin temperature associated with mental activity, physical stimulation and vasoconstriction/dilation along with accurate determination of skin hydration through measurements of thermal conductivity represent some important operational examples. PMID:24037122
Ultrathin conformal devices for precise and continuous thermal characterization of human skin
NASA Astrophysics Data System (ADS)
Webb, R. Chad; Bonifas, Andrew P.; Behnaz, Alex; Zhang, Yihui; Yu, Ki Jun; Cheng, Huanyu; Shi, Mingxing; Bian, Zuguang; Liu, Zhuangjian; Kim, Yun-Soung; Yeo, Woon-Hong; Park, Jae Suk; Song, Jizhou; Li, Yuhang; Huang, Yonggang; Gorbach, Alexander M.; Rogers, John A.
2013-10-01
Precision thermometry of the skin can, together with other measurements, provide clinically relevant information about cardiovascular health, cognitive state, malignancy and many other important aspects of human physiology. Here, we introduce an ultrathin, compliant skin-like sensor/actuator technology that can pliably laminate onto the epidermis to provide continuous, accurate thermal characterizations that are unavailable with other methods. Examples include non-invasive spatial mapping of skin temperature with millikelvin precision, and simultaneous quantitative assessment of tissue thermal conductivity. Such devices can also be implemented in ways that reveal the time-dynamic influence of blood flow and perfusion on these properties. Experimental and theoretical studies establish the underlying principles of operation, and define engineering guidelines for device design. Evaluation of subtle variations in skin temperature associated with mental activity, physical stimulation and vasoconstriction/dilation along with accurate determination of skin hydration through measurements of thermal conductivity represent some important operational examples.
NASA Astrophysics Data System (ADS)
Mulia, Iyan E.; Gusman, Aditya Riadi; Satake, Kenji
2017-12-01
Numerous tsunami observation networks have recently been deployed in several major tsunamigenic regions. However, guidance on where to optimally place the measurement devices is limited. This study presents a methodological approach to select strategic observation locations for the purpose of tsunami source characterization, particularly in terms of the fault slip distribution. Initially, we identify favorable locations and determine the initial number of observations. These locations are selected based on extrema of empirical orthogonal function (EOF) spatial modes. To further improve the accuracy, we apply an optimization algorithm called mesh adaptive direct search to remove redundant measurement locations from the EOF-generated points. We test the proposed approach using multiple hypothetical tsunami sources around the Nankai Trough, Japan. The results suggest that the optimized observation points can produce more accurate fault slip estimates with considerably fewer observations than the existing tsunami observation networks.
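The first step of the approach, placing candidate observation points at extrema of EOF spatial modes, can be sketched as below. This is a minimal illustration that assumes a matrix of simulated tsunami amplitudes (rows: scenarios or time samples, columns: candidate locations), keeps only the largest-magnitude entry of each mode, and uses power iteration with deflation so the example needs no libraries. The mesh adaptive direct search used to prune redundant points is not shown, and all names are hypothetical.

```python
import math


def leading_eofs(data, n_modes=2, n_iter=500):
    """Leading EOF spatial modes of `data` (rows: scenarios or time samples,
    columns: candidate locations), via power iteration with deflation on the
    spatial covariance matrix."""
    n_s, n_loc = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_s for j in range(n_loc)]
    anom = [[row[j] - means[j] for j in range(n_loc)] for row in data]
    cov = [[sum(a[p] * a[q] for a in anom) / (n_s - 1) for q in range(n_loc)]
           for p in range(n_loc)]
    modes = []
    for _ in range(n_modes):
        v = [1.0 + 0.1 * j for j in range(n_loc)]  # arbitrary non-zero start vector
        for _ in range(n_iter):
            w = [sum(cov[p][q] * v[q] for q in range(n_loc)) for p in range(n_loc)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        lam = sum(v[p] * cov[p][q] * v[q] for p in range(n_loc) for q in range(n_loc))
        modes.append(v)
        # Deflate: remove the mode just found before searching for the next one.
        cov = [[cov[p][q] - lam * v[p] * v[q] for q in range(n_loc)]
               for p in range(n_loc)]
    return modes


def extremum_locations(modes):
    """Index of the largest-magnitude entry of each mode (duplicates dropped)."""
    picks = []
    for v in modes:
        j = max(range(len(v)), key=lambda k: abs(v[k]))
        if j not in picks:
            picks.append(j)
    return picks


# Toy data: location 2 varies most, location 0 less (independently), location 1 not at all.
data = [[1, 0, 3], [-1, 0, 3], [1, 0, -3], [-1, 0, -3]]
print(extremum_locations(leading_eofs(data)))  # [2, 0]
```

The number of retained modes sets the initial number of observation points, which the subsequent optimization would then reduce.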
Lunar mineral feedstocks from rocks and soils: X-ray digital imaging in resource evaluation
NASA Technical Reports Server (NTRS)
Chambers, John G.; Patchen, Allan; Taylor, Lawrence A.; Higgins, Stefan J.; Mckay, David S.
1994-01-01
The rocks and soils of the Moon provide raw materials essential to the successful establishment of a lunar base. Efficient exploitation of these resources requires accurate characterization of mineral abundances, sizes/shapes, and association of 'ore' and 'gangue' phases, as well as the technology to generate high-yield/high-grade feedstocks. Only recently have x-ray mapping and digital imaging techniques been applied to lunar resource evaluation. The topics covered include inherent differences between lunar basalts and soils and quantitative comparison of rock-derived and soil-derived ilmenite concentrates. It is concluded that x-ray digital-imaging characterization of lunar raw materials provides a quantitative comparison that is unattainable by traditional petrographic techniques. These data are necessary for accurately determining mineral distributions of soil and crushed rock material. Application of these techniques will provide an important link to choosing the best raw material for mineral beneficiation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marzouk, Youssef; Fast P.; Kraus, M.
2006-01-01
Terrorist attacks using an aerosolized pathogen preparation have gained credibility as a national security concern after the anthrax attacks of 2001. The ability to characterize such attacks, i.e., to estimate the number of people infected, the time of infection, and the average dose received, is important when planning a medical response. We address this question of characterization by formulating a Bayesian inverse problem predicated on a short time-series of diagnosed patients exhibiting symptoms. To be of relevance to response planning, we limit ourselves to 3-5 days of data. In tests performed with anthrax as the pathogen, we find that these data are usually sufficient, especially if the model of the outbreak used in the inverse problem is accurate. In some cases the scarcity of data may initially support outbreak characterizations at odds with the true one, but with sufficient data the correct inferences are recovered; in other words, the inverse problem posed and its solution methodology are consistent. We also explore the effect of model error, i.e., situations in which the model used in the inverse problem is only a partially accurate representation of the outbreak; here, the model predictions and the observations differ by more than random noise. We find that while there is a consistent discrepancy between the inferred and the true characterizations, they are also close enough to be of relevance when planning a response.
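A minimal sketch of this kind of Bayesian inverse problem follows: a grid posterior over the number infected N and the infection time t0 (in days, relative to the start of observation), given a few days of daily symptom-onset counts. It assumes a single release, a lognormal incubation period with hypothetical parameters, Poisson-distributed daily counts, and a flat prior; unlike the study, it does not estimate the dose. All names are hypothetical.

```python
import math


def lognormal_cdf(t, mu, sigma):
    """P(incubation period <= t) for a lognormal(mu, sigma) distribution."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def posterior_grid(daily_cases, n_grid, t0_grid, mu, sigma):
    """Normalized posterior over (N infected, infection time t0) under a flat
    prior, with Poisson daily symptom-onset counts on days 1, 2, ..."""
    log_post = {}
    for n in n_grid:
        for t0 in t0_grid:
            ll = 0.0
            for day, k in enumerate(daily_cases, start=1):
                p = lognormal_cdf(day - t0, mu, sigma) - lognormal_cdf(day - 1 - t0, mu, sigma)
                lam = max(n * p, 1e-300)  # guard against log(0)
                ll += k * math.log(lam) - lam - math.lgamma(k + 1)
            log_post[(n, t0)] = ll
    peak = max(log_post.values())
    weights = {key: math.exp(v - peak) for key, v in log_post.items()}
    z = sum(weights.values())
    return {key: w / z for key, w in weights.items()}


mu, sigma = math.log(4.0), 0.5  # hypothetical incubation: median 4 days
true_n, true_t0 = 1000, 0.0
# Noise-free expected counts for 4 days of observation.
cases = [true_n * (lognormal_cdf(d - true_t0, mu, sigma)
                   - lognormal_cdf(d - 1 - true_t0, mu, sigma)) for d in range(1, 5)]
post = posterior_grid(cases, range(200, 2001, 100), [-1.0, -0.5, 0.0, 0.5], mu, sigma)
print(max(post, key=post.get))  # (1000, 0.0)
```

With noise-free counts the posterior mode recovers the true values; with real, noisy counts the posterior spread indicates how well 3-5 days of data constrain the outbreak.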
Roussis, S G
2001-08-01
The automated acquisition of the product ion spectra of all precursor ions in a selected mass range by using a magnetic sector/orthogonal acceleration time-of-flight (oa-TOF) tandem mass spectrometer for the characterization of complex petroleum mixtures is reported. Product ion spectra are obtained by rapid oa-TOF data acquisition and simultaneous scanning of the magnet. An analog signal generator is used for the scanning of the magnet. Slow magnet scanning rates permit the accurate profiling of precursor ion peaks and the acquisition of product ion spectra for all isobaric ion species. The ability of the instrument to perform both high- and low-energy collisional activation experiments provides access to a large number of dissociation pathways useful for the characterization of precursor ions. Examples are given that illustrate the capability of the method for the characterization of representative petroleum mixtures. The structural information obtained by the automated MS/MS experiment is used in combination with high-resolution accurate mass measurement results to characterize unknown components in a polar extract of a refinery product. The exhaustive mapping of all precursor ions in representative naphtha and middle-distillate fractions is presented. Sets of isobaric ion species are separated and their structures are identified by interpretation from first principles or by comparison with standard 70-eV EI libraries of spectra. The utility of the method increases with the complexity of the samples.
How many landmarks are enough to characterize shape and size variation?
Watanabe, Akinobu
2018-01-01
Accurate characterization of morphological variation is crucial for generating reliable results and conclusions concerning changes and differences in form. Despite the prevalence of landmark-based geometric morphometric (GM) data in the scientific literature, a formal treatment of whether sampled landmarks adequately capture shape variation has remained elusive. Here, I introduce LaSEC (Landmark Sampling Evaluation Curve), a computational tool to assess the fidelity of morphological characterization by landmarks. This task is achieved by calculating how subsampled data converge to the pattern of shape variation in the full dataset as landmark sampling is increased incrementally. While the number of landmarks needed to capture shape variation adequately depends on the individual dataset, LaSEC helps the user (1) identify under- and oversampling of landmarks; (2) assess robustness of morphological characterization; and (3) determine the number of landmarks that can be removed without compromising shape information. In practice, this knowledge could reduce time and cost associated with data collection, maintain statistical power in certain analyses, and enable the incorporation of incomplete, but important, specimens into the dataset. Results based on simulated shape data also reveal general properties of landmark data, including statistical consistency, whereby sampling additional landmarks tends to asymptotically improve the accuracy of morphological characterization. As landmark-based GM data become more widely adopted, LaSEC provides a systematic approach to evaluate and refine the collection of shape data, a goal paramount for the accumulation and analysis of accurate morphological information.
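LaSEC tracks how subsampled landmarks converge to the shape-variation pattern of the full dataset. The sketch below illustrates that idea in a simplified form: for each subsample size it measures the mean correlation between specimen-to-specimen distances computed from a random landmark subset and from all landmarks. It assumes 2-D landmarks that are already Procrustes-aligned and uses a distance correlation as the fidelity measure; this is not LaSEC's own implementation or metric, and the function names are hypothetical.

```python
import math
import random


def _pairwise_distances(specimens, landmark_idx):
    """Euclidean distances between every pair of specimens using only `landmark_idx`."""
    dists = []
    for a in range(len(specimens)):
        for b in range(a + 1, len(specimens)):
            d2 = sum((specimens[a][i][0] - specimens[b][i][0]) ** 2
                     + (specimens[a][i][1] - specimens[b][i][1]) ** 2
                     for i in landmark_idx)
            dists.append(math.sqrt(d2))
    return dists


def _pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def sampling_curve(specimens, n_reps=20, seed=0):
    """Mean correlation between subsampled and full-landmark specimen distances,
    for each subsample size k = 2..n_landmarks."""
    rng = random.Random(seed)
    n_lm = len(specimens[0])
    full = _pairwise_distances(specimens, range(n_lm))
    curve = []
    for k in range(2, n_lm + 1):
        scores = [_pearson(_pairwise_distances(specimens, rng.sample(range(n_lm), k)), full)
                  for _ in range(n_reps)]
        curve.append((k, sum(scores) / n_reps))
    return curve


# Toy data: 6 specimens x 8 two-dimensional landmarks, assumed already aligned.
gen = random.Random(1)
specimens = [[(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(8)] for _ in range(6)]
for k, r in sampling_curve(specimens):
    print(k, round(r, 3))
```

A curve that flattens near 1 well before the full landmark count would suggest oversampling; a curve still rising at the end would suggest that more landmarks are needed.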
Systematic characterization of maturation time of fluorescent proteins in living cells
Balleza, Enrique; Kim, J. Mark; Cluzel, Philippe
2017-01-01
Slow maturation time of fluorescent proteins limits accurate measurement of rapid gene expression dynamics and effectively reduces fluorescence signal in growing cells. We used high-precision time-lapse microscopy to characterize, at two different temperatures in E. coli, the maturation kinetics of 50 FPs that span the visible spectrum. We identified fast-maturing FPs that yield the highest signal-to-noise ratio and temporal resolution in individual growing cells. PMID:29320486
Improved Phase Characterization of Far-Regional Body Wave Arrivals in Central Asia
2008-09-30
developing array-based methods that can more accurately characterize far-regional (14°-29°) seismic wavefield structure. Far-regional (14°-29°) seismograms...arrivals with the primary arrivals. These complexities can be region and earthquake specific. The regional seismic arrays that have been built in the last...fifteen years should be a rich data source for the study of far-regional phase behavior. The arrays are composed of high-quality borehole seismometers
Linear Self-Referencing Techniques for Short-Optical-Pulse Characterization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dorrer, C.; Kang, I.
2008-04-04
Linear self-referencing techniques for the characterization of the electric field of short optical pulses are presented. The theoretical and practical advantages of these techniques are developed. Experimental implementations are described, and their performance is compared to the performance of their nonlinear counterparts. Linear techniques demonstrate unprecedented sensitivity and are a perfect fit in many domains where the precise, accurate measurement of the electric field of an optical pulse is required.
Sensor validation and fusion for gas turbine vibration monitoring
NASA Astrophysics Data System (ADS)
Yan, Weizhong; Goebel, Kai F.
2003-08-01
Vibration monitoring is an important practice throughout regular operation of gas turbine power systems and, even more so, during characterization tests. Vibration monitoring relies on accurate and reliable sensor readings. To obtain accurate readings, sensors are placed such that the signal is maximized. In the case of characterization tests, strain gauges are placed at the locations of vibration modes on blades inside the gas turbine. Due to the prevailing harsh environment, these sensors have a limited life and decaying accuracy, both of which impair vibration assessment. At the same time, bandwidth limitations may restrict data transmission, which in turn limits the number of sensors that can be used for assessment. Knowing the sensor status (normal or faulty) and, more importantly, knowing the true vibration level of the system at all times is essential for successful gas turbine vibration monitoring. This paper investigates a dynamic sensor validation and system health reasoning scheme that addresses the issues outlined above by considering only the information required to reliably assess system health status. In particular, if abnormal system health is suspected or if the primary sensor is determined to be faulty, information from available "sibling" sensors is dynamically integrated. A confidence measure expresses the complex interactions of sensor health and system health, their reliabilities, conflicting information, and what the health assessment is. The effectiveness of the scheme in achieving accurate and reliable vibration evaluation is then demonstrated using a combination of simulated data and a small sample of real-world data in which the vibration of compressor blades is monitored during a real-time characterization test of a new gas turbine power system.
Analyses of GPR signals for characterization of ground conditions in urban areas
NASA Astrophysics Data System (ADS)
Hong, Won-Taek; Kang, Seonghun; Lee, Sung Jin; Lee, Jong-Sub
2018-05-01
Ground penetrating radar (GPR) is applied to characterize ground conditions in urban areas. In addition, time domain reflectometry (TDR) and dynamic cone penetrometer (DCP) tests are conducted to support accurate analysis of the GPR images. The GPR images are acquired near a ground excavation site where ground subsidence occurred and was repaired. The relative permittivity and dynamic cone penetration index (DCPI) are profiled through the TDR and DCP tests, respectively. Because the ground in the urban area is kept under a low-moisture condition, the relative permittivity, which is inversely related to the electromagnetic impedance, is mainly affected by the dry density and is inversely proportional to the DCPI value. Because the first strong signal in the GPR image is shifted 180° from the emitted signal, the polarity of the electromagnetic wave reflected at the dense layer, where the reflection coefficient is negative, is identical to that of the first strong signal. The temporal-scaled GPR images can be accurately converted into spatial-scaled GPR images using the relative permittivity determined by the TDR test. The distribution of the loose layer can be accurately estimated by using the spatial-scaled GPR images and the reflection characteristics of the electromagnetic wave. Notably, the loose-layer distribution estimated in this study matches well with the DCPI profile and is visually verified by endoscopic images. This study demonstrates that a GPR survey complemented by TDR and DCP tests may be an effective method for characterizing ground conditions in urban areas.
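The time-to-depth conversion and the polarity argument above follow from two standard low-loss relations, sketched below in Python under the assumption of non-magnetic, low-conductivity ground: the wave velocity is v = c / sqrt(eps_r), so depth = v * t / 2 for a two-way travel time t, and the normal-incidence reflection coefficient is R = (sqrt(eps_1) - sqrt(eps_2)) / (sqrt(eps_1) + sqrt(eps_2)), which is negative when the wave enters a layer of higher permittivity, such as the dense layer described above. The permittivity values in the usage lines are hypothetical, not TDR results from the study.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, millimetres per nanosecond


def twt_to_depth_mm(two_way_time_ns, rel_permittivity):
    """Depth of a reflector from its two-way travel time, using the low-loss
    velocity v = c / sqrt(eps_r)."""
    velocity = C_MM_PER_NS / math.sqrt(rel_permittivity)
    return velocity * two_way_time_ns / 2.0


def reflection_coefficient(eps_upper, eps_lower):
    """Normal-incidence reflection coefficient of the electric field for a wave
    travelling from the upper into the lower layer (non-magnetic, low-loss)."""
    n_upper, n_lower = math.sqrt(eps_upper), math.sqrt(eps_lower)
    return (n_upper - n_lower) / (n_upper + n_lower)


# Hypothetical values, not measurements from the study.
depth = twt_to_depth_mm(10.0, 4.0)    # about 749.5 mm
r = reflection_coefficient(4.0, 9.0)  # -0.2: loose layer over a denser layer
```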
NASA Astrophysics Data System (ADS)
Dickey, Dwayne J.; Moore, Ronald B.; Tulip, John
2001-01-01
For photodynamic therapy (PDT) of solid tumors, such as prostatic carcinoma, to be achieved, an accurate model to predict tissue parameters and light dose must be found. Presently, most analytical light dosimetry models are fluence based and are not clinically viable for tissue characterization. Other methods of predicting optical properties, such as Monte Carlo, are accurate but far too time consuming for clinical application. However, radiance predicted by the P3-Approximation, an analytical solution to the transport equation, may be a viable and accurate alternative. The P3-Approximation accurately predicts optical parameters in intralipid/methylene blue based phantoms in a spherical geometry. The optical parameters furnished by the radiance, when introduced into the fluence predicted by both the P3-Approximation and Grosjean theory, correlate well with experimental data. The P3-Approximation also predicts the optical properties of prostate tissue, agreeing with documented optical parameters. The P3-Approximation could be the clinical tool necessary to facilitate PDT of solid tumors because of the limited number of invasive measurements required and the speed with which accurate calculations can be performed.
Characterizing short-term stability for Boolean networks over any distribution of transfer functions
Seshadhri, C.; Smith, Andrew M.; Vorobeychik, Yevgeniy; ...
2016-07-05
Here we present a characterization of the short-term stability of random Boolean networks under arbitrary distributions of transfer functions. Given any distribution of transfer functions for a random Boolean network, we present a formula that decides whether short-term chaos (damage spreading) will occur. We provide a formal proof for this formula and show empirically that its predictions are accurate. Previous work applies only to special cases of balanced families. Moreover, it has been observed that these earlier characterizations fail for unbalanced families, yet such families are widespread in real biological networks.
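For comparison, the classical annealed-approximation criterion for damage spreading in random Boolean networks can be computed directly: the network is expected to be chaotic when the average sensitivity of its transfer functions, averaged over the distribution, exceeds 1. The Python sketch below implements that classical criterion, not the paper's formula. It assumes uniformly random inputs to each function, which is exactly the simplification that breaks down for the unbalanced families discussed above.

```python
from itertools import product


def average_sensitivity(truth_table, k):
    """Expected number of the k inputs whose flip changes the output, with the
    input vector drawn uniformly at random. truth_table maps each k-tuple of
    0/1 values to an output bit."""
    changes = 0
    for x in product((0, 1), repeat=k):
        for i in range(k):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if truth_table[x] != truth_table[flipped]:
                changes += 1
    return changes / 2 ** k


def expected_sensitivity(distribution):
    """distribution: list of (probability, k, truth_table) triples."""
    return sum(p * average_sensitivity(table, k) for p, k, table in distribution)


def is_chaotic_annealed(distribution):
    """Classical annealed-approximation test: damage spreads if sensitivity > 1."""
    return expected_sensitivity(distribution) > 1.0


AND = {x: x[0] & x[1] for x in product((0, 1), repeat=2)}
XOR = {x: x[0] ^ x[1] for x in product((0, 1), repeat=2)}
mix = [(0.5, 2, AND), (0.5, 2, XOR)]
```

With AND and XOR gates each drawn with probability 0.5, the expected sensitivity is 1.5, which the classical criterion classifies as chaotic; a network of AND gates alone sits exactly at the critical value 1.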
OCCIMA: Optical Channel Characterization in Maritime Atmospheres
NASA Astrophysics Data System (ADS)
Hammel, Steve; Tsintikidis, Dimitri; deGrassie, John; Reinhardt, Colin; McBryde, Kevin; Hallenborg, Eric; Wayne, David; Gibson, Kristofor; Cauble, Galen; Ascencio, Ana; Rudiger, Joshua
2015-05-01
The Navy is actively developing diverse optical application areas, including high-energy laser weapons and free-space optical communications, which depend on accurate and timely knowledge of the state of the atmospheric channel. The Optical Channel Characterization in Maritime Atmospheres (OCCIMA) project is a comprehensive program to coalesce and extend the current capability to characterize the maritime atmosphere for all optical and infrared wavelengths. The program goal is the development of a unified and validated analysis toolbox. The foundational design for this program coordinates the development of sensors, measurement protocols, analytical models, and basic physics necessary to fulfill this goal.
Aircraft Dynamic Modeling in Turbulence
NASA Technical Reports Server (NTRS)
Morelli, Eugene A.; Cunningham, Kevin
2012-01-01
A method for accurately identifying aircraft dynamic models in turbulence was developed and demonstrated. The method uses orthogonal optimized multisine excitation inputs and an analytic method for enhancing signal-to-noise ratio for dynamic modeling in turbulence. A turbulence metric was developed to accurately characterize the turbulence level using flight measurements. The modeling technique was demonstrated in simulation, then applied to a subscale twin-engine jet transport aircraft in flight. Comparisons of modeling results obtained in turbulent air to results obtained in smooth air were used to demonstrate the effectiveness of the approach.
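Orthogonal multisine inputs can be sketched as follows: each input is a sum of sinusoids at integer harmonics of a common fundamental frequency, and no harmonic is shared between inputs, so the inputs are mutually orthogonal over a full period and can be applied simultaneously while their effects remain separable. The phase formula used here is a Schroeder-type choice that keeps the peak amplitude moderate; it stands in for the optimized phases of the actual method, and the harmonic assignment, sample count, and period are illustrative assumptions.

```python
import math


def multisine(harmonics, n_samples, period, amplitude=1.0):
    """Sampled sum of sines at integer harmonics of 1/period, with
    Schroeder-type phases phi_k = -pi*k*(k-1)/M to limit the peak value."""
    m = len(harmonics)
    dt = period / n_samples
    signal = []
    for n in range(n_samples):
        t = n * dt
        value = 0.0
        for k, h in enumerate(harmonics, start=1):
            phase = -math.pi * k * (k - 1) / m
            value += amplitude * math.sin(2.0 * math.pi * h * t / period + phase)
        signal.append(value)
    return signal


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


# Interleave harmonics 1..12 across three inputs so that no frequency is shared.
harmonics = list(range(1, 13))
input_harmonics = [harmonics[i::3] for i in range(3)]
signals = [multisine(h, n_samples=256, period=10.0) for h in input_harmonics]
```

The inner product of any two of these signals over one period is zero to rounding error, which is the property that lets the responses to simultaneous inputs be attributed to each input separately.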
Assessments of aggregate exposure to pesticides and other surface contamination in residential environments are often driven by assumptions about dermal contacts. Accurately predicting cumulative doses from realistic skin contact scenarios requires characterization of exposure sc...
Final report: the use of LIDAR to characterize aircraft initial plume characteristics
DOT National Transportation Integrated Search
2004-02-28
Aircraft emissions are a growing concern for the FAA, airports, and the community. U.S. and international air quality models were previously unable to accurately predict initial plume dispersion and the resulting pollutant concentrations because ...
Improving Statewide Freight Routing Capabilities for Sub-National Commodity Flows
DOT National Transportation Integrated Search
2012-10-01
The ability to fully understand and accurately characterize freight vehicle route choices is important in helping to inform regional and state decisions. This project recommends improvements to WSDOT's Statewide Freight GIS Network Model to more ac...
Improved analysis tool for concrete pavement: [project summary].
DOT National Transportation Integrated Search
2017-10-01
University of Florida researchers developed 3D-FE models to more accurately predict the behavior of concrete slabs. They also followed up on a project to characterize strain gauge performance for a Florida Department of Transportation (FDOT) concrete...
GROUND WATER SAMPLING FOR VERTICAL PROFILING OF CONTAMINANTS
Accurate delineation of plume boundaries and vertical contaminant distribution are necessary in order to adequately characterize waste sites and determine remedial strategies to be employed. However, it is important to consider the sampling objectives, sampling methods, and sampl...
Parallel Execution of Functional Mock-up Units in Buildings Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan
2016-06-30
A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
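The parallel Euler scheme described above can be sketched in Python, with a thread pool standing in for OpenMP. The `Tank` class is a hypothetical stand-in for an FMU in a multi-tank chain; a real FMU would be driven through the FMI API, which this sketch does not use. Each step reads all coupling inputs from the previous step's states, which is what lets every component be advanced independently. Because of CPython's global interpreter lock, the sketch shows the structure of the computation rather than an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


class Tank:
    """Hypothetical stand-in for an FMU: a tank whose outflow is
    proportional to its level and feeds the next tank in the chain."""

    def __init__(self, level, outflow_coeff):
        self.level = level
        self.k = outflow_coeff

    def outflow(self):
        return self.k * self.level

    def derivative(self, inflow):
        return inflow - self.outflow()


def euler_parallel(tanks, source_inflow, dt, steps, workers=4):
    """Forward Euler for a chain of tanks, advancing all components in parallel.

    Coupling inputs are read from the previous step, so each component's
    derivative can be evaluated independently of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            inflows = [source_inflow] + [t.outflow() for t in tanks[:-1]]
            derivs = list(pool.map(lambda tank, q: tank.derivative(q), tanks, inflows))
            for tank, d in zip(tanks, derivs):
                tank.level += dt * d
    return [t.level for t in tanks]


tanks = [Tank(level=0.0, outflow_coeff=0.5) for _ in range(3)]
levels = euler_parallel(tanks, source_inflow=1.0, dt=0.01, steps=5000)
```

With every outflow coefficient set to 0.5 and a source inflow of 1.0, all levels settle at 1.0 / 0.5 = 2.0, where each tank's outflow matches its inflow.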
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.
Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro
2016-01-01
Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for handling files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes fast. However, SAMtools requires additional implementation work, for example with OpenMP (Open Multi-Processing) libraries, to be used in parallel. Given the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
Chang, Xueli; Du, Siliang; Li, Yingying; Fang, Shenghui
2018-01-01
Large-size high-resolution (HR) satellite image matching is a challenging task due to local distortion, repetitive structures, intensity changes, and low efficiency. In this paper, a novel matching approach is proposed for large-size HR satellite image registration, based on a coarse-to-fine strategy and geometric scale-invariant feature transform (SIFT). In the coarse matching step, a robust matching method, scale restrict (SR) SIFT, is implemented at a low resolution level. The matching results provide geometric constraints, which are then used to guide block division and geometric SIFT in the fine matching step. The block matching method overcomes the memory problem. In geometric SIFT, area constraints help validate candidate matches and reduce search complexity. To further improve matching efficiency, the proposed matching method is parallelized using OpenMP. Finally, the sensed image is rectified to the coordinate system of the reference image via a Triangulated Irregular Network (TIN) transformation. Experiments are designed to test the performance of the proposed matching method. The experimental results show that the proposed method can decrease the matching time and increase the number of matching points while maintaining high registration accuracy. PMID:29702589
Parallel heuristics for scalable community detection
Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth
2015-08-14
Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.
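The Louvain method greedily moves vertices between communities to increase modularity, so evaluating modularity is the objective that any parallel heuristic must still improve. A minimal Python sketch of the Newman-Girvan modularity Q of a partition of an undirected, unweighted graph is shown below; it illustrates the objective only and is not the authors' OpenMP implementation.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted
    graph without self-loops.

    edges: list of (u, v) pairs; community: dict mapping node -> label.
    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where L_c is the
    number of edges inside c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
```

Splitting this graph into its two triangles gives Q = 5/14 (about 0.357), whereas putting every vertex in one community gives Q = 0, so a Louvain-style move from `merged` towards `split` is accepted.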
Parallel peak pruning for scalable SMP contour tree computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.
As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems require analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup in NVIDIA Thrust.
Hybrid multicore/vectorisation technique applied to the elastic wave equation on a staggered grid
NASA Astrophysics Data System (ADS)
Titarenko, Sofya; Hildyard, Mark
2017-07-01
In modern physics it has become common to find the solution of a problem by solving a set of PDEs numerically. Whether they are solved on a finite difference grid or by a finite element approach, the main calculations are often applied to a stencil structure. In the last decade it has become usual to work with so-called big data problems, where calculations are very heavy and accelerators and modern architectures are widely used. Although CPU and GPU clusters are often used to solve such problems, parallelisation of any calculation ideally starts from single-processor optimisation. Unfortunately, it is impossible to vectorise a stencil-structured loop with high-level instructions. In this paper we suggest a new approach to rearranging the data structure that makes it possible to apply high-level vectorisation instructions to a stencil loop and results in significant acceleration. The suggested method allows further acceleration if shared memory APIs are used. We show the effectiveness of the method by applying it to an elastic wave propagation problem on a finite difference grid. We have chosen the Intel architecture for the test problem and OpenMP (Open Multi-Processing), since both are extensively used in many applications.
Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias
2011-01-01
Future multiscale and multiphysics models must use the power of high performance computing (HPC) systems to enable research into human disease, translational medical science, and treatment. Previously we showed that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message passing processes (e.g. the message passing interface (MPI)) with multithreading (e.g. OpenMP, POSIX pthreads). The objective of this work is to compare the performance of such hybrid programming models when applied to the simulation of a lightweight multiscale cardiac model. Our results show that the hybrid models do not perform favourably compared with an implementation using only MPI, which contrasts with our results using complex physiological models. Thus, with regard to lightweight multiscale cardiac models, the user may not need to increase programming complexity by using a hybrid programming approach. However, considering that model complexity will increase, as will HPC system size in both node count and number of cores per node, it is still foreseeable that we will achieve faster-than-real-time multiscale cardiac simulations on these systems using hybrid programming models.
PREMER: a Tool to Infer Biological Networks.
Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R
2017-10-04
Inferring the structure of unknown cellular networks is a major challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features, such as distinguishing between direct and indirect interactions or determining the direction of a causal link, requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information-theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and use OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).
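Information-theoretic network inference of this kind builds on estimates such as the mutual information between pairs of variables. A minimal Python sketch of the plug-in (histogram) estimator for already-discretized data is shown below; PREMER's own estimators are implemented in FORTRAN 90 and may differ, for example in how data are discretized or how entropy reduction is computed.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired samples of two
    discrete (already discretized) variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


same = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])         # log 2, about 0.693
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # 0.0
```

Two identical balanced binary variables share log 2 nats of information, while two independent ones share none; in network inference, a high value suggests a candidate edge between the corresponding nodes.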
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations
NASA Astrophysics Data System (ADS)
Hause, Benjamin; Parker, Scott; Chen, Yang
2013-10-01
We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and CUDA Fortran. A mixed implementation of both OpenACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third-generation Core i7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see large speedups (a factor of 10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speedups comparable to or better than those of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three-dimensional general geometry GEM code.
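The particle deposition step mentioned above scatters each particle's charge onto nearby grid nodes. A minimal one-dimensional cloud-in-cell (linear weighting) version on a periodic grid is sketched below in Python; it is a generic textbook step, not the gyrokinetic deposition used in the codes described here. When many GPU threads deposit into the same node at once their updates conflict, which is one commonly cited reason this step needs lower-level tuning such as atomic operations or particle sorting.

```python
import math


def deposit_cic(positions, charges, n_cells, length):
    """Cloud-in-cell (linear weighting) charge deposition onto a periodic 1D
    grid with n_cells nodes; returns the charge density at each node."""
    dx = length / n_cells
    rho = [0.0] * n_cells
    for x, q in zip(positions, charges):
        s = (x % length) / dx        # particle position in units of cells
        i = int(math.floor(s))       # index of the node to the left
        w = s - i                    # fractional distance from that node
        rho[i % n_cells] += q * (1.0 - w) / dx
        rho[(i + 1) % n_cells] += q * w / dx
    return rho


# Three unit charges on a 10-cell periodic grid of length 10 (dx = 1).
rho = deposit_cic([0.5, 2.25, 9.9], [1.0, 1.0, 1.0], n_cells=10, length=10.0)
```

The total deposited charge, sum(rho) * dx, equals the total particle charge (3.0 here); the particle at 9.9 wraps around and deposits most of its charge on node 0.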
Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes
Einkemmer, Lukas
2017-01-01
Optimizing the geometry of airfoils for a specific application is an important engineering problem. In this context, genetic algorithms have enjoyed some success because they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual-socket workstation and a speedup between 3.4 and 3.8 for adding an NVIDIA K80 to a dual-socket workstation. PMID:28582389
Amrhein, Sven; Schwab, Marie-Luise; Hoffmann, Marc; Hubbuch, Jürgen
2014-11-07
Over the last decade, the use of design of experiment approaches in combination with fully automated high throughput (HTP) compatible screenings supported by robotic liquid handling stations (LHS), adequately fast analytics, and data processing has been developed in the biopharmaceutical industry into a strategy of high throughput process development (HTPD), resulting in lower experimental effort, sample reduction, and an overall higher degree of process optimization. Apart from HTP technologies, lab-on-a-chip technology has experienced enormous growth in recent years and allows further reduction of sample consumption. A combination of LHS and lab-on-a-chip technology is highly desirable and is realized in the present work to characterize aqueous two-phase systems (ATPSs) with respect to tie lines. In particular, a new high throughput compatible approach for the characterization of ATPSs regarding tie lines by exploiting differences in phase densities is presented. Densities were measured by a standalone microfluidic liquid density sensor, which was integrated into a liquid handling station by means of a newly developed generic Tip2World interface. This combination of liquid handling stations and lab-on-a-chip technology enables fast, fully automated, and highly accurate density measurements. The presented approach was used to determine the phase diagrams of ATPSs composed of potassium phosphate (pH 7) and polyethylene glycol (PEG) with molecular weights of 300, 400, 600, and 1000 Da, in the presence and absence of 3% (w/w) sodium chloride. Considering the whole ATPS characterization process, two complete ATPSs could be characterized within 24 h, including four runs per ATPS for binodal curve determination (less than 45 min/run) and tie line determination (less than 45 min/run for ATPS preparation and 8 h for density determination), which can be performed fully automatically overnight without requiring manpower.
The presented methodology provides a cost-, time-, and material-effective approach for characterizing ATPS phase diagrams based on highly accurate and comprehensive data. The derived data thus open the door to a more detailed description of ATPSs towards mechanistic models, since molecular approaches such as MD simulations or molecular descriptions along the lines of QSAR rely heavily on accurate and comprehensive data. Copyright © 2014 Elsevier B.V. All rights reserved.
Wildland firefighter deaths in the United States: A comparison of existing surveillance systems
Butler, Corey; Marsh, Suzanne; Domitrovich, Joseph W.; Helmkamp, Jim
2017-01-01
Wildland firefighting is a high-risk occupation involving considerable physical and psychological demands. Multiple agencies publish fatality summaries for wildland firefighters; however, the reported numbers and types vary. At least five different surveillance systems capture deaths, each with varying case definitions and case inclusion/exclusion criteria. Four are population-level systems and one is case-based. System differences create challenges to accurately characterize fatalities. Data within each of the five surveillance systems were examined to better understand the types of wildland firefighter data collected, to assess each system’s utility in characterizing wildland firefighter fatalities, and to determine each system’s potential to inform prevention strategies. To describe similarities and differences in how data were recorded and characterized, wildland fire deaths for three of the population-based systems were matched and individual fatalities across systems were compared. Between 2001 and 2012, 247 unique deaths were captured among the systems; 73% of these were captured in all three systems. The most common causes of death in all systems were associated with aviation, vehicles, medical events, and entrapments/burnovers. The data show that, although the three systems often report similar annual summary statistics, events captured in each system vary each year depending on the types of events that the system is designed to track, such as inclusion/exclusion of fatalities associated with the Hometown Heroes Survivor Benefits Act of 2003. The overarching and central goal of each system is to collect accurate and timely information to improve wildland firefighter safety and health. Each system is unique and has varying inclusion and exclusion criteria for capturing and tracking different subsets of wildland firefighter tasks and duties.
Use of a common case definition and better descriptions and interpretations of the data and the results would help to more accurately characterize wildland firefighter traumatic injuries and illnesses, lessen the likelihood for misinterpretation of wildland firefighter fatality data, and assist with defining the true occupational injury burden within this high-risk population. PMID:27754819
The application of ANN for zone identification in a complex reservoir
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, A.C.; Molnar, D.; Aminian, K.
1995-12-31
Reservoir characterization plays a critical role in appraising the economic success of reservoir management and development methods. Nearly all reservoirs show some degree of heterogeneity, which invariably impacts production. As a result, the production performance of a complex reservoir cannot be realistically predicted without accurate reservoir description. Characterization of a heterogeneous reservoir is a complex problem. The difficulty stems from the fact that sufficient data to accurately predict the distribution of the formation attributes are not usually available. Generally, geophysical logs are available from a considerable number of wells in the reservoir. Therefore, a methodology for reservoir description and characterization utilizing only well log data represents a significant technical as well as economic advantage. One of the key issues in the description and characterization of heterogeneous formations is the distribution of various zones and their properties. In this study, several artificial neural networks (ANN) were successfully designed and developed for zone identification in a heterogeneous formation from geophysical well logs. Granny Creek Field in West Virginia has been selected as the study area in this paper. This field has produced oil from the Big Injun Formation since the early 1900s. The water flooding operations were initiated in the 1970s and are currently still in progress. Well log data on a substantial number of wells in this reservoir were available and were collected. Core analysis results were also available from a few wells. The log data from 3 wells along with the various zone definitions were utilized to train the networks for zone recognition. The data from 2 other wells with previously determined zones, based on the core and log data, were then utilized to verify the developed networks' predictions.
The results indicated that ANN can be a useful tool for accurately identifying the zones in complex reservoirs.
Wildland firefighter deaths in the United States: A comparison of existing surveillance systems.
Butler, Corey; Marsh, Suzanne; Domitrovich, Joseph W; Helmkamp, Jim
2017-04-01
Wildland firefighting is a high-risk occupation with considerable physical and psychological demands. Multiple agencies publish fatality summaries for wildland firefighters; however, the reported numbers and types vary. At least five different surveillance systems capture deaths, each with its own case definition and case inclusion/exclusion criteria. Four are population-level systems and one is case-based. These differences make it difficult to characterize fatalities accurately. Data within each of the five surveillance systems were examined to better understand the types of wildland firefighter data collected, to assess each system's utility in characterizing wildland firefighter fatalities, and to determine each system's potential to inform prevention strategies. To describe similarities and differences in how data were recorded and characterized, wildland fire deaths in three of the population-based systems were matched and individual fatalities were compared across systems. Between 2001 and 2012, 247 unique deaths were captured among the systems; 73% of these were captured in all three systems. The most common causes of death in all systems were associated with aviation, vehicles, medical events, and entrapments/burnovers. The data show that, although the three systems often report similar annual summary statistics, the events captured in each system vary each year depending on the types of events the system is designed to track, such as the inclusion or exclusion of fatalities associated with the Hometown Heroes Survivor Benefits Act of 2003. The overarching goal of each system is to collect accurate and timely information to improve wildland firefighter safety and health. Each system is unique and has its own inclusion and exclusion criteria for capturing and tracking different subsets of wildland firefighter tasks and duties.
Use of a common case definition and better descriptions and interpretations of the data and the results would help to more accurately characterize wildland firefighter traumatic injuries and illnesses, lessen the likelihood for misinterpretation of wildland firefighter fatality data, and assist with defining the true occupational injury burden within this high-risk population.
MEMS-based platforms for mechanical manipulation and characterization of cells
NASA Astrophysics Data System (ADS)
Pan, Peng; Wang, Wenhui; Ru, Changhai; Sun, Yu; Liu, Xinyu
2017-12-01
Mechanical manipulation and characterization of single cells are important experimental techniques in biological and medical research. Because of the microscale size and fragile structure of cells, conventional cell manipulation and characterization techniques are often not accurate or efficient enough, or cannot meet the increasingly demanding needs of many types of cell-based studies. To this end, novel microelectromechanical systems (MEMS)-based technologies have been developed to improve the accuracy, efficiency, and consistency of various cell manipulation and characterization tasks, and to enable new types of cell research. This article summarizes existing MEMS-based platforms developed for the mechanical manipulation and characterization of cells, highlights the specific design considerations that make them suitable for their designated tasks, and discusses their advantages and limitations. In closing, an outlook on future trends is also provided.
Towards an Optimized Method of Olive Tree Crown Volume Measurement
Miranda-Fuentes, Antonio; Llorens, Jordi; Gamarra-Diezma, Juan L.; Gil-Ribes, Jesús A.; Gil, Emilio
2015-01-01
Accurate crown characterization of large isolated olive trees is vital for adjusting spray doses in three-dimensional crop agriculture. Among the many methodologies available, laser sensors have proved to be the most reliable and accurate. However, their operation is time-consuming and requires specialist knowledge, so a simpler crown characterization method is required. To this end, three methods were evaluated and compared with LiDAR measurements to determine their accuracy: the Vertical Crown Projected Area method (VCPA), the Ellipsoid Volume method (VE) and the Tree Silhouette Volume method (VTS). Trials were performed in three different kinds of olive tree plantations: intensive, adapted one-trunked traditional, and traditional. In total, 55 trees were characterized. Results show that all three methods are appropriate for estimating crown volume, reaching high coefficients of determination: R² = 0.783, 0.843 and 0.824 for VCPA, VE and VTS, respectively. However, discrepancies arise when the tree plantations are evaluated separately, especially for traditional trees. Here, correlations between LiDAR volume and other parameters showed that the Mean Vector calculated for the VCPA method had the highest correlation for traditional trees, so its use in traditional plantations is highly recommended. PMID:25658396
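The Ellipsoid Volume method (VE) treats the crown as an ellipsoid. The abstract does not state which crown dimensions are measured in the field, so the minimal sketch below assumes the crown height and two perpendicular crown diameters as inputs; the function name and the example dimensions are hypothetical.

```python
import math


def ellipsoid_crown_volume(height_m, diameter_1_m, diameter_2_m):
    """Crown volume (m^3) of an ellipsoid with the given full axis lengths.

    (4/3) * pi * (H/2) * (D1/2) * (D2/2) simplifies to (pi/6) * H * D1 * D2.
    Assumption: the three inputs are crown height and two perpendicular
    crown diameters; the field protocol is not given in the abstract.
    """
    return math.pi / 6.0 * height_m * diameter_1_m * diameter_2_m


# Hypothetical tree: 3 m crown height, 4 m and 3.5 m perpendicular crown diameters.
volume_m3 = ellipsoid_crown_volume(3.0, 4.0, 3.5)
```

For this hypothetical tree the estimate is 7π, or about 22 m³.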
A two-step method for rapid characterization of electroosmotic flows in capillary electrophoresis.
Zhang, Wenjing; He, Muyi; Yuan, Tao; Xu, Wei
2017-12-01
The measurement of electroosmotic flow (EOF) is important in capillary electrophoresis (CE) experiments for performance optimization and stability improvement. Although several methods exist, there is a growing need to accurately characterize ultra-low EOF rates, such as those in coated capillaries used for protein separations. In this work, a new method, called the two-step method, was developed to measure EOF rates in a capillary accurately and rapidly, especially the ultra-low EOF rates in coated capillaries. In the two-step method, EOF rates were calculated from the difference in migration time of a neutral marker between two consecutive experiments, in which pressure-driven flow was introduced to accelerate migration and the DC voltage was reversed to switch the EOF direction. Uncoated capillaries were first characterized by both the two-step method and a conventional method to confirm the validity of the new method. The new method was then applied to the study of coated capillaries. Results show that the new method is both faster and more accurate. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
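The two-step calculation can be illustrated under simple assumptions that the abstract does not spell out: the neutral marker moves with the sum of a constant pressure-driven velocity v_p and the EOF velocity v_eof, the pressure is identical in both runs, the field is uniform along the capillary, and reversing the voltage flips only the sign of v_eof. Then L_d/t1 = v_p + v_eof and L_d/t2 = v_p - v_eof, so v_eof is half the difference of the two apparent velocities. The function name, parameter names and numbers below are hypothetical.

```python
def eof_two_step(l_detect_m, l_total_m, voltage_v, t_forward_s, t_reversed_s):
    """Estimate EOF velocity and mobility from two neutral-marker runs.

    Assumed model (not stated in the abstract): in both runs the marker moves
    with the same pressure-driven velocity v_p; reversing the voltage flips the
    sign of the EOF velocity v_eof. Hence
        L_d / t_forward  = v_p + v_eof
        L_d / t_reversed = v_p - v_eof
    Returns (v_eof in m/s, v_p in m/s, EOF mobility in m^2/(V*s)).
    """
    v_forward = l_detect_m / t_forward_s
    v_reversed = l_detect_m / t_reversed_s
    v_eof = (v_forward - v_reversed) / 2.0
    v_pressure = (v_forward + v_reversed) / 2.0
    field_v_per_m = voltage_v / l_total_m  # assumes a uniform field along the capillary
    return v_eof, v_pressure, v_eof / field_v_per_m


# Hypothetical run: 50 cm to detector, 60 cm total, 20 kV, v_p = 1 mm/s, v_eof = 10 um/s.
t1 = 0.5 / (1.0e-3 + 1.0e-5)
t2 = 0.5 / (1.0e-3 - 1.0e-5)
v_eof, v_p, mu_eof = eof_two_step(0.5, 0.6, 20000.0, t1, t2)
```

Pressure-driven flow keeps both runs short, and because the neutral marker has no electrophoretic mobility of its own, the velocity difference between the runs isolates the EOF contribution in this model.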
On axiomatizations of the Shapley value for bi-cooperative games
NASA Astrophysics Data System (ADS)
Meirong, Wu; Shaochen, Cao; Huazhen, Zhu
2016-06-01
In bi-cooperative games, each participant has three available decisions, which allows such games to depict real-life situations accurately. This paper studies the Shapley value of bi-cooperative games and gives a unique characterization of it. Axioms similar to those of classical cooperative games can also be used to characterize the Shapley value of bi-cooperative games. In addition, the paper introduces a structural axiom and a zero-excluded axiom in place of the efficiency axiom of classical cooperative games.
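As a point of reference for the axioms discussed above, the classical Shapley value of an ordinary (not bi-cooperative) game is each player's marginal contribution averaged over all orderings of the players. The sketch below computes only that classical value; the bi-cooperative extension, with its three choices per player, is not implemented, and the three-player example game is hypothetical.

```python
from itertools import permutations


def shapley_values(players, v):
    """Classical Shapley value: average marginal contribution over all orderings.

    v maps a frozenset of players to a real number, with v(frozenset()) == 0.
    Cost grows as n!, so this is only practical for small games.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += v(with_p) - v(coalition)  # marginal contribution of p
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical three-player game: a coalition is worth 1 only if A joins with B or C.
def example_game(s):
    return 1.0 if "A" in s and ("B" in s or "C" in s) else 0.0


phi = shapley_values(["A", "B", "C"], example_game)
```

In the example game the values are 2/3 for A and 1/6 each for B and C; they sum to v(N) = 1, which is the efficiency property of the classical setting that the paper replaces for bi-cooperative games.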
NASA Astrophysics Data System (ADS)
Pan, Shijia; Mirshekari, Mostafa; Fagert, Jonathon; Ramirez, Ceferino Gabriel; Chung, Albert Jin; Hu, Chih Chi; Shen, John Paul; Zhang, Pei; Noh, Hae Young
2018-02-01
Many human activities induce excitations on ambient structures with various objects, causing the structures to vibrate. Accurate vibration excitation source detection and characterization enable human activity information inference, hence allowing human activity monitoring for various smart building applications. By utilizing structural vibrations, we can achieve sparse and non-intrusive sensing, unlike pressure- and vision-based methods. Many approaches have been presented on vibration-based source characterization, and they often either focus on one excitation type or have limited performance due to the dispersion and attenuation effects of the structures. In this paper, we present our method to characterize two main types of excitations induced by human activities (impulse and slip-pulse) on multiple structures. By understanding the physical properties of waves and their propagation, the system can achieve accurate excitation tracking on different structures without large-scale labeled training data. Specifically, our algorithm takes properties of surface waves generated by impulse and of body waves generated by slip-pulse into account to handle the dispersion and attenuation effects when different types of excitations happen on various structures. We then evaluate the algorithm through multiple scenarios. Our method achieves up to a six times improvement in impulse localization accuracy and a three times improvement in slip-pulse trajectory length estimation compared to existing methods that do not take wave properties into account.
Transverse Tension Fatigue Life Characterization Through Flexure Testing of Composite Materials
NASA Technical Reports Server (NTRS)
OBrien, T. Kevin; Chawan, Arun D.; Krueger, Ronald; Paris, Isabelle
2001-01-01
The transverse tension fatigue life of S2/8552 glass-epoxy and IM7/8552 carbon-epoxy was characterized using flexure tests of 90-degree laminates loaded in 3-point and 4-point bending. The influence of specimen polishing and specimen configuration on transverse tension fatigue life was examined using the glass-epoxy laminates. Results showed that 90-degree bend specimens with polished machined edges and polished tension-side surfaces, where bending failures were observed, had lower fatigue lives than unpolished specimens when cyclically loaded at equal stress levels. The influence of specimen thickness and the utility of a Weibull scaling law were examined using the carbon-epoxy laminates. The influence of test frequency on fatigue results was also documented for the 4-point bending configuration. A Weibull scaling law was used to predict the 4-point bending fatigue lives from the 3-point bending curve fit and vice versa. Scaling was performed based on maximum cyclic stress level as well as on fatigue life. The scaling laws based on stress level shifted the curve-fit S-N characterizations in the desired direction; however, the magnitude of the shift was not adequate to accurately predict the fatigue lives. Furthermore, the scaling law based on fatigue life shifted the curve-fit S-N characterizations in the opposite direction from the measured values. Therefore, these scaling laws were not adequate for obtaining accurate predictions of the transverse tension fatigue lives.
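The abstract does not give the form of the Weibull scaling law. A common textbook form compares effective stressed volumes: for a rectangular beam with Weibull modulus m, the effective volume is V/(2(m+1)²) in 3-point bending and V(m+2)/(4(m+1)²) in quarter-point 4-point bending, so equal failure probability gives σ3/σ4 = ((m+2)/2)^(1/m). The sketch below is stress-level scaling only and assumes identical specimen dimensions and outer span in both configurations, which may not match the specimens actually tested; function names and numbers are hypothetical.

```python
def weibull_3pt_over_4pt(m):
    """Ratio sigma_3pt / sigma_4pt at equal failure probability.

    Assumes a two-parameter Weibull distribution with modulus m, a rectangular
    beam, equal outer span and cross-section, and quarter-point 4-point loading:
        V_eff,3pt = V / (2 (m + 1)^2)
        V_eff,4pt = V (m + 2) / (4 (m + 1)^2)
    so sigma_3 / sigma_4 = (V_eff,4pt / V_eff,3pt)^(1/m) = ((m + 2) / 2)^(1/m).
    """
    return ((m + 2.0) / 2.0) ** (1.0 / m)


def scale_3pt_stress_to_4pt(sigma_3pt, m):
    """Map a 3-point bending stress level to the equivalent 4-point level."""
    return sigma_3pt / weibull_3pt_over_4pt(m)


sigma_4pt = scale_3pt_stress_to_4pt(100.0, 2.0)  # hypothetical: 100 MPa, m = 2
```

With m = 2, a 3-point stress of 100 MPa maps to about 70.7 MPa in 4-point bending; as m grows the ratio tends to 1, so the predicted shift shrinks for materials with less scatter.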
Advanced Mass Spectrometric Methods for the Rapid and Quantitative Characterization of Proteomes
Smith, Richard D.
2002-01-01
Progress is reviewed towards the development of a global strategy that aims to extend the sensitivity, dynamic range, comprehensiveness and throughput of proteomic measurements based upon the use of high performance separations and mass spectrometry. The approach uses high accuracy mass measurements from Fourier transform ion cyclotron resonance mass spectrometry (FTICR) to validate peptide ‘accurate mass tags’ (AMTs) produced by global protein enzymatic digestions for a specific organism, tissue or cell type from ‘potential mass tags’ tentatively identified using conventional tandem mass spectrometry (MS/MS). This provides the basis for subsequent measurements without the need for MS/MS. High resolution capillary liquid chromatography separations combined with high sensitivity, high resolution, accurate FTICR measurements are shown to be capable of characterizing peptide mixtures of more than 10⁵ components. The strategy has been initially demonstrated using the microorganisms Saccharomyces cerevisiae and Deinococcus radiodurans. Advantages of the approach include high confidence of protein identification, broad proteome coverage, high sensitivity, and the capability for stable-isotope labeling methods for precise relative protein abundance measurements. Abbreviations: LC, liquid chromatography; FTICR, Fourier transform ion cyclotron resonance; AMT, accurate mass tag; PMT, potential mass tag; MMA, mass measurement accuracy; MS, mass spectrometry; MS/MS, tandem mass spectrometry; ppm, parts per million.
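The AMT approach depends on mass measurement accuracy (MMA) expressed in parts per million. The minimal sketch below computes a ppm error and matches a measured mass against a table of AMT masses; matching on mass alone, the 5 ppm default tolerance, and the peptide names and masses are hypothetical simplifications, not the published pipeline.

```python
def mass_error_ppm(measured_mass, theoretical_mass):
    """Mass measurement error in parts per million (ppm)."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


def match_amts(measured_mass, amt_masses, tol_ppm=5.0):
    """Return names of AMTs whose mass lies within tol_ppm of the measured mass.

    Hypothetical simplification: matches on mass only; the 5 ppm default is illustrative.
    """
    return [name for name, mass in amt_masses.items()
            if abs(mass_error_ppm(measured_mass, mass)) <= tol_ppm]


# Hypothetical AMT table (masses in Da).
amt_db = {"PEPTIDE_A": 1000.0, "PEPTIDE_B": 1000.02}
hits = match_amts(1000.001, amt_db)
```

Here the measured mass is 1 ppm from PEPTIDE_A and about 19 ppm from PEPTIDE_B, so only PEPTIDE_A is returned; a tighter MMA narrows the tolerance window and reduces ambiguous matches.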
Time-Frequency Distribution of Seismocardiographic Signals: A Comparative Study
Taebi, Amirtaha; Mansy, Hansen A.
2017-01-01
Accurate estimation of seismocardiographic (SCG) signal features can support successful signal characterization and classification in health and disease. This may lead to new methods for diagnosing and monitoring heart function. Time-frequency distributions (TFD) are often used to estimate spectrotemporal signal features. In this study, the performance of different TFDs (e.g., short-time Fourier transform (STFT), polynomial chirplet transform (PCT), and continuous wavelet transform (CWT) with different mother functions) was assessed using simulated signals and then used to analyze actual SCGs. The instantaneous frequency (IF) was determined from each TFD, and the error in estimating IF was calculated for simulated signals. Results suggested that the lowest IF error depended on both the TFD and the test signal. STFT had lower error than CWT methods for most test signals. For a simulated SCG, Morlet CWT estimated IF more accurately than other CWTs, but Morlet did not provide noticeable advantages over STFT or PCT. PCT had the most consistently accurate IF estimates and appeared better suited for estimating the IF of actual SCG signals. PCT analysis showed that actual SCGs from eight healthy subjects had multiple spectral peaks at 9.20 ± 0.48, 25.84 ± 0.77, and 50.71 ± 1.83 Hz (mean ± SEM). These may prove to be useful features for SCG characterization and classification. PMID:28952511
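Of the TFDs compared, the STFT is the simplest to sketch. The example below estimates an IF track by taking the spectral peak of each Hann-windowed frame and scores it with a mean absolute IF error against a known reference; peak picking as the IF estimator, the window length, the hop size and the test tone are assumptions, and the PCT and CWT variants are not shown.

```python
import numpy as np


def stft_peak_if(x, fs, win_len=256, hop=64):
    """Estimate an IF track as the peak frequency of each Hann-windowed STFT frame.

    Returns (frame_center_times_s, peak_frequencies_hz). Peak picking is an
    assumed IF estimator, chosen here for simplicity.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(win_len)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times, peaks = [], []
    for start in range(0, len(x) - win_len + 1, hop):
        spectrum = np.abs(np.fft.rfft(x[start:start + win_len] * window))
        peaks.append(freqs[np.argmax(spectrum)])
        times.append((start + win_len / 2) / fs)
    return np.array(times), np.array(peaks)


def mean_abs_if_error(estimated_hz, true_hz):
    """Mean absolute difference between estimated and reference IF (Hz)."""
    return float(np.mean(np.abs(np.asarray(estimated_hz) - np.asarray(true_hz))))


# Hypothetical test signal: a 25 Hz tone sampled at 500 Hz for 4 s.
fs = 500.0
t = np.arange(0, 4.0, 1.0 / fs)
times, if_track = stft_peak_if(np.sin(2 * np.pi * 25.0 * t), fs)
error_hz = mean_abs_if_error(if_track, np.full_like(if_track, 25.0))
```

The frequency bin spacing is fs/win_len (about 1.95 Hz here), which limits how finely a single frame can locate a peak; lengthening the window improves that spacing but blurs changes in time, one plausible reason the best-performing TFD depends on the test signal.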
Soil moisture sensing via swept frequency based microwave sensors
USDA-ARS?s Scientific Manuscript database
Accurate measurement of moisture content is a prime requirement in hydrological, geophysical, and biogeochemical research as well as for material characterization, process control, and irrigation efficiency in water limited regions. Within these areas, consideration of the surface area and associate...
77 FR 40836 - Pennsylvania Regulatory Program
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-11
....302. Number, Location and Depth of Monitoring Points The water quality monitoring system shall accurately characterize groundwater and surface water flow and chemistry and flow systems on the site and... properties of coal ash beneficially used and water quality monitoring requirements. Pennsylvania is...
SPATIAL PREDICTION USING COMBINED SOURCES OF DATA
For improved environmental decision-making, it is important to develop new models for spatial prediction that accurately characterize important spatial and temporal patterns of air pollution. As the U.S. Environmental Protection Agency begins to use spatial prediction in the reg...
Better understanding the transport mechanisms of organophosphorus flame-retardants (OPFRs) in the residential environment is important to more accurately estimate their indoor exposure and develop risk management strategies that protect human health. This study describes an impro...
Gibelli, François; Lombez, Laurent; Guillemoles, Jean-François
2017-02-15
To characterize hot carrier populations in semiconductors, photoluminescence measurement is a convenient tool, enabling us to probe the thermodynamic properties of the carriers in a contactless way. However, the analysis of photoluminescence spectra rests on several assumptions, which are discussed in this work. We especially emphasize the importance of the variation of the material absorptivity, which should be taken into account to obtain accurate thermodynamic properties of the carriers, especially when the excitation power is varied. The proposed method yields more accurate values of the carriers' thermodynamic properties by relying on a rigorous physical description, and finds direct application in investigating hot carrier solar cells, an appropriate concept for achieving high conversion efficiencies with a relatively simple device architecture.
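To illustrate why absorptivity matters, a common analysis uses the generalized Planck law, I(E) ∝ A(E)·E²/(exp((E - Δμ)/kT) - 1), which in its Boltzmann limit (E - Δμ ≫ kT) makes ln(I/(A·E²)) linear in E with slope -1/kT. The sketch below fits the carrier temperature from that slope, once with the energy-dependent absorptivity and once with it ignored. It is not the authors' full method; the synthetic spectrum, the linear absorptivity model and the function name are hypothetical.

```python
import numpy as np

# Boltzmann constant in eV/K (k / e with exact SI values).
K_B_EV = 8.617333262e-5


def fit_carrier_temperature(energy_ev, intensity, absorptivity):
    """Fit a carrier temperature (K) from the high-energy PL tail.

    Assumes the Boltzmann limit of the generalized Planck law,
    I(E) ~ A(E) * E**2 * exp(-(E - dmu) / (k T)), so that
    ln(I / (A * E**2)) is linear in E with slope -1 / (k T).
    """
    energy_ev = np.asarray(energy_ev, dtype=float)
    y = np.log(np.asarray(intensity, dtype=float)
               / (np.asarray(absorptivity, dtype=float) * energy_ev**2))
    slope, _intercept = np.polyfit(energy_ev, y, 1)
    return -1.0 / (K_B_EV * slope)


# Hypothetical synthetic spectrum: T = 600 K, dmu = 1.0 eV, absorptivity rising with E.
E = np.linspace(1.6, 1.9, 50)
A = 0.5 + 0.5 * (E - 1.6)
I = A * E**2 * np.exp(-(E - 1.0) / (K_B_EV * 600.0))

t_with_a = fit_carrier_temperature(E, I, A)                   # accounts for A(E)
t_without_a = fit_carrier_temperature(E, I, np.ones_like(E))  # ignores A(E)
```

In this synthetic case, treating the rising absorptivity as constant biases the fitted temperature upward, because ln A(E) adds a positive slope to the tail.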
Towards a Transferable UAV-Based Framework for River Hydromorphological Characterization
González, Rocío Ballesteros; Leinster, Paul; Wright, Ros
2017-01-01
The multiple protocols that have been developed to characterize river hydromorphology, partly in response to legislative drivers such as the European Union Water Framework Directive (EU WFD), make the comparison of results obtained in different countries challenging. Recent studies have analyzed the comparability of existing methods, with remote-sensing-based approaches being proposed as a potential means of harmonizing hydromorphological characterization protocols. However, the resolution achieved by remote sensing products may not be sufficient to assess some of the key hydromorphological features that are required to allow an accurate characterization. Methodologies based on high-resolution aerial photography taken from Unmanned Aerial Vehicles (UAVs) have been proposed by several authors as potential approaches to overcome these limitations. Here, we explore the applicability of an existing UAV-based framework for hydromorphological characterization to three different fluvial settings representing some of the distinct ecoregions defined by the WFD geographical intercalibration groups (GIGs). The framework is based on the automated recognition of hydromorphological features via tested and validated Artificial Neural Networks (ANNs). Results show that the framework is transferable to the Central-Baltic and Mediterranean GIGs with accuracies in feature identification above 70%. Accuracies of 50% are achieved when the framework is implemented in the Very Large Rivers GIG. The framework successfully identified vegetation, deep water, shallow water, riffles, side bars and shadows for the majority of the reaches. However, further algorithm development is required to ensure a wider range of features (e.g., chutes, structures and erosion) are accurately identified. This study also highlights the need to develop an objective and fit-for-purpose hydromorphological characterization framework to be adopted within all EU member states to facilitate comparison of results.
PMID:28954434
Towards a Transferable UAV-Based Framework for River Hydromorphological Characterization.
Rivas Casado, Mónica; González, Rocío Ballesteros; Ortega, José Fernando; Leinster, Paul; Wright, Ros
2017-09-26
The multiple protocols that have been developed to characterize river hydromorphology, partly in response to legislative drivers such as the European Union Water Framework Directive (EU WFD), make the comparison of results obtained in different countries challenging. Recent studies have analyzed the comparability of existing methods, with remote-sensing-based approaches being proposed as a potential means of harmonizing hydromorphological characterization protocols. However, the resolution achieved by remote sensing products may not be sufficient to assess some of the key hydromorphological features that are required to allow an accurate characterization. Methodologies based on high-resolution aerial photography taken from Unmanned Aerial Vehicles (UAVs) have been proposed by several authors as potential approaches to overcome these limitations. Here, we explore the applicability of an existing UAV-based framework for hydromorphological characterization to three different fluvial settings representing some of the distinct ecoregions defined by the WFD geographical intercalibration groups (GIGs). The framework is based on the automated recognition of hydromorphological features via tested and validated Artificial Neural Networks (ANNs). Results show that the framework is transferable to the Central-Baltic and Mediterranean GIGs with accuracies in feature identification above 70%. Accuracies of 50% are achieved when the framework is implemented in the Very Large Rivers GIG. The framework successfully identified vegetation, deep water, shallow water, riffles, side bars and shadows for the majority of the reaches. However, further algorithm development is required to ensure a wider range of features (e.g., chutes, structures and erosion) are accurately identified. This study also highlights the need to develop an objective and fit-for-purpose hydromorphological characterization framework to be adopted within all EU member states to facilitate comparison of results.
Helb, Danica A.; Tetteh, Kevin K. A.; Felgner, Philip L.; Skinner, Jeff; Hubbard, Alan; Arinaitwe, Emmanuel; Mayanja-Kizza, Harriet; Ssewanyana, Isaac; Kamya, Moses R.; Beeson, James G.; Tappero, Jordan; Smith, David L.; Crompton, Peter D.; Rosenthal, Philip J.; Dorsey, Grant; Drakeley, Christopher J.; Greenhouse, Bryan
2015-01-01
Tools to reliably measure Plasmodium falciparum (Pf) exposure in individuals and communities are needed to guide and evaluate malaria control interventions. Serologic assays can potentially produce precise exposure estimates at low cost; however, current approaches based on responses to a few characterized antigens are not designed to estimate exposure in individuals. Pf-specific antibody responses differ by antigen, suggesting that selection of antigens with defined kinetic profiles will improve estimates of Pf exposure. To identify novel serologic biomarkers of malaria exposure, we evaluated responses to 856 Pf antigens by protein microarray in 186 Ugandan children, for whom detailed Pf exposure data were available. Using data-adaptive statistical methods, we identified combinations of antibody responses that maximized information on an individual’s recent exposure. Responses to three novel Pf antigens accurately classified whether an individual had been infected within the last 30, 90, or 365 d (cross-validated area under the curve = 0.86–0.93), whereas responses to six antigens accurately estimated an individual’s malaria incidence in the prior year. Cross-validated incidence predictions for individuals in different communities provided accurate stratification of exposure between populations and suggest that precise estimates of community exposure can be obtained from sampling a small subset of that community. In addition, serologic incidence predictions from cross-sectional samples characterized heterogeneity within a community similarly to 1 y of continuous passive surveillance. Development of simple ELISA-based assays derived from the successful selection strategy outlined here offers the potential to generate rich epidemiologic surveillance data that will be widely accessible to malaria control programs. PMID:26216993
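The reported cross-validated AUC values summarize how well antibody responses separate individuals infected within a given window from those who were not. A minimal sketch of the AUC metric itself follows, computed as the Mann-Whitney probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half); the cross-validation and data-adaptive antigen selection are not shown, and the scores are hypothetical.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2. O(n_pos * n_neg), which is fine for small samples.
    """
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


# Hypothetical antibody-response scores for recently infected (pos) and not infected (neg).
auc_example = auc_mann_whitney([0.9, 0.4], [0.5, 0.1])
```

In the example, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75; an AUC of 0.5 corresponds to chance-level separation.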
Adhikari, Puspa L; Wong, Roberto L; Overton, Edward B
2017-10-01
Accurate characterization of petroleum hydrocarbons in complex and weathered oil residues is analytically challenging. This is primarily due to the chemical compositional complexity of both the oil residues and the environmental matrices, and to the lack of instrumental selectivity caused by co-elution of interferences with the target analytes. To overcome these analytical selectivity issues, we used enhanced resolution gas chromatography coupled with triple quadrupole mass spectrometry in Multiple Reaction Monitoring (MRM) mode (GC/MS/MS-MRM) to eliminate interferences within the ion chromatograms of target analytes found in environmental samples. This new GC/MS/MS-MRM method was developed and used for forensic fingerprinting of deep-water and marsh sediment samples containing oily residues from the Deepwater Horizon oil spill. The results showed that the GC/MS/MS-MRM method increases selectivity, eliminates interferences, and provides more accurate quantitation and characterization of trace levels of alkyl-PAHs and biomarker compounds from weathered oil residues in complex sample matrices. The higher selectivity of the new method, even at low detection limits, provides greater insight into isomer and homolog compositional patterns and the extent of oil weathering under various environmental conditions. The method also provides flat chromatographic baselines for accurate and unambiguous calculation of petroleum forensic biomarker compound ratios. Thus, this GC/MS/MS-MRM method can be a reliable analytical strategy for more accurate and selective trace level analyses in petroleum forensic studies, and for tracking continuous weathering of oil residues. Copyright © 2017 Elsevier Ltd. All rights reserved.
Helb, Danica A; Tetteh, Kevin K A; Felgner, Philip L; Skinner, Jeff; Hubbard, Alan; Arinaitwe, Emmanuel; Mayanja-Kizza, Harriet; Ssewanyana, Isaac; Kamya, Moses R; Beeson, James G; Tappero, Jordan; Smith, David L; Crompton, Peter D; Rosenthal, Philip J; Dorsey, Grant; Drakeley, Christopher J; Greenhouse, Bryan
2015-08-11
Tools to reliably measure Plasmodium falciparum (Pf) exposure in individuals and communities are needed to guide and evaluate malaria control interventions. Serologic assays can potentially produce precise exposure estimates at low cost; however, current approaches based on responses to a few characterized antigens are not designed to estimate exposure in individuals. Pf-specific antibody responses differ by antigen, suggesting that selection of antigens with defined kinetic profiles will improve estimates of Pf exposure. To identify novel serologic biomarkers of malaria exposure, we evaluated responses to 856 Pf antigens by protein microarray in 186 Ugandan children, for whom detailed Pf exposure data were available. Using data-adaptive statistical methods, we identified combinations of antibody responses that maximized information on an individual's recent exposure. Responses to three novel Pf antigens accurately classified whether an individual had been infected within the last 30, 90, or 365 d (cross-validated area under the curve = 0.86-0.93), whereas responses to six antigens accurately estimated an individual's malaria incidence in the prior year. Cross-validated incidence predictions for individuals in different communities provided accurate stratification of exposure between populations and suggest that precise estimates of community exposure can be obtained from sampling a small subset of that community. In addition, serologic incidence predictions from cross-sectional samples characterized heterogeneity within a community similarly to 1 y of continuous passive surveillance. Development of simple ELISA-based assays derived from the successful selection strategy outlined here offers the potential to generate rich epidemiologic surveillance data that will be widely accessible to malaria control programs.
Subthreshold SPICE Model Optimization
NASA Astrophysics Data System (ADS)
Lum, Gregory; Au, Henry; Neff, Joseph; Bozeman, Eric; Kamin, Nick; Shimabukuro, Randy
2011-04-01
The first step in integrated circuit design is simulating the design in software to verify proper functionality and design requirements. Properties of the process are provided by fabrication foundries in the form of SPICE models. These SPICE models contain the electrical data and physical properties of the basic circuit elements. A limitation of these models is that the data collected by the foundry accurately model only the saturation region. This is fine for most users, but for devices operated in the subthreshold region the models are inadequate for accurate simulation. This is why optimizing the existing SPICE models to characterize the subthreshold region is so important. To accurately simulate this region of operation, MOSFETs of varying widths and lengths are fabricated and their electrical test data are collected. From the collected data, the parameters of the model files are optimized through parameter extraction rather than curve fitting. With the completed optimized models, the circuit designer is able to simulate circuit designs for the subthreshold region accurately.
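In weak inversion the drain current of a MOSFET is commonly modeled as I_D ≈ I_0·exp(V_GS/(n·V_T)) when V_DS is several times V_T, so ln I_D is linear in V_GS. The sketch below extracts the physically meaningful ideality factor n and the subthreshold swing S = ln(10)·n·V_T from I_D-V_GS points. It uses this textbook model rather than any particular SPICE model's parameter set, and the function name and synthetic sweep are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def extract_subthreshold_params(vgs_v, id_a, temp_k=300.0):
    """Fit I_D = I_0 * exp(V_GS / (n * V_T)) to weak-inversion data.

    Assumes V_DS >> V_T so the (1 - exp(-V_DS / V_T)) factor is ~1.
    Returns (n, swing in mV per decade, I_0 in A).
    """
    v_t = K_B * temp_k / Q_E  # thermal voltage, ~25.9 mV at 300 K
    slope, intercept = np.polyfit(np.asarray(vgs_v, dtype=float),
                                  np.log(np.asarray(id_a, dtype=float)), 1)
    n = 1.0 / (slope * v_t)                  # ideality (slope) factor
    swing_mv = np.log(10.0) * n * v_t * 1e3  # mV per decade of drain current
    return n, swing_mv, float(np.exp(intercept))


# Hypothetical synthetic sweep with n = 1.4 and I_0 = 1 fA.
vgs = np.linspace(0.0, 0.3, 31)
v_t_300 = K_B * 300.0 / Q_E
i_d = 1e-15 * np.exp(vgs / (1.4 * v_t_300))
n_fit, swing_fit, i0_fit = extract_subthreshold_params(vgs, i_d)
```

For n = 1.4 at 300 K the swing is roughly 83 mV/decade, compared with the ideal limit of about 60 mV/decade at n = 1.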
Personality retesting for managing intentional distortion.
Ellingson, Jill E; Heggestad, Eric D; Makarius, Erin E
2012-05-01
Self-report personality questionnaires often contain validity scales designed to flag individuals who intentionally distort their responses toward a more favorable characterization of themselves. Yet, there are no clear directives on how scores on these scales should be used by administrators when making high-stakes decisions about respondents. Two studies were conducted to investigate whether administrator-initiated retesting of flagged individuals represents a viable response to managing intentional distortion on personality questionnaires. We explored the effectiveness of retesting by considering whether retest responses are more accurate representations of a flagged individual's personality characteristics. A comparison of retest scores to a baseline measure of personality indicated that such scores were more accurate. Retesting should only work as a strategy for dealing with intentional distortion when individuals choose to respond more accurately the second time. Thus, we further explored the emotional reaction to being asked to retest as one possible explanation of why individuals who engage in intentional distortion respond more accurately upon retest.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hardisty, M.; Gordon, L.; Agarwal, P.
2007-08-15
Quantitative assessment of metastatic disease in bone is often considered immeasurable and, as such, patients with skeletal metastases are often excluded from clinical trials. In order to effectively quantify the impact of metastatic tumor involvement in the spine, accurate segmentation of the vertebra is required. Manual segmentation can be accurate but involves extensive and time-consuming user interaction. Potential solutions for automating the segmentation of metastatically involved vertebrae are demons deformable image registration and level set methods. The purpose of this study was to develop a semiautomated method to accurately segment tumor-bearing vertebrae using these techniques. By maintaining the morphology of an atlas, the composite demons-level set algorithm was able to accurately differentiate between trans-cortical tumors and surrounding soft tissue of identical intensity. The algorithm successfully segmented both the vertebral body and the trabecular centrum of tumor-involved and healthy vertebrae. This work validates our approach as equivalent in accuracy to an experienced user.
A practical model for pressure probe system response estimation (with review of existing models)
NASA Astrophysics Data System (ADS)
Hall, B. F.; Povey, T.
2018-04-01
The accurate estimation of the unsteady response (bandwidth) of pneumatic pressure probe systems (probe, line and transducer volume) is a common practical problem encountered in the design of aerodynamic experiments. Understanding the bandwidth of the probe system is necessary to capture unsteady flow features accurately. Where traversing probes are used, the desired traverse speed and spatial gradients in the flow dictate the minimum probe system bandwidth required to resolve the flow. Existing approaches for bandwidth estimation are either complex or inaccurate in implementation, so probes are often designed based on experience. Where probe system bandwidth is characterized, it is often done experimentally, requiring careful experimental set-up and analysis. There is a need for a relatively simple but accurate model for estimation of probe system bandwidth. A new model is presented for the accurate estimation of pressure probe bandwidth for simple probes commonly used in wind tunnel environments; experimental validation is provided. An additional, simple graphical method for air is included for convenience.
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement, either processors are divided among the newly generated tasks (number of new tasks < number of processors) or tasks are divided among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load imbalance at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference: Reyes, Ruymán; Dorta, Antonio J.; Almeida, Francisco; de Sande, Francisco (2009). Automatic Hybrid MPI+OpenMP Code Generation with llc. Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, Volume 5759, 185-195.
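The default even division of processors among tasks (or tasks among processors) described above can be illustrated with a short sketch. This is a hypothetical Python illustration, not PM's actual scheduler: `even_split` and `assign` are invented names, processors are plain integer ids, and the case where the number of tasks equals the number of processors (which the description leaves open) is arbitrarily handled by giving each task one processor.

```python
def even_split(n_items, n_bins):
    """Split range(n_items) into n_bins contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_bins)
    blocks, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)  # the first `extra` blocks get one more item
        blocks.append(range(start, start + size))
        start += size
    return blocks


def assign(n_tasks, procs):
    """Hypothetical even assignment for one parallel statement.

    Returns one list of processor ids per task. With at least as many processors
    as tasks, each task owns a contiguous block of processors (which a nested
    parallel statement could subdivide again); otherwise each processor runs a
    contiguous block of tasks.
    """
    if not procs:
        raise ValueError("a task must own at least one processor")
    if n_tasks == 0:
        return []
    if n_tasks <= len(procs):
        # fewer tasks than processors: divide the processors among the tasks
        return [[procs[i] for i in block] for block in even_split(len(procs), n_tasks)]
    # more tasks than processors: divide the tasks among the processors
    owners = []
    for p, block in zip(procs, even_split(n_tasks, len(procs))):
        owners.extend([p] for _ in block)
    return owners


top = assign(3, list(range(8)))   # main program owns processors 0..7
print(top)                        # [[0, 1, 2], [3, 4, 5], [6, 7]]
print(assign(2, top[0]))          # nested statement inside task 0: [[0, 1], [2]]
print(assign(5, [0, 1]))          # [[0], [0], [0], [1], [1]]
```

Passing a task's own processor list back into `assign` mimics how a nested parallel statement subdivides only the processors its parent owns; uneven, programmer-controlled distributions and task migration are not modeled here.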
Photochemistry of Aqueous C60 Clusters: Wavelength Dependency and Product Characterization
To construct accurate risk assessment models for engineered nanomaterials, there is urgent need for information on the reactivity (or conversely, persistence) and transformation pathways of these materials in the natural environment. As an important step toward addressing this is...
Determination of Ten Perfluorinated Compounds in Bluegill Sunfish (Lepomis macrochirus) Fillets
Limited information is known about the environmental distributions of the perfluorinated compounds (PFCs) such as perfluorooctane sulfonate (PFOS) and perfluorooctanoic acid (PFOA), in part due to a lack of well characterized analytical methods that can be used to accurately mea...
Accurate and affordable physicochemical characterization of commercial engineered nanomaterials is required for toxicology studies to ultimately determine nanomaterial: hazard identification; dose to response metric(s); and mechanism(s) of injury. A minimal physical and chemica...
A high-quality annotated transcriptome of swine peripheral blood
USDA-ARS's Scientific Manuscript database
Background: High throughput gene expression profiling assays of peripheral blood are widely used in biomedicine, as well as in animal genetics and physiology research. Accurate, comprehensive, and precise interpretation of such high throughput assays relies on well-characterized reference genomes an...
Ecosystem Health: Energy Indicators.
Just as for human beings health is a concept that applies to the condition of the whole organism, the health of an ecosystem refers to the condition of the ecosystem as a whole. For this reason, the study and characterization of ecosystems is fundamental to establishing accurate ...
The Dyslexia Simulation: Impact and Implications
ERIC Educational Resources Information Center
Wadlington, Elizabeth; Elliot, Cynthia; Kirylo, James
2008-01-01
Many students with reading difficulties have a specific learning disability called dyslexia, which is neurobiological in origin and characterized by problems with spelling, decoding, and accurate/fluent word identification, negatively impacting vocabulary growth and comprehension. Consequently, the role of the insightful teacher is critical in…
Characterization of lipid-rich plaques using spectroscopic optical coherence tomography
NASA Astrophysics Data System (ADS)
Nam, Hyeong Soo; Song, Joon Woo; Jang, Sun-Joo; Lee, Jae Joong; Oh, Wang-Yuhl; Kim, Jin Won; Yoo, Hongki
2016-07-01
Intravascular optical coherence tomography (IV-OCT) is a high-resolution imaging method used to visualize the internal structures of walls of coronary arteries in vivo. However, accurate characterization of atherosclerotic plaques with gray-scale IV-OCT images is often limited by various intrinsic artifacts. In this study, we present an algorithm for characterizing lipid-rich plaques with a spectroscopic OCT technique based on a Gaussian center of mass (GCOM) metric. The GCOM metric, which reflects the absorbance properties of lipids, was validated using a lipid phantom. In addition, the proposed characterization method was successfully demonstrated in vivo using an atherosclerotic rabbit model and was found to have a sensitivity and specificity of 94.3% and 76.7% for lipid classification, respectively.
Systems Characterization of Combustor Instabilities With Controls Design Emphasis
NASA Technical Reports Server (NTRS)
Kopasakis, George
2004-01-01
This effort performed test data analysis in order to characterize the general behavior of combustor instabilities with emphasis on controls design. The analysis is performed on data obtained from two configurations of a laboratory combustor rig and from a developmental aero-engine combustor. The study has characterized several dynamic behaviors associated with combustor instabilities: frequency and phase randomness, amplitude modulations, net random phase walks, random noise, exponential growth, and intra-harmonic couplings. Finally, the underlying cause of combustor instabilities was explored; it could be attributed to a more general source-load type impedance interaction that includes the thermo-acoustic coupling. Performing these characterizations on different combustors allows for more accurate identification of the cause of these phenomena and their effect on instability.
2015-01-01
In vitro toxicity assessment of engineered nanomaterials (ENM), the most common testing platform for ENM, requires prior ENM dispersion, stabilization, and characterization in cell culture media. Dispersion inefficiencies and active aggregation of particles often result in polydisperse and multimodal particle size distributions. Accurate characterization of important properties of such polydisperse distributions (size distribution, effective density, charge, mobility, aggregation kinetics, etc.) is critical for understanding differences in the effective dose delivered to cells as a function of time and dispersion conditions, as well as for nano–bio interactions. Here we have investigated the utility of tunable nanopore resistive pulse sensing (TRPS) technology for characterization of four industry-relevant ENMs (oxidized single-walled carbon nanohorns, carbon black, cerium oxide and nickel nanoparticles) in cell culture media containing serum. The Harvard dispersion and dosimetry platform was used for preparing ENM dispersions and estimating the dose delivered to cells based on dispersion characterization input from dynamic light scattering (DLS) and TRPS. The slopes of cell death vs administered and delivered ENM dose were then derived and compared. We investigated the impact of serum protein content, ENM concentration, and cell medium on the size distributions. The TRPS technology offers higher resolution and sensitivity compared to DLS and unique insights into ENM size distribution and concentration, as well as particle behavior and morphology in complex media. The in vitro dose–response slopes changed significantly for certain nanomaterials when the dose delivered to cells was taken into consideration, highlighting the importance of accurate dispersion and dosimetry in in vitro nanotoxicology. PMID:25093451
The dynamics of turbulent premixed flames: Mechanisms and models for turbulence-flame interaction
NASA Astrophysics Data System (ADS)
Steinberg, Adam M.
The use of turbulent premixed combustion in engines has been garnering renewed interest due to its potential to reduce NOx emissions. However, many aspects of turbulence-flame interaction must be better understood before such flames can be accurately modeled. The focus of this dissertation is to develop an improved understanding of the manner in which turbulence interacts with a premixed flame in the 'thin flamelet regime'. To do so, two new diagnostics were developed and employed in a turbulent slot Bunsen flame. These diagnostics, Cinema-Stereoscopic Particle Image Velocimetry and Orthogonal-Plane Cinema-Stereoscopic Particle Image Velocimetry, provided temporally resolved velocity and flame surface measurements in two and three dimensions at rates of up to 3 kHz and spatial resolutions as fine as 280 μm. Using these measurements, the mechanisms by which turbulence generates flame surface area were studied. It was found that the previous concept that flame stretch is characterized by counter-rotating vortex pairs does not accurately describe real turbulence-flame interactions. Analysis of the experimental data showed that the straining of the flame surface is determined by coherent structures of fluid dynamic strain rate, while the wrinkling is caused by vortical structures. Furthermore, it was shown that the canonical vortex pair configuration is not an accurate reflection of the real interaction geometry. Hence, models developed based on this geometry are unlikely to be accurate. Previous models for the strain rate, curvature stretch rate, and turbulent burning velocity were evaluated. It was found that the previous models did not accurately predict the measured data for a variety of reasons: the assumed interaction geometries did not encompass enough possibilities to describe the possible effects of real turbulence, the turbulence was not properly characterized, and the transport of flame surface area was not always considered.
New models therefore were developed that accurately reflect real turbulence-flame interactions and agree with the measured data. These can be implemented in Large Eddy Simulations to provide improved modeling of turbulence-flame interaction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rogers, D.M.; Coggins, T.L.; Marsh, J.
Numerous efforts are funded by US agencies (DOE, DoD, DHS) for development of novel radiation sensing and measurement systems. An effort has been undertaken to develop a flexible shielding system compatible with a variety of sources (beta, X-ray, gamma, and neutron) that can be highly characterized using conventional radiation detection and measurement systems. Sources available for use in this system include americium-beryllium (AmBe), plutonium-beryllium (PuBe), strontium-90 (Sr-90), californium-252 (Cf-252), krypton-85 (Kr-85), americium-241 (Am-241), and depleted uranium (DU). Shielding can be varied by utilization of materials that include lexan, water, oil, lead, and polyethylene. Arrangements and geometries of source(s) and shielding can produce symmetrical or asymmetrical radiation fields. The system has been developed to facilitate accurately repeatable configurations. Measurement positions are similarly capable of being accurately re-created. Stand-off measurement positions can be accurately re-established using differential global positioning system (GPS) navigation. Instruments used to characterize individual measurement locations include a variety of sodium iodide (NaI(Tl)) (3 x 3 inch, 4 x 4 x 16 inch, Fidler) and lithium iodide (LiI(Eu)) detectors (for use with multichannel analyzer software) and detectors for use with traditional hand held survey meters such as boron trifluoride (BF₃), helium-3 (³He), and Geiger-Mueller (GM) tubes. Also available are Global Dosimetry thermoluminescent dosimeters (TLDs), CR39 neutron chips, and film badges. Data will be presented comparing measurement techniques with shielding/source configurations. The system is demonstrated to provide a highly functional process for comparison/characterization of various detector types relative to controllable radiation types and levels. Particular attention has been paid to use of neutron sources and measurements. (authors)
NASA Astrophysics Data System (ADS)
Chen, X.; Yao, G.; Cai, J.
2017-12-01
Pore structure characteristics, such as pore-throat ratio, pore connectivity, size distribution, and wettability, are important factors influencing the fluid transport behavior of porous media. To accurately characterize the diversity of pore structure among hydraulic flow units (HFUs), five samples from different HFUs (with approximately equal porosities but widely varying permeabilities) were selected for micro-computed tomography testing to acquire direct 3D images of pore geometries, and for mercury injection experiments to obtain the pore volume-radius distribution. To characterize the complex and highly nonlinear pore structure of all samples, three classic fractal geometry models were applied. Results showed that the HFUs have similar box-counting fractal dimensions and generalized fractal dimensions in the number-area model, but significantly different multifractal spectra. In the radius-volume model, there are three distinct linear segments, corresponding to three fractal dimension values, and the middle one is shown to be the actual fractal dimension according to the maximum radius. In the number-radius model, the spherical-pore size distribution extracted by the maximum ball algorithm shows a decrease in the number of small pores relative to the fractal power law, departing from the traditional linear law. Among the three models, only multifractal analysis can classify the HFUs accurately. Additionally, due to the tightness and low permeability of reservoir rocks, connate water films on the inner surfaces of pore channels commonly form bound water. The conventional Yu-Cheng model has been shown to be typically not applicable. Considering the effect of irreducible water saturation, an improved fractal permeability model was also derived theoretically. The comparison results showed that the improved model can be applied to calculate permeability directly and accurately in such unconventional rocks.
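The box-counting fractal dimension mentioned above is conventionally estimated as the slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain pore space. The sketch below assumes a 2D binary pore mask (for example, one segmented slice of a micro-CT image); the function names are hypothetical, and this is a generic illustration of box counting rather than the authors' processing pipeline.

```python
import numpy as np


def box_count(mask, size):
    """Count size-by-size boxes on a regular grid that contain at least one pore pixel."""
    h, w = mask.shape
    count = 0
    for i in range(0, h, size):
        for j in range(0, w, size):
            if mask[i:i + size, j:j + size].any():
                count += 1
    return count


def box_counting_dimension(mask, sizes):
    """Least-squares slope of log N(s) versus log(1/s); assumes every N(s) > 0."""
    sizes = np.asarray(sizes, dtype=float)
    counts = np.array([box_count(mask, int(s)) for s in sizes], dtype=float)
    slope, _intercept = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope


# Sanity checks on synthetic masks: a filled square is 2-D, a single row is 1-D.
square = np.ones((64, 64), dtype=bool)
line = np.zeros((64, 64), dtype=bool)
line[0, :] = True
sizes = [1, 2, 4, 8, 16, 32]
print(round(box_counting_dimension(square, sizes), 3))  # 2.0
print(round(box_counting_dimension(line, sizes), 3))    # 1.0
```

Real pore images give non-integer slopes; as the abstract reports, a single box-counting dimension may be similar across HFUs even where multifractal spectra differ.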
NASA Technical Reports Server (NTRS)
Romanofsky, Robert R.; Shalkhauser, Kurt A.
1989-01-01
The design and evaluation of a novel fixturing technique for characterizing millimeter wave solid state devices is presented. The technique utilizes a cosine-tapered ridge guide fixture and a one-tier de-embedding procedure to produce accurate and repeatable device level data. Advanced features of this technique include nondestructive testing, full waveguide bandwidth operation, universality of application, and rapid, yet repeatable, chip-level characterization. In addition, only one set of calibration standards is required regardless of the device geometry.
Ultrasound measurement apparatus for liquids characterization
NASA Astrophysics Data System (ADS)
Vieira, R. C.; Costa-Felix, R. P. B.
2018-03-01
The present paper reports the validation of an experimental ultrasound apparatus and method for liquids characterization. The research aims to establish a simple, reliable, accurate and portable way to identify contaminants in hydrocarbon substances, such as adulteration in gasoline. The results so far demonstrate an overall uncertainty in speed-of-sound assessment of less than 10 m s⁻¹ and a distance accuracy better than 1%. Those figures are good enough for an in situ device to evaluate possible contamination of fuels or other liquids.
NASA Technical Reports Server (NTRS)
Banks, Daniel W.
2008-01-01
Infrared thermography is a powerful tool for investigating fluid mechanics on flight vehicles; it can be used to visualize and characterize transition, shock impingement, separation, etc. An updated onboard F-15-based system was used to visualize boundary layer transition on a supersonic test article (Tollmien-Schlichting and cross-flow dominant flow fields). Digital recording improves image quality and analysis capability: it allows accurate quantitative (temperature) measurements, and greater enhancement through image processing allows analysis of smaller-scale phenomena.
A Parallel Stochastic Framework for Reservoir Characterization and History Matching
Thomas, Sunil G.; Klie, Hector M.; Rodriguez, Adolfo A.; ...
2011-01-01
The spatial distribution of parameters that characterize the subsurface is never known to the level of accuracy required to solve the governing PDEs of multiphase flow or species transport through porous media. This paper presents a numerically cheap, yet efficient, accurate and parallel framework to estimate reservoir parameters, for example medium permeability, using sensor information from measurements of solution variables such as phase pressures, phase concentrations and fluxes, as well as seismic and well log data. Numerical results are presented to demonstrate the method.
NASA Technical Reports Server (NTRS)
Hardage, Donna (Technical Monitor); Davis, V. A.; Mandell, M. J.; Thomsen, M. F.
2003-01-01
An improved specification of the plasma environment has been developed for use in modeling spacecraft charging. It was developed by statistically analyzing a large part of the LANL Magnetospheric Plasma Analyzer (MPA) data set for correlations between ion and electron spectral signatures and spacecraft charging, including anisotropies. The objective is to identify a relatively simple characterization of the full particle distributions that yields an accurate prediction of the observed charging under a wide variety of conditions.
Cognitive learning: a machine learning approach for automatic process characterization from design
NASA Astrophysics Data System (ADS)
Foucher, J.; Baderot, J.; Martinez, S.; Dervillé, A.; Bernard, G.
2018-03-01
Cutting-edge innovation requires accurate and fast process control to achieve a fast learning rate and industry adoption. Current tools available for such tasks are mainly manual and user-dependent. In this paper we present cognitive learning, a new machine-learning-based technique that facilitates and speeds up complex characterization by using the design as input, providing fast training and detection times. We focus on the machine learning framework that enables object detection, defect traceability and automatic measurement tools.
NASA Astrophysics Data System (ADS)
Medialdea, Alicia; Bateman, Mark D.; Evans, David J.; Roberts, David H.; Chiverrell, Richard C.; Clark, Chris D.
2017-04-01
BRITICE-CHRONO is a NERC-funded consortium project of more than 40 researchers aiming to establish the retreat patterns of the last British and Irish Ice Sheet. For this purpose, optically stimulated luminescence (OSL) dating, among other dating techniques, has been used to establish an accurate chronology. More than 150 samples from glacial environments have been dated and provide key information for modelling the ice retreat. Nevertheless, luminescence dating of glacial sediments has proven challenging: first, glacial sediments were often affected by incomplete bleaching; second, quartz grains within the sampled sediments often showed complex luminescence behaviour, characterized by dim signals and low reproducibility. Specific statistical approaches have been used to overcome the former, so that the estimated ages are based on the grain populations most likely to have been well bleached. This latest work presents how issues surrounding complex luminescence behaviour were overcome in order to obtain accurate OSL ages. This study was performed on two samples of bedded sand originating from an ice-walled lake plain in Lincolnshire, UK. Quartz extracts from each sample were artificially bleached and irradiated to known doses. Dose recovery tests were carried out under different conditions to study the effects of preheat temperature, thermal quenching, contribution of slow components, a hot bleach after measurement cycles, and IR stimulation. Measurements were performed on different luminescence readers to study the possible contribution of instrument reproducibility. These have shown that great variability can be observed not only among the studied samples but also within a specific site and even within a specific sample. To determine an accurate chronology and realistic uncertainties for the estimated ages, this variability must be taken into account.
Tight acceptance criteria were applied to doses measured on natural (unexposed) aliquots. These yielded reproducible dose distributions from which accurate ages could be estimated.
NASA Technical Reports Server (NTRS)
Lucero, John M.
2003-01-01
A new optically based measuring capability that characterizes surface topography, geometry, and wear has been employed by NASA Glenn Research Center's Tribology and Surface Science Branch. To characterize complex parts in more detail, we are using a three-dimensional surface structure analyzer, the NewView5000 manufactured by Zygo Corporation (Middlefield, CT). This system provides graphical images and high-resolution numerical analyses to accurately characterize surfaces. Because of the inherent complexity of the various analyzed assemblies, the machine has been pushed to its limits. For example, special hardware fixtures and measuring techniques were developed specifically to characterize Oil-Free thrust bearings. We performed a more detailed wear analysis using scanning white light interferometry to image and measure the bearing structure and topography, enabling a further understanding of the causes of bearing failure.
NASA Astrophysics Data System (ADS)
Marshall, Hans-Peter
The distribution of water in the snow-covered areas of the world is an important climate change indicator, and it is a vital component of the water cycle. At local and regional scales, the snow water equivalent (SWE), the amount of liquid water a given area of the snowpack represents, is very important for water resource management, flood forecasting, and prediction of available hydropower energy. Measurements from only a few automatic weather stations, such as the SNOTEL network, or sparse manual snowpack measurements are typically extrapolated for estimating SWE over an entire basin. Widespread spatial variability in the distribution of SWE and snowpack stratigraphy at local scales causes large errors in these basin estimates. Remote sensing measurements offer a promising alternative, due to their large spatial coverage and high temporal resolution. Although snow cover extent can currently be estimated from remote sensing data, accurately quantifying SWE from remote sensing measurements has remained difficult, due to a high sensitivity to variations in grain size and stratigraphy. In alpine snowpacks, the large degree of spatial variability of snowpack properties and geometry, caused by topographic, vegetative, and microclimatic effects, also makes prediction of snow avalanches very difficult. Ground-based radar and penetrometer measurements can quickly and accurately characterize snowpack properties and SWE in the field. A portable lightweight radar was developed, and allows a real-time estimate of SWE to within 10%, as well as measurements of depths of all major density transitions within the snowpack. New analysis techniques developed in this thesis allow accurate estimates of mechanical properties and an index of grain size to be retrieved from the SnowMicroPenetrometer. 
These two tools together allow rapid characterization of the snowpack's geometry, mechanical properties, and SWE, and are used to guide a finite element model to study the stress distribution on a slope. The ability to accurately characterize snowpack properties at much higher resolutions and spatial extent than previously possible will hopefully help lead to a more complete understanding of spatial variability, its effect on remote sensing measurements and snow slope stability, and result in improvements in avalanche prediction and accuracy of SWE estimates from space.
USDA-ARS's Scientific Manuscript database
Optical characterization of biological materials is useful in many scientific and industrial applications like biomedical diagnosis and nondestructive quality evaluation of food and agricultural products. However, accurate determination of the optical properties from intact biological materials base...
USDA-ARS's Scientific Manuscript database
Accurate stream topography measurement is important for many ecological applications such as hydraulic modeling and habitat characterization. Habitat complexity measures are often made using total station surveying or visual approximation, which can be subjective and have spatial resolution limitati...
Teacher Evaluation: The Limits of Looking.
ERIC Educational Resources Information Center
Stodolsky, Susan S.
1984-01-01
Reviews current teacher evaluation practices with particular focus on the use of observation. Argues that direct observation is an inadequate evaluation technique because it assumes that stability and consistency are necessary for effective teaching. Presents data showing that flexibility is a more accurate characterization of elementary level…
Accurate and precise characterization of exposure of aquatic ecological resources to chemical stressors is required for ecological risk assessment. Within this assessment, the study of the vulnerability of these resources requires comparative exposure assessments across watershe...
TEMPORAL VARIABILITY OF MICROBIAL INDICATORS OF FECAL CONTAMINATION OF MARINE AND FRESHWATER BEACHES
Monitoring methods for microbial indicators of fecal contamination are an integral component for protecting the health of swimmers exposed to potentially contaminated bathing beach waters. The design of monitoring systems which will accurately characterize the quality of water is...
Accurate Biomass Estimation via Bayesian Adaptive Sampling
NASA Technical Reports Server (NTRS)
Wheeler, Kevin R.; Knuth, Kevin H.; Castle, Joseph P.; Lvov, Nikolay
2005-01-01
The following concepts were introduced: (a) Bayesian adaptive sampling for solving biomass estimation; (b) characterization of MISR Rahman model parameters conditioned upon MODIS land cover; (c) a rigorous non-parametric Bayesian approach to analytic mixture model determination; and (d) a unique U.S. asset for science product validation and verification.
Predicting the degradability of waste activated sludge.
Jones, Richard; Parker, Wayne; Zhu, Henry; Houweling, Dwight; Murthy, Sudhir
2009-08-01
The objective of this study was to identify methods for estimating the anaerobic digestibility of waste activated sludge (WAS). The WAS streams were generated in three sequencing batch reactors (SBRs) treating municipal wastewater. The wastewater and WAS properties were initially determined through simulation of SBR operation with BioWin (EnviroSim Associates Ltd., Flamborough, Ontario, Canada). Samples of WAS from the SBRs were subsequently characterized through respirometry and batch anaerobic digestion. Respirometry was an effective tool for characterizing the active fraction of WAS and could be a suitable technique for determining sludge composition for input to anaerobic models. Anaerobic digestion of the WAS revealed decreasing methane production and lower chemical oxygen demand removals as the solids retention time (SRT) of the sludge increased. BioWin was capable of accurately describing the digestion of the WAS samples for typical digester SRTs. For extended digestion times (i.e., greater than 30 days), some degradation of the endogenous decay products had to be assumed to achieve accurate simulations for all sludge SRTs.
Emissions & Measurements - Black Carbon
Emissions and Measurement (EM) research activities performed within the National Risk Management Research Lab (NRMRL) of EPA's Office of Research and Development (ORD) support measurement and laboratory analysis approaches to accurately characterize source emissions and near-source concentrations of air pollutants. They also support integrated Agency research programs (e.g., source to health outcomes) and the development of databases and inventories that help Federal, state, and local air quality managers and industry implement and comply with air pollution standards. EM research underway in NRMRL supports the Agency's efforts to accurately characterize, analyze, measure and manage sources of air pollution. This pamphlet focuses on the EM research that NRMRL researchers conduct related to black carbon (BC). Black carbon is a pollutant of concern to EPA due to its potential impact on human health and climate change. There are extensive uncertainties in emissions of BC from stationary and mobile sources.
NASA Astrophysics Data System (ADS)
Strick, Terence R.; Charvin, Gilles; Dekker, Nynke H.; Allemand, Jean-François; Bensimon, David; Croquette, Vincent
In this article, we describe single-molecule assays using magnetic traps and apply these assays to topoisomerase enzymes, which unwind and disentangle DNA molecules. First, the elasticity of a single DNA molecule is characterized using the magnetic trap. We show that a twisting constraint may be easily applied and that its effect upon DNA may be measured accurately. Then we describe how topoisomerase activity may be observed at the single-molecule level, giving direct access to important biological parameters of the enzyme such as velocity and processivity. Furthermore, individual cycles of unwinding can be observed in real time. This permits an accurate characterization of the enzyme's biochemical cycle. The data treatment required to identify and analyze individual topoisomerization cycles is presented in detail. This analysis is applicable to a wide variety of molecular motors. To cite this article: T.R. Strick et al., C. R. Physique 3 (2002) 595-618.
Analysis of hydraulic fracturing additives by LC/Q-TOF-MS.
Ferrer, Imma; Thurman, E Michael
2015-08-01
The chemical additives used in fracturing fluids can be used as tracers of water contamination caused by hydraulic fracturing operations. For this purpose, a complete chemical characterization is necessary using advanced analytical techniques. Liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (LC/Q-TOF-MS) was used to identify chemical additives present in flowback and produced waters. Accurate mass measurements of main ions and fragments were used to characterize the major components of fracking fluids. Sodium adducts turned out to be the main molecular adduct ions detected for some additives due to their oxygen-rich structures. The classes of chemical components analyzed by mass spectrometry include gels (guar gum), biocides (glutaraldehyde and alkyl dimethyl benzyl ammonium chloride), and surfactants (cocamidopropyl dimethylamines, cocamidopropyl hydroxysultaines, and cocamidopropyl derivatives). The capabilities of accurate mass and MS-MS fragmentation are explored for the unequivocal identification of these compounds. Special emphasis is given to the mass spectrometry elucidation approaches used to identify a major class of hydraulic fracturing compounds, surfactants.
Kim, Oh Seok; Newell, Joshua P
2015-10-01
This paper proposes a new land-change model, the Geographic Emission Benchmark (GEB), as an approach to quantify land-cover changes associated with deforestation and forest degradation. The GEB is designed to determine 'baseline' activity data for reference levels. Unlike other models that forecast business-as-usual future deforestation, the GEB internally (1) characterizes 'forest' and 'deforestation' with minimal processing and ground-truthing and (2) identifies 'deforestation hotspots' using open-source spatial methods to estimate regional rates of deforestation. The GEB also characterizes forest degradation and identifies leakage belts. This paper compares the accuracy of GEB with GEOMOD, a popular land-change model used in the UN-REDD (Reducing Emissions from Deforestation and Forest Degradation) Program. Using a case study of the Chinese tropics for comparison, GEB's projection is more accurate than GEOMOD's, as measured by Figure of Merit. Thus, the GEB produces baseline activity data that are moderately accurate for the setting of reference levels.
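Figure of Merit, as commonly used to validate land-change models, is the ratio of correctly predicted change (hits) to the union of observed and predicted change. The sketch below assumes a single change category (deforestation vs. no deforestation), in which case FOM = hits / (hits + misses + false alarms); the function name and the small map arrays are hypothetical, and the multi-category form with an additional "wrong hits" term is not shown.

```python
import numpy as np


def figure_of_merit(observed_change, predicted_change):
    """Figure of Merit for binary change maps: hits / (hits + misses + false alarms).

    Both arguments are boolean arrays of the same shape, True where a cell changed
    (e.g. was deforested) over the validation interval. Returns NaN if neither map
    contains any change.
    """
    obs = np.asarray(observed_change, dtype=bool)
    pred = np.asarray(predicted_change, dtype=bool)
    hits = np.sum(obs & pred)             # observed change predicted as change
    misses = np.sum(obs & ~pred)          # observed change predicted as persistence
    false_alarms = np.sum(~obs & pred)    # observed persistence predicted as change
    denom = hits + misses + false_alarms
    return float(hits) / float(denom) if denom else float("nan")


observed = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
predicted = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(figure_of_merit(observed, predicted))  # 2 hits / (2 + 1 + 1) = 0.5
```

A value of 1 means predicted and observed change coincide exactly; 0 means they do not overlap at all. Correctly predicted persistence does not enter the ratio, which keeps the score from being inflated by large unchanged areas.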
Accelerated viscoelastic characterization of T300-5208 graphite-epoxy laminates
NASA Technical Reports Server (NTRS)
Tuttle, M. E.; Brinson, H. F.
1985-01-01
A viscoelastic response scheme for the accelerated characterization of polymer-based composite laminates is applied to T300/5208 graphite/epoxy. The response of unidirectional specimens is modeled. The transient component of the viscoelastic creep compliance is assumed to follow a power-law approximation. A recursive relationship is developed, based upon the Schapery single-integral equation, which allows a continuous time-varying uniaxial load to be approximated by discrete steps in stress. The viscoelastic response of T300/5208 to transverse normal and shear stresses is determined using 90 deg and 10 deg off-axis tensile specimens. In each case, the seven viscoelastic material parameters required in the analysis are determined experimentally using short-term creep and creep recovery tests. It is shown that an accurate measure of the power-law exponent is crucial for accurate long-term prediction. A short-term test cycle selection procedure is proposed, which should provide useful guidelines for the evaluation of other viscoelastic materials.
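The step-wise load approximation can be illustrated in its simplest form. Schapery's single-integral equation includes stress-dependent nonlinearity factors; setting them all to one reduces it to linear (Boltzmann) superposition, which is what this sketch assumes. It is not the authors' recursive relationship, and the parameter names `d0`, `d1`, `n` and all values are hypothetical.

```python
from typing import List, Tuple


def creep_compliance(t: float, d0: float, d1: float, n: float) -> float:
    """Power-law creep compliance D(t) = D0 + D1 * t**n, for t >= 0."""
    return d0 + d1 * t ** n


def strain_from_stress_steps(
    t: float, steps: List[Tuple[float, float]], d0: float, d1: float, n: float
) -> float:
    """Strain at time t for a load approximated by discrete stress steps.

    steps is a time-sorted list of (t_i, sigma_i): the stress jumps to sigma_i at t_i.
    Linear superposition: each jump contributes (sigma_i - sigma_prev) * D(t - t_i).
    """
    strain = 0.0
    previous_sigma = 0.0
    for t_i, sigma_i in steps:
        if t < t_i:
            break
        strain += (sigma_i - previous_sigma) * creep_compliance(t - t_i, d0, d1, n)
        previous_sigma = sigma_i
    return strain


# Hypothetical creep (stress 10 from t = 0) followed by recovery (unloaded at t = 4).
steps = [(0.0, 10.0), (4.0, 0.0)]
for t in (1.0, 4.0, 9.0):
    print(t, strain_from_stress_steps(t, steps, d0=1.0, d1=0.5, n=0.5))
```

The sensitivity to the exponent noted in the abstract is easy to see: at t = 10^6 (in whatever time unit the fit uses), raising n from 0.20 to 0.22 multiplies the transient term by 10^0.12, or about 1.32.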
Development of a detector model for generation of synthetic radiographs of cargo containers
NASA Astrophysics Data System (ADS)
White, Timothy A.; Bredt, Ofelia P.; Schweppe, John E.; Runkle, Robert C.
2008-05-01
Creation of synthetic cargo-container radiographs that possess the attributes of their empirical counterparts requires accurate models of the imaging-system response. Synthetic radiographs serve as surrogate data in studies aimed at determining system effectiveness for detecting target objects when it is impractical to collect a large set of empirical radiographs. When a detailed understanding of the detector system is available, an accurate detector model can be derived from first principles. In the absence of this detail, empirical models of the imaging-system response must be derived from radiographs of well-characterized objects. Such a case is the topic of this work, in which we develop an empirical model of a gamma-ray radiography system, with the intent of creating a detector-response model that translates uncollided photon transport calculations into realistic synthetic radiographs. The detector-response model is calibrated to field measurements of well-characterized objects, thus incorporating properties such as system sensitivity, spatial resolution, contrast, and noise.
Nishiyama, Junpei; Hashimoto, Tsutomu; Sakashita, Yusuke; Fujiyoshi, Hironobu; Hirata, Yutaka
2008-01-01
Eye movements are used in many scientific studies as a probe that reflects the neural representation of three-dimensional extrapersonal space. This study proposes a method to accurately measure the roll component of eye movements under conditions in which the pupil diameter changes. The roll angle of a test image is generally estimated by matching its iris pattern against that of a reference image. However, iris patterns change with pupil size, so roll-angle estimation becomes less accurate when the pupil sizes in the test and reference images differ. We characterized the non-uniform contraction/expansion of the iris pattern caused by pupil dilation/constriction, and developed an algorithm that converts an iris pattern with an arbitrary pupil size into one with the same pupil size as the reference iris pattern. The proposed method improved the accuracy of roll eye-movement measurement by up to 76.9%.
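The baseline matching step can be sketched under one assumption: the iris pattern has been unwrapped into a 1-D angular intensity profile sampled at n equally spaced angles at a fixed radius, so a roll of the eye becomes a circular shift of that profile. The sketch finds the shift by circular cross-correlation; it deliberately omits the authors' pupil-size normalization, which is the part that corrects the pattern distortion. Function names and the example profile are hypothetical.

```python
import math
import random
from typing import Sequence


def estimate_roll_shift(reference: Sequence[float], test: Sequence[float]) -> int:
    """Return k such that test[i] best matches reference[i - k] (circularly).

    Uses the mean-removed circular cross-correlation of two angular iris
    profiles sampled at the same n equally spaced angles.
    """
    n = len(reference)
    if n == 0 or len(test) != n:
        raise ValueError("profiles must be non-empty and equally long")
    ref_mean = sum(reference) / n
    test_mean = sum(test) / n
    best_shift, best_score = 0, -math.inf
    for shift in range(n):
        score = sum(
            (reference[i] - ref_mean) * (test[(i + shift) % n] - test_mean) for i in range(n)
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def shift_to_degrees(shift: int, n: int) -> float:
    """Convert a sample shift to a signed roll angle in (-180, 180] degrees."""
    angle = 360.0 * shift / n
    return angle - 360.0 if angle > 180.0 else angle


# Hypothetical profile with 72 samples (5 degrees each), rolled by 6 samples.
rng = random.Random(0)
reference = [rng.random() for _ in range(72)]
test = [reference[(i - 6) % 72] for i in range(72)]
shift = estimate_roll_shift(reference, test)
print(shift, shift_to_degrees(shift, 72))
```

When pupil size differs between images, the test profile is no longer a pure shift of the reference, the correlation peak broadens, and the estimate degrades; that is the error the abstract's conversion step addresses.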
Lien, Chi-Hsiang; Tilbury, Karissa; Chen, Shean-Jen; Campagnola, Paul J.
2013-01-01
Second Harmonic Generation (SHG) microscopy coupled with polarization analysis has great potential for tissue characterization, as molecular and supramolecular structural details can be extracted. Such measurements are difficult to perform quickly and accurately. Here we present a new method that uses a liquid crystal modulator (LCM) located in the infinity space of an SHG laser scanning microscope to generate any desired linear or circular polarization state. Because the device contains no moving parts, polarization can be rotated accurately and faster than with manual or motorized control. The polarization purity was validated using Stokes vector polarimetry and showed minimal residual polarization ellipticity. SHG polarization imaging characteristics were validated against well-characterized specimens having cylindrical and/or linear symmetries. The LCM has a small footprint, can be implemented easily in any standard microscope, and is cost-effective relative to other technologies. PMID:24156059
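Polarization purity checks of this kind usually reduce a measured Stokes vector to a degree of polarization and an ellipticity. The sketch below uses one common convention (stated in the comments); sign and normalization conventions differ between texts, the function name is hypothetical, and the example Stokes vector is made up rather than taken from the paper.

```python
import math
from typing import Tuple


def polarization_from_stokes(s0: float, s1: float, s2: float, s3: float) -> Tuple[float, float, float]:
    """Return (degree of polarization, ellipticity, orientation angle in degrees).

    Convention: 2*chi = atan2(S3, sqrt(S1^2 + S2^2)) and ellipticity = tan(chi),
    the signed minor/major axis ratio (0 for linear, +/-1 for circular light).
    Orientation psi satisfies 2*psi = atan2(S2, S1).
    """
    if s0 <= 0:
        raise ValueError("S0 (total intensity) must be positive")
    polarized = math.sqrt(s1 * s1 + s2 * s2 + s3 * s3)
    degree_of_polarization = polarized / s0
    chi = 0.5 * math.atan2(s3, math.hypot(s1, s2))
    orientation_deg = math.degrees(0.5 * math.atan2(s2, s1))
    return degree_of_polarization, math.tan(chi), orientation_deg


# Hypothetical measured Stokes vector of a nearly linear, horizontally polarized beam.
dop, ellipticity, psi = polarization_from_stokes(1.0, 0.98, 0.01, 0.02)
print(f"DOP={dop:.3f}, ellipticity={ellipticity:.4f}, orientation={psi:.2f} deg")
```

A "minimal residual ellipticity" result corresponds to a small |ellipticity| for states that are meant to be linear.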
NASA Astrophysics Data System (ADS)
Haywood, Raphaëlle D.; Vanderburg, Andrew; Mortier, Annelies; Giles, Helen A. C.; López-Morales, Mercedes; Lopez, Eric D.; Malavolta, Luca; Charbonneau, David; Collier Cameron, Andrew; Coughlin, Jeffrey L.; Dressing, Courtney D.; Nava, Chantanelle; Latham, David W.; Dumusque, Xavier; Lovis, Christophe; Molinari, Emilio; Pepe, Francesco; Sozzetti, Alessandro; Udry, Stéphane; Bouchy, François; Johnson, John A.; Mayor, Michel; Micela, Giusi; Phillips, David; Piotto, Giampaolo; Rice, Ken; Sasselov, Dimitar; Ségransan, Damien; Watson, Chris; Affer, Laura; Bonomo, Aldo S.; Buchhave, Lars A.; Ciardi, David R.; Fiorenzano, Aldo F.; Harutyunyan, Avet
2018-05-01
We present the confirmation of a small, moderately irradiated (F = 155 ± 7 F⊕) Neptune with a substantial gas envelope in a P = 11.8728787 ± 0.0000085 day orbit about a quiet, Sun-like G0V star, Kepler-1655. Based on our analysis of the Kepler light curve, we determined Kepler-1655b’s radius to be 2.213 ± 0.082 R⊕. We acquired 95 high-resolution spectra with Telescopio Nazionale Galileo/HARPS-N, enabling us to characterize the host star and determine an accurate mass for Kepler-1655b of 5.0 (+3.1/−2.8) M⊕ via Gaussian-process regression. Our mass determination excludes an Earth-like composition with 98% confidence. Kepler-1655b falls on the upper edge of the evaporation valley, in the relatively sparsely occupied transition region between rocky and gas-rich planets. It is therefore part of a population of planets that we should actively seek to characterize further.
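A back-of-the-envelope bulk density helps make the composition argument concrete. This sketch uses only the central mass and radius values and ignores their (asymmetric) uncertainties, so it is an illustration, not the paper's analysis; the 98% exclusion quoted above comes from the authors' full treatment. The function name is hypothetical.

```python
EARTH_MEAN_DENSITY_G_CM3 = 5.514  # mean density of Earth


def bulk_density_g_cm3(mass_earth: float, radius_earth: float) -> float:
    """Mean density from mass and radius in Earth units: rho = rho_Earth * M / R**3."""
    if radius_earth <= 0:
        raise ValueError("radius must be positive")
    return EARTH_MEAN_DENSITY_G_CM3 * mass_earth / radius_earth ** 3


rho = bulk_density_g_cm3(5.0, 2.213)  # central values for Kepler-1655b
print(f"Kepler-1655b bulk density ~ {rho:.2f} g/cm^3")
```

The result, roughly half of Earth's mean density, is consistent with (though not by itself proof of) a substantial gas envelope, since a rocky planet of this size would be compressed to a density above Earth's.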
Staudacher, Erich M.; Huetteroth, Wolf; Schachtner, Joachim; Daly, Kevin C.
2009-01-01
A central problem facing studies of neural encoding in sensory systems is how to accurately quantify the extent of spatial and temporal responses. In this study, we take advantage of the relatively simple and stereotypic neural architecture found in invertebrates. We combine standard electrophysiological techniques, recently developed population analysis techniques, and novel anatomical methods to form an innovative 4-dimensional view of odor output representations in the antennal lobe of the moth Manduca sexta. This novel approach allows quantification of olfactory responses of characterized neurons with spike time resolution. Additionally, arbitrary integration windows can be used for comparisons with other methods such as imaging. By assigning statistical significance to changes in neuronal firing, this method can visualize activity across the entire antennal lobe. The resulting 4-dimensional representation of antennal lobe output complements imaging and multi-unit experiments yet provides a more comprehensive and accurate view of glomerular activation patterns in spike time resolution. PMID:19464513
An Overview on Measurement-While-Drilling Technique and its Scope in Excavation Industry
NASA Astrophysics Data System (ADS)
Rai, P.; Schunesson, H.; Lindqvist, P.-A.; Kumar, U.
2015-04-01
Measurement-while-drilling (MWD) aims to collect accurate, rapid, and high-resolution information from production blast-hole drills in order to characterize the highly variable rock masses encountered in sub-surface excavations. The essence of the technique lies in combining the physical drill variables to yield a fairly accurate description of the sub-surface rock mass well ahead of downstream operations. In this light, the paper presents an overview of MWD by explaining the technique and its set-up, the existing drill-rock mass relationships, and numerous ongoing research efforts highlighting real-time applications. Although the paper acknowledges the importance of specific energy, the rock quality index, and a few other indices and techniques for rock mass characterization, it stresses that MWD is highly site-specific and therefore requires carefully derived site-specific calibration.
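Specific energy is one way of combining logged drill variables into a single index. The sketch below implements Teale's (1965) formulation for rotary drilling; production blast-hole drills are often percussive, where other formulations are used, so treat this as an illustration of the idea only. Input values are hypothetical, and as the abstract stresses, relating the index to rock mass properties requires site-specific calibration.

```python
import math


def teale_specific_energy(
    thrust_n: float,
    torque_n_m: float,
    rotation_rev_s: float,
    penetration_rate_m_s: float,
    hole_diameter_m: float,
) -> float:
    """Teale's specific energy for rotary drilling, in J/m^3 (equivalently Pa).

    e = F/A + (2*pi*N*T) / (A*u)
    F: thrust (N), T: torque (N*m), N: rotation speed (rev/s),
    u: penetration rate (m/s), A: hole cross-sectional area (m^2).
    """
    if penetration_rate_m_s <= 0 or hole_diameter_m <= 0:
        raise ValueError("penetration rate and hole diameter must be positive")
    area = math.pi * hole_diameter_m ** 2 / 4.0
    thrust_term = thrust_n / area
    rotary_term = 2.0 * math.pi * rotation_rev_s * torque_n_m / (area * penetration_rate_m_s)
    return thrust_term + rotary_term


# Hypothetical logged values for one depth interval of a 0.1 m hole.
e = teale_specific_energy(
    thrust_n=20e3, torque_n_m=1.5e3, rotation_rev_s=1.0,
    penetration_rate_m_s=0.01, hole_diameter_m=0.1,
)
print(f"specific energy ~ {e / 1e6:.0f} MJ/m^3")
```

Computed per depth interval along each hole, such an index yields the kind of sub-surface profile the technique aims to deliver ahead of downstream operations.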
Translocations, inversions and other chromosome rearrangements.
Morin, Scott J; Eccles, Jennifer; Iturriaga, Amanda; Zimmerman, Rebekah S
2017-01-01
Chromosomal rearrangements have long been known to significantly impact fertility and miscarriage risk. Advancements in molecular diagnostics are challenging contemporary clinicians and patients in accurately characterizing the reproductive risk of a given abnormality. Initial attempts at preimplantation genetic diagnosis were limited by the inability to evaluate aneuploidy simultaneously, and they missed up to 70% of aneuploidy in chromosomes unrelated to the rearrangement. Contemporary platforms are more accurate and less susceptible to technical errors. These techniques also offer the ability to improve outcomes through diagnosis of uniparental disomy and may soon be able to consistently distinguish between normal and balanced translocation karyotypes. Although an accurate projection of the anticipated number of unbalanced embryos is not possible at present, confirmation of normal/balanced status results in high pregnancy rates (PRs) and diagnostic accuracy. Copyright © 2016 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Brusseau, Mark L.; Guo, Zhilin
2018-01-01
Historical data show that groundwater contaminant plumes persist at many sites, requiring costly long-term management. High-resolution site-characterization methods are needed to support accurate risk assessments and to select, design, and operate effective remediation systems. Most subsurface characterization methods are limited in their ability to provide unambiguous, real-time delineation of the specific processes affecting mass transfer, transformation, and mass removal, and to accurately estimate the associated rates. An integrated contaminant elution and tracer test toolkit, comprising a set of local-scale groundwater extraction and injection tests, was developed to ameliorate the primary limitations of standard characterization methods. The test employs extended groundwater extraction to stress the system and induce hydraulic and concentration gradients. Clean water can be injected to remove the resident aqueous contaminant mass present in the higher-permeability zones and to isolate the test zone from the surrounding plume. This ensures that the concentrations and fluxes measured within the isolated area are directly and predominantly influenced by the local mass-transfer and transformation processes controlling mass removal. A suite of standard and novel tracers can be used to delineate the specific mass-transfer and attenuation processes active at a given site and to quantify the associated mass-transfer and transformation rates. The conceptual basis for the test is presented first, followed by an illustrative application based on simulations produced with a 3-D mathematical model and a brief case study application.
Billi, Fabrizio; Benya, Paul; Kavanaugh, Aaron; Adams, John; Ebramzadeh, Edward; McKellop, Harry
2012-02-01
Numerous studies indicate that highly crosslinked polyethylenes reduce the volume of wear debris generated by hip arthroplasty acetabular liners. This, in turn, requires new methods to isolate and characterize the particles. We describe a method for extracting polyethylene wear particles from the bovine serum typically used in wear tests and for characterizing their size, distribution, and morphology. Serum proteins were completely digested using an optimized enzymatic digestion method that prevented the loss of the smallest particles and minimized their clumping. Density-gradient ultracentrifugation was designed to remove contaminants and recover the particles without filtration, depositing them directly onto a silicon wafer. This provided a uniform distribution of the particles and high contrast against the background, facilitating accurate, automated, morphometric image analysis. The accuracy and precision of the new protocol were assessed by recovering and characterizing particles from wear tests of three types of polyethylene acetabular cups (no crosslinking, and crosslinking by 5 Mrad or 7.5 Mrad of gamma irradiation). The new method revealed important differences in particle size distributions and morphologic parameters among the three types of polyethylene that could not be detected using prior isolation methods. The new protocol overcomes a number of limitations, such as loss of nanometer-sized particles and artifactual clumping. The analysis of polyethylene wear particles produced in joint simulator wear tests of prosthetic joints is a key tool to identify the wear mechanisms that produce the particles and to predict and evaluate their effects on periprosthetic tissues.
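Automated morphometric analysis typically reduces each segmented particle to a few size and shape descriptors; descriptors of this kind are standardized, for example, in ASTM F1877. The sketch assumes each particle has already been segmented into a projected area plus a length and width, and computes the equivalent circle diameter (ECD) and aspect ratio. The abstract does not list the exact descriptors the authors used, and the function names and values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def equivalent_circle_diameter(area: float) -> float:
    """Diameter of the circle with the same projected area: ECD = 2*sqrt(A/pi)."""
    return 2.0 * math.sqrt(area / math.pi)


def describe_particles(particles: List[Tuple[float, float, float]]) -> dict:
    """particles: (projected area, length, width) per particle, in consistent units.

    Returns per-particle ECD and aspect ratio (length/width) plus the median ECD.
    """
    ecds = [equivalent_circle_diameter(area) for area, _, _ in particles]
    aspect_ratios = [length / width for _, length, width in particles]
    return {"ecd": ecds, "aspect_ratio": aspect_ratios, "median_ecd": statistics.median(ecds)}


# Hypothetical segmented particles (areas in um^2, lengths and widths in um).
summary = describe_particles([(math.pi, 2.0, 2.0), (4 * math.pi, 5.0, 2.5), (0.25 * math.pi, 1.2, 0.8)])
print(summary)
```

Comparing the ECD and aspect-ratio distributions across materials is how differences like those reported among the three polyethylene types would show up.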
Hu, Long; Xu, Zhiyu; Hu, Boqin; Lu, Zhi John
2017-01-09
Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME substantially improved the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which other tools do not provide. Notably, one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found, based on recent eCLIP-seq data in human, that the conserved structural domains on lncRNAs were more likely than other RNA regions to interact with RNA binding proteins, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust, multiple-feature-supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Characterization of mechanical properties of leather with airborne ultrasonics
USDA-ARS's Scientific Manuscript database
A nondestructive method to accurately evaluate the quality of hides and leather is urgently needed by leather and hide industries. We previously reported the research results for airborne ultrasonic (AU) testing using non-contact transducers to evaluate the quality of hides and leather. The abilit...
Management characteristics of beef cattle production in the western United States
USDA-ARS's Scientific Manuscript database
A comprehensive life cycle assessment (LCA) of beef in the United States is being conducted to provide benchmarks and identify opportunities for improvement of the beef value chain. Region-specific data are being collected to accurately characterize cattle production practices. This study reports pr...
Characterizing body temperature and activity changes at the onset of estrus in replacement gilts
USDA-ARS's Scientific Manuscript database
Accurate estrus detection can improve sow conception rates and increase swine production efficiency. Unfortunately, current estrus detection practices based on individual animal behavior may be inefficient due to large sow populations at commercial farms and the associated labor required. Therefore,...
Error characterization of microwave satellite soil moisture data sets using Fourier analysis
USDA-ARS's Scientific Manuscript database
Soil moisture is a key geophysical variable in hydrological and meteorological processes. Accurate and current observations of soil moisture over meso to global scales used as inputs to hydrological, weather and climate modelling will benefit the predictability and understanding of these processes. ...
Management characteristics of beef cattle production in Hawaii
USDA-ARS's Scientific Manuscript database
A comprehensive life cycle assessment of the United States’ beef value chain requires the collection of region-specific data for accurate characterization of the country’s diverse production practices. Cattle production in Hawaii is very different from the rest of the country due to its unique ecosy...
Semi-volatile compounds present special analytical challenges not met by conventional methods for analysis of ambient particulate matter (PM). Accurate quantification of PM-associated organic compounds requires validation of the laboratory procedures for recovery over a wide v...
Seismic and Geophysical Characterization of Northern Asia
2011-09-01
coast of the Arctic Ocean. Very little independent data exist on the crustal structure or composition in this area. The 10 mHz data, sampling at...greater depth, quite accurately maps the tectonically active and younger regions as lower velocity zones, while regions associated with old cratons show
Canney, Michael S.; Bailey, Michael R.; Crum, Lawrence A.; Khokhlova, Vera A.; Sapozhnikov, Oleg A.
2008-01-01
Acoustic characterization of high intensity focused ultrasound (HIFU) fields is important both for the accurate prediction of ultrasound-induced bioeffects in tissues and for the development of regulatory standards for clinical HIFU devices. In this paper, a method to determine HIFU field parameters at and around the focus is proposed. Nonlinear pressure waveforms were measured and modeled in water and in a tissue-mimicking gel phantom for a 2 MHz transducer with an aperture and focal length of 4.4 cm. Measurements were performed with a fiber optic probe hydrophone at intensity levels up to 24,000 W/cm². The inputs to a Khokhlov–Zabolotskaya–Kuznetsov-type numerical model were determined based on experimental low-amplitude beam plots. Strongly asymmetric waveforms with peak positive pressures up to 80 MPa and peak negative pressures up to 15 MPa were obtained both numerically and experimentally. Numerical simulations and experimental measurements agreed well; however, when steep shocks were present in the waveform at focal intensity levels higher than 6000 W/cm², lower values of the peak positive pressure were observed in the measured waveforms. This underrepresentation was attributed mainly to the limited hydrophone bandwidth of 100 MHz. It is shown that a combination of measurements and modeling is necessary to enable accurate characterization of HIFU fields. PMID:19062878
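Intensities quoted from hydrophone data are commonly derived from the measured pressure waveform with the plane-wave approximation, I = ⟨p²⟩/(ρc). The sketch assumes water with ρ = 1000 kg/m³ and c = 1500 m/s and a record spanning a whole number of periods; it is not the authors' procedure. For shocked focal waveforms both the approximation and, as the abstract notes, the hydrophone bandwidth limit accuracy.

```python
import math
from typing import Sequence

WATER_DENSITY = 1000.0  # kg/m^3 (assumed)
WATER_SOUND_SPEED = 1500.0  # m/s (assumed)


def plane_wave_intensity_w_cm2(pressure_pa: Sequence[float],
                               density: float = WATER_DENSITY,
                               sound_speed: float = WATER_SOUND_SPEED) -> float:
    """Time-averaged intensity I = <p^2> / (rho * c) of a sampled pressure waveform.

    pressure_pa should span a whole number of acoustic periods.
    Returns W/cm^2 (1 W/cm^2 = 1e4 W/m^2).
    """
    mean_square = sum(p * p for p in pressure_pa) / len(pressure_pa)
    return mean_square / (density * sound_speed) / 1e4


# Linear check: a 1 MPa sinusoid sampled over exactly 5 periods.
samples_per_period, periods = 100, 5
wave = [1e6 * math.sin(2 * math.pi * k / samples_per_period)
        for k in range(samples_per_period * periods)]
print(f"{plane_wave_intensity_w_cm2(wave):.2f} W/cm^2")
```

For a sinusoid this reduces to p0²/(2ρc); an asymmetric shocked waveform concentrates much of ⟨p²⟩ in a narrow positive peak, which is exactly the part a band-limited hydrophone underrepresents.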
Diversity of predominant lactic acid bacteria associated with cocoa fermentation in Nigeria.
Kostinek, Melanie; Ban-Koffi, Louis; Ottah-Atikpo, Margaret; Teniola, David; Schillinger, Ulrich; Holzapfel, Wilhelm H; Franz, Charles M A P
2008-04-01
The fermentation of cocoa relies on a complex succession of bacteria and filamentous fungi, all of which can have an impact on cocoa flavor. So far, few investigations have focused on the diversity of lactic acid bacteria involved in cocoa fermentation, and many earlier investigations did not rely on polyphasic taxonomical approaches, which take both phenotypic and genotypic characterization techniques into account. In our study, we characterized predominant lactic acid bacteria from cocoa fermentations in Nigeria, using a combination of phenotypic tests, repetitive extragenic palindromic PCR, and sequencing of the 16S rRNA gene of representative strains for accurate species identification. Of a total of 193 lactic acid bacteria (LAB) strains isolated on common media used to cultivate LAB, 40 (20.7%) were heterofermentative and consisted of either L. brevis or L. fermentum strains. The majority of the isolates were homofermentative rods (110 strains; 57% of isolates), which were characterized as L. plantarum strains. The homofermentative cocci consisted predominantly of Pediococcus acidilactici strains (35 strains; 18.1% of isolates). Thus, the LAB populations recovered on these media in this study were accurately described. This can contribute to further assessment of the effect of common LAB strains on the flavor characteristics of fermenting cocoa in future studies.
Comparing thin slices of verbal communication behavior of varying number and duration.
Carcone, April Idalski; Naar, Sylvie; Eggly, Susan; Foster, Tanina; Albrecht, Terrance L; Brogan, Kathryn E
2015-02-01
The aim of this study was to assess how accurately thin slices characterize the verbal communication behavior of counselors and patients engaged in Motivational Interviewing sessions, relative to fully coded sessions. Four thin slice samples that varied in number (four versus six slices) and duration (one- versus two-minute) were extracted from a previously coded dataset. In the parent study, an observational code scheme was used to characterize specific counselor and patient verbal communication behaviors. For the current study, we compared the frequency of communication codes and the correlations between the full dataset and each thin slice sample. Both the proportion of communication codes and the strength of the correlation showed the highest degree of accuracy when a greater number (i.e., six versus four) and a longer duration (i.e., two- versus one-minute) of slices were extracted. These results suggest that thin slice sampling may be a useful and accurate strategy to reduce coding burden when coding specific verbal communication behaviors within clinical encounters. We suggest that researchers interested in using thin slice sampling in their own work conduct preliminary research to determine the number and duration of thin slices required to accurately characterize the behaviors of interest. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
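One simple way to quantify agreement between a thin slice sample and the fully coded session is to correlate the per-code proportions from each. This is an assumption for illustration; the abstract does not specify the exact statistics used. The code categories and counts below are hypothetical.

```python
import math
from typing import List, Sequence


def proportions(counts: Sequence[int]) -> List[float]:
    """Convert per-code counts into proportions of all coded utterances."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no codes recorded")
    return [c / total for c in counts]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts per communication code (same code order in both lists).
full_session = [40, 25, 15, 12, 8]
thin_slices = [9, 6, 3, 3, 1]
print(f"r = {pearson_r(proportions(full_session), proportions(thin_slices)):.3f}")
```

Repeating the comparison for each slice configuration (number and duration) would show which sampling scheme tracks the full session most closely.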
Chandok, Harshpreet; Shah, Pratik; Akare, Uday Raj; Hindala, Maliram; Bhadoriya, Sneha Singh; Ravi, G V; Sharma, Varsha; Bandaru, Srinivas; Rathore, Pragya; Nayarisseri, Anuraj
2015-09-01
16S rDNA sequencing, which has gained wide popularity among microbiologists for the molecular characterization and identification of newly discovered isolates, provides accurate identification of isolates down to the sub-species (strain) level. Its most important advantage over traditional biochemical characterization methods is that it can also accurately identify strains with atypical phenotypic characters. This work applies the 16S rRNA gene sequencing approach to identify a novel species of probiotic Lactobacillus acidophilus. Samples were collected from pond water in rural and urban areas of Krishna district, Vijayawada, Andhra Pradesh, India. The samples were serially diluted and the aliquots incubated for a suitable period, after which the suspected colony was subjected to 16S rDNA sequencing. Based on alignment against other species, the sequence was concluded to belong to a novel probiotic L. acidophilus, and the isolates were named L. acidophilus strains EMBS081 and EMBS082. After sequence characterization, the isolate was deposited in the GenBank database maintained by the National Centre for Biotechnology Information (NCBI). The sequences can also be retrieved from the EMBL and DDBJ repositories under accession numbers JX255677 and KC150145.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Timothy C.; Oostrom, Martinus; Truex, Michael J.
2013-05-21
Water saturation is an important indicator of contaminant distribution and plays a governing role in contaminant transport within the vadose zone. Understanding the water saturation distribution is critical for both remediation and contaminant flux monitoring in unsaturated environments. In this work we propose and demonstrate a method of remotely determining water saturation levels using gas-phase partitioning tracers and time-lapse bulk electrical conductivity measurements. The theoretical development includes the partitioning chemistry for the tracers we demonstrate (ammonia and carbon dioxide), as well as a review of the petrophysical relationship governing how these tracers influence bulk conductivity. We also investigate methods of utilizing secondary information provided by the electrical conductivity breakthrough magnitudes induced by the tracers. We test the method on clean, well-characterized, intermediate-scale sand columns under controlled conditions. Results demonstrate the capability to predict partitioning coefficients and to accurately monitor gas breakthrough curves along the length of the column from the corresponding electrical conductivity response, leading to accurate water saturation estimates. This work is motivated by the need to develop effective characterization and monitoring techniques for contaminated deep vadose zone environments, and provides a proof of concept toward uniquely characterizing and monitoring water saturation levels at the field scale and in three dimensions using electrical resistivity tomography.
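The principle behind any water-partitioning gas tracer is the classical retardation relation R = 1 + K_wg·S_w/(1 − S_w), where K_wg is the dimensionless water/gas concentration ratio. The sketch shows that relation with mean arrival times taken from conventionally sampled breakthrough curves; the abstract instead infers breakthrough from bulk electrical conductivity, so this is background, not the authors' method. It also ignores sorption to solids and interfacial adsorption, and all data and names are hypothetical.

```python
from typing import Sequence


def mean_arrival_time(times: Sequence[float], concentrations: Sequence[float]) -> float:
    """First temporal moment of a breakthrough curve: sum(t*C) / sum(C).

    Assumes equally spaced samples, so the time step cancels out.
    """
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("breakthrough curve has no mass")
    return sum(t * c for t, c in zip(times, concentrations)) / total


def water_saturation(retardation: float, k_wg: float) -> float:
    """Water saturation from a water-partitioning gas tracer.

    R = 1 + K_wg * S_w / (1 - S_w), with K_wg = C_water / C_gas (dimensionless),
    rearranged to S_w = (R - 1) / (R - 1 + K_wg).
    """
    if retardation < 1 or k_wg <= 0:
        raise ValueError("need R >= 1 and K_wg > 0")
    return (retardation - 1.0) / (retardation - 1.0 + k_wg)


# Hypothetical breakthrough curves on a common time grid (hours).
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
conservative = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]  # mean arrival 3.0
partitioning = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]  # mean arrival 5.0
r = mean_arrival_time(times, partitioning) / mean_arrival_time(times, conservative)
print(f"R = {r:.3f}, S_w = {water_saturation(r, k_wg=2.0):.3f}")
```

A highly water-soluble tracer (large K_wg) is strongly retarded even at low saturation, which is what makes such tracers sensitive to water content.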
Using a Novel Optical Sensor to Characterize Methane Ebullition Processes
NASA Astrophysics Data System (ADS)
Delwiche, K.; Hemond, H.; Senft-Grupp, S.
2015-12-01
We have built a novel bubble size sensor that is rugged, economical to build, and capable of accurately measuring methane bubble sizes in aquatic environments over long deployment periods. Accurate knowledge of methane bubble size is important for calculating atmospheric methane emissions from inland waters. By routing bubbles past pairs of optical detectors, the sensor accurately measures the sizes of bubbles between 0.01 mL and 1 mL, with slightly reduced accuracy for bubbles from 1 mL to 1.5 mL. The sensor can handle flow rates up to approximately 3 bubbles per second. Optional sensor attachments include a gas collection chamber for methane sampling and volume verification, and a detachable extension funnel to customize the quantity of intercepted bubbles. A data cable runs from the deployed sensor to a custom surface buoy, allowing us to download data without disturbing ongoing bubble measurements. We have successfully deployed numerous sensors in Upper Mystic Lake at depths down to 18 m, 1 m above the sediment. The resulting data give us bubble size distributions and the precise timing of bubbling events over a period of several months. In addition to characterizing typical bubble size distributions, these data allow us to draw important conclusions about temporal variations in bubble sizes and about bubble dissolution rates within the water column.
Theoretical modeling of laser-induced plasmas using the ATOMIC code
NASA Astrophysics Data System (ADS)
Colgan, James; Johns, Heather; Kilcrease, David; Judge, Elizabeth; Barefield, James, II; Clegg, Samuel; Hartig, Kyle
2014-10-01
We report on efforts to model the emission spectra generated by laser-induced breakdown spectroscopy (LIBS). LIBS is a popular and powerful method of quickly and accurately characterizing unknown samples remotely. In particular, LIBS is used by the ChemCam instrument on the Mars Science Laboratory. We model the LIBS plasma using the Los Alamos suite of atomic physics codes. Since LIBS plasmas generally have temperatures between roughly 3000 K and 12,000 K, the emission spectra typically result from the neutral and singly ionized stages of the target atoms. We use the Los Alamos atomic structure and collision codes to generate sets of atomic data and use the plasma kinetics code ATOMIC to perform LTE or non-LTE calculations that generate level populations and an emission spectrum for the element of interest. In this presentation we compare the emission spectrum from ATOMIC with an Fe LIBS laboratory-generated plasma as well as with spectra from the ChemCam instrument. We also discuss various physics aspects of the modeling of LIBS plasmas that are necessary for accurate characterization of the plasma, such as multi-element target composition effects, radiation transport effects, and accurate line shape treatments. The Los Alamos National Laboratory is operated by Los Alamos National Security, LLC for the National Nuclear Security Administration of the U.S. Department of Energy under Contract No. DE-AC5206NA25396.
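In the LTE case, level populations within one ionization stage follow the Boltzmann distribution, n_j/N = g_j·exp(−E_j/kT)/U(T). The sketch shows only that step: the Saha balance between neutral and singly ionized stages is omitted, and the non-LTE calculations ATOMIC also performs require solving collisional-radiative rate equations, which this does not capture. The level data are hypothetical, not real Fe data.

```python
import math
from typing import List, Sequence, Tuple

BOLTZMANN_EV_PER_K = 8.617333262e-5


def lte_level_fractions(levels: Sequence[Tuple[float, float]], temperature_k: float) -> List[float]:
    """Fractional populations of levels within one ionization stage under LTE.

    levels: (statistical weight g, excitation energy E in eV) per level.
    n_j / N = g_j * exp(-E_j / kT) / sum_i g_i * exp(-E_i / kT)
    """
    kt = BOLTZMANN_EV_PER_K * temperature_k
    weights = [g * math.exp(-energy / kt) for g, energy in levels]
    partition_function = sum(weights)
    return [w / partition_function for w in weights]


# Hypothetical three-level model at a temperature within the typical LIBS range.
print(lte_level_fractions([(9.0, 0.0), (7.0, 0.05), (11.0, 2.4)], 8000.0))
```

Multiplying such populations by radiative rates and photon energies gives optically thin line emissivities, the starting point for a synthetic spectrum.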
Pomorski, Adam; Kochańczyk, Tomasz; Miłoch, Anna; Krężel, Artur
2013-12-03
Ratiometric chemical probes and genetically encoded sensors are of high interest to both analytical chemists and molecular biologists. Their high sensitivity toward the target ligand and their ability to yield quantitative results without a known sensor concentration make them very useful tools in both in vitro and in vivo assays. Although ratiometric sensors are widely used, their successful and accurate use depends on how well they are characterized with respect to sensing target molecules. Besides their optical parameters, the most important feature of probes and sensors is their affinity constant toward the analyzed molecules. The literature shows that different analytical approaches are used to determine stability constants, with the ratio approach being the most popular. However, oversimplification and lack of attention to detail result in inaccurate stability constants, which in turn affect the results obtained using these sensors. Here, we present a new method in which the ratio signal is calibrated using the borderline intensity values at both wavelengths, instead of the borderline ratio values that generate errors in many studies. The equation also takes into account the cooperativity factor or fluorescence artifacts and can therefore be used to characterize systems with various stoichiometries and experimental conditions. Accurate determination of stability constants is demonstrated using four known optical ratiometric probes and sensors, together with a discussion of other currently used methods.
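For context, the widely used ratio approach (Grynkiewicz, Poenie and Tsien, 1985) converts a measured ratio R into ligand concentration using the borderline ratio values R_min and R_max. This is the conventional method the abstract argues can introduce errors, shown here to make "borderline ratio values" concrete; it is not the authors' intensity-based calibration, whose equation is not given here. It assumes 1:1 binding, and the function name and numbers are hypothetical.

```python
def ligand_concentration(ratio: float, r_min: float, r_max: float, kd: float,
                         free_over_bound_denominator: float) -> float:
    """Classic ratiometric equation for 1:1 binding.

    [L] = Kd * (R - Rmin) / (Rmax - R) * (F_free / F_bound),
    where F_free / F_bound is the free-to-bound intensity ratio measured at the
    wavelength used as the denominator of R.
    """
    if not r_min < ratio < r_max:
        raise ValueError("ratio must lie strictly between Rmin and Rmax")
    return kd * (ratio - r_min) / (r_max - ratio) * free_over_bound_denominator


# Hypothetical calibration values (Kd in mol/L).
print(ligand_concentration(ratio=1.2, r_min=0.4, r_max=2.0, kd=5e-7,
                           free_over_bound_denominator=1.5))
```

Because R_min and R_max sit in the denominator and numerator, small errors in these borderline ratios propagate strongly into [L] near the ends of the range, which illustrates why the abstract's calibration choice matters.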
HIFU Transducer Characterization Using a Robust Needle Hydrophone
NASA Astrophysics Data System (ADS)
Howard, Samuel M.; Zanelli, Claudio I.
2007-05-01
A robust needle hydrophone has been developed for HIFU transducer characterization and was reported on earlier. After a brief review of the hydrophone design and performance, we demonstrate its use to characterize a 1.5 MHz, 10 cm diameter, F-number 1.5 spherically focused source driven to exceed an intensity of 1400 W/cm² at its focus. Quantitative characterization of this source at high powers is assisted by deconvolving the hydrophone's calibrated frequency response in order to accurately reflect the contribution of harmonics generated by nonlinear propagation in the water testing environment. Results are compared to measurements with a membrane hydrophone at 0.3% duty cycle and to theoretical calculations that use measurements of the field at the source's radiating surface as input to a numerical solution of the KZK equation.
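Deconvolving a hydrophone's calibrated frequency response amounts to dividing the voltage spectrum by the complex sensitivity M(f), so that harmonics are weighted by the sensitivity at their own frequency. The sketch below does this with NumPy's real FFT and an optional Wiener-style regularization term; the sensitivity function and the `eps` parameter are assumptions for illustration, not the authors' calibration data or procedure.

```python
import numpy as np

def deconvolve_hydrophone(voltage, dt, sensitivity_fn, eps=1e-3):
    """Estimate acoustic pressure (Pa) from a hydrophone voltage trace.

    sensitivity_fn(freqs) must return the complex sensitivity M(f) in V/Pa.
    eps * max|M|^2 regularizes the division where |M| is small; eps=0 gives
    plain spectral division.
    """
    n = len(voltage)
    freqs = np.fft.rfftfreq(n, dt)
    V = np.fft.rfft(voltage)
    M = sensitivity_fn(freqs)
    reg = eps * np.max(np.abs(M)) ** 2
    P = V * np.conj(M) / (np.abs(M) ** 2 + reg)  # Wiener-style inverse filter
    return np.fft.irfft(P, n)
```

The regularization is a design choice: with `eps=0` the inversion is exact but amplifies noise wherever the sensitivity rolls off, which matters most at the high harmonic frequencies produced by nonlinear propagation.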
Spectral characterization and calibration of AOTF spectrometers and hyper-spectral imaging system
NASA Astrophysics Data System (ADS)
Katrašnik, Jaka; Pernuš, Franjo; Likar, Boštjan
2010-02-01
The goal of this article is to present a novel method for spectral characterization and calibration of spectrometers and hyper-spectral imaging systems based on non-collinear acousto-optic tunable filters (AOTFs). The method characterizes the spectral tuning curve (frequency-wavelength characteristic) of the AOTF by matching the acquired and modeled spectra of the HgAr calibration lamp, which emits a line spectrum that can be well modeled via the AOTF transfer function. In this way, not only tuning curve characterization and the corresponding spectral calibration but also spectral resolution assessment is performed. The results indicate that the proposed method is efficient, accurate and feasible for routine calibration of AOTF spectrometers and hyper-spectral imaging systems, and thereby a highly competitive alternative to existing calibration methods.
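A much simpler relative of this calibration fits a tuning curve to the acoustic frequencies at which known HgAr lines peak. The sketch below assumes an empirical model f(λ) = c0 + c1/λ + c2/λ² (AOTF tuning frequency scales roughly as 1/λ, with dispersion absorbed by the extra terms) and fits it by linear least squares. Unlike the article's method, it does not match full modeled spectra or use the AOTF transfer function; the model form and function names are assumptions.

```python
import numpy as np

def fit_tuning_curve(peak_freqs_mhz, line_wavelengths_nm, order=2):
    """Least-squares fit of f(lambda) = sum_k c_k * lambda**(-k), k = 0..order."""
    inv = 1.0 / np.asarray(line_wavelengths_nm, dtype=float)
    A = np.vander(inv, order + 1, increasing=True)  # columns: 1, 1/lambda, 1/lambda^2, ...
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(peak_freqs_mhz, dtype=float), rcond=None)
    return coeffs

def tuning_frequency(coeffs, wavelength_nm):
    """Evaluate the fitted tuning curve (MHz) at one or more wavelengths (nm)."""
    inv = 1.0 / np.asarray(wavelength_nm, dtype=float)
    return sum(c * inv ** k for k, c in enumerate(coeffs))
```

In use, the peak frequencies would come from sweeping the AOTF while viewing lamp lines such as Hg 404.66, 435.83, and 546.07 nm and Ar 696.54, 763.51, and 811.53 nm; the fitted curve then maps any requested wavelength to a drive frequency.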
Gong, Rui; Xu, Haisong; Tong, Qingfen
2012-10-20
The colorimetric characterization of active matrix organic light emitting diode (AMOLED) panels suffers from their poor channel independence. Based on an evaluation of colorimetric characteristics in terms of channel independence and chromaticity constancy, an accurate colorimetric characterization method, the polynomial compensation (PC) model, which considers channel interactions, was proposed for AMOLED panels. In this model, polynomial expressions are employed to model the relationship between the prediction errors of XYZ tristimulus values and the digital inputs, in order to compensate the XYZ prediction errors of the conventional piecewise linear interpolation assuming variable chromaticity coordinates (PLVC) model. The experimental results indicated that the proposed PC model outperformed other typical characterization models for the two tested AMOLED smartphone displays as well as for a professional liquid crystal display monitor.
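The compensation step can be sketched as a polynomial regression from the digital inputs to the base model's XYZ residuals. In the sketch below the base (PLVC) predictions are simply taken as given, and the second-order term set (including r·g, r·b, g·b cross terms to capture channel interaction) is an assumption; the published PC model may use different terms. Function names are hypothetical.

```python
import numpy as np

def poly_features(rgb):
    """Second-order polynomial terms of normalized digital inputs (r, g, b)."""
    rgb = np.atleast_2d(np.asarray(rgb, dtype=float))
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.column_stack([np.ones_like(r), r, g, b,
                            r * g, r * b, g * b,   # channel-interaction terms
                            r * r, g * g, b * b])

def fit_compensation(rgb, xyz_measured, xyz_base_pred):
    """Fit polynomial maps from digital inputs to the base model's XYZ errors."""
    errors = np.asarray(xyz_measured, float) - np.asarray(xyz_base_pred, float)
    coeffs, *_ = np.linalg.lstsq(poly_features(rgb), errors, rcond=None)
    return coeffs  # shape (10, 3): one column each for X, Y, Z

def compensate(rgb, xyz_base_pred, coeffs):
    """Corrected prediction = base prediction + predicted error."""
    return np.asarray(xyz_base_pred, float) + poly_features(rgb) @ coeffs
```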
Improvements to III-nitride light-emitting diodes through characterization and material growth
NASA Astrophysics Data System (ADS)
Getty, Amorette Rose Klug
A variety of experiments were conducted to improve, or to aid in improving, the efficiency of III-nitride light-emitting diodes (LEDs), which are a critical area of research for multiple applications, including high-efficiency solid-state lighting. To enhance the light extraction in ultraviolet LEDs grown on SiC substrates, a distributed Bragg reflector (DBR) optimized for operation in the range from 250 to 280 nm has been developed using MBE growth techniques. The best devices had a peak reflectivity of 80% with 19.5 periods, which is acceptable for the intended application. DBR surfaces were sufficiently smooth for subsequent epitaxy of the LED device. During the course of this work, the pros and cons of AlGaN growth techniques, including analog versus digital alloying, were examined. This work highlighted a need for more accurate values of the refractive index of high-Al-content AlxGa1-xN in the UV wavelength range. We present refractive index results for a wide variety of materials pertinent to the fabrication of optical III-nitride devices. Characterization was done using variable-angle spectroscopic ellipsometry. The three binary nitrides and all three ternaries have been characterized to a greater or lesser extent, depending on the material compositions available. Semi-transparent p-contact materials and other thin metals for reflecting contacts have been examined to allow optimization of deposition conditions and highly accurate modeling of the behavior of light within these devices. Standard substrate materials have also been characterized for completeness and as an indicator of the accuracy of our modeling technique. We have demonstrated a new technique for estimating the internal quantum efficiency (IQE) of nitride light-emitting diodes. This method has an advantage over the standard low-temperature photoluminescence-based method of estimating IQE because it is conducted under the same conditions as normal device operation.
We have developed processing techniques for, and have characterized, patternable absorbing materials that eliminate scattered light within the device, allowing an accurate simulation of the device extraction efficiency. This efficiency, together with measurements of the input current and optical output power, allows a straightforward calculation of the IQE. Two sets of devices were measured: one made from material grown in-house, with a rough p-GaN surface, and one made from commercial LED material, with smooth interfaces and very high internal quantum efficiency.
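The "straightforward calculation" follows from two relations: the external quantum efficiency is photons out per electron in, EQE = (P / (hc/λ)) / (I / q), and IQE = EQE / η_extraction. The sketch below applies them under the simplifying assumption of a single emission wavelength (a real spectrum would be integrated); function names are hypothetical and the extraction efficiency is taken as an input from simulation.

```python
PLANCK = 6.62607015e-34        # J s
LIGHT_SPEED = 2.99792458e8     # m/s
ELEM_CHARGE = 1.602176634e-19  # C

def external_quantum_efficiency(optical_power_w, current_a, wavelength_m):
    """Photons emitted per second divided by electrons injected per second."""
    photons_per_s = optical_power_w / (PLANCK * LIGHT_SPEED / wavelength_m)
    electrons_per_s = current_a / ELEM_CHARGE
    return photons_per_s / electrons_per_s

def internal_quantum_efficiency(optical_power_w, current_a, wavelength_m, extraction_eff):
    """IQE = EQE / extraction efficiency (extraction efficiency from simulation)."""
    return external_quantum_efficiency(optical_power_w, current_a, wavelength_m) / extraction_eff
```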
XPS Protocol for the Characterization of Pristine and Functionalized Single Wall Carbon Nanotubes
NASA Technical Reports Server (NTRS)
Sosa, E. D.; Allada, R.; Huffman, C. B.; Arepalli, S.
2009-01-01
Recent interest in developing new applications for carbon nanotubes (CNTs) has fueled the need for accurate macroscopic and nanoscopic techniques to characterize and understand their chemistry. X-ray photoelectron spectroscopy (XPS) has proved to be a useful analytical tool for nanoscale surface characterization of materials, including carbon nanotubes. Recent nanotechnology research at NASA Johnson Space Center (NASA-JSC) helped to establish a characterization protocol for quality assessment of single wall carbon nanotubes (SWCNTs). Here, we review some of the major factors of the XPS technique that can influence the quality of analytical data, suggest methods to maximize the quality of data obtained by XPS, and present a protocol for XPS characterization as a complementary technique for analyzing the purity and surface characteristics of SWCNTs. The XPS protocol is then applied to a number of experiments, including impurity analysis and the study of chemical modifications of SWCNTs.
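Purity and surface-composition numbers from XPS are commonly obtained with the first-order relative sensitivity factor (RSF) approach, where each element's atomic fraction is its peak area divided by its sensitivity factor, normalized over all detected elements. The sketch below shows that standard calculation; it is not necessarily the exact quantification step of the NASA-JSC protocol, and the sensitivity factors in any real use are instrument-specific (the ones in the test are hypothetical).

```python
def atomic_percentages(peak_areas, sensitivity_factors):
    """Atomic % from XPS peak areas: C_x = (I_x / S_x) / sum_i(I_i / S_i) * 100."""
    normalized = {el: peak_areas[el] / sensitivity_factors[el] for el in peak_areas}
    total = sum(normalized.values())
    return {el: 100.0 * v / total for el, v in normalized.items()}
```

For example, peak areas of 100 (C 1s) and 50 (O 1s) with sensitivity factors of 1.0 and 0.5 yield 50 at.% each.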
In-vivo analysis of ankle joint movement for patient-specific kinematic characterization.
Ferraresi, Carlo; De Benedictis, Carlo; Franco, Walter; Maffiodo, Daniela; Leardini, Alberto
2017-09-01
In this article, a method for the experimental in-vivo characterization of ankle kinematics is proposed. The method is meant to improve the personalization of various ankle joint treatments, such as surgical decision-making or the design and application of an orthosis, possibly increasing their effectiveness. Such a characterization would make treatments more compatible with the specific patient's joint physiological conditions. This article describes the experimental procedure and the analytical method adopted, which is based on instantaneous and mean helical axis theories. The results obtained in this experimental analysis reveal that more accurate techniques are necessary for a robust in-vivo assessment of the tibio-talar axis of rotation.
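The helical (screw) axis underlying this analysis can be computed from the rigid motion x -> R x + t between two poses: the axis direction comes from the skew-symmetric part of R, the rotation angle from its trace, the slide from the projection of t onto the axis, and a point on the axis from solving (I - R) p = t - s n. The sketch below computes one finite helical axis; averaging many such axes into a mean helical axis, as in the article, is not shown, and the function name is hypothetical.

```python
import numpy as np

def finite_helical_axis(R, t):
    """Finite helical axis of the rigid motion x -> R @ x + t.

    Returns (unit axis n, rotation angle phi in rad, slide s along the axis,
    point p on the axis closest to the origin).
    """
    phi = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(np.sin(phi), 0.0):
        raise ValueError("rotation angle near 0 or pi: axis is ill-conditioned")
    n = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / (2.0 * np.sin(phi))
    s = float(n @ t)
    # (I - R) is singular along n; the minimum-norm least-squares solution
    # is the point on the axis closest to the origin.
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t - s * n, rcond=None)
    return n, phi, s, p
```

The small-angle guard reflects a known weakness of helical axes: for small rotations between frames the axis estimate becomes very sensitive to measurement noise, which is consistent with the article's call for more accurate techniques.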
Coussot, Cecile; Kalyanam, Sureshkumar; Yapp, Rebecca; Insana, Michael F.
2009-01-01
The viscoelastic response of hydropolymers, which include glandular breast tissues, may be accurately characterized for some applications with as few as 3 rheological parameters by applying the Kelvin-Voigt fractional derivative (KVFD) modeling approach. We describe a technique for ultrasonic imaging of KVFD parameters in media undergoing unconfined, quasi-static, uniaxial compression. We analyze the KVFD parameter values in simulated and experimental echo data acquired from phantoms and show that the KVFD parameters may concisely characterize the viscoelastic properties of hydropolymers. We then interpret the KVFD parameter values for normal and cancerous breast tissues and hypothesize that this modeling approach may ultimately be applied to tumor differentiation. PMID:19406700
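One way to see the three KVFD parameters (E0, τ, α) is through the stress-relaxation modulus. Assuming the constitutive law σ = E0 ε + E0 τ^α D^α ε (a spring in parallel with a fractional "springpot"), a step strain gives E(t) = E0 [1 + (t/τ)^(-α) / Γ(1 - α)]. The sketch below evaluates that expression; the compression imaging and parameter estimation described in the abstract are not shown, and the function name is hypothetical.

```python
import math

def kvfd_relaxation_modulus(t, e0, tau, alpha):
    """KVFD stress-relaxation modulus E(t) = E0 * (1 + (t/tau)**(-alpha) / Gamma(1 - alpha)).

    Valid for t > 0 and 0 < alpha < 1; alpha -> 0 approaches an elastic solid.
    """
    if t <= 0 or not (0.0 < alpha < 1.0):
        raise ValueError("need t > 0 and 0 < alpha < 1")
    return e0 * (1.0 + (t / tau) ** (-alpha) / math.gamma(1.0 - alpha))
```

The power-law decay in t, rather than a single exponential, is what lets so few parameters describe a broad spectrum of relaxation times.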
Satellite remote sensing of isolated wetlands using object-oriented classification of LANDSAT-7 data
There has been an increasing interest in characterizing and mapping isolated depressional wetlands due to a 2001 U.S. Supreme Court decision that effectively removed their protected status. Our objective was to determine the utility of satellite remote sensing to accurately map ...
Characterizing fuels in the 21st century.
David Sandberg; Roger D. Ottmar; Geoffrey H. Cushon
2001-01-01
The ongoing development of sophisticated fire behavior and effects models has demonstrated the need for a comprehensive system of fuel classification that more accurately captures the structural complexity and geographic diversity of fuelbeds. The Fire and Environmental Research Applications Team (FERA) of the USDA Forest Service, Pacific Northwest Research Station, is...
Microwave Soil Moisture Retrieval Under Trees Using a Modified Tau-Omega Model
USDA-ARS's Scientific Manuscript database
IPAD is to provide timely and accurate estimates of global crop conditions for use in up-to-date commodity intelligence reports. A crucial requirement of these global crop yield forecasts is the regional characterization of surface and sub-surface soil moisture. However, due to the spatial heterogen...
The Impacts of Climate Variations on Military Operations in the Horn of Africa
2006-03-01
variability in a region. Climate forecasts are predictions of the future state of the climate, much as we think of weather forecasts but at longer...arrive at accurate characterizations of the future state of the climate. Many of the civilian organizations that generate reanalysis data also
Madenjian, Charles P.; Rutherford, Edward S.; Stow, Craig A.; Roseman, Edward F.; He, Ji X.
2013-01-01
scientists who are closely monitoring Lake Huron’s food web, we believe that the ongoing changes are more accurately characterized as a trophic shift in which benthic pathways have become more prominent. While decreases in abundance have occurred for some species, others are experiencing improved reproduction resulting in the restoration of several important native species.
An update on acquired nystagmus.
Rucker, Janet C
2008-01-01
Proper evaluation and treatment of acquired nystagmus requires accurate characterization of nystagmus type and visual effects. This review addresses important historical and examination features of nystagmus and current concepts of pathogenesis and treatment of gaze-evoked nystagmus, nystagmus due to vision loss, acquired pendular nystagmus, peripheral and central vestibular nystagmus, and periodic alternating nystagmus.
A HYBRID THERMAL VIDEO AND FTIR SPECTROMETER FOR RAPIDLY LOCATING AND CHARACTERIZING GAS LEAKS
Undiscovered gas leaks, known as fugitive emissions, in chemical plants and refinery operations can impact regional air quality as well as being a public health problem. Surveying a facility for potential gas leaks can be a daunting task. An efficient, accurate and cost-effecti...
Neoplastic stomach lesions and their mimickers: spectrum of imaging manifestations
Virmani, Vivek; Sethi, Vineeta; Fraser-Hill, Margret; Fasih, Najla; Kielar, Ania
2012-01-01
This review illustrates a wide spectrum of gastric neoplasms, with emphasis on imaging findings helpful in characterizing them. Both malignant and benign neoplasms, along with focal gastric masses mimicking tumour, are illustrated. Imaging clues for reaching an accurate diagnosis are also emphasized. PMID:22935192
Memory Load Affects Object Individuation in 18-Month-Old Infants
ERIC Educational Resources Information Center
Zosh, Jennifer M.; Feigenson, Lisa
2012-01-01
Accurate representation of a changing environment requires individuation--the ability to determine how many numerically distinct objects are present in a scene. Much research has characterized early individuation abilities by identifying which object features infants can use to individuate throughout development. However, despite the fact that…
Body Emotion Recognition Disproportionately Depends on Vertical Orientations during Childhood
ERIC Educational Resources Information Center
Balas, Benjamin; Auen, Amanda; Saville, Alyson; Schmidt, Jamie
2018-01-01
Children's ability to recognize emotional expressions from faces and bodies develops during childhood. However, the low-level features that support accurate body emotion recognition during development have not been well characterized. This is in marked contrast to facial emotion recognition, which is known to depend upon specific spatial frequency…
Workplace Learning of High Performance Sports Coaches
ERIC Educational Resources Information Center
Rynne, Steven B.; Mallett, Clifford J.; Tinning, Richard
2010-01-01
The Australian coaching workplace (to be referred to as the State Institute of Sport; SIS) under consideration in this study employs significant numbers of full-time performance sport coaches and can be accurately characterized as a genuine workplace. Through a consideration of the interaction between what the workplace (SIS) affords the…
How Can It Cost That Much? A Three-Year Study of Proposal Production Costs.
ERIC Educational Resources Information Center
Wiese, W. C.; Bowden, C. Mal
1997-01-01
Examines significant new business proposal efforts for United States Department of Defense contracts. Identifies six "pillars" of a contractor's proposal preparation costs. Derives a formula that characterizes proposal preparation costs. Demonstrates that a quick, accurate cost model can be developed for proposal publishing. (RS)
Parameterized reduced-order models using hyper-dual numbers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fike, Jeffrey A.; Brake, Matthew Robert
2013-10-01
The goal of most computational simulations is to accurately predict the behavior of a real, physical system. Accurate predictions often require very computationally expensive analyses, so reduced order models (ROMs) are commonly used. ROMs aim to reduce the computational cost of simulations while still providing accurate results by including all of the salient physics of the real system. However, real, physical systems often deviate from the idealized models used in simulations due to variations in manufacturing or other factors. One approach to this issue is to create a parameterized model in order to characterize the effect of perturbations from the nominal model on the behavior of the system. This report presents a methodology for developing parameterized ROMs, which is based on Craig-Bampton component mode synthesis and the use of hyper-dual numbers to calculate the derivatives necessary for the parameterization.
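Hyper-dual numbers extend dual numbers with two nilpotent parts, x = a + b ε1 + c ε2 + d ε1ε2 with ε1² = ε2² = 0. Evaluating f(x + ε1 + ε2) puts f'(x) in the ε1 part and f''(x) in the ε1ε2 part, exactly and with no step-size error. The minimal sketch below supports only +, -, and * (enough for polynomials); it is a generic illustration of the arithmetic, not the report's Craig-Bampton implementation, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HyperDual:
    """a + b*e1 + c*e2 + d*e1e2, with e1**2 = e2**2 = 0 and e1*e2 != 0."""
    a: float
    b: float = 0.0
    c: float = 0.0
    d: float = 0.0

    @staticmethod
    def _lift(x):
        return x if isinstance(x, HyperDual) else HyperDual(float(x))

    def __add__(self, other):
        o = HyperDual._lift(other)
        return HyperDual(self.a + o.a, self.b + o.b, self.c + o.c, self.d + o.d)

    __radd__ = __add__

    def __sub__(self, other):
        o = HyperDual._lift(other)
        return HyperDual(self.a - o.a, self.b - o.b, self.c - o.c, self.d - o.d)

    def __rsub__(self, other):
        return HyperDual._lift(other) - self

    def __mul__(self, other):
        o = HyperDual._lift(other)
        return HyperDual(self.a * o.a,
                         self.a * o.b + self.b * o.a,
                         self.a * o.c + self.c * o.a,
                         self.a * o.d + self.b * o.c + self.c * o.b + self.d * o.a)

    __rmul__ = __mul__

def first_and_second_derivative(f, x):
    """Return (f(x), f'(x), f''(x)) by evaluating f at x + e1 + e2."""
    y = f(HyperDual(x, 1.0, 1.0, 0.0))
    return y.a, y.b, y.d
```

For f(x) = 3x³ - 2x + 1 at x = 2, this returns (21.0, 34.0, 36.0). In a parameterized ROM, the same trick would be applied with the perturbed parameter as the hyper-dual input.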
Sun-Relative Pointing for Dual-Axis Solar Trackers Employing Azimuth and Elevation Rotations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Riley, Daniel; Hansen, Clifford W.
Dual-axis trackers employing azimuth and elevation rotations are common in the field of photovoltaic (PV) energy generation. Accurate sun-tracking algorithms are widely available. However, no steering algorithm has been available to accurately point the tracker away from the sun such that a vector projection of the sun beam onto the tracker face falls along a desired path relative to the tracker face. We have developed an algorithm that produces the appropriate azimuth and elevation angles for a dual-axis tracker when given the sun position, the desired angle of incidence, and the desired projection of the sun beam onto the tracker face. Development of this algorithm was inspired by the need to accurately steer a tracker to desired sun-relative positions in order to better characterize the electro-optical properties of PV and CPV modules.
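The algorithm described above solves an inverse problem. The forward relation it inverts is simple: given sun and tracker-normal directions as azimuth/elevation pairs, the angle of incidence (AOI) is the angle between the two unit vectors. The sketch below computes that forward relation only. It assumes an East-North-Up frame with azimuth measured clockwise from north and the tracker "elevation" defined as the elevation of the face normal; these conventions are assumptions and may differ from the authors'.

```python
import math

def unit_vector(azimuth_deg, elevation_deg):
    """East-North-Up unit vector; azimuth clockwise from north (assumed convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))

def angle_of_incidence(sun_az, sun_el, tracker_az, tracker_el):
    """Angle between the sun vector and the tracker-face normal, in degrees."""
    s = unit_vector(sun_az, sun_el)
    n = unit_vector(tracker_az, tracker_el)
    cos_aoi = sum(si * ni for si, ni in zip(s, n))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_aoi))))  # clamp rounding
```

For example, with the sun at the zenith and the tracker normal at 60° elevation, the AOI is 30° regardless of the tracker azimuth. Steering to a prescribed AOI and beam-projection direction requires inverting this relation, which is what the authors' algorithm provides.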
Lindsey, Rebecca L.; Pouseele, Hannes; Chen, Jessica C.; Strockbine, Nancy A.; Carleton, Heather A.
2016-01-01
Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and inexpensive benchtop sequencers for whole genome sequencing (WGS), the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to perform reference identification and characterization of STEC from WGS data in a single workflow within an easy-to-use, commercially available software platform. Publicly available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, downsampling experiments on the WGS data were performed to determine the sequence coverage threshold needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted 96.1% of the O serogroups and 96.5% of the H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the coverage threshold for detecting the different WGS targets: 40 for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥ 20x. The resistance detection tool was 97% concordant with phenotypic testing results.
With isolates sequenced to > 40x coverage, the different databases accurately predicted serotype, virulence, and resistance from WGS data, providing a faster and cheaper alternative to conventional typing techniques. PMID:27242777
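Downsampling experiments of the kind described above reduce a read set to a lower target coverage, where coverage is approximately (number of reads × read length) / genome size. The sketch below computes coverage and keeps each read independently with probability target/current. It is a generic illustration, not the workflow or software platform used in the study; the function names are hypothetical, and the 5 Mb genome size in the test is a round illustrative figure.

```python
import random

def coverage(num_reads, read_length, genome_size):
    """Approximate mean depth of coverage."""
    return num_reads * read_length / genome_size

def downsample_reads(reads, current_cov, target_cov, seed=0):
    """Randomly keep each read with probability target_cov / current_cov."""
    if target_cov > current_cov:
        raise ValueError("cannot downsample to a higher coverage")
    keep = target_cov / current_cov
    rng = random.Random(seed)  # fixed seed makes replicate subsamples reproducible
    return [r for r in reads if rng.random() < keep]
```

For example, 1,000,000 reads of 150 bp over a 5,000,000 bp genome give 30x coverage. Repeating the subsampling with several seeds at each target depth shows how detection of each serogroup or gene degrades as coverage falls.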
Fringe Capacitance Correction for a Coaxial Soil Cell
Pelletier, Mathew G.; Viera, Joseph A.; Schwartz, Robert C.; Lascano, Robert J.; Evett, Steven R.; Green, Tim R.; Wanjura, John D.; Holt, Greg A.
2011-01-01
Accurate measurement of moisture content is a prime requirement in hydrological, geophysical and biogeochemical research as well as for material characterization and process control. Within these areas, accurate measurements of surface area and bound water content are becoming increasingly important for answering many fundamental questions, ranging from the characterization of cotton fiber maturity, to accurate characterization of soil water content in soil water conservation research, to bio-plant water utilization, to chemical reactions and diffusion of ionic species across cell membranes as well as in the dense suspensions that occur in surface films. One promising technique for meeting the increasing demand for more accurate water content measurements is electrical permittivity characterization of materials. This technique has enjoyed a strong following in the soil-science and geological communities through measurements of apparent permittivity via time-domain reflectometry (TDR), as well as in many process control applications. Recent research, however, indicates a need to increase accuracy beyond that available from traditional TDR. The most logical pathway is then a transition from TDR-based measurements to network analyzer measurements of absolute permittivity, which remove the adverse effects that high-surface-area soils and conductivity impart on measurements of apparent permittivity in traditional TDR applications. This research examines an experimental error observed for the coaxial probe, from which the modern TDR probe originated, and hypothesizes that it is due to fringe capacitance. The research provides an experimental and theoretical basis for the cause of the error and a technique for correcting the system to remove this source of error.
To test this hypothesis, a Poisson model of a coaxial cell was formulated to calculate the effective theoretical extra length caused by the fringe capacitance; this extra length is then used to correct the experimental results. After correction with the Poisson-model-derived correction factor, measurements made with differing coaxial cell diameters and probe lengths all produce the same results, lending support for an augmented technique for measuring absolute permittivity. PMID:22346601
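The correction can be illustrated with the ideal coaxial capacitance C = 2π ε0 εr L / ln(b/a). Fringe fields make the cell behave as if it were longer by some ΔL, so inverting for εr with the physical length L overestimates the permittivity. The sketch below takes ΔL as an input (in the paper it comes from a Poisson model) and uses the static, lossless capacitance formula; it is not the authors' network-analyzer procedure, and the function names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def coax_capacitance(eps_r, length_m, inner_radius_m, outer_radius_m):
    """Ideal coaxial capacitance C = 2*pi*eps0*eps_r*L / ln(b/a), no fringe fields."""
    return 2 * math.pi * EPS0 * eps_r * length_m / math.log(outer_radius_m / inner_radius_m)

def permittivity_from_capacitance(c_farad, length_m, inner_radius_m, outer_radius_m, delta_l_m=0.0):
    """Invert for eps_r, using the effective length L + delta_L to account for fringing."""
    effective_length = length_m + delta_l_m
    return c_farad * math.log(outer_radius_m / inner_radius_m) / (2 * math.pi * EPS0 * effective_length)
```

The uncorrected error scales as ΔL/L, which is why results from probes of different lengths and diameters disagree until the correction is applied.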
Performance Models for the Spike Banded Linear System Solver
Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...
2011-01-01
With the availability of large-scale parallel platforms comprising tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, and (iii) we show excellent prediction capabilities of our model, based on which we argue for the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations.
All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm on up to 65,536 cores. In this paper, we extend the results presented at the Ninth International Symposium on Parallel and Distributed Computing.
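The general idea of a pseudo-analytical model (analytical cost terms per phase, with coefficients parameterized from measured runtimes) can be sketched as a least-squares fit. The per-phase terms below are hypothetical placeholders chosen for illustration. They are not the Truncated Spike model from the paper, whose phase costs depend on details not given here.

```python
import numpy as np

def phase_terms(n, b, p):
    """Hypothetical per-phase cost terms for order n, bandwidth b, on p cores."""
    return np.array([
        n * b * b / p,        # assumed local banded factorization cost, O(n b^2 / p)
        b ** 3 * np.log2(p),  # assumed reduced-system / communication cost
        1.0,                  # assumed fixed overhead
    ], dtype=float)

def fit_model(runs, times):
    """Fit one coefficient per phase from measured (n, b, p) runs by least squares."""
    A = np.array([phase_terms(*r) for r in runs])
    scale = np.linalg.norm(A, axis=0)  # column scaling improves conditioning
    scale[scale == 0] = 1.0
    coeffs, *_ = np.linalg.lstsq(A / scale, np.asarray(times, dtype=float), rcond=None)
    return coeffs / scale

def predict(coeffs, n, b, p):
    """Predicted runtime for a configuration, including extrapolation to large p."""
    return float(phase_terms(n, b, p) @ coeffs)
```

Fitting on modest core counts and then calling `predict` at, say, p = 65536 mirrors the extrapolation described in the abstract. The prediction is only as good as the assumed terms, which is why the paper derives them from an analysis of each solver phase.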
Mijailovic, Aleksandar S; Qing, Bo; Fortunato, Daniel; Van Vliet, Krystyn J
2018-04-15
Precise and accurate measurement of viscoelastic mechanical properties becomes increasingly challenging as sample stiffness decreases to elastic moduli <1 kPa, largely due to difficulties detecting initial contact with the compliant sample surface. This limitation is particularly relevant to characterization of biological soft tissues and compliant gels. Here, we employ impact indentation which, in contrast to shear rheology and conventional indentation, does not require contact detection a priori, and present a novel method to extract viscoelastic moduli and relaxation time constants directly from the impact response. We first validate our approach by using both impact indentation and shear rheology to characterize polydimethylsiloxane (PDMS) elastomers of stiffness ranging from hundreds of Pa to nearly 10 kPa. Assuming a linear viscoelastic constitutive model for the material, we find that the moduli and relaxation times obtained from fitting the impact response agree well with those obtained from fitting the rheological response. Next, we demonstrate our validated method on hydrated, biological soft tissues obtained from porcine brain, murine liver, and murine heart, and report the equilibrium shear moduli, instantaneous shear moduli, and relaxation time constants for each tissue. Together, our findings provide a new and straightforward approach capable of probing local mechanical properties of highly compliant viscoelastic materials with millimeter scale spatial resolution, mitigating complications involving contact detection or sample geometric constraints. Characterization and optimization of mechanical properties can be essential for the proper function of biomaterials in diverse applications.
However, precise and accurate measurement of viscoelastic mechanical properties becomes increasingly difficult with increased compliance (particularly for elastic moduli <1 kPa), largely due to challenges detecting initial contact with the compliant sample surface and measuring response at short timescale or high frequency. By contrast, impact indentation has highly accurate contact detection and can be used to measure short timescale (glassy) response. Here, we demonstrate an experimental and analytical method that confers significant advantages over existing approaches to extract spatially resolved viscoelastic moduli and characteristic time constants of biological tissues (e.g., brain and heart) and engineered biomaterials. Copyright © 2018 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
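The reported quantities (equilibrium modulus, instantaneous modulus, relaxation time) correspond to a standard linear solid relaxation function, G(t) = G∞ + (G0 - G∞) exp(-t/τ). The sketch below fits that function to a relaxation curve. For each candidate τ the model is linear in G∞ and (G0 - G∞), so those are solved by least squares and τ is chosen by grid search. This is an assumed, simplified stand-in: the paper fits the dynamic impact response of the indenter, which is not modeled here, and the function name is hypothetical.

```python
import numpy as np

def fit_sls_relaxation(t, g, tau_grid):
    """Fit G(t) = G_inf + (G_0 - G_inf) * exp(-t / tau) by grid search over tau.

    Returns (G_inf, G_0, tau): equilibrium modulus, instantaneous modulus,
    relaxation time constant.
    """
    t = np.asarray(t, float)
    g = np.asarray(g, float)
    best = None
    for tau in tau_grid:
        A = np.column_stack([np.ones_like(t), np.exp(-t / tau)])  # linear in the moduli
        coef, *_ = np.linalg.lstsq(A, g, rcond=None)
        sse = float(np.sum((A @ coef - g) ** 2))
        if best is None or sse < best[0]:
            best = (sse, tau, coef)
    _, tau, (g_inf, delta) = best
    return g_inf, g_inf + delta, tau
```

Separating the linear and nonlinear parameters keeps the fit robust without a general nonlinear optimizer; a finer or logarithmic τ grid can be substituted when the relaxation time is unknown over several decades.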
Methods and apparatus for non-acoustic speech characterization and recognition
Holzrichter, John F.
1999-01-01
By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.
2013-01-01
Utilizing semiconductor nanowires for (opto)electronics requires exact knowledge of their current–voltage properties. We report accurate on-top imaging and I–V characterization of individual as-grown nanowires, using a subnanometer resolution scanning tunneling microscope with no need for additional microscopy tools, thus allowing versatile application. We form Ohmic contacts to InP and InAs nanowires without any sample processing, followed by quantitative measurements of diameter dependent I–V properties with a very small spread in measured values compared to standard techniques. PMID:24059470
Timm, Rainer; Persson, Olof; Engberg, David L J; Fian, Alexander; Webb, James L; Wallentin, Jesper; Jönsson, Andreas; Borgström, Magnus T; Samuelson, Lars; Mikkelsen, Anders
2013-11-13
Utilizing semiconductor nanowires for (opto)electronics requires exact knowledge of their current-voltage properties. We report accurate on-top imaging and I-V characterization of individual as-grown nanowires, using a subnanometer resolution scanning tunneling microscope with no need for additional microscopy tools, thus allowing versatile application. We form Ohmic contacts to InP and InAs nanowires without any sample processing, followed by quantitative measurements of diameter dependent I-V properties with a very small spread in measured values compared to standard techniques.
Methods and apparatus for non-acoustic speech characterization and recognition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzrichter, J.F.
By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well-defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated in each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.
Ultrasonic characterization of solid liquid suspensions
Panetta, Paul D.
2010-06-22
Using an ultrasonic field, properties of a solid liquid suspension such as through-transmission attenuation, backscattering, and diffuse field are measured. These properties are converted to quantities indicating the strength of different loss mechanisms (such as absorption, single scattering and multiple scattering) among particles in the suspension. Such separation of the loss mechanisms can allow for direct comparison of the attenuating effects of the mechanisms. These comparisons can also indicate a model most likely to accurately characterize the suspension and can aid in determination of properties such as particle size, concentration, and density of the suspension.
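The simplest of these measurements is the through-transmission attenuation coefficient. If amplitude decays as A = A0 exp(-αx), amplitudes recorded at two path lengths give α = ln(A1/A2) / (x2 - x1) in nepers per meter (1 Np = 20 log10(e) ≈ 8.686 dB). The sketch below computes that total attenuation only. It ignores diffraction and interface losses, and it does not separate absorption from single and multiple scattering, which in the described approach needs the backscatter and diffuse-field measurements as well. Function names are hypothetical.

```python
import math

def attenuation_coefficient(amp_short, amp_long, path_short_m, path_long_m):
    """Total attenuation alpha (Np/m) from A = A0 * exp(-alpha * x) at two path lengths."""
    return math.log(amp_short / amp_long) / (path_long_m - path_short_m)

def nepers_to_db(alpha_np):
    """Convert an attenuation in Np to dB (1 Np = 20*log10(e) dB, about 8.686 dB)."""
    return 20.0 * math.log10(math.e) * alpha_np
```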
Applications of luminescent systems to infectious disease methodology
NASA Technical Reports Server (NTRS)
Picciolo, G. L.; Chappelle, E. W.; Deming, J. W.; Mcgarry, M. A.; Nibley, D. A.; Okrend, H.; Thomas, R. R.
1976-01-01
The characterization of a clinical sample by a simple, fast, accurate, automatable analytical measurement is important in the management of infectious disease. Luminescence assays offer methods rich with options for these measurements. The instrumentation is common to each assay, and the investment is reasonable. Three general procedures, developed to varying degrees of completeness, measure bacterial levels through their ATP, FMN and iron porphyrins. Bacteriuria detection and antibiograms can be completed within half a day. Soluble ATP, FMN or porphyrins in the sample were also characterized.
Orthotropic elasto-plastic behavior of AS4/APC-2 thermoplastic composite in compression
NASA Technical Reports Server (NTRS)
Sun, C. T.; Rui, Y.
1989-01-01
Uniaxial compression tests were performed on off-axis coupon specimens of unidirectional AS4/APC-2 thermoplastic composite at various temperatures. The elasto-plastic and strength properties of AS4/APC-2 composite were characterized with respect to temperature variation by using a one-parameter orthotropic plasticity model and a one-parameter failure criterion. Experimental results show that the orthotropic plastic behavior can be characterized quite well using the plasticity model, and the matrix-dominant compressive strengths can be predicted very accurately by the one-parameter failure criterion.
NASA Technical Reports Server (NTRS)
Bernstein, Ira B.; Brookshaw, Leigh; Fox, Peter A.
1992-01-01
The present numerical method for accurate and efficient solution of systems of linear equations proceeds by numerically developing a set of basis solutions characterized by slowly varying dependent variables. The solutions thus obtained are shown to have a computational overhead largely independent of the small size of the scale length which characterizes the solutions; in many cases, the technique obviates series solutions near singular points, and its known sources of error can be easily controlled without a substantial increase in computational time.
NASA Astrophysics Data System (ADS)
Graus, Matthew S.; Neumann, Aaron K.; Timlin, Jerilyn A.
2017-01-01
Fungi in the Candida genus are the most common fungal pathogens. They not only cause high morbidity and mortality but can also cost billions of dollars in healthcare. To alleviate this burden, early and accurate identification of Candida species is necessary. However, standard identification procedures can take days and have a high false-negative rate. The method described in this study uses hyperspectral confocal fluorescence microscopy to quickly identify and characterize the unique autofluorescence spectra of different Candida species, with up to 84% accuracy when the cells are grown in conditions that closely mimic physiological conditions.
SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY.
Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang
2009-08-07
This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, and suitable for clinical application.
SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY
Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang
2010-01-01
This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, and suitable for clinical application. PMID:21197416
Testing approximations for non-linear gravitational clustering
NASA Technical Reports Server (NTRS)
Coles, Peter; Melott, Adrian L.; Shandarin, Sergei F.
1993-01-01
The accuracy of various analytic approximations for following the evolution of cosmological density fluctuations into the nonlinear regime is investigated. The Zel'dovich approximation is found to be consistently the best approximation scheme. It is extremely accurate for power spectra characterized by n = -1 or less; when the approximation is 'enhanced' by truncating highly nonlinear Fourier modes, it is excellent even for n = +1. The performance of linear theory is less spectrum-dependent, but this approximation is less accurate than the Zel'dovich one in all cases because it fails to treat the dynamics. The lognormal approximation generally provides a very poor fit to the spatial pattern.
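In its standard form, the Zel'dovich approximation moves each mass element from its Lagrangian coordinate q to the Eulerian position x = q + D(t)ψ(q), where D is the linear growth factor and ψ the displacement field. The minimal 1-D sketch below assumes a single sine-wave displacement with hypothetical parameters `amplitude` and `k`; it shows how the density follows from the Jacobian dx/dq and diverges at shell crossing. It does not implement the truncated ('enhanced') variant or the comparisons described above.

```python
import math


def zeldovich_1d(q, growth, amplitude, k):
    """Map a Lagrangian coordinate q to its Eulerian position and density ratio.

    Assumes a hypothetical single-mode displacement psi(q) = -(amplitude / k) * sin(k q);
    `growth` plays the role of the linear growth factor D(t).
    """
    psi = -(amplitude / k) * math.sin(k * q)
    x = q + growth * psi                                   # x = q + D * psi(q)
    jacobian = 1.0 - growth * amplitude * math.cos(k * q)  # dx/dq
    if jacobian == 0.0:
        return x, math.inf                                 # shell crossing: density diverges
    return x, 1.0 / abs(jacobian)                          # rho / mean density


# Density at the crest q = 0 grows as D approaches 1 / amplitude.
for D in (0.0, 0.5, 0.9):
    print(D, zeldovich_1d(0.0, D, amplitude=1.0, k=1.0)[1])
```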
An anisotropic thermal-stress model for through-silicon via
NASA Astrophysics Data System (ADS)
Liu, Song; Shan, Guangbao
2018-02-01
A two-dimensional thermal-stress model of through-silicon vias (TSVs) is proposed that accounts for the anisotropic elastic properties of the silicon substrate. Using the complex-variable approach, the thermal-stress distribution in the substrate can be characterized more accurately. TCAD 3-D simulations verify the model, agreeing with the analytical results to within ±5%. The proposed thermal-stress model can be integrated into a stress-driven design flow for 3-D ICs, enabling more accurate timing analysis that accounts for thermal-stress effects. Project supported by the Aerospace Advanced Manufacturing Technology Research Joint Fund (No. U1537208).
Combined electron beam imaging and ab initio modeling of T1 precipitates in Al-Li-Cu alloys
NASA Astrophysics Data System (ADS)
Dwyer, C.; Weyland, M.; Chang, L. Y.; Muddle, B. C.
2011-05-01
Among the many considerable challenges faced in developing a rational basis for advanced alloy design, establishing accurate atomistic models is one of the most fundamental. Here we demonstrate how advanced imaging techniques in a double-aberration-corrected transmission electron microscope, combined with ab initio modeling, have been used to determine the atomic structure of embedded 1 nm thick T1 precipitates in precipitation-hardened Al-Li-Cu aerospace alloys. The results provide an accurate determination of the controversial T1 structure, and demonstrate how next-generation techniques permit the characterization of embedded nanostructures in alloys and other nanostructured materials.
Phase rainbow refractometry for accurate droplet variation characterization.
Wu, Yingchun; Promvongsa, Jantarat; Saengkaew, Sawitree; Wu, Xuecheng; Chen, Jia; Gréhan, Gérard
2016-10-15
We developed a one-dimensional phase rainbow refractometer for accurate trans-dimensional measurements of droplet size on the micrometer scale as well as of tiny droplet diameter variations at the nanoscale. The dependence of the phase shift of the rainbow ripple structures on the droplet variations is revealed. The phase-shifting rainbow image is recorded by a telecentric one-dimensional rainbow imaging system. Experiments on an evaporating monodisperse droplet stream show that the phase rainbow refractometer can measure tiny droplet diameter changes down to tens of nanometers. This one-dimensional phase rainbow refractometer can measure the droplet refractive index and diameter, as well as their variations.
Choi, Sung Soo Sean; Lashkari, Bahman; Dovlo, Edem; Mandelis, Andreas
2016-01-01
Accurate monitoring of blood oxy-saturation level (SO2) in human breast tissues is clinically important for predicting and evaluating possible tumor growth at the site. In this work, four different non-invasive frequency-domain photoacoustic (PA) imaging modalities were compared for their absolute SO2 characterization capability using an in-vitro sheep blood circulation system. Among different PA modes, a new WM-DPAR imaging modality could estimate the SO2 with great accuracy when compared to a commercial blood gas analyzer. The developed WM-DPARI theory was further validated by constructing SO2 tomographic images of a blood-containing plastisol phantom. PMID:27446691
Medical and Surgical Management of Equine Recurrent Uveitis.
McMullen, Richard Joseph; Fischer, Britta Maria
2017-12-01
Equine recurrent uveitis (ERU) is characterized by recurrent bouts of inflammation interrupted by periods of quiescence that vary in duration. There is little consensus on the clinical manifestations, the underlying causes, or the management. The 3 commonly recognized syndromes of ERU (classic, insidious, and posterior) do not accurately separate the clinical manifestations of disease into distinct categories. An accurate diagnosis and early intervention are essential to minimizing the effects of disease and preserving vision. There are multiple medical and surgical options for controlling ERU as long as the disease is recognized early and targeted treatment is initiated immediately. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Martinek, Tomas; Duboué-Dijon, Elise; Timr, Štěpán; Mason, Philip E.; Baxová, Katarina; Fischer, Henry E.; Schmidt, Burkhard; Pluhařová, Eva; Jungwirth, Pavel
2018-06-01
We present a combination of force field and ab initio molecular dynamics simulations together with neutron scattering experiments with isotopic substitution that aim at characterizing ion hydration and pairing in aqueous calcium chloride and formate/acetate solutions. Benchmarking against neutron scattering data on concentrated solutions together with ion pairing free energy profiles from ab initio molecular dynamics allows us to develop an accurate calcium force field which accounts in a mean-field way for electronic polarization effects via charge rescaling. This refined calcium parameterization is directly usable for standard molecular dynamics simulations of processes involving this key biological signaling ion.
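One widely used way to rescale charges in this spirit, the electronic continuum correction, divides each formal charge by the square root of the solvent's electronic (high-frequency) dielectric constant. The sketch below assumes that scheme and a value of about 1.78 for water; the abstract does not state which factor the authors used, so treat the numbers as illustrative rather than as their parameterization.

```python
import math

# Assumed electronic dielectric constant of water for the electronic continuum
# correction; this value is not given in the abstract.
EPS_ELECTRONIC_WATER = 1.78


def scaled_charge(formal_charge, eps_electronic=EPS_ELECTRONIC_WATER):
    """Rescale a formal ionic charge by 1/sqrt(eps_el) to mimic electronic polarization."""
    return formal_charge / math.sqrt(eps_electronic)


print(f"Ca2+ : {scaled_charge(2.0):.2f}")   # roughly +1.5
print(f"Cl-  : {scaled_charge(-1.0):.2f}")  # roughly -0.75
```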
Generalized probabilistic scale space for image restoration.
Wong, Alexander; Mishra, Akshaya K
2010-10-01
A novel generalized sampling-based probabilistic scale space theory is proposed for image restoration. We explore extending the definition of scale space to better account for both noise and observation models, which is important for producing accurately restored images. A new class of scale-space realizations based on sampling and probability theory is introduced to realize this extended definition in the context of image restoration. Experimental results using 2-D images show that generalized sampling-based probabilistic scale-space theory can produce more accurate restored images than state-of-the-art scale-space formulations, particularly in situations characterized by low signal-to-noise ratios and image degradation.
NASA Astrophysics Data System (ADS)
Galmed, A. H.; Elshemey, Wael M.
2017-08-01
Differentiating between normal, benign and malignant excised breast tissues is a major worldwide challenge that requires a quantitative, fast and reliable technique to avoid personal errors in diagnosis. Laser-induced fluorescence (LIF) is a promising technique that has been applied to the characterization of biological tissues, including breast tissue. Unfortunately, only a few studies have adopted a quantitative approach that can be directly applied to breast tissue characterization. This work provides a quantitative means for such characterization by introducing several LIF characterization parameters and determining the diagnostic accuracy of each parameter in differentiating between normal, benign and malignant excised breast tissues. Extensive analysis of 41 lyophilized breast samples using scatter diagrams, cut-off values, diagnostic indices and receiver operating characteristic (ROC) curves shows that some spectral parameters (peak height and area under the peak) are superior for characterizing normal, benign and malignant breast tissues, with high sensitivity (up to 0.91), high specificity (up to 0.91) and an accuracy ranking of "highly accurate".
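The diagnostic indices named above follow from a confusion matrix at a chosen cut-off, and an ROC curve is traced by sweeping that cut-off. The sketch below assumes a simple binary split (one tissue class treated as "positive", values at or above the cut-off called positive); the function names and the toy peak heights are hypothetical and are not data from the study.

```python
def diagnostic_indices(values, is_positive, cutoff):
    """Sensitivity, specificity and accuracy when `value >= cutoff` is called positive."""
    tp = fp = tn = fn = 0
    for value, positive in zip(values, is_positive):
        predicted = value >= cutoff
        if predicted and positive:
            tp += 1
        elif predicted:
            fp += 1
        elif positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def roc_points(values, is_positive):
    """(1 - specificity, sensitivity) pairs obtained by sweeping the cut-off downward."""
    points = []
    for cutoff in sorted(set(values), reverse=True):
        sens, spec, _ = diagnostic_indices(values, is_positive, cutoff)
        points.append((1.0 - spec, sens))
    return points


# Hypothetical peak heights (arbitrary units) and labels, not measurements from the study.
peaks = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [True, True, True, False, False]
print(diagnostic_indices(peaks, labels, cutoff=0.5))
print(roc_points(peaks, labels))
```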
Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG
NASA Astrophysics Data System (ADS)
Rohe, Daniel
2016-10-01
The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, raising the question of how far High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that it can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: distributed computing by means of the Message Passing Interface (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD (single-instruction-multiple-data) units. Results are provided for two distinct HPC platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can be accomplished by introducing only moderate changes to the code, so that this strategy may serve as a guideline for other researchers to likewise improve the efficiency of their codes.
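A multi-level scheme of this kind typically starts with a hierarchical work decomposition: the outer level splits the work across MPI processes, the middle level splits each process's share across OpenMP threads, and each thread's contiguous range is the unit-stride loop a compiler can vectorise. The sketch below only illustrates that index bookkeeping; the even block split and the helper names are assumptions, not taken from the hp-fRG code, and nothing here actually runs in parallel.

```python
def block_range(n, parts, index):
    """Half-open [start, stop) slice of n items owned by member `index` of `parts` members."""
    return n * index // parts, n * (index + 1) // parts


def hierarchical_ranges(n, ranks, threads_per_rank):
    """Split n work items first across processes (MPI level), then across threads (OpenMP level)."""
    plan = {}
    for rank in range(ranks):
        r_start, r_stop = block_range(n, ranks, rank)
        local_n = r_stop - r_start
        for thread in range(threads_per_rank):
            t_start, t_stop = block_range(local_n, threads_per_rank, thread)
            plan[(rank, thread)] = (r_start + t_start, r_start + t_stop)
    return plan


plan = hierarchical_ranges(10, ranks=2, threads_per_rank=3)
for (rank, thread), (start, stop) in sorted(plan.items()):
    print(f"rank {rank}, thread {thread}: items {start}..{stop - 1}")
```

In a compiled code, the loop over each `(start, stop)` range is the innermost level that SIMD units would process.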
Snowflake: A Lightweight Portable Stencil DSL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Nathan; Driscoll, Michael; Markley, Charles
Stencil computations are not well optimized by general-purpose production compilers, and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and IPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP code with performance comparable to hand-optimized HPGMG, and OpenCL code within a factor of 2 of it, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.
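To illustrate what a 'micro-compiler' means in practice, the sketch below turns a small declarative stencil description into Python source for a loop and compiles it at run time. The function name `compile_stencil` and the offset-to-coefficient dictionary are hypothetical; this is not Snowflake's API, and it covers only 1-D, constant-coefficient stencils whose boundary points are left untouched.

```python
def compile_stencil(weights):
    """Generate and compile a 1-D stencil kernel from {offset: coefficient}.

    Points closer than the stencil radius to either end are left unchanged.
    """
    radius = max(abs(offset) for offset in weights)
    terms = " + ".join(
        f"({coeff!r}) * u[i + ({offset})]" for offset, coeff in sorted(weights.items())
    )
    source = (
        "def kernel(u, out):\n"
        f"    for i in range({radius}, len(u) - {radius}):\n"
        f"        out[i] = {terms}\n"
    )
    namespace = {}
    exec(source, namespace)          # run-time compilation of the generated code
    return namespace["kernel"], source


# Second-difference (1-D Laplacian) stencil.
laplacian, src = compile_stencil({-1: 1.0, 0: -2.0, 1: 1.0})
print(src)
u = [0.0, 1.0, 4.0, 9.0, 16.0]
out = [0.0] * len(u)
laplacian(u, out)
print(out)
```

Generating straight-line code per stencil, rather than interpreting the description inside the loop, is what lets a code generator specialise each kernel.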
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.
2017-10-20
Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed-phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power needed to solve interesting problems in chemistry. In this paper, we describe the effort of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong-scaling results for the complete AIMD simulation of a test case of 256 water molecules, which strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare this performance with that obtained on a cluster of dual-socket Intel® Xeon® E5–2698v3 processors.
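One tall-and-skinny product common in plane-wave codes is a small overlap-type matrix C = AᵀB formed from two N×m and N×n blocks with N much larger than m and n. A common threading strategy, assumed here rather than taken from NWChem, splits the long dimension N into chunks, lets each thread form a partial m×n result, and sums the partials in a reduction. The pure-Python sketch below runs the chunks sequentially and uses real values, so it shows only the data decomposition.

```python
def partial_product(A, B, start, stop):
    """Contribution of rows start..stop-1 to C = A^T B (one thread's share)."""
    m, n = len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for k in range(start, stop):
        a_row, b_row = A[k], B[k]
        for i in range(m):
            a_ki = a_row[i]
            for j in range(n):
                C[i][j] += a_ki * b_row[j]
    return C


def tall_skinny_atb(A, B, chunks=4):
    """C = A^T B for tall A (N x m) and B (N x n), reduced over row chunks."""
    N = len(A)
    bounds = [N * t // chunks for t in range(chunks + 1)]
    partials = [partial_product(A, B, bounds[t], bounds[t + 1]) for t in range(chunks)]
    m, n = len(A[0]), len(B[0])
    # Reduction step: in a threaded code, per-thread partial results are summed here.
    return [[sum(P[i][j] for P in partials) for j in range(n)] for i in range(m)]


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[2.0], [3.0], [4.0]]
print(tall_skinny_atb(A, B, chunks=2))
```

Splitting along N keeps each partial result small (m×n), so the reduction is cheap compared with the streaming work over the long dimension.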
Snowflake: A Lightweight Portable Stencil DSL
Zhang, Nathan; Driscoll, Michael; Markley, Charles; ...
2017-05-01
Stencil computations are not well optimized by general-purpose production compilers, and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and IPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP code with performance comparable to hand-optimized HPGMG, and OpenCL code within a factor of 2 of it, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.
Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin
2018-01-01
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to the increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to large datasets remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
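The core MCL iteration alternates expansion (raising the column-stochastic matrix to a power) with inflation (raising each entry to a power and renormalizing the columns) until the matrix stops changing; clusters are then read off the rows of the converged matrix. The dense, pure-Python sketch below shows that serial iteration with commonly used defaults (expansion 2, inflation 2) assumed. HipMCL's distributed sparse implementation, its pruning, and its MPI/OpenMP parallelism are not reproduced.

```python
def normalize_columns(M):
    """Scale each column of the square matrix M (in place) so it sums to 1."""
    n = len(M)
    for j in range(n):
        total = sum(M[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                M[i][j] /= total
    return M


def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Basic dense MCL: returns clusters as frozensets of node indices."""
    n = len(adjacency)
    # Add self-loops (a customary MCL preprocessing step), then make columns stochastic.
    M = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalize_columns(M)
    for _ in range(max_iter):
        E = M
        for _power in range(expansion - 1):   # expansion: E = M ** expansion
            E = matmul(E, M)
        inflated = normalize_columns([[E[i][j] ** inflation for j in range(n)] for i in range(n)])
        change = max(abs(inflated[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = inflated
        if change < tol:
            break
    # Rows with non-negligible entries belong to attractors; their nonzero columns form a cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if M[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two disconnected triangles (hypothetical toy graph).
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(markov_cluster(adj))
```

The repeated dense matrix products are what make naive MCL expensive on large networks, which is the bottleneck the abstract refers to.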
High-Performance Reactive Particle Tracking with Adaptive Representation
NASA Astrophysics Data System (ADS)
Schmidt, M.; Benson, D. A.; Pankavich, S.
2017-12-01
Lagrangian particle tracking algorithms have been shown to be effective tools for modeling chemical reactions in imperfectly-mixed media. One disadvantage of these algorithms is the possible need to employ large numbers of particles in simulations, depending on the concentration covariance structure, and these large particle numbers can lead to long computation times. Two distinct approaches have recently arisen to overcome this. One method employs spatial kernels that are related to a specified, reduced particle number; however, over-wide kernels, dictated by a very low particle number, lead to an excess of reaction calculations and cause a reduction in performance. Another formulation involves hybrid particles that carry multiple species of reactant, wherein each particle is treated as its own well-mixed volume, obviating the need for large numbers of particles for each species but still requiring a fixed number of hybrid particles. Here, we combine these two approaches and demonstrate an improved method for simulating a given system in a computationally efficient manner. Additionally, the independent nature of transport and reaction calculations in this approach allows for significant gains via parallelization in an MPI or OpenMP context. For benchmarking, we choose a CO2 injection simulation with dissolution and precipitation of calcite and dolomite, allowing us to derive the proper treatment of interaction between solid and aqueous phases.
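The hybrid-particle idea treats each particle as its own well-mixed volume carrying several species, so the reaction update can be applied particle by particle. The sketch below assumes a hypothetical bimolecular reaction A + B → C with mass-action rate k·a·b (not the calcite/dolomite chemistry of the CO2 benchmark) and uses its closed-form well-mixed solution over one time step; the kernel-based transport step is omitted.

```python
import math


def react_well_mixed(a0, b0, k, dt):
    """Exact solution of A + B -> C (rate k * a * b) over dt inside one well-mixed particle.

    Returns the new amounts of A and B and the amount of C produced.
    """
    delta = a0 - b0
    if abs(delta) < 1e-12:
        a = a0 / (1.0 + k * a0 * dt)                     # equal initial amounts
    else:
        a = delta * a0 / (a0 - b0 * math.exp(-k * delta * dt))
    produced = a0 - a                                    # one C per A consumed
    return a, b0 - produced, produced


def reaction_step(particles, k, dt):
    """Apply the reaction independently to every hybrid particle (a, b, c).

    No particle depends on another in this step, which is what makes it easy
    to distribute over MPI ranks or OpenMP threads.
    """
    updated = []
    for a, b, c in particles:
        a_new, b_new, produced = react_well_mixed(a, b, k, dt)
        updated.append((a_new, b_new, c + produced))
    return updated


particles = [(1.0, 1.0, 0.0), (2.0, 1.0, 0.0), (0.0, 3.0, 0.0)]
print(reaction_step(particles, k=1.0, dt=1.0))
```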