Sample records for characterizing OpenMP applications

  1. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
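
    As a point of reference for the task-parallel OpenMP style this record analyzes, the sketch below is our own minimal illustration (not the authors' benchmark code): each task allocates and first-touches its own block, so on a NUMA system the placement of data relative to the thread that later processes it is exactly what drives the work time inflation discussed above.

    ```c
    /* Minimal task-parallel OpenMP sketch (illustrative only, not the paper's code).
     * Each task first-touches its own block of data, so on a NUMA system the block
     * is placed on the memory of whichever thread happens to execute the task. */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBLOCKS   64
    #define BLOCKSIZE (1 << 18)

    int main(void) {
        double *blocks[NBLOCKS];
        double  sums[NBLOCKS];

        #pragma omp parallel
        #pragma omp single
        for (int b = 0; b < NBLOCKS; b++) {
            #pragma omp task firstprivate(b) shared(blocks, sums)
            {
                /* First touch inside the task: allocation lands on the NUMA
                 * node of the executing thread. */
                blocks[b] = malloc(BLOCKSIZE * sizeof(double));
                double s = 0.0;
                for (int i = 0; i < BLOCKSIZE; i++) {
                    blocks[b][i] = (double)i;
                    s += blocks[b][i];
                }
                sums[b] = s;
            }
        }   /* tasks complete at the implicit barrier closing the single/parallel */

        double total = 0.0;
        for (int b = 0; b < NBLOCKS; b++) { total += sums[b]; free(blocks[b]); }
        printf("total = %f\n", total);
        return 0;
    }
    ```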

  2. Characterizing Task-Based OpenMP Programs

    PubMed Central

    Muddukrishna, Ananya; Jonsson, Peter A.; Brorsson, Mats

    2015-01-01

    Programmers struggle to understand performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. We demonstrate the utility of our method by quickly diagnosing performance problems and characterizing exposed task parallelism and per-task instruction profiles of benchmarks in the widely-used Barcelona OpenMP Tasks Suite. Programmers can tune performance faster and understand performance tradeoffs more effectively than existing tools by using our method to characterize task-based performance. PMID:25860023
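
    To make the distinction between thread-based and task-based profiling concrete, here is a minimal, hypothetical sketch (not the authors' method) that records wall time per task with omp_get_wtime rather than per thread.

    ```c
    /* Hypothetical per-task timing sketch (not the authors' tool): wall time is
     * recorded per task, indexed by task number, instead of per thread. */
    #include <omp.h>
    #include <stdio.h>

    #define NTASKS 100

    static void do_work(int t) {           /* stand-in task body of uneven cost */
        volatile double x = 0.0;
        for (int i = 0; i < 100000 * (t % 7 + 1); i++) x += i * 0.5;
    }

    int main(void) {
        double task_time[NTASKS];

        #pragma omp parallel
        #pragma omp single
        for (int t = 0; t < NTASKS; t++) {
            #pragma omp task firstprivate(t) shared(task_time)
            {
                double start = omp_get_wtime();
                do_work(t);
                task_time[t] = omp_get_wtime() - start;   /* per task, not per thread */
            }
        }

        for (int t = 0; t < NTASKS; t++)
            printf("task %3d: %.6f s\n", t, task_time[t]);
        return 0;
    }
    ```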

  3. Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications

    NASA Technical Reports Server (NTRS)

    Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Jost, Gabriele

    2004-01-01

    In this paper we describe the parallelization of the multi-zone code versions of the NAS Parallel Benchmarks employing multi-level OpenMP parallelism. For our study we use the NanosCompiler, which supports nesting of OpenMP directives and provides clauses to control the grouping of threads, load balancing, and synchronization. We report the benchmark results, compare the timings with those of different hybrid parallelization paradigms and discuss OpenMP implementation issues which affect the performance of multi-level parallel applications.
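
    A minimal nested-OpenMP sketch of the two-level structure described above follows; it is illustrative only, and the NanosCompiler thread-grouping clauses mentioned in the abstract are vendor extensions that are not shown.

    ```c
    /* Minimal nested OpenMP sketch (illustrative only; NanosCompiler grouping
     * clauses are extensions and not shown). Outer team per zone, inner team
     * for loop-level work inside each zone. */
    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_max_active_levels(2);        /* allow two active parallel levels */

        #pragma omp parallel num_threads(4)  /* outer level: e.g. one thread per zone */
        {
            int zone = omp_get_thread_num();
            #pragma omp parallel num_threads(2)   /* inner level: loop parallelism */
            printf("zone %d, inner thread %d of %d\n",
                   zone, omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }
    ```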

  4. Toward Enhancing OpenMP's Work-Sharing Directives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, B M; Huang, L; Jin, H

    2006-05-17

    OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
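
    The thread subteam and topology features proposed here are extensions, not standard OpenMP; the sketch below only approximates the subteam idea in plain OpenMP by splitting one team by thread number so that two activities proceed concurrently.

    ```c
    /* Rough approximation of the proposed "thread subteams" using plain OpenMP:
     * the team is split by thread number so two activities run concurrently in
     * one parallel region. A subteam clause would express this directly. */
    #include <omp.h>
    #include <stdio.h>

    static void compute_interior(int tid)    { printf("interior work on thread %d\n", tid); }
    static void exchange_boundaries(int tid) { printf("boundary work on thread %d\n", tid); }

    int main(void) {
        #pragma omp parallel num_threads(8)
        {
            int tid = omp_get_thread_num();
            if (tid < 6)
                compute_interior(tid);      /* "subteam" 0: bulk computation */
            else
                exchange_boundaries(tid);   /* "subteam" 1: boundary exchange */
        }
        return 0;
    }
    ```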

  5. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barbara Chapman

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  7. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  8. Support of Multidimensional Parallelism in the OpenMP Programming Model

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele

    2003-01-01

    OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
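
    The constructs proposed in this record are likewise extensions; for comparison, the closest standard OpenMP mechanism for distributing work in more than one dimension is the collapse clause, sketched below.

    ```c
    /* Standard-OpenMP point of comparison for multi-dimensional work distribution:
     * collapse(2) spreads the combined i-j iteration space over the team. The
     * record proposes richer constructs; this sketch shows only the baseline. */
    #include <stdio.h>

    #define NI 512
    #define NJ 512

    static double a[NI][NJ];

    int main(void) {
        #pragma omp parallel for collapse(2) schedule(static)
        for (int i = 0; i < NI; i++)
            for (int j = 0; j < NJ; j++)
                a[i][j] = (double)(i + j);

        printf("a[100][200] = %f\n", a[100][200]);
        return 0;
    }
    ```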

  9. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.

  10. Early Experiences Writing Performance Portable OpenMP 4 Codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joubert, Wayne; Hernandez, Oscar R

    In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the traditional first touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
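
    For orientation, the sketch below (our illustration, not the kernel studied in the paper) shows the same loop written with traditional shared-memory OpenMP and with the OpenMP 4 target directives discussed above.

    ```c
    /* The same kernel written both ways discussed above: traditional shared-memory
     * OpenMP and the OpenMP 4 accelerator (target) directives. Illustrative only. */
    #include <stdio.h>

    #define N (1 << 20)

    static double x[N], y[N];

    int main(void) {
        double a = 2.0;
        for (int i = 0; i < N; i++) { x[i] = i; y[i] = 0.0; }

        /* Traditional host parallelism */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            y[i] = a * x[i];

        /* OpenMP 4 accelerator targeting with explicit data mapping */
        #pragma omp target teams distribute parallel for map(to: x) map(tofrom: y)
        for (int i = 0; i < N; i++)
            y[i] += a * x[i];

        printf("y[10] = %f\n", y[10]);
        return 0;
    }
    ```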

  11. CLOMP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gyllenhaal, J.; Bronevetsky, G.

    2007-05-25

    CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading (like NUMA memory layouts, memory contention, cache effects, etc.) in order to influence future system design. Current best-in-class implementations of OpenMP have overheads at least ten times larger than is required by many of our applications for effective use of OpenMP. This benchmark shows the significant negative performance impact of these relatively large overheads and of other thread effects. The CLOMP benchmark is highly configurable to allow a variety of problem sizes and threading effects to be studied, and it carefully checks its results to catch many common threading errors. This benchmark is expected to be included as part of the Sequoia Benchmark suite for the Sequoia procurement.
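
    The benchmark itself is more elaborate, but the flavor of measurement CLOMP performs can be sketched as follows (illustrative only): time a tiny loop with and without a parallel region around it to expose the per-region overhead.

    ```c
    /* Sketch of the kind of measurement CLOMP makes (not the actual benchmark):
     * the same tiny loop run serially and inside repeated parallel regions,
     * exposing the per-region OpenMP overhead. */
    #include <omp.h>
    #include <stdio.h>

    #define REPS 10000
    #define N    1000

    static double a[N];

    int main(void) {
        double t0, t_serial, t_omp;

        t0 = omp_get_wtime();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < N; i++) a[i] += 1.0;
        t_serial = omp_get_wtime() - t0;

        t0 = omp_get_wtime();
        for (int r = 0; r < REPS; r++) {
            #pragma omp parallel for      /* one region per rep: overhead dominates */
            for (int i = 0; i < N; i++) a[i] += 1.0;
        }
        t_omp = omp_get_wtime() - t0;

        printf("serial %.6f s, openmp %.6f s, overhead per region ~%.2f us (a[0]=%g)\n",
               t_serial, t_omp, 1e6 * (t_omp - t_serial) / REPS, a[0]);
        return 0;
    }
    ```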

  12. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Hao-Qiang; an Mey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
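
    A minimal sketch of the hybrid paradigm compared in this study (not the CFD benchmark itself) combines MPI across nodes with OpenMP threads within each rank:

    ```c
    /* Minimal hybrid MPI+OpenMP sketch of the paradigm compared in this study:
     * coarse-grained MPI decomposition across ranks, fine-grained OpenMP loop
     * parallelism within each rank. Illustrative only. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, nranks;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double local = 0.0, global = 0.0;
        #pragma omp parallel for reduction(+:local)   /* loop-level threads */
        for (int i = rank; i < 1000000; i += nranks)  /* rank-level decomposition */
            local += 1.0 / (1.0 + (double)i);

        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f (threads per rank: %d)\n", global, omp_get_max_threads());

        MPI_Finalize();
        return 0;
    }
    ```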

  13. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
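
    The GRASS modules are written in C; a generic sketch of the row-wise loop parallelization pattern described above (not the actual module source) might look like this, with dynamic scheduling to absorb uneven per-row cost:

    ```c
    /* Generic sketch of row-wise parallelization of a raster computation, in the
     * spirit of the r.sun/r.sim.water parallelization described above (not the
     * actual GRASS source). Dynamic scheduling absorbs uneven per-row cost. */
    #include <stdio.h>
    #include <stdlib.h>

    #define ROWS 2048
    #define COLS 2048

    static double cell_value(int r, int c) { return (double)((r * c) % 97); }  /* stand-in */

    int main(void) {
        double *out = malloc((size_t)ROWS * COLS * sizeof(double));

        #pragma omp parallel for schedule(dynamic, 16)
        for (int r = 0; r < ROWS; r++)
            for (int c = 0; c < COLS; c++)
                out[(size_t)r * COLS + c] = cell_value(r, c);

        printf("out[last] = %f\n", out[(size_t)ROWS * COLS - 1]);
        free(out);
        return 0;
    }
    ```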

  14. A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D; Panas, T

    2010-01-25

    OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers for optimal performance and to target modern hardware architectures. A variety of extensible and robust research compilers are key to OpenMP's sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran, using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE's internal representation to handle all of the OpenMP 3.0 constructs and facilitate their manipulation. Since OpenMP research is often complicated by the tight coupling of the compiler translations and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries. This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni and present comparative performance results against other OpenMP compilers.

  15. MPI, HPF or OpenMP: A Study with the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Hribar, Michelle; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1999-01-01

    Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but the task can be simplified by high level languages and could even be automated by parallelizing tools and compilers. The definition of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like Fortran and simplify many tedious tasks encountered in writing message passing programs. In our study we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and the pros and cons of the different approaches will be discussed, along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of these techniques to realistic aerospace applications will be presented.

  16. MPI, HPF or OpenMP: A Study with the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Hribar, M.; Waheed, A.; Yan, J.; Saini, Subhash (Technical Monitor)

    1999-01-01

    Porting applications to new high performance parallel and distributed platforms is a challenging task. Writing parallel code by hand is time consuming and costly, but this task can be simplified by high level languages and could even be automated by parallelizing tools and compilers. The definition of the HPF (High Performance Fortran, based on the data parallel model) and OpenMP (based on the shared memory parallel model) standards has offered great opportunity in this respect. Both provide simple and clear interfaces to languages like Fortran and simplify many tedious tasks encountered in writing message passing programs. In our study, we implemented the parallel versions of the NAS Benchmarks with HPF and OpenMP directives. Comparison of their performance with the MPI implementation and the pros and cons of the different approaches will be discussed, along with experience of using computer-aided tools to help parallelize these benchmarks. Based on the study, the potential of applying some of these techniques to realistic aerospace applications will be presented.

  17. Multilevel Parallelization of AutoDock 4.2.

    PubMed

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.

  18. OpenMP 4.5 Validation and Verification Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pophale, Swaroop S; Bernholdt, David E; Hernandez, Oscar R

    2017-12-15

    OpenMP, a directive-based programming API, introduces directives for accelerator devices that programmers are starting to use more frequently in production codes. To make sure OpenMP directives work correctly across architectures, it is critical to have a mechanism that tests for an implementation's conformance to the OpenMP standard. This testing process can uncover ambiguities in the OpenMP specification, which helps compiler developers and users make better use of the standard. We fill this gap through our validation and verification test suite that focuses on the offload directives available in OpenMP 4.5.
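
    A test in the spirit of such a suite (our own sketch, not taken from the actual test suite) checks that a target region really updates host data and reports whether it ran on a device:

    ```c
    /* A test in the spirit of an offload conformance suite (our sketch, not taken
     * from the actual suite): verify that a target region with map(tofrom:)
     * updates host data, and report whether it executed on a device. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1024

    int main(void) {
        int a[N], on_initial = 1, errors = 0;
        for (int i = 0; i < N; i++) a[i] = i;

        #pragma omp target map(tofrom: a, on_initial)
        {
            on_initial = omp_is_initial_device();   /* 0 when running on a device */
            for (int i = 0; i < N; i++) a[i] += 1;
        }

        for (int i = 0; i < N; i++)
            if (a[i] != i + 1) errors++;

        printf("%s: errors=%d, ran on %s\n", errors ? "FAIL" : "PASS",
               errors, on_initial ? "host (fallback)" : "device");
        return errors ? 1 : 0;
    }
    ```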

  19. What Scientific Applications can Benefit from Hardware Transactional Memory?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schindewolf, M; Bihari, B; Gyllenhaal, J

    2012-06-04

    Achieving efficient and correct synchronization of multiple threads is a difficult and error-prone task at small scale and, as we march towards extreme scale computing, will be even more challenging when the resulting application is supposed to utilize millions of cores efficiently. Transactional Memory (TM) is a promising technique to ease the burden on the programmer, but only recently has become available on commercial hardware in the new Blue Gene/Q system, and hence the real benefit for realistic applications has not yet been studied. This paper presents the first performance results of TM embedded into OpenMP on a prototype system of BG/Q and characterizes code properties that will likely lead to benefits when augmented with TM primitives. We first study the influence of thread count, environment variables and memory layout on TM performance and identify code properties that will yield performance gains with TM. Second, we evaluate the combination of OpenMP with multiple synchronization primitives on top of MPI to determine suitable task to thread ratios per node. Finally, we condense our findings into a set of best practices. These are applied to a Monte Carlo Benchmark and a Smoothed Particle Hydrodynamics method. In both cases an optimized TM version, executed with 64 threads on one node, outperforms a simple TM implementation. MCB with optimized TM yields a speedup of 27.45 over baseline.
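
    The BG/Q transactional-memory region is a vendor extension outside standard OpenMP, so it is not shown here; the sketch below only illustrates two of the standard synchronization primitives such a study ranks, a critical section versus an atomic update.

    ```c
    /* Two standard OpenMP synchronization primitives of the kind such a study
     * ranks (illustrative only); the BG/Q transactional-memory region would be
     * a third, vendor-specific variant and is not shown. */
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        long long counter_critical = 0, counter_atomic = 0;

        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            #pragma omp critical        /* coarse: serializes the whole block */
            counter_critical++;
        }

        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            #pragma omp atomic          /* fine-grained atomic update */
            counter_atomic++;
        }

        printf("critical=%lld atomic=%lld\n", counter_critical, counter_atomic);
        return 0;
    }
    ```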

  20. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    NASA Technical Reports Server (NTRS)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability issues have affected the take-up of this programming model approach. Significant progress has since been made in hardware and software technologies, and as a result the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.

  1. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim at scaling up to thousands of processors on both distributed and shared memory systems. Developing parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In order to test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. The NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with the MIPSpro-f77 compiler 7.2.1 for the OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for the HPF programs.

  2. Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  3. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot be scaled beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, though at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution and that it increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and hold across a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.

  4. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot be scaled beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, though at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution and that it increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and hold across a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  5. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  6. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  7. Archer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Atzeni, Simone; Ahn, Dong; Gopalakrishnan, Ganesh

    2017-01-12

    Archer is built on top of the LLVM/Clang compilers that support OpenMP. It applies static and dynamic analysis techniques to detect data races in OpenMP programs while incurring very low runtime and memory overhead. Static analyses identify data-race-free OpenMP regions and exclude them from runtime analysis, which is performed by the ThreadSanitizer included in LLVM/Clang.
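
    For illustration, the kind of defect such a race detector reports is sketched below (our example, not an Archer test case): the first loop updates a shared accumulator without synchronization, and the reduction clause is the fix.

    ```c
    /* The kind of defect a data-race detector such as Archer reports (our own
     * example, not an Archer test case): the first loop races on 'sum'; the
     * reduction clause in the second loop is the fix. */
    #include <stdio.h>

    #define N 100000

    int main(void) {
        long long sum = 0;

        #pragma omp parallel for          /* RACY: unsynchronized writes to sum */
        for (int i = 0; i < N; i++)
            sum += i;

        long long sum_ok = 0;
        #pragma omp parallel for reduction(+:sum_ok)   /* race-free fix */
        for (int i = 0; i < N; i++)
            sum_ok += i;

        printf("racy=%lld correct=%lld (expected %lld)\n",
               sum, sum_ok, (long long)N * (N - 1) / 2);
        return 0;
    }
    ```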

  8. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and to achieve good performance that exceeds that of some commercial tools.

  9. Issues Identified During September 2016 IBM OpenMP 4.5 Hackathon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Richards, David F.

    In September 2016, IBM hosted an OpenMP 4.5 Hackathon at the TJ Watson Research Center. Teams from LLNL, ORNL, SNL, LANL, and LBNL attended the event. As with the 2015 hackathon, IBM produced an extremely useful and successful event with unmatched support from the compiler team, applications staff, and facilities. Approximately 24 IBM staff supported the 4-day hackathon and spent significant time in the 4-6 weeks beforehand preparing the environment and becoming familiar with the apps. This hackathon was also the first event to feature the LLVM and XL C/C++ and Fortran compilers. This report records many of the issues encountered by the LLNL teams during the hackathon.

  10. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE PAGES

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...

    2017-10-24

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  11. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  12. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.

  13. CLOMP v1.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gyllenhaal, J.

    CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading. For simplicity, it does not use MPI by default but it is expected to be run on the resources a threaded MPI task would use (e.g., a portion of a shared memory compute node). Compiling with -DWITH_MPI allows packing one or more nodes with CLOMP tasks and having CLOMP report OpenMP performance for the slowest MPI task. On current systems, the strong scaling performance results for 4, 8, or 16 threads are of the most interest. Suggested weak scaling inputs are provided for evaluating future systems. Since MPI is often used to place at least one MPI task per coherence or NUMA domain, it is recommended to focus OpenMP runtime measurements on a subset of node hardware where it is most possible to have low OpenMP overheads (e.g., within one coherence domain or NUMA domain).

  14. Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D J; Willcock, J J

    2008-12-12

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.

  15. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  16. A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

    PubMed

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

    2017-10-01

    The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on Intel Xeon multi-core CPU (by using OpenMP and OpenCL) and NVIDIA Tesla many-core GPU (by using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results, though at the cost of the most complex source code. The parallel SCE-UA has bright prospects to be applied in real-world applications.

  17. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  18. Characterization of UMT2013 Performance on Advanced Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howell, Louis

    2014-12-31

    This paper presents part of a larger effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. The focus here is on UMT2013, a proxy implementation of deterministic transport for unstructured meshes. I present weak and strong MPI scaling results and studies of OpenMP efficiency on the Sequoia BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Preliminary tests that exploit NVRAM as extended memory on an Ivy Bridge machine designed for “Big Data” applications are also included.

  19. A Programming Model Performance Study Using the NAS Parallel Benchmarks

    DOE PAGES

    Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

    2010-01-01

    Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

  20. A multi-threaded version of MCFM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, John M.; Ellis, R. Keith; Giele, Walter T.

    We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We then modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Furthermore, we took special care so that the results of the Monte Carlo integration were independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.
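
    One common way to keep a threaded Monte Carlo result independent of the thread count, as described above, is to store per-event partial results and combine them in a fixed order; the sketch below illustrates the idea and is not MCFM or VEGAS code.

    ```c
    /* Sketch of one way to keep a threaded Monte Carlo sum independent of the
     * thread count: store per-event partial results, then combine them in a
     * fixed order. Illustrative only; not MCFM or VEGAS code. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NEVENTS 100000

    static double weight(int i) { return sin((double)i) * 1e-3; }  /* stand-in integrand */

    int main(void) {
        double *w = malloc(NEVENTS * sizeof(double));

        #pragma omp parallel for schedule(static)
        for (int i = 0; i < NEVENTS; i++)
            w[i] = weight(i);             /* event i always yields the same value */

        double total = 0.0;
        for (int i = 0; i < NEVENTS; i++) /* fixed summation order, any thread count */
            total += w[i];

        printf("integral estimate = %.12f\n", total);
        free(w);
        return 0;
    }
    ```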

  1. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

    DOE PAGES

    Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

    2014-07-22

    The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.

  2. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This demonstrates the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experience with this toolkit.

  3. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, manycore, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented microbenchmarks implemented with the Message Passing Interface (MPI) and OpenMP, used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measure sustained PCIe throughput with four GPU memory management mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.
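
    As a flavor of such an instrumented microbenchmark, here is a minimal STREAM-like triad (our sketch, not the Roofline Toolkit code) that estimates sustained memory bandwidth for one level of the hierarchy:

    ```c
    /* Minimal STREAM-like triad in the spirit of the characterization engine
     * described above (our sketch, not the Roofline Toolkit code). */
    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 24)

    int main(void) {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];          /* triad: 2 reads + 1 write per element */
        double t = omp_get_wtime() - t0;

        printf("triad: %.2f GB/s (a[0]=%g)\n",
               3.0 * N * sizeof(double) / t / 1e9, a[0]);
        free(a); free(b); free(c);
        return 0;
    }
    ```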

  4. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  5. Performance Analysis of and Tool Support for Transactional Memory on BG/Q

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schindewolf, M

    2011-12-08

    Martin Schindewolf worked during his internship at the Lawrence Livermore National Laboratory (LLNL) under the guidance of Martin Schulz at the Computer Science Group of the Center for Applied Scientific Computing. We studied the performance of the TM subsystem of BG/Q as well as researched the possibilities for tool support for TM. To study the performance, we ran CLOMP-TM, a benchmark designed to quantify the overhead of OpenMP and compare different synchronization primitives. To advance CLOMP-TM, we added Message Passing Interface (MPI) routines for a hybrid parallelization. This enables running multiple MPI tasks, each running OpenMP, on one node. With these enhancements, a beneficial MPI task to OpenMP thread ratio is determined. Further, the synchronization primitives are ranked as a function of the application characteristics. To demonstrate the usefulness of these results, we investigate a real Monte Carlo simulation called the Monte Carlo Benchmark (MCB). Applying the lessons learned yields the best task to thread ratio. Further, we were able to tune the synchronization by transactifying the MCB. We also developed tools that capture the performance of the TM run time system and present it to the application's developer. The performance of the TM run time system relies on its built-in statistics. These tools use the Blue Gene Performance Monitoring (BGPM) interface to correlate the statistics from the TM run time system with performance counter values. This combination provides detailed insight into the run time behavior of the application and enables tracking down the causes of degraded performance. Further, one tool has been implemented that separates the performance counters into three categories: Successful Speculation, Unsuccessful Speculation and No Speculation. All of the tools are crafted around IBM's xlc compiler for C and C++ and have been run and tested on a Q32 early access system.

  6. Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mishra, Alok; Li, Lingda; Kong, Martin

    Here, the latest OpenMP standard offers automatic device offloading capabilities which facilitate GPU programming. Despite this, there remain many challenges. One of these is the unified memory feature introduced in recent GPUs. GPUs in current and future HPC systems have enhanced support for unified memory space. In such systems, CPU and GPU can access each other's memory transparently; that is, the data movement is managed automatically by the underlying system software and hardware. Memory oversubscription is also possible in these systems. However, there is a significant lack of knowledge about how this mechanism will perform, and how programmers should use it. We have modified several benchmark codes in the Rodinia benchmark suite to study the behavior of OpenMP accelerator extensions and have used them to explore the impact of unified memory in an OpenMP context. We moreover modified the open source LLVM compiler to allow OpenMP programs to exploit unified memory. The results of our evaluation reveal that, while the performance of unified memory is comparable with that of normal GPU offloading for benchmarks with little data reuse, it suffers from significant overhead when GPU memory is oversubscribed for benchmarks with a large amount of data reuse. Based on these results, we provide several guidelines for programmers to achieve better performance with unified memory.
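
    The offloading mechanism under study can be illustrated with a minimal sketch (the kernel and names below are hypothetical, not the Rodinia codes used in the paper). With the explicit map clause the runtime copies the array; on a system with unified memory the same construct can rely on automatic migration, and OpenMP 5.0 later standardized this behavior via "#pragma omp requires unified_shared_memory".

      /* Hypothetical kernel: offload a vector update to the device. */
      void scale(double *x, double a, long n)
      {
          /* Explicit mapping; with unified memory the map acts only as a hint. */
          #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
          for (long i = 0; i < n; ++i)
              x[i] = a * x[i];
      }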

  7. OpenMP parallelization of a gridded SWAT (SWATG)

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG), which integrates a grid modeling scheme with different spatial representations, also presents such problems. The time-consuming computation limits applications of very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application programming interface is integrated with SWATG (the result is called SWATGP) to accelerate grid modeling at the HRU level. Such a parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km2 watershed with one CPU and a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.

  8. GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.

    2018-07-01

    This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout that departs from usual practice is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192³ grid points for the scalar field computed with 8192 XK7 nodes.
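
    The asynchronous pattern described above can be sketched as follows (a generic kernel, not the authors' pseudo-spectral code): the nowait clause turns the target region into a deferred task so host work proceeds concurrently, and a taskwait restores synchronization.

      /* Hypothetical example of overlapping device and host work with OpenMP 4.5. */
      void overlap(double *a, double *b, long n)
      {
          /* Deferred target task: the encountering thread does not block here. */
          #pragma omp target teams distribute parallel for map(tofrom: a[0:n]) nowait
          for (long i = 0; i < n; ++i)
              a[i] = 2.0 * a[i];

          /* Host-side computation overlapped with the device kernel. */
          #pragma omp parallel for
          for (long i = 0; i < n; ++i)
              b[i] = b[i] + 1.0;

          /* Wait for the deferred target task before a[] is used again. */
          #pragma omp taskwait
      }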

  9. Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP

    NASA Technical Reports Server (NTRS)

    Norton, C.; Balsara, D.

    1999-01-01

    Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programming can be performed in an object-oriented way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.

  10. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS, based on pthreads (POSIX threads, where "POSIX" denotes the Portable Operating System Interface). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.

  11. Performance evaluation of canny edge detection on a tiled multicore architecture

    NASA Astrophysics Data System (ADS)

    Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

    2011-01-01

    In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it parallelizes across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than compiler-managed, loop-level parallelism implemented with OpenMP.

  12. HPC Profiling with the Sun Studio™ Performance Tools

    NASA Astrophysics Data System (ADS)

    Itzkowitz, Marty; Maruyama, Yukon

    In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.

  13. OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms

    DOE PAGES

    Meng, Zhaoyi; Koniges, Alice; He, Yun Helen; ...

    2016-09-21

    In this paper, we investigate the OpenMP parallelization and optimization of two novel data classification algorithms. The new algorithms are based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification algorithms in serial mode. The methods leverage the Nystrom extension to calculate eigenvalues/eigenvectors of the graph Laplacian, and this is a self-contained module that can be used in conjunction with other graph-Laplacian-based methods such as spectral clustering. We use performance tools to identify the hotspots and memory access patterns of the serial codes and use OpenMP as the parallelization language to parallelize the most time-consuming parts. Where possible, we also use library routines. We then optimize the OpenMP implementations and detail the performance on traditional supercomputer nodes (in our case a Cray XC30), and test the optimization steps on emerging testbed systems based on Intel’s Knights Corner and Landing processors. We show both performance improvement and strong scaling behavior. Finally, a large number of optimization techniques and analyses are necessary before the algorithm reaches almost ideal scaling.
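
    The workflow described - profile, locate the hotspot, annotate it - typically reduces to adding a work-sharing directive around the dominant loop nest. A minimal C sketch under that assumption (the similarity-matrix kernel below is illustrative, not the authors' code):

      #include <math.h>

      /* Hypothetical hotspot: assembling a dense similarity block. */
      void similarity_block(const double *x, int n, int d, double *w, double sigma)
      {
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; ++i) {
              for (int j = 0; j < n; ++j) {
                  double dist2 = 0.0;
                  for (int k = 0; k < d; ++k) {
                      double diff = x[i*d + k] - x[j*d + k];
                      dist2 += diff * diff;
                  }
                  w[i*n + j] = exp(-dist2 / (2.0 * sigma * sigma));
              }
          }
      }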

  14. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators

    PubMed Central

    Wang, Wei; Xu, Lifan; Cavazos, John; Huang, Howie H.; Kay, Matthew

    2014-01-01

    Recent developments in modern computational accelerators like Graphics Processing Units (GPUs) and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC) coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least faster than the sequential implementation and faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other computational models of wave propagation in multi-dimensional media. PMID:24497950

  15. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T), due in large part to the fact that the limited on-card memory restricts the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  16. OpenMP performance for benchmark 2D shallow water equations using LBM

    NASA Astrophysics Data System (ADS)

    Sabri, Khairul; Rabbani, Hasbi; Gunawan, Putu Harry

    2018-03-01

    Shallow water equations, commonly referred to as Saint-Venant equations, are used to model fluid phenomena. These equations can be solved numerically using several methods, such as the lattice Boltzmann method (LBM), SIMPLE-like methods, the finite difference method, Godunov-type methods, and the finite volume method. In this paper, the shallow water equations are approximated using the LBM (an approach known as LABSWE) and simulated with parallel programming using OpenMP. To evaluate the performance of the parallel algorithm with 2 and 4 threads, ten different grid sizes Lx and Ly are elaborated. The results show that, using the OpenMP platform, the computational time for solving LABSWE can be decreased. For instance, using a grid size of 1000 × 500, the computational times observed with 2 and 4 threads are 93.54 s and 333.243 s, respectively.
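
    The loops that benefit in such a solver are the per-cell updates over the Lx × Ly grid. A minimal, hypothetical sketch (not the LABSWE code itself) of one such loop with an OpenMP work-sharing directive; the thread counts compared in the study (2 or 4) would be set with OMP_NUM_THREADS or omp_set_num_threads().

      /* Hypothetical macroscopic-variable step of an LBM shallow water solver:
       * the water depth h(x,y) is the sum of the Q distribution functions. */
      void water_depth(const double *f, double *h, int Lx, int Ly, int Q)
      {
          #pragma omp parallel for collapse(2) schedule(static)
          for (int i = 0; i < Lx; ++i)
              for (int j = 0; j < Ly; ++j) {
                  double sum = 0.0;
                  for (int q = 0; q < Q; ++q)
                      sum += f[(q*Lx + i)*Ly + j];
                  h[i*Ly + j] = sum;
              }
      }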

  17. Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Turney, Raymond D.

    2001-01-01

    This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versions used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.

  18. Tycho 2: A Proxy Application for Kinetic Transport Sweeps

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garrett, Charles Kristopher; Warsa, James S.

    2016-09-14

    Tycho 2 is a proxy application that implements discrete ordinates (SN) kinetic transport sweeps on unstructured, 3D, tetrahedral meshes. It has been designed to be small and require minimal dependencies to make collaboration and experimentation as easy as possible. Tycho 2 has been released as open source software. The software is currently in a beta release with plans for a stable release (version 1.0) before the end of the year. The code is parallelized via MPI across spatial cells and OpenMP across angles. Currently, several parallelization algorithms are implemented.

  19. A feasibility study on porting the community land model onto accelerators using OpenACC

    DOE PAGES

    Wang, Dali; Wu, Wei; Winkler, Frank; ...

    2014-01-01

    As environmental models (such as the Accelerated Climate Model for Energy (ACME), the Parallel Reactive Flow and Transport Model (PFLOTRAN), the Arctic Terrestrial Simulator (ATS), etc.) become more and more complicated, we face enormous challenges in porting those applications onto hybrid computing architectures. OpenACC appears to be a very promising technology; therefore, we have conducted a feasibility analysis of porting the Community Land Model (CLM), a terrestrial ecosystem model within the Community Earth System Model (CESM). Specifically, we used an automatic function testing platform to extract a small computing kernel out of CLM, applied this kernel within the actual CLM dataflow procedure, and investigated the strategy of data parallelization and the benefit of data movement provided by the current implementation of OpenACC. Even though it is a non-intensive kernel, on a single 16-core computing node the performance (based on the actual computation time using one GPU) of the OpenACC implementation is 2.3 times faster than that of the OpenMP implementation using a single OpenMP thread, but 2.8 times slower than the performance of the OpenMP implementation using 16 threads. On multiple nodes, the MPI+OpenACC implementation demonstrated very good scalability on up to 128 GPUs on 128 computing nodes. This study also provides useful information for us to look into the potential benefits of the “deep copy” capability and the “routine” feature of the OpenACC standard. In conclusion, we believe that our experience with the environmental model CLM can be beneficial to many other scientific research programs interested in porting their large-scale scientific codes onto high-end computers empowered by hybrid computing architectures using OpenACC.

  20. KITTEN Lightweight Kernel 0.1 Beta

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedretti, Kevin; Levenhagen, Michael; Kelly, Suzanne

    2007-12-12

    The Kitten Lightweight Kernel is a simplified OS (operating system) kernel that is intended to manage a compute node's hardware resources. It provides a set of mechanisms to user-level applications for utilizing hardware resources (e.g., allocating memory, creating processes, accessing the network). Kitten is much simpler than general-purpose OS kernels, such as Linux or Windows, but includes all of the essential functionality needed to support HPC (high-performance computing) MPI, PGAS and OpenMP applications. Kitten provides unique capabilities such as physically contiguous application memory, transparent large page support, and noise-free tick-less operation, which enable HPC applications to obtain greater efficiency and scalability than with general purpose OS kernels.

  1. Characterization of Proxy Application Performance on Advanced Architectures. UMT2013, MCB, AMG2013

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howell, Louis H.; Gunney, Brian T.; Bhatele, Abhinav

    2015-10-09

    Three codes were tested at LLNL as part of a Tri-Lab effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. Teams from Sandia and Los Alamos tested proxy apps of their own. The focus in this report is on the LLNL codes UMT2013, MCB, and AMG2013. We present weak and strong MPI scaling results and studies of OpenMP efficiency on a large BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. The hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Results from three more speculative tests are also included: one that exploits NVRAM as extended memory, one that studies performance under a power bound, and one that illustrates the effects of changing the torus network mapping on BG/Q.

  2. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    NASA Astrophysics Data System (ADS)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    The current trend in processor manufacturing focuses on multi-core architectures rather than increased clock speeds for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications and big data have created huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to the SIMD architecture of GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA API based programming.

  3. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions of a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified, using GPROF, to account for over 97% of the total computational time. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects cache miss rates of over 90%. With this loop rewritten, a speedup similar to that of the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most existing groundwater model codes for many applications.
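
    The MPI level of this hybrid scheme exploits the fact that a finite-difference Jacobian is independent across parameters. A minimal sketch under that assumption (function and variable names are illustrative; the forward model is passed in and may itself be OpenMP-threaded):

      #include <mpi.h>
      #include <stdlib.h>
      #include <string.h>

      typedef void (*model_fn)(const double *p, int np, double *r, int nr);

      /* Each rank perturbs a round-robin subset of parameters, runs the forward
       * model, and the per-rank column blocks are combined at the end. */
      void jacobian_fd(model_fn model, const double *p, int np,
                       const double *r0, int nr, double h, double *J)
      {
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double *pp   = malloc(np * sizeof *pp);
          double *rp   = malloc(nr * sizeof *rp);
          double *Jloc = calloc((size_t)nr * np, sizeof *Jloc);

          for (int j = rank; j < np; j += size) {
              memcpy(pp, p, np * sizeof *pp);
              pp[j] += h;                      /* forward-difference step */
              model(pp, np, rp, nr);           /* OpenMP-threaded internally */
              for (int i = 0; i < nr; ++i)
                  Jloc[i*np + j] = (rp[i] - r0[i]) / h;
          }
          MPI_Allreduce(Jloc, J, nr * np, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

          free(pp); free(rp); free(Jloc);
      }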

  4. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    NASA Astrophysics Data System (ADS)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed sine-Gordon equations. We present an OpenMP realization which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.

  5. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2016-01-01

    In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
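
    Assuming that "Shared MPI (SMPI)" here denotes MPI-3 shared-memory windows, the core idea can be sketched as follows (sizes and names are illustrative): ranks on the same node allocate one shared array and access it with ordinary loads and stores instead of message passing.

      #include <mpi.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);

          /* Communicator of the ranks that share a node. */
          MPI_Comm node;
          MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                              MPI_INFO_NULL, &node);
          int nrank, nsize;
          MPI_Comm_rank(node, &nrank);
          MPI_Comm_size(node, &nsize);

          const MPI_Aint n = 1000;             /* illustrative array size */
          double *base = NULL;
          MPI_Win win;
          /* Rank 0 on the node owns the memory; the others attach with size 0. */
          MPI_Win_allocate_shared(nrank == 0 ? n * sizeof(double) : 0,
                                  sizeof(double), MPI_INFO_NULL, node, &base, &win);
          if (nrank != 0) {
              MPI_Aint sz; int disp;
              MPI_Win_shared_query(win, 0, &sz, &disp, &base);
          }

          /* All node-local ranks now read/write base[] like a shared array. */
          MPI_Win_fence(0, win);
          for (MPI_Aint i = nrank; i < n; i += nsize)
              base[i] = (double)i;
          MPI_Win_fence(0, win);

          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }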

  6. PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.

    PubMed

    Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A

    2016-06-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim of significantly shortening the code's execution time. Selected routines were parallelised using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained.

  7. The Research of the Parallel Computing Development from the Angle of Cloud Computing

    NASA Astrophysics Data System (ADS)

    Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

    2017-10-01

    Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and MapReduce respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.

  8. Effective Vectorization with OpenMP 4.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham

    This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C/C++ and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
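
    As a concrete illustration of the directives being described (a generic sketch, not tied to any particular code), a loop can be annotated so the compiler emits SIMD instructions even where its own heuristics would decline, and a callee can be given vector variants with declare simd:

      #include <stddef.h>

      /* Ask the compiler to generate SIMD-callable variants of this helper. */
      #pragma omp declare simd
      static inline double axpy1(double a, double x, double y) { return a * x + y; }

      double dot(const double *restrict x, const double *restrict y, size_t n)
      {
          double s = 0.0;
          /* Vectorize with a SIMD-safe reduction on the accumulator. */
          #pragma omp simd reduction(+:s)
          for (size_t i = 0; i < n; ++i)
              s += x[i] * y[i];
          return s;
      }

      void saxpy(double a, const double *restrict x, double *restrict y, size_t n)
      {
          /* Combine thread-level work sharing with SIMD vectorization. */
          #pragma omp parallel for simd
          for (size_t i = 0; i < n; ++i)
              y[i] = axpy1(a, x[i], y[i]);
      }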

  9. Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology

    NASA Astrophysics Data System (ADS)

    Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.

    2009-04-01

    Currently, the basic problem of tsunami modeling is the low speed of calculations, which is unacceptable for operational warning services. Existing algorithms for the numerical modeling of the hydrodynamic processes of tsunami waves were developed without taking advantage of modern computer facilities. Considerable acceleration of the calculations can be obtained by using parallel algorithms. We discuss here a new approach to parallelizing a tsunami modeling code using OpenMP technology (for multiprocessor systems with shared memory). Nowadays, multiprocessor systems are easily accessible for everyone, and the cost of using such systems is much lower than the cost of clusters. This also allows programmers to apply multithreaded algorithms on researchers' desktop computers. Another important advantage of this approach is the shared memory mechanism - there is no need to send data over slow networks (for example, Ethernet). All memory is common to all computing threads, which yields almost linear scalability of the program. In the new version of NAMI DANCE, OpenMP technology and the multithreaded algorithm provide an 80% gain in speed in comparison with the single-thread version on a dual-processor unit, and a 320% gain was attained on a four-core processor unit of a PC. Thus, it was possible to considerably reduce the calculation time on scientific workstations (desktops) without a complete change of the program and user interfaces. Further modernization of the algorithms for preparing initial data and processing results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real-time tsunami warning systems.

  10. High Resolution Aerospace Applications using the NASA Columbia Supercomputer

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.; Aftosmis, Michael J.; Berger, Marsha

    2005-01-01

    This paper focuses on the parallel performance of two high-performance aerodynamic simulation packages on the newly installed NASA Columbia supercomputer. These packages include both a high-fidelity, unstructured, Reynolds-averaged Navier-Stokes solver and a fully automated inviscid flow package for cut-cell Cartesian grids. The complementary combination of these two simulation codes enables high-fidelity characterization of aerospace vehicle design performance over the entire flight envelope through extensive parametric analysis and detailed simulation of critical regions of the flight envelope. Both packages are industrial-level codes designed for complex geometry and incorporate customized multigrid solution algorithms. The performance of these codes on Columbia is examined using both MPI and OpenMP and using both the NUMAlink and InfiniBand interconnect fabrics. Numerical results demonstrate good scalability on up to 2016 CPUs using the NUMAlink4 interconnect, with measured computational rates in the vicinity of 3 TFLOP/s, while InfiniBand showed some performance degradation at high CPU counts, particularly with multigrid. Nonetheless, the results are encouraging enough to indicate that larger test cases using combined MPI/OpenMP communication should scale well on even more processors.

  11. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
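
    The directive structure CAPO emits (an outermost parallel loop with its variables classified as shared, private, or reduction) looks, written by hand, like the sketch below. CAPO itself generates Fortran/OpenMP, so this C fragment with hypothetical names is only an illustration of the classification it automates.

      /* Hypothetical residual-norm loop with explicit variable classification. */
      void residual_norm(const double *u, const double *f, int n, double *norm)
      {
          double sum = 0.0;
          double r;
          #pragma omp parallel for default(none) shared(u, f) firstprivate(n) \
                  private(r) reduction(+:sum)
          for (int i = 1; i < n - 1; ++i) {
              r = f[i] - (u[i-1] - 2.0*u[i] + u[i+1]);
              sum += r * r;
          }
          *norm = sum;
      }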

  12. A Hybrid MPI/OpenMP Approach for Parallel Groundwater Model Calibration on Multicore Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Groundwater model calibration is becoming increasingly computationally time intensive. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multicore computers with minimal parallelization effort. First, HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for a uranium transport model with over a hundred species involving nearly a hundred reactions, and for a field-scale coupled flow and transport model. In the first application, a single parallelizable loop is identified to consume over 97% of the total computational time. With a few lines of OpenMP compiler directives inserted into the code, the computational time is reduced by about a factor of ten on a compute node with 16 cores. The performance is further improved by selectively parallelizing a few more loops. For the field-scale application, parallelizable loops in 15 of the 174 subroutines in HGC5 are identified to take more than 99% of the execution time. By adding the preconditioned conjugate gradient solver and BICGSTAB, and using a coloring scheme to separate the elements, nodes, and boundary sides, the subroutines for finite element assembly, soil property update, and boundary condition application are parallelized, resulting in a speedup of about 10 on a 16-core compute node. The Levenberg-Marquardt (LM) algorithm is added into HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, as many compute nodes as adjustable parameters (when the forward difference is used for the Jacobian approximation), or twice that number (if the central difference is used), are used to reduce the calibration time from days and weeks to a few hours for the two applications. This approach can be extended to global optimization schemes and Monte Carlo analyses, where thousands of compute nodes can be efficiently utilized.

  13. Use Computer-Aided Tools to Parallelize Large CFD Applications

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Yan, J.

    2000-01-01

    Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progress in hardware architectures and the increasing complexity of real applications in recent years, the problem has become even more severe. Today, scalability and high performance mostly involve handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, shows good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance; in the worst cases, they can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often fails on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome this deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of the detailed interprocedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on the NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence models in multiple zones. Each one comprises from 50K to 100K lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175 MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for the user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to the sequential algorithms involved, code sections in TLNS3D and INS3D needed to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in a single zone. The MPI data points for the small test case were taken from a hand-coded MPI version. As we can see, CAPO's version achieved an 18-fold speedup on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed.
For example, although CAPO attempts to place directives on the outermost parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks support for parallelization at the multi-zone level. Future work will emphasize the development of a methodology to work at the multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformations is also needed.

  14. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    NASA Astrophysics Data System (ADS)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out on a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability and parallelization efficiency.
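
    A common pattern for this kind of parallelization - shown here only as a generic sketch, not the modified EGSnrc code - is to give each OpenMP thread its own random-number stream so that particle histories remain independent:

      #include <omp.h>
      #include <stdlib.h>

      /* Toy Monte Carlo estimate; POSIX rand_r stands in for a proper parallel RNG. */
      double mc_estimate(long nhist)
      {
          double tally = 0.0;
          #pragma omp parallel reduction(+:tally)
          {
              unsigned int seed = 12345u + 17u * (unsigned int)omp_get_thread_num();
              #pragma omp for schedule(static)
              for (long i = 0; i < nhist; ++i) {
                  /* One "history": here just a toy score in [0,1). */
                  double x = (double)rand_r(&seed) / RAND_MAX;
                  tally += x * x;
              }
          }
          return tally / (double)nhist;
      }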

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in the ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations have not been possible before due to a more than tenfold shortfall in time-to-solution for completing one physics case within 5 days of wall-clock time. Frontier techniques employed include nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, and dynamic repartitioning.

  16. Parallel protein secondary structure prediction based on neural networks.

    PubMed

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers for protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. A hydrophobicity matrix, an orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are evaluated separately as encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. The new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to other best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Due to the time-consuming task of training the neural networks, Pthreads and OpenMP are employed to parallelize DBNN on the hyperthreading-enabled Intel architecture. The speedup for 16 Pthreads is 4.9 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared memory architecture. The speedup performance of both OpenMP and Pthreads is superior to that reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for the Intel architecture is efficient for parallel biological algorithms.

  17. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation has become feasible owing to the development of computer technology; the recent gains, however, are due to the emergence of multi-core high-performance computers. Therefore, parallel computing has become key to achieving good performance in software programs. The Monte Carlo simulation code PHITS contains two parallel computing functions: distributed-memory parallelization using protocols of the Message Passing Interface (MPI) and shared-memory parallelization using Open Multi-Processing (OpenMP) directives. Users can choose between the two functions according to their needs. This paper explains the two functions along with their advantages and disadvantages. Some test applications are also provided to show their performance on a typical multi-core high-performance workstation.

  18. Numerical modeling of exciton-polariton Bose-Einstein condensate in a microcavity

    NASA Astrophysics Data System (ADS)

    Voronych, Oksana; Buraczewski, Adam; Matuszewski, Michał; Stobińska, Magdalena

    2017-06-01

    A novel, optimized numerical method of modeling an exciton-polariton superfluid in a semiconductor microcavity is proposed. Exciton-polaritons are spin-carrying quasiparticles formed from photons strongly coupled to excitons. They possess unique properties, interesting from the point of view of fundamental research as well as numerous potential applications. However, their numerical modeling is challenging due to the structure of the nonlinear differential equations describing their evolution. In this paper, we propose to solve the equations with a modified Runge-Kutta method of 4th order, further optimized for efficient computations. The algorithms were implemented in the form of C++ programs fitted for parallel environments and utilizing vector instructions. The programs form the EPCGP suite, which has been used for theoretical investigation of exciton-polaritons.
    Program summary:
    Catalogue identifier: AFBQ_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBQ_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: BSD-3
    No. of lines in distributed program, including test data, etc.: 2157
    No. of bytes in distributed program, including test data, etc.: 498994
    Distribution format: tar.gz
    Programming language: C++ with OpenMP extensions (main numerical program), Python (helper scripts)
    Computer: Modern PC (tested on AMD and Intel processors), HP BL2x220
    Operating system: Unix/Linux and Windows
    Has the code been vectorized or parallelized?: Yes (OpenMP)
    RAM: 200 MB for a single run
    Classification: 7, 7.7
    Nature of problem: An exciton-polariton superfluid is a novel, interesting physical system allowing investigation of high-temperature Bose-Einstein condensation of exciton-polaritons - quasiparticles carrying spin. They have attracted a lot of attention due to their unique properties and potential applications in polariton-based optoelectronic integrated circuits. This is an out-of-equilibrium quantum system confined within a semiconductor microcavity. It is described by a set of nonlinear differential equations similar in spirit to the Gross-Pitaevskii (GP) equation, but their unique properties do not allow standard GP solving frameworks to be utilized. Finding an accurate and efficient numerical algorithm, as well as developing optimized numerical software, is necessary for effective theoretical investigation of exciton-polaritons.
    Solution method: A Runge-Kutta method of 4th order was employed to solve the set of differential equations describing exciton-polariton superfluids. The method was adapted to the exciton-polariton equations and further optimized. The C++ programs utilize OpenMP extensions and vector operations in order to fully utilize the computer hardware.
    Running time: 6 h for 100 ps of evolution, depending on the values of parameters

  19. Implementing Shared Memory Parallelism in MCBEND

    NASA Astrophysics Data System (ADS)

    Bird, Adam; Long, David; Dobson, Geoff

    2017-09-01

    MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.

  20. Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.

    High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions - memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy-optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60% relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.
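
    Of the three tuning dimensions, the OpenMP scheduling choice is the easiest to expose without recompiling: annotating the irregular loop with schedule(runtime) lets an auto-tuner sweep OMP_SCHEDULE settings (e.g. "static", "dynamic,64", "guided") externally. A generic sketch with a hypothetical kernel, not Grappolo or HPCCG:

      /* Irregular per-row work makes the schedule choice matter. */
      void relax_rows(double *x, const int *row_len, int nrows)
      {
          #pragma omp parallel for schedule(runtime)
          for (int i = 0; i < nrows; ++i)
              for (int k = 0; k < row_len[i]; ++k)
                  x[i] += 1.0 / (double)(k + 1);
      }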

  1. a Modeling Method of Fluttering Leaves Based on Point Cloud

    NASA Astrophysics Data System (ADS)

    Tang, J.; Wang, Y.; Zhao, Y.; Hao, W.; Ning, X.; Lv, K.; Shi, Z.; Zhao, M.

    2017-09-01

    Leaves falling gently or fluttering are a common phenomenon in natural scenes. The authenticity of falling leaves plays an important part in the dynamic modeling of natural scenes, and a falling-leaves model has wide applications in the fields of animation and virtual reality. In this paper, we propose a novel modeling method for fluttering leaves based on point clouds. According to the shape and weight of the leaves and the wind speed, three basic falling trajectories are defined: rotation falling, roll falling and screw-roll falling. At the same time, a parallel algorithm based on OpenMP is implemented to satisfy real-time requirements in practical applications. Experimental results demonstrate that the proposed method is amenable to the incorporation of a variety of desirable effects.

  2. On the Performance of an Algebraic Multigrid Solver on Multicore Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, A H; Schulz, M; Yang, U M

    2010-04-29

    Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model.
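
    Pinning itself is typically requested from the environment (for example OMP_PROC_BIND=close or spread together with OMP_PLACES=cores) or through a support library such as MCSup; the short sketch below only shows how a program can verify the resulting placement with the standard OpenMP 4.0 places query routines.

      #include <omp.h>
      #include <stdio.h>

      int main(void)
      {
          #pragma omp parallel
          {
              /* Report where each thread actually landed. */
              printf("thread %d of %d runs in place %d of %d\n",
                     omp_get_thread_num(), omp_get_num_threads(),
                     omp_get_place_num(), omp_get_num_places());
          }
          return 0;
      }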

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Messer, Bronson; Harris, James A; Parete-Koon, Suzanne T

    We describe recent development work on the core-collapse supernova code CHIMERA. CHIMERA has consumed more than 100 million CPU-hours on Oak Ridge Leadership Computing Facility (OLCF) platforms in the past 3 years, ranking it among the most important applications at the OLCF. Most of the work described has been focused on exploiting the multicore nature of the current platform (Jaguar) via, e.g., multithreading using OpenMP. In addition, we have begun a major effort to marshal the computational power of GPUs with CHIMERA. The impending upgrade of Jaguar to Titan, a 20+ PF machine with an NVIDIA GPU on many nodes, makes this work essential.

  4. Performance Portability Strategies for Grid C++ Expression Templates

    NASA Astrophysics Data System (ADS)

    Boyle, Peter A.; Clark, M. A.; DeTar, Carleton; Lin, Meifeng; Rana, Verinder; Vaquero Avilés-Casco, Alejandro

    2018-03-01

    One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regards to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with a SU(3)×SU(3) streaming test will be reported. We will also report on the challenges of using current OpenMP 4.x for GPU offloading in the same code.
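    For reference, the basic shape of an OpenMP 4.x offload region is sketched below on a plain axpy loop rather than on Grid's expression templates; whether a given compiler actually maps this construct onto the GPU is precisely the portability question the paper examines.

        /* Minimal OpenMP 4.x target offload region; illustrative only and not
         * taken from the Grid code base. */
        void axpy_offload(int n, double a, const double *x, double *y)
        {
            #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
            for (int i = 0; i < n; i++)
                y[i] += a * x[i];
        }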

  5. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    NASA Astrophysics Data System (ADS)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    Simulation of breaking waves by using the Navier-Stokes equations via the moving particle semi-implicit method (MPS) over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Here, a comparison of two computer architectures (AMD and Intel) is performed. The Intel architecture is shown to be better than the AMD architecture in CPU time. However, in efficiency, the AMD architecture is slightly higher than the Intel one. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, for a similar number of particles, AMD obtains an efficiency of 50.09% and Intel up to 49.42%.

  6. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
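    The MPI+OpenMP paradigm compared above follows the usual hybrid skeleton sketched below; this is a generic illustration, not code from VULCAN or its mini-apps.

        /* Generic hybrid MPI+OpenMP skeleton: MPI for distributed memory,
         * OpenMP threads for shared memory within each rank. */
        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided, rank;
            /* FUNNELED is sufficient when only the master thread makes MPI calls. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            #pragma omp parallel
            {
                /* Shared-memory work of this rank goes here. */
                #pragma omp single
                printf("rank %d running %d threads\n", rank, omp_get_num_threads());
            }

            MPI_Finalize();
            return 0;
        }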

  7. Hierarchical resilience with lightweight threads.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wheeler, Kyle Bruce

    2011-10-01

    This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes is a move toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specified in side-effect-free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be - whether that's the other side of the PCI bus or the other side of the network - do work on that input using local memory, and then copy the outputs back (as needed). This design pattern is popular among new distributed threading environment designs. Such designs include the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g. the PGI OpenMP extensions).

  8. On a model of three-dimensional bursting and its parallel implementation

    NASA Astrophysics Data System (ADS)

    Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

    2008-04-01

    A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method, second-order accurate in both space and time, on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors affecting the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.

  9. GPU Accelerated Browser for Neuroimaging Genomics.

    PubMed

    Zigon, Bob; Li, Huang; Yao, Xiaohui; Fang, Shiaofen; Hasan, Mohammad Al; Yan, Jingwen; Moore, Jason H; Saykin, Andrew J; Shen, Li

    2018-04-25

    Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP) based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than the 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counterpart. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration.

  10. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; de Jong, Wibe

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  11. Performance Study of Monte Carlo Codes on Xeon Phi Coprocessors — Testing MCNP 6.1 and Profiling ARCHER Geometry Module on the FS7ONNi Problem

    NASA Astrophysics Data System (ADS)

    Liu, Tianyu; Wolfe, Noah; Lin, Hui; Zieb, Kris; Ji, Wei; Caracappa, Peter; Carothers, Christopher; Xu, X. George

    2017-09-01

    This paper contains two parts revolving around Monte Carlo transport simulation on Intel Many Integrated Core coprocessors (MIC, also known as Xeon Phi). (1) MCNP 6.1 was recompiled into multithreading (OpenMP) and multiprocessing (MPI) forms respectively without modification to the source code. The new codes were tested on a 60-core 5110P MIC. The test case was FS7ONNi, a radiation shielding problem used in MCNP's verification and validation suite. It was observed that both codes became slower on the MIC than on a 6-core X5650 CPU, by a factor of 4 for the MPI code and, abnormally, 20 for the OpenMP code, and both exhibited limited capability of strong scaling. (2) We have recently added a Constructive Solid Geometry (CSG) module to our ARCHER code to provide better support for geometry modelling in radiation shielding simulation. The functions of this module are frequently called in the particle random walk process. To identify the performance bottleneck we developed a CSG proxy application and profiled the code using the geometry data from FS7ONNi. The profiling data showed that the code was primarily memory latency bound on the MIC. This study suggests that despite the low initial porting effort, Monte Carlo codes do not naturally lend themselves to the MIC platform, just as they do not to GPUs, and that the memory latency problem needs to be addressed in order to achieve a decent performance gain.

  12. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N²) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body type of problem changes considerably with the introduction of a longer cut-off between the bodies. 
In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. 
Specifically, the processes of a single MPI application must have identical address spaces, so that a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively, with a yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys. 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Papadimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
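    A minimal sketch of the task-based restructuring described in this entry is given below: instead of opening a PARALLEL DO region inside every library call, each function evaluation is spawned as an OpenMP task, so that multiple concurrent calls can share one thread team. The routine and variable names are hypothetical and the code is not taken from the PNDL sources.

        /* Sketch: central-difference gradient with one OpenMP task per function
         * evaluation pair; names are hypothetical, and n <= 256 is assumed so
         * the perturbed points fit in stack arrays. */
        extern double eval_point(const double *x, int n);   /* user objective */

        void gradient_tasks(const double *x, int n, double h, double *grad)
        {
            #pragma omp taskgroup
            {
                for (int i = 0; i < n; i++) {
                    #pragma omp task firstprivate(i) shared(x, grad)
                    {
                        double xp[256], xm[256];
                        for (int j = 0; j < n; j++) { xp[j] = x[j]; xm[j] = x[j]; }
                        xp[i] += h; xm[i] -= h;
                        grad[i] = (eval_point(xp, n) - eval_point(xm, n)) / (2.0 * h);
                    }
                }
            }   /* the taskgroup waits for all evaluations before returning */
        }

    A caller would invoke such a routine from inside a parallel region (for example under omp parallel / omp single) or from another task, which is what allows several concurrent library calls to be scheduled together on one thread team.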

  13. Massively parallel sparse matrix function calculations with NTPoly

    NASA Astrophysics Data System (ADS)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.

  14. BCYCLIC: A parallel block tridiagonal matrix cyclic solver

    NASA Astrophysics Data System (ADS)

    Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

    2010-09-01

    A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.

  15. Automation of Data Traffic Control on DSM Architecture

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2001-01-01

    The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition, and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.

  16. Data Race Benchmark Collection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, Chunhua; Lin, Pei-Hung; Asplund, Joshua

    2017-03-21

    This project is a benchmark suite of OpenMP parallel codes that have been checked for data races. The programs are marked to show which do and do not have races. This allows them to be leveraged while testing and developing race detection tools.
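    The suite's programs are not reproduced here, but the kind of defect such benchmarks encode is illustrated by the toy example below: the first loop races on the shared accumulator, while the reduction clause in the second makes it race-free.

        /* Toy data-race example (not code from the suite itself). */
        #include <stdio.h>

        int main(void)
        {
            double a[1000], sum = 0.0;
            for (int i = 0; i < 1000; i++) a[i] = 1.0;

            #pragma omp parallel for           /* DATA RACE: unsynchronized writes to sum */
            for (int i = 0; i < 1000; i++)
                sum += a[i];

            sum = 0.0;
            #pragma omp parallel for reduction(+:sum)   /* race-free version */
            for (int i = 0; i < 1000; i++)
                sum += a[i];

            printf("%f\n", sum);
            return 0;
        }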

  17. Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak

    2003-01-01

    This paper presents a detailed performance analysis of a multi-block overset grid computational fluid dynamics application on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SMP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.

  18. Variational data assimilation system "INM RAS - Black Sea"

    NASA Astrophysics Data System (ADS)

    Parmuzin, Eugene; Agoshkov, Valery; Assovskiy, Maksim; Giniatulin, Sergey; Zakharova, Natalia; Kuimov, Grigory; Fomin, Vladimir

    2013-04-01

    Development of Informational-Computational Systems (ICS) for data assimilation procedures is a multidisciplinary problem. To study and solve such problems one needs to apply modern results from different disciplines and recent developments in mathematical modeling, the theory of adjoint equations and optimal control, inverse problems, the theory of numerical methods, numerical algebra and scientific computing. These problems are studied at the Institute of Numerical Mathematics of the Russian Academy of Sciences (INM RAS) in the form of ICS for personal computers (PC). Special problems and questions arise while effective PC versions of the ICS are being developed. These problems can be solved by applying modern methods of numerical mathematics and by addressing the "parallelism problem" using OpenMP technology and special linear algebra packages. In this work the results on the development of the PC-based ICS "INM RAS - Black Sea" are presented. The following problems and questions are discussed: practical problems that can be studied with the ICS; parallelism problems and their solutions using OpenMP technology and the linear algebra packages used in ICS "INM - Black Sea"; and the interface of the ICS. The results of testing ICS "INM RAS - Black Sea" are presented, and the efficiency of the technologies and methods applied is discussed. The work was supported by RFBR, grants No. 13-01-00753, 13-05-00715 and by The Ministry of Education and Science of the Russian Federation, project 8291, project 11.519.11.1005. References: [1] V.I. Agoshkov, M.V. Assovskii, S.A. Lebedev, Numerical simulation of Black Sea hydrothermodynamics taking into account tide-forming forces. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 5-31 [2] E.I. Parmuzin, V.I. Agoshkov, Numerical solution of the variational assimilation problem for sea surface temperature in the model of the Black Sea dynamics. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 69-94 [3] V.B. Zalesny, N.A. Diansky, V.V. Fomin, S.N. Moshonkin, S.G. Demyshev, Numerical model of the circulation of Black Sea and Sea of Azov. Russ. J. Numer. Anal. Math. Modelling (2012) 27, No.1, 95-111 [4] V.I. Agoshkov, S.V. Giniatulin, G.V. Kuimov. OpenMP technology and linear algebra packages in the variation data assimilation systems. - Abstracts of the 1-st China-Russia Conference on Numerical Algebra with Applications in Radiative Hydrodynamics, Beijing, China, October 16-18, 2012. [5] Zakharova N.B., Agoshkov V.I., Parmuzin E.I., The new method of ARGO buoys system observation data interpolation. Russian Journal of Numerical Analysis and Mathematical Modelling. Vol. 28, Issue 1, 2013.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, Richard D.; Hones, Holger E.

    The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic kernels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline Sequential, RAJA Sequential, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations and the loop body code for each kernel is identical across all variants. There are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. Actual computer instructions executed and how they run in parallel differ depending on the parallel programming model backend used and which optimizations are performed by the compiler used to build the Performance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor partners for defining requirements for future computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAL/Sierra machine (initial LLNL delivery expected in late-2017/early 2018) and the CORAL-2 procurement. The Suite will also be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.

  20. Parameters analysis of a porous medium model for treatment with hyperthermia using OpenMP

    NASA Astrophysics Data System (ADS)

    Freitas Reis, Ruy; dos Santos Loureiro, Felipe; Lobosco, Marcelo

    2015-09-01

    Cancer is the second leading cause of death worldwide, so treatments have been developed to address this global health problem. Hyperthermia is not a new technique, but its use in cancer treatment is still at an early stage of development. This treatment is based on overheating the target area to a threshold temperature that causes necrosis and apoptosis of cancerous cells. To simulate this phenomenon using magnetic nanoparticles in an under-skin cancer treatment, a three-dimensional porous medium model was adopted. This study presents a sensitivity analysis of the model parameters such as the porosity and blood velocity. To ensure a second-order solution approach, a 7-point centered finite difference method was used for spatial discretization, while a predictor-corrector method was used for time evolution. Due to the massive computations required to find the solution of a three-dimensional model, this paper also presents a first attempt to improve performance using OpenMP, a parallel programming API.
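    A sketch of the kind of OpenMP-parallel 7-point update used in such an explicit step is given below; the simple diffusion form and the single coefficient are placeholders and do not reproduce the porous-medium bioheat terms of the paper.

        /* Illustrative 7-point centered-difference update for a 3-D
         * diffusion-type step, parallelized over the outer two loops. */
        #include <stddef.h>

        void stencil_step(int nx, int ny, int nz, double alpha,
                          const double *t_old, double *t_new)
        {
        #define IDX(i, j, k) ((size_t)(i) * ny * nz + (size_t)(j) * nz + (k))
            #pragma omp parallel for collapse(2) schedule(static)
            for (int i = 1; i < nx - 1; i++)
                for (int j = 1; j < ny - 1; j++)
                    for (int k = 1; k < nz - 1; k++)
                        t_new[IDX(i, j, k)] = t_old[IDX(i, j, k)]
                            + alpha * (t_old[IDX(i + 1, j, k)] + t_old[IDX(i - 1, j, k)]
                                     + t_old[IDX(i, j + 1, k)] + t_old[IDX(i, j - 1, k)]
                                     + t_old[IDX(i, j, k + 1)] + t_old[IDX(i, j, k - 1)]
                                     - 6.0 * t_old[IDX(i, j, k)]);
        #undef IDX
        }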

  1. GPU accelerated dynamic functional connectivity analysis for functional MRI data.

    PubMed

    Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

    2015-07-01

    Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. How to Build MCNP 6.2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bull, Jeffrey S.

    This presentation describes how to build MCNP 6.2. MCNP®* 6.2 can be compiled on Macs, PCs, and most Linux systems. It can also be built for parallel execution using both OpenMP and Message Passing Interface (MPI) methods. MCNP6 requires Fortran, C, and C++ compilers to build the code.

  3. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  4. Power and Performance Trade-offs for Space Time Adaptive Processing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino

    Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP’s computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large size data sets without increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to an improved performance in a typical STAP application.

  5. Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; VanderWijngaart, Rob F.

    2003-01-01

    We describe a new suite of computational benchmarks that models applications featuring multiple levels of parallelism. Such parallelism is often available in realistic flow computations on systems of grids, but had not previously been captured in benchmarks. The new suite, named NPB Multi-Zone, is extended from the NAS Parallel Benchmarks suite, and involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy provides relatively easily exploitable coarse-grain parallelism between meshes. Three reference implementations are available: one serial, one hybrid using the Message Passing Interface (MPI) and OpenMP, and another hybrid using a shared memory multi-level programming model (SMP+OpenMP). We examine the effectiveness of hybrid parallelization paradigms in these implementations on three different parallel computers. We also use an empirical formula to investigate the performance characteristics of the multi-zone benchmarks.

  6. Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

    NASA Astrophysics Data System (ADS)

    Galiatsatos, P. G.; Tennyson, J.

    2012-11-01

    The most time-consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the directive-based OpenMP parallel language, the function-based MPI parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and in order to obtain converged results we need only a small part of the matrix spectrum.
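    The parallel sparse matrix-vector product is the kernel that ARPACK/PARPACK repeatedly request through their reverse-communication interface; a generic OpenMP version is sketched below in CSR form for simplicity, whereas the codes described above use a variant of the coordinate format for real symmetric matrices.

        /* Generic OpenMP sparse matrix-vector product in CSR form; illustrative
         * only, not the PSMV routine of the R-matrix codes. */
        void psmv_csr(int n, const int *rowptr, const int *col,
                      const double *val, const double *x, double *y)
        {
            /* Dynamic scheduling helps when row lengths vary strongly. */
            #pragma omp parallel for schedule(dynamic, 256)
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                    sum += val[k] * x[col[k]];
                y[i] = sum;
            }
        }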

  7. The Process of Parallelizing the Conjunction Prediction Algorithm of ESA's SSA Conjunction Prediction Service Using GPGPU

    NASA Astrophysics Data System (ADS)

    Fehr, M.; Navarro, V.; Martin, L.; Fletcher, E.

    2013-08-01

    Space Situational Awareness[8] (SSA) is defined as the comprehensive knowledge, understanding and maintained awareness of the population of space objects, the space environment and existing threats and risks. As ESA's SSA Conjunction Prediction Service (CPS) requires the repetitive application of a processing algorithm against a data set of man-made space objects, it is crucial to exploit the highly parallelizable nature of this problem. Currently the CPS system makes use of OpenMP[7] for parallelization purposes using CPU threads, but only a GPU with its hundreds of cores can fully benefit from such high levels of parallelism. This paper presents the adaptation of several core algorithms[5] of the CPS for general-purpose computing on graphics processing units (GPGPU) using NVIDIAs Compute Unified Device Architecture (CUDA).

  8. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

    We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h²), and O(h⁴), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summary: Program title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
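    For reference, the standard first-derivative finite-difference formulas associated with these accuracy levels are given below (with h the differencing step); this is textbook material restated here, not an excerpt from the library documentation.

        f'(x) = \frac{f(x+h) - f(x)}{h} + O(h)
        f'(x) = \frac{f(x+h) - f(x-h)}{2h} + O(h^{2})
        f'(x) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + O(h^{4})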

  9. The MOLDY short-range molecular dynamics package

    NASA Astrophysics Data System (ADS)

    Ackland, G. J.; D'Mellow, K.; Daraszewicz, S. L.; Hepburn, D. J.; Uhrin, M.; Stratford, K.

    2011-12-01

    We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials and specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron-carbon alloy. Program summary: Program title: MOLDY Catalogue identifier: AEJU_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJU_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License version 2 No. of lines in distributed program, including test data, etc.: 382 881 No. of bytes in distributed program, including test data, etc.: 6 705 242 Distribution format: tar.gz Programming language: Fortran 95/OpenMP Computer: Any Operating system: Any Has the code been vectorised or parallelized?: Yes. OpenMP is required for parallel execution RAM: 100 MB or more Classification: 7.7 Nature of problem: Moldy addresses the problem of many atoms (of order 10⁶) interacting via a classical interatomic potential on a timescale of microseconds. It is designed for problems where statistics must be gathered over a number of equivalent runs, such as measuring thermodynamic properties, diffusion, radiation damage, fracture, twinning deformation, nucleation and growth of phase transitions, sputtering etc. In the vast majority of materials, the interactions are non-pairwise, and the code must be able to deal with many-body forces. Solution method: Molecular dynamics involves integrating Newton's equations of motion. MOLDY uses Verlet (for good energy conservation) or predictor-corrector (for accurate trajectories) algorithms. It is parallelised using OpenMP. It also includes a static minimisation routine to find the lowest energy structure. Boundary conditions for surfaces, clusters, grain boundaries, thermostat (Nose), barostat (Parrinello-Rahman), and externally applied strain are provided. The initial configuration can be either a repeated unit cell or have all atoms given explicitly. Initial velocities are generated internally, but it is also possible to specify the velocity of a particular atom. A wide range of interatomic force models are implemented, including embedded atom, Morse or Lennard-Jones. Thus the program is especially well suited to calculations of metals. Restrictions: The code is designed for short-ranged potentials, and there is no Ewald sum. Thus for long range interactions where all particles interact with all others, the order-N scaling will fail. Different interatomic potential forms require recompilation of the code. Additional comments: There is a set of associated open-source analysis software for postprocessing and visualisation. 
This includes local crystal structure recognition and identification of topological defects. Running time: A set of test modules for running time are provided. The code scales as order N. The parallelisation shows near-linear scaling with number of processors in a shared memory environment. A typical run of a few tens of nanometers for a few nanoseconds will run on a timescale of days on a multiprocessor desktop.
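    The force-loop synchronisation issue mentioned above can be illustrated by the C skeleton below (MOLDY itself is Fortran 95): with Newton's third law each pair updates two atoms, so concurrent accumulations must be protected, here with omp atomic. The neighbour-list layout and the placeholder force term are illustrative and are not MOLDY's.

        /* Skeleton of a pair-force loop showing where synchronisation is needed;
         * the force expression is a placeholder, not a physical potential. */
        void pair_forces(int n, const int *nbr_start, const int *nbr,
                         const double *r, double *f /* 3*n force components */)
        {
            #pragma omp parallel for schedule(dynamic, 32)
            for (int i = 0; i < n; i++) {
                for (int k = nbr_start[i]; k < nbr_start[i + 1]; k++) {
                    int j = nbr[k];
                    for (int d = 0; d < 3; d++) {
                        double dx = r[3 * i + d] - r[3 * j + d];
                        double fd = -dx;          /* placeholder pair force term */
                        /* Both atoms of the pair may be updated concurrently by
                         * other threads, so both accumulations are protected. */
                        #pragma omp atomic
                        f[3 * i + d] += fd;
                        #pragma omp atomic
                        f[3 * j + d] -= fd;
                    }
                }
            }
        }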

  10. Parallel heuristics for scalable community detection

    DOE PAGES

    Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth

    2015-08-14

    Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real-world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable or smaller number of iterations, while providing real speedups of up to 16x using 32 threads.
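    For context, the objective that the Louvain heuristic maximizes is the standard Newman-Girvan modularity, where A_{ij} is the edge weight between vertices i and j, k_i the weighted degree of vertex i, m the total edge weight, and c_i the community assignment of vertex i:

        Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)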

  11. A Locality-Based Threading Algorithm for the Configuration-Interaction Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Johnson, Calvin

    The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.

  12. A Locality-Based Threading Algorithm for the Configuration-Interaction Method

    DOE PAGES

    Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; ...

    2017-07-03

    The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrodinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.

  13. From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    DOE PAGES

    Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...

    2013-01-01

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.

  14. SToRM: A Model for 2D environmental hydraulics

    USGS Publications Warehouse

    Simões, Francisco J. M.

    2017-01-01

    A two-dimensional (depth-averaged) finite volume Godunov-type shallow water model developed for flow over complex topography is presented. The model, SToRM, is based on an unstructured cell-centered finite volume formulation and on nonlinear strong stability preserving Runge-Kutta time stepping schemes. The numerical discretization is founded on the classical and well established shallow water equations in hyperbolic conservative form, but the convective fluxes are calculated using auto-switching Riemann and diffusive numerical fluxes. Computational efficiency is achieved through a parallel implementation based on the OpenMP standard and the Fortran programming language. SToRM’s implementation within a graphical user interface is discussed. Field application of SToRM is illustrated by utilizing it to estimate peak flow discharges in a flooding event of the St. Vrain Creek in Colorado, U.S.A., in 2013, which reached 850 m³/s (~30,000 ft³/s) at the location of this study.

  15. LAMMPS strong scaling performance optimization on Blue Gene/Q

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coffman, Paul; Jiang, Wei; Romero, Nichols A.

    2014-11-12

    LAMMPS "Large-scale Atomic/Molecular Massively Parallel Simulator" is an open-source molecular dynamics package from Sandia National Laboratories. Significant performance improvements in strong-scaling and time-to-solution for this application on IBM's Blue Gene/Q have been achieved through computational optimizations of the OpenMP versions of the short-range Lennard-Jones term of the CHARMM force field and the long-range Coulombic interaction implemented with the PPPM (particle-particle-particle mesh) algorithm, enhanced by runtime parameter settings controlling thread utilization. Additionally, MPI communication performance improvements were made to the PPPM calculation by re-engineering the parallel 3D FFT to use MPICH collectives instead of point-to-point. Performance testing was done using anmore » 8.4-million atom simulation scaling up to 16 racks on the Mira system at Argonne Leadership Computing Facility (ALCF). Speedups resulting from this effort were in some cases over 2x.« less

  16. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  17. Data preprocessing for determining outer/inner parallelization in the nested loop problem using OpenMP

    NASA Astrophysics Data System (ADS)

    Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.

    2017-07-01

    Multi-thread programming using OpenMP on shared-memory architectures with hyperthreading technology allows a resource to be accessed by multiple processors simultaneously, and each processor can execute more than one thread for a certain period of time. However, the achievable speedup depends on the processor's ability to execute only a limited number of threads, which is a problem for sequential algorithms containing a nested loop in which the number of outer-loop iterations is greater than the maximum number of threads the processor can execute. The thread distribution technique found previously can only be applied by high-level programmers. This paper derives a parallelization procedure for low-level programmers dealing with 2-level nested loop problems in which the maximum number of threads that a processor can execute is smaller than the number of outer-loop iterations. Data preprocessing based on the numbers of outer-loop and inner-loop iterations, the computational time required to execute each iteration, and the maximum number of threads the processor can execute is used as a strategy to determine which parallel region will produce the optimal speedup.
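
    As a hedged illustration of the choice the record describes (not code from the paper), the sketch below contrasts parallelizing the outer loop versus the inner loop of a 2-level nest with OpenMP; the loop body and iteration counts are hypothetical placeholders.

      #include <omp.h>
      #include <cstdio>
      #include <vector>

      int main() {
          const int N_outer = 6;      // hypothetically fewer iterations than available threads
          const int N_inner = 100000; // many cheap inner iterations (hypothetical)
          std::vector<double> acc(N_outer, 0.0);

          // Option 1: parallelize the outer loop -- at most N_outer threads do useful work.
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < N_outer; ++i)
              for (int j = 0; j < N_inner; ++j)
                  acc[i] += (i + 1) * 1e-6 * j;

          // Option 2: parallelize the inner loop -- all threads participate, but a
          // parallel region is entered once per outer iteration.
          for (int i = 0; i < N_outer; ++i) {
              double s = 0.0;
              #pragma omp parallel for reduction(+:s) schedule(static)
              for (int j = 0; j < N_inner; ++j)
                  s += (i + 1) * 1e-6 * j;
              acc[i] = s;
          }

          printf("acc[0] = %f, threads available = %d\n", acc[0], omp_get_max_threads());
          return 0;
      }

    Which option wins depends on the quantities the record's preprocessing step measures: the outer-loop version minimizes scheduling overhead, while the inner-loop version keeps all threads busy when the outer trip count is small.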

  18. Performance of hybrid programming models for multiscale cardiac simulations: preparing for petascale computation.

    PubMed

    Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias

    2011-10-01

    Future multiscale and multiphysics models that support research into human disease, translational medical science, and treatment can utilize the power of high-performance computing (HPC) systems. We anticipate that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message-passing processes [e.g., the message-passing interface (MPI)] with multithreading (e.g., OpenMP, Pthreads). The objective of this study is to compare the performance of such hybrid programming models when applied to the simulation of a realistic physiological multiscale model of the heart. Our results show that the hybrid models perform favorably when compared to an implementation using only the MPI and, furthermore, that OpenMP in combination with the MPI provides a satisfactory compromise between performance and code complexity. Having the ability to use threads within MPI processes enables the sophisticated use of all processor cores for both computation and communication phases. Considering that HPC systems in 2012 will have two orders of magnitude more cores than what was used in this study, we believe that faster than real-time multiscale cardiac simulations can be achieved on these systems.

  19. Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

    USGS Publications Warehouse

    Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

    2011-01-01

    We parallelized the GeoClaw code on a one-level grid using OpenMP in March, 2011 to meet the urgent need of simulating tsunami waves at near-shore from Tohoku 2011 and achieved over 75% of the potential speed-up on an eight core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his Ph.D. thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantages of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 Tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave-heights and travel time of the tsunami waves. © 2011 IEEE.

  20. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

    PubMed Central

    Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul

    2016-01-01

    Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scaling. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
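
    The following is a minimal, hedged sketch of the kind of OpenMP loop-level parallelism such a solver relies on: an explicit diffusion-decay update on a 3-D grid. The grid size, coefficients, and the explicit scheme are illustrative assumptions, not BioFVM's actual (implicit) numerical method.

      #include <omp.h>
      #include <vector>
      #include <cstdio>

      // Illustrative explicit diffusion-decay step on a 3-D grid, threaded with OpenMP.
      // Sizes and coefficients are hypothetical placeholders.
      int main() {
          const int N = 64;                  // grid points per dimension (assumed)
          const double D = 1.0, lambda = 0.1, dt = 0.01, h = 1.0;
          auto idx = [N](int i, int j, int k) { return (i * N + j) * N + k; };

          std::vector<double> u(N * N * N, 0.0), unew(N * N * N, 0.0);
          u[idx(N / 2, N / 2, N / 2)] = 1.0;  // point source

          #pragma omp parallel for collapse(2) schedule(static)
          for (int i = 1; i < N - 1; ++i)
              for (int j = 1; j < N - 1; ++j)
                  for (int k = 1; k < N - 1; ++k) {
                      const double lap =
                          (u[idx(i+1,j,k)] + u[idx(i-1,j,k)] +
                           u[idx(i,j+1,k)] + u[idx(i,j-1,k)] +
                           u[idx(i,j,k+1)] + u[idx(i,j,k-1)] - 6.0 * u[idx(i,j,k)]) / (h * h);
                      unew[idx(i,j,k)] = u[idx(i,j,k)] + dt * (D * lap - lambda * u[idx(i,j,k)]);
                  }

          printf("center after one step: %g\n", unew[idx(N/2, N/2, N/2)]);
          return 0;
      }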

  1. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.

  2. Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj

    2012-07-03

    We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases---Thread-Greedy CD and Coloring-Based CD---and give performance measurements for an OpenMP implementation of these.

  3. spammpack, Version 2013-06-18

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-01-17

    This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm. It provides a matrix data type and an approximate matrix product that exhibits linear-scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or for parallel execution on shared-memory systems with an OpenMP-capable compiler.

  4. A multithreaded and GPU-optimized compact finite difference algorithm for turbulent mixing at high Schmidt number using petascale computing

    NASA Astrophysics Data System (ADS)

    Clay, M. P.; Yeung, P. K.; Buaria, D.; Gotoh, T.

    2017-11-01

    Turbulent mixing at high Schmidt number is a multiscale problem which places demanding requirements on direct numerical simulations to resolve fluctuations down to the Batchelor scale. We use a dual-grid, dual-scheme and dual-communicator approach where velocity and scalar fields are computed by separate groups of parallel processes, the latter using a combined compact finite difference (CCD) scheme on a finer grid with a static 3-D domain decomposition free of the communication overhead of memory transposes. A high degree of scalability is achieved for an 8192^3 scalar field at Schmidt number 512 in turbulence with a modest inertial range, by overlapping communication with computation whenever possible. On the Cray XE6 partition of Blue Waters, use of a dedicated thread for communication combined with OpenMP locks and nested parallelism reduces CCD timings by 34% compared to an MPI baseline. The code has been further optimized for the 27-petaflops Cray XK7 machine Titan using GPUs as accelerators with the latest OpenMP 4.5 directives, giving 2.7X speedup compared to CPU-only execution at the largest problem size. Supported by NSF Grant ACI-1036170, the NCSA Blue Waters Project with subaward via UIUC, and a DOE INCITE allocation at ORNL.
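
    A minimal sketch, under assumed simplifications, of the dedicated-communication-thread idea: nested OpenMP parallelism gives one thread to a placeholder communication routine while a nested team does the computation. The fake_communication function stands in for the MPI calls such a code would overlap; nothing here is the paper's implementation.

      #include <omp.h>
      #include <algorithm>
      #include <cstdio>
      #include <vector>

      // Placeholder for posting/progressing MPI transfers (assumption, not real MPI code).
      static void fake_communication() {}

      int main() {
          omp_set_max_active_levels(2);            // allow nested parallel regions
          std::vector<double> data(1 << 20, 1.0);
          const std::size_t n = data.size();
          double sum = 0.0;

          #pragma omp parallel num_threads(2)
          {
              if (omp_get_thread_num() == 0) {
                  fake_communication();            // dedicated communication thread
              } else {
                  // nested team performs the computation with the remaining cores
                  const int nt = std::max(1, omp_get_max_threads() - 1);
                  #pragma omp parallel for reduction(+:sum) num_threads(nt)
                  for (std::size_t i = 0; i < n; ++i)
                      sum += data[i];
              }
          }
          printf("sum = %f\n", sum);
          return 0;
      }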

  5. FY17 CSSE L2 Milestone Report: Analyzing Power Usage Characteristics of Workloads Running on Trinity.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedretti, Kevin

    This report summarizes the work performed as part of a FY17 CSSE L2 milestone to investigate the power usage behavior of ASC workloads running on the ATS-1 Trinity platform. Techniques were developed to instrument application code regions of interest using the Power API together with the Kokkos profiling interface and Caliper annotation library. Experiments were performed to understand the power usage behavior of mini-applications and the SNL/ATDM SPARC application running on ATS-1 Trinity Haswell and Knights Landing compute nodes. A taxonomy of power measurement approaches was identified and presented, providing a guide for application developers to follow. Controlled scaling study experiments were performed on up to 2048 nodes of Trinity along with smaller scale experiments on Trinity testbed systems. Additionally, power and energy system monitoring information from Trinity was collected and archived for post analysis of "in-the-wild" workloads. Results were analyzed to assess the sensitivity of the workloads to ATS-1 compute node type (Haswell vs. Knights Landing), CPU frequency control, node-level power capping control, OpenMP configuration, Knights Landing on-package memory configuration, and algorithm/solver configuration. Overall, this milestone lays groundwork for addressing the long-term goal of determining how to best use and operate future ASC platforms to achieve the greatest benefit subject to a constrained power budget.

  6. A Non-Equilibrium Sediment Transport Model for Coastal Inlets and Navigation Channels

    DTIC Science & Technology

    2011-01-01

    exchange of water, sediment, and nutrients between estuaries and the ocean. Because of the multiple interacting forces (waves, wind, tide, river...in parallel using OpenMP. The CMS takes advantage of the Surface-water Modeling System (SMS) interface for grid generation and model setup, as well...as for plotting and post-processing (Zundel, 2000). The circulation model in the CMS (called CMS-Flow) computes the unsteady water level and

  7. Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doerfler, Douglas; Austin, Brian; Cook, Brandon

    There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.

  8. Composing Data Parallel Code for a SPARQL Graph Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste

    Big data analytics applications process large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 shared-memory multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.

  9. Load Balancing Strategies for Multiphase Flows on Structured Grids

    NASA Astrophysics Data System (ADS)

    Olshefski, Kristopher; Owkes, Mark

    2017-11-01

    The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
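
    As a hedged sketch of the shared-memory OpenMP strategy mentioned above (not the NGA code), the example below uses a dynamic schedule so that threads assigned cheap cells can pick up the expensive interface cells; the cost model and work function are invented for illustration.

      #include <omp.h>
      #include <vector>
      #include <cmath>
      #include <cstdio>

      // Hypothetical cost model: cells near a gas-liquid interface cost more work.
      static double cell_work(bool near_interface) {
          const int iters = near_interface ? 2000 : 100;   // imbalanced cost (assumed)
          double s = 0.0;
          for (int k = 0; k < iters; ++k) s += std::sin(k * 1e-3);
          return s;
      }

      int main() {
          const int ncells = 100000;
          std::vector<char> interface_flag(ncells, 0);
          for (int c = 40000; c < 45000; ++c) interface_flag[c] = 1;  // a band of costly cells

          double total = 0.0;
          // dynamic scheduling lets idle threads steal expensive chunks
          #pragma omp parallel for schedule(dynamic, 64) reduction(+:total)
          for (int c = 0; c < ncells; ++c)
              total += cell_work(interface_flag[c]);

          printf("total = %f (threads = %d)\n", total, omp_get_max_threads());
          return 0;
      }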

  10. Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray

    2015-12-01

    Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.

  11. Monte Carlo based investigation of berry phase for depth resolved characterization of biomedical scattering samples

    NASA Astrophysics Data System (ADS)

    Baba, J. S.; Koju, V.; John, D.

    2015-03-01

    The propagation of light in turbid media is an active area of research with relevance to numerous investigational fields, e.g., biomedical diagnostics and therapeutics. The statistical random-walk nature of photon propagation through turbid media is ideal for computational based modeling and simulation. Ready access to supercomputing resources provides a means for attaining brute force solutions to stochastic light-matter interactions entailing scattering by facilitating timely propagation of sufficient (>10^7) photons while tracking characteristic parameters based on the incorporated physics of the problem. One such model that works well for isotropic but fails for anisotropic scatter, which is the case for many biomedical sample scattering problems, is the diffusion approximation. In this report, we address this by utilizing Berry phase (BP) evolution as a means for capturing anisotropic scattering characteristics of samples in the preceding depth where the diffusion approximation fails. We extend the polarization sensitive Monte Carlo method of Ramella-Roman, et al., to include the computationally intensive tracking of photon trajectory in addition to polarization state at every scattering event. To speed-up the computations, which entail the appropriate rotations of reference frames, the code was parallelized using OpenMP. The results presented reveal that BP is strongly correlated to the photon penetration depth, thus potentiating the possibility of polarimetric depth resolved characterization of highly scattering samples, e.g., biological tissues.

  12. Monte Carlo based investigation of Berry phase for depth resolved characterization of biomedical scattering samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baba, Justin S; John, Dwayne O; Koju, Vijay

    The propagation of light in turbid media is an active area of research with relevance to numerous investigational fields, e.g., biomedical diagnostics and therapeutics. The statistical random-walk nature of photon propagation through turbid media is ideal for computational based modeling and simulation. Ready access to supercomputing resources provides a means for attaining brute force solutions to stochastic light-matter interactions entailing scattering by facilitating timely propagation of sufficient (>10 million) photons while tracking characteristic parameters based on the incorporated physics of the problem. One such model that works well for isotropic but fails for anisotropic scatter, which is the case for many biomedical sample scattering problems, is the diffusion approximation. In this report, we address this by utilizing Berry phase (BP) evolution as a means for capturing anisotropic scattering characteristics of samples in the preceding depth where the diffusion approximation fails. We extend the polarization sensitive Monte Carlo method of Ramella-Roman, et al., to include the computationally intensive tracking of photon trajectory in addition to polarization state at every scattering event. To speed-up the computations, which entail the appropriate rotations of reference frames, the code was parallelized using OpenMP. The results presented reveal that BP is strongly correlated to the photon penetration depth, thus potentiating the possibility of polarimetric depth resolved characterization of highly scattering samples, e.g., biological tissues.

  13. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  14. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    NASA Astrophysics Data System (ADS)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single GPU systems to show that real time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the speedup of the GPU version was more than 40× compared to a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.

  15. Checkpointing Shared Memory Programs at the Application-level

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Schulz, M; Szwed, P

    2004-09-08

    Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart (CPR): the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focused on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i) a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduce overheads further.

  16. CCC7-119 Reactive Molecular Dynamics Simulations of Hot Spot Growth in Shocked Energetic Materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, Aidan P.

    2015-03-01

    The purpose of this work is to understand how defects control initiation in energetic materials used in stockpile components. Sequoia gives us the core count to run very large-scale simulations of up to 10 million atoms. Using an OpenMP-threaded implementation of the ReaxFF package in LAMMPS, we have been able to obtain good parallel efficiency running on 16k nodes of Sequoia, with 1 hardware thread per core.

  17. OpenMP Performance on the Columbia Supercomputer

    NASA Technical Reports Server (NTRS)

    Haoqiang, Jin; Hood, Robert

    2005-01-01

    This presentation discusses the Columbia supercomputer, one of the world's fastest supercomputers, providing 61 TFLOPs (as of 10/20/04). It was conceived, designed, built, and deployed in just 120 days as a 20-node supercomputer built on proven 512-processor nodes. It is the largest SGI system in the world, with over 10,000 Intel Itanium 2 processors, and provides the largest node size incorporating commodity parts (512 processors) and the largest shared-memory environment (2,048 processors) with 88% efficiency, topping the scalar systems on the Top500 list.

  18. OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms

    DTIC Science & Technology

    2016-05-01

    composed of hyperspectral video sequences recording the release of chemical plumes at the Dugway Proving Ground. We use the 329 frames of the...video. Each frame is a hyperspectral image with dimension 128 × 320 × 129, where 129 is the dimension of the channel of each pixel. The total number of...j=1. Then we use the nested for-loop to calculate the values of WXY by the formula (1). We then put the corresponding value in an array which

  19. Lattice QCD simulations using the OpenACC platform

    NASA Astrophysics Data System (ADS)

    Majumdar, Pushan

    2016-10-01

    In this article we will explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive based programming model for GPUs which avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high level languages with OpenMP-like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on the Fermi and Kepler GPUs.
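
    For illustration only, the snippet below places the two directive styles the record compares side by side on a generic saxpy loop: an OpenMP loop for host threads and an OpenACC loop for the accelerator. It is a generic sketch, not a lattice QCD kernel.

      #include <cstdio>
      #include <vector>

      int main() {
          const int n = 1 << 20;
          std::vector<float> x(n, 1.0f), y(n, 2.0f);
          const float a = 0.5f;
          float* xp = x.data();
          float* yp = y.data();

          // OpenMP: host threads share the loop iterations
          #pragma omp parallel for
          for (int i = 0; i < n; ++i)
              yp[i] += a * xp[i];

          // OpenACC: offload the loop to the accelerator; data clauses manage transfers
          #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
          for (int i = 0; i < n; ++i)
              yp[i] += a * xp[i];

          printf("y[0] = %f\n", yp[0]);
          return 0;
      }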

  20. Rambrain - a library for virtually extending physical memory

    NASA Astrophysics Data System (ADS)

    Imgrund, Maximilian; Arth, Alexander

    2017-08-01

    We introduce Rambrain, a user space library that manages memory consumption of your code. Using Rambrain you can overcommit memory over the size of physical memory present in the system. Rambrain takes care of temporarily swapping out data to disk and can handle multiples of the physical memory size present. Rambrain is thread-safe, OpenMP and MPI compatible and supports Asynchronous IO. The library was designed to require minimal changes to existing programs and to be easy to use.

  1. Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG

    NASA Astrophysics Data System (ADS)

    Rohe, Daniel

    2016-10-01

    The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question to what extent High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD units (single-instruction-multiple-data). Results are provided for two distinct HPC platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can actually be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researchers to likewise improve the efficiency of their codes.
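
    A minimal sketch of the three levels described above, using an invented dot-product workload rather than an fRG kernel: MPI ranks split blocks of work, OpenMP threads split them within a rank, and an omp simd directive vectorizes the innermost loop.

      #include <mpi.h>
      #include <omp.h>
      #include <vector>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, nranks;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nranks);

          const int nblocks = 256, blocksize = 4096;   // hypothetical work decomposition
          std::vector<double> v(blocksize, 1.0);
          double local = 0.0;

          // Level 1: MPI ranks own every nranks-th block; Level 2: OpenMP threads share them.
          #pragma omp parallel for reduction(+:local)
          for (int b = rank; b < nblocks; b += nranks) {
              double s = 0.0;
              // Level 3: SIMD vectorization of the innermost loop.
              #pragma omp simd reduction(+:s)
              for (int i = 0; i < blocksize; ++i)
                  s += v[i] * v[i];
              local += s;
          }

          double global = 0.0;
          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0) printf("global sum = %f\n", global);
          MPI_Finalize();
          return 0;
      }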

  2. Final Report from The University of Texas at Austin for DEGAS: Dynamic Global Address Space programming environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erez, Mattan; Yelick, Katherine; Sarkar, Vivek

    The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models and runtime systems to meet the challenges of Exascale computing. Our approach is to provide an efficient and scalable programming model that can be adapted to application needs through the use of dynamic runtime features and domain-specific languages for computational kernels. We address the following technical challenges: Programmability: Rich set of programming constructs based on a Hierarchical Partitioned Global Address Space (HPGAS) model, demonstrated in UPC++. Scalability: Hierarchical locality control, lightweight communication (extended GASNet), and efficient synchronization mechanisms (Phasers). Performance Portability: Just-in-time specialization (SEJITS) for generating hardware-specific code and scheduling libraries for domain-specific adaptive runtimes (Habanero). Energy Efficiency: Communication-optimal code generation to optimize energy efficiency by reducing data movement. Resilience: Containment Domains for flexible, domain-specific resilience, using state capture mechanisms and lightweight, asynchronous recovery mechanisms. Interoperability: Runtime and language interoperability with MPI and OpenMP to encourage broad adoption.

  3. Development of full wave code for modeling RF fields in hot non-uniform plasmas

    NASA Astrophysics Data System (ADS)

    Zhao, Liangji; Svidzinski, Vladimir; Spencer, Andrew; Kim, Jin-Soo

    2016-10-01

    FAR-TECH, Inc. is developing a full wave RF modeling code to model RF fields in fusion devices and in general plasma applications. As an important component of the code, an adaptive meshless technique is introduced to solve the wave equations, which allows resolving plasma resonances efficiently and adapting to the complexity of antenna geometry and device boundary. The computational points are generated using either a point elimination method or a force balancing method based on the monitor function, which is calculated by solving the cold plasma dispersion equation locally. Another part of the code is the conductivity kernel calculation, used for modeling the nonlocal hot plasma dielectric response. The conductivity kernel is calculated on a coarse grid of test points and then interpolated linearly onto the computational points. All the components of the code are parallelized using MPI and OpenMP libraries to optimize the execution speed and memory. The algorithm and the results of our numerical approach to solving 2-D wave equations in a tokamak geometry will be presented. Work is supported by the U.S. DOE SBIR program.

  4. Livermore Compiler Analysis Loop Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, R. D.

    2013-03-01

    LCALS is designed to evaluate compiler optimizations and performance of a variety of loop kernels and loop traversal software constructs. Some of the loop kernels are pulled directly from "Livermore Loops Coded in C", developed at LLNL (see item 11 below for details of earlier code versions). The older suites were used to evaluate floating-point performance of hardware platforms prior to porting larger application codes. The LCALS suite is geared toward assessing C++ compiler optimizations and platform performance related to SIMD vectorization, OpenMP threading, and advanced C++ language features. LCALS contains 20 of 24 loop kernels from the older Livermore Loop suites, plus various others representative of loops found in current production application codes at LLNL. The latter loops emphasize more diverse loop constructs and data access patterns than the others, such as multi-dimensional difference stencils. The loops are included in a configurable framework, which allows control of compilation, loop sampling for execution timing, which loops are run, and their lengths. It generates timing statistics for analyzing and comparing variants of individual loops. Also, it is easy to add loops to the suite as desired.

  5. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.

    2017-07-01

    The Continuum-Generalized Method of Moments (C-GMM) addresses the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum of moment conditions in a GMM framework. However, this computation can take a very long time because the regularization parameter must be optimized. Unfortunately, these calculations are processed sequentially, whereas all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. Furthermore, this parallel algorithm is implemented with the standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
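
    The sketch below illustrates the outer-loop strategy the experiment favors, with an invented placeholder for the per-observation moment computation: each OpenMP thread accumulates into a private buffer over its share of observations, and the buffers are merged once at the end.

      #include <omp.h>
      #include <vector>
      #include <cmath>
      #include <cstdio>

      int main() {
          const int n_obs = 20000, n_moments = 64;   // hypothetical problem sizes
          std::vector<double> g(n_moments, 0.0);     // accumulated moment conditions

          #pragma omp parallel
          {
              std::vector<double> g_local(n_moments, 0.0);   // per-thread accumulator
              // outer loop over observations is shared among threads
              #pragma omp for schedule(static)
              for (int t = 0; t < n_obs; ++t)
                  for (int j = 0; j < n_moments; ++j)
                      g_local[j] += std::cos(0.001 * t * (j + 1));   // placeholder moment
              // merge each thread's partial result exactly once
              #pragma omp critical
              for (int j = 0; j < n_moments; ++j)
                  g[j] += g_local[j];
          }

          printf("g[0] = %f\n", g[0]);
          return 0;
      }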

  6. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine- and coarse-grained levels, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  7. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER [Book Chapter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning for balancing computational work in pushing particles and in grid related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership class computer.

  8. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large-scale complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  9. Parallelizing a peanut butter sandwich

    NASA Astrophysics Data System (ADS)

    Quenette, S. M.

    2005-12-01

    This poster aims to demonstrate, in a novel way, why contemporary computational code development is seemingly hard for a geodynamics modeler (i.e. a non-computer-scientist). For example, to utilise contemporary computer hardware, parallelisation is required. But why do we choose the explicit approach (MPI) over an implicit (OpenMP) one? How does this relate to typical geodynamics codes? And do we face this same style of problem in everyday life? We aim to demonstrate that the little bit of complexity, forethought and effort is worthwhile.

  10. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.

  11. Use of general purpose graphics processing units with MODFLOW

    USGS Publications Warehouse

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
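
    As a hedged illustration of the CPU/OpenMP side of such a solver (a toy example, not MODFLOW's UPCG code), the sketch below parallelizes a compressed sparse row (CSR) matrix-vector product, the kernel a preconditioned conjugate gradient iteration spends most of its time in, over matrix rows.

      #include <omp.h>
      #include <vector>
      #include <cstdio>

      // CSR sparse matrix-vector product y = A*x, threaded over rows with OpenMP.
      static void csr_spmv(const std::vector<int>& ia, const std::vector<int>& ja,
                           const std::vector<double>& a, const std::vector<double>& x,
                           std::vector<double>& y) {
          const int n = static_cast<int>(ia.size()) - 1;
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; ++i) {
              double s = 0.0;
              for (int k = ia[i]; k < ia[i + 1]; ++k)
                  s += a[k] * x[ja[k]];
              y[i] = s;
          }
      }

      int main() {
          // 1-D Laplacian (tridiagonal) assembled in CSR form as a toy test matrix.
          const int n = 1000;
          std::vector<int> ia(1, 0), ja;
          std::vector<double> a;
          for (int i = 0; i < n; ++i) {
              if (i > 0)     { ja.push_back(i - 1); a.push_back(-1.0); }
              ja.push_back(i); a.push_back(2.0);
              if (i < n - 1) { ja.push_back(i + 1); a.push_back(-1.0); }
              ia.push_back(static_cast<int>(ja.size()));
          }
          std::vector<double> x(n, 1.0), y(n, 0.0);
          csr_spmv(ia, ja, a, x, y);
          printf("y[0] = %f, y[1] = %f\n", y[0], y[1]);   // expect 1 at the boundary, 0 in the interior
          return 0;
      }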

  12. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is becoming more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool, but it is a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on a survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  13. Towards Highly Scalable Ab Initio Molecular Dynamics (AIMD) Simulations on the Intel Knights Landing Manycore Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacquelin, Mathias; De Jong, Wibe A.; Bylaska, Eric J.

    2017-07-03

    The Ab Initio Molecular Dynamics (AIMD) method allows scientists to treat the dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. This extremely important method has tremendous computational requirements, because the electronic Schrödinger equation, approximated using Kohn-Sham Density Functional Theory (DFT), is solved at every time step. With the advent of manycore architectures, application developers have a significant amount of processing power within each compute node that can only be exploited through massive parallelism. A compute intensive application such as AIMD forms a good candidate to leverage this processing power. In this paper, we focus on adding thread level parallelism to the plane wave DFT methodology implemented in NWChem. Through a careful optimization of tall-skinny matrix products, which are at the heart of the Lagrange multiplier and nonlocal pseudopotential kernels, as well as 3D FFTs, our OpenMP implementation delivers excellent strong scaling on the latest Intel Knights Landing (KNL) processor. We assess the efficiency of our Lagrange multiplier kernels by building a Roofline model of the platform, and verify that our implementation is close to the roofline for various problem sizes. Finally, we present strong scaling results on the complete AIMD simulation for a 64 water molecules test case, that scales up to all 68 cores of the Knights Landing processor.

  14. Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms

    DOE PAGES

    Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; ...

    2013-07-18

    The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Finally, our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.

  15. Hybrid multicore/vectorisation technique applied to the elastic wave equation on a staggered grid

    NASA Astrophysics Data System (ADS)

    Titarenko, Sofya; Hildyard, Mark

    2017-07-01

    In modern physics it has become common to find the solution of a problem by solving numerically a set of PDEs. Whether solving them on a finite difference grid or by a finite element approach, the main calculations are often applied to a stencil structure. In the last decade it has become usual to work with so called big data problems where calculations are very heavy and accelerators and modern architectures are widely used. Although CPU and GPU clusters are often used to solve such problems, parallelisation of any calculation ideally starts from a single processor optimisation. Unfortunately, it is impossible to vectorise a stencil structured loop with high level instructions. In this paper we suggest a new approach to rearranging the data structure which makes it possible to apply high level vectorisation instructions to a stencil loop and which results in significant acceleration. The suggested method allows further acceleration if shared memory APIs are used. We show the effectiveness of the method by applying it to an elastic wave propagation problem on a finite difference grid. We have chosen Intel architecture for the test problem and OpenMP (Open Multi-Processing) since they are extensively used in many applications.
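
    A small, hedged sketch of the general idea (not the paper's staggered-grid elastic wave scheme or its specific data rearrangement): storing the field contiguously and updating it out-of-place removes loop-carried dependences, so an omp parallel for simd directive can both thread and vectorize the stencil loop.

      #include <omp.h>
      #include <vector>
      #include <cstdio>

      int main() {
          const int n = 1 << 20;
          const double c = 0.25;
          std::vector<double> u(n, 0.0), unew(n, 0.0);
          u[n / 2] = 1.0;                        // initial disturbance (illustrative)

          const double* up = u.data();
          double* np = unew.data();

          // contiguous, out-of-place update: no dependence between iterations,
          // so the loop can be threaded and vectorized together
          #pragma omp parallel for simd schedule(static)
          for (int i = 1; i < n - 1; ++i)
              np[i] = up[i] + c * (up[i - 1] - 2.0 * up[i] + up[i + 1]);

          printf("unew at source: %f\n", np[n / 2]);
          return 0;
      }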

  16. PREMER: a Tool to Infer Biological Networks.

    PubMed

    Villaverde, Alejandro F; Becker, Kolja; Banga, Julio R

    2017-10-04

    Inferring the structure of unknown cellular networks is a main challenge in computational biology. Data-driven approaches based on information theory can determine the existence of interactions among network nodes automatically. However, the elucidation of certain features - such as distinguishing between direct and indirect interactions or determining the direction of a causal link - requires estimating information-theoretic quantities in a multidimensional space. This can be a computationally demanding task, which acts as a bottleneck for the application of elaborate algorithms to large-scale network inference problems. The computational cost of such calculations can be alleviated by the use of compiled programs and parallelization. To this end we have developed PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a software toolbox that can run in parallel and sequential environments. It uses information theoretic criteria to recover network topology and determine the strength and causality of interactions, and allows incorporating prior knowledge, imputing missing data, and correcting outliers. PREMER is a free, open source software tool that does not require any commercial software. Its core algorithms are programmed in FORTRAN 90 and implement OpenMP directives. It has user interfaces in Python and MATLAB/Octave, and runs on Windows, Linux and OSX (https://sites.google.com/site/premertoolbox/).

  17. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes

    PubMed Central

    2017-01-01

    To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation. PMID:28582389

  18. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes.

    PubMed

    Einkemmer, Lukas

    2017-01-01

    To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation.

  19. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a Hg transport module to investigate mercury pollution. In this study, we present our work porting the GNAQPMS model to the Intel Xeon Phi processor Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL has more new hardware features, and it can be used as a standalone processor as well as a coprocessor with other CPUs. Using the VTune tool, the high-overhead modules in the GNAQPMS model were identified, including the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These high-overhead modules were accelerated by optimizing the code and using new features of KNL. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid parallel mode with MPI and OpenMP; 2) vectorizing the code to use the 512-bit wide vector computation unit; 3) reducing unnecessary memory accesses and calculations; 4) reducing Thread Local Storage (TLS) for common variables within each OpenMP thread in CBMZ; 5) changing global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased on both the CPU and KNL platforms; single-node tests showed that the optimized version has a 2.6x speedup on the two-socket CPU platform and a 3.3x speedup on the one-socket KNL platform compared with the baseline code, which means KNL has a 1.29x speedup compared with the two-socket CPU platform.

  20. Acceleration of Semiempirical QM/MM Methods through Message Passage Interface (MPI), Hybrid MPI/Open Multiprocessing, and Self-Consistent Field Accelerator Implementations.

    PubMed

    Ojeda-May, Pedro; Nam, Kwangho

    2017-08-08

    The strategy and implementation of scalable and efficient semiempirical (SE) QM/MM methods in CHARMM are described. The serial version of the code was first profiled to identify routines that required parallelization. Afterward, the code was parallelized and accelerated with three approaches. The first approach was the parallelization of the entire QM/MM routines, including the Fock matrix diagonalization routines, using the CHARMM message passing interface (MPI) machinery. In the second approach, two different self-consistent field (SCF) energy convergence accelerators were implemented using density and Fock matrices as targets for their extrapolations in the SCF procedure. In the third approach, the entire QM/MM and MM energy routines were accelerated by implementing the hybrid MPI/open multiprocessing (OpenMP) model, in which both task- and loop-level parallelization strategies were adopted to balance loads between different OpenMP threads. The present implementation was tested on two solvated enzyme systems (including <100 QM atoms) and an SN2 symmetric reaction in water. The MPI version outperformed existing SE QM methods in CHARMM, which include the SCC-DFTB and SQUANTUM methods, by at least 4-fold. The use of SCF convergence accelerators further accelerated the code by ∼12-35% depending on the size of the QM region and the number of CPU cores used. Although the MPI version displayed good scalability, the performance was diminished for large numbers of MPI processes due to the overhead associated with MPI communications between nodes. This issue was partially overcome by the hybrid MPI/OpenMP approach, which displayed better scalability for larger numbers of CPU cores (up to 64 CPUs in the tested systems).
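
    The mixed task- and loop-level OpenMP strategy described above can be sketched as follows. The two energy terms, their costs, and the toy formulas are placeholders and are not the CHARMM QM/MM routines.

      // Sketch of combining task-level and loop-level OpenMP parallelism:
      // independent energy terms (hypothetical "QM" and "MM" pieces) run
      // concurrently while their inner loops are also threaded.
      #include <omp.h>
      #include <vector>
      #include <cstdio>

      double mm_energy(const std::vector<double>& r) {
          double e = 0.0;
          #pragma omp parallel for reduction(+:e)   // loop-level parallelism
          for (std::size_t i = 0; i < r.size(); ++i) e += 0.5 * r[i] * r[i];
          return e;
      }

      double qm_energy(const std::vector<double>& r) {
          double e = 0.0;
          #pragma omp parallel for reduction(+:e)   // loop-level parallelism
          for (std::size_t i = 0; i < r.size(); ++i) e += r[i] * r[i] * r[i];
          return e;
      }

      int main() {
          std::vector<double> qm(100, 0.1), mm(100000, 0.01);
          double e_qm = 0.0, e_mm = 0.0;
          omp_set_max_active_levels(2);        // allow threads inside each task
          #pragma omp parallel sections num_threads(2)
          {
              #pragma omp section
              e_qm = qm_energy(qm);            // task-level: QM term on one group
              #pragma omp section
              e_mm = mm_energy(mm);            // task-level: MM term on the other
          }
          std::printf("E = %f\n", e_qm + e_mm);
          return 0;
      }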

  1. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

    PubMed

    Ruymgaart, A Peter; Elber, Ron

    2012-11-13

    We report Graphics Processing Unit (GPU) and OpenMP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss in detail the design of the code and we illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is a four-fold increase over the factor of 10 reported for our initial GPU implementation, which did not include water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints on all bonds, runs in parallel on multiple OpenMP cores or entirely on the GPU. It is based on a Conjugate Gradient solution for the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).
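
    A conjugate-gradient solve of the kind used for the Lagrange multipliers in CG SHAKE can be sketched as below. A small dense symmetric positive-definite matrix stands in for the sparse bond-constraint matrix of a real topology; this is an assumption for brevity, not the paper's implementation.

      // Minimal OpenMP conjugate-gradient sketch (dense SPD matrix assumed).
      #include <omp.h>
      #include <algorithm>
      #include <cmath>
      #include <vector>

      using Vec = std::vector<double>;

      double dot(const Vec& a, const Vec& b) {
          double s = 0.0;
          #pragma omp parallel for reduction(+:s)
          for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
          return s;
      }

      void matvec(const std::vector<Vec>& A, const Vec& x, Vec& y) {
          #pragma omp parallel for
          for (std::size_t i = 0; i < A.size(); ++i) {
              double s = 0.0;
              for (std::size_t j = 0; j < x.size(); ++j) s += A[i][j] * x[j];
              y[i] = s;
          }
      }

      // Solve A*lambda = rhs; returns the number of CG iterations used.
      int cg_solve(const std::vector<Vec>& A, const Vec& rhs, Vec& lambda,
                   double tol = 1e-10, int maxit = 200) {
          const std::size_t n = rhs.size();
          Vec r = rhs, p = rhs, Ap(n);
          std::fill(lambda.begin(), lambda.end(), 0.0);
          double rr = dot(r, r);
          int it = 0;
          for (; it < maxit && std::sqrt(rr) > tol; ++it) {
              matvec(A, p, Ap);
              double alpha = rr / dot(p, Ap);
              for (std::size_t i = 0; i < n; ++i) { lambda[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
              double rr_new = dot(r, r);
              for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + (rr_new / rr) * p[i];
              rr = rr_new;
          }
          return it;
      }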

  2. A Wideband Fast Multipole Method for the two-dimensional complex Helmholtz equation

    NASA Astrophysics Data System (ADS)

    Cho, Min Hyung; Cai, Wei

    2010-12-01

    A Wideband Fast Multipole Method (FMM) for the 2D Helmholtz equation is presented. It can evaluate the interactions between N particles governed by the fundamental solution of the 2D complex Helmholtz equation in a fast manner for a wide range of complex wave numbers k, which was not easy with the original FMM due to the instability of the diagonalized conversion operator. This paper includes a description of the theoretical background, the FMM algorithm, the software structure, and some test runs. Program summary: Program title: 2D-WFMM. Catalogue identifier: AEHI_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHI_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 4636. No. of bytes in distributed program, including test data, etc.: 82 582. Distribution format: tar.gz. Programming language: C. Computer: any. Operating system: any operating system with gcc version 4.2 or newer. Has the code been vectorized or parallelized?: multi-core processors with shared memory. RAM: depends on the number of particles N and the wave number k. Classification: 4.8, 4.12. External routines: OpenMP (http://openmp.org/wp/). Nature of problem: evaluate the interaction between N particles governed by the fundamental solution of the 2D Helmholtz equation with complex k. Solution method: Multilevel Fast Multipole Algorithm in a hierarchical quad-tree structure with a cutoff level, which combines a low-frequency method and a high-frequency method. Running time: depends on the number of particles N, the wave number k, and the number of cores in the CPU; CPU time increases as N log N.
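
    For reference, the particle interactions that the wideband FMM evaluates in O(N log N) time can be written as a direct O(N^2) sum over the 2D Helmholtz fundamental solution, (i/4) H0^(1)(k r). The OpenMP sketch below does this for a real wave number k using the C++17 special math functions (available in libstdc++), whereas the distributed library itself is written in C and supports complex k; the real-k restriction is an assumption made only for this illustration.

      // Direct O(N^2) evaluation of the 2D Helmholtz kernel (i/4) H0^(1)(k r).
      #include <omp.h>
      #include <cmath>
      #include <complex>
      #include <vector>

      struct Pt { double x, y; };

      std::vector<std::complex<double>>
      direct_sum(const std::vector<Pt>& p, const std::vector<double>& q, double k) {
          const std::size_t n = p.size();
          std::vector<std::complex<double>> phi(n, 0.0);
          #pragma omp parallel for schedule(dynamic, 64)
          for (std::size_t i = 0; i < n; ++i) {
              std::complex<double> acc = 0.0;
              for (std::size_t j = 0; j < n; ++j) {
                  if (i == j) continue;
                  double r = std::hypot(p[i].x - p[j].x, p[i].y - p[j].y);
                  // H0^(1)(kr) = J0(kr) + i Y0(kr), via C++17 <cmath> special functions.
                  std::complex<double> h0(std::cyl_bessel_j(0.0, k * r),
                                          std::cyl_neumann(0.0, k * r));
                  acc += q[j] * std::complex<double>(0.0, 0.25) * h0;
              }
              phi[i] = acc;
          }
          return phi;
      }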

  3. Heterogeneous Hardware Parallelism Review of the IN2P3 2016 Computing School

    NASA Astrophysics Data System (ADS)

    Lafage, Vincent

    2017-11-01

    Parallel and hybrid Monte Carlo computation. The Monte Carlo method is the main workhorse for the computation of particle physics observables. This paper provides an overview of various HPC technologies that can be used today: multicore (OpenMP, HPX) and manycore (OpenCL). The rewrite of a twenty-year-old Fortran 77 Monte Carlo code illustrates the various programming paradigms in use beyond the language implementation. The problem of parallel random number generation is also addressed. We give a short report on the one-week school dedicated to these recent approaches, which took place at École Polytechnique in May 2016.

  4. Parallel implementation of approximate atomistic models of the AMOEBA polarizable model

    NASA Astrophysics Data System (ADS)

    Demerdash, Omar; Head-Gordon, Teresa

    2016-11-01

    In this work we present a replicated-data hybrid OpenMP/MPI implementation of a hierarchical progression of approximate classical polarizable models that yields speedups of up to ∼10 compared to the standard OpenMP implementation of the exact parent AMOEBA polarizable model. In addition, our parallel implementation exhibits reasonable weak and strong scaling. The resulting parallel software will prove useful for those who are interested in how molecular properties converge in the condensed phase with respect to the MBE; it provides a fruitful test bed for exploring different electrostatic embedding schemes and offers an interesting possibility for future exascale computing paradigms.

  5. 3D-PDR: Three-dimensional photodissociation region code

    NASA Astrophysics Data System (ADS)

    Bisbas, T. G.; Bell, T. A.; Viti, S.; Yates, J.; Barlow, M. J.

    2018-03-01

    3D-PDR is a three-dimensional photodissociation region code written in Fortran. It uses the Sundials package (written in C) to solve the set of ordinary differential equations and it is the successor of the one-dimensional PDR code UCL_PDR (ascl:1303.004). Using the HEALpix ray-tracing scheme (ascl:1107.018), 3D-PDR solves a three-dimensional escape probability routine and evaluates the attenuation of the far-ultraviolet radiation in the PDR and the propagation of FIR/submm emission lines out of the PDR. The code is parallelized (OpenMP) and can be applied to 1D and 3D problems.

  6. XaNSoNS: GPU-accelerated simulator of diffraction patterns of nanoparticles

    NASA Astrophysics Data System (ADS)

    Neverov, V. S.

    XaNSoNS is open-source software with GPU support, which simulates X-ray and neutron 1D (or 2D) diffraction patterns and pair-distribution functions (PDF) for amorphous or crystalline nanoparticles (up to ∼10^7 atoms) of heterogeneous structural content. Among the multiple parameters of the structure the user may specify atomic displacements, site occupancies, molecular displacements and molecular rotations. The software uses general equations nonspecific to crystalline structures to calculate the scattering intensity. It supports four major standards of parallel computing: MPI, OpenMP, Nvidia CUDA and OpenCL, enabling it to run on various architectures, from CPU-based HPCs to consumer-level GPUs.
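
    For the isotropic 1D case, the "general equations nonspecific to crystalline structures" mentioned above are of the Debye-equation type, I(q) = sum_ij f_i f_j sin(q r_ij)/(q r_ij). A minimal OpenMP sketch of such a sum is given below; a single atomic species with unit scattering factor is assumed for brevity, which is an illustration only and not the XaNSoNS implementation (that code also provides MPI, CUDA and OpenCL back-ends).

      // OpenMP sketch of a Debye-type powder-diffraction sum for one species (f = 1).
      #include <omp.h>
      #include <cmath>
      #include <vector>

      struct Atom { double x, y, z; };

      std::vector<double> debye_intensity(const std::vector<Atom>& at,
                                          const std::vector<double>& q) {
          std::vector<double> I(q.size(), 0.0);
          const std::size_t n = at.size();
          #pragma omp parallel for
          for (std::size_t m = 0; m < q.size(); ++m) {
              double s = static_cast<double>(n);           // i == j terms
              for (std::size_t i = 0; i < n; ++i)
                  for (std::size_t j = i + 1; j < n; ++j) {
                      double dx = at[i].x - at[j].x;
                      double dy = at[i].y - at[j].y;
                      double dz = at[i].z - at[j].z;
                      double x = q[m] * std::sqrt(dx * dx + dy * dy + dz * dz);
                      s += 2.0 * (x > 0.0 ? std::sin(x) / x : 1.0);
                  }
              I[m] = s;
          }
          return I;
      }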

  7. Compiled MPI: Cost-Effective Exascale Applications Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Quinlan, D; Lumsdaine, A

    2012-04-10

    The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) A new set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.

  8. Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations

    PubMed Central

    Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.

    2016-01-01

    In recent years, clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the fields of life sciences and bioinformatics, where new technologies keep creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for the slight increments in clock rates. This technological shift demands changes in software development, especially in the field of high performance computing, where parallelization techniques are gaining in importance due to the pressing issue of large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should aid the reader in choosing a suitable technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. However, performance is only superior above a certain problem size, due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for the CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094

  9. Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.

    PubMed

    Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W

    2016-01-01

    In recent years, clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the fields of life sciences and bioinformatics, where new technologies keep creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for the slight increments in clock rates. This technological shift demands changes in software development, especially in the field of high performance computing, where parallelization techniques are gaining in importance due to the pressing issue of large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should aid the reader in choosing a suitable technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. However, performance is only superior above a certain problem size, due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for the CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
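
    The matrix-multiplication use case mentioned in both records above is the canonical loop nest that tools such as OpenMP, PluTo-SICA, PPCG and OpenACC transform. A hand-written OpenMP version, roughly what an automatic CPU parallelizer would emit from the serial triple loop, looks like the following sketch (row-major, square matrices assumed for brevity).

      // Hand-parallelized matrix multiplication with OpenMP.
      #include <omp.h>
      #include <vector>

      void matmul(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, int n) {
          #pragma omp parallel for collapse(2)
          for (int i = 0; i < n; ++i)
              for (int j = 0; j < n; ++j) {
                  double s = 0.0;
                  for (int k = 0; k < n; ++k)
                      s += A[i * n + k] * B[k * n + j];   // dot product of row i and column j
                  C[i * n + j] = s;
              }
      }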

  10. Cellular automata-based modelling and simulation of biofilm structure on multi-core computers.

    PubMed

    Skoneczny, Szymon

    2015-01-01

    The article presents a mathematical model of biofilm growth for aerobic biodegradation of a toxic carbonaceous substrate. Modelling of biofilm growth has fundamental significance in numerous processes of biotechnology and in the mathematical modelling of bioreactors. A process following double-substrate kinetics with substrate inhibition and proceeding in a biofilm has not previously been modelled by means of cellular automata. Each process in the proposed model, i.e. diffusion of substrates, uptake of substrates, growth and decay of microorganisms, and biofilm detachment, is simulated in a discrete manner. It was shown that for a flat biofilm of constant thickness, the results of the presented model agree with those of a continuous model. The primary outcome of the study was to propose a mathematical model of biofilm growth; however, a considerable amount of focus was also placed on the development of efficient algorithms for its solution. Two parallel algorithms were created, differing in the way computations are distributed. Computer programs were created using the OpenMP Application Programming Interface for the C++ programming language. Simulations of biofilm growth were performed on three high-performance computers, and the speed-up coefficients of the programs were compared. Both algorithms enabled a significant reduction of computation time, which is important, inter alia, in the modelling and simulation of bioreactor dynamics.
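
    The abstract states that the model was implemented in C++ with OpenMP; a minimal sketch of one synchronous update step of a 2D cellular-automaton grid of this kind is given below. The local rule here, simple neighbour averaging of a substrate field, is a placeholder for the actual diffusion, uptake, growth and detachment rules of the model.

      // One synchronous cellular-automaton update with OpenMP (interior cells only).
      #include <omp.h>
      #include <vector>

      void ca_step(const std::vector<double>& cur, std::vector<double>& next,
                   int nx, int ny, double d) {
          #pragma omp parallel for collapse(2)
          for (int i = 1; i < nx - 1; ++i)
              for (int j = 1; j < ny - 1; ++j) {
                  int c = i * ny + j;
                  double lap = cur[c - ny] + cur[c + ny] + cur[c - 1] + cur[c + 1]
                               - 4.0 * cur[c];
                  next[c] = cur[c] + d * lap;     // discrete diffusion of substrate
              }
      }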

  11. Pareto Joint Inversion of Love and Quasi Rayleigh's waves - synthetic study

    NASA Astrophysics Data System (ADS)

    Bogacz, Adrian; Dalton, David; Danek, Tomasz; Miernik, Katarzyna; Slawinski, Michael A.

    2017-04-01

    In this contribution a specific application of Pareto joint inversion to a geophysical problem is presented. The Pareto criterion combined with Particle Swarm Optimization was used to solve geophysical inverse problems for Love and quasi-Rayleigh waves. The basic theory of the forward-problem calculation for the chosen surface waves is described. To avoid computational problems, some simplifications were made; this allowed faster and more straightforward calculation without loss of generality of the solution. Under the restrictions of the solution scheme, the considered model must have exactly two layers: an elastic isotropic surface layer and an elastic isotropic half-space of infinite thickness. The aim of the inversion is to obtain the elastic parameters and the model geometry from dispersion data. Different cases were considered in the calculations, such as different numbers of modes for the different wave types and different frequencies. The implemented solution uses the OpenMP standard for parallel computing, which helps reduce computation times. The results of the experimental computations are presented and discussed. This research was performed in the context of The Geomechanics Project supported by Husky Energy. It was also partially supported by the Natural Sciences and Engineering Research Council of Canada, grant 238416-2013, and by the Polish National Science Center under contract No. DEC-2013/11/B/ST10/0472.
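
    A Particle Swarm Optimization loop of the kind used here parallelizes naturally over particles, since each forward-problem evaluation is independent. The sketch below shows the OpenMP pattern; the misfit function is a toy quadratic standing in for the Love/quasi-Rayleigh dispersion forward model compared against observed data.

      // OpenMP-parallel fitness evaluation in a basic PSO iteration.
      #include <omp.h>
      #include <vector>

      // Placeholder misfit, not the dispersion forward model.
      double misfit(const std::vector<double>& m) {
          double s = 0.0;
          for (double v : m) s += (v - 1.0) * (v - 1.0);
          return s;
      }

      void pso_evaluate(const std::vector<std::vector<double>>& particles,
                        std::vector<double>& fitness,
                        std::vector<double>& best, double& best_fit) {
          #pragma omp parallel for schedule(dynamic)
          for (std::size_t p = 0; p < particles.size(); ++p)
              fitness[p] = misfit(particles[p]);    // forward problems are independent

          for (std::size_t p = 0; p < particles.size(); ++p)      // serial best update
              if (fitness[p] < best_fit) { best_fit = fitness[p]; best = particles[p]; }
      }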

  12. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    NASA Astrophysics Data System (ADS)

    DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

    2018-03-01

    With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  13. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    PubMed

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.

  14. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    PubMed

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  15. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    NASA Astrophysics Data System (ADS)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    We present Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both commercially-licensed Intel Fortran and popular free open-source GNU Fortran compiler. The programs are easy to use and are elaborated with helpful comments for the users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for new versions of the programs. Program files doi:http://dx.doi.org/10.17632/y8zk3jgn84.2 Licensing provisions: Apache License 2.0 Programming language: OpenMP GNU and Intel Fortran 90. Computer: Any multi-core personal computer or workstation with the appropriate OpenMP-capable Fortran compiler installed. Number of processors used: All available CPU cores on the executing computer. Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid.204 (2016) 209. Does the new version supersede the previous version?: Not completely. It does supersede previous Fortran programs from both references above, but not OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209. Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for six different trap symmetries: axially and radially symmetric traps in 3d, circularly symmetric traps in 2d, fully isotropic (spherically symmetric) and fully anisotropic traps in 2d and 3d, as well as 1d traps, where no spatial symmetry is considered. Solution method: We employ the split-step Crank-Nicolson algorithm to discretize the time-dependent GP equation in space and time. The discretized equation is then solved by imaginary- or real-time propagation, employing adequately small space and time steps, to yield the solution of stationary and non-stationary problems, respectively. Reasons for the new version: Previously published Fortran programs [1,2] have now become popular tools [3] for solving the GP equation. These programs have been translated to the C programming language [4] and later extended to the more complex scenario of dipolar atoms [5]. Now virtually all computers have multi-core processors and some have motherboards with more than one physical computer processing unit (CPU), which may increase the number of available CPU cores on a single computer to several tens. The C programs have been adopted to be very fast on such multi-core modern computers using general-purpose graphic processing units (GPGPU) with Nvidia CUDA and computer clusters using Message Passing Interface (MPI) [6]. Nevertheless, previously developed Fortran programs are also commonly used for scientific computation and most of them use a single CPU core at a time in modern multi-core laptops, desktops, and workstations. 
Unless the Fortran programs are made aware and capable of making efficient use of the available CPU cores, the solution of even a realistic dynamical 1d problem, not to mention the more complicated 2d and 3d problems, could be time consuming using the Fortran programs. Previously, we published auto-parallel Fortran programs [2] suitable for Intel (but not GNU) compiler for solving the GP equation. Hence, a need for the full OpenMP version of the Fortran programs to reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: Previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. There are six different trap symmetries considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, totaling to 12 programs included in BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. Present programs introduce a new input parameter, which is designated by Number_of_Threads and defines the number of CPU cores of the processor to be used in the calculation. If one sets the value 0 for this parameter, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for the background system's jobs. For example, on a machine with 20 CPU cores such that we used for testing, it is advisable to use up to 19 CPU cores. However, the total number of used CPU cores can be divided into more than one job. For instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, thus totaling to 19 used CPU cores on a 20-core computer. The Fortran source programs are located in the directory src, and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. The examples of produced output files can be found in the directory output, although some large density files are omitted, to save space. The programs calculate the values of actually used dimensionless nonlinearities from the physical input parameters, where the input parameters correspond to the identical nonlinearity values as in the previously published programs [1], so that the output files of the old and new programs can be directly compared. The output files are conveniently named such that their contents can be easily identified, following the naming convention introduced in Ref. [2]. For example, a file named -out.txt, where is a name of the individual program, represents the general output file containing input data, time and space steps, nonlinearity, energy and chemical potential, and was named fort.7 in the old Fortran version of programs [1]. A file named -den.txt is the output file with the condensate density, which had the names fort.3 and fort.4 in the old Fortran version [1] for imaginary- and real-time propagation programs, respectively. 
Other possible density outputs, such as the initial density, are commented out in the programs to have a simpler set of output files, but users can uncomment and re-enable them, if needed. In addition, there are output files for reduced (integrated) 1d and 2d densities for different programs. In the real-time programs there is also an output file reporting the dynamics of evolution of root-mean-square sizes after a perturbation is introduced. The supplied real-time programs solve the stationary GP equation, and then calculate the dynamics. As the imaginary-time programs are more accurate than the real-time programs for the solution of a stationary problem, one can first solve the stationary problem using the imaginary-time programs, adapt the real-time programs to read the pre-calculated wave function and then study the dynamics. In that case the parameter NSTP in the real-time programs should be set to zero and the space mesh and nonlinearity parameters should be identical in both programs. The reader is advised to consult our previous publication where a complete description of the output files is given [2]. A readme.txt file, included in the root directory, explains the procedure to compile and run the programs. We tested our programs on a workstation with two 10-core Intel Xeon E5-2650 v3 CPUs. The parameters used for testing are given in sample input files, provided in the corresponding directory together with the programs. In Table 1 we present wall-clock execution times for runs on 1, 6, and 19 CPU cores for programs compiled using Intel and GNU Fortran compilers. The corresponding columns "Intel speedup" and "GNU speedup" give the ratio of wall-clock execution times of runs on 1 and 19 CPU cores, and denote the actual measured speedup for 19 CPU cores. In all cases and for all numbers of CPU cores, although the GNU Fortran compiler gives excellent results, the Intel Fortran compiler turns out to be slightly faster. Note that during these tests we always ran only a single simulation on a workstation at a time, to avoid any possible interference issues. Therefore, the obtained wall-clock times are more reliable than the ones that could be measured with two or more jobs running simultaneously. We also studied the speedup of the programs as a function of the number of CPU cores used. The performance of the Intel and GNU Fortran compilers is illustrated in Fig. 1, where we plot the speedup and actual wall-clock times as functions of the number of CPU cores for 2d and 3d programs. We see that the speedup increases monotonically with the number of CPU cores in all cases and has large values (between 10 and 14 for 3d programs) for the maximal number of cores. This fully justifies the development of OpenMP programs, which enable much faster and more efficient solving of the GP equation. However, a slow saturation in the speedup with the further increase in the number of CPU cores is observed in all cases, as expected. The speedup tends to increase for programs in higher dimensions, as they become more complex and have to process more data. This is why the speedups of the supplied 2d and 3d programs are larger than those of 1d programs. Also, for a single program the speedup increases with the size of the spatial grid, i.e., with the number of spatial discretization points, since this increases the amount of calculations performed by the program. To demonstrate this, we tested the supplied real2d-th program and varied the number of spatial discretization points NX=NY from 20 to 1000. 
The measured speedup obtained when running this program on 19 CPU cores as a function of the number of discretization points is shown in Fig. 2. The speedup first increases rapidly with the number of discretization points and eventually saturates. Additional comments: Example inputs provided with the programs take less than 30 minutes to run on a workstation with two Intel Xeon E5-2650 v3 processors (2 QPI links, 10 CPU cores, 25 MB cache, 2.3 GHz).
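
    The Number_of_Threads input parameter described above (with 0 meaning "use all available CPU cores") can be paraphrased in a few lines. The published programs implement this in Fortran; the C++/OpenMP sketch below is an illustration of the equivalent logic only, not the distributed code.

      // Illustration of the Number_of_Threads convention (0 = all available cores).
      #include <omp.h>
      #include <cstdio>

      void select_threads(int number_of_threads) {
          if (number_of_threads > 0)
              omp_set_num_threads(number_of_threads);
          // else: keep the OpenMP default, which uses all available cores
          #pragma omp parallel
          {
              #pragma omp single
              std::printf("running with %d OpenMP threads\n", omp_get_num_threads());
          }
      }

      int main() { select_threads(0); return 0; }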

  16. Fast 2D FWI on a multi and many-cores workstation.

    NASA Astrophysics Data System (ADS)

    Thierry, Philippe; Donno, Daniela; Noble, Mark

    2014-05-01

    Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12-core E5-v2 x86-64 CPUs, we present here an MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code which simultaneously runs on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is being able to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each supporting up to 4 threads, this many-core device can be seen as a shared-memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can handle several co-processors, turning the workstation into a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is simply recompiled to obtain a Xeon Phi x86 binary. This multi- and many-core configuration uses standard compilers and associated MPI as well as math libraries under Linux; therefore, the cost of code development remains constant while computation time improves. We choose to implement the code in the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e. running only on the co-processor) thanks to the Linux ssh and NFS capabilities. The usual care is taken with optimization and SIMD vectorization to ensure optimal performance and to analyze the application's performance and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (L-BFGS) optimization scheme for the model parameter update. Parallelization is achieved through standard MPI distribution of shot gathers and OpenMP domain decomposition within the co-processor. Taking advantage of the 16 GB of memory available on the co-processor, we are able to keep wavefields in memory to achieve the gradient computation by cross-correlation of forward and back-propagated wavefields needed by our time-domain FWI scheme, without heavy traffic on the I/O subsystem and PCIe bus. In this presentation we also review some simple methodologies for comparing expected with measured performance, in order to estimate the optimization effort before starting any major modification or rewrite of research codes. The key message is the ease of use and development of this hybrid configuration to reach not the absolute peak performance value but the optimal one that ensures the best balance between geophysical and computer developments.

  17. Exact diagonalization of quantum lattice models on coprocessors

    NASA Astrophysics Data System (ADS)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
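
    A single Lanczos step of the kind timed above consists of a sparse matrix-vector product followed by orthogonalization and normalization updates. The OpenMP sketch below uses compressed-row (CRS) storage, which is an assumption for the illustration; the paper's actual Hamiltonian storage scheme may differ.

      // One Lanczos step: w = H*v - beta*v_prev; alpha = <v,w>; w -= alpha*v; return ||w||.
      #include <omp.h>
      #include <cmath>
      #include <vector>

      struct Crs { std::vector<int> ptr, col; std::vector<double> val; };

      double lanczos_step(const Crs& H, const std::vector<double>& v,
                          const std::vector<double>& v_prev, double beta,
                          std::vector<double>& w, double& alpha) {
          const int n = static_cast<int>(v.size());
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
              double s = 0.0;
              for (int k = H.ptr[i]; k < H.ptr[i + 1]; ++k) s += H.val[k] * v[H.col[k]];
              w[i] = s - beta * v_prev[i];                 // sparse matvec minus previous vector
          }
          alpha = 0.0;
          #pragma omp parallel for reduction(+:alpha)
          for (int i = 0; i < n; ++i) alpha += v[i] * w[i];
          double norm2 = 0.0;
          #pragma omp parallel for reduction(+:norm2)
          for (int i = 0; i < n; ++i) { w[i] -= alpha * v[i]; norm2 += w[i] * w[i]; }
          return std::sqrt(norm2);                         // new beta
      }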

  18. Fast 3D Surface Extraction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sewell, Christopher Meyer; Patchett, John M.; Ahrens, James P.

    Ocean scientists searching for isosurfaces and/or thresholds of interest in high resolution 3D datasets required a tedious and time-consuming interactive exploration experience. PISTON research and development activities are enabling ocean scientists to rapidly and interactively explore isosurfaces and thresholds in their large data sets using a simple slider with real-time calculation and visualization of these features. Ocean scientists can now visualize more features in less time, helping them gain a better understanding of the high resolution data sets they work with on a daily basis. Isosurface timings (512^3 grid): VTK 7.7 s, Parallel VTK (48-core) 1.3 s, PISTON OpenMP (48-core) 0.2 s, PISTON CUDA (Quadro 6000) 0.1 s.

  19. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a friction Reynolds number of Re_tau = 180 has shown good agreement with existing data.

  20. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for the Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study the performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. Goal/Relevance of Session: The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e. the ratio of flops performed and bytes moved from memory, in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory- vs. computation-related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques (cache blocking, structure refactoring, data alignment, and vectorization), illustrated by the kernel cases, will be addressed. Description of the codes: XGC1: The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver, which allows it to accurately resolve the edge plasma of a magnetic fusion device. After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori Phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute-intensive routines to improve cache reuse. PICSAR: PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting MIC architectures, first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a Fortran stand-alone kernel for performance studies and benchmarks. An MPI domain decomposition is used between NUMA domains, and a tile decomposition (cache blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management.
The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots and have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domains have been merged and parallelized. Overall, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance at order 1 is similar between a Cori Phase 1 node and KNL, and better on KNL by a factor of 1.6 at order 3 for the considered test case (a homogeneous thermal plasma).
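
    Of the optimization techniques listed above, cache blocking combined with explicit vectorization is the most generic; a sketch of the pattern on a simple streaming loop is given below, with illustrative names and blocking factors that are not taken from XGC1 or PICSAR. The comment also notes how operational intensity (flops per byte moved) would place such a kernel on a roofline plot; the arithmetic there is illustrative only.

      // Cache blocking plus OpenMP SIMD on a streaming update.
      // Illustrative operational intensity: 2 flops per element over roughly
      // 16 bytes moved (read + write) gives about 0.125 flop/byte, i.e. a
      // bandwidth-limited kernel on a typical roofline.
      #include <omp.h>
      #include <algorithm>
      #include <vector>

      void scale_blocked(std::vector<double>& x, double a, double b,
                         std::size_t block = 4096) {
          const std::size_t n = x.size();
          #pragma omp parallel for schedule(static)
          for (std::size_t lo = 0; lo < n; lo += block) {
              std::size_t hi = std::min(n, lo + block);   // one cache-friendly tile
              #pragma omp simd
              for (std::size_t i = lo; i < hi; ++i)
                  x[i] = a * x[i] + b;                    // 2 flops per element
          }
      }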

  1. Parallelization of TWOPORFLOW, a Cartesian Grid based Two-phase Porous Media Code for Transient Thermo-hydraulic Simulations

    NASA Astrophysics Data System (ADS)

    Trost, Nico; Jiménez, Javier; Imke, Uwe; Sanchez, Victor

    2014-06-01

    TWOPORFLOW is a thermo-hydraulic code based on a porous media approach to simulate single- and two-phase flow, including boiling. It is under development at the Institute for Neutron Physics and Reactor Technology (INR) at KIT. The code features a 3D transient solution of the mass, momentum and energy conservation equations for two inter-penetrating fluids with a semi-implicit continuous Eulerian type solver. The application domain of TWOPORFLOW includes flow in standard porous media and in structured porous media such as micro-channels and cores of nuclear power plants. In the latter case, the fluid domain is coupled to a fuel rod model describing the heat flow inside the solid structure. In this work, detailed profiling tools have been utilized to determine the optimization potential of TWOPORFLOW. As a result, bottlenecks were identified and reduced where feasible, leading for instance to an optimized water-steam property computation. Furthermore, an OpenMP implementation addressing the routines in charge of inter-phase momentum, energy and mass coupling delivered good performance together with high scalability on shared-memory architectures. In contrast, the approach for distributed-memory systems was to solve sub-problems resulting from the decomposition of the initial Cartesian geometry. Communication for the sub-problem boundary updates was accomplished with the Message Passing Interface (MPI) standard.

  2. A smooth particle hydrodynamics code to model collisions between solid, self-gravitating objects

    NASA Astrophysics Data System (ADS)

    Schäfer, C.; Riecker, S.; Maindl, T. I.; Speith, R.; Scherrer, S.; Kley, W.

    2016-05-01

    Context. Modern graphics processing units (GPUs) lead to a major increase in the performance of the computation of astrophysical simulations. Owing to the different nature of GPU architecture compared to traditional central processing units (CPUs) such as x86 architecture, existing numerical codes cannot be easily migrated to run on GPU. Here, we present a new implementation of the numerical method smooth particle hydrodynamics (SPH) using CUDA and the first astrophysical application of the new code: the collision between Ceres-sized objects. Aims: The new code allows for a tremendous increase in speed of astrophysical simulations with SPH and self-gravity at low costs for new hardware. Methods: We have implemented the SPH equations to model gas, liquids and elastic, and plastic solid bodies and added a fragmentation model for brittle materials. Self-gravity may be optionally included in the simulations and is treated by the use of a Barnes-Hut tree. Results: We find an impressive performance gain using NVIDIA consumer devices compared to our existing OpenMP code. The new code is freely available to the community upon request. If you are interested in our CUDA SPH code miluphCUDA, please write an email to Christoph Schäfer. miluphCUDA is the CUDA port of miluph. miluph is pronounced [maßl2v]. We do not support the use of the code for military purposes.

  3. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

    PubMed

    Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A

    2012-01-01

    Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

  4. BEAGLE: An Application Programming Interface and High-Performance Computing Library for Statistical Phylogenetics

    PubMed Central

    Ayres, Daniel L.; Darling, Aaron; Zwickl, Derrick J.; Beerli, Peter; Holder, Mark T.; Lewis, Paul O.; Huelsenbeck, John P.; Ronquist, Fredrik; Swofford, David L.; Cummings, Michael P.; Rambaut, Andrew; Suchard, Marc A.

    2012-01-01

    Abstract Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. PMID:21963610

  5. GAMERA - The New Magnetospheric Code

    NASA Astrophysics Data System (ADS)

    Lyon, J.; Sorathia, K.; Zhang, B.; Merkin, V. G.; Wiltberger, M. J.; Daldorff, L. K. S.

    2017-12-01

    The Lyon-Fedder-Mobarry (LFM) code has been a main-line magnetospheric simulation code for 30 years. The code base, designed in the age of memory-to-memory vector machines, is still in wide use for science production but needs upgrading to ensure long-term sustainability. In this presentation, we will discuss our recent efforts to update and improve that code base and also highlight some recent results. The new project GAMERA, Grid Agnostic MHD for Extended Research Applications, has kept the original design characteristics of the LFM and made significant improvements. The original design included high-order numerical differencing with very aggressive limiting, the ability to use arbitrary, but logically rectangular, grids, and maintenance of div B = 0 through the use of the Yee grid. Significant improvements include high-order upwinding and a non-clipping limiter. One other improvement with wider applicability is an improved averaging technique for the singularities in polar and spherical grids. The new code adopts a hybrid structure: multi-threaded OpenMP with an overarching MPI layer for large-scale and coupled applications. The MPI layer uses a combination of standard MPI and the Global Array Toolkit from PNL to provide a lightweight mechanism for coupling codes together concurrently. The single-processor code is highly efficient and can run magnetospheric simulations at the default CCMC resolution faster than real time on a MacBook Pro. We have run the new code through the Athena suite of tests, and the results compare favorably with the codes available to the astrophysics community. LFM/GAMERA has been applied to many different situations ranging from the inner and outer heliosphere to the magnetospheres of Venus, the Earth, Jupiter and Saturn. We present example results for the Earth's magnetosphere including a coupled ring current (RCM), for the magnetospheres of Jupiter and Saturn, and for the inner heliosphere.

  6. Multi-core and GPU accelerated simulation of a radial star target imaged with equivalent t-number circular and Gaussian pupils

    NASA Astrophysics Data System (ADS)

    Greynolds, Alan W.

    2013-09-01

    Results from the GelOE optical engineering software are presented for the through-focus, monochromatic coherent and polychromatic incoherent imaging of a radial "star" target for equivalent t-number circular and Gaussian pupils. The FFT-based simulations are carried out using OpenMP threading on a multi-core desktop computer, with and without the aid of a many-core NVIDIA GPU accessing its cuFFT library. It is found that a custom FFT optimized for the 12-core host has similar performance to a simply implemented 256-core GPU FFT. A more sophisticated version of the latter but tuned to reduce overhead on a 448-core GPU is 20 to 28 times faster than a basic FFT implementation running on one CPU core.

  7. OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    NASA Astrophysics Data System (ADS)

    Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers, since the OSCAR API is designed as a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip, RP2, by Renesas, Hitachi and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler by up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.

  8. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  9. Parallel fast multipole boundary element method applied to computational homogenization

    NASA Astrophysics Data System (ADS)

    Ptaszny, Jacek

    2018-01-01

    In the present work, a fast multipole boundary element method (FMBEM) and a parallel computer code for 3D elasticity problems are developed and applied to the computational homogenization of a solid containing spherical voids. The system of equations is solved by using the GMRES iterative solver. The boundary of the body is discretized by using quadrilateral serendipity elements with an adaptive numerical integration. Operations related to a single GMRES iteration, performed by traversing the corresponding tree structure upwards and downwards, are parallelized by using the OpenMP standard. The assignment of tasks to threads is based on the assumption that the tree nodes at which the moment transformations are initialized can be partitioned into disjoint sets of equal or approximately equal size and assigned to the threads. The achieved speedup as a function of the number of threads is examined.
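
    A minimal sketch of the thread-assignment idea described above: the tree nodes at which moment transformations are initialized are split into disjoint, roughly equal blocks, one block per OpenMP thread, via static scheduling. The Node structure and the trivial "moment" computation are illustrative assumptions, not the paper's FMBEM implementation.

        #include <omp.h>
        #include <vector>
        #include <numeric>
        #include <cstdio>

        struct Node { std::vector<int> elems; double moment = 0.0; };

        int main() {
            std::vector<Node> leaves(64);
            for (int i = 0; i < 64; ++i) leaves[i].elems.assign(100, 1);  // dummy boundary elements

            // Static scheduling gives each thread a contiguous, disjoint block of nodes
            // of (approximately) equal size, mirroring the partitioning described above.
            #pragma omp parallel for schedule(static)
            for (int n = 0; n < (int)leaves.size(); ++n) {
                leaves[n].moment = std::accumulate(leaves[n].elems.begin(),
                                                   leaves[n].elems.end(), 0.0);
            }

            double total = 0.0;
            for (const auto& nd : leaves) total += nd.moment;
            std::printf("sum of moments = %g\n", total);
            return 0;
        }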

  10. pFUnit 3.0 Tutorial Advanced

    NASA Technical Reports Server (NTRS)

    Clune, Tom

    2014-01-01

    This tutorial will introduce Fortran developers to unit testing and test-driven development (TDD) using pFUnit. As with other unit-testing frameworks, pFUnit simplifies the process of writing, collecting, and executing tests while providing clear diagnostic messages for failing tests. pFUnit specifically targets the development of scientific-technical software written in Fortran and includes customized features such as assertions for multi-dimensional arrays, distributed (MPI) and thread-based (OpenMP) parallelism, and flexible parameterized tests. These sessions will include numerous examples and hands-on exercises that gradually build in complexity. Attendees are expected to have a working knowledge of F90, but familiarity with object-oriented syntax in F2003 and MPI will be of benefit for the more advanced examples. By the end of the tutorial the audience should feel comfortable applying pFUnit within their own development environment.

  11. Development of an Implicit, Charge and Energy Conserving 2D Electromagnetic PIC Code on Advanced Architectures

    NASA Astrophysics Data System (ADS)

    Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel

    2012-10-01

    In order to solve problems such as ion coalescence and slow MHD shocks fully kinetically, we developed a fully implicit 2D energy- and charge-conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared-memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL, utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.

  12. Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

    PubMed

    Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal

    2015-06-01

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
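
    The pragma-based OpenMP parallelization of the structure-alignment stage mentioned above follows the generic pattern sketched below. The align_score routine and the template list are hypothetical placeholders; dynamic scheduling is shown because individual alignments can vary widely in cost.

        #include <omp.h>
        #include <vector>
        #include <cmath>
        #include <cstdio>

        // Placeholder for a structure-alignment score between a query and one template.
        static double align_score(int query_len, int template_len) {
            return std::exp(-std::fabs(double(query_len - template_len)) / 100.0);
        }

        int main() {
            const int query_len = 350;
            std::vector<int> template_lens(501);
            for (int i = 0; i < 501; ++i) template_lens[i] = 100 + i;

            std::vector<double> scores(template_lens.size());

            // Alignments are independent, so a single pragma distributes them over
            // CPU cores (or coprocessor threads); dynamic scheduling balances uneven costs.
            #pragma omp parallel for schedule(dynamic)
            for (int t = 0; t < (int)template_lens.size(); ++t)
                scores[t] = align_score(query_len, template_lens[t]);

            double best = 0.0;
            for (double s : scores) if (s > best) best = s;
            std::printf("best score = %f\n", best);
            return 0;
        }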

  13. The Open Source Snowpack modelling ecosystem

    NASA Astrophysics Data System (ADS)

    Bavay, Mathias; Fierz, Charles; Egger, Thomas; Lehning, Michael

    2016-04-01

    As a large number of numerical snow models are available, a few stand out as quite mature and widespread. One such model is SNOWPACK, the Open Source model that is developed at the WSL Institute for Snow and Avalanche Research SLF. Over the years, various tools have been developed around SNOWPACK in order to expand its use or to integrate additional features. Today, the model is part of a whole ecosystem that has evolved to both offer seamless integration and high modularity so each tool can easily be used outside the ecosystem. Many of these Open Source tools experience their own, autonomous development and are successfully used in their own right in other models and applications. There is Alpine3D, the spatially distributed version of SNOWPACK, that forces it with terrain-corrected radiation fields and optionally with blowing and drifting snow. This model can be used on parallel systems (either with OpenMP or MPI) and has been used for applications ranging from climate change to reindeer herding. There is the MeteoIO pre-processing library that offers fully integrated data access, data filtering, data correction, data resampling and spatial interpolations. This library is now used by several other models and applications. There is the SnopViz snow profile visualization library and application that supports both measured and simulated snow profiles (relying on the CAAML standard) as well as time series. This JavaScript application can be used standalone without any internet connection or served on the web together with simulation results. There is the OSPER data platform effort with a data management service (built on the Global Sensor Network (GSN) platform) as well as a data documenting system (metadata management as a wiki). There are several distributed hydrological models for mountainous areas in ongoing development that require very little information about the soil structure, based on the assumption that in steep terrain the most relevant information is contained in the Digital Elevation Model (DEM). There is finally a set of tools making up the operational chain to automatically run, monitor and publish SNOWPACK simulations for operational avalanche warning purposes. This tool chain has been developed with the aim of offering very low maintenance operation and very fast deployment and to easily adapt to other avalanche services.

  14. Ice-sheet modelling accelerated by graphics cards

    NASA Astrophysics Data System (ADS)

    Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek

    2014-11-01

    Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.

  15. A fully non-linear multi-species Fokker–Planck–Landau collision operator for simulation of fusion plasma

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hager, Robert, E-mail: rhager@pppl.gov; Yoon, E.S., E-mail: yoone@rpi.edu; Ku, S., E-mail: sku@pppl.gov

    2016-06-15

    Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. In this article, the non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. The finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. The collision operator's good weak and strong scaling behavior is shown.
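
    The nested OpenMP parallelization mentioned above can be illustrated with a small C++ sketch: an outer team distributes (hypothetical) mesh nodes, and each outer thread opens an inner team for the per-node velocity-space work. Thread counts, loop bounds, and the kernel are made-up examples, not XGC's actual configuration.

        #include <omp.h>
        #include <cstdio>

        int main() {
            omp_set_max_active_levels(2);                 // allow two nested parallel levels
            const int nodes = 8, grid = 1000;
            double total = 0.0;

            // Outer team: one chunk of mesh nodes per outer thread.
            #pragma omp parallel for num_threads(4) reduction(+:total)
            for (int n = 0; n < nodes; ++n) {
                double node_sum = 0.0;
                // Inner team: parallelize the per-node velocity-grid work.
                #pragma omp parallel for num_threads(2) reduction(+:node_sum)
                for (int i = 0; i < grid; ++i)
                    node_sum += 1.0 / (1.0 + n + i);      // stand-in for the collision kernel
                total += node_sum;
            }
            std::printf("total = %f\n", total);
            return 0;
        }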

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shipman, Galen M.

    These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.

  17. Kernel optimization for short-range molecular dynamics

    NASA Astrophysics Data System (ADS)

    Hu, Changjun; Wang, Xianmeng; Li, Jianjiang; He, Xinfu; Li, Shigang; Feng, Yangde; Yang, Shaofeng; Bai, He

    2017-02-01

    To optimize short-range force computations in Molecular Dynamics (MD) simulations, multi-threading and SIMD optimizations are presented in this paper. With respect to multi-threading optimization, a Partition-and-Separate-Calculation (PSC) method is designed to avoid write conflicts caused by using Newton's third law. Serial bottlenecks are eliminated with no additional memory usage. The method is implemented using the OpenMP model. Furthermore, the PSC method is employed on Intel Xeon Phi coprocessors in both native and offload models. We also evaluate the performance of the PSC method under different thread affinities on the MIC architecture. For SIMD execution, we analyze how the "if-clause" of the cutoff radius check influences the performance of the PSC method. The experimental results show that our PSC method is more efficient than some traditional methods. In double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.
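
    For context, the write conflict mentioned above arises because exploiting Newton's third law lets two threads update the force accumulators of both interacting particles. The hedged sketch below shows the simple conflict-free alternative that partition-based schemes try to improve on: each thread writes only to its own particle's accumulator, at the cost of evaluating every pair twice. It illustrates the problem setting, not the paper's PSC algorithm.

        #include <omp.h>
        #include <vector>
        #include <cmath>
        #include <cstdio>

        int main() {
            const int n = 4000;
            const double rc2 = 0.01;                        // squared cutoff radius
            std::vector<double> x(n), f(n, 0.0);
            for (int i = 0; i < n; ++i) x[i] = double(i) / n;

            // Conflict-free: the thread owning particle i is the only writer of f[i],
            // so no locks are needed, but each pair (i,j) is evaluated twice.
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < n; ++i) {
                double fi = 0.0;
                for (int j = 0; j < n; ++j) {
                    if (j == i) continue;
                    double dx = x[i] - x[j];
                    double r2 = dx * dx;
                    if (r2 < rc2)                           // the "if-clause" cutoff check
                        fi += dx / (r2 * std::sqrt(r2) + 1e-12);
                }
                f[i] = fi;
            }
            std::printf("f[0] = %g\n", f[0]);
            return 0;
        }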

  18. A fully non-linear multi-species Fokker–Planck–Landau collision operator for simulation of fusion plasma

    DOE PAGES

    Hager, Robert; Yoon, E. S.; Ku, S.; ...

    2016-04-04

    Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. The non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. Moreover, the finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. As a result, the collision operator's good weak and strong scaling behavior is shown.

  19. GANDALF - Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids

    NASA Astrophysics Data System (ADS)

    Hubber, D. A.; Rosotti, G. P.; Booth, R. A.

    2018-01-01

    GANDALF is a new hydrodynamics and N-body dynamics code designed for investigating planet formation, star formation and star cluster problems. GANDALF is written in C++, parallelized with both OPENMP and MPI and contains a PYTHON library for analysis and visualization. The code has been written with a fully object-oriented approach to easily allow user-defined implementations of physics modules or other algorithms. The code currently contains implementations of smoothed particle hydrodynamics, meshless finite-volume and collisional N-body schemes, but can easily be adapted to include additional particle schemes. We present in this paper the details of its implementation, results from the test suite, serial and parallel performance results and discuss the planned future development. The code is freely available as an open source project on the code-hosting website github at https://github.com/gandalfcode/gandalf and is available under the GPLv2 license.

  20. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    The images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. Training dictionaries for sparse representations, however, is a time-consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data-parallelism model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm. The program achieves a speedup of 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces the running time, and is easy to port to multi-core platforms.
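
    The key change described above is updating several dictionary atoms at once instead of one after another. The sketch below shows that loop structure in C++ with OpenMP; the per-atom "update" is a trivial renormalization stand-in, not the actual K-SVD rank-one update.

        #include <omp.h>
        #include <vector>
        #include <cmath>
        #include <cstdio>

        int main() {
            const int n_atoms = 256, atom_dim = 64;
            std::vector<std::vector<double>> dict(n_atoms, std::vector<double>(atom_dim, 1.0));

            // Serial K-SVD updates atoms one after another; here independent atoms
            // are updated simultaneously, one per OpenMP loop iteration.
            #pragma omp parallel for schedule(dynamic)
            for (int a = 0; a < n_atoms; ++a) {
                double norm = 0.0;
                for (double v : dict[a]) norm += v * v;     // stand-in "update": renormalize
                norm = std::sqrt(norm);
                for (double& v : dict[a]) v /= norm;
            }
            std::printf("atom 0, element 0 = %f\n", dict[0][0]);
            return 0;
        }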

  1. Fast hydrological model calibration based on the heterogeneous parallel computing accelerated shuffled complex evolution method

    NASA Astrophysics Data System (ADS)

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke

    2018-01-01

    Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.

  2. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    NASA Astrophysics Data System (ADS)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    An imaging spectrometer can acquire a two-dimensional spatial image and a one-dimensional spectrum at the same time, which makes it highly useful for color and spectral measurements, true-color image synthesis, military reconnaissance and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, this paper designs an optimized reconstruction algorithm based on OpenMP parallel computing, which was further applied to the HyperSpectral Imager of the `HJ-1' Chinese satellite. The results show that the method based on multi-core parallel computing can fully utilize the multi-core CPU hardware resources and significantly enhance the efficiency of the spectrum reconstruction processing. If the technology is applied to a workstation with more cores for parallel computing, it should be possible to complete real-time data processing for a Fourier transform imaging spectrometer with a single computer.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langer, Steven H.; Karlin, Ian; Marinak, Marty M.

    HYDRA is used to simulate a variety of experiments carried out at the National Ignition Facility (NIF) [4] and other high energy density physics facilities. HYDRA has packages to simulate radiation transfer, atomic physics, hydrodynamics, laser propagation, and a number of other physics effects. HYDRA has over one million lines of code and includes both MPI and thread-level (OpenMP and pthreads) parallelism. This paper measures the performance characteristics of HYDRA using hardware counters on an IBM Blue Gene/Q system. We report key ratios such as bytes/instruction and memory bandwidth for several different physics packages. The total number of bytes read and written per time step is also reported. We show that none of the packages which use significant time are memory bandwidth limited on a Blue Gene/Q. HYDRA currently issues very few SIMD instructions. The pressure on memory bandwidth will increase if high levels of SIMD instructions can be achieved.

  4. Model-independent partial wave analysis using a massively-parallel fitting framework

    NASA Astrophysics Data System (ADS)

    Sun, L.; Aoude, R.; dos Reis, A. C.; Sokoloff, M.

    2017-10-01

    The functionality of GooFit, a GPU-friendly framework for doing maximum-likelihood fits, has been extended to extract model-independent S-wave amplitudes in three-body decays such as D⁺ → h⁺h⁺h⁻. A full amplitude analysis is done where the magnitudes and phases of the S-wave amplitudes are anchored at a finite number of m²(h⁺h⁻) control points, and a cubic spline is used to interpolate between these points. The amplitudes for P-wave and D-wave intermediate states are modeled as spin-dependent Breit-Wigner resonances. GooFit uses the Thrust library, with a CUDA backend for NVIDIA GPUs and an OpenMP backend for threads with conventional CPUs. Performance on a variety of platforms is compared. Executing on systems with GPUs is typically a few hundred times faster than executing the same algorithm on a single CPU.

  5. GPU accelerated implementation of NCI calculations using promolecular density.

    PubMed

    Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric

    2017-05-30

    The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive for describing ligand-protein binding. A custom implementation of NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three code versions is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings.

  6. Gregarious Data Re-structuring in a Many Core Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    In this paper, we have developed a new methodology that takes into consideration the access patterns of a single parallel actor (e.g. a thread), as well as the access patterns of “grouped” parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contribution of this paper includes (a) collaborative data restructuring for group reuse and (b) a low overhead transformation technique to improve access patterns and bring closely connected data elements together. Preliminary results on a many-core architecture, the Tilera TileGX, show promising improvements over optimized OpenMP code (up to a 31% increase in GFLOPS) and over our own previous work on fine-grained runtimes (up to 16%) for selected kernels.

  7. Proceeding On : Parallelisation Of Critical Code Passages In PHOENIX/3D

    NASA Astrophysics Data System (ADS)

    Arkenberg, Mario; Wichert, Viktoria; Hauschildt, Peter H.

    2016-10-01

    Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach here, by introducing especially adapted, parallel numerical methods and correspondingly parallelising time-critical code passages. In the following, we present our work on PHOENIX/3D. While parallelisation is generally worthwhile, it requires revision of time-consuming subroutines with respect to separability of localised data and variables in order to determine the optimal approach. Of course, the same applies to the code structure. The importance of this ongoing work can be showcased by recently derived benchmark results, which were generated utilising MPI and OpenMP. Furthermore, the need for a careful and thorough choice of an adequate, machine-dependent setup is discussed.

  8. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using single-processor based dynamic simulation solutions. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
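
    On the shared-memory side, the OpenMP implementation described above essentially distributes the per-generator derivative evaluations across threads within each integration step. A toy sketch of that loop structure follows; the two-state generator model and constants are placeholders, not the paper's dynamic model.

        #include <omp.h>
        #include <vector>
        #include <cstdio>

        int main() {
            const int n_gen = 1000;                 // number of generators
            const double dt = 0.01;
            std::vector<double> delta(n_gen, 0.1), omega(n_gen, 0.0);

            for (int step = 0; step < 100; ++step) {
                // Each generator's derivative evaluation and explicit update is
                // independent within a step, so the loop parallelizes directly.
                #pragma omp parallel for
                for (int g = 0; g < n_gen; ++g) {
                    double ddelta = omega[g];
                    double domega = -0.5 * delta[g] - 0.1 * omega[g];  // toy swing dynamics
                    delta[g] += dt * ddelta;
                    omega[g] += dt * domega;
                }
                // (A real simulation would also solve the algebraic network equations here.)
            }
            std::printf("delta[0] = %f\n", delta[0]);
            return 0;
        }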

  9. SU-F-BRD-02: Application of ARCHERRT-- A GPU-Based Monte Carlo Dose Engine for Radiation Therapy -- to Tomotherapy and Patient-Independent IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Su, L; Du, X; Liu, T

    Purpose: As a module of ARCHER -- Accelerated Radiation-transport Computations in Heterogeneous EnviRonments -- ARCHER{sub RT} is designed for RadioTherapy (RT) dose calculation. This paper describes the application of ARCHER{sub RT} to patient-dependent TomoTherapy and patient-independent IMRT. It also conducts a fair comparison of different GPUs and a multicore CPU. Methods: The source input used for patient-dependent TomoTherapy is a phase space file (PSF) generated from the optimized plan. For patient-independent IMRT, the open-field PSF is used for different cases. The intensity modulation is simulated by a fluence map. The GEANT4 code is used as the benchmark. DVH and gamma index tests are employed to evaluate the accuracy of the ARCHER{sub RT} code. Some previous studies reported misleading speedups by comparing GPU code with serial CPU code. To perform a fairer comparison, we write multi-threaded code with OpenMP to fully exploit the computing potential of the CPU. The hardware involved in this study comprises a 6-core Intel E5-2620 CPU, 6 NVIDIA M2090 GPUs, a K20 GPU and a K40 GPU. Results: Dosimetric results from ARCHER{sub RT} and GEANT4 show good agreement. The 2%/2mm gamma test pass rates for different clinical cases are 97.2% to 99.7%. A single M2090 GPU needs 50~79 seconds for the simulation to achieve a statistical error of 1% in the PTV. The K40 card is about 1.7∼1.8 times faster than the M2090 card. Using 6 M2090 cards, the simulation can be finished in about 10 seconds. For comparison, the Intel E5-2620 needs 507∼879 seconds for the same simulation. Conclusion: We successfully applied ARCHER{sub RT} to TomoTherapy and patient-independent IMRT, and conducted a fair comparison between GPU and CPU performance. The ARCHER{sub RT} code is both accurate and efficient and may be used towards clinical applications.

  10. Optimized multiple quantum MAS lineshape simulations in solid state NMR

    NASA Astrophysics Data System (ADS)

    Brouwer, William J.; Davis, Michael C.; Mueller, Karl T.

    2009-10-01

    The majority of nuclei available for study in solid state Nuclear Magnetic Resonance have half-integer spin I>1/2, with a corresponding electric quadrupole moment. As such, they may couple with a surrounding electric field gradient. This effect introduces anisotropic line broadening to spectra, arising from distinct chemical species within polycrystalline solids. In Multiple Quantum Magic Angle Spinning (MQMAS) experiments, a second frequency dimension is created, devoid of quadrupolar anisotropy. As a result, the center of gravity of peaks in the high resolution dimension is a function of the isotropic second order quadrupole and chemical shift alone. However, for complex materials, these parameters take on a stochastic nature due in turn to structural and chemical disorder. Lineshapes may still overlap in the isotropic dimension, complicating the task of assignment and interpretation. A distributed computational approach is presented here which permits simulation of the two-dimensional MQMAS spectrum, generated by random variates from model distributions of isotropic chemical and quadrupole shifts. Owing to the non-convex nature of the residual sum of squares (RSS) function between experimental and simulated spectra, simulated annealing is used to optimize the simulation parameters. In this manner, local chemical environments for disordered materials may be characterized, and, via a re-sampling approach, error estimates for the parameters produced.
    Program summary
    Program title: mqmasOPT
    Catalogue identifier: AEEC_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEC_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 3650
    No. of bytes in distributed program, including test data, etc.: 73,853
    Distribution format: tar.gz
    Programming language: C, OCTAVE
    Computer: UNIX/Linux
    Operating system: UNIX/Linux
    Has the code been vectorised or parallelized?: Yes
    RAM: Example: (1597 powder angles) × (200 Samples) × (81 F2 frequency pts) × (31 F1 frequency points) = 3.5M, SMP AMD Opteron
    Classification: 2.3
    External routines: OCTAVE (http://www.gnu.org/software/octave/), GNU Scientific Library (http://www.gnu.org/software/gsl/), OpenMP (http://openmp.org/wp/)
    Nature of problem: The optimal simulation and modeling of multiple quantum magic angle spinning NMR spectra, for general systems, especially those with mild to significant disorder. The approach outlined and implemented in C and OCTAVE also produces model parameter error estimates.
    Solution method: A model for each distinct chemical site is first proposed, for the individual contribution of crystallite orientations to the spectrum. This model is averaged over all powder angles [1], as well as over the (stochastic) parameters: isotropic chemical shift and quadrupole coupling constant. The latter is accomplished via sampling from a bi-variate Gaussian distribution, using the Box-Muller algorithm to transform Sobol (quasi) random numbers [2]. A simulated annealing optimization is performed, and finally the non-linear jackknife [3] is applied in developing model parameter error estimates.
    Additional comments: The distribution contains a script, mqmasOpt.m, which runs in the OCTAVE language workspace.
    Running time: Example: (1597 powder angles) × (200 Samples) × (81 F2 frequency pts) × (31 F1 frequency points) = 58.35 seconds, SMP AMD Opteron.
    References: [1] S.K. Zaremba, Annali di Matematica Pura ed Applicata 73 (1966) 293. [2] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, 1992. [3] T. Fox, D. Hinkley, K. Larntz, Technometrics 22 (1980) 29.
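
    The solution method above samples a bivariate Gaussian for the isotropic chemical shift and quadrupole coupling constant by applying the Box-Muller transform to quasi-random numbers. The C++ sketch below shows the transform and the correlation step, with ordinary pseudo-random uniforms standing in for Sobol points; the means, widths, and correlation are made-up example values.

        #include <cmath>
        #include <cstdio>
        #include <random>

        // Box-Muller: turn two uniforms in (0,1) into two independent standard normals.
        static void box_muller(double u1, double u2, double& z1, double& z2) {
            const double two_pi = 6.283185307179586;
            const double r = std::sqrt(-2.0 * std::log(u1));
            z1 = r * std::cos(two_pi * u2);
            z2 = r * std::sin(two_pi * u2);
        }

        int main() {
            std::mt19937 gen(42);
            std::uniform_real_distribution<double> uni(1e-12, 1.0);

            // Example (made-up) parameters: mean/width of the isotropic shift (ppm),
            // mean/width of the quadrupole coupling (Hz), and their correlation rho.
            const double mu_d = 0.0, sd_d = 2.0, mu_q = 3.0e6, sd_q = 0.5e6, rho = 0.3;

            double mean_d = 0.0, mean_q = 0.0;
            const int n = 100000;
            for (int i = 0; i < n; ++i) {
                double z1, z2;
                box_muller(uni(gen), uni(gen), z1, z2);
                double d  = mu_d + sd_d * z1;                                          // shift sample
                double cq = mu_q + sd_q * (rho * z1 + std::sqrt(1.0 - rho * rho) * z2); // correlated Cq
                mean_d += d;
                mean_q += cq;
            }
            std::printf("sample means: %g ppm, %g Hz\n", mean_d / n, mean_q / n);
            return 0;
        }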

  11. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    NASA Astrophysics Data System (ADS)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10³ to 10⁴ samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path, taking into account attenuation, is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware-specific intrinsic instructions (so-called `intrinsics') to allow manually optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card, the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and host implementations. The examination showed that a reasonable speedup of the sensitivity map calculation can be achieved on the Xeon Phi by either a portable or a hardware-specific implementation.

  12. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.

  13. OpenSWPC: an open-source integrated parallel simulation code for modeling seismic wave propagation in 3D heterogeneous viscoelastic media

    NASA Astrophysics Data System (ADS)

    Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi

    2017-07-01

    We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method at local-to-regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for the absorbing boundary condition. A hybrid-style programming model using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability, allowing excellent performance on systems ranging from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations, such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC), are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.

  14. ATDM LANL FleCSI: Topology and Execution Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergen, Benjamin Karl

    FleCSI is a compile-time configurable C++ framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. This means that FleCSI is potentially useful to many different ECP projects. Current support includes multidimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures (to identify data dependencies between distributed-memory address spaces). FleCSI introduces a functional programming model with control, execution, and data abstractions that are consistent with state-of-the-art task-based runtimes such as Legion and Charm++. The model also provides support for fine-grained, data-parallel execution with backend support for runtimes such as OpenMP and C++17. The FleCSI abstraction layer provides the developer with insulation from the underlying runtimes, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. This project is essential to the ECP Ristra Next-Generation Code project, part of ASC ATDM, because it provides a hierarchically parallel programming model that is consistent with the design of modern system architectures, but which allows for the straightforward expression of algorithmic parallelism in a portably performant manner.

  15. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

    PubMed

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-07-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.

  16. batman: BAsic Transit Model cAlculatioN in Python

    NASA Astrophysics Data System (ADS)

    Kreidberg, Laura

    2015-11-01

    I introduce batman, a Python package for modeling exoplanet transit light curves. The batman package supports calculation of light curves for any radially symmetric stellar limb darkening law, using a new integration algorithm for models that cannot be quickly calculated analytically. The code uses C extension modules to speed up model calculation and is parallelized with OpenMP. For a typical light curve with 100 data points in transit, batman can calculate one million quadratic limb-darkened models in 30 seconds with a single 1.7 GHz Intel Core i5 processor. The same calculation takes seven minutes using the four-parameter nonlinear limb darkening model (computed to 1 ppm accuracy). Maximum truncation error for integrated models is an input parameter that can be set as low as 0.001 ppm, ensuring that the community is prepared for the precise transit light curves we anticipate measuring with upcoming facilities. The batman package is open source and publicly available at https://github.com/lkreidberg/batman .

  17. High-Productivity Computing in Computational Physics Education

    NASA Astrophysics Data System (ADS)

    Tel-Zur, Guy

    2011-03-01

    We describe the development of a new course in Computational Physics at Ben-Gurion University. This elective course for 3rd-year undergraduates and MSc students is taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should also deal with High-Productivity Computing. The traditional approach to teaching Computational Physics emphasizes "Correctness" and then "Accuracy", and we add "Performance" as well. Along with topics in Mathematical Methods and case studies in Physics, the course devotes a significant amount of time to "Mini-Courses" on topics such as: High-Throughput Computing - Condor, Parallel Programming - MPI and OpenMP, How to build a Beowulf, Visualization, and Grid and Cloud Computing. The course intends to teach neither new physics nor new mathematics; rather, it is focused on an integrated approach to solving problems, starting from the physics problem, the corresponding mathematical solution, the numerical scheme, writing an efficient computer code, and finally analysis and visualization.

  18. spMC: an R-package for 3D lithological reconstructions based on spatial Markov chains

    NASA Astrophysics Data System (ADS)

    Sartore, Luca; Fabbri, Paolo; Gaetan, Carlo

    2016-09-01

    The paper presents the spatial Markov Chains (spMC) R-package and a case study of subsoil simulation/prediction at a plain site in Northeastern Italy. spMC is a quite complete collection of advanced methods for data inspection; in addition, spMC implements Markov chain models to estimate experimental transition probabilities of categorical lithological data. Furthermore, simulation methods based on well-known prediction methods (such as indicator kriging and cokriging) are implemented in the spMC package. Other, more advanced methods are also available for simulation, e.g. path methods and Bayesian procedures that exploit maximum entropy. Since the spMC package was developed for intensive geostatistical computations, part of the code is implemented for parallel computation via OpenMP constructs. A final analysis of this computational efficiency compares the simulation/prediction algorithms using different numbers of CPU cores, considering the example data set of the case study included in the package.

  19. Pythran: enabling static optimization of scientific Python programs

    NASA Astrophysics Data System (ADS)

    Guelton, Serge; Brunet, Pierrick; Amini, Mehdi; Merlini, Adrien; Corbillon, Xavier; Raynaud, Alan

    2015-01-01

    Pythran is an open source static compiler that turns modules written in a subset of Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, Numpy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for Numpy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more, but without the backward compatibility loss.

  20. A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia

    NASA Astrophysics Data System (ADS)

    Reis, R. F.; Loureiro, F. S.; Lobosco, M.

    2014-03-01

    Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of hyperthermia is to heat a specific region, such as a tumor, so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques, and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promising technique. In the present paper, the Pennes bioheat transfer equation is adopted to model thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. An explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups around 35 were obtained on a 64-core machine.
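
    A minimal sketch, under assumed grid sizes and material constants, of one explicit finite-difference step for a Pennes-type bioheat equation with the spatial sweep parallelized by OpenMP, in the spirit of the approach described above; the constant source term Q stands in for the nanoparticle heating.

        #include <omp.h>
        #include <vector>
        #include <utility>
        #include <cstdio>

        int main() {
            const int nx = 256, ny = 256;
            const double dx = 1e-3, dt = 0.01;
            const double k = 0.5, rho_c = 4.2e6;          // conductivity, rho*c of tissue (toy values)
            const double w_b = 2.0e3, T_a = 37.0;         // lumped perfusion coefficient, arterial temp
            const double Q = 1.0e5;                       // nanoparticle heating (W/m^3), toy value

            std::vector<double> T(nx * ny, 37.0), Tn(nx * ny, 37.0);
            auto idx = [nx](int i, int j) { return j * nx + i; };

            for (int step = 0; step < 200; ++step) {
                // Interior update: each grid point depends only on the old field T,
                // so the rows can be swept by OpenMP threads without write conflicts.
                #pragma omp parallel for collapse(2)
                for (int j = 1; j < ny - 1; ++j)
                    for (int i = 1; i < nx - 1; ++i) {
                        double lap = (T[idx(i+1,j)] + T[idx(i-1,j)] + T[idx(i,j+1)]
                                      + T[idx(i,j-1)] - 4.0 * T[idx(i,j)]) / (dx * dx);
                        double rhs = k * lap + w_b * (T_a - T[idx(i,j)]) + Q;
                        Tn[idx(i,j)] = T[idx(i,j)] + dt * rhs / rho_c;
                    }
                std::swap(T, Tn);
            }
            std::printf("T at center = %f\n", T[idx(nx/2, ny/2)]);
            return 0;
        }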

  1. Parallel k-means++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
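
    As a hedged illustration of where the parallelism lies in the OpenMP variant, the sketch below parallelizes the distance-to-nearest-seed pass of k-means++ seeding while keeping the weighted draw of the next seed serial. It uses 1-D points and made-up sizes and is not the released code.

        #include <omp.h>
        #include <vector>
        #include <random>
        #include <cstdio>

        int main() {
            const int n = 10000, k = 8;
            std::mt19937 gen(1);
            std::uniform_real_distribution<double> uni(0.0, 1.0);
            std::vector<double> x(n);
            for (auto& v : x) v = uni(gen);                    // 1-D points for brevity

            std::vector<double> d2(n);                         // squared distance to nearest seed
            std::vector<double> seeds;
            seeds.push_back(x[gen() % n]);                     // first seed: uniform at random

            for (int c = 1; c < k; ++c) {
                double sum = 0.0;
                // Parallel part: distance of every point to its nearest chosen seed.
                #pragma omp parallel for reduction(+:sum)
                for (int i = 0; i < n; ++i) {
                    double best = 1e300;
                    for (double s : seeds) {
                        double d = (x[i] - s) * (x[i] - s);
                        if (d < best) best = d;
                    }
                    d2[i] = best;
                    sum += best;
                }
                // Serial part: sample the next seed with probability proportional to d2.
                double r = uni(gen) * sum, acc = 0.0;
                int pick = n - 1;
                for (int i = 0; i < n; ++i) { acc += d2[i]; if (acc >= r) { pick = i; break; } }
                seeds.push_back(x[pick]);
            }
            std::printf("chose %zu seeds\n", seeds.size());
            return 0;
        }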

  2. ARES v2: new features and improved performance

    NASA Astrophysics Data System (ADS)

    Sousa, S. G.; Santos, N. C.; Adibekyan, V.; Delgado-Mena, E.; Israelian, G.

    2015-05-01

    Aims: We present a new upgraded version of ARES. The new version includes a series of interesting new features, such as automatic radial velocity correction, a fully automatic continuum determination, and an estimation of the errors for the equivalent widths. Methods: The automatic correction of the radial velocity is achieved with a simple cross-correlation function, and the automatic continuum determination, as well as the estimation of the errors, relies on a new approach to evaluating the spectral noise at the continuum level. Results: ARES v2 is fully compatible with its predecessor. We show that the fully automatic continuum determination is consistent with the previous methods applied for this task. It also presents a significant improvement in performance thanks to the implementation of parallel computation using the OpenMP library. Automatic Routine for line Equivalent widths in stellar Spectra - ARES webpage: http://www.astro.up.pt/~sousasag/ares/ . Based on observations made with ESO Telescopes at the La Silla Paranal Observatory under programme ID 075.D-0800(A).

  3. GPU accelerated particle visualization with Splotch

    NASA Astrophysics Data System (ADS)

    Rivi, M.; Gheller, C.; Dykes, T.; Krokos, M.; Dolag, K.

    2014-07-01

    Splotch is a rendering algorithm for exploration and visual discovery in particle-based datasets coming from astronomical observations or numerical simulations. The strengths of the approach are production of high quality imagery and support for very large-scale datasets through an effective mix of the OpenMP and MPI parallel programming paradigms. This article reports our experiences in re-designing Splotch for exploiting emerging HPC architectures nowadays increasingly populated with GPUs. A performance model is introduced to guide our re-factoring of Splotch. A number of parallelization issues are discussed, in particular relating to race conditions and workload balancing, towards achieving optimal performances. Our implementation was accomplished by using the CUDA programming paradigm. Our strategy is founded on novel schemes achieving optimized data organization and classification of particles. We deploy a reference cosmological simulation to present performance results on acceleration gains and scalability. We finally outline our vision for future work developments including possibilities for further optimizations and exploitation of hybrid systems and emerging accelerators.

  4. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    NASA Astrophysics Data System (ADS)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

    The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and time processing. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus highly tuned multi-core CPU as a function of the size simulation. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.
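
    To make the threading pattern concrete, here is a minimal 1-D FDTD (Yee) update with the field sweeps parallelized by OpenMP; loops of this shape are also what the SSE/compiler vectorization mentioned above targets. Grid size, time step, and the soft source are illustrative assumptions, not the authors' implementation.

        #include <omp.h>
        #include <vector>
        #include <cmath>
        #include <cstdio>

        int main() {
            const int nz = 20000, steps = 2000;
            const double c = 299792458.0, dz = 1e-8, dt = dz / (2.0 * c);  // CFL-safe step
            const double mu0 = 1.2566370614e-6, eps0 = 8.854e-12;          // vacuum constants
            std::vector<double> Ex(nz, 0.0), Hy(nz, 0.0);

            for (int n = 0; n < steps; ++n) {
                // H update: each Hy[k] reads only Ex[k] and Ex[k+1], so the sweep
                // has no write conflicts and maps directly onto OpenMP threads.
                #pragma omp parallel for
                for (int k = 0; k < nz - 1; ++k)
                    Hy[k] += dt / (mu0 * dz) * (Ex[k + 1] - Ex[k]);

                // E update follows the same pattern.
                #pragma omp parallel for
                for (int k = 1; k < nz; ++k)
                    Ex[k] += dt / (eps0 * dz) * (Hy[k] - Hy[k - 1]);

                Ex[nz / 2] += std::exp(-0.5 * std::pow((n - 300.0) / 50.0, 2));  // soft Gaussian source
            }
            std::printf("Ex at probe = %g\n", Ex[nz / 2 + 100]);
            return 0;
        }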

  5. Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

    PubMed Central

    Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan

    2004-01-01

    Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335

  6. Revised Extended Grid Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martz, Roger L.

    The Revised Eolus Grid Library (REGL) is a mesh-tracking library that was developed for use with the MCNP6TM computer code so that (radiation) particles can track on an unstructured mesh. The unstructured mesh is a finite element representation of any geometric solid model created with a state-of-the-art CAE/CAD tool. The mesh-tracking library is written using modern Fortran and programming standards; the library is Fortran 2003 compliant. The library was created with a defined application programmer interface (API) so that it could easily integrate with other particle tracking/transport codes. The library does not handle parallel processing via the message passing interface (MPI), but has been used successfully where the host code handles the MPI calls. The library is thread-safe and supports the OpenMP paradigm. As a library, all features are available through the API, and overall a tight coupling between it and the host code is required. Features of the library are summarized with the following list: can accommodate first and second order 4, 5, and 6-sided polyhedra; any combination of element types may appear in a single geometry model; parts may not contain tetrahedra mixed with other element types; pentahedra and hexahedra can be together in the same part; robust handling of overlaps and gaps; tracks element-to-element to produce path length results at the element level; finds element numbers for a given mesh location; finds intersection points on element faces for the particle tracks; produces a data file for post-processing results analysis; reads Abaqus .inp input (ASCII) files to obtain information for the global mesh model; supports parallel input processing via MPI; and supports parallel particle transport via both MPI and OpenMP.

  7. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

    PubMed Central

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-01-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008

  8. mm_par2.0: An object-oriented molecular dynamics simulation program parallelized using a hierarchical scheme with MPI and OPENMP

    NASA Astrophysics Data System (ADS)

    Oh, Kwang Jin; Kang, Ji Hoon; Myung, Hun Joo

    2012-02-01

    We have revised a general purpose parallel molecular dynamics simulation program mm_par using object-oriented programming. We parallelized the revised version using a hierarchical scheme in order to utilize more processors for a given system size. The benchmark result will be presented here. New version program summary. Program title: mm_par2.0 Catalogue identifier: ADXP_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXP_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 2 390 858 No. of bytes in distributed program, including test data, etc.: 25 068 310 Distribution format: tar.gz Programming language: C++ Computer: Any system operated by Linux or Unix Operating system: Linux Classification: 7.7 External routines: We provide wrappers for FFTW [1], the Intel MKL library [2] FFT routine, the Numerical Recipes [3] FFT, random number generator, and eigenvalue solver routines, the SPRNG [4] random number generator, the Mersenne Twister [5] random number generator, and a space filling curve routine. Catalogue identifier of previous version: ADXP_v1_0 Journal reference of previous version: Comput. Phys. Comm. 174 (2006) 560 Does the new version supersede the previous version?: Yes Nature of problem: Structural, thermodynamic, and dynamical properties of fluids and solids from microscopic scales to mesoscopic scales. Solution method: Molecular dynamics simulation in NVE, NVT, and NPT ensembles, Langevin dynamics simulation, dissipative particle dynamics simulation. Reasons for new version: First, object-oriented programming has been used, which is known to be open for extension and closed for modification. It is also known to be better for maintenance. Second, version 1.0 was based on atom decomposition and domain decomposition schemes [6] for parallelization. However, atom decomposition is not popular due to its poor scalability. On the other hand, the domain decomposition scheme is better for scalability. It still has a limitation in utilizing a large number of cores on recent petascale computers due to the requirement that the domain size be larger than the potential cutoff distance. To go beyond such a limitation, a hierarchical parallelization scheme has been adopted in this new version and implemented using MPI [7] and OPENMP [8]. Summary of revisions: (1) Object-oriented programming has been used. (2) A hierarchical parallelization scheme has been adopted. (3) The SPME routine has been fully parallelized with parallel 3D FFT using a volumetric decomposition scheme [9]. K.J.O. thanks Mr. Seung Min Lee for useful discussion on programming and debugging. Running time: Running time depends on system size and methods used. For a test system containing a protein (PDB id: 5DHFR) with the CHARMM22 force field [10] and 7023 TIP3P [11] waters in a simulation box with dimensions 62.23 Å×62.23 Å×62.23 Å, the benchmark results are given in Fig. 1. Here the potential cutoff distance was set to 12 Å and the switching function was applied from 10 Å for the force calculation in real space. For the SPME [12] calculation, the mesh dimensions K1, K2, and K3 were set to 64 and the interpolation order was set to 4. For the fast Fourier transform, we used the Intel MKL library. All bonds including hydrogen atoms were constrained using the SHAKE/RATTLE algorithms [13,14]. The code was compiled using Intel compiler version 11.1 and mvapich2 version 1.5. Fig. 2 shows performance gains from using the CUDA-enabled version [15] of mm_par for the 5DHFR simulation in water on an Intel Core2Quad 2.83 GHz and a GeForce GTX 580. Even though mm_par2.0 is not yet ported to GPU, these performance data are useful for anticipating mm_par2.0 performance on GPU. Fig. 1: Timing results for 1000 MD steps; 1, 2, 4, and 8 denote the number of OPENMP threads. Fig. 2: Timing results for 1000 MD steps from double-precision simulation on CPU, single-precision simulation on GPU, and double-precision simulation on GPU.

  9. A High Order, Locally-Adaptive Method for the Navier-Stokes Equations

    NASA Astrophysics Data System (ADS)

    Chan, Daniel

    1998-11-01

    I have extended the FOSLS method of Cai, Manteuffel and McCormick (1997) and implemented it within the framework of a spectral element formulation using the Legendre polynomial basis functions. The FOSLS method solves the Navier-Stokes equations as a system of coupled first-order equations and provides the ellipticity that is needed for fast iterative matrix solvers like multigrid to operate efficiently. Each element is treated as an object and its properties are self-contained. Only C^0 continuity is imposed across element interfaces; this design allows local grid refinement and coarsening without the burden of an elaborate data structure, since only information along element boundaries is needed. With the FORTRAN 90 programming environment, I can maintain high computational efficiency by employing a hybrid parallel processing model. The OpenMP directives provide loop-level parallelism, executed within a shared-memory SMP, and the MPI protocol allows the distribution of elements to a cluster of SMPs connected via a commodity network. This talk will provide timing results and a comparison with a second order finite difference method.

  10. OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark

    NASA Astrophysics Data System (ADS)

    Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.

    2015-02-01

    We developed a practical method to accelerate execution of the Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler on the Ubuntu 12.04 LTS Linux platform, and newly named iOMP-SWAT in this study. The GNU utilities make, gprof, and diff were used to develop the iOMP-SWAT package, profile memory usage, and verify that parallel and serial simulations produce identical results. Among 302 SWAT subroutines, the slowest routines were identified using GNU gprof and later modified using the Open Multiple Processing (OpenMP) library on an 8-core shared memory system. In addition, a C wrapper function, cross-compiled with the original SWAT FORTRAN package, was used to rapidly set large arrays to zero. A universal speedup ratio of 2.3 was achieved using input data sets with a large number of hydrological response units. As we specifically focus on acceleration of a single SWAT run, the use of iOMP-SWAT for parameter calibration will significantly improve the performance of SWAT optimization.
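    The two optimizations described above (loop-level OpenMP in the hottest routines and a C helper that rapidly zeroes large arrays) might look roughly like the following C++ sketch; the per-HRU water balance, array sizes, and function names are hypothetical and not taken from the SWAT source.

      #include <cstdio>
      #include <cstring>
      #include <vector>
      #include <omp.h>

      // Fast zeroing of a large work array, analogous in spirit to the C wrapper
      // used in iOMP-SWAT (the real wrapper is called from FORTRAN; this is C++).
      extern "C" void zero_array(double* a, long n) {
          std::memset(a, 0, static_cast<size_t>(n) * sizeof(double));
      }

      // Hypothetical hotspot: an independent per-HRU (hydrological response unit)
      // computation distributed across threads, as done for the slowest routines.
      void simulate_day(std::vector<double>& runoff, const std::vector<double>& rain) {
          const long n = static_cast<long>(runoff.size());
          zero_array(runoff.data(), n);
          #pragma omp parallel for schedule(static)
          for (long h = 0; h < n; ++h) {
              runoff[h] += 0.2 * rain[h];   // placeholder for the per-HRU water balance
          }
      }

      int main() {
          std::vector<double> rain(100000, 3.0), runoff(100000);
          simulate_day(runoff, rain);
          std::printf("runoff[0] = %.2f\n", runoff[0]);
          return 0;
      }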

  11. Performance Analysis of GFDL's GCM Line-By-Line Radiative Transfer Model on GPU and MIC Architectures

    NASA Astrophysics Data System (ADS)

    Menzel, R.; Paynter, D.; Jones, A. L.

    2017-12-01

    Due to their relatively low computational cost, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high resolution line-by-line radiative transfer models may soon approach that of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model under development at GFDL. Although originally designed to specifically exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended with OpenMP in an effort to also effectively target MIC architectures such as Intel's Xeon Phi. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, as part of CMIP 6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures to analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.

  12. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, i.e. propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against the analytical solutions of the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
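    A minimal sketch of cyclic reduction for a tridiagonal system, with each reduction and back-substitution level parallelized by OpenMP as described above, is given below; the test system (a simple 1D Poisson discretization) and the size 2^k - 1 are illustrative assumptions, not the paper's Boussinesq discretization.

      #include <cmath>
      #include <cstdio>
      #include <vector>
      #include <omp.h>

      // Cyclic reduction for a tridiagonal system of size n = 2^k - 1.
      // Rows within one reduction (or back-substitution) level are independent,
      // so each level's loop is parallelized with OpenMP.
      std::vector<double> cyclic_reduction(std::vector<double> a, std::vector<double> b,
                                           std::vector<double> c, std::vector<double> d) {
          const int n = static_cast<int>(b.size()) - 2;   // rows 1..n, ghosts at 0 and n+1
          // Forward reduction: rows at odd multiples of the stride are eliminated.
          for (int h = 1; 2 * h <= n; h *= 2) {
              #pragma omp parallel for
              for (int i = 2 * h; i <= n; i += 2 * h) {
                  const double alpha = a[i] / b[i - h];
                  const double beta  = c[i] / b[i + h];
                  a[i]  = -alpha * a[i - h];
                  c[i]  = -beta  * c[i + h];
                  b[i] -=  alpha * c[i - h] + beta * a[i + h];
                  d[i] -=  alpha * d[i - h] + beta * d[i + h];
              }
          }
          // Back substitution from the coarsest stride down to 1 (ghost x stays zero).
          std::vector<double> x(n + 2, 0.0);
          int h = 1;
          while (2 * h <= n) h *= 2;
          for (; h >= 1; h /= 2) {
              #pragma omp parallel for
              for (int i = h; i <= n; i += 2 * h)
                  x[i] = (d[i] - a[i] * x[i - h] - c[i] * x[i + h]) / b[i];
          }
          return x;
      }

      int main() {
          const int n = (1 << 10) - 1;                    // 2^10 - 1 unknowns
          std::vector<double> a(n + 2, -1.0), b(n + 2, 2.0), c(n + 2, -1.0), d(n + 2, 0.0);
          a[0] = c[0] = a[n + 1] = c[n + 1] = 0.0;
          b[0] = b[n + 1] = 1.0;                          // identity ghost rows
          for (int i = 1; i <= n; ++i) d[i] = 1.0;        // RHS of a -x'' = const discretization
          std::vector<double> x = cyclic_reduction(a, b, c, d);
          // Check the residual of the original system.
          double res = 0.0;
          for (int i = 1; i <= n; ++i)
              res = std::fmax(res, std::fabs(-x[i - 1] + 2.0 * x[i] - x[i + 1] - 1.0));
          std::printf("max residual = %.3e\n", res);
          return 0;
      }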

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of the multi-body potential. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high end 16-core Intel Xeon processor.

  14. Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

    NASA Astrophysics Data System (ADS)

    Hou, Zhenlong; Huang, Danian

    2017-09-01

    In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth weighting matrix, and other methods. To address the problems posed by large data volumes in exploration, we present a parallel algorithm and its performance analysis, combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) for Graphics Processing Unit (GPU) acceleration. Tests on a synthetic model and on real data from Vinton Dome yield improved results and show that the improved inversion algorithm is effective and feasible. The parallel algorithm we designed outperforms other CUDA-based implementations, with a maximum speedup of more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are used to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger data volumes, and the new analysis method is practical.

  15. Optimization of groundwater artificial recharge systems using a genetic algorithm: a case study in Beijing, China

    NASA Astrophysics Data System (ADS)

    Hao, Qichen; Shao, Jingli; Cui, Yali; Zhang, Qiulan; Huang, Linxian

    2018-05-01

    An optimization approach is used for the operation of groundwater artificial recharge systems in an alluvial fan in Beijing, China. The optimization model incorporates a transient groundwater flow model, which allows for simulation of the groundwater response to artificial recharge. The facilities' operation with regard to recharge rates is formulated as a nonlinear programming problem to maximize the volume of surface water recharged into the aquifers under specific constraints. This optimization problem is solved by the parallel genetic algorithm (PGA) based on OpenMP, which could substantially reduce the computation time. To solve the PGA with constraints, the multiplicative penalty method is applied. In addition, the facilities' locations are implicitly determined on the basis of the results of the recharge-rate optimizations. Two scenarios are optimized and the optimal results indicate that the amount of water recharged into the aquifers will increase without exceeding the upper limits of the groundwater levels. Optimal operation of this artificial recharge system can also contribute to the more effective recovery of the groundwater storage capacity.
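    A hypothetical sketch of the OpenMP-parallel fitness evaluation with a multiplicative penalty might look as follows; the objective (total recharged volume), the groundwater-level constraint, and all numerical values are placeholders rather than the paper's transient flow model.

      #include <cstdio>
      #include <random>
      #include <vector>
      #include <omp.h>

      // Hypothetical fitness evaluation for a parallel genetic algorithm (PGA):
      // individuals are vectors of recharge rates, the objective is the total
      // recharged volume, and a multiplicative penalty is applied when a
      // (placeholder) groundwater-level constraint is violated. The expensive
      // per-individual evaluation is what OpenMP distributes across threads.
      double fitness(const std::vector<double>& rates) {
          double volume = 0.0, head = 0.0;
          for (double r : rates) { volume += r; head += 0.01 * r; }   // stand-in for the flow model
          const double head_limit = 5.0;                               // placeholder constraint
          double penalty = 1.0;
          if (head > head_limit) penalty = head_limit / head;          // multiplicative penalty
          return volume * penalty;
      }

      int main() {
          const int pop_size = 256, n_rates = 50;
          std::mt19937 rng(42);
          std::uniform_real_distribution<double> u(0.0, 10.0);
          std::vector<std::vector<double>> pop(pop_size, std::vector<double>(n_rates));
          for (auto& ind : pop) for (auto& r : ind) r = u(rng);

          std::vector<double> fit(pop_size);
          #pragma omp parallel for schedule(dynamic)                   // one evaluation per individual
          for (int i = 0; i < pop_size; ++i) fit[i] = fitness(pop[i]);

          double best = fit[0];
          for (double f : fit) if (f > best) best = f;
          std::printf("best fitness = %.2f\n", best);
          return 0;
      }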

  16. Global magnetohydrodynamic simulations on multiple GPUs

    NASA Astrophysics Data System (ADS)

    Wong, Un-Hong; Wong, Hon-Cheng; Ma, Yonghui

    2014-01-01

    Global magnetohydrodynamic (MHD) models play a major role in investigating the solar wind-magnetosphere interaction. However, the huge computational requirement of global MHD simulations is the main problem that needs to be solved. With the recent development of modern graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), it is possible to perform global MHD simulations in a more efficient manner. In this paper, we present a global MHD simulator on multiple GPUs using CUDA 4.0 with GPUDirect 2.0. Our implementation is based on the modified leapfrog scheme, which is a combination of the leapfrog scheme and the two-step Lax-Wendroff scheme. GPUDirect 2.0 is used in our implementation to drive multiple GPUs. All data transfers and kernel processing are managed with the CUDA 4.0 API instead of using MPI or OpenMP. Performance measurements are made on a multi-GPU system with eight NVIDIA Tesla M2050 (Fermi architecture) graphics cards. These measurements show that our multi-GPU implementation achieves a peak performance of 97.36 GFLOPS in double precision.

  17. BLESS 2: accurate, memory-efficient and fast error correction method.

    PubMed

    Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei; Ma, Jian; Chen, Deming

    2016-08-01

    The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Availability: freely available at https://sourceforge.net/projects/bless-ec. Contact: dchen@illinois.edu. Supplementary data are available at Bioinformatics online.

  18. Hybrid MPI/OpenMP Implementation of the ORAC Molecular Dynamics Program for Generalized Ensemble and Fast Switching Alchemical Simulations.

    PubMed

    Procacci, Piero

    2016-06-27

    We present a new release (6.0β) of the ORAC program [Marsili et al. J. Comput. Chem. 2010, 31, 1106-1116] with hybrid OpenMP/MPI (Open Multi-Processing / Message Passing Interface) multilevel parallelism tailored for generalized ensemble (GE) and fast switching double annihilation (FS-DAM) nonequilibrium technology, aimed at evaluating the binding free energy in drug-receptor systems on high performance computing platforms. The production of the GE or FS-DAM trajectories is handled using a weak scaling parallel approach at the MPI level only, while a strong scaling force decomposition scheme is implemented for intranode computations with shared memory access at the OpenMP level. The efficiency, simplicity, and inherent parallel nature of the ORAC implementation of the FS-DAM algorithm make the code a potentially effective tool for second-generation high-throughput virtual screening in drug discovery and design. The code, along with documentation, testing, and ancillary tools, is distributed under the provisions of the General Public License and can be freely downloaded at www.chim.unifi.it/orac.
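    The two-level scheme described above can be sketched as follows, assuming a placeholder force kernel and system size (this is not ORAC code): each MPI rank advances an independent trajectory, while the per-step force loop is decomposed among OpenMP threads.

      #include <cstdio>
      #include <vector>
      #include <mpi.h>
      #include <omp.h>

      // Two-level parallelism in the spirit of the scheme described above:
      // each MPI rank advances an independent trajectory (weak scaling), while
      // the per-step force loop is decomposed among OpenMP threads (strong scaling).
      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank = 0, nranks = 1;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nranks);

          const int natoms = 100000, nsteps = 100;
          std::vector<double> f(natoms, 0.0);
          double energy = 0.0;

          for (int step = 0; step < nsteps; ++step) {
              double e = 0.0;
              #pragma omp parallel for reduction(+ : e)     // intranode force decomposition
              for (int i = 0; i < natoms; ++i) {
                  f[i] = 1e-3 * (i % 7) + 1e-4 * rank;      // placeholder force
                  e += 0.5 * f[i] * f[i];                   // placeholder energy
              }
              energy = e;                                    // (integration step omitted)
          }

          double mean_energy = 0.0;                          // gather statistics over trajectories
          MPI_Reduce(&energy, &mean_energy, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0)
              std::printf("mean trajectory energy = %.6f\n", mean_energy / nranks);
          MPI_Finalize();
          return 0;
      }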

  19. End-to-end performance measurement of Internet based medical applications.

    PubMed

    Dev, P; Harris, D; Gutierrez, D; Shah, A; Senger, S

    2002-01-01

    We present a method to obtain an end-to-end characterization of the performance of an application over a network. This method is not dependent on any specific application or type of network. The method requires characterization of network parameters, such as latency and packet loss, between the expected server or client endpoints, as well as characterization of the application's constraints on these parameters. A subjective metric is presented that integrates these characterizations and that operates over a wide range of applications and networks. We believe that this method may be of wide applicability as research and educational applications increasingly make use of computation and data servers that are distributed over the Internet.

  20. Using the Eclipse Parallel Tools Platform to Assist Earth Science Model Development and Optimization on High Performance Computers

    NASA Astrophysics Data System (ADS)

    Alameda, J. C.

    2011-12-01

    Development and optimization of computational science models, particularly on high performance computers and, with the advent of ubiquitous multicore processors, on practically every system, has long been accomplished with basic software tools: typically command-line compilers, debuggers, and performance tools that have not changed substantially since the days of serial and early vector computers. However, model complexity, including the complexity added by modern message passing libraries such as MPI, and the need for hybrid code models (such as OpenMP and MPI) to take full advantage of high performance computers with an increasing core count per shared memory node, has made development and optimization of such codes an increasingly arduous task. Additional architectural developments, such as many-core processors, only complicate the situation further. In this paper, we describe how our NSF-funded project, "SI2-SSI: A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform" (WHPC), seeks to improve the Eclipse Parallel Tools Platform, an environment designed to support scientific code development targeted at a diverse set of high performance computing systems. Our WHPC project takes an application-centric view to improving PTP. We are using a set of scientific applications, each with its own challenges, both to drive further improvements to the applications themselves and to understand shortcomings in Eclipse PTP from an application developer's perspective, which in turn shapes the list of improvements we seek to make. We are also partnering with performance tool providers to drive higher quality performance tool integration. We have partnered with the Cactus group at Louisiana State University to improve Eclipse's ability to work with computational frameworks and extremely complex build systems, as well as to develop educational materials to incorporate into computational science and engineering codes. Finally, we are partnering with the lead PTP developers at IBM to ensure we are as effective as possible within the Eclipse community development. We are also conducting training and outreach to our user community, including conference BOF sessions, monthly user calls, and an annual user meeting, so that we can best inform the improvements we make to Eclipse PTP. With these activities we endeavor to encourage use of modern software engineering practices, as enabled through the Eclipse IDE, with computational science and engineering applications. These practices include proper use of source code repositories, tracking and rectifying issues, measuring and monitoring code performance changes against both optimizations and ever-changing software stacks and configurations on HPC systems, and ultimately encouraging development and maintenance of testing suites -- things that have become commonplace in many software endeavors but have lagged in the development of science applications. We take the view that the increased complexity of both HPC systems and science applications demands better software engineering methods, preferably enabled by modern tools such as Eclipse PTP, to help the computational science community thrive as we evolve the HPC landscape.

  1. Capturing Petascale Application Characteristics with the Sequoia Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vetter, Jeffrey S; Bhatia, Nikhil; Grobelny, Eric M

    2005-01-01

    Characterization of the computation, communication, memory, and I/O demands of current scientific applications is crucial for identifying which technologies will enable petascale scientific computing. In this paper, we present the Sequoia Toolkit for characterizing HPC applications. The Sequoia Toolkit consists of the Sequoia trace capture library and the Sequoia Event Analysis Library, or SEAL, that facilitates the development of tools for analyzing Sequoia event traces. Using the Sequoia Toolkit, we have characterized the behavior of application runs with up to 2048 application processes. To illustrate the use of the Sequoia Toolkit, we present a preliminary characterization of LAMMPS, a molecular dynamics application of great interest to the computational biology community.

  2. PhysiCell: An open source physics-based cell simulator for 3-D multicellular systems.

    PubMed

    Ghaffarizadeh, Ahmadreza; Heiland, Randy; Friedman, Samuel H; Mumenthaler, Shannon M; Macklin, Paul

    2018-02-01

    Many multicellular systems problems can only be understood by studying how cells move, grow, divide, interact, and die. Tissue-scale dynamics emerge from systems of many interacting cells as they respond to and influence their microenvironment. The ideal "virtual laboratory" for such multicellular systems simulates both the biochemical microenvironment (the "stage") and many mechanically and biochemically interacting cells (the "players" upon the stage). PhysiCell, a physics-based multicellular simulator, is an open source agent-based simulator that provides both the stage and the players for studying many interacting cells in dynamic tissue microenvironments. It builds upon a multi-substrate biotransport solver to link cell phenotype to multiple diffusing substrates and signaling factors. It includes biologically-driven sub-models for cell cycling, apoptosis, necrosis, solid and fluid volume changes, mechanics, and motility "out of the box." The C++ code has minimal dependencies, making it simple to maintain and deploy across platforms. PhysiCell has been parallelized with OpenMP, and its performance scales linearly with the number of cells. Simulations of up to 10^5-10^6 cells are feasible on quad-core desktop workstations; larger simulations are attainable on single HPC compute nodes. We demonstrate PhysiCell by simulating the impact of necrotic core biomechanics, 3-D geometry, and stochasticity on the dynamics of hanging drop tumor spheroids and ductal carcinoma in situ (DCIS) of the breast. We demonstrate stochastic motility, chemical and contact-based interaction of multiple cell types, and the extensibility of PhysiCell with examples in synthetic multicellular systems (a "cellular cargo delivery" system, with application to anti-cancer treatments), cancer heterogeneity, and cancer immunology. PhysiCell is a powerful multicellular systems simulator that will be continually improved with new capabilities and performance improvements. It also represents a significant independent code base for replicating results from other simulation platforms. The PhysiCell source code, examples, documentation, and support are available under the BSD license at http://PhysiCell.MathCancer.org and http://PhysiCell.sf.net.
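    A minimal sketch (not PhysiCell code) of the agent-level OpenMP pattern implied above, where each cell is updated independently within a time step, is shown below; the growth and division rules are placeholders.

      #include <cstdio>
      #include <vector>
      #include <omp.h>

      // Illustrative agent loop: each cell's per-step update is independent, so
      // the loop over cells is distributed across threads and total work scales
      // linearly with the number of cells.
      struct Cell { double volume; bool alive; };

      int main() {
          const long n_cells = 100000;
          const double dt = 0.01, growth_rate = 0.05, max_volume = 2.0;
          std::vector<Cell> cells(n_cells, Cell{1.0, true});

          for (int step = 0; step < 1000; ++step) {
              #pragma omp parallel for
              for (long i = 0; i < n_cells; ++i) {
                  if (!cells[i].alive) continue;
                  cells[i].volume += dt * growth_rate * cells[i].volume;   // placeholder growth
                  if (cells[i].volume > max_volume) cells[i].volume = 1.0; // placeholder division reset
              }
          }
          std::printf("cell 0 volume: %.3f\n", cells[0].volume);
          return 0;
      }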

  3. Application of Adjoint Method and Spectral-Element Method to Tomographic Inversion of Regional Seismological Structure Beneath Japanese Islands

    NASA Astrophysics Data System (ADS)

    Tsuboi, S.; Miyoshi, T.; Obayashi, M.; Tono, Y.; Ando, K.

    2014-12-01

    Recent progress in large scale computing using waveform modeling techniques and high performance computing facilities has demonstrated the possibility of performing full-waveform inversion of the three dimensional (3D) seismological structure inside the Earth. We apply the adjoint method (Liu and Tromp, 2006) to obtain the 3D structure beneath the Japanese Islands. First we implemented the Spectral-Element Method on the K computer in Kobe, Japan. We optimized SPECFEM3D_GLOBE (Komatitsch and Tromp, 2002) using OpenMP so that the code fits the hybrid architecture of the K computer. We could use 82,134 nodes of the K computer (657,072 cores) to compute synthetic waveforms with about 1 sec accuracy for a realistic 3D Earth model, and its performance was 1.2 PFLOPS. We use this optimized SPECFEM3D_GLOBE code, take one chunk around the Japanese Islands from the global mesh, and compute synthetic seismograms with an accuracy of about 10 seconds. We use the GAP-P2 mantle tomography model (Obayashi et al., 2009) as an initial 3D model and use as many broadband seismic stations available in this region as possible to perform the inversion. We then use the time windows for body waves and surface waves to compute adjoint sources and calculate adjoint kernels for the seismic structure. We have performed several iterations and obtained an improved 3D structure beneath the Japanese Islands. The result demonstrates that waveform misfits between observed and theoretical seismograms improve as the iteration proceeds. We are now preparing to use much shorter periods in our synthetic waveform computation and to obtain seismic structure for basin-scale models, such as the Kanto basin, where there is a dense seismic network and high seismic activity. Acknowledgements: This research was partly supported by the MEXT Strategic Program for Innovative Research. We used F-net seismograms of the National Research Institute for Earth Science and Disaster Prevention.

  4. Parallel Execution of Functional Mock-up Units in Buildings Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ozmen, Ozgur; Nutaro, James J.; New, Joshua Ryan

    2016-06-30

    A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.

  5. cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

    PubMed

    Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

    2016-01-01

    Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes quickly. However, SAMtools requires additional implementation effort to be parallelized, for example with OpenMP (Open Multi-Processing) libraries. Given the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.

  6. A Coarse-to-Fine Geometric Scale-Invariant Feature Transform for Large Size High Resolution Satellite Image Registration

    PubMed Central

    Chang, Xueli; Du, Siliang; Li, Yingying; Fang, Shenghui

    2018-01-01

    Large size high resolution (HR) satellite image matching is a challenging task due to local distortion, repetitive structures, intensity changes and low efficiency. In this paper, a novel matching approach is proposed for large size HR satellite image registration, based on a coarse-to-fine strategy and the geometric scale-invariant feature transform (SIFT). In the coarse matching step, a robust matching method, scale restrict (SR) SIFT, is applied at a low resolution level. The matching results provide geometric constraints which are then used to guide block division and geometric SIFT in the fine matching step. The block matching method overcomes the memory limitation. In geometric SIFT, the area constraints help validate candidate matches and decrease search complexity. To further improve matching efficiency, the proposed matching method is parallelized using OpenMP. Finally, the sensing image is rectified to the coordinates of the reference image via a Triangulated Irregular Network (TIN) transformation. Experiments are designed to test the performance of the proposed matching method. The experimental results show that the proposed method decreases matching time and increases the number of matched points while maintaining high registration accuracy. PMID:29702589
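    The block-level OpenMP parallelization described above might be sketched as follows; match_block() is a hypothetical stand-in for the geometric SIFT matching of one block, and the block grid size is assumed.

      #include <cstdio>
      #include <vector>
      #include <omp.h>

      // The image pair is divided into blocks and each block is matched
      // independently, so a dynamic OpenMP schedule balances uneven block costs.
      struct Match { double x1, y1, x2, y2; };

      std::vector<Match> match_block(int bx, int by) {
          // Placeholder: pretend each block produces one tie point near its corner.
          return { {bx * 512.0, by * 512.0, bx * 512.0 + 1.5, by * 512.0 - 0.8} };
      }

      int main() {
          const int nbx = 16, nby = 16;                       // block grid (placeholder)
          std::vector<std::vector<Match>> per_block(nbx * nby);

          #pragma omp parallel for collapse(2) schedule(dynamic)
          for (int bx = 0; bx < nbx; ++bx)
              for (int by = 0; by < nby; ++by)
                  per_block[bx * nby + by] = match_block(bx, by);

          size_t total = 0;
          for (const auto& m : per_block) total += m.size();  // merge results serially
          std::printf("total matches: %zu\n", total);
          return 0;
      }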

  7. Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

    NASA Astrophysics Data System (ADS)

    Jiang, Jinpeng; Zhu, Peimin

    2018-05-01

    Full waveform inversion (FWI) is a challenging procedure due to the high computational cost related to the modeling, especially for the elastic case. The graphics processing unit (GPU) has become a popular device for the high-performance computing (HPC). To reduce the long computation time, we design and implement the GPU-based 2D elastic FWI (EFWI) in time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of relatively small global memory on GPU, the boundary saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion increases the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain the accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling, and about 12 times speedup in gradient calculation, compared with the eight-core CPU implementations optimized by OpenMP. The test results from the GPU implementations are verified to have enough accuracy by comparing the results obtained from the CPU implementations.

  8. Parallel peak pruning for scalable SMP contour tree computation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

    As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup in NVIDIA Thrust.

  9. Petascale computation performance of lightweight multiscale cardiac models using hybrid programming models.

    PubMed

    Pope, Bernard J; Fitch, Blake G; Pitman, Michael C; Rice, John J; Reumann, Matthias

    2011-01-01

    Future multiscale and multiphysics models must use the power of high performance computing (HPC) systems to enable research into human disease, translational medical science, and treatment. Previously we showed that computationally efficient multiscale models will require the use of sophisticated hybrid programming models, mixing distributed message passing processes (e.g. the message passing interface (MPI)) with multithreading (e.g. OpenMP, POSIX pthreads). The objective of this work is to compare the performance of such hybrid programming models when applied to the simulation of a lightweight multiscale cardiac model. Our results show that the hybrid models do not perform favourably when compared to an implementation using only MPI which is in contrast to our results using complex physiological models. Thus, with regards to lightweight multiscale cardiac models, the user may not need to increase programming complexity by using a hybrid programming approach. However, considering that model complexity will increase as well as the HPC system size in both node count and number of cores per node, it is still foreseeable that we will achieve faster than real time multiscale cardiac simulations on these systems using hybrid programming models.

  10. Accelerated event-by-event Monte Carlo microdosimetric calculations of electrons and protons tracks on a multi-core CPU and a CUDA-enabled GPU.

    PubMed

    Kalantzis, Georgios; Tachibana, Hidenobu

    2014-01-01

    For microdosimetric calculations event-by-event Monte Carlo (MC) methods are considered the most accurate. The main shortcoming of those methods is the extensive requirement for computational time. In this work we present an event-by-event MC code of low projectile energy electron and proton tracks for accelerated microdosimetric MC simulations on a graphic processing unit (GPU). Additionally, a hybrid implementation scheme was realized by employing OpenMP and CUDA in such a way that both GPU and multi-core CPU were utilized simultaneously. The two implementation schemes have been tested and compared with the sequential single threaded MC code on the CPU. Performance comparison was established on the speed-up for a set of benchmarking cases of electron and proton tracks. A maximum speedup of 67.2 was achieved for the GPU-based MC code, while a further improvement of the speedup up to 20% was achieved for the hybrid approach. The results indicate the capability of our CPU-GPU implementation for accelerated MC microdosimetric calculations of both electron and proton tracks without loss of accuracy. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  11. Acoustic 3D modeling by the method of integral equations

    NASA Astrophysics Data System (ADS)

    Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

    2018-02-01

    This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that the method can accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system of equations, and also for parallelizing across multiple sources. Practical examples and efficiency tests are presented as well.
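    A minimal sketch of the thread-level parallelism described above is given below, using a plain dense complex matrix-vector product as a stand-in for the FFT-accelerated IE operator; the matrix size and entries are placeholders.

      #include <cstdio>
      #include <vector>
      #include <complex>
      #include <omp.h>

      // The dominant cost inside the iterative solver is the matrix-vector
      // product; its output rows are independent and are computed by OpenMP threads.
      using cplx = std::complex<double>;

      void matvec(const std::vector<cplx>& A, const std::vector<cplx>& x,
                  std::vector<cplx>& y, int n) {
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
              cplx s(0.0, 0.0);
              for (int j = 0; j < n; ++j) s += A[static_cast<size_t>(i) * n + j] * x[j];
              y[i] = s;
          }
      }

      int main() {
          const int n = 2000;
          std::vector<cplx> A(static_cast<size_t>(n) * n, cplx(1e-4, 1e-5));
          std::vector<cplx> x(n, cplx(1.0, 0.0)), y(n);
          matvec(A, x, y, n);
          std::printf("y[0] = (%g, %g)\n", y[0].real(), y[0].imag());
          return 0;
      }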

  12. fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.

    PubMed

    Hung, Ling-Hong; Samudrala, Ram

    2014-06-15

    fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.

  13. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott; Chen, Yang

    2013-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with OpenACC compiler directives and CUDA Fortran. A mixed implementation of both OpenACC and CUDA is demonstrated; CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third generation Core i7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10x or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speedups comparable to or better than those of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three-dimensional general geometry GEM code.

  14. A novel heterogeneous algorithm to simulate multiphase flow in porous media on multicore CPU-GPU systems

    NASA Astrophysics Data System (ADS)

    McClure, J. E.; Prins, J. F.; Miller, C. T.

    2014-07-01

    Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular "color" LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven-velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed using a GPU. The heterogeneous solution is demonstrated to provide a speedup of 2.6× compared to the multi-core CPU solution and 1.8× compared to the GPU solution, due to concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media using a typical GPU-accelerated workstation.

  15. Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

    NASA Astrophysics Data System (ADS)

    Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

    2016-09-01

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of the multi-body potential. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to a highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high end 16-core Intel Xeon processor.

  16. Finite-Difference Algorithm for Simulating 3D Electromagnetic Wavefields in Conductive Media

    NASA Astrophysics Data System (ADS)

    Aldridge, D. F.; Bartel, L. C.; Knox, H. A.

    2013-12-01

    Electromagnetic (EM) wavefields are routinely used in geophysical exploration for detection and characterization of subsurface geological formations of economic interest. Recorded EM signals depend strongly on the current conductivity of geologic media. Hence, they are particularly useful for inferring fluid content of saturated porous bodies. In order to enhance understanding of field-recorded data, we are developing a numerical algorithm for simulating three-dimensional (3D) EM wave propagation and diffusion in heterogeneous conductive materials. Maxwell's equations are combined with isotropic constitutive relations to obtain a set of six, coupled, first-order partial differential equations governing the electric and magnetic vectors. An advantage of this system is that it does not contain spatial derivatives of the three medium parameters electric permittivity, magnetic permeability, and current conductivity. Numerical solution methodology consists of explicit, time-domain finite-differencing on a 3D staggered rectangular grid. Temporal and spatial FD operators have order 2 and N, where N is user-selectable. We use an artificially-large electric permittivity to maximize the FD timestep, and thus reduce execution time. For the low frequencies typically used in geophysical exploration, accuracy is not unduly compromised. Grid boundary reflections are mitigated via convolutional perfectly matched layers (C-PMLs) imposed at the six grid flanks. A shared-memory-parallel code implementation via OpenMP directives enables rapid algorithm execution on a multi-thread computational platform. Good agreement is obtained in comparisons of numerically-generated data with reference solutions. EM wavefields are sourced via point current density and magnetic dipole vectors. Spatially-extended inductive sources (current carrying wire loops) are under development. We are particularly interested in accurate representation of high-conductivity sub-grid-scale features that are common in industrial environments (borehole casing, pipes, railroad tracks). Present efforts are oriented toward calculating the EM responses of these objects via a First Born Approximation approach. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the US Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  17. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    NASA Astrophysics Data System (ADS)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195

  18. Workload Characterization of CFD Applications Using Partial Differential Equation Solvers

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    Workload characterization is used for modeling and evaluation of computing systems at different levels of detail. We present workload characterization for a class of Computational Fluid Dynamics (CFD) applications that solve Partial Differential Equations (PDEs). This workload characterization focuses on three high performance computing platforms: SGI Origin2000, IBM SP-2, and a cluster of Intel Pentium Pro-based PCs. We execute extensive measurement-based experiments on these platforms to gather statistics of system resource usage, which results in workload characterization. Our workload characterization approach yields a coarse-grain resource utilization behavior that is being applied for performance modeling and evaluation of distributed high performance metacomputing systems. In addition, this study enhances our understanding of interactions between PDE solver workloads and high performance computing platforms and is useful for tuning these applications.

  19. Nanoporous Gold: Fabrication, Characterization, and Applications

    PubMed Central

    Seker, Erkin; Reed, Michael L.; Begley, Matthew R.

    2009-01-01

    Nanoporous gold (np-Au) has intriguing material properties that offer potential benefits for many applications due to its high specific surface area, well-characterized thiol-gold surface chemistry, high electrical conductivity, and reduced stiffness. The research on np-Au has taken place on various fronts, including advanced microfabrication and characterization techniques to probe unusual nanoscale properties and applications spanning from fuel cells to electrochemical sensors. Here, we provide a review of the recent advances in np-Au research, with special emphasis on microfabrication and characterization techniques. We conclude the paper with a brief outline of challenges to overcome in the study of nanoporous metals.

  20. Characterization of spray deposition and drift from a low drift nozzle for aerial application at different application altitudes

    USDA-ARS?s Scientific Manuscript database

    A complex interaction of controllable and uncontrollable factors is involved in aerial application of crop production and protection materials. Although it is difficult to completely characterize spray deposition and drift, these important factors can be estimated with appropriate sampling protocol ...

  1. The VOLSCAT package for electron and positron scattering of molecular targets: A new high throughput approach to cross-section and resonances computation

    NASA Astrophysics Data System (ADS)

    Sanna, N.; Baccarelli, I.; Morelli, G.

    2009-12-01

    VOLSCAT is a computer program which implements the Single Center Expansion (SCE) method to solve the scattering equation for the elastic collision of electrons/positrons off molecular targets. The scattering potential needed is calculated by on-the-fly calls to the external SCELib library for molecular properties, recently ported to GPU computing environment and ClearSpeed platforms, and made available by means of an Application Program Interface (SCELib-API) which is also provided with the VOLSCAT package in a beta version. The result is a high throughput approach to the solution of the complex electron/positron-molecule scattering problem, which allows for intensive calculations both for the number of systems which can be studied and for their size. Accurate partial and total elastic cross sections are produced in output together with the associated eigenphase sums. Indirect scattering processes arising from the formation of temporary negative ions can also be analyzed through the computation of the resonances' parameters. Program summary. Program title: VOLSCAT V1.0 Catalogue identifier: AEEW_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 4 618 353 No. of bytes in distributed program, including test data, etc.: 120 307 536 Distribution format: tar.gz Programming language: Fortran90 Computer: All SMP platforms based on AIX, Linux and SUNOS operating systems over SPARC, POWER, Intel Itanium2, X86, em64t and Opteron processors Operating system: SUNOS, IBM AIX, Linux RedHat (Enterprise), Linux SuSE (SLES) Has the code been vectorized or parallelized?: Yes. The parallel version in the present release of the code is limited to the OpenMP calculation of the exchange potential. The number of OpenMP threads can then be set in the input script. RAM: For a typical (isolated) biomolecule (e.g. Cytosine or Ribose) a converged calculation would require from 320 MB up to 2.5 GB. Word size: 64 bits Classification: 16.5 External routines: LAPACK (dsyev, dgetri, dgetrf) (http://www.netlib.org/lapack/) Nature of problem: In this set of codes an efficient procedure is implemented to calculate partial cross sections for the scattering between an electron/positron and a molecular target as a function of the collision energies. Solution method: The scattering equations are derived in the framework of the Single Center Expansion (SCE) procedure which allows the reduction of the original three-dimensional problem to a radial (one-dimensional) equation through the expansion of the scattering potential and the system wavefunction in a set of symmetry-adapted (real) spherical harmonics. The local part of the electrostatic interaction between the charged projectile (electron/positron) and the molecular target is provided in input by the SCELib library, which also provides the correlation and polarization corrections for the short-range and long-range part, respectively, of the interaction. A proper Application Programming Interface (API) is used by VOLSCAT to load the energy-independent part of the potential while the non-local exchange contribution is approximated by a local form and calculated on the fly in the VOLSCAT run for each desired collision energy.
The resulting SCE one-dimensional homogeneous scattering equation is rewritten in an integral form by means of the standard Green's function technique, resulting in a set of Volterra coupled equations which are solved to give the phase shifts and cross sections for any desired impact energy in terms of the partial components defined by the irreducible representations of the symmetry point group to which the target molecule belongs. The total cross section can then be straightforwardly calculated by summing over all the partial cross sections produced in the output. By the Breit-Wigner analysis of the eigenphase sum produced as a function of the energy one can also get information on the location of possible resonance states arising in the collision process. Restrictions: Depending on the molecular system under study and on the operating conditions the program may or may not fit into available RAM. Additional comments: A beta version of SCELib-API is included in the distribution package. Running time: The execution time strongly depends on the molecular target description and on the hardware/OS chosen; it is directly proportional to the (r,θ,φ) grid size and to the number of angular basis functions used.
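
    Since the exchange contribution is recomputed for each desired collision energy, the per-energy solves are natural units of thread-level work. The sketch below is a generic OpenMP arrangement of such an energy scan, not VOLSCAT's Fortran code (which threads the exchange-potential calculation itself); the solver routine and result type are hypothetical placeholders.

    #include <cstddef>
    #include <vector>

    // Hypothetical per-energy result and radial solver standing in for the real code.
    struct CrossSection { double energy; double sigma_total; };
    CrossSection solve_volterra(double energy) { return {energy, 0.0}; }  // stub

    // Independent collision energies distributed over an OpenMP thread team whose
    // size would be set from the input script (e.g. via OMP_NUM_THREADS).
    std::vector<CrossSection> scan_energies(const std::vector<double>& energies)
    {
        std::vector<CrossSection> out(energies.size());
        // dynamic scheduling helps when the work per energy varies
        #pragma omp parallel for schedule(dynamic)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(energies.size()); ++i)
            out[i] = solve_volterra(energies[i]);
        return out;
    }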

  2. Site Characterization Technologies for DNAPL Investigations

    EPA Pesticide Factsheets

    This document is intended to help managers at sites with potential or confirmed DNAPL contamination identify suitable characterization technologies, screen the technologies for potential application, learn about applications at similar sites, and...

  3. Automated quantitative micro-mineralogical characterization for environmental applications

    USGS Publications Warehouse

    Smith, Kathleen S.; Hoal, K.O.; Walton-Day, Katherine; Stammer, J.G.; Pietersen, K.

    2013-01-01

    Characterization of ore and waste-rock material using automated quantitative micro-mineralogical techniques (e.g., QEMSCAN® and MLA) has the potential to complement traditional acid-base accounting and humidity cell techniques when predicting acid generation and metal release. These characterization techniques, which most commonly are used for metallurgical, mineral-processing, and geometallurgical applications, can be broadly applied throughout the mine-life cycle to include numerous environmental applications. Critical insights into mineral liberation, mineral associations, particle size, particle texture, and mineralogical residence phase(s) of environmentally important elements can be used to anticipate potential environmental challenges. Resources spent on initial characterization result in lower uncertainties of potential environmental impacts and possible cost savings associated with remediation and closure. Examples illustrate mineralogical and textural characterization of fluvial tailings material from the upper Arkansas River in Colorado.

  4. Advanced NDE techniques for quantitative characterization of aircraft

    NASA Technical Reports Server (NTRS)

    Heyman, Joseph S.; Winfree, William P.

    1990-01-01

    Recent advances in nondestructive evaluation (NDE) at NASA Langley Research Center and their applications that have resulted in quantitative assessment of material properties based on thermal and ultrasonic measurements are reviewed. Specific applications include ultrasonic determination of bolt tension, ultrasonic and thermal characterization of bonded layered structures, characterization of composite materials, and disbonds in aircraft skins.

  5. 40 CFR 761.260 - Applicability.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Site Characterization Sampling for PCB Remediation Waste in Accordance with § 761.61(a)(2) § 761.260 Applicability. This subpart provides a method for collecting new data for characterizing a PCB remediation waste...

  6. Snowflake: A Lightweight Portable Stencil DSL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Nathan; Driscoll, Michael; Markley, Charles

    Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and iPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP comparable to, and OpenCL within a factor of 2x of hand-optimized HPGMG, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.
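
    A hand-written loop nest of the kind a stencil micro-compiler might emit helps picture what such a generator produces; the variable-coefficient 7-point operator below is a generic illustration, not Snowflake output, and the names and data layout are hypothetical.

    #include <cstddef>
    #include <vector>

    // Hand-written analogue of generated stencil code: a variable-coefficient
    // 7-point operator applied to the interior of an n^3 mesh (row-major layout).
    void apply_stencil(const std::vector<double>& u, const std::vector<double>& beta,
                       std::vector<double>& out, std::size_t n)
    {
        auto id = [n](std::size_t i, std::size_t j, std::size_t k) {
            return (i * n + j) * n + k;
        };
        #pragma omp parallel for collapse(2) schedule(static)
        for (std::size_t i = 1; i < n - 1; ++i)
            for (std::size_t j = 1; j < n - 1; ++j)
                for (std::size_t k = 1; k < n - 1; ++k)
                    out[id(i, j, k)] =
                        beta[id(i, j, k)] * (u[id(i + 1, j, k)] + u[id(i - 1, j, k)]
                                           + u[id(i, j + 1, k)] + u[id(i, j - 1, k)]
                                           + u[id(i, j, k + 1)] + u[id(i, j, k - 1)]
                                           - 6.0 * u[id(i, j, k)]);
    }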

  7. Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.

    2017-10-20

    Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5–2698v3 processors.
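
    The tall-and-skinny kernel mentioned above amounts to forming a small n x n product from an m x n matrix with m much larger than n. A hedged OpenMP sketch of that structure is shown below; it is not NWChem's tuned kernel, and the array layout and names are assumptions.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hedged sketch of S = A^T * A for a tall-and-skinny, row-major A (m x n, m >> n),
    // e.g. plane-wave coefficients by orbitals. Each thread accumulates a private
    // partial S over a slab of rows, then the partials are reduced.
    void ata(const std::vector<double>& A, std::size_t m, std::size_t n,
             std::vector<double>& S)                       // S is n x n, row-major
    {
        std::fill(S.begin(), S.end(), 0.0);
        #pragma omp parallel
        {
            std::vector<double> local(n * n, 0.0);          // thread-private partial sum
            #pragma omp for schedule(static) nowait
            for (std::ptrdiff_t r = 0; r < static_cast<std::ptrdiff_t>(m); ++r)
                for (std::size_t i = 0; i < n; ++i)
                    for (std::size_t j = i; j < n; ++j)     // S is symmetric
                        local[i * n + j] += A[r * n + i] * A[r * n + j];
            #pragma omp critical
            for (std::size_t k = 0; k < n * n; ++k)
                S[k] += local[k];
        }
        // mirror the upper triangle into the lower triangle
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < i; ++j)
                S[i * n + j] = S[j * n + i];
    }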

  8. Snowflake: A Lightweight Portable Stencil DSL

    DOE PAGES

    Zhang, Nathan; Driscoll, Michael; Markley, Charles; ...

    2017-05-01

    Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and iPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP comparable to, and OpenCL within a factor of 2x of hand-optimized HPGMG, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.

  9. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    PubMed Central

    Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin

    2018-01-01

    Abstract Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
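
    MCL alternates expansion (matrix squaring) with inflation (element-wise powering followed by column re-normalization). The sketch below shows the inflation step on a dense, column-major matrix with OpenMP over columns; HipMCL itself operates on distributed sparse matrices, so this is only a structural illustration with hypothetical names.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hedged sketch of MCL inflation: every entry of a column is raised to the
    // inflation power r and the column is re-normalized to sum to one. Columns are
    // independent, so the outer loop parallelizes trivially with OpenMP.
    void inflate(std::vector<double>& M, std::size_t n, double r)   // M is n x n, column-major
    {
        #pragma omp parallel for schedule(static)
        for (std::ptrdiff_t j = 0; j < static_cast<std::ptrdiff_t>(n); ++j) {
            double colsum = 0.0;
            for (std::size_t i = 0; i < n; ++i) {
                double v = std::pow(M[j * n + i], r);
                M[j * n + i] = v;
                colsum += v;
            }
            if (colsum > 0.0)
                for (std::size_t i = 0; i < n; ++i)
                    M[j * n + i] /= colsum;
        }
    }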

  10. High-Performance Reactive Particle Tracking with Adaptive Representation

    NASA Astrophysics Data System (ADS)

    Schmidt, M.; Benson, D. A.; Pankavich, S.

    2017-12-01

    Lagrangian particle tracking algorithms have been shown to be effective tools for modeling chemical reactions in imperfectly-mixed media. One disadvantage of these algorithms is the possible need to employ large numbers of particles in simulations, depending on the concentration covariance structure, and these large particle numbers can lead to long computation times. Two distinct approaches have recently arisen to overcome this. One method employs spatial kernels that are related to a specified, reduced particle number; however, over-wide kernels, dictated by a very low particle number, lead to an excess of reaction calculations and cause a reduction in performance. Another formulation involves hybrid particles that carry multiple species of reactant, wherein each particle is treated as its own well-mixed volume, obviating the need for large numbers of particles for each species but still requiring a fixed number of hybrid particles. Here, we combine these two approaches and demonstrate an improved method for simulating a given system in a computationally efficient manner. Additionally, the independent nature of transport and reaction calculations in this approach allows for significant gains via parallelization in an MPI or OpenMP context. For benchmarking, we choose a CO2 injection simulation with dissolution and precipitation of calcite and dolomite, allowing us to derive the proper treatment of interaction between solid and aqueous phases.

  11. Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing

    NASA Astrophysics Data System (ADS)

    Adnan; Sato, Mitsuhisa

    Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, which is information on the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost first work stealing strategy used in StackThread/MP, and the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well in irregular workloads such as in UTS benchmarks, as well as in regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and Sparse-LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible than the latter; this strategy can avoid load imbalance due to overstealing.
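
    The grain-size problem this work targets can be seen in a plain OpenMP task code: without a sequential cutoff, the runtime is swamped by tiny tasks. The Fibonacci sketch below illustrates that baseline only; it is not the dynamic-length work-stealing strategy proposed in the paper, and the cutoff value is arbitrary.

    #include <cstdio>

    // Plain OpenMP task version of Fibonacci with a sequential cutoff that keeps
    // small subproblems inline instead of spawning tasks for them.
    static long fib(long n)
    {
        if (n < 2) return n;
        if (n < 20) return fib(n - 1) + fib(n - 2);   // cutoff: run small work inline
        long x, y;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait
        return x + y;
    }

    int main()
    {
        long result = 0;
        #pragma omp parallel
        #pragma omp single nowait
        result = fib(35);
        std::printf("fib(35) = %ld\n", result);
        return 0;
    }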

  12. Hot Chips and Hot Interconnects for High End Computing Systems

    NASA Technical Reports Server (NTRS)

    Saini, Subhash

    2005-01-01

    I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power 3 and Power 4 used in IBM SP3 and SP4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters respectively; 4. IBM System-on-a-Chip used in IBM BlueGene/L; 5. HP Alpha EV68 processor used in DOE ASCI Q cluster; 6. SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, which is used in NEC SX-6/7; 8. Power 4+ processor, which is used in Hitachi SR11000; 9. NEC proprietary processor, which is used in Earth Simulator. The IBM POWER5 and Red Storm Computing Systems will also be discussed. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on the latest NAS Parallel Benchmark results (MPI, OpenMP/HPF, and hybrid MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing (quantum computing, DNA computing, cellular engineering, and neural networks).

  13. Study on Development of 1D-2D Coupled Real-time Urban Inundation Prediction model

    NASA Astrophysics Data System (ADS)

    Lee, Seungsoo

    2017-04-01

    In recent years, abnormal weather conditions caused by climate change have been occurring around the world, and countermeasures for flood defense are therefore an urgent task. This research concerns the development of a 1D-2D coupled real-time urban inundation prediction model using predicted precipitation data based on remote sensing technology. A one-dimensional (1D) sewerage system analysis model, introduced by Lee et al. (2015), is used to simulate inlet and overflow phenomena by interacting with surface flows as well as flows in conduits. A two-dimensional (2D) grid mesh refinement method is applied to depict road networks while maintaining effective calculation times. The 2D surface model is coupled with the 1D sewerage analysis model in order to consider bi-directional flow between the two. In addition, a parallel computing method, OpenMP, is applied to reduce calculation time. The model is evaluated by applying it to the 25 August 2014 extreme rainfall event, which caused severe inundation damage in Busan, Korea. The Oncheoncheon basin is selected as the study basin, and observed radar data are used in place of predicted rainfall data. The model shows acceptable calculation speed and accuracy. It is therefore expected that the model can be used in a real-time urban inundation forecasting system to minimize damages.

  14. Optimization of computations for adjoint field and Jacobian needed in 3D CSEM inversion

    NASA Astrophysics Data System (ADS)

    Dehiya, Rahul; Singh, Arun; Gupta, Pravin K.; Israil, M.

    2017-01-01

    We present the features and results of a newly developed code, based on the Gauss-Newton optimization technique, for solving the three-dimensional Controlled-Source Electromagnetic inverse problem. In this code, special emphasis has been put on representing the operations by block matrices for the conjugate gradient iteration. We show how, in the computation of the Jacobian, the matrix formed by differentiation of the system matrix can be made independent of frequency to optimize the operations at the conjugate gradient step. Coarse-level parallel computing, using the OpenMP framework, is used primarily due to its simplicity of implementation and the accessibility of shared-memory multi-core computing machines to almost anyone. We demonstrate how the coarseness of the modeling grid in comparison to the source (computational receiver) spacing can be exploited for efficient computing, without compromising the quality of the inverted model, by reducing the number of adjoint calls. It is also demonstrated that the adjoint field can even be computed on a grid coarser than the modeling grid without affecting the inversion outcome. These observations were reconfirmed using an experiment design in which the deviation of the source from a straight tow line is considered. Finally, a real field data inversion experiment is presented to demonstrate the robustness of the code.
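
    Each conjugate-gradient iteration is dominated by matrix-vector products, which is where coarse OpenMP parallelism naturally goes. The dense, row-major sketch below only illustrates that placement; the actual code works with block matrices, and the names here are hypothetical.

    #include <cstddef>
    #include <vector>

    // Hedged sketch of the kernel inside each conjugate-gradient iteration:
    // y = A * x with rows of A distributed over the OpenMP thread team.
    void matvec(const std::vector<double>& A, const std::vector<double>& x,
                std::vector<double>& y, std::size_t rows, std::size_t cols)
    {
        #pragma omp parallel for schedule(static)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(rows); ++i) {
            double acc = 0.0;
            for (std::size_t j = 0; j < cols; ++j)
                acc += A[i * cols + j] * x[j];
            y[i] = acc;
        }
    }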

  15. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core (TM) 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus one CPU core, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  16. Code Modernization of VPIC

    NASA Astrophysics Data System (ADS)

    Bird, Robert; Nystrom, David; Albright, Brian

    2017-10-01

    The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable performance, and cache-oblivious algorithms, the problem still remains largely unsolved. In this work we demonstrate how a focus on platform-agnostic modern code development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsics-based vectorization with compiler-generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU-capable OpenMP variant of VPIC. Finally, we include lessons learnt. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  17. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    DOE PAGES

    Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.; ...

    2018-01-05

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.

  18. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.

    Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.

  19. Hybrid Parallel Contour Trees, Version 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sewell, Christopher; Fasel, Patricia; Carr, Hamish

    A common operation in scientific visualization is to compute and render a contour of a data set. Given a function of the form f : R^d -> R, a level set is defined as an inverse image f^-1(h) for an isovalue h, and a contour is a single connected component of a level set. The Reeb graph can then be defined to be the result of contracting each contour to a single point, and is well defined for Euclidean spaces or for general manifolds. For simple domains, the graph is guaranteed to be a tree, and is called the contour tree. Analysis can then be performed on the contour tree in order to identify isovalues of particular interest, based on various metrics, and render the corresponding contours, without having to know such isovalues a priori. This code is intended to be the first data-parallel algorithm for computing contour trees. Our implementation will use the portable data-parallel primitives provided by Nvidia’s Thrust library, allowing us to compile our same code for both GPUs and multi-core CPUs. Native OpenMP and purely serial versions of the code will likely also be included. It will also be extended to provide a hybrid data-parallel / distributed algorithm, allowing scaling beyond a single GPU or CPU.

  20. QR-decomposition based SENSE reconstruction using parallel architecture.

    PubMed

    Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad

    2018-04-01

    Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advanced MRI algorithms on a parallel architecture (to exploit inherent parallelism) has a great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies inversion of a rectangular encoding matrix. This work presents a novel implementation of a GPU-based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU-based SENSE reconstruction is evaluated against single-core and multi-core CPU implementations using OpenMP. Several experiments against various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that the GPU significantly reduces the computation time of SENSE reconstruction as compared to the multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.
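
    In Cartesian SENSE, each aliased pixel location is unfolded by solving a small overdetermined system built from the coil sensitivities, and the locations are mutually independent. The sketch below shows how such solves map onto an OpenMP loop on the CPU side; the QR least-squares routine is left as a hypothetical stub, and this is not the paper's implementation.

    #include <complex>
    #include <cstddef>
    #include <vector>

    using cplx = std::complex<double>;

    // Hypothetical stub: solve the ncoils x R least-squares system (e.g. via QR)
    // for one aliased location; E holds coil sensitivities, b the folded samples.
    std::vector<cplx> solve_ls_qr(const std::vector<cplx>& E, const std::vector<cplx>& b,
                                  std::size_t ncoils, std::size_t R)
    {
        (void)E; (void)b; (void)ncoils;
        return std::vector<cplx>(R, cplx(0.0, 0.0));   // placeholder result
    }

    // Hedged sketch of the CPU/OpenMP side of SENSE unfolding: aliased pixel
    // locations are independent, so the outer loop parallelizes directly.
    void sense_unfold(std::size_t npixels, std::size_t ncoils, std::size_t R,
                      const std::vector<std::vector<cplx>>& E_per_pixel,
                      const std::vector<std::vector<cplx>>& folded_per_pixel,
                      std::vector<std::vector<cplx>>& unfolded)
    {
        #pragma omp parallel for schedule(dynamic)
        for (std::ptrdiff_t p = 0; p < static_cast<std::ptrdiff_t>(npixels); ++p)
            unfolded[p] = solve_ls_qr(E_per_pixel[p], folded_per_pixel[p], ncoils, R);
    }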

  1. The characterization of weighted local hardy spaces on domains and its application.

    PubMed

    Wang, Heng-geng; Yang, Xiao-ming

    2004-09-01

    In this paper, we give four equivalent characterizations of the weighted local Hardy spaces on Lipschitz domains. We also give their application to harmonic functions defined on bounded Lipschitz domains.

  2. Preparation and characterization of Sb2Se3 devices for memory applications

    NASA Astrophysics Data System (ADS)

    Shylashree, N.; Uma B., V.; Dhanush, S.; Abachi, Sagar; Nisarga, A.; Aashith, K.; Sangeetha B., G.

    2018-05-01

    In this paper, a phase-change material, Sb2Se3, is proposed for non-volatile memory applications. Thin-film device preparation and characterization were carried out. The deposition method used was the vapor evaporation technique, and a film thickness of 180 nm was deposited. The switching between the SET and RESET states is shown by the I-V characterization. The change of phase was studied using R-V characterization. Different fundamental modes were also identified using Raman spectroscopy.

  3. Invited article: Dielectric material characterization techniques and designs of high-Q resonators for applications from micro to millimeter-waves frequencies applicable at room and cryogenic temperatures.

    PubMed

    Le Floch, Jean-Michel; Fan, Y; Humbert, Georges; Shan, Qingxiao; Férachou, Denis; Bara-Maillet, Romain; Aubourg, Michel; Hartnett, John G; Madrangeas, Valerie; Cros, Dominique; Blondy, Jean-Marc; Krupka, Jerzy; Tobar, Michael E

    2014-03-01

    Dielectric resonators are key elements in many applications in micro to millimeter wave circuits, including ultra-narrow band filters and frequency-determining components for precision frequency synthesis. Distributed-layered and bulk low-loss crystalline and polycrystalline dielectric structures have become very important for building these devices. Proper design requires careful electromagnetic characterization of low-loss material properties. This includes exact simulation with precision numerical software and precise measurements of resonant modes. For example, we have developed the Whispering Gallery mode technique for microwave applications, which has now become the standard for characterizing low-loss structures. This paper will give some of the most common characterization techniques used in the micro to millimeter wave regime at room and cryogenic temperatures for designing high-Q dielectric loaded cavities.

  4. Instruction-level performance modeling and characterization of multimedia applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Y.; Cameron, K.W.

    1999-06-01

    One of the challenges for characterizing and modeling realistic multimedia applications is the lack of access to source codes. On-chip performance counters effectively resolve this problem by monitoring run-time behaviors at the instruction level. This paper presents a novel technique of characterizing and modeling workloads at the instruction level for realistic multimedia applications using hardware performance counters. A variety of instruction counts are collected from some multimedia applications, such as RealPlayer, GSM Vocoder, MPEG encoder/decoder, and speech synthesizer. These instruction counts can be used to form a set of abstract characteristic parameters directly related to a processor's architectural features. Based on microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvement for certain workloads. The biggest advantage of this new characterization technique is a better understanding of processor utilization efficiency and architectural bottleneck for each application. This technique also provides predictive insight into future architectural enhancements and their effect on current codes. In this paper the authors also attempt to model architectural effects on processor utilization without memory influence. They derive formulas for calculating CPI_0, the CPI without memory effects, and they quantify utilization of architectural parameters. These equations are architecturally diagnostic and predictive in nature. Results provide promise in code characterization and empirical/analytical modeling.

  5. Real-time X-ray Diffraction: Applications to Materials Characterization

    NASA Technical Reports Server (NTRS)

    Rosemeier, R. G.

    1984-01-01

    With the high-speed growth of materials, it becomes necessary to develop measuring systems which also have the capability of characterizing these materials at high speeds. One of the conventional techniques for characterizing materials is X-ray diffraction. Film, which is the oldest method of recording the X-ray diffraction phenomenon, is not quite adequate in most circumstances to record fast changing events. Even though conventional proportional counters and scintillation counters can provide the speed necessary to record these changing events, they lack the ability to provide image information which may be important in some types of experiment or production arrangements. A selected number of novel applications of X-ray diffraction to characterize materials in real-time are discussed. Also, device characteristics of some X-ray intensifiers useful in instantaneous X-ray diffraction applications are briefly presented. Real-time X-ray diffraction experiments with the incorporation of image X-ray intensification add a new dimension to the characterization of materials. The uses of real-time image intensification in laboratory and production arrangements are quite unlimited, and their application depends more upon the ingenuity of the scientist or engineer.

  6. Applications of Surface Plasmon Resonance for Characterization of Molecules Important in the Pathogenesis and Treatment of Neurodegenerative Diseases

    PubMed Central

    Wittenberg, Nathan J.; Wootla, Bharath; Jordan, Luke R.; Denic, Aleksandar; Warrington, Arthur E.; Oh, Sang-Hyun; Rodriguez, Moses

    2014-01-01

    Characterization of binding kinetics and affinity between a potential new drug and its receptor are key steps in the development of new drugs. Among the techniques available to determine binding affinities, surface plasmon resonance has emerged as the gold standard because it can measure binding and dissociation rates in real-time in a label-free fashion. Surface plasmon resonance is now finding applications in the characterization of molecules for treatment of neurodegenerative diseases, characterization of molecules associated with pathogenesis of neurodegenerative diseases and detection of neurodegenerative disease biomarkers. In addition it has been used in the characterization of a new class of natural autoantibodies that have therapeutic potential in a number of neurologic diseases. In this review we will introduce surface plasmon resonance and describe some applications of the technique that pertain to neurodegenerative disorders and their treatment. PMID:24625008

  7. OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

    NASA Astrophysics Data System (ADS)

    Zaghi, S.

    2014-07-01

    OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within the Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows. Programming language: OFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising efficiency. Parallel frameworks supported: the development of OFF has also been targeted to maximize computational efficiency; the code is designed to run on shared-memory multi-core workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on the Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms. Usability, maintenance and enhancement: in order to improve the usability, maintenance and enhancement of the code, the documentation has also been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed); these comments are parsed by means of the doxygen free software, producing high-quality HTML and LaTeX documentation pages; the distributed versioning system git has been adopted in order to facilitate collaborative maintenance and improvement of the code. Copyrights: OFF is free software that anyone can use, copy, distribute, study, change and improve under the GNU Public License version 3. The present paper is a manifesto of the OFF code and presents the currently implemented features and ongoing developments. This work is focused on the computational techniques adopted, and a detailed description of the main API characteristics is reported. OFF capabilities are demonstrated by means of one- and two-dimensional examples and a three-dimensional real application.

  8. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction of the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel Benchmarks (NPB). In this paper, we present some interesting performance results of our OpenMP parallel implementation on different architectures such as the SGI Origin2000, SGI Altix, and Cray MTA-2.

  9. List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

    NASA Astrophysics Data System (ADS)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading, and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.

  10. Numerical Aspects of Nonhydrostatic Implementations Applied to a Parallel Finite Element Tsunami Model

    NASA Astrophysics Data System (ADS)

    Fuchs, A.; Androsov, A.; Harig, S.; Hiller, W.; Rakowsky, N.

    2012-04-01

    Given the threat of devastating tsunamis and the unpredictability of such events, tsunami modelling as part of warning systems remains a contemporary topic. The tsunami group of the Alfred Wegener Institute developed the simulation tool TsunAWI as a contribution to the Early Warning System in Indonesia. Although the precomputed scenarios for this purpose yield satisfactory deliverables, the study of further improvements continues. While TsunAWI is governed by the Shallow Water Equations, an extension of the model is based on a nonhydrostatic approach. At the arrival of a tsunami wave in coastal regions with rough bathymetry, the term containing the nonhydrostatic part of the pressure, which is neglected in the original hydrostatic model, gains in importance. By taking this term into account, a better approximation of the wave is expected. Differences between hydrostatic and nonhydrostatic model results are contrasted in the standard benchmark problem of a solitary wave runup on a plane beach. The observation data provided by Titov and Synolakis (1995) serve as reference. The nonhydrostatic approach implies a set of equations that are similar to the Shallow Water Equations, so the variation of the code can be implemented on top. However, these additional routines introduce a number of issues that must be addressed. So far, the computations of the model were purely explicit. In the nonhydrostatic version, the determination of an additional unknown and the solution of a large sparse system of linear equations are necessary. The latter constitutes the lion's share of the computing time and memory requirement. Since the corresponding matrix is only symmetric in structure and not in values, an iterative Krylov subspace method is used, in particular the restarted Generalized Minimal Residual algorithm GMRES(m). With regard to optimization, we present a comparison of several combinations of sequential and parallel preconditioning techniques with respect to the number of iterations and the setup/application time. Since the software package pARMS 3.2, which provides the solver and preconditioning techniques, works via MPI parallelism, we adapted TsunAWI in an auxiliary branch and switched from OpenMP to MPI, paying particular attention to internal partition management.

  11. PhysiCell: An open source physics-based cell simulator for 3-D multicellular systems

    PubMed Central

    Ghaffarizadeh, Ahmadreza; Mumenthaler, Shannon M.

    2018-01-01

    Many multicellular systems problems can only be understood by studying how cells move, grow, divide, interact, and die. Tissue-scale dynamics emerge from systems of many interacting cells as they respond to and influence their microenvironment. The ideal “virtual laboratory” for such multicellular systems simulates both the biochemical microenvironment (the “stage”) and many mechanically and biochemically interacting cells (the “players” upon the stage). PhysiCell—physics-based multicellular simulator—is an open source agent-based simulator that provides both the stage and the players for studying many interacting cells in dynamic tissue microenvironments. It builds upon a multi-substrate biotransport solver to link cell phenotype to multiple diffusing substrates and signaling factors. It includes biologically-driven sub-models for cell cycling, apoptosis, necrosis, solid and fluid volume changes, mechanics, and motility “out of the box.” The C++ code has minimal dependencies, making it simple to maintain and deploy across platforms. PhysiCell has been parallelized with OpenMP, and its performance scales linearly with the number of cells. Simulations up to 10^5-10^6 cells are feasible on quad-core desktop workstations; larger simulations are attainable on single HPC compute nodes. We demonstrate PhysiCell by simulating the impact of necrotic core biomechanics, 3-D geometry, and stochasticity on the dynamics of hanging drop tumor spheroids and ductal carcinoma in situ (DCIS) of the breast. We demonstrate stochastic motility, chemical and contact-based interaction of multiple cell types, and the extensibility of PhysiCell with examples in synthetic multicellular systems (a “cellular cargo delivery” system, with application to anti-cancer treatments), cancer heterogeneity, and cancer immunology. PhysiCell is a powerful multicellular systems simulator that will be continually improved with new capabilities and performance improvements. It also represents a significant independent code base for replicating results from other simulation platforms. The PhysiCell source code, examples, documentation, and support are available under the BSD license at http://PhysiCell.MathCancer.org and http://PhysiCell.sf.net.
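
    The reported linear scaling with cell count is what one expects when per-cell updates within a time step are independent and shared among threads. The sketch below illustrates that pattern only; it is not PhysiCell's actual API, and the toy phenotype model is an assumption.

    #include <cstddef>
    #include <vector>

    // Hedged sketch (not PhysiCell's API): within one phenotype time step, each
    // agent's update is independent, so the agent loop parallelizes with OpenMP
    // and runtime grows roughly linearly with the number of cells.
    struct Cell {
        double volume = 1.0;
        double growth_rate = 0.01;
        void update_phenotype(double dt) { volume += growth_rate * dt; }  // toy model
    };

    void step(std::vector<Cell>& cells, double dt)
    {
        #pragma omp parallel for schedule(static)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(cells.size()); ++i)
            cells[i].update_phenotype(dt);
    }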

  12. NUTS and BOLTS: Applications of Fluorescence Detected Sedimentation

    PubMed Central

    Kroe, Rachel R.; Laue, Thomas M.

    2008-01-01

    Analytical ultracentrifugation is a widely used method for characterizing the solution behavior of macromolecules. However, the two commonly used detectors (absorbance and interference) impose some fundamental restrictions on the concentrations and complexity of the solutions that can be analyzed. The recent addition of a fluorescence detector for the XL-I analytical ultracentrifuge (AU-FDS) enables two different types of sedimentation experiments. First, the AU-FDS can detect picomolar concentrations of labeled solutes allowing the characterization of very dilute solutions of macromolecules, applications we call Normal Use Tracer Sedimentation (NUTS). The great sensitivity of NUTS analysis allows the characterization of small quantities of materials and high affinity interactions. Second, AU-FDS allows characterization of trace quantities of labeled molecules in solutions containing high concentrations and complex mixtures of unlabeled molecules, applications we call Biological On Line Tracer Sedimentation (BOLTS). The discrimination of BOLTS enables the size distribution of a labeled macromolecule to be determined in biological milieu such as cell lysates and serum. Examples are presented that embody features of both NUTS and BOLTS applications, along with our observations on these applications. PMID:19103145

  13. Capillary electrophoresis application in metal speciation and complexation characterization

    USDA-ARS?s Scientific Manuscript database

    Capillary electrophoresis is amenable to the separation of metal ionic species and the characterization of metal-ligand interactions. This book chapter reviews and discusses three representative case studies in applications of CE technology in speciation and reactions of metal with organic molecules...

  14. 40 CFR 761.260 - Applicability.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) TOXIC SUBSTANCES CONTROL ACT... Site Characterization Sampling for PCB Remediation Waste in Accordance with § 761.61(a)(2) § 761.260 Applicability. This subpart provides a method for collecting new data for characterizing a PCB remediation waste...

  15. 40 CFR 761.260 - Applicability.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) TOXIC SUBSTANCES CONTROL ACT... Site Characterization Sampling for PCB Remediation Waste in Accordance with § 761.61(a)(2) § 761.260 Applicability. This subpart provides a method for collecting new data for characterizing a PCB remediation waste...

  16. 40 CFR 761.260 - Applicability.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) TOXIC SUBSTANCES CONTROL ACT... Site Characterization Sampling for PCB Remediation Waste in Accordance with § 761.61(a)(2) § 761.260 Applicability. This subpart provides a method for collecting new data for characterizing a PCB remediation waste...

  17. 40 CFR 761.260 - Applicability.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) TOXIC SUBSTANCES CONTROL ACT... Site Characterization Sampling for PCB Remediation Waste in Accordance with § 761.61(a)(2) § 761.260 Applicability. This subpart provides a method for collecting new data for characterizing a PCB remediation waste...

  18. Design and implementation of handheld and desktop software for the structured reporting of hepatic masses using the LI-RADS schema.

    PubMed

    Clark, Toshimasa J; McNeeley, Michael F; Maki, Jeffrey H

    2014-04-01

    The Liver Imaging Reporting and Data System (LI-RADS) can enhance communication between radiologists and clinicians if applied consistently. We identified an institutional need to improve liver imaging report standardization and developed handheld and desktop software to serve this purpose. We developed two complementary applications that implement the LI-RADS schema. A mobile application for iOS devices written in the Objective-C language allows for rapid characterization of hepatic observations under a variety of circumstances. A desktop application written in the Java language allows for comprehensive observation characterization and standardized report text generation. We chose the applications' languages and feature sets based on the computing resources of target platforms, anticipated usage scenarios, and ease of application installation, deployment, and updating. Our primary results are the publication of the core source code implementing the LI-RADS algorithm and the availability of the applications for use worldwide via our website, http://www.liradsapp.com/. The Java application is free open-source software that can be integrated into nearly any vendor's reporting system. The iOS application is distributed through Apple's iTunes App Store. Observation categorizations by both programs have been manually validated to be correct. The iOS application has been used to characterize liver tumors during multidisciplinary conferences at our institution, and several faculty members, fellows, and residents have adopted the generated text of the Java application into their diagnostic reports. Although these two applications were developed for the specific reporting requirements of our liver tumor service, we intend to apply this development model to other diseases as well. Through semiautomated structured report generation and observation characterization, we aim to improve patient care while increasing radiologist efficiency. Published by Elsevier Inc.

  19. Linear polarizer local characterizations by polarimetric imaging for applications to polarimetric sensors for torque measurement for hybrid cars

    NASA Astrophysics Data System (ADS)

    Georges, F.; Remouche, M.; Meyrueis, P.

    2011-06-01

    Usually, manufacturers' specifications do not address the ability of linear sheet polarizers to maintain a constant transmittance function over their geometric area. This property is fundamental for developing low-cost polarimetric sensors (for instance, rotation, torque, and displacement sensors), specifically for hybrid cars (thermal + electric power). It is therefore necessary to specially characterize commercial polarizer sheets to determine whether they are suited to this kind of application. In this paper, we present the measurement methods and bench developed for this purpose, along with some preliminary characterization results. We state conclusions for effective application to hybrid car gearbox control and monitoring.

  20. Application of metamaterial concepts to sensors and chipless RFID

    NASA Astrophysics Data System (ADS)

    Martín, F.; Herrojo, C.; Vélez, P.; Su, L.; Mata-Contreras, J.; Paredes, F.

    2018-02-01

    Several strategies for the implementation of microwave sensors based on the use of metamaterial-inspired resonators are pointed out, and examples of applications, including sensors for dielectric characterization and sensors for the measurement of spatial variables, are provided. It will also be shown that novel microwave encoders for chipless RFID systems with very high data capacity can be implemented. The fields of application of the devices discussed in this talk include dielectric characterization of solids and liquids, angular velocity sensors for space applications, and near-field chipless RFID systems for secure paper applications, among others.

  1. Thyroid tissue constituents characterization and application to in vivo studies by broadband (600-1200 nm) diffuse optical spectroscopy

    NASA Astrophysics Data System (ADS)

    Konugolu Venkata Sekar, Sanathana; Farina, Andrea; dalla Mora, Alberto; Taroni, Paola; Lindner, Claus; Mora, Mireia; Farzam, Parisa; Pagliazzi, Marco; Squarcia, Mattia; Halperin, Irene; Hanzu, Felicia A.; Dehghani, Hamid; Durduran, Turgut; Pifferi, Antonio

    2017-07-01

    We present the first broadband (600-1100 nm) diffuse optical characterization of thyroglobulin and tyrosine, which are thyroid-specific tissue constituents. In-vivo measurements at the thyroid region enabled their quantification for functional and diagnostic applications.

  2. Physico-chemical characterization of functionalized polypropylenic fibers for prosthetic applications

    NASA Astrophysics Data System (ADS)

    Nisticò, Roberto; Faga, Maria Giulia; Gautier, Giovanna; Magnacca, Giuliana; D'Angelo, Domenico; Ciancio, Emanuele; Piacenza, Giacomo; Lamberti, Roberta; Martorana, Selanna

    2012-08-01

    Polypropylene (PP) fibers can be manufactured to form nets which can find application as prostheses in hernioplasty. One of the most important problems to deal with when nets are applied in vivo is the proliferation of bacteria at the intersections of the net fibers. This occurs right after the application of the prosthesis and causes infections; it is therefore fundamental to remove bacteria in the very early stage of the net's application. This paper deals with the physico-chemical characterization of such nets, pre-treated with an atmospheric pressure plasma dielectric barrier discharge apparatus (APP-DBD) and functionalized with an antibiotic agent such as chitosan. The physico-chemical characterization of sterilized nets, before and after the functionalization with chitosan, was carried out by means of scanning electron microscopy (SEM) coupled with EDS spectroscopy, FTIR spectroscopy, drop shape analysis (DSA), X-ray diffraction and thermal analyses (TGA and DSC). The aim of the work is to identify a good strategy for characterizing this kind of material, to understand the effects of the polypropylene pre-treatment on functionalization efficiency, and to follow the ageing of the materials in order to study the effects of the surface treatment for in vivo applications.

  3. Energy technology characterizations handbook: environmental pollution and control factors. Third edition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-03-01

    This Handbook deals with environmental characterization information for a range of energy-supply systems and provides supplementary information on environmental controls applicable to a select group of environmentally characterized energy systems. Environmental residuals, physical-resource requirements, and discussion of applicable standards are the principal information provided. The quantitative and qualitative data provided are useful for evaluating alternative policy and technical strategies and for assessing the environmental impact of facility siting, energy production, and environmental controls.

  4. Efficient in-situ visualization of unsteady flows in climate simulation

    NASA Astrophysics Data System (ADS)

    Vetter, Michael; Olbrich, Stephan

    2017-04-01

    The simulation of climate data tends to produce very large data sets, which can hardly be processed in classical post-processing visualization applications. Typically, the visualization pipeline, consisting of the processes of data generation, visualization mapping, and rendering, is distributed into two parts over the network or separated via file transfer. Within most traditional post-processing scenarios, the simulation is done on a supercomputer whereas the data analysis and visualization is done on a graphics workstation. That way, temporary data sets of huge volume have to be transferred over the network, which leads to bandwidth bottlenecks and volume limitations. The solution to this issue is the avoidance of temporary storage, or at least a significant reduction of data complexity. Within the Climate Visualization Lab - as part of the Cluster of Excellence "Integrated Climate System Analysis and Prediction" (CliSAP) at the University of Hamburg, in cooperation with the German Climate Computing Center (DKRZ) - we develop and integrate an in-situ approach. Our software framework DSVR is based on the separation of the process chain between the mapping and the rendering processes. It couples the mapping process directly to the simulation by calling methods of a parallelized data extraction library, which creates a time-based sequence of geometric 3D scenes. This sequence is stored on a special streaming server with an interactive post-filtering option and then played out asynchronously in a separate 3D viewer application. Since the rendering is part of this viewer application, the scenes can be navigated interactively. In contrast to other in-situ approaches where 2D images are created as part of the simulation or synchronous co-visualization takes place, our method supports interaction in 3D space and in time, as well as fixed frame rates. To integrate in-situ processing based on our DSVR framework and methods in the ICON climate model, we are continuously evolving the data structures and mapping algorithms of the framework to support the ICON model's native grid structures, since DSVR was originally designed for rectilinear grids only. We have now implemented a new output module for ICON to take advantage of the DSVR visualization. The visualization can be configured, like most output modules, by using a specific namelist and is exemplarily integrated within the non-hydrostatic atmospheric model time loop. With the integration of a DSVR-based in-situ pathline extraction within ICON, a further milestone is reached. The pathline algorithm as well as the grid data structures have been optimized for the domain decomposition used for the parallelization of ICON based on MPI and OpenMP. The software implementation and evaluation is done on the supercomputers at DKRZ. In principle, the data complexity is reduced from O(n³) to O(m), where n is the grid resolution and m the number of supporting points of all pathlines. The stability and scalability evaluation is done using Atmospheric Model Intercomparison Project (AMIP) runs. We will give a short introduction to our software framework, as well as a short overview of the implementation and usage of DSVR within ICON. Furthermore, we will present visualization and evaluation results of sample applications.
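
    The abstract describes an MPI/OpenMP-parallel pathline extraction inside ICON but gives no source code, so the following is only a minimal sketch of the embarrassingly parallel part of such an extraction: each seed point is advanced independently with a forward Euler step. The velocity sampler sample_velocity() is a hypothetical stand-in for interpolation on the model grid; all names are illustrative.

      /* Minimal sketch (not the DSVR/ICON implementation): advancing a set of
       * pathline seed points one time step in parallel with OpenMP.
       * Compile with: cc -fopenmp pathlines.c */
      #include <omp.h>
      #include <stdio.h>

      typedef struct { double x, y, z; } vec3;

      /* Hypothetical velocity field: a simple solid-body rotation around z. */
      static vec3 sample_velocity(vec3 p) {
          vec3 v = { -p.y, p.x, 0.1 };
          return v;
      }

      static void advect(vec3 *pts, int n, double dt) {
          /* Each pathline is independent, so a plain parallel for suffices. */
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; ++i) {
              vec3 v = sample_velocity(pts[i]);       /* forward Euler step */
              pts[i].x += dt * v.x;
              pts[i].y += dt * v.y;
              pts[i].z += dt * v.z;
          }
      }

      int main(void) {
          enum { N = 1000 };
          vec3 pts[N];
          for (int i = 0; i < N; ++i) pts[i] = (vec3){ i * 0.001, 0.0, 0.0 };
          for (int step = 0; step < 100; ++step) advect(pts, N, 0.01);
          printf("pathline 0 ends at (%f, %f, %f)\n", pts[0].x, pts[0].y, pts[0].z);
          return 0;
      }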

  5. Resistance to abrasion of extrinsic porcelain esthetic characterization techniques.

    PubMed

    Chi, Woo J; Browning, William; Looney, Stephen; Mackert, J Rodway; Windhorn, Richard J; Rueggeberg, Frederick

    2017-01-01

    A novel esthetic porcelain characterization technique involves mixing an appropriate amount of ceramic colorants with clear, low-fusing porcelain (LFP), applying the mixture on the external surfaces, and firing the combined components onto the surface of restorations in a porcelain oven. This method may provide better esthetic qualities and toothbrush abrasion resistance compared to the conventional techniques of applying color-corrective porcelain colorants alone, or applying a clear glaze layer over the colorants. However, there is no scientific literature to support this claim. This research evaluated toothbrush abrasion resistance of a novel porcelain esthetic characterization technique by subjecting specimens to various durations of simulated toothbrush abrasion. The results were compared to those obtained using the conventional characterization techniques of colorant application only or colorant followed by placement of a clear over-glaze. Four experimental groups, all of which were a leucite reinforced ceramic of E TC1 (Vita A1) shade, were prepared and fired in a porcelain oven according to the manufacturer's instructions. Group S (stain only) was characterized by application of surface colorants to provide a definitive shade of Vita A3.5. Group GS (glaze over stain) was characterized by application of a layer of glaze over the existing colorant layer as used for Group S. Group SL (stain+LFP) was characterized by application of a mixture of colorants and clear low-fusing add-on porcelain to provide a definitive shade of Vita A3.5. Group C (Control) was used as a control without any surface characterization. The 4 groups were subjected to mechanical toothbrushing using a 1:1 water-to-toothpaste solution for a simulated duration of 32 years of clinical use. The amount of wear was measured at time intervals simulating every 4 years of toothbrushing. These parameters were evaluated longitudinally for all groups as well as compared at similar time points among groups. In this study, the novel external characterization technique (stain+LFP: Group SL) did not significantly enhance the wear resistance against toothbrush abrasion. Instead, the average wear of the applied extrinsic porcelain was 2 to 3 times more than Group S (stain only) and Group GS (glaze over stain). Application of a glaze layer over the colorants (Group GS) showed a significant improvement on wear resistance. Despite its superior physical properties, the leucite reinforced ceramic core (Group C) showed 2 to 4 times more wear when compared with other test groups. A conventional external esthetic characterization technique of applying a glaze layer over the colorants (Group GS) significantly enhanced the surface wear resistance to toothbrush abrasion when compared with other techniques involving application of colorants only (Group S) or mixture of colorant and LFP (Group SL). The underlying core ceramic had significantly less wear resistance compared with all externally characterized specimens. The novel esthetic characterization technique showed more wear and less color stability, and is thus not advocated as the "best" method for surface characterization. Application of a glaze layer provides a more wear-resistant surface from toothbrush abrasion when adjusting or extrinsically characterizing leucite reinforced ceramic restorations. Without the glaze layer, the restoration is subjected to a 2 to 4 times faster rate and amount of wear leading to possible shade mismatch.

  6. Nanoparticle surface characterization and clustering through concentration-dependent surface adsorption modeling.

    PubMed

    Chen, Ran; Zhang, Yuntao; Sahneh, Faryad Darabi; Scoglio, Caterina M; Wohlleben, Wendel; Haase, Andrea; Monteiro-Riviere, Nancy A; Riviere, Jim E

    2014-09-23

    Quantitative characterization of nanoparticle interactions with their surrounding environment is vital for safe nanotechnological development and standardization. A recent quantitative measure, the biological surface adsorption index (BSAI), has demonstrated promising applications in nanomaterial surface characterization and biological/environmental prediction. This paper further advances the approach beyond the application of five descriptors in the original BSAI to address the concentration dependence of the descriptors, enabling better prediction of the adsorption profile and more accurate categorization of nanomaterials based on their surface properties. Statistical analysis on the obtained adsorption data was performed based on three different models: the original BSAI, a concentration-dependent polynomial model, and an infinite dilution model. These advancements in BSAI modeling showed a promising development in the application of quantitative predictive modeling in biological applications, nanomedicine, and environmental safety assessment of nanomaterials.

  7. Ultrathin Ferroelectric Films: Growth, Characterization, Physics and Applications.

    PubMed

    Wang, Ying; Chen, Weijin; Wang, Biao; Zheng, Yue

    2014-09-11

    Ultrathin ferroelectric films have attracted increasing interest in recent years, owing to the need for device miniaturization and their wide spectrum of appealing properties. Recent advances in deposition methods and characterization techniques have largely broadened the scope of experimental research on ultrathin ferroelectric films, driving intensive property studies and promising device applications. This review aims to cover state-of-the-art experimental work on ultrathin ferroelectric films, with a comprehensive survey of growth methods, characterization techniques, important phenomena and properties, as well as device applications. The strongest emphasis is on those aspects intimately related to the unique phenomena and physics of ultrathin ferroelectric films. Prospects and challenges of this field are also highlighted.

  8. Ultrathin Ferroelectric Films: Growth, Characterization, Physics and Applications

    PubMed Central

    Wang, Ying; Chen, Weijin; Wang, Biao; Zheng, Yue

    2014-01-01

    Ultrathin ferroelectric films have attracted increasing interest in recent years, owing to the need for device miniaturization and their wide spectrum of appealing properties. Recent advances in deposition methods and characterization techniques have largely broadened the scope of experimental research on ultrathin ferroelectric films, driving intensive property studies and promising device applications. This review aims to cover state-of-the-art experimental work on ultrathin ferroelectric films, with a comprehensive survey of growth methods, characterization techniques, important phenomena and properties, as well as device applications. The strongest emphasis is on those aspects intimately related to the unique phenomena and physics of ultrathin ferroelectric films. Prospects and challenges of this field are also highlighted. PMID:28788196

  9. Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    NASA Astrophysics Data System (ADS)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.
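
    The “one-thread-one-line” scheme assigns one grid line, i.e. one independent tridiagonal system of the ADI sweep, to each thread. The same decomposition can be illustrated on the CPU side with OpenMP. The sketch below is not the authors' code: sizes and coefficients are illustrative, and the sweep is reduced to constant-coefficient tridiagonal systems solved with the Thomas algorithm.

      /* Minimal sketch of the "one line per thread" idea for an ADI sweep,
       * expressed with OpenMP on the CPU. Each grid line owns an independent
       * tridiagonal system. Compile with: cc -fopenmp adi_sweep.c */
      #include <omp.h>
      #include <stdio.h>

      #define NX 256
      #define NY 256

      /* Solve a*x[i-1] + b*x[i] + c*x[i+1] = d[i] on one line (Thomas algorithm). */
      static void thomas(double a, double b, double c, double *d, double *x, int n) {
          double cp[NX], dp[NX];
          cp[0] = c / b;
          dp[0] = d[0] / b;
          for (int i = 1; i < n; ++i) {
              double m = b - a * cp[i - 1];
              cp[i] = c / m;
              dp[i] = (d[i] - a * dp[i - 1]) / m;
          }
          x[n - 1] = dp[n - 1];
          for (int i = n - 2; i >= 0; --i) x[i] = dp[i] - cp[i] * x[i + 1];
      }

      int main(void) {
          static double rhs[NY][NX], sol[NY][NX];
          for (int j = 0; j < NY; ++j)
              for (int i = 0; i < NX; ++i) rhs[j][i] = 1.0;

          /* Coarse-grain parallelism: one y-line per iteration, lines are independent. */
          #pragma omp parallel for schedule(static)
          for (int j = 0; j < NY; ++j)
              thomas(-1.0, 4.0, -1.0, rhs[j], sol[j], NX);

          printf("sol[0][0] = %f\n", sol[0][0]);
          return 0;
      }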

  10. Parallel, Distributed Scripting with Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadmin tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000-word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and coordinate the work.
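
    The password-checker example in this record is about Python with message passing, but the underlying structure, every candidate word is independent, is easy to show in shared memory as well. The sketch below restates the idea in C with OpenMP rather than Python/MPI; toy_hash() is a hypothetical stand-in for a real password hash such as crypt(), and the dictionary is a toy list.

      /* Sketch of the dictionary-check idea with OpenMP: every candidate word is
       * hashed and compared with the stored digest, candidates in parallel.
       * Compile with: cc -fopenmp dict_check.c */
      #include <omp.h>
      #include <stdio.h>

      static unsigned long toy_hash(const char *s) {
          unsigned long h = 5381;                 /* djb2-style string hash */
          while (*s) h = h * 33 + (unsigned char)*s++;
          return h;
      }

      int main(void) {
          const char *dictionary[] = { "password", "letmein", "dragon", "qwerty",
                                       "monkey", "hunter2", "admin", "sunshine" };
          const int n = sizeof dictionary / sizeof dictionary[0];
          unsigned long stored = toy_hash("hunter2");  /* digest on file */
          int found = -1;

          /* Each word is independent; threads scan disjoint chunks of the list. */
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; ++i) {
              if (toy_hash(dictionary[i]) == stored) {
                  #pragma omp critical
                  found = i;
              }
          }

          if (found >= 0) printf("weak password: %s\n", dictionary[found]);
          return 0;
      }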

  11. LFRic: Building a new Unified Model

    NASA Astrophysics Data System (ADS)

    Melvin, Thomas; Mullerworth, Steve; Ford, Rupert; Maynard, Chris; Hobson, Mike

    2017-04-01

    The LFRic project, named for Lewis Fry Richardson, aims to develop a replacement for the Met Office Unified Model in order to meet the challenges which will be presented by the next generation of exascale supercomputers. This project, a collaboration between the Met Office, STFC Daresbury and the University of Manchester, builds on the earlier GungHo project to redesign the dynamical core, in partnership with NERC. The new atmospheric model aims to retain the performance of the current ENDGame dynamical core and associated subgrid physics, while also enabling far greater scalability and flexibility to accommodate future supercomputer architectures. Design of the model revolves around the principle of 'separation of concerns', whereby the natural science aspects of the code can be developed without worrying about the underlying architecture, while machine-dependent optimisations can be carried out at a high level. These principles are put into practice through the development of an autogenerated Parallel Systems software layer (known as the PSy layer) using a domain-specific compiler called PSyclone. The prototype model includes a re-write of the dynamical core using a mixed finite element method, in which different function spaces are used to represent the various fields. It is able to run in parallel with MPI and OpenMP and has been tested on over 200,000 cores. In this talk, an overview of both the natural science and computational science implementations of the model will be presented.
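
    LFRic's 'separation of concerns' places all parallelization in the autogenerated PSy layer while science kernels stay serial and operate on a single column. PSyclone and the LFRic API are not reproduced here; the sketch below is only a hand-written C analogue of that layering, with the OpenMP directive confined to the driver, and all names are illustrative.

      /* Sketch of the 'separation of concerns' idea (not PSyclone output).
       * Compile with: cc -fopenmp psy_sketch.c */
      #include <omp.h>
      #include <stdio.h>

      #define NCOL 10000
      #define NLEV 70

      /* Science kernel: one column, no knowledge of parallelism or mesh layout. */
      static void kernel_column(double *theta, int nlev) {
          for (int k = 1; k < nlev; ++k) theta[k] = 0.5 * (theta[k] + theta[k - 1]);
      }

      /* Driver ("PSy-like" layer): owns the horizontal loop and the directives. */
      static void invoke_kernel(double (*theta)[NLEV], int ncol) {
          #pragma omp parallel for schedule(static)
          for (int c = 0; c < ncol; ++c) kernel_column(theta[c], NLEV);
      }

      int main(void) {
          static double theta[NCOL][NLEV];
          for (int c = 0; c < NCOL; ++c)
              for (int k = 0; k < NLEV; ++k) theta[c][k] = 300.0 + k;
          invoke_kernel(theta, NCOL);
          printf("theta[0][NLEV-1] = %f\n", theta[0][NLEV - 1]);
          return 0;
      }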

  12. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectric tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16 bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA Graphics Processing Units (GPUs) that support the CUDA architecture. The decoding procedure can be divided into three parts: a serial part, a task-parallel part, and a data-parallel part that includes inverse quantization, the inverse discrete wavelet transform (IDWT), and image post-processing. To reduce execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT is significantly accelerated by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16 bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.
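
    The abstract states that the task-parallel stage of the decoder is optimized with OpenMP but does not give its task graph, so the sketch below only illustrates the pattern: independent pieces of work (placeholder tiles or subbands here) dispatched with OpenMP sections. The processing routine is a toy stand-in, not the paper's dequantization or IDWT.

      /* Minimal sketch of the task-parallel stage with OpenMP sections.
       * Compile with: cc -fopenmp bayer_tasks.c */
      #include <omp.h>
      #include <stdio.h>

      #define TILE 1024

      static void process_tile(unsigned short *tile, int n, int shift) {
          for (int i = 0; i < n; ++i) tile[i] >>= shift;   /* stand-in for dequant/IDWT */
      }

      int main(void) {
          static unsigned short a[TILE], b[TILE], c[TILE];
          for (int i = 0; i < TILE; ++i) a[i] = b[i] = c[i] = 0xFFFF;

          #pragma omp parallel sections
          {
              #pragma omp section
              process_tile(a, TILE, 1);   /* e.g. one subband */
              #pragma omp section
              process_tile(b, TILE, 2);   /* another subband */
              #pragma omp section
              process_tile(c, TILE, 3);
          }
          printf("%u %u %u\n", (unsigned)a[0], (unsigned)b[0], (unsigned)c[0]);
          return 0;
      }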

  13. A GPU-accelerated semi-implicit fractional step method for numerical solutions of incompressible Navier-Stokes equations

    NASA Astrophysics Data System (ADS)

    Ha, Sanghyun; Park, Junshin; You, Donghyun

    2017-11-01

    The utility of the computational power of modern Graphics Processing Units (GPUs) is elaborated for solutions of the incompressible Navier-Stokes equations, which are integrated using a semi-implicit fractional-step method. Due to its serial and bandwidth-bound nature, the present choice of numerical methods is considered to be a good candidate for evaluating the potential of GPUs for solving Navier-Stokes equations using non-explicit time integration. An efficient algorithm is presented for GPU acceleration of the Alternating Direction Implicit (ADI) and Fourier-transform-based direct solution methods used in the semi-implicit fractional-step method. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are solved on a GPU. Extension to multiple NVIDIA GPUs is implemented using NVLink supported by the Pascal architecture. The performance of the present method is evaluated on multiple Tesla P100 GPUs compared with a single-core Xeon E5-2650 v4 CPU in simulations of boundary-layer flow over a flat plate. Supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (Ministry of Science, ICT and Future Planning NRF-2016R1E1A2A01939553, NRF-2014R1A2A1A11049599, and Ministry of Trade, Industry and Energy 201611101000230).
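
    The abstract notes that OpenMP is used to collect turbulence statistics on the CPU concurrently with the GPU-side flow solve. The sketch below shows only that overlap pattern; the GPU solve is replaced by a placeholder CPU function so the example stays self-contained, and all names are hypothetical.

      /* Sketch of the CPU/GPU overlap idea: one thread advances the (stand-in)
       * flow solve for step n+1 while another reduces statistics over step n.
       * Compile with: cc -fopenmp overlap_stats.c */
      #include <omp.h>
      #include <stdio.h>

      #define N 1000000

      static void advance_flow(double *u, int n) {          /* placeholder for GPU work */
          for (int i = 0; i < n; ++i) u[i] = 0.99 * u[i] + 0.01;
      }

      static double mean(const double *u, int n) {          /* turbulence statistic */
          double s = 0.0;
          for (int i = 0; i < n; ++i) s += u[i];
          return s / n;
      }

      int main(void) {
          static double u_prev[N], u_next[N];
          for (int i = 0; i < N; ++i) u_prev[i] = u_next[i] = 1.0;
          double stat = 0.0;

          #pragma omp parallel sections
          {
              #pragma omp section
              advance_flow(u_next, N);       /* would be the GPU solve for step n+1 */
              #pragma omp section
              stat = mean(u_prev, N);        /* CPU statistics over step n */
          }
          printf("mean of previous step: %f\n", stat);
          return 0;
      }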

  14. Stochastic coalescence in finite systems: an algorithm for the numerical solution of the multivariate master equation.

    NASA Astrophysics Data System (ADS)

    Alfonso, Lester; Zamora, Jose; Cruz, Pedro

    2015-04-01

    The stochastic approach to coagulation considers the coalescence process occurring in a system of a finite number of particles enclosed in a finite volume. Within this approach, the full description of the system can be obtained from the solution of the multivariate master equation, which models the evolution of the probability distribution of the state vector for the number of particles of a given mass. Unfortunately, due to its complexity, only limited results have been obtained for certain types of kernels and monodisperse initial conditions. In this work, a novel numerical algorithm for the solution of the multivariate master equation for stochastic coalescence that works for any type of kernel and initial conditions is introduced. The performance of the method was checked by comparing the numerically calculated particle mass spectrum with analytical solutions obtained for the constant and sum kernels, with an excellent correspondence between the analytical and numerical solutions. To speed up the algorithm, parallelization techniques based on the OpenMP standard were used, along with an implementation designed to take advantage of new accelerator technologies. Simulation results show an important speedup of the parallelized algorithms. This study was funded by a grant from Consejo Nacional de Ciencia y Tecnologia de Mexico SEP-CONACYT CB-131879. The authors also thank LUFAC® Computacion SA de CV for CPU time and all the support provided.
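
    The authors' master-equation algorithm is not reproduced in the abstract; as a simplified stand-in, the sketch below shows why OpenMP helps in this kind of solver: the gain and loss terms built from a coalescence kernel are evaluated independently for every bin/state, so they parallelize with a plain loop. A constant kernel and a discrete mass spectrum are assumed purely for illustration.

      /* Simplified stand-in: per-bin gain/loss evaluation for a constant kernel,
       * parallelized over bins. Compile with: cc -fopenmp coalescence_rates.c */
      #include <omp.h>
      #include <stdio.h>

      #define NBINS 2048

      int main(void) {
          static double n[NBINS], dndt[NBINS];
          const double K = 1.0e-3;                 /* constant coalescence kernel */
          for (int i = 0; i < NBINS; ++i) n[i] = 1.0 / (1.0 + i);

          #pragma omp parallel for schedule(static)
          for (int k = 0; k < NBINS; ++k) {
              double gain = 0.0, loss = 0.0;
              for (int i = 0; i < k; ++i)          /* masses (i+1) + (k-i) -> (k+1) */
                  gain += 0.5 * K * n[i] * n[k - 1 - i];
              for (int j = 0; j < NBINS; ++j)      /* bin k coalesces into larger bins */
                  loss += K * n[k] * n[j];
              dndt[k] = gain - loss;
          }
          printf("dndt[0] = %e, dndt[%d] = %e\n", dndt[0], NBINS - 1, dndt[NBINS - 1]);
          return 0;
      }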

  15. Investigation of obstacle effect to improve conjugate heat transfer in backward facing step channel using fast simulation of incompressible flow

    NASA Astrophysics Data System (ADS)

    Nouri-Borujerdi, Ali; Moazezi, Arash

    2018-01-01

    The current study investigates the conjugate heat transfer characteristics of laminar flow in a backward-facing step channel. All of the channel walls are insulated except the lower thick wall, which is held at a constant temperature. The upper wall includes an insulated obstacle perpendicular to the flow direction. The effects of obstacle height and location on the fluid flow and heat transfer are numerically explored for Reynolds numbers in the range 10 ≤ Re ≤ 300. The incompressible Navier-Stokes and thermal energy equations are solved simultaneously in the fluid region by an upwind compact finite difference scheme based on flux-difference splitting in conjunction with the artificial compressibility method. In the thick wall, the energy equation reduces to the Laplace equation. A multi-block approach is used to perform parallel computing and reduce the CPU time. Each block is modeled separately, sharing boundary conditions with its neighbors. The program was written in Fortran with the OpenMP API. The results showed that the multi-block parallel computing method is a simple, robust scheme with high performance and high-order accuracy. Moreover, the results demonstrated that increasing the Reynolds number and obstacle height, as well as decreasing the horizontal distance between the obstacle and the step, improves the heat transfer.
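
    The original code is Fortran with OpenMP and is not given in the abstract; the sketch below only restates the multi-block pattern in C: each block performs its interior update on its own thread, then interface values are exchanged before the next sweep. Geometry and physics are reduced to a 1D Laplace toy problem, and all sizes are illustrative.

      /* Sketch of the multi-block OpenMP pattern. cc -fopenmp multiblock.c */
      #include <omp.h>
      #include <stdio.h>

      #define NBLK 8
      #define NPTS 64

      int main(void) {
          static double T[NBLK][NPTS], Tn[NBLK][NPTS];
          for (int b = 0; b < NBLK; ++b)
              for (int i = 0; i < NPTS; ++i) T[b][i] = (b == 0 && i == 0) ? 1.0 : 0.0;

          for (int sweep = 0; sweep < 500; ++sweep) {
              /* Interior update: one block per thread. */
              #pragma omp parallel for schedule(static)
              for (int b = 0; b < NBLK; ++b)
                  for (int i = 1; i < NPTS - 1; ++i)
                      Tn[b][i] = 0.5 * (T[b][i - 1] + T[b][i + 1]);

              /* Interface exchange between neighbouring blocks (serial, cheap). */
              for (int b = 0; b < NBLK; ++b) {
                  Tn[b][0]        = (b == 0)        ? T[0][0]        : Tn[b - 1][NPTS - 2];
                  Tn[b][NPTS - 1] = (b == NBLK - 1) ? T[b][NPTS - 1] : Tn[b + 1][1];
              }
              for (int b = 0; b < NBLK; ++b)
                  for (int i = 0; i < NPTS; ++i) T[b][i] = Tn[b][i];
          }
          printf("T[0][1] = %f\n", T[0][1]);
          return 0;
      }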

  16. Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

    NASA Astrophysics Data System (ADS)

    Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

    It is difficult and cumbersome to detect data races that occur during the execution of parallel programs. Any on-the-fly race detection technique using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers that maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent thread or by updating the pointer, in a constant amount of time and space. This labeling is more efficient than NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelism varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In the average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.
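
    The labeling problem here is to give each logical thread in a nested fork-join execution an identifier that encodes its logical concurrency. The e-NR scheme itself is not reproduced below; the sketch only shows the raw information any such scheme must capture, namely the chain of ancestor thread numbers across nesting levels, which OpenMP exposes through omp_get_level() and omp_get_ancestor_thread_num().

      /* Nested OpenMP regions and per-thread ancestor chains ("labels").
       * Compile with: cc -fopenmp nested_labels.c */
      #include <omp.h>
      #include <stdio.h>

      int main(void) {
          omp_set_nested(1);                 /* enable nested parallel regions */
          omp_set_num_threads(2);

          #pragma omp parallel
          {
              #pragma omp parallel           /* inner region: nesting depth 2 */
              {
                  int depth = omp_get_level();
                  #pragma omp critical
                  {
                      printf("label: ");
                      for (int lvl = 1; lvl <= depth; ++lvl)
                          printf("%d%s", omp_get_ancestor_thread_num(lvl),
                                 lvl < depth ? "." : "\n");
                  }
              }
          }
          return 0;
      }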

  17. OpenRBC: Redefining the Frontier of Red Blood Cell Simulations at Protein Resolution

    NASA Astrophysics Data System (ADS)

    Tang, Yu-Hang; Lu, Lu; Li, He; Grinberg, Leopold; Sachdeva, Vipin; Evangelinos, Constantinos; Karniadakis, George

    We present a from-scratch development of OpenRBC, a coarse-grained molecular dynamics code, which is capable of performing an unprecedented in silico experiment - simulating an entire mammalian red blood cell lipid bilayer and cytoskeleton modeled by 4 million mesoscopic particles - on a single shared memory node. To achieve this, we invented an adaptive spatial searching algorithm to accelerate the computation of short-range pairwise interactions in an extremely sparse 3D space. The algorithm is based on a Voronoi partitioning of the point cloud of coarse-grained particles, and is continuously updated over the course of the simulation. The algorithm enables the construction of a lattice-free cell list, i.e. the key spatial searching data structure in our code, in O(N) time and space, with cells whose position and shape adapt automatically to the local density and curvature. The code implements NUMA/NUCA-aware OpenMP parallelization and achieves perfect scaling with up to hundreds of hardware threads. The code outperforms a legacy solver by more than 8 times in time-to-solution and more than 20 times in problem size, thus providing a new venue for probing the cytomechanics of red blood cells. This work was supported by the Department of Energy (DOE) Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). YHT acknowledges partial financial support from an IBM Ph.D. Scholarship Award.
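
    OpenRBC's spatial search is a lattice-free, Voronoi-based cell list; that algorithm is not reproduced here. As a point of reference, the sketch below shows the conventional uniform-grid counterpart it replaces, built in O(N) with an OpenMP histogram pass (1D for brevity, all sizes illustrative).

      /* Conventional uniform-grid cell list built in O(N) with OpenMP.
       * Compile with: cc -fopenmp cell_list.c */
      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define N     100000
      #define NCELL 64            /* cells across a unit box, 1D here */

      int main(void) {
          double *x = malloc(N * sizeof *x);
          int *count = calloc(NCELL, sizeof *count);
          int *cell_of = malloc(N * sizeof *cell_of);
          for (int i = 0; i < N; ++i) x[i] = (double)rand() / RAND_MAX;

          /* Histogram particles into cells (atomic increments). */
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < N; ++i) {
              int c = (int)(x[i] * NCELL);
              if (c == NCELL) c = NCELL - 1;
              cell_of[i] = c;
              #pragma omp atomic
              count[c]++;
          }

          printf("particles in cell 0: %d\n", count[0]);
          free(x); free(count); free(cell_of);
          return 0;
      }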

  18. Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

    NASA Astrophysics Data System (ADS)

    Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

    2017-03-01

    We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
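
    The DEC formalism turns the correlation calculation into many nearly independent fragment and fragment-pair jobs, which is what makes the loosely coupled, task-based parallelization possible. The sketch below shows only a shared-memory task layer with OpenMP tasks; the MPI and accelerator levels described in the abstract are omitted, and pair_energy() is a toy placeholder rather than an RI-MP2 kernel.

      /* Task-based layer only: OpenMP tasks standing in for independent
       * fragment-pair jobs. Compile with: cc -fopenmp dec_tasks.c */
      #include <omp.h>
      #include <stdio.h>

      #define NFRAG 32

      static double pair_energy(int i, int j) {        /* placeholder fragment-pair job */
          double e = 0.0;
          for (int k = 1; k <= 10000; ++k) e += 1.0 / ((double)k * (i + j + 1));
          return -1.0e-3 * e;
      }

      int main(void) {
          double e_total = 0.0;

          #pragma omp parallel
          #pragma omp single            /* one thread creates tasks, all execute them */
          for (int i = 0; i < NFRAG; ++i)
              for (int j = 0; j <= i; ++j) {
                  #pragma omp task firstprivate(i, j) shared(e_total)
                  {
                      double e = pair_energy(i, j);
                      #pragma omp atomic
                      e_total += e;
                  }
              }

          printf("total correlation energy (toy): %f\n", e_total);
          return 0;
      }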

  19. Reasoning Maps: A Generally Applicable Method for Characterizing Hypothesis-Testing Behaviour. Research Report

    ERIC Educational Resources Information Center

    White, Brian

    2004-01-01

    This paper presents a generally applicable method for characterizing subjects' hypothesis-testing behaviour based on a synthesis that extends on previous work. Beginning with a transcript of subjects' speech and videotape of their actions, a Reasoning Map is created that depicts the flow of their hypotheses, tests, predictions, results, and…

  20. Dual-energy CT revisited with multidetector CT: review of principles and clinical applications.

    PubMed

    Karçaaltıncaba, Muşturay; Aktaş, Aykut

    2011-09-01

    Although dual-energy CT (DECT) was first conceived in the 1970s, it was not widely used for CT indications. Recently, the simultaneous acquisition of volumetric dual-energy data has been introduced using multidetector CT (MDCT) with two X-ray tubes and rapid kVp switching (gemstone spectral imaging). Two major advantages of DECT are material decomposition by acquiring two image series with different kVp and the elimination of misregistration artifacts. Hounsfield unit measurements by DECT are not absolute and can change depending on the kVp used for an acquisition. Typically, a combination of 80/140 kVp is used for DECT, but for some applications, 100/140 kVp is preferred. In this study, we summarized the clinical applications of DECT and included images that were acquired using the dual-source CT and rapid kVp switching. In general, unenhanced images can be avoided by using DECT for body and neurological applications; iodine can be removed from the image, and a virtual, non-contrast (water) image can be obtained. Neuroradiological applications allow for the removal of bone and calcium from the carotid and brain CT angiography. Thorax applications include perfusion imaging in patients with pulmonary thromboemboli and other chest diseases, xenon ventilation-perfusion imaging and solitary nodule characterization. Cardiac applications include dual-energy cardiac perfusion, viability and cardiac iron detection. The removal of calcific plaques from arteries, bone removal and aortic stent graft evaluation may be achieved in the vascular system. Abdominal applications include the detection and characterization of liver and pancreas masses, the diagnosis of steatosis and iron overload, DECT colonoscopy and CT cholangiography. Urinary system applications are urinary calculi characterization (uric acid vs. non-uric acid), renal cyst characterization and mass characterization. Musculoskeletal applications permit the differentiation of gout from pseudogout and a reduction of metal artifacts. Recent introduction of iterative reconstruction techniques can increase the use of DECT techniques; the use of dual energy in patients with a high BMI is limited due to noise and the radiation dose. DECT may be a good alternative to PET-CT. Iodine map images can quantify iodine uptake, and this approach may be more effective than obtaining non-contrast and post-contrast images for the diagnosis of a solid mass. Thus, computer-aided detection may be used more effectively in CT applications. DECT is a promising technique with potential clinical applications.

  1. Space Software Defined Radio Characterization to Enable Reuse

    NASA Technical Reports Server (NTRS)

    Mortensen, Dale J.; Bishop, Daniel W.; Chelmins, David

    2012-01-01

    NASA's Space Communication and Navigation Testbed is beginning operations on the International Space Station this year. The objective is to promote new software defined radio technologies and associated software application reuse, enabled by this first flight of NASA's Space Telecommunications Radio System architecture standard. The Space Station payload has three software defined radios onboard that allow for a wide variety of communications applications; however, each radio was only launched with one waveform application. By design the testbed allows new waveform applications to be uploaded and tested by experimenters in and outside of NASA. During the system integration phase of the testbed special waveform test modes and stand-alone test waveforms were used to characterize the SDR platforms for the future experiments. Characterization of the Testbed's JPL SDR using test waveforms and specialized ground test modes is discussed in this paper. One of the test waveforms, a record and playback application, can be utilized in a variety of ways, including new satellite on-orbit checkout as well as independent on-board testbed experiments.

  2. Characterization of interdigitated electrode piezoelectric fiber composites under high electrical and mechanical loading

    NASA Astrophysics Data System (ADS)

    Rodgers, John P.; Bent, Aaron A.; Hagood, Nesbitt W.

    1996-05-01

    The primary objective of this work is to develop a standard methodology for characterizing structural actuation systems intended for operation in high electrical and mechanical loading environments. The designed set of tests evaluates the performance of the active materials system under realistic operating conditions. The tests are also used to characterize piezoelectric fiber composites which have been developed as an alternative to monolithic piezoceramic wafers for structural actuation applications. The performance of this actuator system has been improved using an interdigitated electrode pattern, which orients the primary component of the electric field into the plane of the structure, enabling the use of the primary piezoelectric effect along the active fibers. One possible application of this technology is in the integral twist actuation of helicopter rotor blades for higher harmonic control. This application requires actuators which can withstand the harsh rotor blade operating environment. This includes large numbers of electrical and mechanical cycles with considerable centripetal and bending loads. The characterization tests include standard active material tests as well as application-driven tests which evaluate the performance of the actuators during simulated operation. Test results for several actuator configurations are provided, including S2 glass-reinforced and E-glass laminated actuators. The study concludes that the interdigitated electrode piezoelectric fiber composite actuator has great potential for high loading applications.

  3. Regionalization of land use impact models for life cycle assessment: Recommendations for their use on the global scale and their applicability to Brazil

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pavan, Ana Laura Raymundo, E-mail: laurarpavan@gmail.com; Ometto, Aldo Roberto; Department of Production Engineering, São Carlos School of Engineering, University of São Paulo, Av. Trabalhador São-Carlense 400, São Carlos 13566-590, SP

    Life Cycle Assessment (LCA) is the main technique for evaluating the environmental impacts of product life cycles. A major challenge in the field of LCA is spatial and temporal differentiation in Life Cycle Impact Assessment (LCIA) methods, especially for impacts resulting from land occupation and land transformation. Land use characterization modeling has advanced considerably over the last two decades and many approaches have recently included crucial aspects such as geographic differentiation. Nevertheless, characterization models have so far not been systematically reviewed and evaluated to determine their applicability to South America. Given that Brazil is the largest country in South America, this paper analyzes the main international characterization models currently available in the literature, with a view to recommending regionalized models applicable on a global scale for land use life cycle impact assessments, and discusses their feasibility for regionalized assessment in Brazil. The analytical methodology involves classification based on the following criteria: midpoint/endpoint approach, scope of application, area of data collection, biogeographical differentiation, definition of recovery time and reference situation; followed by an evaluation of thirteen scientific robustness and environmental relevance subcriteria. In terms of scope of application, 25% of the models were developed for the European context and 50% have a global scope. There is no consensus in the literature about the definition of parameters such as biogeographical differentiation and reference situation, and our review indicates that 35% of the models use ecoregion division while 40% use the concept of potential natural vegetation. Four characterization models show high scores in terms of scientific robustness and environmental relevance. These models are recommended for application in land use life cycle impact assessments, and also to serve as references for the development or adaptation of regional methodological procedures for Brazil. - Highlights: • A discussion is made on performing regionalized impact assessments using spatial differentiation in LCA. • A review is made of 20 characterization models for land use impacts in Life Cycle Impact Assessment. • Four characterization models are recommended according to different land use impact pathways for application in Brazil.

  4. Intelligent signal analysis and recognition

    NASA Technical Reports Server (NTRS)

    Levinson, Robert; Helman, Daniel; Oswalt, Edward

    1987-01-01

    Progress in the research and development of a self-organizing database system that can support the identification and characterization of signals in an RF environment is described. As the radio frequency spectrum becomes more crowded, there are a number of situations that require a characterization of the RF environment. This database system is designed to be practical in applications where communications and other instruments encounter a time-varying and complex RF environment. The primary application of this system is the guidance and control of NASA's SETI Microwave Observing Project. Other possible applications include the selection of telemetry bands for communication with spacecraft and the scheduling of antennas for radio astronomy, two examples where characterization of the RF environment is required. In these applications, the RF environment is constantly changing, and even experienced operators cannot quickly identify the multitude of signals that can be encountered. Some of these signals are repetitive; others appear to occur sporadically.

  5. Characterization of in-swath spray deposition for CP-11TT flat-fan nozzles used in low volume aerial application of crop production and protection materials

    USDA-ARS?s Scientific Manuscript database

    For aerial application of crop production and protection materials, a complex interaction of controllable and uncontrollable factors is involved. It is difficult to completely characterize spray drift and deposition, but estimates can be made with appropriate sampling protocol and analysis. With c...

  6. An application of geostatistics and fractal geometry for reservoir characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aasum, Y.; Kelkar, M.G.; Gupta, S.P.

    1991-03-01

    This paper presents an application of geostatistics and fractal geometry concepts for 2D characterization of rock properties (k and φ) in a dolomitic, layered-cake reservoir. The results indicate that a lack of closely spaced data yields effectively random distributions of properties. Further, incorporation of geology reduces uncertainties in fractal interpolation of wellbore properties.

  7. A Video Game-Based Framework for Analyzing Human-Robot Interaction: Characterizing Interface Design in Real-Time Interactive Multimedia Applications

    DTIC Science & Technology

    2006-01-01

    segments video game interaction into domain-independent components which together form a framework that can be used to characterize real-time interactive...multimedia applications in general and HRI in particular. We provide examples of using the components in both the video game and the Unmanned Aerial

  8. Applications of capillary electrophoresis in characterizing recombinant protein therapeutics.

    PubMed

    Zhao, Shuai Sherry; Chen, David D Y

    2014-01-01

    The use of recombinant protein for therapeutic applications has increased significantly in the last three decades. The heterogeneity of these proteins, often caused by the complex biosynthesis pathways and the subsequent PTMs, poses a challenge for drug characterization to ensure its safety, quality, integrity, and efficacy. CE, with its simple instrumentation, superior separation efficiency, small sample consumption, and short analysis time, is a well-suited analytical tool for therapeutic protein characterization. Different separation modes, including CIEF, SDS-CGE, CZE, and CE-MS, provide complementary information of the proteins. The CE applications for recombinant therapeutic proteins from the year 2000 to June 2013 are reviewed and technical concerns are discussed in this article. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Characterization and Emulsification Properties of Rhamnolipid and Sophorolipid Biosurfactants and Their Applications

    PubMed Central

    Nguyen, Thu T.; Sabatini, David A.

    2011-01-01

    Due to their non-toxic nature, biodegradability and production from renewable resources, research has shown an increasing interest in the use of biosurfactants in a wide variety of applications. This paper reviews the characterization of rhamnolipid and sophorolipid biosurfactants based on their hydrophilicity/hydrophobicity and their ability to form microemulsions with a range of oils without additives. The use of the biosurfactants in applications such as detergency and vegetable oil extraction for biodiesel application is also discussed. Rhamnolipid was found to be a hydrophilic surfactant while sophorolipid was found to be very hydrophobic. Therefore, rhamnolipid and sophorolipid biosurfactants in mixtures showed robust performance in these applications. PMID:21541055

  10. High Performance Polymers and Composites (HiPPAC) Center

    NASA Technical Reports Server (NTRS)

    Mintz, Eric A.; Veazie, David

    2005-01-01

    NASA University Research Centers funding has allowed Clark Atlanta University (CAU) to establish a High Performance Polymers and Composites (HiPPAC) Research Center. Clark Atlanta University, through the HiPPAC Center, has consolidated and expanded its polymer and composite research capabilities through the development of research efforts in: (1) Synthesis and characterization of polymeric NLO, photorefractive, and piezoelectric materials; (2) Characterization and engineering applications of induced strain smart materials; (3) Processable polyimides and additives to enhance polyimide processing for composite applications; (4) Fabrication and mechanical characterization of polymer-based composites.

  11. Semiconductor Characterization: from Growth to Manufacturing

    NASA Astrophysics Data System (ADS)

    Colombo, Luigi

    The successful growth and/or deposition of materials for any application requires a basic understanding of the materials physics for a given device. At the beginning, the first and most obvious characterization tool is visual observation; this is particularly true for single crystal growth. The characterization tools are usually prioritized in order of ease of measurement, and have become especially sophisticated as we have moved from the characterization of macroscopic crystals and films to atomically thin materials and nanostructures. While a lot of attention is devoted to characterization and understanding of materials physics at the nano level, the characterization of single crystals as substrates or active components is still critically important. In this presentation, I will review and discuss the basic materials characterization techniques used to get to the materials physics to bring crystals and thin films from research to manufacturing in the fields of infrared detection, non-volatile memories, and transistors. Finally, I will present and discuss metrology techniques used to understand the physics and chemistry of atomically thin two-dimensional materials for future device applications.

  12. Instruction-Level Characterization of Scientific Computing Applications Using Hardware Performance Counters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Y.; Cameron, K.W.

    1998-11-24

    Workload characterization has proven to be an essential tool for architecture design and performance evaluation in both scientific and commercial computing areas. Traditional workload characterization techniques include FLOPS rate, cache miss ratios, CPI (cycles per instruction) or IPC (instructions per cycle), etc. With the complexity of sophisticated modern superscalar microprocessors, these traditional characterization techniques are not powerful enough to pinpoint the performance bottleneck of an application on a specific microprocessor. They are also incapable of immediately demonstrating the potential performance benefit of any architectural or functional improvement in a new processor design. To solve these problems, many people rely on simulators, which have substantial constraints, especially for large-scale scientific computing applications. This paper presents a new technique for characterizing applications at the instruction level using hardware performance counters. It has the advantage of collecting instruction-level characteristics in a few runs virtually without overhead or slowdown. A variety of instruction counts can be utilized to calculate some average abstract workload parameters corresponding to microprocessor pipelines or functional units. Based on the microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. In particular, the analysis results can provide some insight into the problem that only a small percentage of processor peak performance can be achieved even for many very cache-friendly codes. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvements for certain workloads. Eventually, these abstract parameters can lead to the creation of an analytical microprocessor pipeline model and memory hierarchy model.
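
    The technique described here derives abstract workload parameters from raw counter totals rather than from simulation. The counter-collection interface itself is platform specific and not described in the abstract, so the sketch below assumes the totals are already available and only shows the derived-metric arithmetic (CPI, IPC, miss ratio, FLOP rate); all numbers are made up for illustration.

      /* Turning raw hardware counter totals into characterization parameters. */
      #include <stdio.h>

      int main(void) {
          /* Placeholder totals for one application run (hypothetical numbers). */
          double cycles        = 4.2e11;
          double instructions  = 3.1e11;
          double l1_misses     = 6.0e9;
          double l1_accesses   = 1.5e11;
          double fp_ops        = 9.0e10;
          double seconds       = 210.0;

          double cpi        = cycles / instructions;
          double ipc        = instructions / cycles;
          double miss_ratio = l1_misses / l1_accesses;
          double mflops     = fp_ops / seconds / 1.0e6;

          printf("CPI = %.2f  IPC = %.2f  L1 miss ratio = %.3f  MFLOP/s = %.1f\n",
                 cpi, ipc, miss_ratio, mflops);
          return 0;
      }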

  13. Characterizations of biodegradable epoxy-coated cellulose nanofibrils (CNF) thin film for flexible microwave applications

    Treesearch

    Hongyi Mi; Chien-Hao Liu; Tzu-Husan Chang; Jung-Hun Seo; Huilong Zhang; Sang June Cho; Nader Behdad; Zhenqiang Ma; Chunhua Yao; Zhiyong Cai; Shaoqin Gong

    2016-01-01

    Wood pulp cellulose nanofibrils (CNF) thin film is a novel recyclable and biodegradable material. We investigated the microwave dielectric properties of the epoxy-coated CNF thin film for potential broad applications in flexible high-speed electronics. The characterization of the dielectric properties was carried out in a frequency range of 1–10 GHz. The dielectric...

  14. On-Wafer Characterization of Millimeter-Wave Antennas for Wireless Applications

    NASA Technical Reports Server (NTRS)

    Simons, Rainee N.; Lee, Richard Q.

    1998-01-01

    The paper demonstrates a de-embedding technique and a direct on-substrate measurement technique for fast and inexpensive characterization of miniature antennas for wireless applications at millimeter-wave frequencies. The technique is demonstrated by measurements on a tapered slot antenna (TSA). The measured results at Ka-Band frequencies include input impedance, mutual coupling between two TSAs and absolute gain of TSA.

  15. Material characterization of active fiber composites for integral twist-actuated rotor blade application

    NASA Astrophysics Data System (ADS)

    Wickramasinghe, Viresh K.; Hagood, Nesbitt W.

    2004-10-01

    The primary objective of this work was to perform material characterization of the active fiber composite (AFC) actuator system for the Boeing active material rotor (AMR) blade application. The purpose of the AMR was to demonstrate active vibration control in helicopters through integral twist-actuation of the blade. The AFCs were a new structural actuator system consisting of piezoceramic fibers embedded in an epoxy matrix and sandwiched between interdigitated electrodes to enhance actuation performance. These conformable actuators were integrated directly into the blade spar laminate as active plies within the composite structure to perform structural control. Therefore, extensive electromechanical material characterization was required to evaluate AFCs both as actuators and as structural components of the blade. The characterization tests designed to extract important electromechanical properties under simulated blade operating conditions included nominal actuation tests, stress-strain tests and actuation under tensile load tests. This paper presents the test results as well as the comprehensive testing procedure developed to evaluate the relevant properties of the AFCs for structural application. The material characterization tests provided an invaluable insight into the behavior of the AFCs under various electromechanical conditions. The results from this comprehensive material characterization of the AFC actuator system supported the design and operation of the AMR blades scheduled for wind tunnel tests.

  16. Three-dimensional nanoscale characterisation of materials by atom probe tomography

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devaraj, Arun; Perea, Daniel E.; Liu, Jia

    The development of three-dimensional (3D) characterization techniques with high spatial and mass resolution is crucial for understanding and developing advanced materials for many engineering applications as well as for understanding natural materials. In recent decades, atom probe tomography (APT), which combines a point projection microscope and a time-of-flight mass spectrometer, has evolved to be an excellent characterization technique capable of providing 3D nanoscale characterization of materials with sub-nanometer scale spatial resolution, with equal sensitivity for all elements. This review discusses the current state, as of the beginning of 2016, of APT instrumentation, new developments in sample preparation methods, experimental procedures for different material classes, reconstruction of APT results, the current status of correlative microscopy, and the application of APT for microstructural characterization in established scientific areas like structural materials as well as new applications in semiconducting nanowires, semiconductor devices, battery materials, catalyst materials, geological materials and biological materials. Finally, a brief perspective is given regarding the future of APT.

  17. Nondestructive Evaluation Approaches Developed for Material Characterization in Aeronautics and Space Applications

    NASA Technical Reports Server (NTRS)

    Baaklini, George Y.; Kautz, Harold E.; Gyekenyesi, Andrew L.; Abdul-Aziz, Ali; Martin, Richard E.

    2001-01-01

    At the NASA Glenn Research Center, nondestructive evaluation (NDE) approaches were developed or tailored for characterizing advanced material systems. The emphasis was on high-temperature aerospace propulsion applications. The material systems included monolithic ceramics, superalloys, and high-temperature composites. In the aeronautics area, the major applications were cooled ceramic plate structures for turbine applications, gamma-TiAl blade materials for low-pressure turbines, thermoelastic stress analysis for residual stress measurements in titanium-based and nickel-based engine materials, and acousto-ultrasonics for creep damage assessment in nickel-based alloys. In the space area, applications consisted of cooled carbon-carbon composites for gas generator combustors and flywheel rotors composed of carbon-fiber-reinforced polymer matrix composites for energy storage on the International Space Station.

  18. Potential application of machine vision technology to saffron (Crocus sativus L.) quality characterization.

    PubMed

    Kiani, Sajad; Minaei, Saeid

    2016-12-01

    Saffron quality characterization is an important issue in the food industry and of interest to the consumers. This paper proposes an expert system based on the application of machine vision technology for characterization of saffron and shows how it can be employed in practical usage. There is a correlation between saffron color and its geographic location of production and some chemical attributes which could be properly used for characterization of saffron quality and freshness. This may be accomplished by employing image processing techniques coupled with multivariate data analysis for quantification of saffron properties. Expert algorithms can be made available for prediction of saffron characteristics such as color as well as for product classification. Copyright © 2016. Published by Elsevier Ltd.

  19. Metalorganic chemical vapor deposition and characterization of ZnO materials

    NASA Astrophysics Data System (ADS)

    Sun, Shangzu; Tompa, Gary S.; Hoerman, Brent; Look, David C.; Claflin, Bruce B.; Rice, Catherine E.; Masaun, Puneet

    2006-04-01

    Zinc oxide is attracting growing interest for potential applications in electronics, optoelectronics, photonics, and chemical and biochemical sensing, among other applications. We report herein our efforts in the growth and characterization of p- and n-type ZnO materials by metalorganic chemical vapor deposition (MOCVD), focusing on recent nitrogen-doped films grown using diethyl zinc as the zinc precursor and nitric oxide (NO) as the dopant. Characterization results, including resistivity, Hall measurements, photoluminescence, and SIMS, are reported and discussed. Electrical behavior was observed to be dependent on illumination, atmosphere, and heat treatment, especially for p-type material.

  20. Characterization of dielectric properties of nanocellulose from wood and algae for electrical insulator applications.

    PubMed

    Le Bras, David; Strømme, Maria; Mihranyan, Albert

    2015-05-07

    Cellulose is one of the oldest electrically insulating materials used in oil-filled high-power transformers and cables. However, reports on the dielectric properties of nanocellulose for electrical insulator applications are scarce. The aim of this study was to characterize the dielectric properties of two nanocellulose types from wood, viz., nanofibrillated cellulose (NFC), and algae, viz., Cladophora cellulose, for electrical insulator applications. The cellulose materials were characterized with X-ray diffraction, nitrogen gas and moisture sorption isotherms, helium pycnometry, mechanical testing, and dielectric spectroscopy at various relative humidities. The algae nanocellulose sample was more crystalline and had a lower moisture sorption capacity at low and moderate relative humidities, compared to NFC. On the other hand, it was much more porous, which resulted in lower strength and higher dielectric loss than for NFC. It is concluded that the solid-state properties of nanocellulose may have a substantial impact on the dielectric properties of electrical insulator applications.

  1. Synthesis, characterization, bioactivity and potential application of phenolic acid grafted chitosan: A review.

    PubMed

    Liu, Jun; Pu, Huimin; Liu, Shuang; Kan, Juan; Jin, Changhai

    2017-10-15

    In recent years, increasing attention has been paid to the grafting of phenolic acid onto chitosan in order to enhance the bioactivity and widen the application of chitosan. Here, we present a comprehensive overview on the recent advances of phenolic acid grafted chitosan (phenolic acid-g-chitosan) in many aspects, including the synthetic method, structural characterization, biological activity, physicochemical property and potential application. In general, four kinds of techniques including carbodiimide based coupling, enzyme catalyzed grafting, free radical mediated grafting and electrochemical methods are frequently used for the synthesis of phenolic acid-g-chitosan. The structure of phenolic acid-g-chitosan can be determined by several instrumental methods. The physicochemical properties of chitosan are greatly altered after grafting. As compared with chitosan, phenolic acid-g-chitosan exhibits enhanced antioxidant, antimicrobial, antitumor, anti-allergic, anti-inflammatory, anti-diabetic and acetylcholinesterase inhibitory activities. Notably, phenolic acid-g-chitosan shows potential applications in many fields as a coating agent, packaging material, encapsulation agent and bioadsorbent. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. SITE CHARACTERIZATION AND ANALYSIS PENETROMETER SYSTEM (SCAPS) LASER-INDUCED FLUORESCENCE (LIF) SENSOR AND SUPPORT SYSTEM

    EPA Science Inventory

    The Consortium for Site Characterization Technology (CSCT) has established a formal program to accelerate acceptance and application of innovative monitoring and site characterization technologies that improve the way the nation manages its environmental problems. In 1995 the CS...

  3. Application of geostatistics to coal-resource characterization and mine planning. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kauffman, P.W.; Walton, D.R.; Martuneac, L.

    1981-12-01

    Geostatistics is a proven method of ore reserve estimation in many non-coal mining areas but little has been published concerning its application to coal resources. This report presents the case for using geostatistics for coal mining applications and describes how a coal mining concern can best utilize geostatistical techniques for coal resource characterization and mine planning. An overview of the theory of geostatistics is also presented. Many of the applications discussed are documented in case studies that are a part of the report. The results of an exhaustive literature search are presented and recommendations are made for needed future research and demonstration projects.

  4. Colorimetric Characterization of Mobile Devices for Vision Applications.

    PubMed

    de Fez, Dolores; Luque, Maria José; García-Domene, Maria Carmen; Camps, Vicente; Piñero, David

    2016-01-01

    Available applications for vision testing in mobile devices usually do not include detailed setup instructions, sacrificing rigor to obtain portability and ease of use. In particular, colorimetric characterization processes are generally obviated. We show that different mobile devices also differ in colorimetric profile and that those differences limit the range of applications for which they are most adequate. The color reproduction characteristics of four mobile devices, two smartphones (Samsung Galaxy S4, iPhone 4s) and two tablets (Samsung Galaxy Tab 3, iPad 4), have been evaluated using two procedures: a 3D LUT (Look Up Table) and a linear model assuming primary constancy and independence of the channels. The color reproduction errors have been computed with the CIEDE2000 color difference formula. There is good constancy of primaries but large deviations from additivity. The 3D LUT characterization yields smaller reproduction errors and dispersions for the Tab 3 and iPhone 4 devices, but for the iPad 4 and S4, both models are equally good. The smallest reproduction errors occur with both Apple devices, although the iPad 4 has the highest number of outliers of all devices with both colorimetric characterizations. Even though there is good constancy of primaries, the large deviations from additivity exhibited by the devices and the larger reproduction errors make any characterization based on channel independence not recommendable. The smartphone screens show, on average, the best color reproduction performance, particularly the iPhone 4, and they are therefore more adequate for applications requiring precise color reproduction.
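
    The linear display model referred to above reduces to a single matrix applied to the channel responses. The sketch below is a minimal illustration of that model under the channel-independence assumption; the primary matrix and values are invented for the example and are not measurements from any of the devices studied.

```cpp
// Sketch of a linear display model (primary constancy + channel independence):
// predicted XYZ is the sum of each primary's full-drive XYZ scaled by that
// channel's normalized response. The matrix below is made-up example data.
#include <array>
#include <cstdio>

struct XYZ { double X, Y, Z; };

// Columns: assumed XYZ of the red, green and blue primaries at full drive.
const double M[3][3] = { {41.2, 35.8, 18.0},
                         {21.3, 71.5,  7.2},
                         { 1.9, 11.9, 95.0} };

// rgb holds each channel's normalized response in [0,1] (after the tone curve);
// under channel independence the three contributions simply add.
XYZ predict(const std::array<double, 3>& rgb) {
    XYZ out{0.0, 0.0, 0.0};
    for (int c = 0; c < 3; ++c) {
        out.X += M[0][c] * rgb[c];
        out.Y += M[1][c] * rgb[c];
        out.Z += M[2][c] * rgb[c];
    }
    return out;
}

int main() {
    XYZ white = predict({1.0, 1.0, 1.0});  // additivity predicts the white point
    std::printf("predicted white: X=%.1f Y=%.1f Z=%.1f\n", white.X, white.Y, white.Z);
    return 0;
}
```

    Comparing such predictions against measured XYZ values (for instance with the CIEDE2000 formula) is what exposes the additivity failures reported above.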

  5. Characterization of Nanocomposites by Thermal Analysis

    PubMed Central

    Corcione, Carola Esposito; Frigione, Mariaenrica

    2012-01-01

    In materials research, the development of polymer nanocomposites (PN) is rapidly emerging as a multidisciplinary research field with results that could broaden the applications of polymers to many different industries. PN are polymer matrices (thermoplastics, thermosets or elastomers) that have been reinforced with small quantities of nano-sized particles, preferably characterized by high aspect ratios, such as layered silicates and carbon nanotubes. Thermal analysis (TA) is a useful tool to investigate a wide variety of properties of polymers and it can be also applied to PN in order to gain further insight into their structure. This review illustrates the versatile applications of TA methods in the emerging field of polymer nanomaterial research, presenting some examples of applications of differential scanning calorimetry (DSC), thermogravimetric analysis (TGA), dynamic mechanical thermal analysis (DMTA) and thermal mechanical analysis (TMA) for the characterization of nanocomposite materials.

  6. On the importance of identifying, characterizing, and predicting fundamental phenomena towards microbial electrochemistry applications.

    PubMed

    Torres, César Iván

    2014-06-01

    The development of microbial electrochemistry research toward technological applications has increased significantly in the past years, leading to many process configurations. This short review focuses on the need to identify and characterize the fundamental phenomena that control the performance of microbial electrochemical cells (MXCs). Specifically, it discusses the importance of recent efforts to discover and characterize novel microorganisms for MXC applications, as well as recent developments to understand transport limitations in MXCs. As we increase our understanding of how MXCs operate, it is imperative to continue modeling efforts in order to effectively predict their performance, design efficient MXC technologies, and implement them commercially. Thus, the success of MXC technologies largely depends on the path of identifying, understanding, and predicting fundamental phenomena that determine MXC performance. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Characterizing a novel strain of Bacillus amyloliquefaciens BAC03 for potential biological control application

    USDA-ARS?s Scientific Manuscript database

    Aims: Identify and characterize a bacterial strain from suppressive soil, BAC03, evaluate its antimicrobial activity against Streptomyces scabies and other microorganisms, and characterize an antimicrobial substance produced by this strain. Methods and Results: Bacterial strain BAC03 (isolated from ...

  8. METAL OXIDE NANOPARTICLES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    FERNANDEZ-GARCIA, M.; RODRIGUEZ, J.A.

    2007-10-01

    This chapter covers the fundamental science, synthesis, characterization, physicochemical properties and applications of oxide nanomaterials. It explains fundamental aspects that determine the growth and behavior of these systems, briefly examines synthetic procedures using bottom-up and top-down fabrication technologies, discusses the sophisticated experimental techniques and state-of-the-art theory results used to characterize the physico-chemical properties of oxide solids, and describes the current knowledge concerning key oxide materials with important technological applications.

  9. Fast and Accurate Support Vector Machines on Large Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry

    The Support Vector Machine (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed-memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively, potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm, the de facto sequential SVM software, which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as the UCI HIGGS boson dataset and the Offending URL dataset.
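
    To make the shared-memory side concrete, the sketch below shows one natural place for OpenMP parallelism in SVM evaluation: the kernel sum over support vectors in the decision function. It is only an illustration of the kind of multi-core parallelism mentioned above, not code from the paper or from libsvm, and every name in it is invented.

```cpp
// Minimal OpenMP sketch: RBF-kernel SVM decision value for one query sample.
#include <cmath>
#include <cstdio>
#include <vector>
#include <omp.h>

struct Sample { std::vector<double> x; };

// Gaussian (RBF) kernel between two samples.
double rbf_kernel(const Sample& a, const Sample& b, double gamma) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < a.x.size(); ++i) {
        const double d = a.x[i] - b.x[i];
        d2 += d * d;
    }
    return std::exp(-gamma * d2);
}

// f(q) = sum_i alpha_i * K(sv_i, q) + b. The kernel sum over support vectors
// is embarrassingly parallel, so it maps onto an OpenMP reduction.
double decision_value(const std::vector<Sample>& sv, const std::vector<double>& alpha,
                      double b, double gamma, const Sample& q) {
    double f = b;
    #pragma omp parallel for reduction(+ : f)
    for (long i = 0; i < static_cast<long>(sv.size()); ++i)
        f += alpha[static_cast<std::size_t>(i)] *
             rbf_kernel(sv[static_cast<std::size_t>(i)], q, gamma);
    return f;
}

int main() {
    std::vector<Sample> sv = { {{0.0, 0.0}}, {{1.0, 1.0}}, {{2.0, 0.5}} };
    std::vector<double> alpha = { 1.0, -0.5, 0.25 };
    const Sample query{{0.5, 0.5}};
    std::printf("f(query) = %.4f\n", decision_value(sv, alpha, 0.1, 0.5, query));
    return 0;
}
```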

  10. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scales to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, and neighboring inner loops may exhibit different concurrency patterns (e.g., reduction vs. forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.
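
    A minimal sketch of the loop structure being described follows, with invented loop bounds and work: the outer loop exposes only a handful of iterations, so the neighboring inner loops, one forall-style and one reduction-style, have to carry the parallelism within the same region.

```cpp
// Illustrative OpenMP example of limited outer parallelism: the outer loop has
// far fewer iterations than available threads, so the inner loops are the ones
// parallelized. Bounds and arithmetic are arbitrary placeholders.
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const int n_outer = 8;         // outer parallelism: only a few iterations
    const int n_inner = 1 << 20;   // inner loops hold most of the work
    std::vector<double> out(n_outer, 0.0);
    std::vector<double> tmp(n_inner);

    for (int i = 0; i < n_outer; ++i) {          // outer loop kept sequential
        // Forall-style inner loop: independent element-wise work.
        #pragma omp parallel for
        for (int j = 0; j < n_inner; ++j)
            tmp[j] = (i + 1) * 0.5 + j * 1e-6;

        // Reduction-style inner loop: a different concurrency pattern in the
        // same region, accumulated with an OpenMP reduction clause.
        double s = 0.0;
        #pragma omp parallel for reduction(+ : s)
        for (int j = 0; j < n_inner; ++j)
            s += tmp[j];
        out[i] = s;
    }
    std::printf("out[0] = %.3f (threads = %d)\n", out[0], omp_get_max_threads());
    return 0;
}
```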

  11. Massively parallel and linear-scaling algorithm for second-order Moller–Plesset perturbation theory applied to the study of supramolecular wires

    DOE PAGES

    Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...

    2016-11-16

    Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
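
    The MPI+X pattern mentioned above can be illustrated with a few lines of hybrid code. The sketch below is purely schematic, not the DEC implementation: coarse, loosely coupled tasks are distributed round-robin over MPI ranks, OpenMP threads handle the per-task work, and a reduction gathers the result.

```cpp
// Minimal MPI+OpenMP hybrid sketch: coarse tasks across ranks, threads within
// a rank. The "task" here is an arbitrary numerical loop, for illustration only.
#include <cstdio>
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_tasks = 64;
    double local = 0.0;

    // Round-robin distribution of loosely coupled tasks across ranks.
    for (int t = rank; t < n_tasks; t += size) {
        double task_sum = 0.0;
        // Threaded inner work for one task.
        #pragma omp parallel for reduction(+ : task_sum)
        for (int i = 0; i < 1000000; ++i)
            task_sum += 1.0 / (1.0 + t + i * 1e-6);
        local += task_sum;
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("global = %.6f\n", global);
    MPI_Finalize();
    return 0;
}
```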

  12. PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules

    NASA Astrophysics Data System (ADS)

    Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.

    2018-03-01

    The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid, and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation, which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules, showing a linear speedup on a parallel architecture.
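
    The grid evaluation and normalization check described above map naturally onto an OpenMP loop nest. The sketch below is illustrative only, with a placeholder normalized Gaussian standing in for a momentum-space orbital density; it is not taken from PAREMD.

```cpp
// Illustrative OpenMP sketch: evaluate a density on a uniform 3D momentum grid
// and accumulate a normalization check, the kind of consistency test mentioned
// above. Grid size, extent, and the density itself are placeholder choices.
#include <cmath>
#include <cstdio>
#include <omp.h>

int main() {
    const int n = 96;                                 // grid points per dimension
    const double pmax = 8.0, h = 2.0 * pmax / (n - 1);
    const double pi = std::acos(-1.0);
    double norm = 0.0;

    // Each grid point is independent, so the triple loop collapses cleanly.
    #pragma omp parallel for collapse(3) reduction(+ : norm)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k) {
                const double px = -pmax + i * h, py = -pmax + j * h, pz = -pmax + k * h;
                const double p2 = px * px + py * py + pz * pz;
                // Normalized 3D Gaussian as a stand-in for an orbital density.
                const double rho = std::pow(pi, -1.5) * std::exp(-p2);
                norm += rho * h * h * h;
            }
    std::printf("normalization check: %.6f (expected ~1)\n", norm);
    return 0;
}
```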

  13. GPU-accelerated algorithms for many-particle continuous-time quantum walks

    NASA Astrophysics Data System (ADS)

    Piccinini, Enrico; Benedetti, Claudia; Siloi, Ilaria; Paris, Matteo G. A.; Bordone, Paolo

    2017-06-01

    Many-particle continuous-time quantum walks (CTQWs) represent a resource for several tasks in quantum technology, including quantum search algorithms and universal quantum computation. In order to design and implement CTQWs in a realistic scenario, one needs effective simulation tools for Hamiltonians that take into account static noise and fluctuations in the lattice, i.e. Hamiltonians containing stochastic terms. To this aim, we suggest a parallel algorithm based on the Taylor series expansion of the evolution operator, and compare its performance with that of algorithms based on the exact diagonalization of the Hamiltonian or a fourth-order Runge-Kutta integration. We prove that both the Taylor-series expansion and the Runge-Kutta algorithms are reliable and have a low computational cost, the Taylor-series expansion showing the additional advantage of a memory allocation that does not depend on the precision of calculation. Both algorithms are also highly parallelizable within the SIMT paradigm, and are thus suitable for GPGPU computing. We have benchmarked four NVIDIA GPUs and three quad-core Intel CPUs for a 2-particle system over lattices of increasing dimension, showing that the speedup provided by GPU computing, with respect to the OpenMP parallelization, lies in the range between 8x and (more than) 20x, depending on the frequency of post-processing. GPU-accelerated codes thus allow one to overcome concerns about execution time, and make simulations with many interacting particles on large lattices possible, with the only limit being the memory available on the device.
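
    The Taylor-series propagation scheme can be sketched compactly: each term is obtained from the previous one by a matrix-vector product with the Hamiltonian and a scalar factor, so only a fixed number of vectors must be kept regardless of precision. The code below is a toy single-walker, dense-matrix version with OpenMP threading, not the authors' implementation; the lattice size, time step and truncation order are arbitrary choices.

```cpp
// Sketch of a truncated Taylor propagator: |psi(t+dt)> = sum_k (-i*H*dt)^k/k! |psi(t)>.
#include <complex>
#include <cstdio>
#include <vector>
#include <omp.h>

using cd = std::complex<double>;

// y = H * x for a dense N x N Hamiltonian (row-major), threaded over rows.
void matvec(const std::vector<cd>& H, const std::vector<cd>& x, std::vector<cd>& y) {
    const int N = static_cast<int>(x.size());
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        cd s = 0.0;
        for (int j = 0; j < N; ++j) s += H[i * N + j] * x[j];
        y[i] = s;
    }
}

int main() {
    const int N = 64, order = 12;
    const double dt = 0.05;
    std::vector<cd> H(N * N, 0.0), psi(N, 0.0), term(N), next(N);
    // Tight-binding chain: nearest-neighbor hopping of strength -1.
    for (int i = 0; i + 1 < N; ++i) { H[i * N + i + 1] = -1.0; H[(i + 1) * N + i] = -1.0; }
    psi[N / 2] = 1.0;                       // walker starts on the middle site

    std::vector<cd> out = psi;
    term = psi;
    for (int k = 1; k <= order; ++k) {      // term_k = (-i*dt/k) * H * term_{k-1}
        matvec(H, term, next);
        const cd factor = cd(0.0, -dt) / static_cast<double>(k);
        for (int i = 0; i < N; ++i) { term[i] = factor * next[i]; out[i] += term[i]; }
    }
    double norm = 0.0;
    for (const cd& a : out) norm += std::norm(a);
    std::printf("norm after one step: %.6f (expected ~1)\n", norm);
    return 0;
}
```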

  14. Green synthesis and characterization of novel gold nanocomposites for electrochemical sensing applications.

    PubMed

    Tanwar, Shivani; Ho, Ja-an Annie; Magi, Emanuele

    2013-12-15

    The synthesis, characterization and application of Au-PANI-Calix and Au-PANI-Nap nanocomposites are reported herein. A facile, template-free green synthesis is proposed and discussed. A variety of analytical techniques were used to characterize the nanocomposites: UV-vis spectroscopy, Fourier transform infrared spectroscopy (FTIR), Raman spectroscopy, dynamic light scattering (DLS), X-ray diffraction (XRD), energy-dispersive X-ray spectroscopy (EDX), and X-ray photoelectron spectroscopy (XPS). Surface morphology was studied by transmission electron microscopy (TEM). The nanocomposites were immobilized on screen-printed electrodes and showed electroactivity at neutral pH, making them promising candidates for various analytical applications. Sensitive and selective detection of Cu(2+) was achieved on the Au-PANI-Calix modified electrode, with no interference from the ions K(+), Ni(2+), Co(2+), Pb(2+) and Cr(3+) and a detection limit of 10 nM. Copper detection is facilitated by accessible ligation with 4-sulfocalix[4]arene, forming the Cu(II)-Calix complex. The electrode modified with Au-PANI-Nap showed sensing application towards H2O2 with a detection limit of 1 μM. The modified electrodes were reproducible and stable for 2 months. © 2013 Elsevier B.V. All rights reserved.

  15. A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy

    NASA Astrophysics Data System (ADS)

    Veiga, Alejandro; Grunfeld, Christian

    2016-02-01

    The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.

  16. The spectroscopic characterization of newly developed emissive materials and the effects of environment on their photophysical properties

    NASA Astrophysics Data System (ADS)

    McNamara, Louis Edward, III

    The development of new materials capable of efficient charge transfer and energy storage has become increasingly important in many areas of modern chemical research. This is especially true for the development of emissive optoelectronic devices and in the field of solar-to-electric energy conversion. The characterization of the photophysical properties of new molecular systems for these applications has become critical in the design and development of these materials. Many molecular building blocks have been developed, and understanding the properties of these molecules at a fundamental level is essential for their successful implementation and future engineering. This dissertation focuses on the characterization of some of these newly-developed molecular systems. The spectroscopic studies focus on the characterization of newly-developed molecules based on perylene and indolizine derivatives for solar-to-electric energy conversion, thienopyrazine derivatives for near-infrared (NIR) emissive applications, an SCS pincer complex for blue emissive materials, and a fluorescent probe for medical applications. The effects of noncovalent interactions are also investigated for these systems and for a benchmark biological molecule, trimethylamine N-oxide (TMAO).

  17. Characterizing wood-plastic composites via data-driven methodologies

    Treesearch

    John G. Michopoulos; John C. Hermanson; Robert Badaliance

    2007-01-01

    The recent increase of wood-plastic composite materials in various application areas has underlined the need for an efficient and robust methodology to characterize their nonlinear anisotropic constitutive behavior. In addition, the multiplicity of various loading conditions in structures utilizing these materials further increases the need for a characterization...

  18. NASA Thermographic Inspection of Advanced Composite Materials

    NASA Technical Reports Server (NTRS)

    Cramer, K. Elliott

    2004-01-01

    As the use of advanced composite materials continues to increase in the aerospace community, the need for a quantitative, rapid, in situ inspection technology has become a critical concern throughout the industry. In many applications it is necessary to monitor changes in these materials over an extended period of time to determine the effects of various load conditions. Additionally, the detection and characterization of defects, such as delaminations, is of great concern. This paper will present the application of infrared thermography to characterize various composite materials and show the advantages of different heat source types. Finally, various analysis methodologies used for quantitative material property characterization will be discussed.

  19. Phosphates based pigments for new anti-corrosion application: Synthesis and characterization

    NASA Astrophysics Data System (ADS)

    Tbib, B.; Eddya, M.; El-Hami, K.

    2018-02-01

    Our study focused on the pyrophosphates SrZn1-xMxP2O7, prepared as four series by substituting M with manganese (Mn), cobalt (Co), nickel (Ni), and copper (Cu). The samples were prepared by solid-state reaction at 1000 °C for 24 hours and then characterized by X-ray diffraction, which showed that the obtained products are pure. UV-visible spectroscopy was used to explain the color of the obtained materials and to determine their optical properties, including the optical energy gap and disorder. These new phosphate-based pigments are candidates for anti-corrosion applications.

  20. The Chameleon Effect: characterization challenges due to the variability of nanoparticles and their surfaces

    NASA Astrophysics Data System (ADS)

    Baer, Donald R.

    2018-05-01

    Nanoparticles in a variety of forms are increasingly important in fundamental research, technological and medical applications, and environmental or toxicology studies. Physical and chemical drivers that lead to multiple types of particle instabilities complicate the ability to produce, appropriately characterize, and consistently deliver well-defined particles, frequently leading to inconsistencies and conflicts in the published literature. This perspective suggests that provenance information, beyond that often recorded or reported, and application of a set of core characterization methods, including a surface-sensitive technique, consistently applied at critical times can serve as tools in the effort to minimize reproducibility issues.

  1. Proceedings of the 13th biennial conference on carbon. Extended abstracts and program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    1977-01-01

    Properties of carbon are covered including: mechanical and frictional properties; chemical reactivity and surfaces; aerospace applications; carbonization and graphitization; industrial applications; electrical and thermal properties; biomaterials applications; fibers and composites; nuclear applications; activated carbon and adsorption; advances in carbon characterization; and micromechanics and modeling. (GHT)

  2. Broadband surface plasmon jets: direct observation of plasmon propagation for application to sensors and optical communications in microscale and nanoscale circuitry

    DOEpatents

    Bouhelier, Alexandre [Westmont, IL]; Wiederrecht, Gary P [Elmhurst, IL]

    2008-02-19

    A system and method for generating and using broadband surface plasmons in a metal film for characterization of analyte on or near the metal film. The surface plasmons interact with the analyte and generate leakage radiation which has spectral features which can be used to inspect, identify and characterize the analyte. The broadband plasmon excitation enables high-bandwidth photonic applications.

  3. The characterization of organic contaminants during the development of the Space Station water reclamation and management system

    NASA Technical Reports Server (NTRS)

    Cole, H.; Habercom, M.; Crenshaw, M.; Johnson, S.; Manuel, S.; Martindale, W.; Whitman, G.; Traweek, M.

    1991-01-01

    Examples of the application of various methods for characterizing samples for alcohols, fatty acids, detergents, and volatile/semivolatile basic, neutral, and phenolic acid contaminants are presented. Data, applications, and interpretations are given for a variety of methods including sample preparation/cleanup procedures, ion chromatography, and gas chromatography with various detectors. Summaries of the major organic contaminants that contribute to the total organic carbon content are presented.

  4. Synthesis, characterization and in vitro biocompatibility assessment of a novel tripeptide hydrogelator, as a promising scaffold for tissue engineering applications.

    PubMed

    Pospišil, Tihomir; Ferhatović Hamzić, Lejla; Brkić Ahmed, Lada; Lovrić, Marija; Gajović, Srećko; Frkanec, Leo

    2016-10-20

    We have synthesized and characterized a self-assembling tripeptide hydrogelator, Ac-l-Phe-l-Phe-l-Ala-NH2. A series of experiments showed that the hydrogel material could serve as a stable and biocompatible physical support, as it improves the survival of HEK293T cells in vitro, making it a promising biomaterial for use in tissue engineering applications.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    NREL developed a modeling and experimental strategy to characterize thermal performance of materials. The technique provides critical data on thermal properties with relevance for electronics packaging applications. Thermal contact resistance and bulk thermal conductivity were characterized for new high-performance materials such as thermoplastics, boron-nitride nanosheets, copper nanowires, and atomically bonded layers. The technique is an important tool for developing designs and materials that enable power electronics packaging with small footprint, high power density, and low cost for numerous applications.

  6. Scanning Angle Raman spectroscopy in polymer thin film characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nguyen, Vy H.T.

    The focus of this thesis is the application of Raman spectroscopy for the characterization of thin polymer films. Chapter 1 provides background information and motivation, including the fundamentals of Raman spectroscopy for chemical analysis, scanning angle Raman scattering, and scanning angle Raman scattering for applications in thin polymer film characterization. Chapter 2 is a published manuscript that focuses on the application of scanning angle Raman spectroscopy for the analysis of submicron thin films, with a description of methodology for measuring the film thickness and the location of an interface between two polymer layers. Chapter 3 provides an outlook and future directions for the work outlined in this thesis. Appendix A contains a published manuscript that outlines the use of Raman spectroscopy to aid in the synthesis of heterogeneous catalytic systems. Appendices B and C contain published manuscripts that set a foundation for the work presented in Chapter 2.

  7. Simultaneous dual-color fluorescence microscope: a characterization study.

    PubMed

    Li, Zheng; Chen, Xiaodong; Ren, Liqiang; Song, Jie; Li, Yuhua; Zheng, Bin; Liu, Hong

    2013-01-01

    High spatial resolution and geometric accuracy are crucial for chromosomal analysis in clinical cytogenetic applications. High-resolution, rapid, simultaneous acquisition of multiple fluorescent wavelengths can be achieved by concurrent imaging with multiple detectors. However, such microscopic systems function differently from traditional fluorescence microscopes. The aim was to develop a practical characterization framework to assess and optimize the performance of a high-resolution, dual-color fluorescence microscope designed for clinical chromosomal analysis. The dual-band microscopic imaging system utilizes a dichroic mirror, two sets of specially selected optical filters, and two detectors to simultaneously acquire two fluorescent wavelengths. The system's geometric distortion, linearity, modulation transfer function, and dual-detector alignment were characterized. Experimental results show that the geometric distortion at the lens periphery is less than 1%. Both fluorescent channels show linear signal responses, but there is a discrepancy between the two due to the detectors' non-uniform response ratio to different wavelengths. In terms of spatial resolution, the two contrast transfer function curves trend agreeably with spatial frequency. The alignment measurement allows the cameras' alignment to be assessed quantitatively. A resulting image after alignment adjustment demonstrates the reduced discrepancy obtained using the alignment measurement method. In this paper, we present a system characterization study and its methods for a specially designed imaging system for clinical cytogenetic applications. The presented characterization methods are not only unique to this dual-color imaging system but also applicable to the evaluation and optimization of other, similar multi-color microscopic imaging systems, improving their clinical utility for future cytogenetic applications.

  8. Application of microtremor horizontal-to-vertical spectral ratio (MHVSR) analysis for site characterization: State of the art

    USGS Publications Warehouse

    Molnar, S.; Cassidy, J. F.; Castellaro, S.; Cornou, C.; Crow, H.; Hunter, J. A.; Matsushima, S.; Sanchez-Sesma, F. J.; Yong, Alan

    2018-01-01

    Nakamura (Q Rep Railway Tech Res Inst 30:25-33, 1989) popularized the application of horizontal-to-vertical spectral ratio (HVSR) analysis of microtremor (seismic noise or ambient vibration) recordings to estimate the predominant frequency and amplification factor of earthquake shaking. During the following quarter century, popularity of the microtremor HVSR (MHVSR) method grew; studies have verified the stability of a site's MHVSR response over time and validated the MHVSR response against the earthquake HVSR response. Today, MHVSR analysis is a popular reconnaissance tool used worldwide for seismic microzonation and earthquake site characterization in numerous regions, specifically in mapping site period or fundamental frequency and in inversion for shear-wave velocity depth profiles, respectively. However, the ubiquity of MHVSR analysis is predominantly a consequence of its ease of application rather than a full understanding of its theory. We present the state of the art in MHVSR analysis in terms of the development of its theoretical basis and current state of practice, and we comment on its future for applications in earthquake site characterization.
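
    The core of the HVSR computation itself is a short calculation once smoothed amplitude spectra of the three components are available. The sketch below assumes the geometric mean of the two horizontal components as the combination rule (one common convention; others exist) and uses toy spectra in place of real recordings.

```cpp
// Hedged HVSR sketch: ratio of combined horizontal amplitude spectra to the
// vertical spectrum, using the geometric mean of the horizontals.
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<double> hvsr(const std::vector<double>& north,
                         const std::vector<double>& east,
                         const std::vector<double>& vertical) {
    std::vector<double> ratio(vertical.size());
    for (std::size_t k = 0; k < vertical.size(); ++k)
        ratio[k] = std::sqrt(north[k] * east[k]) / vertical[k];
    return ratio;
}

int main() {
    // Toy spectra with a horizontal amplification bump near bin 20.
    std::vector<double> n(64, 1.0), e(64, 1.0), v(64, 1.0);
    for (int k = 15; k < 25; ++k) { n[k] = 4.0; e[k] = 3.0; }
    const std::vector<double> r = hvsr(n, e, v);
    std::size_t peak = 0;
    for (std::size_t k = 1; k < r.size(); ++k) if (r[k] > r[peak]) peak = k;
    std::printf("predominant-frequency bin: %zu, HVSR peak: %.2f\n", peak, r[peak]);
    return 0;
}
```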

  9. Polymer Brushes: Synthesis, Characterization, Applications

    NASA Astrophysics Data System (ADS)

    Advincula, Rigoberto C.; Brittain, William J.; Caster, Kenneth C.; Rühe, Jürgen

    2004-09-01

    Materials scientists, polymer chemists, surface physicists and materials engineers will find this book a complete and detailed treatise on the field of polymer brushes, their synthesis, characterization and manifold applications. In a first section, the various synthetic pathways and different surface materials are introduced and explained, followed by a second section covering important aspects of characterization and analysis in both flat surfaces and particles. These specific surface initiated polymerization (SIP) systems such as linear polymers, homopolymers, block copolymers, and hyperbranched polymers are unique compared to previously reported systems by chemisorption or physisorption. They have found their way in both large-scale and miniature applications of polymer brushes, which is covered in the last section. Such 'hairy' surfaces offer fascinating opportunities for addressing numerous problems of both academic and, in particular, industrial interest: high-quality, functional or protective coatings, composite materials, surface engineered particles, metal-organic interfaces, biological applications, micro-patterning, colloids, nanoparticles, functional devices, and many more. It is the desire of the authors that this book will be of benefit to readers who want to "brush-up on polymers".

  10. Application of Microtremor Horizontal-to-Vertical Spectral Ratio (MHVSR) Analysis for Site Characterization: State of the Art

    NASA Astrophysics Data System (ADS)

    Molnar, S.; Cassidy, J. F.; Castellaro, S.; Cornou, C.; Crow, H.; Hunter, J. A.; Matsushima, S.; Sánchez-Sesma, F. J.; Yong, A.

    2018-03-01

    Nakamura (Q Rep Railway Tech Res Inst 30:25-33, 1989) popularized the application of horizontal-to-vertical spectral ratio (HVSR) analysis of microtremor (seismic noise or ambient vibration) recordings to estimate the predominant frequency and amplification factor of earthquake shaking. During the following quarter century, popularity of the microtremor HVSR (MHVSR) method grew; studies have verified the stability of a site's MHVSR response over time and validated the MHVSR response against the earthquake HVSR response. Today, MHVSR analysis is a popular reconnaissance tool used worldwide for seismic microzonation and earthquake site characterization in numerous regions, specifically in mapping site period or fundamental frequency and in inversion for shear-wave velocity depth profiles, respectively. However, the ubiquity of MHVSR analysis is predominantly a consequence of its ease of application rather than a full understanding of its theory. We present the state of the art in MHVSR analyses in terms of the development of its theoretical basis and current state of practice, and we comment on its future for applications in earthquake site characterization.

  11. Application of Microtremor Horizontal-to-Vertical Spectral Ratio (MHVSR) Analysis for Site Characterization: State of the Art

    NASA Astrophysics Data System (ADS)

    Molnar, S.; Cassidy, J. F.; Castellaro, S.; Cornou, C.; Crow, H.; Hunter, J. A.; Matsushima, S.; Sánchez-Sesma, F. J.; Yong, A.

    2018-07-01

    Nakamura (Q Rep Railway Tech Res Inst 30:25-33, 1989) popularized the application of horizontal-to-vertical spectral ratio (HVSR) analysis of microtremor (seismic noise or ambient vibration) recordings to estimate the predominant frequency and amplification factor of earthquake shaking. During the following quarter century, popularity of the microtremor HVSR (MHVSR) method grew; studies have verified the stability of a site's MHVSR response over time and validated the MHVSR response against the earthquake HVSR response. Today, MHVSR analysis is a popular reconnaissance tool used worldwide for seismic microzonation and earthquake site characterization in numerous regions, specifically in mapping site period or fundamental frequency and in inversion for shear-wave velocity depth profiles, respectively. However, the ubiquity of MHVSR analysis is predominantly a consequence of its ease of application rather than a full understanding of its theory. We present the state of the art in MHVSR analyses in terms of the development of its theoretical basis and current state of practice, and we comment on its future for applications in earthquake site characterization.

  12. SURFACE APPLICATION SYSTEM FOR IN SITU GROUND-WATER BIOREMEDIATION: SITE CHARACTERIZATION AND MODELING

    EPA Science Inventory

    Surface application offers an inexpensive, noninvasive alternative to injection wells and infiltration galleries for in situ ground-water bioremediation applications. The technology employs artificial recharge to create favorable hydraulic conditions for mixing and vertical tran...

  13. Evaluation of semiconductor devices for Electric and Hybrid Vehicle (EHV) ac-drive applications, volume 2

    NASA Technical Reports Server (NTRS)

    Lee, F. C.; Chen, D. Y.; Jovanic, M.; Hopkins, D. C.

    1985-01-01

    Test data are presented for the switching-time characterization of bipolar transistors; the switching-time and on-resistance characterization of field-effect transistors; comparative data for field-effect transistors; and the parallel-operation characterization of field-effect transistors. Data are given in the form of graphs.

  14. Developing Carbon Nanotube Standards at NASA

    NASA Technical Reports Server (NTRS)

    Nikolaev, Pasha; Arepalli, Sivaram; Sosa, Edward; Gorelik, Olga; Yowell, Leonard

    2007-01-01

    Single wall carbon nanotubes (SWCNTs) are currently being produced and processed by several methods. Many researchers are continuously modifying existing methods and developing new methods to incorporate carbon nanotubes into other materials and utilize the phenomenal properties of SWCNTs. These applications require the availability of SWCNTs with known properties, and there is a need to characterize these materials in a consistent manner. In order to monitor such progress, it is critical to establish a means by which to define the quality of SWCNT material and to develop characterization standards to evaluate nanotube quality across the board. Such characterization standards should be applicable to as-produced materials as well as processed SWCNT materials. In order to address this issue, NASA Johnson Space Center has developed a protocol for purity and dispersion characterization of SWCNTs (Ref.1). The NASA JSC group is currently working with NIST, ANSI and ISO to establish purity and dispersion standards for SWCNT material. A practice guide for nanotube characterization is being developed in cooperation with NIST (Ref.2). Furthermore, work is in progress to incorporate additional characterization methods for electrical, mechanical, thermal, optical and other properties of SWCNTs.

  15. Developing Carbon Nanotube Standards at NASA

    NASA Technical Reports Server (NTRS)

    Nikolaev, Pasha; Arepalli, Sivaram; Sosa, Edward; Gorelik, Olga; Yowell, Leonard

    2007-01-01

    Single wall carbon nanotubes (SWCNTs) are currently being produced and processed by several methods. Many researchers are continuously modifying existing methods and developing new methods to incorporate carbon nanotubes into other materials and utilize the phenomenal properties of SWCNTs. These applications require the availability of SWCNTs with known properties, and there is a need to characterize these materials in a consistent manner. In order to monitor such progress, it is critical to establish a means by which to define the quality of SWCNT material and to develop characterization standards to evaluate nanotube quality across the board. Such characterization standards should be applicable to as-produced materials as well as processed SWCNT materials. In order to address this issue, NASA Johnson Space Center has developed a protocol for purity and dispersion characterization of SWCNTs. The NASA JSC group is currently working with NIST, ANSI and ISO to establish purity and dispersion standards for SWCNT material. A practice guide for nanotube characterization is being developed in cooperation with NIST. Furthermore, work is in progress to incorporate additional characterization methods for electrical, mechanical, thermal, optical and other properties of SWCNTs.

  16. Characterization of Contrast Agent Microbubbles for Ultrasound Imaging and Therapy Research.

    PubMed

    Mulvana, Helen; Browning, Richard J; Luan, Ying; de Jong, Nico; Tang, Meng-Xing; Eckersley, Robert J; Stride, Eleanor

    2017-01-01

    The high efficiency with which gas microbubbles can scatter ultrasound compared with the surrounding blood pool or tissues has led to their widespread employment as contrast agents in ultrasound imaging. In recent years, their applications have been extended to include super-resolution imaging and the stimulation of localized bio-effects for therapy. The growing exploitation of contrast agents in ultrasound and in particular these recent developments have amplified the need to characterize and fully understand microbubble behavior. The aim in doing so is to more fully exploit their utility for both diagnostic imaging and potential future therapeutic applications. This paper presents the key characteristics of microbubbles that determine their efficacy in diagnostic and therapeutic applications and the corresponding techniques for their measurement. In each case, we have presented information regarding the methods available and their respective strengths and limitations, with the aim of presenting information relevant to the selection of appropriate characterization methods. First, we examine methods for determining the physical properties of microbubble suspensions and then techniques for acoustic characterization of both suspensions and single microbubbles. The next section covers characterization of microbubbles as therapeutic agents, including as drug carriers for which detailed understanding of their surface characteristics and drug loading capacity is required. Finally, we discuss the attempts that have been made to allow comparison across the methods employed by various groups to characterize and describe their microbubble suspensions and promote wider discussion and comparison of microbubble behavior.

  17. Magnetic Characterization of Iron Oxide Nanoparticles for Biomedical Applications.

    PubMed

    Maldonado-Camargo, Lorena; Unni, Mythreyi; Rinaldi, Carlos

    2017-01-01

    Iron oxide nanoparticles are of interest in a wide range of biomedical applications due to their response to applied magnetic fields and their unique magnetic properties. Magnetization measurements in constant and time-varying magnetic field are often carried out to quantify key properties of iron oxide nanoparticles. This chapter describes the importance of thorough magnetic characterization of iron oxide nanoparticles intended for use in biomedical applications. A basic introduction to relevant magnetic properties of iron oxide nanoparticles is given, followed by protocols and conditions used for measurement of magnetic properties, along with examples of data obtained from each measurement, and methods of data analysis.

  18. Thermal Remote Anemometer Device

    NASA Technical Reports Server (NTRS)

    Heyman, Joseph S.; Heath, D. Michele; Winfree, William P.; Miller, William E.; Welch, Christopher S.

    1988-01-01

    Thermal Remote Anemometer Device developed for remote, noncontacting, passive measurement of thermal properties of sample. Model heated locally by scanning laser beam and cooled by wind in tunnel. Thermal image of model analyzed to deduce pattern of airflow around model. For materials applications, system used for evaluation of thin films and determination of thermal diffusivity and adhesive-layer contact. For medical applications, measures perfusion through skin to characterize blood flow and used to determine viabilities of grafts and to characterize tissues.

  19. Biomedical Applications of Terahertz Spectroscopy: A Brief Review

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vargas-Luna, M.; Huerta-Franco, R.

    The terahertz (THz) window of the electromagnetic spectrum has been partially explored but remains almost unexploited commercially. In recent years there has been increased interest and a technological boost in THz research for detection systems, material characterization and imaging. Among many hot topics, researchers are interested in medical applications and protein characterization. We present a general overview of the field, showing some of the handicaps and promises of this region of the electromagnetic spectrum.

  20. Increasing Heavy Oil Reserves in the Wilmington Oil Field Through Advanced Reservoir Characterization and Thermal Production Technologies, Class III

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    City of Long Beach; Tidelands Oil Production Company; University of Southern California

    2002-09-30

    The objective of this project was to increase the recoverable heavy oil reserves within sections of the Wilmington Oil Field, near Long Beach, California through the testing and application of advanced reservoir characterization and thermal production technologies. The successful application of these technologies would result in expanding their implementation throughout the Wilmington Field and, through technology transfer, to other slope and basin clastic (SBC) reservoirs.

  1. Full-wave Characterization of Rough Terrain Surface Effects for Forward-looking Radar Applications: A Scattering and Imaging Study from the Electromagnetic Perspective

    DTIC Science & Technology

    2011-09-01

    First, the parallelized 3-D FDTD algorithm is applied to simulate composite scattering from targets in a rough ground... solver as pertinent to forward-looking radar sensing, the effects of surface clutter on multistatic target imaging are illustrated with large-scale...

  2. Characterization of the Structural and Chemical Properties of Copper Chelators in Seawater

    DTIC Science & Technology

    2000-09-30

    are pink). All of the strains that don’t make the ligand contain phycocyanin (the cultures are green). So far we have no explanation for this... thiol that does not co-elute with any of her thiol standards. IMPACT/APPLICATIONS: Successful characterization of this material could lead to new... insight into the sources and chemistry of Cu ligands in seawater. Potential applications could arise if we identify a new class of chelators selective

  3. Characterization of Low Pressure Cold Plasma in the Cleaning of Contaminated Surfaces

    NASA Technical Reports Server (NTRS)

    Lanz, Devin Garrett; Hintze, Paul E.

    2016-01-01

    The characterization of low pressure cold plasma is a broad topic which would benefit many different applications involving such plasma. The characterization described in this paper focuses on cold plasma used as a medium in cleaning and disinfection applications. Optical Emission Spectroscopy (OES) and Mass Spectrometry (MS) are the two analytical methods used in this paper to characterize the plasma. OES analyzes molecules in the plasma phase by displaying the light emitted by the plasma molecules on a graph of wavelength vs. intensity. OES was most useful in identifying species which may interact with other molecules in the plasma, such as atomic oxygen or hydroxide radicals. Extracting useful data from the MS is done by filtering out the peaks generated by expected molecules and looking for peaks caused by foreign ones leaving the plasma chamber. This paper describes the efforts at setting up and testing these methods in order to accurately and effectively characterize the plasma.

  4. Pre-Application Meeting for TCGA Expansion RFA - TCGA

    Cancer.gov

    The National Cancer Institute (NCI) hosted a pre-application meeting for TCGA funding of Genome Characterization Centers and Genome Data Analysis Centers. At this pre-application meeting, NCI staff presented on the goals and objectives for the TCGA Research Network, discussed the Request for Applications (RFA) peer review process and answered questions.

  5. One-Dimensional Perovskite Manganite Oxide Nanostructures: Recent Developments in Synthesis, Characterization, Transport Properties, and Applications

    NASA Astrophysics Data System (ADS)

    Li, Lei; Liang, Lizhi; Wu, Heng; Zhu, Xinhua

    2016-03-01

    One-dimensional nanostructures, including nanowires, nanorods, nanotubes, nanofibers, and nanobelts, have promising applications in mesoscopic physics and nanoscale devices. In contrast to other nanostructures, one-dimensional nanostructures can provide unique advantages in investigating the size and dimensionality dependence of materials' physical properties, such as electrical, thermal, and mechanical performance, and in constructing nanoscale electronic and optoelectronic devices. Among the one-dimensional nanostructures, one-dimensional perovskite manganite nanostructures have received much attention due to their unusual electron transport and magnetic properties, which are indispensable for applications in microelectronic, magnetic, and spintronic devices. In the past two decades, much effort has been made to synthesize and characterize one-dimensional perovskite manganite nanostructures in the forms of nanorods, nanowires, nanotubes, and nanobelts. Various physical and chemical deposition techniques and growth mechanisms have been explored and developed to control the morphology, uniformity of shape and size, crystalline structure, defects, and homogeneous stoichiometry of the one-dimensional perovskite manganite nanostructures. This article provides a comprehensive review of the state-of-the-art research activities that focus on the rational synthesis, structural characterization, fundamental properties, and unique applications of one-dimensional perovskite manganite nanostructures in nanotechnology. It begins with the rational synthesis of one-dimensional perovskite manganite nanostructures and then summarizes their structural characterizations. Fundamental physical properties of one-dimensional perovskite manganite nanostructures are also highlighted, and a range of unique applications in information storage, field-effect transistors, and spintronic devices is discussed. Finally, we conclude this review with some perspectives/outlook and future research directions in these fields.

  6. One-Dimensional Perovskite Manganite Oxide Nanostructures: Recent Developments in Synthesis, Characterization, Transport Properties, and Applications.

    PubMed

    Li, Lei; Liang, Lizhi; Wu, Heng; Zhu, Xinhua

    2016-12-01

    One-dimensional nanostructures, including nanowires, nanorods, nanotubes, nanofibers, and nanobelts, have promising applications in mesoscopic physics and nanoscale devices. In contrast to other nanostructures, one-dimensional nanostructures can provide unique advantages in investigating the size and dimensionality dependence of materials' physical properties, such as electrical, thermal, and mechanical performance, and in constructing nanoscale electronic and optoelectronic devices. Among the one-dimensional nanostructures, one-dimensional perovskite manganite nanostructures have received much attention due to their unusual electron transport and magnetic properties, which are indispensable for applications in microelectronic, magnetic, and spintronic devices. In the past two decades, much effort has been made to synthesize and characterize one-dimensional perovskite manganite nanostructures in the forms of nanorods, nanowires, nanotubes, and nanobelts. Various physical and chemical deposition techniques and growth mechanisms have been explored and developed to control the morphology, uniformity of shape and size, crystalline structure, defects, and homogeneous stoichiometry of the one-dimensional perovskite manganite nanostructures. This article provides a comprehensive review of the state-of-the-art research activities that focus on the rational synthesis, structural characterization, fundamental properties, and unique applications of one-dimensional perovskite manganite nanostructures in nanotechnology. It begins with the rational synthesis of one-dimensional perovskite manganite nanostructures and then summarizes their structural characterizations. Fundamental physical properties of one-dimensional perovskite manganite nanostructures are also highlighted, and a range of unique applications in information storage, field-effect transistors, and spintronic devices is discussed. Finally, we conclude this review with some perspectives/outlook and future research directions in these fields.

  7. Enrichment and characterization of ferritin for nanomaterial applications

    NASA Astrophysics Data System (ADS)

    Ghirlando, Rodolfo; Mutskova, Radina; Schwartz, Chad

    2016-01-01

    Ferritin is a ubiquitous iron storage protein utilized as a nanomaterial for labeling biomolecules and nanoparticle construction. Commercially available preparations of horse spleen ferritin, widely used as a starting material, contain a distribution of ferritins with different iron loads. We describe a detailed approach to the enrichment of differentially loaded ferritin molecules by common biophysical techniques such as size exclusion chromatography and preparative ultracentrifugation, and characterize these preparations by dynamic light scattering, and analytical ultracentrifugation. We demonstrate a combination of methods to standardize an approach for determining the chemical load of nearly any particle, including nanoparticles and metal colloids. Purification and characterization of iron content in monodisperse ferritin species is particularly critical for several applications in nanomaterial science.

  8. Characterizing SOI Wafers By Use Of AOTF-PHI

    NASA Technical Reports Server (NTRS)

    Cheng, Li-Jen; Li, Guann-Pyng; Zang, Deyu

    1995-01-01

    Developmental nondestructive method of characterizing layers of silicon-on-insulator (SOI) wafer involves combination of polarimetric hyperspectral imaging by use of acousto-optical tunable filters (AOTF-PHI) and computational resources for extracting pertinent data on SOI wafers from polarimetric hyperspectral images. Offers high spectral resolution and both ease and rapidity of optical-wavelength tuning. Further efforts aim to implement all processing of polarimetric spectral image data in special-purpose hardware for sake of processing speed. Enables characterization of SOI wafers in real time for online monitoring and adjustment of production. Also accelerates application of AOTF-PHI to other applications in which there is need for high-resolution spectral imaging, both with and without polarimetry.

  9. Stochastic Characterization of Communication Network Latency for Wide Area Grid Control Applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ameme, Dan Selorm Kwami; Guttromson, Ross

    This report characterizes communications network latency under various network topologies and qualities of service (QoS). The characterizations are probabilistic in nature, allowing deeper analysis of stability for Internet Protocol (IP) based feedback control systems used in grid applications. The work involves the use of Raspberry Pi computers as a proxy for a controlled resource, and an ns-3 network simulator on a Linux server to create an experimental platform (testbed) that can be used to model wide-area grid control network communications in smart grid. Modbus protocol is used for information transport, and Routing Information Protocol is used for dynamic route selection within the simulated network.

  10. The Chameleon Effect: Characterization Challenges Due to the Variability of Nanoparticles and Their Surfaces

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baer, Donald R.

    Nanoparticles in a variety of forms are of increasing importance in fundamental research, technological and medical applications, and environmental or toxicology studies. Physical and chemical drivers that lead to multiple types of particle instabilities complicate both the ability to produce and consistently deliver well-defined particles and their appropriate characterization, frequently leading to inconsistencies and conflicts in the published literature. This perspective suggests that provenance information, beyond that often recorded or reported, and application of a set of core characterization methods, including a surface sensitive technique, consistently applied at critical times can serve as tools in the effort to minimize reproducibility issues.

  11. Watershed characterization and analysis using the VELMA model

    EPA Science Inventory

    We developed a broadly applicable watershed simulator – VELMA (Visualizing Ecosystem and Land Management Assessments) – to characterize hydrological and ecological processes essential to the healthy functioning of watersheds, and to identify best management practices ...

  12. Nanostructured cerium oxide: preparation, characterization, and application in energy and environmental catalysis

    DOE PAGES

    Tang, Wen-Xiang; Gao, Pu-Xian

    2016-11-10

    Nanostructured cerium oxide (CeO2) with outstanding physical and chemical properties has attracted extensive interest over the past few decades in environment and energy-related applications. With controllable synthesis of nanostructured CeO2, many more features, from defect chemistry to structure-derived effects, have been brought out technologically. This paper highlights recent progress on the synthesis and characterization of nanostructured ceria-based materials as well as their traditional and new applications. Specifically, several typical applications based on the desired ceria nanostructures are highlighted to showcase the importance of nanostructure-derived effects. Moreover, some challenges and perspectives on nanostructured ceria are presented, such as defect control and retention, scale-up fabrication, and monolithic devices. Hopefully, this paper can provide an improved understanding of nanostructured CeO2 and offer new opportunities to promote further research and applications in the future.

  13. Delayed Gamma-ray Spectroscopy for Safeguards Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mozin, Vladimir

    The delayed gamma-ray assay technique utilizes an external neutron source (D-D, D-T, or electron accelerator-driven) and high-resolution gamma-ray spectrometers to perform characterization of SNM materials behind shielding and in complex configurations such as a nuclear fuel assembly. High-energy delayed gamma-rays (2.5 MeV and above), observed following the active interrogation, provide a signature for identification of specific fissionable isotopes in a mixed sample and determine their relative content. Potential safeguards applications of this method are: 1) characterization of fresh and spent nuclear fuel assemblies in wet or dry storage; 2) analysis of uranium enrichment in shielded or non-characterized containers or in the presence of a strong radioactive background and plutonium contamination; 3) characterization of bulk and waste and product streams at SNM processing plants. Extended applications can include warhead confirmation and warhead dismantlement confirmation in the arms control area, as well as SNM diagnostics for emergency response needs. In FY16 and prior years, the project has demonstrated the delayed gamma-ray measurement technique as a robust SNM assay concept. A series of empirical and modeling studies were conducted to characterize its response sensitivity, develop analysis methodologies, and analyze applications. Extensive experimental tests involving weapons-grade Pu, HEU, and depleted uranium samples were completed at the Idaho Accelerator Center and LLNL Dome facilities for various interrogation time regimes and effects of the neutron source parameters. A dedicated delayed gamma-ray response modeling technique was developed and its elements were benchmarked in representative experimental studies, including high-resolution gamma-ray measurements of spent fuel at the CLAB facility in Sweden. The objective of the R&D effort in FY17 is to experimentally demonstrate the feasibility of the delayed gamma-ray interrogation of shielded SNM samples with portable neutron sources suitable for field applications.

  14. Initial Assessment of X-Ray Computer Tomography image analysis for material defect microstructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kane, Joshua James; Windes, William Enoch

    2016-06-01

    The original development work leading to this report was focused on the nondestructive three-dimensional (3-D) characterization of nuclear graphite as a means to better understand the nature of the inherent pore structure. The pore structure of graphite and its evolution under various environmental factors such as irradiation, mechanical stress, and oxidation plays an important role in its observed properties and characteristics. If we are to transition from an empirical understanding of graphite behavior to a truly predictive mechanistic understanding, the pore structure must be well characterized and understood. As the pore structure within nuclear graphite is highly interconnected and truly 3-D in nature, 3-D characterization techniques are critical. While 3-D characterization has been an excellent tool for graphite pore characterization, it is applicable to a broad number of materials systems over many length scales. Given the wide range of applications and the highly quantitative nature of the tool, it is quite surprising to discover how few materials researchers understand how valuable a tool 3-D image processing and analysis can be. Ultimately, this report is intended to encourage broader use of 3-D image processing and analysis in materials science and engineering applications, more specifically nuclear-related materials applications, by providing interested readers with enough familiarity to explore its vast potential in identifying microstructure changes. To encourage this broader use, the report is divided into two main sections. Section 2 provides an overview of some of the key principles and concepts needed to extract a wide variety of quantitative metrics from a 3-D representation of a material microstructure. The discussion includes a brief overview of segmentation methods, connective components, morphological operations, distance transforms, and skeletonization. Section 3 focuses on the application of concepts from Section 2 to relevant materials at Idaho National Laboratory. In this section, image analysis examples featuring nuclear graphite will be discussed in detail. Additionally, example analyses from Transient Reactor Test Facility low-enriched uranium conversion, Advanced Gas Reactor-like compacts, and tristructural isotropic particles are shown to give a broader perspective of the applicability to relevant materials of interest.

  15. SCaLeM: A Framework for Characterizing and Analyzing Execution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Manzano Franco, Joseph B.; Krishnamoorthy, Sriram

    2014-10-13

    As scalable parallel systems evolve towards more complex nodes with many-core architectures and larger trans-petascale & upcoming exascale deployments, there is a need to understand, characterize and quantify the underlying execution models being used on such systems. Execution models are a conceptual layer between applications & algorithms and the underlying parallel hardware and systems software on which those applications run. This paper presents the SCaLeM (Synchronization, Concurrency, Locality, Memory) framework for characterizing and analyzing execution models. SCaLeM consists of three basic elements: attributes, compositions and mapping of these compositions to abstract parallel systems. The fundamental Synchronization, Concurrency, Locality and Memory attributes are used to characterize each execution model, while the combinations of those attributes in the form of compositions are used to describe the primitive operations of the execution model. The mapping of the execution model’s primitive operations, described by compositions, to an underlying abstract parallel system can be evaluated quantitatively to determine its effectiveness. Finally, SCaLeM also enables the representation and analysis of applications in terms of execution models, for the purpose of evaluating the effectiveness of such mapping.

  16. Fabrication and characterization of SU-8-based capacitive micromachined ultrasonic transducer for airborne applications

    NASA Astrophysics Data System (ADS)

    Joseph, Jose; Singh, Shiv Govind; Vanjari, Siva Rama Krishna

    2018-01-01

    We present the successful fabrication and characterization of a capacitive micromachined ultrasonic transducer (CMUT) with SU-8 as the membrane material. The goal of this research is to develop a post-CMOS compatible CMUT that can be monolithically integrated with the CMOS circuitry. The fabrication is based on a simple, three-mask process, with all wet etching steps involved, so that the device can be realized with minimal laboratory conditions. The maximum temperature involved in the whole process flow was 140°C, and hence, it is post-CMOS compatible. The fabricated device exhibited a resonant frequency of 835 kHz with a bandwidth of 62 kHz when characterized in air. The pull-in and snapback characteristics of the device were analyzed. The influence of membrane radius on the center frequency and bandwidth was also experimentally evaluated by fabricating CMUTs with membrane radius varying from 30 to 54 μm with an interval of 4 μm. These devices were vibrating at frequencies from 5.2 to 1.8 MHz with an average Q-factor of 23.41. Acoustic characterization of the fabricated devices was performed in air, demonstrating the applicability of SU-8 CMUTs in airborne applications.

  17. Applications of UV/Vis Spectroscopy in Characterization and Catalytic Activity of Noble Metal Nanoparticles Fabricated in Responsive Polymer Microgels: A Review.

    PubMed

    Begum, Robina; Farooqi, Zahoor H; Naseem, Khalida; Ali, Faisal; Batool, Madeeha; Xiao, Jianliang; Irfan, Ahmad

    2018-11-02

    Noble metal nanoparticle-loaded smart polymer microgels have gained much attention due to the fascinating combination of their properties in a single system. These hybrid systems have been extensively used in biomedicines, photonics, and catalysis. Hybrid microgels are characterized by using various techniques, but UV/Vis spectroscopy is an easily available technique for characterization of noble metal nanoparticle-loaded microgels. This technique is widely used for determination of the size and shape of metal nanoparticles. The tuning of optical properties of noble metal nanoparticles under various stimuli can be studied using the UV/Vis spectroscopic method. Time course UV/Vis spectroscopy can also be used to monitor the kinetics of swelling and deswelling of microgels and hybrid microgels. Growth of metal nanoparticles in a polymeric network or growth of a polymeric network around a metal nanoparticle core can be studied by using UV/Vis spectroscopy. This technique can also be used for investigation of various applications of hybrid materials in catalysis, photonics, and sensing. This tutorial review describes the uses of UV/Vis spectroscopy in characterization and catalytic applications of responsive hybrid microgels with respect to recent research progress in this area.

  18. A Retrospective Evaluation of the Use of Mass Spectrometry in FDA Biologics License Applications

    NASA Astrophysics Data System (ADS)

    Rogstad, Sarah; Faustino, Anneliese; Ruth, Ashley; Keire, David; Boyne, Michael; Park, Jun

    2017-05-01

    The characterization sections of biologics license applications (BLAs) approved by the United States Food and Drug Administration (FDA) between 2000 and 2015 were investigated to examine the extent of the use of mass spectrometry. Mass spectrometry was found to be integral to the characterization of these biotherapeutics. Of the 80 electronically submitted monoclonal antibody and protein biotherapeutic BLAs included in this study, 79 were found to use mass spectrometric workflows for protein or impurity characterization. To further examine how MS is being used in successful BLAs, the applications were filtered based on the type and number of quality attributes characterized, the mass spectrometric workflows used (peptide mapping, intact mass analysis, and cleaved glycan analysis), the methods used to introduce the proteins into the gas phase (ESI, MALDI, or LC-ESI), and the specific types of instrumentation used. Analyses were conducted over a time course based on the FDA BLA approval to determine if any trends in utilization could be observed over time. Additionally, the different classes of protein-based biotherapeutics among the approved BLAs were clustered to determine if any trends could be attributed to the specific type of biotherapeutic.

  19. Mathematical investigations of branch length similarity entropy profiles of shapes for various resolutions

    NASA Astrophysics Data System (ADS)

    Jeon, Wonju; Lee, Sang-Hee

    2012-12-01

    In our previous study, we defined the branch length similarity (BLS) entropy for a simple network consisting of a single node and numerous branches. As the first application of this entropy to characterize shapes, the BLS entropy profiles of 20 battle tank shapes were calculated from simple networks created by connecting pixels in the boundary of the shape. The profiles successfully characterized the tank shapes through a comparison of their BLS entropy profiles. Following that application, this entropy was used to characterize human emotional faces, such as happiness and sadness, and to measure the degree of complexity of termite tunnel networks. These applications indirectly indicate that the BLS entropy profile can be a useful tool to characterize networks and shapes. However, the ability of the BLS entropy in the characterization depends on the image resolution because the entropy is determined by the number of nodes for the boundary of a shape. Higher resolution means more nodes. If the entropy is to be widely used in the scientific community, the effect of the resolution on the entropy profile should be understood. In the present study, we mathematically investigated the BLS entropy profile of a shape with infinite resolution and numerically investigated the variation in the pattern of the entropy profile caused by changes in resolution in the case of finite resolution.
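
    The entropy described above lends itself to a compact numerical illustration. The sketch below is a hypothetical reconstruction, not code from the paper: it treats each boundary point of a shape as the node of a simple network whose branches run to every other boundary point, computes a normalized Shannon entropy of the branch-length distribution for that node, and scans the node around the boundary to obtain the entropy profile. The normalization by log(N-1) and the ellipse test shape are assumptions made only for illustration; re-sampling the boundary with different N mimics the resolution dependence the authors study.

    /* Minimal BLS-entropy-profile sketch for a closed boundary sampled as N points.
     * For boundary point k, the "branches" are the segments to every other
     * boundary point; the entropy is the normalized Shannon entropy of the
     * branch-length distribution (normalization by log(N-1) is assumed here). */
    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define N 100  /* hypothetical resolution (number of boundary points) */

    static double bls_entropy(const double x[], const double y[], int n, int k) {
        double len[N], total = 0.0, s = 0.0;
        for (int i = 0; i < n; ++i) {
            if (i == k) continue;
            double dx = x[i] - x[k], dy = y[i] - y[k];
            len[i] = sqrt(dx * dx + dy * dy);   /* branch length from node k */
            total += len[i];
        }
        for (int i = 0; i < n; ++i) {
            if (i == k) continue;
            double p = len[i] / total;          /* branch-length probability */
            if (p > 0.0) s -= p * log(p);
        }
        return s / log((double)(n - 1));        /* normalized to [0, 1] */
    }

    int main(void) {
        double x[N], y[N];
        /* Example shape: an ellipse sampled at N boundary points. */
        for (int i = 0; i < N; ++i) {
            double t = 2.0 * M_PI * i / N;
            x[i] = 2.0 * cos(t);
            y[i] = 1.0 * sin(t);
        }
        for (int k = 0; k < N; ++k)             /* the entropy profile */
            printf("%d %.6f\n", k, bls_entropy(x, y, N, k));
        return 0;
    }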

  20. Intermixing optical and microwave signals in GaAs microstrip circuits for phase-locking applications

    NASA Astrophysics Data System (ADS)

    Li, Ming G.; Chauchard, Eve A.; Lee, Chi H.; Hung, Hing-Loi A.

    1990-12-01

    The microwave modulation of the interference generated by optical beams that are reflected from the top and bottom surfaces of GaAs substrate adjacent to a microstrip line is studied. The detected modulation is used to directly characterize the electrooptic effect. This optical-microwave intermixing technique is applied to phase-lock a free-running microwave oscillator with picosecond laser pulses. One potential application of this technique is for the optical on-wafer characterization of MMICs.

  1. Electro-Optical Sensing Apparatus and Method for Characterizing Free-Space Electromagnetic Radiation

    DOEpatents

    Zhang, Xi-Cheng; Libelo, Louis Francis; Wu, Qi

    1999-09-14

    Apparatus and methods for characterizing free-space electromagnetic energy, and in particular, apparatus/method suitable for real-time two-dimensional far-infrared imaging applications are presented. The sensing technique is based on a non-linear coupling between a low-frequency electric field and a laser beam in an electro-optic crystal. In addition to a practical counter-propagating sensing technique, a co-linear approach is described which provides longer radiated field--optical beam interaction length, thereby making imaging applications practical.

  2. Characterization of Al 2219 material for the application of the spin-forming-process

    NASA Astrophysics Data System (ADS)

    Mueller-Wiesner, D.; Sieger, E.; Ernsberger, K.

    1991-10-01

    The shells of the propellant tanks of the Ariane 5 EPS stage are to be manufactured by the spin forming process. The material for the shells (hemispheres) is the aluminum alloy 2219. Through a material characterization program, optimized parameters for the application of the forming process, starting from different material conditions (T31 temper and '0' condition), were determined. Based on the results of this program, it was decided to start spin forming in the '0' condition for flight hardware.

  3. METHODS FOR MULTI-SPATIAL SCALE CHARACTERIZATION OF RIPARIAN CORRIDORS

    EPA Science Inventory

    This paper describes the application of aerial photography and GIS technology to develop flexible and transferable methods for multi-spatial scale characterization and analysis of riparian corridors. Relationships between structural attributes of riparian corridors and indicator...

  4. Portable image analysis system for characterizing aggregate morphology.

    DOT National Transportation Integrated Search

    2008-01-01

    In the last decade, the application of image-based evaluation of particle shape, angularity and texture has been widely researched to characterize aggregate morphology. These efforts have been driven by the knowledge that the morphologic characterist...

  5. Multiscale Structure of UXO Site Characterization: Spatial Estimation and Uncertainty Quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ostrouchov, George; Doll, William E.; Beard, Les P.

    2009-01-01

    Unexploded ordnance (UXO) site characterization must consider both how the contamination is generated and how we observe that contamination. Within the generation and observation processes, dependence structures can be exploited at multiple scales. We describe a conceptual site characterization process, the dependence structures available at several scales, and consider their statistical estimation aspects. It is evident that most of the statistical methods that are needed to address the estimation problems are known but their application-specific implementation may not be available. We demonstrate estimation at one scale and propose a representation for site contamination intensity that takes full account of uncertainty, is flexible enough to answer regulatory requirements, and is a practical tool for managing detailed spatial site characterization and remediation. The representation is based on point process spatial estimation methods that require modern computational resources for practical application. These methods have provisions for including prior and covariate information.

  6. Development of Si(1-x)Ge(x) technology for microwave sensing applications

    NASA Technical Reports Server (NTRS)

    Mena, Rafael A.; Taub, Susan R.; Alterovitz, Samuel A.; Young, Paul E.; Simons, Rainee N.; Rosenfeld, David

    1993-01-01

    This report describes the progress for the first year of the work done under the Director's Discretionary Fund (DDF) research project entitled 'Development of Si(1-x)Ge(x) Technology for Microwave Sensing Applications.' This project includes basic material characterization studies of silicon-germanium (SiGe), device processing on both silicon (Si) and SiGe substrates, and microwave characterization of transmission lines on silicon substrates. The material characterization studies consisted of ellipsometric and magneto-transport measurements and theoretical calculations of the SiGe band structure. The device fabrication efforts consisted of establishing SiGe device processing capabilities in the Lewis cleanroom. The characterization of microwave transmission lines included studying the losses of various coplanar transmission lines and the development of transitions on silicon. Each part of the project is discussed individually and the findings for each part are presented. Future directions are also discussed.

  7. Characterization of Custom-Designed Charge-Coupled Devices for Applications to Gas and Aerosol Monitoring Sensorcraft Instrument

    NASA Technical Reports Server (NTRS)

    Refaat, Tamer F.; Abedin, M. Nurul; Farnsworth, Glenn R.; Garcia, Christopher S.; Zawodny, Joseph M.

    2005-01-01

    Custom-designed charge-coupled devices (CCDs) for the Gas and Aerosols Monitoring Sensorcraft instrument were developed. These custom-designed CCD devices are linear arrays with a pixel format of 512x1 elements and a pixel size of 10x200 sq µm. These devices were characterized at NASA Langley Research Center to achieve a full well capacity as high as 6,000,000 e-. This met the aircraft flight mission requirements in terms of signal-to-noise performance and maximum dynamic range. Characterization and analysis of the electrical and optical properties of the CCDs were carried out at room temperature. This includes measurements of photon transfer curves, gain coefficient histograms, read noise, and spectral response. Test results obtained on these devices successfully demonstrated the objectives of the aircraft flight mission. In this paper, we describe the characterization results and also discuss their applications to future missions.

  8. Applications Of Graphite Fluoride Fibers In Outer Space

    NASA Technical Reports Server (NTRS)

    Hung, Ching-Cheng; Long, Martin; Dever, Therese

    1993-01-01

    Report characterizes graphite fluoride fibers made from commercially available graphitized carbon fibers and discusses some potential applications of graphite fluoride fibers in outer space. Applications include heat-sinking printed-circuit boards, solar concentrators, and absorption of radar waves. Other applications based on exploitation of increased resistance to degradation by atomic oxygen, present in low orbits around Earth.

  9. Using NASA Remote Sensing in Public Health Applications

    NASA Technical Reports Server (NTRS)

    Estes, Sue; Haynes, John

    2011-01-01

    The Public Health application area focuses on Earth science applications to public health and safety, particularly regarding infectious disease, emergency preparedness and response, and environmental health issues. The application explores issues of toxic and pathogenic exposure, as well as natural and man-made hazards and their effects, for risk characterization/mitigation and improvements to health and safety.

  10. Pesticide Applicator Profiling: Using Polycom[R] Distance Delivery for Continuing Education and Characterizing Florida's Licensed Applicators

    ERIC Educational Resources Information Center

    Fishel, Fred; Langeland, Ken

    2011-01-01

    The University of Florida offers continuing education units (CEUs) via distance technology using Polycom[R] to meet requirements for applicators of pesticides to renew their licenses. A large statewide event conducted in 2010 also included a needs assessment of this group concerning CEUs. Results indicate that these applicators strongly prefer…

  11. 78 FR 24107 - Version 5 Critical Infrastructure Protection Reliability Standards

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-24

    ... native applications or print-to-PDF format and not a scanned format. Mail/Hand Delivery: Those unable to... criteria that characterize their impact for the application of cyber security requirements commensurate... recognition. Requirement R2 requires testing to verify response plan effectiveness and consistent application...

  12. Homeland Security and Defense Applications

    ScienceCinema

    None

    2018-01-16

    Homeland Security and Defense Applications personnel are the best in the world at detecting and locating dirty bombs, loose nukes, and other radiological sources. The site trains the Nation's emergency responders, who would be among the first to confront a radiological or nuclear emergency. Homeland Security and Defense Applications' highly trained personnel characterize the threat environment, produce specialized radiological nuclear detection equipment, train personnel on the equipment and its uses, test and evaluate the equipment, and develop different kinds of high-tech equipment to defeat terrorists. In New York City for example, NNSS scientists assisted in characterizing the radiological nuclear environment after 9/11, and produced specialized radiological nuclear equipment to assist local officials in their Homeland Security efforts.

  13. SULFATE REDUCTION IN GROUNDWATER: CHARACTERIZATION AND APPLICATIONS FOR REMEDIATION

    PubMed Central

    Miao, Z.; Brusseau, M. L.; Carroll, K. C.; Carreón-Diazconti, C.; Johnson, B.

    2013-01-01

    Sulfate is ubiquitous in groundwater, with both natural and anthropogenic sources. Sulfate reduction reactions play a significant role in mediating redox conditions and biogeochemical processes for subsurface systems. They also serve as the basis for innovative in-situ methods for groundwater remediation. An overview of sulfate reduction in subsurface environments is provided, along with a brief discussion of characterization methods and applications for addressing acid mine drainage. We then focus on two innovative, in-situ methods for remediating sulfate-contaminated groundwater, the use of zero-valent iron (ZVI) and the addition of electron-donor substrates. The advantages and limitations associated with the methods are discussed, with examples of prior applications. PMID:21947714

  14. Synthesis and characterization of magnesium doped cerium oxide for the fuel cell application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Amit; Kumari, Monika; Kumar, Mintu

    2016-05-06

    Cerium oxide has attracted much attention in the global nanotechnology market due to its valuable applications as a catalyst, as a fuel additive, and, widely, as an electrolyte in solid oxide fuel cells. Doped cerium oxide has abundant oxygen vacancies that allow for greater reactivity and faster ion transport. These properties make cerium oxide a suitable material for SOFC applications. A cerium oxide electrolyte requires a lower operating temperature, which brings improvements in processing and fabrication techniques. In our work, we synthesized magnesium-doped cerium oxide by the co-precipitation method. With magnesium doping, the catalytic reactivity of CeO2 was increased. The synthesized nanoparticles were characterized by XRD and UV absorption techniques.

  15. Theoretical and experimental research in space photovoltaics

    NASA Technical Reports Server (NTRS)

    Faur, Mircea; Faur, Maria

    1995-01-01

    Theoretical and experimental research is outlined for indium phosphide solar cells, other solar cells for space applications, fabrication and performance measurements of shallow homojunction InP solar cells for space applications, improved processing steps and InP material characterization with applications to fabrication of high efficiency radiation resistant InP solar cells and other opto-electronic InP devices, InP solar cells fabricated by thermal diffusion, experiment-based predicted high efficiency solar cells fabricated by closed-ampoule thermal diffusion, radiation resistance of diffused junction InP solar cells, chemical and electrochemical characterization and processing of InP diffused structures and solar cells, and progress in p(+)n InP diffused solar cells.

  16. Atmospheric-pressure plasma jet characterization and applications on melanoma cancer treatment (B/16-F10)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mashayekh, Shahriar; Rajaee, Hajar; Hassan, Zuhir M.

    2015-09-15

    A new approach in medicine is the use of cold plasma for various applications such as sterilization, blood coagulation, and cancer cell treatment. In this paper, a pin-to-hole plasma jet for biological applications has been designed, manufactured, and characterized. The characterization includes power consumption via the Lissajous method, thermal behavior of the atmospheric-pressure plasma jet using an infrared camera as a novel method together with Specair software to determine vibrational and transitional temperatures, and optical emission spectroscopy to determine the generated species. Treatment of melanoma cancer cells (B16/F10) was also implemented, and a tetrazolium salt dye (MTT assay) and flow cytometry were used to evaluate viability. The effect of ultraviolet photons on cancerous cells was also observed using an MgF2 crystal with the MTT assay. Finally, in-vivo studies on C57 mice were also done in order to have a better understanding of the effects in real conditions.

  17. Examination of time-reversal acoustics in shallow water and applications to noncoherent underwater communications.

    PubMed

    Smith, Kevin B; Abrantes, Antonio A M; Larraza, Andres

    2003-06-01

    The shallow water acoustic communication channel is characterized by strong signal degradation caused by multipath propagation and high spatial and temporal variability of the channel conditions. At the receiver, multipath propagation causes intersymbol interference and is considered the most important of the channel distortions. This paper examines the application of time-reversal acoustic (TRA) arrays, i.e., phase-conjugated arrays (PCAs), that generate a spatio-temporal focus of acoustic energy at the receiver location, eliminating distortions introduced by channel propagation. This technique is self-adaptive and automatically compensates for environmental effects and array imperfections without the need to explicitly characterize the environment. An attempt is made to characterize the influences of a PCA design on its focusing properties with particular attention given to applications in noncoherent underwater acoustic communication systems. Due to the PCA spatial diversity focusing properties, PC arrays may have an important role in an acoustic local area network. Each array is able to simultaneously transmit different messages that will focus only at the destination receiver node.

  18. Small-Molecule Binding Aptamers: Selection Strategies, Characterization, and Applications

    PubMed Central

    Ruscito, Annamaria; DeRosa, Maria C.

    2016-01-01

    Aptamers are single-stranded, synthetic oligonucleotides that fold into 3-dimensional shapes capable of binding non-covalently with high affinity and specificity to a target molecule. They are generated via an in vitro process known as the Systematic Evolution of Ligands by EXponential enrichment, from which candidates are screened and characterized, and then used in various applications. These applications range from therapeutic uses to biosensors for target detection. Aptamers for small molecule targets such as toxins, antibiotics, molecular markers, drugs, and heavy metals will be the focus of this review. Their accurate detection is needed for the protection and wellbeing of humans and animals. However, the small molecular weights of these targets, including the drastic size difference between the target and the oligonucleotides, make it challenging to select, characterize, and apply aptamers for their detection. Thus, recent (since 2012) notable advances in small molecule aptamers, which have overcome some of these challenges, are presented here, and the challenges that still remain are discussed.

  19. Development of method to characterize emissions from spray polyurethane foam insulation

    EPA Science Inventory

    This presentation updates symposium participants regarding EPA progress towards development of SPF insulation emissions characterization methods. The presentation highlights evaluation of experiments investigating emissions after application of SPF to substrates in micro chambers and i...

  20. Role of Imaging Specrometer Data for Model-based Cross-calibration of Imaging Sensors

    NASA Technical Reports Server (NTRS)

    Thome, Kurtis John

    2014-01-01

    Site characterization benefits from imaging spectrometry to determine spectral bi-directional reflectance of a well-understood surface. Cross calibration approaches, uncertainties, role of imaging spectrometry, model-based site characterization, and application to product validation.

  1. A Review of Interface Electronic Systems for AT-cut Quartz Crystal Microbalance Applications in Liquids

    PubMed Central

    Arnau, Antonio

    2008-01-01

    Since the first applications of AT-cut quartz crystals as sensors in solutions more than 20 years ago, the so-called quartz crystal microbalance (QCM) sensor has become a good alternative analytical method in a great number of applications such as biosensors, analysis of biomolecular interactions, study of bacterial adhesion at specific interfaces, pathogen and microorganism detection, study of polymer film-biomolecule or cell-substrate interactions, immunosensors, and extensive use in fluid and polymer characterization and electrochemical applications, among others. The appropriate evaluation of this analytical method requires recognizing the different steps involved and being conscious of their importance and limitations. The first step involved in a QCM system is the accurate and appropriate characterization of the sensor in relation to the specific application. The use of the piezoelectric sensor in contact with solutions strongly affects its behavior, and appropriate electronic interfaces must be used for an adequate sensor characterization. Systems based on different principles and techniques have been implemented during the last 25 years. The selection of the interface for a specific application is important, and its limitations must be known to judge its suitability and to avoid possible error propagation in the interpretation of results. This article presents a comprehensive overview of the different techniques used for AT-cut quartz crystal microbalance in in-solution applications, which are based on the following principles: network or impedance analyzers, decay methods, oscillators and lock-in techniques. The electronic interfaces based on oscillators and phase-locked techniques are treated in detail, with the description of different configurations, since these techniques are the most used in applications for detection of analytes in solutions, and in those where a fast sensor response is necessary.

  2. Self-discharge analysis and characterization of supercapacitors for environmentally powered wireless sensor network applications

    NASA Astrophysics Data System (ADS)

    Yang, Hengzhao; Zhang, Ying

    2011-10-01

    A new approach is presented to characterize the variable leakage resistance, a parameter in the variable leakage resistance model we developed to model supercapacitors used in environmentally powered wireless sensor network applications. Based on an analysis of the supercapacitor terminal behavior during the self-discharge, the variable leakage resistance is modeled as a function of the supercapacitor terminal voltage instead of the self-discharge time, which is more practical for an environmentally powered wireless sensor node. The new characterization approach is implemented and validated using MATLAB Simulink with a 10 F supercapacitor as an example. In addition, effects of initial voltages and temperatures on the supercapacitor self-discharge rate and the variable leakage resistance value are explored.
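
    As a rough illustration of the kind of model the abstract describes, the sketch below integrates the self-discharge of a capacitor whose leakage resistance is expressed as a function of the terminal voltage rather than of elapsed time. The functional form and constants of r_leak() are placeholders chosen only to produce a plausible decay for a 10 F device; they are not the parameters identified in the paper.

    /* Self-discharge sketch for a supercapacitor whose leakage resistance is a
     * function of terminal voltage, in the spirit of the variable leakage
     * resistance model described above. All constants are hypothetical. */
    #include <stdio.h>

    static double r_leak(double v) {
        /* Assumed form: leakage resistance grows as the terminal voltage drops. */
        return 5.0e3 + 2.0e4 * (2.7 - v);   /* ohms; 2.7 V taken as the rated voltage */
    }

    int main(void) {
        const double C = 10.0;      /* farads, as in the 10 F example device */
        double v = 2.7;             /* initial terminal voltage, volts */
        const double dt = 1.0;      /* time step, seconds */

        /* Forward-Euler integration of dV/dt = -V / (R_leak(V) * C). */
        for (double t = 0.0; t <= 24.0 * 3600.0; t += dt) {
            if (((long)t) % 3600 == 0)
                printf("t = %5.1f h  V = %.4f V\n", t / 3600.0, v);
            v += dt * (-v / (r_leak(v) * C));
        }
        return 0;
    }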

  3. Principle, system, and applications of tip-enhanced Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Zhang, MingQian; Wang, Rui; Wu, XiaoBin; Wang, Jia

    2012-08-01

    Raman spectroscopy is a powerful technique for chemical information characterization. However, this spectral method is subject to two obstacles in nano-material detection. One is diffraction-limited spatial resolution, and the other is the inherently small Raman cross section and weak signal. To resolve these problems, a new approach has been developed, denoted as tip-enhanced Raman spectroscopy (TERS). TERS is capable of high-resolution and high-sensitivity detection and has been demonstrated to be a promising spectroscopic and micro-topographic method to characterize nano-materials and nanostructures. In this paper, the principle and experimental system of TERS are discussed. The latest applications of TERS in molecule detection, biological specimen identification, nano-material characterization, and semiconductor material determination are presented with some specific experimental examples.

  4. Statistical analysis of field data for aircraft warranties

    NASA Astrophysics Data System (ADS)

    Lakey, Mary J.

    Air Force and Navy maintenance data collection systems were researched to determine their scientific applicability to the warranty process. New and unique algorithms were developed to extract failure distributions, which were then used to characterize how selected families of equipment typically fail. Families of similar equipment were identified in terms of function, technology and failure patterns. Statistical analyses and applications such as goodness-of-fit tests, maximum likelihood estimation, and derivation of confidence intervals for the probability density function parameters were applied to characterize the distributions and their failure patterns. Statistical and reliability theory, with relevance to equipment design and operational failures, were also determining factors in characterizing the failure patterns of the equipment families. Inferences about the families with relevance to warranty needs were then made.
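
    To make the statistical step concrete, the sketch below performs the sort of maximum likelihood fit and confidence-interval derivation the abstract mentions, using an exponential failure-time model for simplicity. The failure times are invented example data, and the exponential assumption and the normal-approximation interval are illustrative choices, not the distributions identified in the study.

    /* Illustrative MLE fit of an exponential failure-time model with an
     * asymptotic 95% confidence interval for the rate parameter. */
    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Hypothetical times-to-failure in operating hours. */
        const double t[] = { 120.0, 340.0, 95.0, 410.0, 230.0, 180.0, 560.0, 75.0 };
        const int n = (int)(sizeof t / sizeof t[0]);

        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += t[i];

        /* MLE for the exponential rate parameter: lambda_hat = n / sum(t_i). */
        double lambda = n / sum;

        /* Normal-approximation 95% confidence interval: lambda_hat * (1 +/- 1.96/sqrt(n)). */
        double half = 1.96 * lambda / sqrt((double)n);

        printf("lambda_hat = %.5f per hour (MTTF = %.1f h)\n", lambda, 1.0 / lambda);
        printf("approx. 95%% CI: [%.5f, %.5f]\n", lambda - half, lambda + half);
        return 0;
    }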

  5. Polydimethylsiloxane films doped with NdFeB powder: magnetic characterization and potential applications in biomedical engineering and microrobotics.

    PubMed

    Iacovacci, V; Lucarini, G; Innocenti, C; Comisso, N; Dario, P; Ricotti, L; Menciassi, A

    2015-12-01

    This work reports the fabrication, magnetic characterization and controlled navigation of film-shaped microrobots consisting of a polydimethylsiloxane-NdFeB powder composite material. The fabrication process relies on spin-coating deposition, powder orientation and permanent magnetization. Films with different powder concentrations (10 %, 30 %, 50 % and 70 % w/w) were fabricated and characterized in terms of magnetic properties and magnetic navigation performance (by exploiting an electromagnet-based platform). Standardized data are provided, thus enabling the exploitation of these composite materials in a wide range of applications, from MEMS/microrobot development to biomedical systems. Finally, the possibility of microfabricating free-standing polymeric structures and the biocompatibility of the proposed composite materials are demonstrated.

  6. Nanomaterial Dispersion/Dissolution Characterization: Scientific Operating Procedure SOP-F-1

    DTIC Science & Technology

    2016-05-01

    ERDC/EL SR-16-1, May 2016. Environmental Consequences of Nanotechnologies: Nanomaterial Dispersion/Dissolution Characterization, Scientific Operating Procedure SOP-F-1. Lesley Miller... diagnostic application. While microscopy represents the only available method for measuring particle size, this is very labor intensive and prone to

  7. Programmable Triplet Formation and Decay in Metal-Organic Chromophores

    DTIC Science & Technology

    2011-12-13

    potential applications in optical limiting molecules has resulted in the synthesis and characterization of many new classes of chromophores in... Castellano, F.N. Inorg. Chem. 2006, 45, 4304-4306. Inorganic Chemistry cover, May 29, 2006. The synthesis, structural characterization, and... The synthesis, photophysics, electronic structure, and electrochemical characterization of 4′-tert-butylacetylene-2,2′:6′,2″-terpyridineplatinum(II

  8. Characterization of Metakaolin-Based Geopolymer (Briefing chart)

    DTIC Science & Technology

    2014-08-31

    High Strain Rate Dynamic Characterization of Metakaolin and Fly Ash Based Geopolymers for Structural Applications. The concrete community has... mechanical properties and microstructure of metakaolin geopolymer, obtain the static properties of metakaolin geopolymer for high strain rate Hopkinson... U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211. Metakaolin, fly ash, geopolymer, characterization, high strain rate

  9. Electromagnetic diagnostic techniques for hypervelocity projectile detection, velocity measurement, and size characterization: Theoretical concept and first experimental test

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uhlig, W. Casey; Heine, Andreas, E-mail: andreas.heine@emi.fraunhofer.de

    2015-11-14

    A new measurement technique is suggested to augment the characterization and understanding of hypervelocity projectiles before impact. The electromagnetic technique utilizes magnetic diffusion principles to detect particles, measure velocity, and indicate relative particle dimensions. It is particularly suited for detection of small particles that may be difficult to track utilizing current characterization methods, such as high-speed video or flash radiography, but can be readily used for large particle detection, where particle spacing or location is not practical for other measurement systems. In this work, particles down to 2 mm in diameter have been characterized while focusing on confining the detection signal to enable multi-particle characterization with limited particle-to-particle spacing. The focus of the paper is on the theoretical concept and the analysis of its applicability based on analytical and numerical calculations. First proof-of-principle experimental tests serve to further validate the method. Some potential applications are the characterization of particles from a shaped-charge jet after its break-up and investigating debris in impact experiments to test theoretical models for the distribution of particle size, number, and velocity.

  10. Low-energy transmission electron diffraction and imaging of large-area graphene

    PubMed Central

    Zhao, Wei; Xia, Bingyu; Lin, Li; Xiao, Xiaoyang; Liu, Peng; Lin, Xiaoyang; Peng, Hailin; Zhu, Yuanmin; Yu, Rong; Lei, Peng; Wang, Jiangtao; Zhang, Lina; Xu, Yong; Zhao, Mingwen; Peng, Lianmao; Li, Qunqing; Duan, Wenhui; Liu, Zhongfan; Fan, Shoushan; Jiang, Kaili

    2017-01-01

    Two-dimensional (2D) materials have attracted interest because of their excellent properties and potential applications. A key step in realizing industrial applications is to synthesize wafer-scale single-crystal samples. Until now, single-crystal samples, such as graphene domains up to the centimeter scale, have been synthesized. However, a new challenge is to efficiently characterize large-area samples. Currently, the crystalline characterization of these samples still relies on selected-area electron diffraction (SAED) or low-energy electron diffraction (LEED), which is more suitable for characterizing very small local regions. This paper presents a highly efficient characterization technique that adopts a low-energy electrostatically focused electron gun and a super-aligned carbon nanotube (SACNT) film sample support. It allows rapid crystalline characterization of large-area graphene through a single photograph of a transmission-diffracted image at a large beam size. Additionally, the low-energy electron beam enables the observation of a unique diffraction pattern of adsorbates on the suspended graphene at room temperature. This work presents a simple and convenient method for characterizing the macroscopic structures of 2D materials, and the instrument we constructed allows the study of the weak interaction with 2D materials. PMID:28879233

  11. Low-energy transmission electron diffraction and imaging of large-area graphene.

    PubMed

    Zhao, Wei; Xia, Bingyu; Lin, Li; Xiao, Xiaoyang; Liu, Peng; Lin, Xiaoyang; Peng, Hailin; Zhu, Yuanmin; Yu, Rong; Lei, Peng; Wang, Jiangtao; Zhang, Lina; Xu, Yong; Zhao, Mingwen; Peng, Lianmao; Li, Qunqing; Duan, Wenhui; Liu, Zhongfan; Fan, Shoushan; Jiang, Kaili

    2017-09-01

    Two-dimensional (2D) materials have attracted interest because of their excellent properties and potential applications. A key step in realizing industrial applications is to synthesize wafer-scale single-crystal samples. Until now, single-crystal samples, such as graphene domains up to the centimeter scale, have been synthesized. However, a new challenge is to efficiently characterize large-area samples. Currently, the crystalline characterization of these samples still relies on selected-area electron diffraction (SAED) or low-energy electron diffraction (LEED), which is more suitable for characterizing very small local regions. This paper presents a highly efficient characterization technique that adopts a low-energy electrostatically focused electron gun and a super-aligned carbon nanotube (SACNT) film sample support. It allows rapid crystalline characterization of large-area graphene through a single photograph of a transmission-diffracted image at a large beam size. Additionally, the low-energy electron beam enables the observation of a unique diffraction pattern of adsorbates on the suspended graphene at room temperature. This work presents a simple and convenient method for characterizing the macroscopic structures of 2D materials, and the instrument we constructed allows the study of the weak interaction with 2D materials.

  12. File-System Workload on a Scientific Multiprocessor

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1995-01-01

    Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors and their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to be most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize I/O in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.

  13. On Characterizing Particle Shape

    NASA Technical Reports Server (NTRS)

    Ennis, Bryan J.; Rickman, Douglas; Rollins, A. Brent; Ennis, Brandon

    2014-01-01

    It is well known that particle shape affects flow characteristics of granular materials, as well as a variety of other solids processing issues such as compaction, rheology, filtration and other two-phase flow problems. The impact of shape crosses many diverse and commercially important applications, including pharmaceuticals, civil engineering, metallurgy, health, and food processing. Two applications studied here include the dry solids flow of lunar simulants (e.g., JSC-1, NU-LHT-2M, OB-1) and the flow properties of wet concrete, including final compressive strength. A multi-dimensional, generalized engineering method to quantitatively characterize particle shapes has been developed, applicable to both single particle orientation and multi-particle assemblies. The two-dimension/three-dimension inversion problem is also treated, and the application of these methods to DEM model particles will be discussed. In the case of lunar simulants, flow properties of six lunar simulants have been measured, and the impact of particle shape on flowability, as characterized by the shape method developed here, is discussed, especially in the context of three simulants of similar size range. In the context of concrete processing, concrete construction is a major contributor to greenhouse gas production, of which the major contributor is the cement binder loading. Any optimization in concrete rheology and packing that can reduce cement loading and improve strength can also reduce currently required construction safety factors. The characterization approach here is also demonstrated for the impact of rock aggregate shape on concrete slump rheology and dry compressive strength.

  14. Spray Deposition and Drift Characteristics of a Low Drift Nozzle for Aerial Application at Different Application Altitudes

    USDA-ARS?s Scientific Manuscript database

    A complex interaction of controllable and uncontrollable factors is involved in aerial application of crop production and protection materials. Although it is difficult to completely characterize spray deposition and drift, these important factors can be estimated with appropriate sampling protocol ...

  15. System Quality Characteristics for Selecting Mobile Learning Applications

    ERIC Educational Resources Information Center

    Sarrab, Mohamed; Al-Shihi, Hafedh; Al-Manthari, Bader

    2015-01-01

    The majority of M-learning (Mobile learning) applications available today are developed for the formal learning and education environment. These applications are characterized by the improvement in the interaction between learners and instructors to provide high interaction and flexibility to the learning process. M-learning is gaining increased…

  16. Power-constrained supercomputing

    NASA Astrophysics Data System (ADS)

    Bailey, Peter E.

    As we approach exascale systems, power is turning from an optimization goal to a critical operating constraint. With power bounds imposed by both stakeholders and the limitations of existing infrastructure, achieving practical exascale computing will therefore rely on optimizing performance subject to a power constraint. However, this requirement should not add to the burden of application developers; optimizing the runtime environment given restricted power will primarily be the job of high-performance system software. In this dissertation, we explore this area and develop new techniques that extract maximum performance subject to a particular power constraint. These techniques include a method to find theoretical optimal performance, a runtime system that shifts power in real time to improve performance, and a node-level prediction model for selecting power-efficient operating points. We use a linear programming (LP) formulation to optimize application schedules under various power constraints, where a schedule consists of a DVFS state and number of OpenMP threads for each section of computation between consecutive message passing events. We also provide a more flexible mixed integer-linear (ILP) formulation and show that the resulting schedules closely match schedules from the LP formulation. Across four applications, we use our LP-derived upper bounds to show that current approaches trail optimal, power-constrained performance by up to 41%. This demonstrates limitations of current systems, and our LP formulation provides future optimization approaches with a quantitative optimization target. We also introduce Conductor, a run-time system that intelligently distributes available power to nodes and cores to improve performance. The key techniques used are configuration space exploration and adaptive power balancing. Configuration exploration dynamically selects the optimal thread concurrency level and DVFS state subject to a hardware-enforced power bound. Adaptive power balancing efficiently predicts where critical paths are likely to occur and distributes power to those paths. Greater power, in turn, allows increased thread concurrency levels, CPU frequency/voltage, or both. We describe these techniques in detail and show that, compared to the state-of-the-art technique of using statically predetermined, per-node power caps, Conductor leads to a best-case performance improvement of up to 30%, and an average improvement of 19.1%. At the node level, an accurate power/performance model will aid in selecting the right configuration from a large set of available configurations. We present a novel approach to generate such a model offline using kernel clustering and multivariate linear regression. Our model requires only two iterations to select a configuration, which provides a significant advantage over exhaustive search-based strategies. We apply our model to predict power and performance for different applications using arbitrary configurations, and show that our model, when used with hardware frequency-limiting in a runtime system, selects configurations with significantly higher performance at a given power limit than those chosen by frequency-limiting alone. When applied to a set of 36 computational kernels from a range of applications, our model accurately predicts power and performance; our runtime system based on the model maintains 91% of optimal performance while meeting power constraints 88% of the time. 
When the runtime system violates a power constraint, it exceeds the constraint by only 6% in the average case, while simultaneously achieving 54% more performance than an oracle. Through the combination of the above contributions, we hope to provide guidance and inspiration to research practitioners working on runtime systems for power-constrained environments. We also hope this dissertation will draw attention to the need for software and runtime-controlled power management under power constraints at various levels, from the processor level to the cluster level.
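
    The configuration-space exploration idea can be sketched compactly: enumerate candidate (OpenMP thread count, DVFS frequency) pairs, predict power and performance for each, and keep the best-performing pair that respects the node power cap. The power and performance models in the sketch below are invented stand-ins; a real runtime such as the one described would use measured or regression-based models and a hardware-enforced power bound.

    /* Pick the OpenMP thread count and DVFS frequency that maximize predicted
     * performance subject to a node power cap. Both models are hypothetical. */
    #include <stdio.h>

    #define N_THREADS 4
    #define N_FREQS   3

    int main(void) {
        const int    threads[N_THREADS] = { 4, 8, 16, 32 };
        const double freqs_ghz[N_FREQS] = { 1.2, 1.8, 2.4 };
        const double power_cap_w = 95.0;

        int best_t = -1, best_f = -1;
        double best_perf = 0.0;

        for (int i = 0; i < N_THREADS; ++i) {
            for (int j = 0; j < N_FREQS; ++j) {
                /* Made-up models: power grows with threads and frequency,
                 * performance scales sublinearly with threads, linearly with frequency. */
                double power = 30.0 + 1.5 * threads[i] * freqs_ghz[j];
                double perf  = freqs_ghz[j] * threads[i] / (1.0 + 0.02 * threads[i]);
                if (power <= power_cap_w && perf > best_perf) {
                    best_perf = perf;
                    best_t = i;
                    best_f = j;
                }
            }
        }

        if (best_t >= 0)
            printf("chosen config: %d threads @ %.1f GHz (predicted perf %.2f)\n",
                   threads[best_t], freqs_ghz[best_f], best_perf);
        else
            printf("no configuration fits under the %.1f W cap\n", power_cap_w);
        return 0;
    }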

  17. PS3 CELL Development for Scientific Computation and Research

    NASA Astrophysics Data System (ADS)

    Christiansen, M.; Sevre, E.; Wang, S. M.; Yuen, D. A.; Liu, S.; Lyness, M. D.; Broten, M.

    2007-12-01

    The Cell processor is one of the most powerful processors on the market, and researchers in the earth sciences may find its parallel architecture to be very useful. A Cell processor, with 7 cores, can easily be obtained for experimentation by purchasing a PlayStation 3 (PS3) and installing Linux and the IBM SDK. Each core of the PS3 is capable of 25 GFLOPS, giving a potential limit of 150 GFLOPS when all 6 SPUs (synergistic processing units) are used with vectorized algorithms. We have used the Cell's computational power to create a program which takes simulated tsunami datasets, parses them, and returns a colorized height-field image using ray casting techniques. As expected, the time required to create an image is inversely proportional to the number of SPUs used. We believe that this trend will continue when multiple PS3s are chained using OpenMP functionality, and we are in the process of researching this. By using the Cell to visualize tsunami data, we have found that its greatest feature is its power. This fact aligns well with the needs of the scientific community, where the limiting factor is time. Any algorithm, such as the heat equation, that can be subdivided into multiple parts can take advantage of the PS3 Cell's ability to split the computations across the 6 SPUs, reducing the required run time to roughly one sixth. Further vectorization of the code allows 4 simultaneous floating point operations by using the SIMD (single instruction multiple data) capabilities of the SPU, increasing efficiency up to 24 times.
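
    As an illustration of the kind of work splitting described above, the sketch below divides an explicit 1D heat-equation update across parallel workers. It uses OpenMP on a conventional CPU rather than the IBM Cell SDK, so it is only an analogy for the SPU decomposition; the grid size, step count, and coefficient are hypothetical.

        /* Sketch only: an explicit 1D heat-equation update whose grid is split
         * across parallel workers, in the spirit of dividing the work across
         * the Cell's SPUs. Plain C with OpenMP, not the Cell SDK. */
        #include <stdio.h>
        #include <stdlib.h>

        #define N     1000000
        #define STEPS 100

        int main(void) {
            double *u  = calloc(N, sizeof *u);   /* current time level */
            double *un = calloc(N, sizeof *un);  /* next time level    */
            const double alpha = 0.25;           /* k * dt / dx^2, chosen stable */

            u[N / 2] = 1.0;                      /* point heat source */

            for (int s = 0; s < STEPS; ++s) {
                /* Each thread updates a contiguous chunk of the grid. */
                #pragma omp parallel for schedule(static)
                for (int i = 1; i < N - 1; ++i)
                    un[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
                double *tmp = u; u = un; un = tmp;   /* swap time levels */
            }

            printf("centre value after %d steps: %f\n", STEPS, u[N / 2]);
            free(u); free(un);
            return 0;
        }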

  18. A GPU-accelerated semi-implicit fractional-step method for numerical solutions of incompressible Navier-Stokes equations

    NASA Astrophysics Data System (ADS)

    Ha, Sanghyun; Park, Junshin; You, Donghyun

    2018-01-01

    Utility of the computational power of Graphics Processing Units (GPUs) is elaborated for solutions of incompressible Navier-Stokes equations which are integrated using a semi-implicit fractional-step method. The Alternating Direction Implicit (ADI) and Fourier-transform-based direct solution methods used in the semi-implicit fractional-step method involve multiple tridiagonal matrices, whose inversion is known as the major bottleneck for acceleration on a typical multi-core machine. A novel implementation of the semi-implicit fractional-step method designed for GPU acceleration of the incompressible Navier-Stokes equations is presented. Aspects of the programming model of the Compute Unified Device Architecture (CUDA) that are critical to the bandwidth-bound nature of the present method are discussed in detail. A data layout for efficient use of CUDA libraries is proposed for acceleration of tridiagonal matrix inversion and fast Fourier transform. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are computed on a GPU. Performance of the present method using CUDA is assessed by comparing the speed of solving three tridiagonal matrices using ADI with the speed of solving one heptadiagonal matrix using a conjugate gradient method. An overall speedup of 20 times is achieved using a Tesla K40 GPU in comparison with a single-core Xeon E5-2660 v3 CPU in simulations of turbulent boundary-layer flow over a flat plate conducted on over 134 million grid points. An enhanced speedup of 48 times is reached for the same problem using a Tesla P100 GPU.
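
    The batched tridiagonal solves mentioned above are the usual CPU-side reference point for this kind of comparison. A minimal sketch follows, assuming many independent systems (for example, one per grid line in an ADI sweep) stored back to back; it is a generic Thomas-algorithm solver threaded with OpenMP, not the authors' CUDA implementation.

        /* Sketch only (not the authors' code): many independent tridiagonal
         * systems solved with the Thomas algorithm, parallelized across systems
         * with OpenMP. In an ADI sweep each grid line contributes one system,
         * which is why batched tridiagonal solves dominate on multi-core CPUs. */
        #include <stddef.h>

        /* Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], i = 0..n-1.
         * Arrays b and d are overwritten; the solution is returned in d. */
        static void thomas(int n, const double *a, double *b, const double *c, double *d) {
            for (int i = 1; i < n; ++i) {            /* forward elimination */
                double m = a[i] / b[i - 1];
                b[i] -= m * c[i - 1];
                d[i] -= m * d[i - 1];
            }
            d[n - 1] /= b[n - 1];                    /* back substitution */
            for (int i = n - 2; i >= 0; --i)
                d[i] = (d[i] - c[i] * d[i + 1]) / b[i];
        }

        /* Solve nsys independent systems of size n stored contiguously. */
        void batched_thomas(int nsys, int n, double *a, double *b, double *c, double *d) {
            #pragma omp parallel for schedule(static)
            for (int s = 0; s < nsys; ++s) {
                size_t off = (size_t)s * (size_t)n;
                thomas(n, a + off, b + off, c + off, d + off);
            }
        }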

  19. The High Level Data Reduction Library

    NASA Astrophysics Data System (ADS)

    Ballester, P.; Gabasch, A.; Jung, Y.; Modigliani, A.; Taylor, J.; Coccato, L.; Freudling, W.; Neeser, M.; Marchetti, E.

    2015-09-01

    The European Southern Observatory (ESO) provides pipelines to reduce data for most of the instruments at its Very Large Telescope (VLT). These pipelines are written as part of the development of VLT instruments, and are used both in ESO's operational environment and by science users who receive VLT data. All the pipelines are highly instrument-specific. However, experience showed that the independently developed pipelines include significant overlap, duplication and slight variations of similar algorithms. In order to reduce the cost of development, verification and maintenance of ESO pipelines, and at the same time improve the scientific quality of pipeline data products, ESO decided to develop a limited set of versatile high-level scientific functions that are to be used in all future pipelines. The routines are provided by the High-level Data Reduction Library (HDRL). To reach this goal, we first compare several candidate algorithms and verify them during a prototype phase using data sets from several instruments. Once the best algorithm and error model have been chosen, we start a design and implementation phase. HDRL is coded in plain C and uses the Common Pipeline Library (CPL) functionality. HDRL adopts consistent function naming conventions and a well-defined API to minimise future maintenance costs, implements error propagation, uses pixel quality information, employs OpenMP to take advantage of multi-core processors, and is verified with extensive unit and regression tests. This poster describes the status of the project and the lessons learned during the development of reusable code implementing algorithms of high scientific quality.

  20. DISPATCH: a numerical simulation framework for the exa-scale era - I. Fundamentals

    NASA Astrophysics Data System (ADS)

    Nordlund, Åke; Ramsey, Jon P.; Popovas, Andrius; Küffmeier, Michael

    2018-06-01

    We introduce a high-performance simulation framework that permits the semi-independent, task-based solution of sets of partial differential equations, typically manifesting as updates to a collection of `patches' in space-time. A hybrid MPI/OpenMP execution model is adopted, where work tasks are controlled by a rank-local `dispatcher' which selects, from a set of tasks generally much larger than the number of physical cores (or hardware threads), tasks that are ready for updating. The definition of a task can vary, for example, with some solving the equations of ideal magnetohydrodynamics (MHD), others non-ideal MHD, radiative transfer, or particle motion, and yet others applying particle-in-cell (PIC) methods. Tasks do not have to be grid based, while tasks that are, may use either Cartesian or orthogonal curvilinear meshes. Patches may be stationary or moving. Mesh refinement can be static or dynamic. A feature of decisive importance for the overall performance of the framework is that time-steps are determined and applied locally; this allows potentially large reductions in the total number of updates required in cases when the signal speed varies greatly across the computational domain, and therefore a corresponding reduction in computing time. Another feature is a load balancing algorithm that operates `locally' and aims to simultaneously minimize load and communication imbalance. The framework generally relies on already existing solvers, whose performance is augmented when run under the framework, due to more efficient cache usage, vectorization, local time-stepping, plus near-linear and, in principle, unlimited OpenMP and MPI scaling.
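
    A rank-local dispatcher of the kind described above can be approximated, very roughly, with OpenMP tasks. The sketch below is a hypothetical stand-in, not DISPATCH code: each `patch' carries its own local time and time-step, one thread scans for ready work, and ready updates are handed to the OpenMP runtime. The readiness test is a placeholder for a real neighbour-time check, and the MPI layer and solvers are omitted.

        /* Hypothetical sketch of a rank-local dispatcher using OpenMP tasks.
         * Not DISPATCH code; the readiness rule and patch update are placeholders. */
        #include <stdio.h>

        #define NPATCH 64
        #define T_END  1.0

        typedef struct { double t, dt; } Patch;

        /* A patch may update when it is not ahead of the slowest patch by more
         * than one of its own local steps (stand-in for a neighbour-time check). */
        static int is_ready(const Patch *p, const Patch *patches) {
            double tmin = patches[0].t;
            for (int i = 1; i < NPATCH; ++i)
                if (patches[i].t < tmin) tmin = patches[i].t;
            return p->t <= tmin + p->dt;
        }

        int main(void) {
            Patch patches[NPATCH];
            for (int i = 0; i < NPATCH; ++i) {
                patches[i].t  = 0.0;
                patches[i].dt = 0.001 * (1 + i % 4);   /* locally varying time-steps */
            }

            #pragma omp parallel
            #pragma omp single               /* one thread acts as the dispatcher */
            {
                int done = 0;
                while (!done) {
                    int ready[NPATCH];
                    done = 1;
                    /* Pass 1: no tasks in flight, so it is safe to inspect all patches. */
                    for (int i = 0; i < NPATCH; ++i) {
                        ready[i] = 0;
                        if (patches[i].t < T_END) {
                            done = 0;
                            ready[i] = is_ready(&patches[i], patches);
                        }
                    }
                    /* Pass 2: hand every ready patch update to an OpenMP task. */
                    for (int i = 0; i < NPATCH; ++i) {
                        if (ready[i]) {
                            #pragma omp task firstprivate(i) shared(patches)
                            {
                                /* ...solver update for patch i would go here... */
                                patches[i].t += patches[i].dt;
                            }
                        }
                    }
                    #pragma omp taskwait     /* finish this sweep before re-scanning */
                }
            }

            printf("all %d patches reached t >= %g\n", NPATCH, T_END);
            return 0;
        }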

  1. Research on bathymetry estimation by Worldview-2 based with the semi-analytical model

    NASA Astrophysics Data System (ADS)

    Sheng, L.; Bai, J.; Zhou, G.-W.; Zhao, Y.; Li, Y.-C.

    2015-04-01

    The South Sea Islands of China are far from the mainland; reefs make up more than 95% of the South Sea, and most of them scatter over disputed, sensitive areas of interest. Accurate methods for obtaining reef bathymetry therefore urgently need to be developed. Commonly used methods, including sonar, airborne laser, and remote sensing estimation, are limited by the long distances, large areas, and sensitive locations involved. Remote sensing data provides an effective way to estimate bathymetry over large areas without physical contact, by exploiting the relationship between spectral information and bathymetry. Aimed at the water quality of the South Sea of China, our paper develops a bathymetry estimation method that does not require measured water depths. First, a semi-analytical optimization model built on theoretical interpretation models is studied, with a genetic algorithm used to optimize the model. Meanwhile, an OpenMP parallel computing algorithm is introduced to greatly increase the speed of the semi-analytical optimization model. One island in the South Sea of China is selected as our study area, and measured water depths are used to evaluate the accuracy of bathymetry estimated from Worldview-2 multispectral images. The results show that the semi-analytical optimization model based on the genetic algorithm performs well in our study area, and that the accuracy of the estimated bathymetry in the 0-20 meter shallow-water zone is acceptable. The semi-analytical optimization model based on the genetic algorithm thus solves the problem of bathymetry estimation without water depth measurements. Overall, our paper provides a new bathymetry estimation method for sensitive reefs far from the mainland.
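
    The abstract does not give the parallelization details, so the sketch below only illustrates the general pattern: with a genetic algorithm, the fitness of every individual in a population can be evaluated independently, which is the loop OpenMP is typically applied to. The fitness function here is a placeholder misfit, not the paper's semi-analytical reflectance model, and all names and sizes are hypothetical.

        /* Hypothetical sketch only: OpenMP-parallel fitness evaluation for a
         * genetic algorithm. The objective below is a placeholder and does not
         * implement the semi-analytical model of the paper. */
        #include <math.h>

        #define POP_SIZE 256
        #define NGENES   4      /* e.g. depth plus water-column optical parameters */

        /* Placeholder objective: misfit between modelled and observed reflectance. */
        static double fitness(const double *genes, const double *observed, int nbands) {
            double misfit = 0.0;
            for (int b = 0; b < nbands; ++b) {
                double modelled = genes[0] * exp(-genes[1] * b) + genes[2];  /* stand-in model */
                double diff = modelled - observed[b];
                misfit += diff * diff;
            }
            return -misfit;   /* higher fitness means smaller misfit */
        }

        void evaluate_population(double population[POP_SIZE][NGENES],
                                 double scores[POP_SIZE],
                                 const double *observed, int nbands) {
            /* Individuals are independent, so the evaluation loop parallelizes
             * trivially; this is typically where OpenMP buys the speedup. */
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < POP_SIZE; ++i)
                scores[i] = fitness(population[i], observed, nbands);
        }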

  2. A fast algorithm for forward-modeling of gravitational fields in spherical coordinates with 3D Gauss-Legendre quadrature

    NASA Astrophysics Data System (ADS)

    Zhao, G.; Liu, J.; Chen, B.; Guo, R.; Chen, L.

    2017-12-01

    Forward modeling of gravitational fields at large scale requires considering the curvature of the Earth and evaluating Newton's volume integral in spherical coordinates. To obtain fast and accurate gravitational effects of subsurface structures, the subsurface mass distribution is usually discretized into small spherical prisms (called tesseroids). The gravity fields of tesseroids are generally calculated numerically. One of the commonly used numerical methods is 3D Gauss-Legendre quadrature (GLQ). However, traditional GLQ integration suffers from low computational efficiency and relatively poor accuracy when the observation surface is close to the source region. We developed a fast, high-accuracy 3D GLQ integration based on the equivalence of kernel matrices, adaptive discretization, and parallelization using OpenMP. The kernel-matrix equivalence strategy increases efficiency and reduces memory consumption by calculating and storing identical matrix elements in each kernel matrix only once. The adaptive discretization strategy is used to improve accuracy. Numerical investigations show that the execution time of the proposed method is reduced by two orders of magnitude compared with the traditional method without these optimization strategies. High-accuracy results can also be guaranteed no matter how close the computation points are to the source region. In addition, the algorithm dramatically reduces the memory requirement, by a factor of N compared with the traditional method, where N is the number of discretizations of the source region in the longitudinal direction. This makes large-scale gravity forward modeling and inversion with a fine discretization possible.
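
    For readers unfamiliar with GLQ, the sketch below shows the basic structure being parallelized: a 3-point Gauss-Legendre rule applied in each of the three dimensions over one source cell, with the loop over observation points threaded by OpenMP. A simple Cartesian 1/r kernel stands in for the tesseroid kernel in spherical coordinates, and the kernel-equivalence and adaptive-discretization strategies of the paper are not reproduced here.

        /* Illustrative sketch: 3D Gauss-Legendre quadrature over a single source
         * cell, parallelized across observation points with OpenMP. The 1/r
         * Cartesian kernel is a placeholder for the full tesseroid kernel. */
        #include <math.h>
        #include <stdio.h>

        #define NQ 3
        static const double xq[NQ] = { -0.7745966692414834, 0.0, 0.7745966692414834 };
        static const double wq[NQ] = {  0.5555555555555556, 0.8888888888888888, 0.5555555555555556 };

        /* Gravitational potential (up to constants) of a unit-density box
         * [x0,x1]x[y0,y1]x[z0,z1] at observation point (px,py,pz). */
        static double glq_potential(double x0, double x1, double y0, double y1,
                                    double z0, double z1,
                                    double px, double py, double pz) {
            double hx = 0.5 * (x1 - x0), hy = 0.5 * (y1 - y0), hz = 0.5 * (z1 - z0);
            double cx = 0.5 * (x1 + x0), cy = 0.5 * (y1 + y0), cz = 0.5 * (z1 + z0);
            double sum = 0.0;
            for (int i = 0; i < NQ; ++i)
                for (int j = 0; j < NQ; ++j)
                    for (int k = 0; k < NQ; ++k) {
                        double x = cx + hx * xq[i], y = cy + hy * xq[j], z = cz + hz * xq[k];
                        double r = sqrt((x - px) * (x - px) + (y - py) * (y - py) + (z - pz) * (z - pz));
                        sum += wq[i] * wq[j] * wq[k] / r;
                    }
            return sum * hx * hy * hz;   /* Jacobian of the mapping to [-1,1]^3 */
        }

        int main(void) {
            enum { NOBS = 10000 };
            static double pot[NOBS];
            /* Observation points along a profile above the source cell. */
            #pragma omp parallel for schedule(static)
            for (int n = 0; n < NOBS; ++n)
                pot[n] = glq_potential(-500, 500, -500, 500, -1000, -200,
                                       -5000 + n, 0.0, 100.0);
            printf("potential at profile centre: %g\n", pot[NOBS / 2]);
            return 0;
        }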

  3. The Development of High-speed Full-function Storm Surge Model and the Case Study of 2013 Typhoon Haiyan

    NASA Astrophysics Data System (ADS)

    Tsai, Y. L.; Wu, T. R.; Lin, C. Y.; Chuang, M. H.; Lin, C. W.

    2016-02-01

    An ideal operational storm surge model should feature: 1. a large computational domain that covers the complete typhoon life cycle; 2. support for both parametric and atmospheric models; 3. the capability of calculating inundation areas for risk assessment; 4. the inclusion of tides for accurate inundation simulation. A literature review shows that few operational models reach these goals with fast calculation, and most models have limited functionality. In this paper, the well-developed COMCOT (COrnell Multi-grid COupled Tsunami Model) tsunami model is chosen as the kernel to establish a storm surge model which solves the nonlinear shallow water equations directly on both spherical and Cartesian coordinates. The complete evolution of a storm surge, including large-scale propagation and small-scale offshore run-up, can be simulated by a nested-grid scheme. The global tide model TPXO 7.2, established by Oregon State University, is coupled to provide astronomical boundary conditions. The atmospheric model WRF (Weather Research and Forecasting Model) is also coupled to provide meteorological fields. The high-efficiency thin-film method is adopted to evaluate storm surge inundation. Our in-house model has been optimized with OpenMP (Open Multi-Processing), making it 10 times faster than the original version and suitable as an early-warning storm surge model. In this study, a thorough simulation of 2013 Typhoon Haiyan is performed. Detailed results on surge propagation and high-resolution inundation areas will be presented at the 2016 Ocean Sciences Meeting.

  4. Characterization of the solvent properties of oleochemical carbonates

    USDA-ARS?s Scientific Manuscript database

    Oleophilic carbonates, such as hexadecyl carbonate, can be characterized with respect to their solvent properties using inverse gas chromatography (IGC). Physicochemical properties of these renewable lipid derivatives are of importance for applications such as their use as phase change materials, fu...

  5. Optical Metamaterials: Design, Characterization and Applications

    ERIC Educational Resources Information Center

    Chaturvedi, Pratik

    2009-01-01

    Artificially engineered metamaterials have emerged with properties and functionalities previously unattainable in natural materials. The scientific breakthroughs made in this new class of electromagnetic materials are closely linked with progress in developing physics-driven design, novel fabrication and characterization methods. The intricate…

  6. Incorporation of Fiber Bragg Sensors for Shape Memory Polyurethanes Characterization.

    PubMed

    Alberto, Nélia; Fonseca, Maria A; Neto, Victor; Nogueira, Rogério; Oliveira, Mónica; Moreira, Rui

    2017-11-11

    Shape memory polyurethanes (SMPUs) are thermally activated shape memory materials, which can be used as actuators or sensors in applications including aerospace, aeronautics, automobiles or the biomedical industry. The accurate characterization of the memory effect of these materials is therefore mandatory for the technology's success. The shape memory characterization is normally accomplished using mechanical testing coupled with a heat source, where a detailed knowledge of the heat cycle and its influence on the material properties is paramount but difficult to monitor. In this work, fiber Bragg grating (FBG) sensors were embedded into SMPU samples aiming to study and characterize its shape memory effect. The samples were obtained by injection molding, and the entire processing cycle was successfully monitored, providing a process global quality signature. Moreover, the integrity and functionality of the FBG sensors were maintained during and after the embedding process, demonstrating the feasibility of the technology chosen for the purpose envisaged. The results of the shape memory effect characterization demonstrate a good correlation between the reflected FBG peak with the temperature and induced strain, proving that this technology is suitable for this particular application.

  7. Incorporation of Fiber Bragg Sensors for Shape Memory Polyurethanes Characterization

    PubMed Central

    Nogueira, Rogério; Moreira, Rui

    2017-01-01

    Shape memory polyurethanes (SMPUs) are thermally activated shape memory materials, which can be used as actuators or sensors in applications including aerospace, aeronautics, automobiles or the biomedical industry. The accurate characterization of the memory effect of these materials is therefore mandatory for the technology’s success. The shape memory characterization is normally accomplished using mechanical testing coupled with a heat source, where a detailed knowledge of the heat cycle and its influence on the material properties is paramount but difficult to monitor. In this work, fiber Bragg grating (FBG) sensors were embedded into SMPU samples aiming to study and characterize its shape memory effect. The samples were obtained by injection molding, and the entire processing cycle was successfully monitored, providing a process global quality signature. Moreover, the integrity and functionality of the FBG sensors were maintained during and after the embedding process, demonstrating the feasibility of the technology chosen for the purpose envisaged. The results of the shape memory effect characterization demonstrate a good correlation between the reflected FBG peak with the temperature and induced strain, proving that this technology is suitable for this particular application. PMID:29137136

  8. 3D Elastic Wavefield Tomography

    NASA Astrophysics Data System (ADS)

    Guasch, L.; Warner, M.; Stekl, I.; Umpleby, A.; Shah, N.

    2010-12-01

    Wavefield tomography, or waveform inversion, aims to extract the maximum information from seismic data by matching, trace by trace, the response of the solid earth to seismic waves using numerical modelling tools. Its first formulation dates from the early 1980s, when Albert Tarantola developed a solid theoretical basis that is still used today with little change. Due to computational limitations, the application of the method to 3D problems was unaffordable until a few years ago, and then only under the acoustic approximation. Although acoustic wavefield tomography is widely used, a complete solution of the seismic inversion problem requires that we account properly for the physics of wave propagation, and so must include elastic effects. We have developed a 3D tomographic wavefield inversion code that incorporates the full elastic wave equation. The bottleneck of the different implementations is the forward modelling algorithm that generates the synthetic data to be compared with the field seismograms, as well as the backpropagation of the residuals needed to form the update direction of the model parameters. Furthermore, one or two extra modelling runs are needed in order to calculate the step length. Our approach uses an explicit time-stepping finite-difference scheme that is 4th order in space and 2nd order in time, a 3D version of the one developed by Jean Virieux in 1986. We chose the time domain because an explicit time scheme is much less demanding in terms of memory than its frequency-domain analogue, although the discussion of which domain is more efficient remains open. We calculate the parameter gradients for Vp and Vs by correlating the normal and shear stress wavefields respectively. A straightforward implementation would lead to the storage of the wavefield at all grid points at each time-step. We tackled this problem using two different approaches. The first makes better use of resources for small models of dimension equal to or less than 300x300x300 nodes: it under-samples the wavefield, reducing the number of stored time-steps by an order of magnitude. For bigger models the wavefield is stored only at the boundaries of the model and then re-injected while the residuals are backpropagated, allowing the correlation to be computed 'on the fly'. In terms of computational resources, the elastic code is an order of magnitude more demanding than the equivalent acoustic code. We have combined shared-memory with distributed-memory parallelisation using OpenMP and MPI respectively. Thus, we take advantage of the increasingly common multi-core architecture processors. We have successfully applied our inversion algorithm to different realistic complex 3D models. The models had non-linear relations between pressure and shear wave velocities. The shorter wavelengths of the shear waves improve the resolution of the images obtained with respect to a purely acoustic approach.
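
    The hybrid parallelization mentioned at the end is the standard layering of MPI across nodes with OpenMP inside each node. The skeleton below shows that structure only; the per-point work is a placeholder rather than the elastic finite-difference kernel or the gradient correlation, and the model size is a hypothetical example.

        /* Minimal hybrid MPI + OpenMP skeleton: the model is domain-decomposed
         * across MPI ranks, and the loop over each rank's grid points is
         * threaded with OpenMP. The inner work is a placeholder. */
        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char **argv) {
            int provided, rank, nranks;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nranks);

            const long npts_total = 300L * 300L * 300L;   /* e.g. a 300^3 model */
            long npts_local = npts_total / nranks;        /* this rank's share  */

            double local_sum = 0.0;
            /* Threaded loop over this rank's grid points (placeholder work). */
            #pragma omp parallel for reduction(+ : local_sum) schedule(static)
            for (long i = 0; i < npts_local; ++i)
                local_sum += 1.0;                         /* stand-in for a gradient term */

            double global_sum = 0.0;
            MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("accumulated %.0f gradient contributions on %d ranks\n",
                       global_sum, nranks);
            MPI_Finalize();
            return 0;
        }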

  9. Holographic Recordings in Dye/Polymer Systems For Engineering Applications

    NASA Astrophysics Data System (ADS)

    Lessard, Roger A.; Couture, Jean J.

    1990-04-01

    Since Gabor's first demonstration of reconstructed wavefronts, many holographic techniques have provided interesting tools and applications. Presently the future of holography is strongly dependent upon new holographic recording thin films. Because of their excellent response to high-spatial-frequency grating recordings (up to 6800 cycles/mm), photopolymers and photocrosslinking materials seem to be good candidates to overcome some limitations. Dichromated gelatin films demonstrated excellent properties for permanent grating recording applications like HOE construction, but they are humidity sensitive and need chemical development. Today's holographic work needs real-time-like recording materials and low-cost organic materials, and DYE/POLYMER systems offer some possibilities. We present a review of research work done in our holography laboratories of COPL at Universite Laval. Using an automated spatial frequency analyzer designed at COPL, DYE/POLYMER systems are characterized for transmission holography and also for applications involving real-time holography and four-wave mixing techniques. Most of our characterization studies also consider volume polarization holograms. The second subject is devoted to polarization hologram recordings in thin colored polyvinyl alcohol films. These AZO/PVA solid films are erasable and can be used for many thousands of duty cycles for polarization volume holograms. Holographic characterization studies are conducted in order to determine the best experimental conditions and applications for these films. Finally, sensitized PVA films will be discussed.

  10. Flexible microwave ablation applicator for the treatment of pulmonary malignancies

    NASA Astrophysics Data System (ADS)

    Pfannenstiel, Austin; Keast, Tom; Kramer, Steve; Wibowo, Henky; Prakash, Punit

    2017-02-01

    Microwave ablation (MWA) is an emerging minimally invasive treatment option for malignant lung tumors. Compared to other energy modalities, such as radiofrequency ablation, MWA offers the advantages of deeper penetration within high impedance tissues such as aerated lung, shorter treatment times, and less susceptibility to the cooling heat-sink effects of air and blood flow. Previous studies have demonstrated clinical use of MWA for treating lung tumors; however, these procedures have relied upon the percutaneous application of rigid microwave antennas. The objective of our work was to develop and characterize a novel flexible microwave applicator which could be integrated with a bronchoscopic imaging and software guidance platform to expand the use of MWA as a treatment option for small (< 2cm) pulmonary tumors. This applicator would allow physicians an even less invasive, immediate treatment option for lung tumors identified within the scope of current medical procedures. It may also improve applicator placement accuracy and increase efficacy while minimizing the risk of procedural complications. A 2D-axisymmetric coupled FEM electromagnetic-heat transfer model was implemented to characterize expected antenna radiation patterns, ablation size and shape, and optimize antenna design for lung tissue. A prototype device was fabricated and evaluated in ex vivo tissues to verify simulation results and serve as proof-of-concept. Additional experiments were conducted in an in vivo animal model to further characterize the proposed system.

  11. Application Of Positron Beams For The Characterization Of Nano-scale Pores In Thin Films

    NASA Astrophysics Data System (ADS)

    Hirata, K.; Ito, K.; Kobayashi, Y.; Suzuki, R.; Ohdaira, T.; Eijt, S. W. H.; Schut, H.; van Veen, A.

    2003-08-01

    We applied three positron annihilation techniques, positron 3γ-annihilation spectroscopy, positron annihilation lifetime spectroscopy, and angular correlation of annihilation radiation, to the characterization of nano-scale pores in thin films by combining them with variable-energy positron beams. Characterization of pores in thin films is an important part of the research on various thin films of industrial importance. The results of our recent studies on pore characterization of thin films by positron beams will be reported here.

  12. Analysis of in-service failures and advances in microstructural characterization. Microstructural science Volume 26

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abramovici, E.; Northwood, D.O.; Shehata, M.T.

    1999-01-01

    The contents include Analysis of In-Service Failures (tutorials, transportation industry, corrosion and materials degradation, electronic and advanced materials); 1998 Sorby Award Lecture by Kay Geels, Struers A/S (Metallographic Preparation from Sorby to the Present); Advances in Microstructural Characterization (characterization techniques using high resolution and focused ion beam, characterization of microstructural clustering and correlation with performance); Advanced Applications (advanced alloys and intermetallic compounds, plasma spray coatings and other surface coatings, corrosion, and materials degradation).

  13. Study to perform preliminary experiments to evaluate particle generation and characterization techniques for zero-gravity cloud physics experiments

    NASA Technical Reports Server (NTRS)

    Katz, U.

    1982-01-01

    Methods of particle generation and characterization with regard to their applicability for experiments requiring cloud condensation nuclei (CCN) of specified properties were investigated. Since aerosol characterization is a prerequisite to assessing performance of particle generation equipment, techniques for characterizing aerosol were evaluated. Aerosol generation is discussed, and atomizer and photolytic generators including preparation of hydrosols (used with atomizers) and the evaluation of a flight version of an atomizer are studied.

  14. Pulsed CO2 characterization for lidar use

    NASA Technical Reports Server (NTRS)

    Jaenisch, Holger M.

    1992-01-01

    An account is given of a scaled functional testbed laser for space-qualified coherent-detection lidar applications which employs a CO2 laser. This laser has undergone modification and characterization for inherent performance capabilities as a model of coherent detection. While characterization results show good overall performance that is in agreement with theoretical predictions, frequency-stability and pulse-length limitations severely limit the laser's use in coherent detection.

  15. Multi-scale Computational Electromagnetics for Phenomenology and Saliency Characterization in Remote Sensing

    DTIC Science & Technology

    2016-07-15

    AFRL-AFOSR-JP-TR-2016-0068. Multi-scale Computational Electromagnetics for Phenomenology and Saliency Characterization in Remote Sensing. Hean-Teik … [report documentation page fragment] … electromagnetics to the application in microwave remote sensing as well as extension of modelling capability with computational flexibility to study …

  16. Multi-scale Computational Electromagnetics for Phenomenology and Saliency Characterization in Remote Sensing

    DTIC Science & Technology

    2016-07-15

    AFRL-AFOSR-JP-TR-2016-0068. Multi-scale Computational Electromagnetics for Phenomenology and Saliency Characterization in Remote Sensing. Hean-Teik … [report documentation page fragment] … electromagnetics to the application in microwave remote sensing as well as extension of modelling capability with computational flexibility to study …

  17. Analytical ultrasonics for structural materials

    NASA Technical Reports Server (NTRS)

    Kupperman, D. S.

    1986-01-01

    The application of ultrasonic velocity and attenuation measurements to characterize the microstructure of structural materials is discussed. Velocity measurements in cast stainless steel are correlated with microstructural variations ranging from equiaxed (elastically isotropic) to columnar (elastically anisotropic) grain structure. The effect of the anisotropic grain structure on the deviation of ultrasonic waves in cast stainless steel is also reported. Field-implementable techniques for distinguishing equiaxed from columnar grain structures in cast stainless steel structural members are presented. The application of ultrasonic velocity measurements to characterize structural ceramics in the green state is also discussed.

  18. Electro-optical and Magneto-optical Sensing Apparatus and Method for Characterizing Free-space Electromagnetic Radiation

    DOEpatents

    Zhang, Xi-Cheng; Riordan, Jenifer Ann; Sun, Feng-Guo

    2000-08-29

    Apparatus and methods for characterizing free-space electromagnetic energy, and in particular, apparatus/method suitable for real-time two-dimensional far-infrared imaging applications are presented. The sensing technique is based on a non-linear coupling between a low-frequency electric (or magnetic) field and a laser beam in an electro-optic (or magnetic-optic) crystal. In addition to a practical counter-propagating sensing technique, a co-linear approach is described which provides longer radiated field-optical beam interaction length, thereby making imaging applications practical.

  19. Increasing Heavy Oil Reserves in the Wilmington Oil Field Through Advanced Reservoir Characterization and Thermal Production Technologies, Class III

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    City of Long Beach; Tidelands Oil Production Company; University of Southern California

    2002-09-30

    The objective of this project was to increase the recoverable heavy oil reserves within sections of the Wilmington Oil Field, near Long Beach, California, through the testing and application of advanced reservoir characterization and thermal production technologies. It was hoped that the successful application of these technologies would result in their implementation throughout the Wilmington Field and, through technology transfer, would extend to increase the recoverable oil reserves in other slope and basin clastic (SBC) reservoirs.

  20. Creep analysis of silicone for podiatry applications.

    PubMed

    Janeiro-Arocas, Julia; Tarrío-Saavedra, Javier; López-Beceiro, Jorge; Naya, Salvador; López-Canosa, Adrián; Heredia-García, Nicolás; Artiaga, Ramón

    2016-10-01

    This work shows an effective methodology to characterize the creep-recovery behavior of silicones before their application in podiatry. The aim is to characterize, model and compare the creep-recovery properties of different types of silicone used in podiatry orthotics. The creep-recovery behavior of silicones used in podiatry orthotics is characterized by dynamic mechanical analysis (DMA). Silicones provided by Herbitas are compared by observing their viscoelastic properties using Functional Data Analysis (FDA) and nonlinear regression. The relationship between strain and time is modeled by fixed and mixed effects nonlinear regression to compare podiatry silicones easily and intuitively. Functional ANOVA and the Kohlrausch-Williams-Watts (KWW) model with fixed and mixed effects allow us to compare different silicones by observing the values of the fitting parameters and their physical meaning. The differences between silicones are related to variations in the breadth of the creep-recovery time distribution and in the instantaneous deformation-permanent strain. Nevertheless, the mean creep-relaxation time is the same for all the studied silicones. Silicones used in palliative orthoses have a higher instantaneous deformation-permanent strain and a narrower creep-recovery distribution. The proposed methodology based on DMA, FDA and nonlinear regression is a useful tool to characterize and choose the proper silicone for each podiatry application according to its viscoelastic properties. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. An Evaluation of ALOS Data in Disaster Applications

    NASA Astrophysics Data System (ADS)

    Igarashi, Tamotsu; Furuta, Ryoich; Ono, Makoto

    ALOS is the Advanced Land Observing Satellite, providing image data from the onboard sensors PRISM, AVNIR-2 and PALSAR. PRISM is a panchromatic, high-resolution, three-line-scanner stereo sensor for characterizing the earth surface. The positional accuracy in the images and the height accuracy of the Digital Surface Model (DSM) are high; therefore geographic information extraction is improved in the field of disaster applications by providing images of the disaster area. In particular, pan-sharpened 3D images composed from PRISM and data from the four-band visible/near-infrared radiometer AVNIR-2 are expected to provide information for understanding geographic and topographic features. PALSAR is an advanced multi-functional synthetic aperture radar (SAR) operating in L-band, appropriate for land-surface feature characterization. PALSAR has many improvements over JERS-1/SAR, such as higher sensitivity, higher resolution, and polarimetric and ScanSAR observation modes. PALSAR is also applicable to SAR interferometry processing. This paper describes the evaluation of ALOS data characteristics from the viewpoint of disaster applications, through several exercise applications.

  2. Recent advances in non-ionic surfactant vesicles (niosomes): self-assembly, fabrication, characterization, drug delivery applications and limitations.

    PubMed

    Abdelkader, Hamdy; Alani, Adam W G; Alany, Raid G

    2014-03-01

    Non-ionic surfactant vesicles, simply known as niosomes, are synthetic vesicles with potential technological applications. Niosomes have the same potential advantages as phospholipid vesicles (liposomes) in being able to accommodate both water-soluble and lipid-soluble drug molecules, control their release, and as such serve as versatile drug delivery devices for numerous applications. Additionally, niosomes can be considered more economical alternatives to liposomes that are chemically, and occasionally physically, more stable. Niosomes can be fabricated using simple methods of preparation and from surfactants widely used in pharmaceutical technology. Many reports have discussed niosomes in terms of physicochemical properties and their applications as drug delivery systems. In this report, a brief and simplified summary of different theories of self-assembly will be given. Furthermore, manufacturing methods, physical characterization techniques, bilayer membrane additives, unconventional niosomes (discomes, proniosomes, elastic and polyhedral niosomes), their recent applications as drug delivery systems, limitations and directions for future research will be discussed.

  3. Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sreepathi, Sarat; D'Azevedo, Eduardo; Philip, Bobby

    On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phase of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.

  4. Processor Emulator with Benchmark Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lloyd, G. Scott; Pearce, Roger; Gokhale, Maya

    2015-11-13

    A processor emulator and a suite of benchmark applications have been developed to assist in characterizing the performance of data-centric workloads on current and future computer architectures. Some of the applications have been collected from other open source projects. For more details on the emulator and an example of its usage, see reference [1].

  5. Synthesis and characterization of corn oil polyhydroxy fatty acids designed as additive agent for many applications

    USDA-ARS?s Scientific Manuscript database

    Before the advent of the modern food industry, vegetable oils (triglycerides) from many sources had a long history of use as condiments in cooking, personal care and other therapeutic applications. Industrial applications of vegetable oils, on the other hand, have been limited on account of the shor...

  6. Modular HPC I/O characterization with Darshan

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snyder, Shane; Carns, Philip; Harms, Kevin

    2016-11-13

    Contemporary high-performance computing (HPC) applications encompass a broad range of distinct I/O strategies and are often executed on a number of different compute platforms in their lifetime. These large-scale HPC platforms employ increasingly complex I/O subsystems to provide a suitable level of I/O performance to applications. Tuning I/O workloads for such a system is nontrivial, and the results generally are not portable to other HPC systems. I/O profiling tools can help to address this challenge, but most existing tools only instrument specific components within the I/O subsystem, providing a limited perspective on I/O performance. The increasing diversity of scientific applications and computing platforms calls for greater flexibility and scope in I/O characterization.

  7. Thermal characterization of three-dimensional printed components for light-emitting diode lighting system applications

    NASA Astrophysics Data System (ADS)

    Perera, Indika U.; Narendran, Nadarajah; Terentyeva, Valeria

    2018-04-01

    This study investigated the thermal properties of three-dimensional (3-D) printed components with the potential to be used for thermal management in light-emitting diode (LED) applications. Commercially available filament materials with and without a metal filler were characterized with changes to the print orientation. 3-D printed components with an in-plane orientation had >30% better effective thermal conductivity compared with components printed with a cross-plane orientation. A finite-element analysis was performed to understand the effective thermal conductivity changes in the 3-D printed components. A simple thermal resistance model was used to estimate the effective thermal conductivity required for the 3-D printed components to be a viable alternative in LED thermal management applications.

  8. Photogrammetry and Videogrammetry Methods Development for Solar Sail Structures. Masters Thesis awarded by George Washington Univ.

    NASA Technical Reports Server (NTRS)

    Pappa, Richard S. (Technical Monitor); Black, Jonathan T.

    2003-01-01

    This report discusses the development and application of metrology methods called photogrammetry and videogrammetry that make accurate measurements from photographs. These methods have been adapted for the static and dynamic characterization of gossamer structures, as four specific solar sail applications demonstrate. The applications prove that high-resolution, full-field, non-contact static measurements of solar sails using dot projection photogrammetry are possible as well as full-field, non-contact, dynamic characterization using dot projection videogrammetry. The accuracy of the measurement of the resonant frequencies and operating deflection shapes that were extracted surpassed expectations. While other non-contact measurement methods exist, they are not full-field and require significantly more time to take data.

  9. Application of thermal analysis techniques in activated carbon production

    USGS Publications Warehouse

    Donnals, G.L.; DeBarr, J.A.; Rostam-Abadi, M.; Lizzio, A.A.; Brady, T.A.

    1996-01-01

    Thermal analysis techniques have been used at the ISGS as an aid in the development and characterization of carbon adsorbents. Promising adsorbents from fly ash, tires, and Illinois coals have been produced for various applications. Process conditions determined in the preparation of gram quantities of carbons were used as guides in the preparation of larger samples. TG techniques developed to characterize the carbon adsorbents included the measurement of the kinetics of SO2 adsorption, the performance of rapid proximate analyses, and the determination of equilibrium methane adsorption capacities. Thermal regeneration of carbons was assessed by TG to predict the life cycle of carbon adsorbents in different applications. TPD was used to determine the nature of surface functional groups and their effect on a carbon's adsorption properties.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Osmanlioglu, Ahmet Erdal

    Pre-treatment of radioactive waste is the first step in the waste management program that occurs after waste generation from various applications in Turkey. Pre-treatment and characterization practices are carried out in the Radioactive Waste Management Unit (RWMU) at the Cekmece Nuclear Research and Training Center (CNRTC) in Istanbul. This facility has been assigned to take all low-level radioactive wastes generated by nuclear applications in Turkey. The wastes are generated from research and nuclear applications, mainly in medicine, biology, agriculture, quality control in metal processing, and the construction industry. These wastes are classified as low-level radioactive wastes. Pre-treatment practices cover several steps. In this paper, the main steps of pre-treatment and characterization are presented: basically, these are collection, segregation, chemical adjustment, size reduction and decontamination operations. (author)

  11. 25th anniversary article: semiconductor nanowires--synthesis, characterization, and applications.

    PubMed

    Dasgupta, Neil P; Sun, Jianwei; Liu, Chong; Brittman, Sarah; Andrews, Sean C; Lim, Jongwoo; Gao, Hanwei; Yan, Ruoxue; Yang, Peidong

    2014-04-09

    Semiconductor nanowires (NWs) have been studied extensively for over two decades for their novel electronic, photonic, thermal, electrochemical and mechanical properties. This comprehensive review article summarizes major advances in the synthesis, characterization, and application of these materials in the past decade. Developments in the understanding of the fundamental principles of "bottom-up" growth mechanisms are presented, with an emphasis on rational control of the morphology, stoichiometry, and crystal structure of the materials. This is followed by a discussion of the application of nanowires in i) electronic, ii) sensor, iii) photonic, iv) thermoelectric, v) photovoltaic, vi) photoelectrochemical, vii) battery, viii) mechanical, and ix) biological applications. Throughout the discussion, a detailed explanation of the unique properties associated with the one-dimensional nanowire geometry will be presented, and the benefits of these properties for the various applications will be highlighted. The review concludes with a brief perspective on future research directions, and remaining barriers which must be overcome for the successful commercial application of these technologies. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Fabrication and characterization of thick-film piezoelectric lead zirconate titanate ceramic resonators by tape-casting.

    PubMed

    Qin, Lifeng; Sun, Yingying; Wang, Qing-Ming; Zhong, Youliang; Ou, Ming; Jiang, Zhishui; Tian, Wei

    2012-12-01

    In this paper, thick-film piezoelectric lead zirconate titanate (PZT) ceramic resonators with thicknesses down to tens of micrometers have been fabricated by tape-casting processing. PZT ceramic resonators with composition near the morphotropic phase boundary and with different dopants added were prepared for piezoelectric transducer applications. Material property characterization for these thick-film PZT resonators is essential for device design and applications. For the property characterization, a recently developed normalized electrical impedance spectrum method was used to determine the electromechanical coefficient and the complex piezoelectric, elastic, and dielectric coefficients from the electrical measurement of resonators using thick films. In this work, nine PZT thick-film resonators have been fabricated and characterized, and two different types of resonators, namely thickness-longitudinal and thickness-transverse modes, were used for material property characterization. The results were compared with those determined by the IEEE standard method, and they agreed well. It was found that, depending on the PZT formulation and dopants, the relative permittivities ε(T)(33)/ε(0) measured at 2 kHz for these thick films are in the range of 1527 to 4829, piezoelectric stress constants (e(33)) in the range of 15 to 26 C/m(2), piezoelectric strain constants (d(31)) in the range of -169 × 10(-12) C/N to -314 × 10(-12) C/N, electromechanical coupling coefficients (k(t)) in the range of 0.48 to 0.53, and k(31) in the range of 0.35 to 0.38. The characterization results show that tape-casting processing can be used to fabricate high-quality PZT thick-film resonators, and the extracted material constants can be used for device design and application.

  13. IMPROVING BIOGENIC EMISSION ESTIMATES WITH SATELLITE IMAGERY

    EPA Science Inventory

    This presentation will review how existing and future applications of satellite imagery can improve the accuracy of biogenic emission estimates. Existing applications of satellite imagery to biogenic emission estimates have focused on characterizing land cover. Vegetation dat...

  14. Screen-Printing Fabrication and Characterization of Stretchable Electronics

    PubMed Central

    Suikkola, Jari; Björninen, Toni; Mosallaei, Mahmoud; Kankkunen, Timo; Iso-Ketola, Pekka; Ukkonen, Leena; Vanhala, Jukka; Mäntysalo, Matti

    2016-01-01

    This article focuses on the fabrication and characterization of stretchable interconnects for wearable electronics applications. Interconnects were screen-printed with a stretchable silver-polymer composite ink on 50-μm thick thermoplastic polyurethane. The initial sheet resistances of the manufactured interconnects were an average of 36.2 mΩ/□, and half the manufactured samples withstood single strains of up to 74%. The strain proportionality of resistance is discussed, and a regression model is introduced. Cycling strain increased resistance. However, the resistances here were almost fully reversible, and this recovery was time-dependent. Normalized resistances to 10%, 15%, and 20% cyclic strains stabilized at 1.3, 1.4, and 1.7. We also tested the validity of our model for radio-frequency applications through characterization of a stretchable radio-frequency identification tag. PMID:27173424

  15. Expert judgement and uncertainty quantification for climate change

    NASA Astrophysics Data System (ADS)

    Oppenheimer, Michael; Little, Christopher M.; Cooke, Roger M.

    2016-05-01

    Expert judgement is an unavoidable element of the process-based numerical models used for climate change projections, and the statistical approaches used to characterize uncertainty across model ensembles. Here, we highlight the need for formalized approaches to unifying numerical modelling with expert judgement in order to facilitate characterization of uncertainty in a reproducible, consistent and transparent fashion. As an example, we use probabilistic inversion, a well-established technique used in many other applications outside of climate change, to fuse two recent analyses of twenty-first century Antarctic ice loss. Probabilistic inversion is but one of many possible approaches to formalizing the role of expert judgement, and the Antarctic ice sheet is only one possible climate-related application. We recommend indicators or signposts that characterize successful science-based uncertainty quantification.

  16. Use of model calibration to achieve high accuracy in analysis of computer networks

    DOEpatents

    Frogner, Bjorn; Guarro, Sergio; Scharf, Guy

    2004-05-11

    A system and method are provided for creating a network performance prediction model, and calibrating the prediction model, through application of network load statistical analyses. The method includes characterizing the measured load on the network, which may include background load data obtained over time, and may further include directed load data representative of a transaction-level event. Probabilistic representations of load data are derived to characterize the statistical persistence of the network performance variability and to determine delays throughout the network. The probabilistic representations are applied to the network performance prediction model to adapt the model for accurate prediction of network performance. Certain embodiments of the method and system may be used for analysis of the performance of a distributed application characterized as data packet streams.

  17. Needs assessment for nondestructive testing and materials characterization for improved reliability in structural ceramics for heat engines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, D.R.; McClung, R.W.; Janney, M.A.

    1987-08-01

    A needs assessment was performed for nondestructive testing and materials characterization to achieve improved reliability in ceramic materials for heat engine applications. Raw materials, green state bodies, and sintered ceramics were considered. The overall approach taken to improve reliability of structural ceramics requires key inspections throughout the fabrication flowsheet, including raw materials, green state, and dense parts. The applications of nondestructive inspection and characterization techniques to ceramic powders and other raw materials, green ceramics, and sintered ceramics are discussed. The current state of inspection technology is reviewed for all identified attributes and stages of a generalized flowsheet for advanced structural ceramics, and research and development requirements are identified and listed in priority order. 164 refs., 3 figs.

  18. Tamoxifen citrate loaded ethosomes for transdermal drug delivery system: preparation and characterization.

    PubMed

    Sarwa, Khomendra Kumar; Suresh, Preeti K; Debnath, Manabendra; Ahmad, Mohammad Zaki

    2013-08-01

    Long-term tamoxifen citrate therapy is imperative to treat several dermatological and hormone-sensitive disorders. Successful oral and parenteral administration of tamoxifen citrate has been challenging since it undergoes enzymatic degradation and has poor aqueous solubility. In the present work, tamoxifen citrate loaded ethosomes were prepared and characterized for transdermal applications. The prepared formulations were characterized for morphological features, particle size distribution, calorimetric attributes, zeta potential and drug entrapment. The permeation profile of the prepared ethosomes was compared with liposomes and a hydroethanolic solution across cellophane membrane and human cadaver skin. Results of the permeation studies indicate that ethosomes were able to deliver >90% of the drug within 24 hours of application, while liposomes and the hydroethanolic solution delivered only 39.04% and 36.55%, respectively. Skin deposition and stability studies are also reported.

  19. Electrochemical capacitors: mechanism, materials, systems, characterization and applications.

    PubMed

    Wang, Yonggang; Song, Yanfang; Xia, Yongyao

    2016-10-24

    Electrochemical capacitors (i.e. supercapacitors) include electrochemical double-layer capacitors that depend on the charge storage of ion adsorption and pseudo-capacitors that are based on charge storage involving fast surface redox reactions. The energy storage capacities of supercapacitors are several orders of magnitude higher than those of conventional dielectric capacitors, but are much lower than those of secondary batteries. They typically have high power density, long cyclic stability and high safety, and thus can be considered as an alternative or complement to rechargeable batteries in applications that require high power delivery or fast energy harvesting. This article reviews the latest progress in supercapacitors in charge storage mechanisms, electrode materials, electrolyte materials, systems, characterization methods, and applications. In particular, the newly developed charge storage mechanism for intercalative pseudocapacitive behaviour, which bridges the gap between battery behaviour and conventional pseudocapacitive behaviour, is also clarified for comparison. Finally, the prospects and challenges associated with supercapacitors in practical applications are also discussed.

  20. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations.

    PubMed

    Dwivedi, Bhakti; Kowalski, Jeanne

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.

  1. shinyGISPA: A web application for characterizing phenotype by gene sets using multiple omics data combinations

    PubMed Central

    Dwivedi, Bhakti

    2018-01-01

    While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010

  2. Novel solid state lasers for Lidar applications at 2 μm

    NASA Astrophysics Data System (ADS)

    Della Valle, G.; Galzerano, G.; Toncelli, A.; Tonelli, M.; Laporta, P.

    2005-09-01

    A review of the results achieved by our group in the development of novel solid-state lasers for Lidar applications at 2 μm is presented. These lasers, based on fluoride crystals (LiYF4, BaY2F8, and KYF4) doped with Tm and Ho ions, are characterized by high efficiency and wide wavelength tunability around 2 μm. Single crystals of LiYF4, BaY2F8, and KYF4 codoped with the same Tm3+ and Ho3+ concentrations were successfully grown by the Czochralski method. The full spectroscopic characterization of the different laser crystals and a comparison of their laser performance are presented. Continuous-wave operation was efficiently demonstrated under CW diode pumping. These oscillators find interesting applications in the field of remote sensing (Lidar and Dial systems) as well as in high-resolution molecular spectroscopy, frequency metrology, and biomedical applications.

  3. A characterization of workflow management systems for extreme-scale applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
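
    A minimal sketch of how such a feature-based classification might be encoded for side-by-side comparison is shown below; the system names and feature values are placeholders, not the paper's actual survey table.

```python
# Hedged sketch of encoding a feature-based classification of workflow management
# systems; entries are placeholders invented for illustration, not the paper's data.
from dataclasses import dataclass

@dataclass
class WorkflowSystem:
    name: str
    execution_model: str      # e.g. "task-based DAG", "stream", "in situ"
    environments: tuple       # e.g. ("HPC cluster", "cloud", "GPU")
    data_access: str          # e.g. "shared filesystem", "object store", "streaming"

catalog = [
    WorkflowSystem("ExampleWMS-A", "task-based DAG", ("HPC cluster", "cloud"), "shared filesystem"),
    WorkflowSystem("ExampleWMS-B", "stream", ("cloud", "GPU"), "streaming"),
]

# Query the catalog: which systems target heterogeneous, GPU-capable environments?
gpu_ready = [s.name for s in catalog if "GPU" in s.environments]
print(gpu_ready)
```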

  4. A characterization of workflow management systems for extreme-scale applications

    DOE PAGES

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...

    2017-02-16

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have been established as an important abstraction that captures the data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  5. Innovations in microelectronics and solid state at North Carolina Agricultural and Technical State University

    NASA Technical Reports Server (NTRS)

    Williams, L., Jr.

    1977-01-01

    Research in the following areas is described: (1) Characterization and applications of metallic oxide devices; (2) Electronic properties and energy conversion in organic amorphous semiconductors; (3) Material growth and characterization directed toward improving III-V heterojunction solar cells.

  6. Characterizing slow chemical exchange in nucleic acids by carbon CEST and low spin-lock field R(1ρ) NMR spectroscopy.

    PubMed

    Zhao, Bo; Hansen, Alexandar L; Zhang, Qi

    2014-01-08

    Quantitative characterization of dynamic exchange between various conformational states provides essential insights into the molecular basis of many regulatory RNA functions. Here, we present an application of nucleic-acid-optimized carbon chemical exchange saturation transfer (CEST) and low spin-lock field R(1ρ) relaxation dispersion (RD) NMR experiments for characterizing slow chemical exchange in nucleic acids that is otherwise difficult, if not impossible, to quantify by the ZZ-exchange NMR experiment. We demonstrated the application on a 47-nucleotide fluoride riboswitch in the ligand-free state, for which CEST and R(1ρ) RD profiles of base and sugar carbons revealed slow exchange dynamics involving a sparsely populated (p ~ 10%) and short-lived (τ ~ 10 ms) NMR "invisible" state. The utility of CEST and low spin-lock field R(1ρ) RD experiments in studying slow exchange was further validated by characterizing an exchange as slow as ~60 s(-1).
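
    For orientation, the back-of-envelope sketch below converts the quoted excited-state population (~10%) and lifetime (~10 ms) into rate constants using the standard two-state exchange relations; these relations are generic textbook identities, not the fitting model used in the paper.

```python
# Back-of-envelope two-state exchange arithmetic (ground G <-> excited E), using the
# population (~10%) and lifetime (~10 ms) quoted above. Standard relations assumed:
#   p_E = k_GE / k_ex,   tau_E = 1 / k_EG,   k_ex = k_GE + k_EG
p_E   = 0.10      # excited-state population
tau_E = 0.010     # excited-state lifetime in seconds

k_EG = 1.0 / tau_E            # E -> G rate, ~100 s^-1
k_ex = k_EG / (1.0 - p_E)     # total exchange rate, ~111 s^-1
k_GE = p_E * k_ex             # G -> E rate, ~11 s^-1

print(f"k_ex ~ {k_ex:.0f} s^-1, k_GE ~ {k_GE:.1f} s^-1, k_EG ~ {k_EG:.0f} s^-1")
# Exchange rates of this order fall in the slow regime that CEST and low spin-lock
# field R1rho experiments are designed to quantify.
```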

  7. Characterizing Slow Chemical Exchange in Nucleic Acids by Carbon CEST and Low Spin-Lock Field R1ρ NMR Spectroscopy

    PubMed Central

    Zhao, Bo; Hansen, Alexandar L.; Zhang, Qi

    2016-01-01

    Quantitative characterization of dynamic exchange between various conformational states provides essential insights into the molecular basis of many regulatory RNA functions. Here, we present an application of nucleic-acid-optimized carbon chemical exchange saturation transfer (CEST) and low spin-lock field R1ρ relaxation dispersion (RD) NMR experiments for characterizing slow chemical exchange in nucleic acids that is otherwise difficult, if not impossible, to quantify by the ZZ-exchange NMR experiment. We demonstrated the application on a 47-nucleotide fluoride riboswitch in the ligand-free state, for which CEST and R1ρ RD profiles of base and sugar carbons revealed slow exchange dynamics involving a sparsely populated (p ~ 10%) and short-lived (τ ~ 10 ms) NMR “invisible” state. The utility of CEST and low spin-lock field R1ρ RD experiments in studying slow exchange was further validated by characterizing an exchange as slow as ~60 s−1. PMID:24299272

  8. State-of-the-art characterization techniques for advanced lithium-ion batteries

    NASA Astrophysics Data System (ADS)

    Lu, Jun; Wu, Tianpin; Amine, Khalil

    2017-03-01

    To meet future needs for industries from personal devices to automobiles, state-of-the-art rechargeable lithium-ion batteries will require both improved durability and lowered costs. To enhance battery performance and lifetime, understanding electrode degradation mechanisms is of critical importance. Various advanced in situ and operando characterization tools developed during the past few years have proven indispensable for optimizing battery materials, understanding cell degradation mechanisms, and ultimately improving the overall battery performance. Here we review recent progress in the development and application of advanced characterization techniques such as in situ transmission electron microscopy for high-performance lithium-ion batteries. Using three representative electrode systems—layered metal oxides, Li-rich layered oxides and Si-based or Sn-based alloys—we discuss how these tools help researchers understand the battery process and design better battery systems. We also summarize the application of the characterization techniques to lithium-sulfur and lithium-air batteries and highlight the importance of those techniques in the development of next-generation batteries.

  9. Synthesis and characterization of dihexyldithiocarbamate as a chelating agent in extraction of gold(III)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fatimah, Soja Siti, E-mail: soja-sf@upi.edu; Department of Chemistry, Faculty of Mathematics and Natural Sciences, Padjadjaran University, Jl. Raya Bandung-Sumedang, Km. 21, Jatinangor; Bahti, Husein H.

    2016-02-08

    The use of dialkyldithiocarbamates as chelating agents for transition metals has been developing for decades. Many chelating agents have been synthesized and used in the extraction of these metals. Studies on particular aspects of metal extraction, such as the effect of increasing the hydrophobicity of chelating agents on the effectiveness of the extraction, have been done. However, despite the many studies on the synthesis and applications of this type of chelating agent, interest in the molecular structure of the synthesized ligands and of their complexes has been limited. This study aimed at synthesizing and characterizing dihexyldithiocarbamate, and using the ligand for the extraction of gold(III). Characterization of the ligand and of its metal complex was done using elemental analysis, DTG, and spectroscopic methods including NMR (1H and 13C), FTIR, and MS-ESI. Data on the synthesis, characterization, and application of the ligand as a chelating agent are presented.

  10. Synthesis and characterization of dihexyldithiocarbamate as a chelating agent in extraction of gold(III)

    NASA Astrophysics Data System (ADS)

    Fatimah, Soja Siti; Bahti, Husein H.; Hastiawan, Iwan; Permanasari, Anna

    2016-02-01

    The use of dialkyldithiocarbamates as chelating agents for transition metals has been developing for decades. Many chelating agents have been synthesized and used in the extraction of these metals. Studies on particular aspects of metal extraction, such as the effect of increasing the hydrophobicity of chelating agents on the effectiveness of the extraction, have been done. However, despite the many studies on the synthesis and applications of this type of chelating agent, interest in the molecular structure of the synthesized ligands and of their complexes has been limited. This study aimed at synthesizing and characterizing dihexyldithiocarbamate, and using the ligand for the extraction of gold(III). Characterization of the ligand and of its metal complex was done using elemental analysis, DTG, and spectroscopic methods including NMR (1H and 13C), FTIR, and MS-ESI. Data on the synthesis, characterization, and application of the ligand as a chelating agent are presented.

  11. Analysis of acoustic emission signals and monitoring of machining processes

    PubMed

    Govekar; Gradisek; Grabec

    2000-03-01

    Monitoring of a machining process on the basis of sensor signals requires a selection of informative inputs in order to reliably characterize and model the process. In this article, a system for the selection of informative characteristics from the signals of multiple sensors is presented. For signal analysis, methods of spectral analysis and methods of nonlinear time series analysis are used. With the aim of modeling relationships between signal characteristics and the corresponding process state, an adaptive empirical modeler is applied. The application of the system is demonstrated by characterizing different parameters defining the states of a turning process, such as chip form, tool wear, and the onset of chatter vibration. The results show that, in spite of the complexity of the turning process, the state of the process can be well characterized by just a few proper characteristics extracted from a representative sensor signal. The process characterization can be further improved by combining characteristics from multiple sensors and by applying characteristics derived from the nonlinear (chaotic) time series analysis.
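
    As a minimal sketch of the kind of spectral feature extraction described, the snippet below computes a few band powers and the spectral centroid from a synthetic signal using Welch's method; the sampling rate, band edges, and signal are assumptions, not the article's sensors or selected features.

```python
# Hedged sketch: extracting simple spectral characteristics from an acoustic emission
# signal with Welch's method; the synthetic signal and band edges are illustrative only.
import numpy as np
from scipy.signal import welch

fs = 1_000_000                       # 1 MHz sampling rate (assumed)
t = np.arange(0, 0.01, 1 / fs)
signal = np.sin(2 * np.pi * 150e3 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)

freqs, psd = welch(signal, fs=fs, nperseg=4096)

def band_power(f_lo, f_hi):
    """Integrate the power spectral density over one frequency band."""
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return np.trapz(psd[mask], freqs[mask])

features = {
    "power_100_200kHz": band_power(100e3, 200e3),
    "power_200_400kHz": band_power(200e3, 400e3),
    "spectral_centroid_Hz": float(np.sum(freqs * psd) / np.sum(psd)),
}
print(features)
```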

  12. The uniformity study of non-oxide thin film at device level using electron energy loss spectroscopy

    NASA Astrophysics Data System (ADS)

    Li, Zhi-Peng; Zheng, Yuankai; Li, Shaoping; Wang, Haifeng

    2018-05-01

    Electron energy loss spectroscopy (EELS) has been widely used as a chemical analysis technique to characterize the chemical properties of materials, such as element valence states and atom/ion bonding environments. This study provides a new method to characterize the physical properties (i.e., film uniformity and grain orientations) of non-oxide thin films in magnetic devices by using EELS microanalysis in a scanning transmission electron microscope. The method is based on analyzing the white-line ratio of the spectra and the related extended energy loss fine structure, and correlating them with thin-film uniformity. This approach provides an effective and sensitive way to monitor and characterize thin-film quality (i.e., uniformity) at the atomic level during thin-film development, which is especially useful for examining ultra-thin films (i.e., a few nanometers thick) or films embedded in devices for industrial applications. More importantly, this technique enables quantitative characterization of thin-film uniformity and should prove remarkably useful for examining various types of devices in industrial applications.
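
    A simplified sketch of a white-line ratio (L3/L2) estimate from an EELS spectrum is given below, using a generic power-law background subtraction; the energy windows and fitting choices are illustrative assumptions, not the specific protocol of this study.

```python
# Simplified sketch of a white-line ratio (L3/L2) estimate from an EELS spectrum;
# the pre-edge power-law background and integration windows are generic conventions,
# not the analysis protocol of the study summarized above.
import numpy as np

def white_line_ratio(energy_ev, counts, pre_edge, l3_window, l2_window):
    """Fit a power-law background on the pre-edge region, subtract it, then
    integrate the L3 and L2 windows of the net signal and return their ratio."""
    pre = (energy_ev >= pre_edge[0]) & (energy_ev < pre_edge[1])
    # Power law A*E^-r fitted in log-log space on the pre-edge counts.
    slope, intercept = np.polyfit(np.log(energy_ev[pre]), np.log(counts[pre]), 1)
    background = np.exp(intercept) * energy_ev ** slope
    net = counts - background

    def integrate(window):
        m = (energy_ev >= window[0]) & (energy_ev < window[1])
        return np.trapz(net[m], energy_ev[m])

    return integrate(l3_window) / integrate(l2_window)
```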

  13. Extensional rheometry with a handheld mobile device

    NASA Astrophysics Data System (ADS)

    Marshall, Kristin A.; Liedtke, Aleesha M.; Todt, Anika H.; Walker, Travis W.

    2017-06-01

    The on-site characterization of complex fluids is important for a number of academic and industrial applications. Consequently, a need exists to develop portable rheometers that can provide in-the-field diagnostics and serve as tools for rapid quality assurance. With the advancement of smartphone technology and the widespread global ownership of smart devices, mobile applications are attractive as platforms for rheological characterization. The present work investigates the use of a smartphone for the extensional characterization of a series of Boger fluids composed of glycerol/water and poly(ethylene oxide), taking advantage of the increasing high-speed video capabilities (currently up to 240 Hz capture rate at 720p) of smartphone cameras. We report a noticeable difference in the characterization of samples with slight variations in polymer concentration and discuss current device limitations. Potential benefits of a handheld extensional rheometer include its use as a point-of-care diagnostic tool, especially in developing communities, as well as a simple and inexpensive tool for assessing product quality in industry.
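
    Assuming the elasto-capillary thinning law D(t) ∝ exp(−t/3λ) commonly applied to dilute Boger fluids, the sketch below extracts an extensional relaxation time from frame-by-frame filament diameters; the frame rate matches the quoted 240 Hz capture rate, but the diameter values are invented for illustration.

```python
# Sketch of extracting an extensional relaxation time from frame-by-frame mid-filament
# diameters, assuming the elasto-capillary thinning law D(t) ~ exp(-t / (3*lambda))
# for dilute polymer (Boger) fluids. The diameter values below are made up.
import numpy as np

fps = 240.0                                  # smartphone high-speed capture rate
diameters_mm = np.array([1.00, 0.85, 0.72, 0.61, 0.52, 0.44, 0.37])
t = np.arange(diameters_mm.size) / fps       # time stamps of the analysed frames

# Linear fit of ln(D) vs t: slope = -1 / (3 * lambda)
slope, _ = np.polyfit(t, np.log(diameters_mm), 1)
relaxation_time_s = -1.0 / (3.0 * slope)
print(f"extensional relaxation time ~ {relaxation_time_s * 1000:.1f} ms")
```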

  14. State of Fire Behavior Models and their Application to Ecosystem and Smoke Management Issues: Special Session Summary Report

    DTIC Science & Technology

    2013-10-24

    Five topic areas were identified to advance fire science: (1) fire behavior, (2) ecological effects of fire, (3) carbon accounting in forest management and prescribed fire programs (including tradeoffs such as prescribed burning), (4) emissions characterization, and (5) fire plume dispersion. Better characterization of wildland fire behavior is needed as it relates to smoke management.

  15. Ultrasound Imaging Techniques for Spatiotemporal Characterization of Composition, Microstructure, and Mechanical Properties in Tissue Engineering.

    PubMed

    Deng, Cheri X; Hong, Xiaowei; Stegemann, Jan P

    2016-08-01

    Ultrasound techniques are increasingly being used to quantitatively characterize both native and engineered tissues. This review provides an overview and selected examples of the main techniques used in these applications. Grayscale imaging has been used to characterize extracellular matrix deposition, and quantitative ultrasound imaging based on the integrated backscatter coefficient has been applied to estimating cell concentrations and matrix morphology in tissue engineering. Spectral analysis has been employed to characterize the concentration and spatial distribution of mineral particles in a construct, as well as to monitor mineral deposition by cells over time. Ultrasound techniques have also been used to measure the mechanical properties of native and engineered tissues. Conventional ultrasound elasticity imaging and acoustic radiation force imaging have been applied to detect regions of altered stiffness within tissues. Sonorheometry and monitoring of steady-state excitation and recovery have been used to characterize the viscoelastic properties of tissue using a single transducer to both deform and image the sample. Dual-mode ultrasound elastography uses separate ultrasound transducers to produce a more potent deformation force for microscale characterization of the viscoelasticity of hydrogel constructs. These ultrasound-based techniques have high potential to impact the field of tissue engineering as they are further developed and their range of applications expands.
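
    As a minimal sketch of grayscale (B-mode) quantification in a region of interest, the snippet below envelope-detects a synthetic RF line and reports a mean log-compressed intensity; the sampling rate, pulse shape, and ROI are assumptions, not the processing chain of any study cited above.

```python
# Minimal sketch of turning an RF A-line into a grayscale (B-mode) intensity measure
# inside a region of interest, the kind of quantity used to track matrix deposition;
# the synthetic RF data here is purely illustrative.
import numpy as np
from scipy.signal import hilbert

fs = 50e6                                     # 50 MHz sampling rate (assumed)
t = np.arange(0, 20e-6, 1 / fs)
rf_line = np.sin(2 * np.pi * 10e6 * t) * np.exp(-((t - 10e-6) ** 2) / (2 * (2e-6) ** 2))

envelope = np.abs(hilbert(rf_line))           # envelope detection
b_mode_db = 20 * np.log10(envelope / envelope.max() + 1e-12)   # log compression

roi = (t > 8e-6) & (t < 12e-6)                # region of interest in depth (time)
print(f"mean ROI grayscale level: {b_mode_db[roi].mean():.1f} dB")
```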

  16. Dynamic characterization of AFM probes by laser Doppler vibrometry and stroboscopic holographic methodologies

    NASA Astrophysics Data System (ADS)

    Kuppers, J. D.; Gouverneur, I. M.; Rodgers, M. T.; Wenger, J.; Furlong, C.

    2006-08-01

    In atomic force microscopy, micro-probes of various sizes, geometries, and materials define the interface between the samples under investigation and the measuring detectors and instrumentation. Measurement resolution is therefore highly dependent on the transfer function characterizing the micro-probes used. In this paper, characterization of the dynamic transfer function of specific micro-cantilever probes used in an Atomic Force Microscope (AFM) operating in tapping mode is presented. Characterization is based on the combined application of laser Doppler vibrometry (LDV) and real-time stroboscopic optoelectronic holographic microscopy (OEHM) methodologies. LDV is used for the rapid measurement of the frequency response of the probes to an excitation function containing multiple frequency components. Data obtained from the measured frequency response are used to identify the principal harmonics. To identify the mode shapes corresponding to these harmonics, full-field-of-view OEHM is applied, with measurements of motion at various points on the excitation curve surrounding the identified harmonics. It is shown that the combined application of LDV and OEHM enables high-resolution characterization of the mode shapes of vibration, the damping characteristics, and the transient response of the micro-cantilever probes. Such characterization is necessary for high-resolution AFM measurements.
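
    A hedged sketch of resonance identification from an LDV velocity record is shown below: the spectrum of a synthetic two-mode signal is peak-picked to locate candidate harmonics; the sampling rate and mode frequencies are invented, not the probes measured in the paper.

```python
# Hedged sketch: locating resonance peaks of a cantilever from an LDV velocity record
# via its spectrum; the synthetic two-mode signal stands in for a measured broadband
# response and is not data from the paper.
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import find_peaks

fs = 2_000_000                                  # 2 MHz sampling rate (assumed)
t = np.arange(0, 0.005, 1 / fs)
velocity = (np.sin(2 * np.pi * 75e3 * t)            # fundamental flexural mode
            + 0.2 * np.sin(2 * np.pi * 470e3 * t)   # higher-order mode
            + 0.05 * np.random.default_rng(1).normal(size=t.size))

spectrum = np.abs(rfft(velocity))
freqs = rfftfreq(velocity.size, d=1 / fs)

# Keep only peaks that rise above 10% of the strongest spectral line.
peaks, _ = find_peaks(spectrum, height=0.1 * spectrum.max())
print("candidate resonances (kHz):", np.round(freqs[peaks] / 1e3, 1))
```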

  17. Application of flow field-flow fractionation for the characterization of macromolecules of biological interest: a review

    PubMed Central

    Qureshi, Rashid Nazir

    2010-01-01

    An overview is given of the recent literature on (bio)analytical applications of flow field-flow fractionation (FlFFF). FlFFF is a liquid-phase separation technique that can separate macromolecules and particles according to size. The technique is increasingly used on a routine basis in a variety of application fields. In food analysis, FlFFF is applied to determine the molecular size distribution of starches and modified celluloses, or to study protein aggregation during food processing. In industrial analysis, it is applied for the characterization of polysaccharides that are used as thickeners and dispersing agents. In pharmaceutical and biomedical laboratories, FlFFF is used to monitor the refolding of recombinant proteins, to detect aggregates of antibodies, or to determine the size distribution of drug carrier particles. In environmental studies, FlFFF is used to characterize natural colloids in water streams, and especially to study trace metal distributions over colloidal particles. In this review, first a short discussion of the state of the art in instrumentation is given. Developments in the coupling of FlFFF to various detection modes are then highlighted. Finally, application studies are discussed and ordered according to the type of (bio)macromolecules or bioparticles that are fractionated. PMID:20957473
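
    The size dependence exploited by FlFFF ultimately traces back to diffusion; as a small illustration, the sketch below converts a diffusion coefficient to a hydrodynamic diameter with the Stokes-Einstein relation, using assumed solvent conditions and an assumed example value.

```python
# Illustrative conversion underlying size-based FlFFF separation: the diffusion
# coefficient that controls retention maps to a hydrodynamic diameter through the
# Stokes-Einstein relation D = kT / (3 * pi * eta * d_h). Values are assumptions.
import math

k_B = 1.380649e-23          # Boltzmann constant, J/K
T = 298.15                  # temperature, K
eta = 0.89e-3               # viscosity of water at 25 C, Pa*s

def hydrodynamic_diameter_nm(diffusion_m2_per_s):
    """Return the Stokes-Einstein hydrodynamic diameter in nanometres."""
    return k_B * T / (3 * math.pi * eta * diffusion_m2_per_s) * 1e9

# A protein-sized macromolecule with D ~ 6e-11 m^2/s:
print(f"{hydrodynamic_diameter_nm(6e-11):.1f} nm")   # ~8 nm
```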

  18. 10 CFR 60.151 - Applicability.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 10 Energy 2 2011-01-01 2011-01-01 false Applicability. 60.151 Section 60.151 Energy NUCLEAR REGULATORY COMMISSION (CONTINUED) DISPOSAL OF HIGH-LEVEL RADIOACTIVE WASTES IN GEOLOGIC REPOSITORIES Quality... to activities related thereto. These activities include: site characterization, facility and...

  19. 10 CFR 60.151 - Applicability.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 10 Energy 2 2010-01-01 2010-01-01 false Applicability. 60.151 Section 60.151 Energy NUCLEAR REGULATORY COMMISSION (CONTINUED) DISPOSAL OF HIGH-LEVEL RADIOACTIVE WASTES IN GEOLOGIC REPOSITORIES Quality... to activities related thereto. These activities include: site characterization, facility and...

  20. Characterization of Carbon Nano-Onions for Heavy Metal Ion Remediation

    EPA Science Inventory

    Carbonaceous nanomaterials, such as fullerene C60, carbon nanotubes, and their functionalized derivatives have been demonstrated to possess high sorption capacity for organic and heavy metal contaminants, indicating a potential for remediation application. The actual application ...
