NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special-purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large-scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general-purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with the means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
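The run-time autotuning component lends itself to a compact illustration: time a kernel under each candidate configuration and keep the fastest. The sketch below is a minimal stand-in, not CUSH's actual interface; the toy kernel and its block-size parameter are hypothetical.

```python
# Minimal run-time autotuner sketch: benchmark each candidate
# configuration and return the one with the smallest median runtime.
import time

def autotune(kernel, configs, n_trials=3):
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        times = []
        for _ in range(n_trials):
            t0 = time.perf_counter()
            kernel(cfg)
            times.append(time.perf_counter() - t0)
        t = sorted(times)[len(times) // 2]   # median resists timing noise
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t

# Hypothetical kernel whose cost depends on a block-size parameter.
def toy_kernel(cfg):
    total = 0
    for i in range(0, 1 << 18, cfg["block"]):
        total += i
    return total

print(autotune(toy_kernel, [{"block": b} for b in (32, 64, 128, 256)]))
```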
A survey of CPU-GPU heterogeneous computing techniques
Mittal, Sparsh; Vetter, Jeffrey S.
2015-07-04
As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that each of these processing units (PUs) has unique features and strengths; hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs), such as workload partitioning, which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler, and application level. Further, we review both discrete and fused CPU-GPU systems and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). We believe that this paper will provide insights into the working and the scope of application of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.
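As a concrete illustration of the workload-partitioning techniques the survey covers, here is a minimal static-partitioning sketch: profile each processing unit on a warm-up slice, then split the workload in proportion to measured throughput. Both "devices" below are plain CPU functions standing in for real CPU and GPU code paths; nothing here comes from the survey itself.

```python
# Static CPU-GPU workload partitioning sketch: measure throughput of
# each PU on a warm-up slice, then split work so both finish together.
import time
import numpy as np

def cpu_work(x):
    return np.sqrt(x) * 2.0          # stand-in for the CPU code path

def gpu_work(x):
    return np.sqrt(x) * 2.0          # stand-in for a GPU kernel launch

def throughput(fn, sample):
    t0 = time.perf_counter()
    fn(sample)
    return sample.size / (time.perf_counter() - t0 + 1e-12)

data = np.arange(1_000_000, dtype=np.float64)
warmup = data[:10_000]
r_cpu = throughput(cpu_work, warmup)
r_gpu = throughput(gpu_work, warmup)

# Give each PU a share proportional to its measured throughput.
split = int(data.size * r_gpu / (r_cpu + r_gpu))
result = np.concatenate([gpu_work(data[:split]), cpu_work(data[split:])])
print(f"GPU share of the workload: {split / data.size:.2%}")
```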
Methodologies and systems for heterogeneous concurrent computing
NASA Technical Reports Server (NTRS)
Sunderam, V. S.
1994-01-01
Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures. The project develops algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures.
Heterogeneous Distributed Computing for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy S.
1998-01-01
The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to investigate issues in, and develop solutions to, the efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devised novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
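The device-level load-balancing idea admits a hedged sketch: worker threads, one per device, pull photon batches from a shared queue, so faster devices automatically simulate more photons. The photon kernel below is a toy stand-in, not the paper's OpenCL implementation, and the device names are invented.

```python
# Dynamic device-level load balancing sketch: devices pull photon
# batches from a shared queue until the work runs out.
import queue
import random
import threading

def simulate_batch(n_photons, rng):
    # Toy absorption tally standing in for a photon-transport kernel.
    return sum(1 for _ in range(n_photons) if rng.random() < 0.1)

task_q = queue.Queue()
for _ in range(64):                  # 64 batches of 10,000 photons each
    task_q.put(10_000)

tallies = {}
def device_worker(dev_name):
    rng = random.Random(dev_name)
    total = 0
    while True:
        try:
            n = task_q.get_nowait()
        except queue.Empty:
            break
        total += simulate_batch(n, rng)
    tallies[dev_name] = total

threads = [threading.Thread(target=device_worker, args=(f"device{i}",))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tallies)
```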
Heterogeneous concurrent computing with exportable services
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy
1995-01-01
Heterogeneous concurrent computing, based on the traditional process-oriented model, is approaching its functionality and performance limits. An alternative paradigm, based on the concept of services, supporting data driven computation, and built on a lightweight process infrastructure, is proposed to enhance the functional capabilities and the operational efficiency of heterogeneous network-based concurrent computing. TPVM is an experimental prototype system supporting exportable services, thread-based computation, and remote memory operations that is built as an extension of and an enhancement to the PVM concurrent computing system. TPVM offers a significantly different computing paradigm for network-based computing, while maintaining a close resemblance to the conventional PVM model in the interest of compatibility and ease of transition. Preliminary experiences have demonstrated that the TPVM framework presents a natural yet powerful concurrent programming interface, while being capable of delivering performance improvements of up to thirty percent.
State-of-the-art in Heterogeneous Computing
Brodtkorb, Andre R.; Dyken, Christopher; Hagen, Trond R.; ...
2010-01-01
Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.
Scout: high-performance heterogeneous computing made simple
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice
2011-01-26
Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To achieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focus on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly-shifting details of hardware while still realizing high performance by encapsulating software and hardware optimization within a compiler framework.
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases
NASA Technical Reports Server (NTRS)
Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)
2017-01-01
The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with Application Programming Interface hardware that includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
FAST: framework for heterogeneous medical image computing and visualization.
Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank
2015-11-01
Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphics processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the Insight Toolkit (ITK) and the Visualization Toolkit (VTK) and show that the presented framework is faster, with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
High-performance computing on GPUs for resistivity logging of oil and gas wells
NASA Astrophysics Data System (ADS)
Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.
2017-10-01
We developed and implemented in software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm are made using NVIDIA CUDA technology and computing libraries, allowing us to perform the decomposition of the SLAE and find its solution on the central processing unit (CPU) and the graphics processing unit (GPU). The calculation time is analyzed depending on the matrix size and the number of its non-zero elements. We estimated the computing speed on the CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
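The solver stage can be sketched on the CPU in a few lines of NumPy: assemble a symmetric positive-definite system and solve it via Cholesky factorization. The matrix below is synthetic, not the paper's 2D finite-element system, and a GPU implementation would delegate these steps to CUDA libraries instead.

```python
# Cholesky-based solve of an SPD linear system A x = b, the CPU
# analogue of the SLAE step described above.
import numpy as np

def cholesky_solve(A, b):
    """Solve A x = b for SPD A via the factorization A = L L^T."""
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, b)        # forward substitution (L is lower)
    return np.linalg.solve(L.T, y)   # back substitution

# Synthetic SPD system standing in for an assembled FEM matrix.
n = 200
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # shift guarantees positive definiteness
b = rng.standard_normal(n)

x = cholesky_solve(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```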
Asynchronous Replica Exchange Software for Grid and Heterogeneous Computing.
Gallicchio, Emilio; Xia, Junchao; Flynn, William F; Zhang, Baofeng; Samlalsingh, Sade; Mentes, Ahmet; Levy, Ronald M
2015-11-01
Parallel replica exchange sampling is an extended ensemble technique often used to accelerate the exploration of the conformational ensemble of atomistic molecular simulations of chemical systems. Inter-process communication and coordination requirements have historically discouraged the deployment of replica exchange on distributed and heterogeneous resources. Here we describe the architecture of software (named ASyncRE) for performing asynchronous replica exchange molecular simulations on volunteer computing grids and heterogeneous high performance clusters. The asynchronous replica exchange algorithm on which the software is based avoids centralized synchronization steps and the need for direct communication between remote processes. It allows molecular dynamics threads to progress at different rates and enables parameter exchanges among arbitrary sets of replicas independently from other replicas. ASyncRE is written in Python following a modular design conducive to extensions to various replica exchange schemes and molecular dynamics engines. Applications of the software for the modeling of association equilibria of supramolecular and macromolecular complexes on BOINC campus computational grids and on the CPU/MIC heterogeneous hardware of the XSEDE Stampede supercomputer are illustrated. They show the ability of ASyncRE to utilize large grids of desktop computers running the Windows, MacOS, and/or Linux operating systems as well as collections of high performance heterogeneous hardware devices.
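The asynchronous exchange scheme can be caricatured in a few lines: replicas advance in independent segments, and temperature swaps are attempted only between whichever pair happens to be idle, with no global barrier. This is a toy sketch of the idea, not the ASyncRE API; the "MD segment" is a random-walk stand-in.

```python
# Asynchronous replica exchange sketch: no global synchronization,
# swaps happen opportunistically between idle replicas.
import math
import random

random.seed(1)
K_B = 0.0019872                       # Boltzmann constant, kcal/(mol K)
temps = [300.0, 350.0, 400.0, 450.0]  # one temperature per replica
energies = [random.uniform(-100.0, -50.0) for _ in temps]

def md_segment(i):
    """Toy stand-in for advancing replica i by one MD segment."""
    energies[i] += random.gauss(0.0, 1.0)

def try_swap(i, j):
    """Metropolis temperature swap between two idle replicas."""
    d_beta = 1.0 / (K_B * temps[i]) - 1.0 / (K_B * temps[j])
    delta = d_beta * (energies[j] - energies[i])
    if delta <= 0.0 or random.random() < math.exp(-delta):
        temps[i], temps[j] = temps[j], temps[i]

for _ in range(1000):
    # Whichever two replicas happen to be idle exchange next; replicas
    # may therefore progress at different rates.
    i, j = random.sample(range(len(temps)), 2)
    md_segment(i)
    md_segment(j)
    try_swap(i, j)
print(temps)
```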
NASA Astrophysics Data System (ADS)
Pruhs, Kirk
A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
Scattering Properties of Heterogeneous Mineral Particles with Absorbing Inclusions
NASA Technical Reports Server (NTRS)
Dlugach, Janna M.; Mishchenko, Michael I.
2015-01-01
We analyze the results of numerically exact computer modeling of scattering and absorption properties of randomly oriented poly-disperse heterogeneous particles obtained by placing microscopic absorbing grains randomly on the surfaces of much larger spherical mineral hosts or by imbedding them randomly inside the hosts. These computations are paralleled by those for heterogeneous particles obtained by fully encapsulating fractal-like absorbing clusters in the mineral hosts. All computations are performed using the superposition T-matrix method. In the case of randomly distributed inclusions, the results are compared with the outcome of Lorenz-Mie computations for an external mixture of the mineral hosts and absorbing grains. We conclude that internal aggregation can affect strongly both the integral radiometric and differential scattering characteristics of the heterogeneous particle mixtures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkata, Manjunath Gorentla; Aderholdt, William F
The pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend of system architecture in extreme-scale systems is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, the system typically has a high-performing network and a compute accelerator. This system architecture is not only effective for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or the convergence. We developed a programming abstraction to address this problem. The programming abstraction is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architectures. Using distributed data structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures are currently undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
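A compressed sketch of the parallel SCE-UA pattern follows: the population is partitioned into complexes, each complex is evolved independently in a separate process, and the complexes are then shuffled back together. The evolve step here is a simplified reflection move, not the full competitive complex evolution of SCE-UA, and the objective is a stand-in for a rainfall-runoff fit, not the Xinanjiang model.

```python
# Parallel SCE-UA-style skeleton: evolve complexes concurrently,
# then shuffle the population between iterations.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def objective(x):
    return float(np.sum(x ** 2))      # stand-in for a model-fit criterion

def evolve_complex(cx):
    cx = sorted(cx, key=objective)
    worst, centroid = cx[-1], np.mean(cx[:-1], axis=0)
    reflected = centroid + (centroid - worst)     # reflect the worst point
    if objective(reflected) < objective(worst):
        cx[-1] = reflected
    return cx

def sce_ua(n_complexes=4, complex_size=8, dim=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    pop = list(rng.uniform(-5, 5, size=(n_complexes * complex_size, dim)))
    with ProcessPoolExecutor() as pool:
        for _ in range(iters):
            pop.sort(key=objective)
            # Systematic assignment of sorted points to complexes.
            complexes = [pop[i::n_complexes] for i in range(n_complexes)]
            evolved = list(pool.map(evolve_complex, complexes))
            pop = [p for cx in evolved for p in cx]   # shuffle step
    return min(pop, key=objective)

if __name__ == "__main__":
    best = sce_ua()
    print("best objective:", objective(best))
```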
Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...
2015-07-13
Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained by OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.
Random sphere packing model of heterogeneous propellants
NASA Astrophysics Data System (ADS)
Kochevets, Sergei Victorovich
It is well recognized that combustion of heterogeneous propellants is strongly dependent on the propellant morphology. Recent developments in computing systems make it possible to start three-dimensional modeling of heterogeneous propellant combustion. A key component of such large-scale computations is a realistic model of industrial propellants which retains the true morphology---a goal never achieved before. The research presented develops the Random Sphere Packing Model of heterogeneous propellants and generates numerical samples of actual industrial propellants. This is done by developing a sphere packing algorithm which randomly packs a large number of spheres with a polydisperse size distribution within a rectangular domain. First, the packing code is developed, optimized for performance, and parallelized using the OpenMP shared memory architecture. Second, the morphology and packing fraction of two simple cases of unimodal and bimodal packs are investigated computationally and analytically. It is shown that both the Loose Random Packing and Dense Random Packing limits are not well defined and the growth rate of the spheres is identified as the key parameter controlling the efficiency of the packing. For a properly chosen growth rate, computational results are found to be in excellent agreement with experimental data. Third, two strategies are developed to define numerical samples of polydisperse heterogeneous propellants: the Deterministic Strategy and the Random Selection Strategy. Using these strategies, numerical samples of industrial propellants are generated. The packing fraction is investigated and it is shown that the experimental values of the packing fraction can be achieved computationally. It is strongly believed that this Random Sphere Packing Model of propellants is a major step forward in the realistic computational modeling of heterogeneous propellant combustion. In addition, a method of analysis of the morphology of heterogeneous propellants is developed which uses the concept of multi-point correlation functions. A set of intrinsic length scales of local density fluctuations in random heterogeneous propellants is identified by performing a Monte-Carlo study of the correlation functions. This method of analysis shows great promise for understanding the origins of the combustion instability of heterogeneous propellants, and is believed to become a valuable tool for the development of safe and reliable rocket engines.
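A minimal packing sketch in the spirit of this model: the dissertation's algorithm grows polydisperse spheres at a controlled rate, while the simplified version below just inserts equal, fixed-radius spheres into a unit cube with rejection on overlap and reports the resulting packing fraction.

```python
# Random sequential sphere packing sketch (equal radii, unit cube).
import math
import random

def pack_spheres(radius=0.05, attempts=20000, seed=42):
    rng = random.Random(seed)
    centers = []
    for _ in range(attempts):
        c = [rng.uniform(radius, 1 - radius) for _ in range(3)]
        # Accept only if the new sphere overlaps no existing sphere.
        if all(sum((a - b) ** 2 for a, b in zip(c, o)) >= (2 * radius) ** 2
               for o in centers):
            centers.append(c)
    fraction = len(centers) * (4.0 / 3.0) * math.pi * radius ** 3
    return centers, fraction

centers, phi = pack_spheres()
print(f"{len(centers)} spheres placed, packing fraction {phi:.3f}")
```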
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
Song, Fengguang; Dongarra, Jack
2014-10-01
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
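The tile algorithms can be illustrated with a sequential tiled Cholesky factorization: the factor, triangular-solve, and trailing-update operations below are the task types that a runtime system like the one described above would instead schedule dynamically across CPUs and GPUs. This is a NumPy sketch, not the authors' implementation.

```python
# Tiled (right-looking) Cholesky factorization on nb x nb tiles.
import numpy as np

def tiled_cholesky(A, nb):
    """In-place lower Cholesky of SPD matrix A; returns L."""
    n = A.shape[0]
    for k in range(0, n, nb):
        kk = slice(k, k + nb)
        A[kk, kk] = np.linalg.cholesky(A[kk, kk])       # factor task
        for i in range(k + nb, n, nb):
            ii = slice(i, i + nb)
            # Triangular-solve task: A[ii,kk] <- A[ii,kk] L_kk^{-T}
            A[ii, kk] = np.linalg.solve(A[kk, kk], A[ii, kk].T).T
        for i in range(k + nb, n, nb):
            ii = slice(i, i + nb)
            for j in range(k + nb, i + nb, nb):
                jj = slice(j, j + nb)
                # Trailing-matrix update task (SYRK when i == j).
                A[ii, jj] -= A[ii, kk] @ A[jj, kk].T
    return np.tril(A)

rng = np.random.default_rng(0)
n, nb = 128, 32
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # synthetic SPD test matrix
L = tiled_cholesky(A.copy(), nb)
print("factorization error:", np.linalg.norm(L @ L.T - A))
```

In the real runtime, each tile operation becomes a node in the task graph, and a tile becomes ready as soon as the tasks it depends on complete; the sequential loop order here encodes exactly those dependencies.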
HeNCE: A Heterogeneous Network Computing Environment
Beguelin, Adam; Dongarra, Jack J.; Geist, George Al; ...
1994-01-01
Network computing seeks to utilize the aggregate resources of many networked computers to solve a single problem. In so doing it is often possible to obtain supercomputer performance from an inexpensive local area network. The drawback is that network computing is complicated and error prone when done by hand, especially if the computers have different operating systems and data formats and are thus heterogeneous. The heterogeneous network computing environment (HeNCE) is an integrated graphical environment for creating and running parallel programs over a heterogeneous collection of computers. It is built on a lower level package called parallel virtual machine (PVM). The HeNCE philosophy of parallel programming is to have the programmer graphically specify the parallelism of a computation and to automate, as much as possible, the tasks of writing, compiling, executing, debugging, and tracing the network computation. Key to HeNCE is a graphical language based on directed graphs that describe the parallelism and data dependencies of an application. Nodes in the graphs represent conventional Fortran or C subroutines and the arcs represent data and control flow. This article describes the present state of HeNCE, its capabilities, limitations, and areas of future research.
RSTensorFlow: GPU Enabled TensorFlow for Deep Learning on Commodity Android Devices
Alzantot, Moustafa; Wang, Yingnan; Ren, Zhengshuang; Srivastava, Mani B.
2018-01-01
Mobile devices have become an essential part of our daily lives. By virtue of both their increasing computing power and the recent progress made in AI, mobile devices have evolved to act as intelligent assistants in many tasks rather than a mere way of making phone calls. However, popular and commonly used tools and frameworks for machine intelligence are still lacking the ability to make proper use of the available heterogeneous computing resources on mobile devices. In this paper, we study the benefits of utilizing the heterogeneous (CPU and GPU) computing resources available on commodity Android devices while running deep learning models. We leveraged the heterogeneous computing framework RenderScript to accelerate the execution of deep learning models on commodity Android devices. Our system is implemented as an extension to the popular open-source framework TensorFlow. By integrating our acceleration framework tightly into TensorFlow, machine learning engineers can now easily benefit from the heterogeneous computing resources on mobile devices without the need for any extra tools. We evaluate our system on different Android phone models to study the trade-offs of running different neural network operations on the GPU. We also compare the performance of running different model architectures, such as convolutional and recurrent neural networks, on the CPU only versus using heterogeneous computing resources. Our results show that the GPUs on these phones are capable of offering substantial performance gains in matrix multiplication. Therefore, models that involve multiplication of large matrices can run much faster (approximately 3 times faster in our experiments) with GPU support.
PHoToNs–A parallel heterogeneous and threads oriented code for cosmological N-body simulation
NASA Astrophysics Data System (ADS)
Wang, Qiao; Cao, Zong-Yan; Gao, Liang; Chi, Xue-Bin; Meng, Chen; Wang, Jie; Wang, Long
2018-06-01
We introduce a new code for cosmological simulations, PHoToNs, which incorporates features for performing massive cosmological simulations on heterogeneous high performance computer (HPC) systems and threads-oriented programming. PHoToNs adopts a hybrid scheme to compute gravitational force, with the conventional Particle-Mesh (PM) algorithm to compute the long-range force, the Tree algorithm to compute the short-range force, and the direct-summation Particle-Particle (PP) algorithm to compute gravity from very close particles. A self-similar space-filling Peano-Hilbert curve is used to decompose the computing domain. Threads programming is advantageously used to more flexibly manage the domain communication, PM calculation and synchronization, as well as Dual Tree Traversal on the CPU+MIC platform. PHoToNs scales well and the efficiency of the PP kernel achieves 68.6% of peak performance on MIC and 74.4% on CPU platforms. We also test the accuracy of the code against the widely used Gadget-2 and find excellent agreement.
Job Scheduling in a Heterogeneous Grid Environment
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Smith, Warren; Oliker, Leonid; Biswas, Rupak
2004-01-01
Computational grids have the potential for solving large-scale scientific problems using heterogeneous and geographically distributed resources. However, a number of major technical hurdles must be overcome before this potential can be realized. One problem that is critical to effective utilization of computational grids is the efficient scheduling of jobs. This work addresses this problem by describing and evaluating a grid scheduling architecture and three job migration algorithms. The architecture is scalable and does not assume control of local site resources. The job migration policies use the availability and performance of computer systems, the network bandwidth available between systems, and the volume of input and output data associated with each job. An extensive performance comparison is presented using real workloads from leading computational centers. The results, based on several key metrics, demonstrate that the performance of our distributed migration algorithms is significantly greater than that of a local scheduling framework and comparable to a non-scalable global scheduling approach.
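The migration decision the paper's policies make can be sketched as minimizing estimated turnaround: queue wait, plus time to move the job's data over the measured bandwidth, plus compute time scaled by site speed. All numbers and site names below are invented for illustration; the actual policies use the same ingredients with real measurements.

```python
# Job-migration decision sketch: pick the site with the smallest
# estimated turnaround time for a given job.
def turnaround(job, site):
    move = job["data_gb"] * 8.0 / site["bandwidth_gbps"]   # transfer, s
    run = job["work_s"] / site["speed"]                    # compute, s
    return site["queue_wait_s"] + move + run

# "work_s" is the job's runtime on a speed-1.0 reference machine.
job = {"data_gb": 50.0, "work_s": 3.6e5}
sites = {
    "local":   {"speed": 1.0, "bandwidth_gbps": float("inf"),
                "queue_wait_s": 7200},   # data already here
    "remoteA": {"speed": 2.0, "bandwidth_gbps": 0.6, "queue_wait_s": 600},
    "remoteB": {"speed": 1.5, "bandwidth_gbps": 2.0, "queue_wait_s": 1800},
}
best = min(sites, key=lambda s: turnaround(job, sites[s]))
print(best, {s: round(turnaround(job, sites[s])) for s in sites})
```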
NASA Astrophysics Data System (ADS)
Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin
2016-06-01
CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software onto a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
Solving global shallow water equations on heterogeneous supercomputers
Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen
2017-01-01
The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for the model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphic Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Arrays) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement in performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, and sustained double-precision performances of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation of the programming paradigms of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, forming a picture of both the potential performance benefits and the programming efforts involved.
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network
NASA Astrophysics Data System (ADS)
Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan
2017-01-01
The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). Specialists in the field of parallel and distributed computing pay special attention to the scalability of applications for problem solving. Effective management of a scalable application in a heterogeneous distributed computing environment is still a non-trivial issue; control systems that operate in networks are especially affected. We propose a new approach to multi-agent management of scalable applications in a heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for the automation of problem solving. Advantages of the proposed approach are demonstrated on the example of parametric synthesis of a static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automation of multi-agent control for such systems in a parallel mode with various degrees of detail.
Transformation of OODT CAS to Perform Larger Tasks
NASA Technical Reports Server (NTRS)
Mattmann, Chris; Freeborn, Dana; Crichton, Daniel; Hughes, John; Ramirez, Paul; Hardman, Sean; Woollard, David; Kelly, Sean
2008-01-01
A computer program denoted OODT CAS has been transformed to enable performance of larger tasks that involve greatly increased data volumes and increasingly intensive processing of data on heterogeneous, geographically dispersed computers. Prior to the transformation, OODT CAS (also alternatively denoted, simply, 'CAS') [wherein 'OODT' signifies 'Object-Oriented Data Technology' and 'CAS' signifies 'Catalog and Archive Service'] was a proven software component used to manage scientific data from spaceflight missions. In the transformation, CAS was split into two separate components representing its canonical capabilities: file management and workflow management. In addition, CAS was augmented by addition of a resource-management component. This third component enables CAS to manage heterogeneous computing by use of diverse resources, including high-performance clusters of computers, commodity computing hardware, and grid computing infrastructures. CAS is now more easily maintainable, evolvable, and reusable. These components can be used separately or, taking advantage of synergies, can be used together. Other elements of the transformation included addition of a separate Web presentation layer that supports distribution of data products via Really Simple Syndication (RSS) feeds, and provision for full Resource Description Framework (RDF) exports of metadata.
A uniform approach for programming distributed heterogeneous computing systems
Grasso, Ivan; Pellegrini, Simone; Cosenza, Biagio; Fahringer, Thomas
2014-01-01
Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system which tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater’s performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.
NASA Astrophysics Data System (ADS)
McClure, J. E.; Prins, J. F.; Miller, C. T.
2014-07-01
Multiphase flow implementations of the lattice Boltzmann method (LBM) are widely applied to the study of porous medium systems. In this work, we construct a new variant of the popular "color" LBM for two-phase flow in which a three-dimensional, 19-velocity (D3Q19) lattice is used to compute the momentum transport solution while a three-dimensional, seven-velocity (D3Q7) lattice is used to compute the mass transport solution. Based on this formulation, we implement a novel heterogeneous GPU-accelerated algorithm in which the mass transport solution is computed by multiple shared memory CPU cores programmed using OpenMP while a concurrent solution of the momentum transport is performed using a GPU. The heterogeneous solution is demonstrated to provide a speedup of 2.6× compared to the multi-core CPU solution and 1.8× compared to the GPU solution, due to concurrent utilization of both CPU and GPU bandwidths. Furthermore, we verify that the proposed formulation provides an accurate physical representation of multiphase flow processes and demonstrate that the approach can be applied to perform heterogeneous simulations of two-phase flow in porous media using a typical GPU-accelerated workstation.
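The concurrent CPU-GPU scheme can be sketched with two workers per time step: one stands in for the GPU's momentum-transport kernel and runs asynchronously while the main thread advances the mass-transport solution, with a join as the synchronization point. The stencils below are toy updates, not a real D3Q19/D3Q7 lattice Boltzmann solver.

```python
# Concurrent heterogeneous time step sketch: "GPU" work overlaps
# with CPU work, with a barrier at the end of each step.
import threading
import numpy as np

momentum = np.random.rand(64, 64, 64)
mass = np.random.rand(64, 64, 64)

def momentum_step():                 # stands in for the GPU kernel
    global momentum
    momentum = 0.5 * (momentum + np.roll(momentum, 1, axis=0))

def mass_step():                     # stands in for the OpenMP CPU code
    global mass
    mass = 0.5 * (mass + np.roll(mass, 1, axis=1))

for step in range(10):
    t = threading.Thread(target=momentum_step)
    t.start()                        # momentum advances asynchronously
    mass_step()                      # while the CPU advances mass
    t.join()                         # synchronize before the next step
print(momentum.mean(), mass.mean())
```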
Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing
NASA Astrophysics Data System (ADS)
Shi, X.
2017-10-01
Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same situation arises in utilizing a cluster of Intel's many-integrated-core (MIC) processors or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement of general computation, Field Programmable Gate Arrays (FPGAs) may be a better solution when their computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
Performance predictors of brain-computer interfaces in patients with amyotrophic lateral sclerosis
NASA Astrophysics Data System (ADS)
Geronimo, A.; Simmons, Z.; Schiff, S. J.
2016-04-01
Objective. Patients with amyotrophic lateral sclerosis (ALS) may benefit from brain-computer interfaces (BCI), but the utility of such devices likely will have to account for the functional, cognitive, and behavioral heterogeneity of this neurodegenerative disorder. Approach. In this study, a heterogeneous group of patients with ALS participated in a study on BCI based on the P300 event related potential and motor-imagery. Results. The presence of cognitive impairment in these patients significantly reduced the quality of the control signals required to use these communication systems, subsequently impairing performance, regardless of progression of physical symptoms. Loss in performance among the cognitively impaired was accompanied by a decrease in the signal-to-noise ratio of task-relevant EEG band power. There was also evidence that behavioral dysfunction negatively affects P300 speller performance. Finally, older participants achieved better performance on the P300 system than the motor-imagery system, indicating a preference of BCI paradigm with age. Significance. These findings highlight the importance of considering the heterogeneity of disease when designing BCI augmentative and alternative communication devices for clinical applications.
Efficient Use of Distributed Systems for Scientific Applications
NASA Technical Reports Server (NTRS)
Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques
2000-01-01
Distributed computing has been regarded as the future of high performance computing. Nationwide high speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency of up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results were obtained for four irregular meshes with the number of elements ranging from 11,451 (Barth4) to 30,269 (Barth5). Future work with PART entails using the tool with an integrated application requiring distributed systems. In particular, this application entails an integration of finite element and fluid dynamic simulations to address the cooling of turbine blades in a gas turbine engine design. It is not uncommon to encounter high-temperature, film-cooled turbine airfoils with millions of degrees of freedom, because the complexity of the various components of the airfoils requires fine-grain meshing for accuracy.
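A toy simulated-annealing partitioner in the spirit of PART is sketched below: assign weighted elements to processors of unequal speed so that the maximum normalized load is minimized. Real PART also accounts for network heterogeneity and communication patterns, which this sketch omits; the weights and speeds are invented.

```python
# Simulated-annealing partitioning sketch for heterogeneous processors.
import math
import random

random.seed(0)
weights = [random.uniform(1, 10) for _ in range(200)]    # element costs
speeds = [1.0, 1.0, 2.5, 4.0]                            # processor speeds

def cost(assign):
    load = [0.0] * len(speeds)
    for w, p in zip(weights, assign):
        load[p] += w
    return max(l / s for l, s in zip(load, speeds))      # bottleneck load

assign = [random.randrange(len(speeds)) for _ in weights]
cur, temp = cost(assign), 10.0
for _ in range(20000):
    i = random.randrange(len(weights))
    old = assign[i]
    assign[i] = random.randrange(len(speeds))            # propose a move
    new = cost(assign)
    # Accept improvements always, uphill moves with Boltzmann probability.
    if new < cur or random.random() < math.exp((cur - new) / temp):
        cur = new
    else:
        assign[i] = old                                  # revert the move
    temp *= 0.9995                                       # geometric cooling
print("best max normalized load:", round(cur, 2))
```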
Effective correlator for RadioAstron project
NASA Astrophysics Data System (ADS)
Sergeev, Sergey
This paper presents a software FX correlator for Very Long Baseline Interferometry, adapted for the RadioAstron project. The correlator is implemented for heterogeneous computing systems with graphics accelerators, and it is shown that the interferometry task maps onto graphics hardware with high efficiency. The host processor of the heterogeneous computing system forms the data streams for the graphics accelerators, whose number corresponds to the number of frequency channels; for the RadioAstron project there are seven such channels. Each accelerator computes the correlation matrix over all baselines for a single frequency channel. The input data are converted to floating-point format and corrected with the corresponding delay function, and the entire correlation matrix is computed simultaneously. The correlation matrix is calculated using the sliding Fourier transform. Because the problem matches the architecture of graphics accelerators well, a single Kepler-platform processor achieves performance on this task corresponding to that of a four-node Intel-platform computing cluster. The task scales successfully not only to a large number of graphics accelerators, but also to a large number of nodes with multiple accelerators.
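The correlation kernel described here can be sketched compactly. The toy below assumes real-valued station streams and omits the delay correction and data-flow management the abstract mentions; it updates each station's N-point spectrum sample by sample with a sliding DFT and accumulates the station-by-station correlation matrix for one frequency channel.

```python
import numpy as np

def sliding_dft_update(X, x_out, x_in):
    """One-sample update of an N-point DFT over a sliding window:
    X'_k = (X_k - x_out + x_in) * exp(2j*pi*k/N)."""
    N = len(X)
    k = np.arange(N)
    return (X - x_out + x_in) * np.exp(2j * np.pi * k / N)

def correlate(streams, N=64):
    """Accumulate the full station-by-station correlation matrix for one
    frequency channel, advancing all spectra one sample at a time.

    streams: (n_stations, n_samples) real-valued array."""
    nstat, nsamp = streams.shape
    X = np.array([np.fft.fft(streams[s, :N]) for s in range(nstat)])
    R = np.zeros((nstat, nstat, N), dtype=complex)
    for t in range(N, nsamp):
        for s in range(nstat):
            X[s] = sliding_dft_update(X[s], streams[s, t - N], streams[s, t])
        # outer product over stations for every spectral bin
        R += np.einsum('ik,jk->ijk', X, X.conj())
    return R / (nsamp - N)
```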
Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus
2016-05-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
Shibuta, Yasushi; Sakane, Shinji; Miyoshi, Eisuke; Okita, Shin; Takaki, Tomohiro; Ohno, Munekazu
2017-04-05
Can completely homogeneous nucleation occur? Large-scale molecular dynamics simulations performed on a graphics-processing-unit-rich supercomputer can shed light on this long-standing issue. Here, a billion-atom molecular dynamics simulation of homogeneous nucleation from an undercooled iron melt reveals that satellite-like small grains surrounding previously formed large grains exist in the middle of the nucleation process and are not distributed uniformly. At the same time, grains with a twin boundary are formed by heterogeneous nucleation from the surface of the previously formed grains. The local heterogeneity in the distribution of grains is caused by the local accumulation of the icosahedral structure in the undercooled melt near the previously formed grains. This insight is mainly attributable to multi-graphics-processing-unit parallel computation combined with the rapid progress in high-performance computational environments. Nucleation is a fundamental physical process; however, whether completely homogeneous nucleation can occur is a long-standing issue. Here the authors reveal, via a billion-atom molecular dynamics simulation, that local heterogeneity exists during homogeneous nucleation in an undercooled iron melt.
Dual compile strategy for parallel heterogeneous execution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Tyler Barratt; Perry, James Thomas
2012-06-01
The purpose of the Dual Compile Strategy is to increase our trust in the Compute Engine during its execution of instructions. This is accomplished by introducing a heterogeneous Monitor Engine that checks the execution of the Compute Engine. This leads to the production of a second and custom set of instructions designed for monitoring the execution of the Compute Engine at runtime. This use of multiple engines differs from redundancy in that one engine is working on the application while the other engine is monitoring and checking in parallel, instead of both applications (and engines) performing the same work at the same time.
ERIC Educational Resources Information Center
da Silveira, Pedro Rodrigo Castro
2014-01-01
This thesis describes the development and deployment of a cyberinfrastructure for distributed high-throughput computations of materials properties at high pressures and/or temperatures--the Virtual Laboratory for Earth and Planetary Materials--VLab. VLab was developed to leverage the aggregated computational power of grid systems to solve…
NASA Astrophysics Data System (ADS)
Niedermeier, Dennis; Ervens, Barbara; Clauss, Tina; Voigtländer, Jens; Wex, Heike; Hartmann, Susan; Stratmann, Frank
2014-01-01
In a recent study, the Soccer ball model (SBM) was introduced for modeling and/or parameterizing heterogeneous ice nucleation processes. The model applies classical nucleation theory. It allows for a consistent description of both apparently singular and stochastic ice nucleation behavior, by distributing contact angles over the nucleation sites of a particle population assuming a Gaussian probability density function. The original SBM utilizes the Monte Carlo technique, which hampers its usage in atmospheric models, as fairly time-consuming calculations must be performed to obtain statistically significant results. Thus, we have developed a simplified and computationally more efficient version of the SBM. We successfully used the new SBM to parameterize experimental nucleation data of, e.g., bacterial ice nucleation. Both SBMs give identical results; however, the new model is computationally less expensive as confirmed by cloud parcel simulations. Therefore, it is a suitable tool for describing heterogeneous ice nucleation processes in atmospheric models.
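The reduction from Monte Carlo sampling to a deterministic computation can be pictured as follows. If a particle carries independent nucleation sites whose contact angles follow a (truncated) Gaussian distribution, the probability that a droplet remains liquid is the per-site survival expectation raised to the number of sites, and that expectation is a one-dimensional integral evaluated by quadrature. This is a sketch of the idea only; the rate function j(theta) and all numerical values below are placeholders, not the authors' parameterization.

```python
import numpy as np

def frozen_fraction(t, n_sites, mu, sigma, j_of_theta, site_area, n_theta=2000):
    """Fraction of droplets frozen after time t.

    Each particle carries n_sites nucleation sites whose contact angles
    follow a Gaussian pdf truncated to [0, pi].  With independent sites,
    P(liquid) = ( E[ exp(-j(theta) * A * t) ] )**n_sites,
    and the expectation is evaluated by quadrature instead of Monte Carlo.
    """
    theta = np.linspace(1e-6, np.pi, n_theta)
    pdf = np.exp(-0.5 * ((theta - mu) / sigma) ** 2)
    pdf /= np.trapz(pdf, theta)                  # truncated-Gaussian norm
    survive = np.exp(-j_of_theta(theta) * site_area * t)
    p_site_liquid = np.trapz(pdf * survive, theta)
    return 1.0 - p_site_liquid ** n_sites

# Placeholder rate law: nucleation is fast at small contact angles.
# Magnitudes are chosen only so the demo returns a mid-range fraction.
j = lambda th: 1e15 * np.exp(-10.0 * (1 - np.cos(th)))
f = frozen_fraction(t=10.0, n_sites=100, mu=1.5, sigma=0.2,
                    j_of_theta=j, site_area=1e-12)
```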
H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs.
Ye, Weicai; Chen, Ying; Zhang, Yongdong; Xu, Yuesheng
2017-04-15
Sequence alignment is a fundamental problem in bioinformatics. BLAST is a routinely used tool for this purpose, with over 118,000 citations in the past two decades. As the size of bio-sequence databases grows exponentially, the computational speed of alignment software must be improved. We develop the heterogeneous BLAST (H-BLAST), a fast parallel search tool for heterogeneous computers that couple CPUs and GPUs, to accelerate BLASTX and BLASTP, basic tools of NCBI-BLAST. H-BLAST employs a locally decoupled seed-extension algorithm for better performance on GPUs, and offers a performance tuning mechanism for better efficiency among various CPU and GPU combinations. H-BLAST produces alignment results identical to NCBI-BLAST, and its computational speed is much faster than that of NCBI-BLAST. Speedups achieved by H-BLAST over sequential NCBI-BLASTP (resp. NCBI-BLASTX) range mostly from 4 to 10 (resp. 5 to 7.2). With 2 CPU threads and 2 GPUs, H-BLAST can be faster than 16-threaded NCBI-BLASTX. Furthermore, H-BLAST is 1.5-4 times faster than GPU-BLAST. Availability: https://github.com/Yeyke/H-BLAST.git. Contact: yux06@syr.edu. Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Mosby, Matthew; Matouš, Karel
2015-12-01
Three-dimensional simulations capable of resolving the large range of spatial scales, from the failure-zone thickness up to the size of the representative unit cell, in damage mechanics problems of particle reinforced adhesives are presented. We show that resolving this wide range of scales in complex three-dimensional heterogeneous morphologies is essential in order to capture fracture characteristics, such as strength, fracture toughness and the shape of the softening profile. Moreover, we show that computations that resolve essential physical length scales capture the particle size-effect in fracture toughness, for example. In the vein of image-based computational materials science, we construct statistically optimal unit cells containing hundreds to thousands of particles. We show that these statistically representative unit cells are capable of capturing the first- and second-order probability functions of a given data-source with better accuracy than traditional inclusion packing techniques. In order to accomplish these large computations, we use a parallel multiscale cohesive formulation and extend it to finite strains including damage mechanics. The high-performance parallel computational framework is executed on up to 1024 processing cores. A mesh convergence and a representative unit cell study are performed. Quantifying the complex damage patterns in simulations consisting of tens of millions of computational cells and millions of highly nonlinear equations requires data-mining the parallel simulations, and we propose two damage metrics to quantify the damage patterns. A detailed study of volume fraction and filler size on the macroscopic traction-separation response of heterogeneous adhesives is presented.
Modeling Endovascular Coils as Heterogeneous Porous Media
NASA Astrophysics Data System (ADS)
Yadollahi Farsani, H.; Herrmann, M.; Chong, B.; Frakes, D.
2016-12-01
Minimally invasive surgeries are the state-of-the-art treatments for many pathologies. Treating brain aneurysms is no exception; invasive neurovascular clipping is no longer the only option, and endovascular coiling has become the most common treatment. Coiling isolates the aneurysm from blood circulation by promoting thrombosis within the aneurysm. One approach to studying intra-aneurysmal hemodynamics consists of virtually deploying finite element coil models and then performing computational fluid dynamics. However, this approach is often computationally expensive and requires extensive resources. The porous medium approach has been considered as an alternative to conventional coil modeling because it lessens the complexity of computational fluid dynamics simulations by reducing the number of mesh elements needed to discretize the domain. There have been a limited number of attempts at treating endovascular coils as homogeneous porous media. However, the heterogeneity associated with coil configurations requires a more accurately defined porous medium in which the porosity and permeability change throughout the domain. We implemented this approach by introducing a lattice of sample volumes and utilizing techniques available in the field of interactive computer graphics. We observed that the introduction of the heterogeneity assumption was associated with significant changes in simulated aneurysmal flow velocities as compared to the homogeneous case. Moreover, as the sample volume size was decreased, the flow velocities approached an asymptotic value, showing the importance of the sample volume size selection. These results demonstrate that the homogeneous assumption for porous media that are inherently heterogeneous can lead to considerable errors. Additionally, this modeling approach allowed us to simulate post-treatment flows without considering the explicit geometry of a deployed endovascular coil mass, greatly simplifying computation.
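The sample-volume idea can be illustrated with a brute-force sketch: estimate each lattice cell's porosity as the fraction of random points in the cell lying farther than one wire radius from the coil centerline. The nearest-point test below stands in for the interactive-computer-graphics techniques the authors used, and a spatial index would replace it in practice; per-cell permeability could then be assigned from porosity via, e.g., a Kozeny-Carman-type relation (an assumed choice, not stated in the abstract).

```python
import numpy as np

def local_porosity(coil_pts, wire_radius, bounds, n_cells=16, n_samples=500):
    """Estimate porosity on a lattice of sample volumes.

    coil_pts: (M, 3) points densely sampled along the coil centerline.
    A cell's solid fraction is approximated by the fraction of random
    sample points lying within wire_radius of the centerline (brute
    force here; a KD-tree would be used for real coil geometries).
    """
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    edges = [np.linspace(lo[d], hi[d], n_cells + 1) for d in range(3)]
    porosity = np.ones((n_cells,) * 3)
    for idx in np.ndindex(porosity.shape):
        cell_lo = np.array([edges[d][i] for d, i in enumerate(idx)])
        cell_hi = np.array([edges[d][i + 1] for d, i in enumerate(idx)])
        pts = cell_lo + np.random.rand(n_samples, 3) * (cell_hi - cell_lo)
        # distance from each sample point to the nearest centerline point
        dist = np.min(np.linalg.norm(pts[:, None, :] - coil_pts[None, :, :],
                                     axis=2), axis=1)
        porosity[idx] = np.mean(dist > wire_radius)
    return porosity
```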
Graph Partitioning for Parallel Applications in Heterogeneous Grid Environments
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Kumar, Shailendra; Das, Sajal K.; Biegel, Bryan (Technical Monitor)
2002-01-01
The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied. However, these partitioning schemes fail when the target system architecture exhibits heterogeneity in resource characteristics. With the emergence of technologies such as the Grid, it is imperative to study the partitioning problem taking into consideration the differing capabilities of such distributed heterogeneous systems. In our model, the heterogeneous system consists of processors with varying processing power and an underlying non-uniform communication network. We present in this paper a novel multilevel partitioning scheme for irregular graphs and meshes that takes into account issues pertinent to Grid computing environments. Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application. For the experimental performance study, we have considered both a realistic mesh problem from NASA as well as synthetic workloads. Simulation results demonstrate that MiniMax generates high quality partitions for various classes of applications targeted for parallel execution in a distributed heterogeneous environment.
Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.
Hua, Guan-Jie; Hung, Che-Lun; Tang, Chuan Yi
2018-01-01
In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation with big data, such as ZINC, containing more than 60 million compounds, and GDB-13, with more than 930 million small molecules, suffers from a noticeable time-consumption problem. Therefore, we propose a novel heterogeneous high-performance computing method, named Hadoop-MCC, integrating Hadoop and GPU, to cope with big chemical structure data efficiently. Hadoop-MCC gains high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts a mapper/reducer computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on GPUs, and reducers then collect all comparison results produced by the mappers. Due to the high availability of Hadoop, all LINGO computational jobs on the mappers can be completed even if some of the mappers encounter problems. LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device. Hadoop-MCC achieves the scalability, high availability, and fault tolerance granted by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPU. It has been shown that a heterogeneous architecture such as Hadoop-MCC can effectively deliver better computational performance than a single GPU device.
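As a rough picture of the comparison kernel being distributed, the snippet below computes a LINGO-style similarity between two SMILES strings on the CPU: each molecule is reduced to a multiset of overlapping q-character substrings, and the similarity is the integral Tanimoto coefficient. The two-character-atom substitutions are a simplified stand-in for real SMILES preprocessing, and the GPU kernel and Hadoop plumbing are of course not shown.

```python
from collections import Counter

def lingos(smiles, q=4):
    """Multiset of overlapping q-character substrings of a SMILES string."""
    s = smiles.replace('Cl', 'L').replace('Br', 'R')  # simplified preprocessing
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def lingo_similarity(sm_a, sm_b, q=4):
    """Integral Tanimoto over LINGO multisets: sum(min) / sum(max)."""
    a, b = lingos(sm_a, q), lingos(sm_b, q)
    inter = sum(min(a[k], b[k]) for k in a if k in b)
    union = sum(a.values()) + sum(b.values()) - inter
    return inter / union if union else 0.0

# In Hadoop-MCC, mappers would stream batches of such comparisons to GPUs;
# here we just compare two molecules on the CPU.
print(lingo_similarity('CCOC(=O)CCN', 'CCOC(=O)CC'))
```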
Thermal Coefficient of Linear Expansion Modified by Dendritic Segregation in Nickel-Iron Alloys
NASA Astrophysics Data System (ADS)
Ogorodnikova, O. M.; Maksimova, E. V.
2018-05-01
The paper presents investigations of the thermal properties of Fe-Ni and Fe-Ni-Co casting alloys as affected by the heterogeneous distribution of their chemical elements. It is shown that nickel dendritic segregation has a negative effect on the properties of the studied invars. A mathematical model is proposed to explore the influence of nickel dendritic segregation on the thermal coefficient of linear expansion (TCLE) of the alloy. A computer simulation of the TCLE of Fe-Ni-Co superinvars is performed with regard to the heterogeneous distribution of their chemical elements over the whole volume. The ProLigSol computer software application was developed to process the data arrays and the results of the computer simulations.
CoreTSAR: Core Task-Size Adapting Runtime
Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry; ...
2014-10-27
Heterogeneity continues to increase at all levels of computing, with the rise of accelerators such as GPUs, FPGAs, and other co-processors into everything from desktops to supercomputers. As a consequence, efficiently managing such disparate resources has become increasingly complex. CoreTSAR seeks to reduce this complexity by adaptively worksharing parallel-loop regions across compute resources without requiring any transformation of the code within the loop. Our results show performance improvements of up to three-fold over a current state-of-the-art heterogeneous task scheduler, as well as linear performance scaling from a single GPU to four GPUs for many codes. In addition, CoreTSAR demonstrates a robust ability to adapt to both a variety of workloads and underlying system configurations.
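The adaptive worksharing idea can be sketched independently of any particular runtime: after each pass over a parallel-loop region, re-divide the iteration space in proportion to each device's measured throughput. This is a toy rebalancing rule under assumed inputs, not CoreTSAR's scheduler.

```python
def adaptive_shares(times, shares, floor=0.05):
    """Rebalance work shares from the last pass's measured times.

    A device that finished its share quickly receives a larger share
    next pass: new shares are proportional to measured throughput
    (share / time).  The floor keeps every device minimally engaged.
    """
    rates = [s / t for s, t in zip(shares, times)]
    total = sum(rates)
    new = [max(r / total, floor) for r in rates]
    norm = sum(new)
    return [x / norm for x in new]

# Pass 1: even split between one CPU and one GPU; the GPU finished 3x faster.
shares = [0.5, 0.5]
times = [3.0, 1.0]
shares = adaptive_shares(times, shares)   # -> [0.25, 0.75]
```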
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks
Kim, Deokho; Park, Karam; Ro, Won W.
2011-01-01
While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores are expected to be used as processing nodes of wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for heterogeneous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors, the Cell Broadband Engine. PMID:22164053
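The decoding computation whose cost motivates this work is Gaussian elimination over a finite field. The toy below works in GF(2) with packets stored as integers (practical random linear network coding typically uses GF(2^8)); the elimination in decode is the hot loop a heterogeneous multi-core implementation would parallelize.

```python
import random

def encode(packets, n_coded):
    """Each coded packet is the XOR of a random nonempty subset of the k
    source packets, tagged with its GF(2) coefficient vector."""
    k = len(packets)
    coded = []
    for _ in range(n_coded):
        coeffs = [random.randint(0, 1) for _ in range(k)]
        if not any(coeffs):
            coeffs[random.randrange(k)] = 1
        payload = 0
        for c, p in zip(coeffs, packets):
            if c:
                payload ^= p
        coded.append((coeffs, payload))
    return coded

def decode(coded, k):
    """Recover the k source packets by Gauss-Jordan elimination over GF(2)."""
    basis = []  # independent rows, each reduced against earlier rows
    for coeffs, payload in coded:
        c, p = coeffs[:], payload
        for bc, bp in basis:
            if c[bc.index(1)]:
                c = [x ^ y for x, y in zip(c, bc)]
                p ^= bp
        if any(c):
            basis.append((c, p))
    if len(basis) < k:
        return None  # not enough innovative packets received yet
    basis.sort(key=lambda row: row[0].index(1))
    for i in range(k - 1, -1, -1):           # back-substitution
        ci, pi = basis[i]
        for j in range(i):
            if basis[j][0][i]:
                basis[j] = ([x ^ y for x, y in zip(basis[j][0], ci)],
                            basis[j][1] ^ pi)
    return [p for _, p in basis]

src = [0x11, 0x22, 0x33, 0x44]
out = decode(encode(src, 12), 4)   # extra coded packets for redundancy
print(out)  # [17, 34, 51, 68], or None if the random draws were degenerate
```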
Good coupling for the multiscale patch scheme on systems with microscale heterogeneity
NASA Astrophysics Data System (ADS)
Bunder, J. E.; Roberts, A. J.; Kevrekidis, I. G.
2017-05-01
Computational simulation of microscale detailed systems is frequently only feasible over spatial domains much smaller than the macroscale of interest. The 'equation-free' methodology couples many small patches of microscale computations across space to empower efficient computational simulation over macroscale domains of interest. Motivated by molecular or agent simulations, we analyse the performance of various coupling schemes for patches when the microscale is inherently 'rough'. As a canonical problem in this universality class, we systematically analyse the case of heterogeneous diffusion on a lattice. Computer algebra explores how the dynamics of coupled patches predict the large scale emergent macroscale dynamics of the computational scheme. We determine good design for the coupling of patches by comparing the macroscale predictions from patch dynamics with the emergent macroscale on the entire domain, thus minimising the computational error of the multiscale modelling. The minimal error on the macroscale is obtained when the coupling utilises averaging regions which are between a third and a half of the patch. Moreover, when the symmetry of the inter-patch coupling matches that of the underlying microscale structure, patch dynamics predicts the desired macroscale dynamics to any specified order of error. The results confirm that the patch scheme is useful for macroscale computational simulation of a range of systems with microscale heterogeneity.
Constructing Scientific Applications from Heterogeneous Resources
NASA Technical Reports Server (NTRS)
Schlichting, Richard D.
1995-01-01
A new model for high-performance scientific applications in which such applications are implemented as heterogeneous distributed programs or, equivalently, meta-computations, is investigated. The specific focus of this grant was a collaborative effort with researchers at NASA and the University of Toledo to test and improve Schooner, a software interconnection system, and to explore the benefits of increased user interaction with existing scientific applications.
Robust mechanobiological behavior emerges in heterogeneous myosin systems.
Egan, Paul F; Moore, Jeffrey R; Ehrlicher, Allen J; Weitz, David A; Schunn, Christian; Cagan, Jonathan; LeDuc, Philip
2017-09-26
Biological complexity presents challenges for understanding natural phenomena and engineering new technologies, particularly in systems with molecular heterogeneity. Such complexity is present in myosin motor protein systems, and computational modeling is essential for determining how collective myosin interactions produce emergent system behavior. We develop a computational approach for altering myosin isoform parameters and their collective organization, and support predictions with in vitro experiments of motility assays with α-actinins as molecular force sensors. The computational approach models variations in single myosin molecular structure, system organization, and force stimuli to predict system behavior for filament velocity, energy consumption, and robustness. Robustness is the range of forces over which a filament is expected to have continuous velocity, and it depends on the energy used by the myosin system. Myosin systems are shown to have highly nonlinear behavior across force conditions that may be exploited at a systems level by combining slow and fast myosin isoforms heterogeneously. Results suggest some heterogeneous systems have lower energy use near stall conditions and greater energy consumption when unloaded, therefore promoting robustness. These heterogeneous system capabilities are unique in comparison with homogeneous systems and potentially advantageous for high-performance bionanotechnologies. Findings open doors at the intersections of mechanics and biology, particularly for understanding and treating myosin-related diseases and developing approaches for motor-molecule-based technologies.
NASA Technical Reports Server (NTRS)
Crawford, D. A.; Barnouin-Jha, O. S.; Cintala, M. J.
2003-01-01
The propagation of shock waves through target materials is strongly influenced by the presence of small-scale structure: fractures and physical and chemical heterogeneities. Pre-existing fractures often create craters that appear square in outline (e.g. Meteor Crater). Reverberations behind the shock front arising from physical heterogeneity have been proposed as a mechanism for transient weakening of target materials. Pre-existing fractures can also affect melt generation. In this study, we are attempting to bridge the gap in numerical modeling between the micro-scale and the continuum, the so-called meso-scale. To accomplish this, we are developing a methodology for the shock physics hydrocode CTH that uses Monte-Carlo-type methods to investigate the shock properties of heterogeneous materials. By comparing the results of numerical experiments at the micro-scale with experimental results, and by using statistical techniques to evaluate the performance of simple constitutive models, we hope to embed the effect of physical heterogeneity into the field variables (pressure, stress, density, velocity), allowing us to directly imprint the effects of micro-scale heterogeneity at the continuum level without incurring high computational cost.
Job Superscheduler Architecture and Performance in Computational Grid Environments
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Oliker, Leonid; Biswas, Rupak
2003-01-01
Computational grids hold great promise in utilizing geographically separated heterogeneous resources to solve large-scale complex scientific problems. However, a number of major technical hurdles, including distributed resource management and effective job scheduling, stand in the way of realizing these gains. In this paper, we propose a novel grid superscheduler architecture and three distributed job migration algorithms. We also model the critical interaction between the superscheduler and autonomous local schedulers. Extensive performance comparisons with ideal, central, and local schemes using real workloads from leading computational centers are conducted in a simulation environment. Additionally, synthetic workloads are used to perform a detailed sensitivity analysis of our superscheduler. Several key metrics demonstrate that substantial performance gains can be achieved via smart superscheduling in distributed computational grids.
Spiking network simulation code for petascale computers.
Kunkel, Susanne; Schmidt, Maximilian; Eppler, Jochen M; Plesser, Hans E; Masumoto, Gen; Igarashi, Jun; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus; Helias, Moritz
2014-01-01
Brain-scale networks exhibit a breathtaking heterogeneity in the dynamical properties and parameters of their constituents. At cellular resolution, the entities of theory are neurons and synapses and over the past decade researchers have learned to manage the heterogeneity of neurons and synapses with efficient data structures. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petaflop computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise for neuronal network simulation software: Each neuron contacts on the order of 10,000 other neurons and thus has targets only on a fraction of all compute nodes; furthermore, for any given source neuron, at most a single synapse is typically created on any compute node. From the viewpoint of an individual compute node, the heterogeneity in the synaptic target lists thus collapses along two dimensions: the dimension of the types of synapses and the dimension of the number of synapses of a given type. Here we present a data structure taking advantage of this double collapse using metaprogramming techniques. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers. We show that the novel architecture scales to the largest petascale supercomputers available today.
A heterogeneous computing environment for simulating astrophysical fluid flows
NASA Technical Reports Server (NTRS)
Cazes, J.
1994-01-01
In the Concurrent Computing Laboratory in the Department of Physics and Astronomy at Louisiana State University we have constructed a heterogeneous computing environment that permits us to routinely simulate complicated three-dimensional fluid flows and to readily visualize the results of each simulation via three-dimensional animation sequences. An 8192-node MasPar MP-1 computer with 0.5 GBytes of RAM provides 250 MFlops of execution speed for our fluid flow simulations. Utilizing the parallel virtual machine (PVM) language, at periodic intervals data is automatically transferred from the MP-1 to a cluster of workstations where individual three-dimensional images are rendered for inclusion in a single animation sequence. Work is underway to replace executions on the MP-1 with simulations performed on the 512-node CM-5 at NCSA and to simultaneously gain access to more potent volume rendering workstations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Y M; Bush, K; Han, B
Purpose: Accurate and fast dose calculation is a prerequisite of precision radiation therapy in modern photon and particle therapy. While Monte Carlo (MC) dose calculation provides high dosimetric accuracy, the drastically increased computational time hinders its routine use. Deterministic dose calculation methods are fast, but problematic in the presence of tissue density inhomogeneity. We leverage the useful features of deterministic methods and MC to develop a hybrid dose calculation platform with autonomous utilization of MC and deterministic calculation depending on the local geometry, for optimal accuracy and speed. Methods: Our platform utilizes a Geant4-based "localized Monte Carlo" (LMC) method that isolates MC dose calculations only to volumes that have potential for dosimetric inaccuracy. In our approach, additional structures are created encompassing heterogeneous volumes. Deterministic methods calculate dose and energy fluence up to the volume surfaces, where the energy fluence distribution is sampled into discrete histories and transported using MC. Histories exiting the volume are converted back into energy fluence and transported deterministically. By matching boundary conditions at both interfaces, the deterministic dose calculation accounts for dose perturbations "downstream" of localized heterogeneities. Hybrid dose calculation was performed for water and anthropomorphic phantoms. Results: We achieved <1% agreement between deterministic and MC calculations in the water benchmark for photon and proton beams, and dose differences of 2%–15% could be observed in heterogeneous phantoms. The saving in computational time (a factor of ∼4–7 compared to a full Monte Carlo dose calculation) was found to be approximately proportional to the volume of the heterogeneous region. Conclusion: Our hybrid dose calculation approach takes advantage of the computational efficiency of deterministic methods and the accuracy of MC, providing a practical tool for high-performance dose calculation in modern RT. The approach is generalizable to all modalities where heterogeneities play a large role, notably particle therapy.
NASA Astrophysics Data System (ADS)
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, computational efficiency remains the main obstacle to producing 3D, high-resolution images from real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double-buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos
2011-01-01
General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.
Advanced nodal neutron diffusion method with space-dependent cross sections: ILLICO-VX
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajic, H.L.; Ougouag, A.M.
1987-01-01
Advanced transverse integrated nodal methods for neutron diffusion developed since the 1970s require that node- or assembly-homogenized cross sections be known. The underlying structural heterogeneity can be accurately accounted for in homogenization procedures by the use of heterogeneity or discontinuity factors. Other (milder) types of heterogeneity, burnup-induced or due to thermal-hydraulic feedback, can be resolved by explicitly accounting for the spatial variations of material properties. This can be done during the nodal computations via nonlinear iterations. The new method has been implemented in the code ILLICO-VX (ILLICO variable cross-section method). Numerous numerical tests were performed. As expected, the convergence rate of ILLICO-VX is lower than that of ILLICO, requiring approximately 30% more outer iterations per k_eff computation. The methodology has also been implemented as the NOMAD-VX option of the NOMAD multicycle, multigroup, two- and three-dimensional nodal diffusion depletion code. The burnup-induced heterogeneities (space dependence of cross sections) are calculated during the burnup steps.
Modeling of heterogeneous elastic materials by the multiscale hp-adaptive finite element method
NASA Astrophysics Data System (ADS)
Klimczak, Marek; Cecot, Witold
2018-01-01
We present an enhancement of the multiscale finite element method (MsFEM) obtained by combining it with the hp-adaptive FEM. Such a discretization-based homogenization technique is a versatile tool for modeling heterogeneous materials with rapidly oscillating elasticity coefficients. No assumption on the periodicity of the domain is required. In order to avoid direct, so-called overkill mesh computations, a coarse mesh with effective stiffness matrices is used, and special shape functions are constructed to account for the local heterogeneities at the micro resolution. The automatic adaptivity (hp-type at the macro resolution and h-type at the micro resolution) increases the efficiency of the computation. In this paper, details of the modified MsFEM are presented, and a numerical test performed on a Fichera corner domain validates the proposed approach.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is that third pillar of GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable, high-performance computing for big data analytics has become urgent, because many research activities are constrained by software or tools that cannot complete the computation process at all. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the graphics processing unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, while the potential of such advanced infrastructure remains unexplored in this domain. This presentation summarizes our prior and on-going initiatives to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
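One way to picture performance-aware co-scheduling, as a toy heuristic rather than the paper's scheduler, is to order tasks by estimated GPU speedup and send each to whichever device, CPU core or GPU, would finish it soonest given current queues. The per-task time estimates are assumed inputs.

```python
import heapq

def co_schedule(tasks, n_cpu=4, n_gpu=2):
    """Greedy co-scheduling sketch: process tasks in order of decreasing
    GPU speedup, placing each on whichever device (earliest-free CPU core
    or GPU) would finish it soonest.

    tasks: list of (cpu_time, gpu_time) estimates.  Returns the makespan.
    """
    cpus = [0.0] * n_cpu    # earliest-free times per CPU core
    gpus = [0.0] * n_gpu    # earliest-free times per GPU
    heapq.heapify(cpus)
    heapq.heapify(gpus)
    for cpu_t, gpu_t in sorted(tasks, key=lambda t: t[0] / t[1], reverse=True):
        cpu_done = cpus[0] + cpu_t
        gpu_done = gpus[0] + gpu_t
        if gpu_done <= cpu_done:
            heapq.heapreplace(gpus, gpu_done)
        else:
            heapq.heapreplace(cpus, cpu_done)
    return max(max(cpus), max(gpus))

# Feature-computation tasks: some are 10x faster on GPU, some barely 1.2x.
tasks = [(10.0, 1.0)] * 8 + [(3.0, 2.5)] * 8
print(co_schedule(tasks))
```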
Jambusaria, Ankit; Klomp, Jeff; Hong, Zhigang; Rafii, Shahin; Dai, Yang; Malik, Asrar B; Rehman, Jalees
2018-06-07
The heterogeneity of cells across tissue types represents a major challenge for studying biological mechanisms as well as for the therapeutic targeting of distinct tissues. Computational prediction of tissue-specific gene regulatory networks may provide important insights into the mechanisms underlying the cellular heterogeneity of cells in distinct organs and tissues. Using three pathway analysis techniques, gene set enrichment analysis (GSEA) and parametric analysis of gene set enrichment (PGSEA), alongside our novel model (HeteroPath), which assesses heterogeneously upregulated and downregulated genes within the context of pathways, we generated distinct tissue-specific gene regulatory networks. We analyzed gene expression data derived from freshly isolated heart, brain, and lung endothelial cells and from populations of neurons in the hippocampus, cingulate cortex, and amygdala. In both datasets, we found that HeteroPath segregated the distinct cellular populations by identifying regulatory pathways that were not identified by GSEA or PGSEA. Using simulated datasets, HeteroPath demonstrated robustness comparable to that of existing gene set enrichment methods. Furthermore, we generated tissue-specific gene regulatory networks involved in vascular heterogeneity and neuronal heterogeneity by performing motif enrichment of the heterogeneous genes identified by HeteroPath and linking the enriched motifs to regulatory transcription factors in the ENCODE database. HeteroPath assesses contextual bidirectional gene expression within pathways and thus allows for transcriptomic assessment of cellular heterogeneity. Unraveling the tissue-specific heterogeneity of gene expression can lead to a better understanding of the molecular underpinnings of tissue-specific phenotypes.
Faithful qubit transmission in a quantum communication network with heterogeneous channels
NASA Astrophysics Data System (ADS)
Chen, Na; Zhang, Lin Xi; Pei, Chang Xing
2018-04-01
Quantum communication networks enable long-distance qubit transmission and distributed quantum computation. In this paper, a quantum communication network with heterogeneous quantum channels is constructed, and a faithful qubit transmission scheme is presented. Detailed calculations and performance analyses show that, even in a low-quality quantum channel with serious decoherence, only a modest number of locally prepared target qubits is required to achieve near-deterministic qubit transmission.
Travetti, Olga; di Giancamillo, Mauro; Stefanello, Damiano; Ferrari, Roberta; Giudice, Chiara; Grieco, Valeria; Saunders, Jimmy H
2013-06-01
Feline injection-site sarcoma (FISS) may be a consequence of subcutaneous injection. In the present study, the medical records and the computed tomography (CT) features of 22 cats with a FISS of the histopathological subtype fibrosarcoma were used. The largest proportion of the fibrosarcomas (45%) was located in the interscapular region. All fibrosarcomas, except one with mild enhancement, showed strong contrast uptake, characterised as ring (42%), heterogeneous (36%), homogeneous (9%), heterogeneous/ring (6.5%) or mixed heterogeneous/homogeneous enhancement (6.5%). The longest axis of the mass ran in a cranio-caudal (68%) or dorso-ventral (32%) direction. The median volume calculated on CT was 7.57 cm^3. Common features were marked local invasiveness of the musculature and heterogeneity of the tissue in the periphery of the neoplasia. When the fibrosarcoma was interscapular, performing an additional post-contrast scan with the forelimbs positioned caudally along the body, in addition to the standard protocol with the forelimbs extended cranially, allowed better evaluation of the actual relationship between the tumour and the surrounding tissues. The mean number of muscles involved with the tumour was 2.09 with extended and 1.95 with flexed forelimbs. When fewer structures were considered infiltrated on the basis of the double positioning, a less invasive surgical approach to the underlying muscles and scapula was performed.
A Heterogeneous High-Performance System for Computational and Computer Science
2016-11-15
…team of research faculty from the departments of computer science and natural science at Bowie State University. The supercomputer is not only to…accelerated HPC systems. The supercomputer is also ideal for the research conducted in the Department of Natural Science, as research faculty work on
Approximate Bayesian computation for spatial SEIR(S) epidemic models.
Brown, Grant D; Porter, Aaron T; Oleson, Jacob J; Hinman, Jessica A
2018-02-01
Approximate Bayesian Computation (ABC) provides an attractive approach to estimation in complex Bayesian inferential problems for which evaluation of the kernel of the posterior distribution is impossible or computationally expensive. These highly parallelizable techniques have been successfully applied to many fields, particularly in cases where more traditional approaches such as Markov chain Monte Carlo (MCMC) are impractical. In this work, we demonstrate the application of approximate Bayesian inference to spatially heterogeneous Susceptible-Exposed-Infectious-Removed (SEIR) stochastic epidemic models. These models have a tractable posterior distribution; however, MCMC techniques nevertheless become computationally infeasible for moderately sized problems. We discuss the practical implementation of these techniques via the open source ABSEIR package for R. The performance of ABC relative to traditional MCMC methods in a small problem is explored under simulation, as well as in the spatially heterogeneous context of the 2014 epidemic of Chikungunya in the Americas.
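The flavor of the method can be conveyed with a non-spatial toy; ABSEIR itself handles spatially heterogeneous models, and the priors, tolerance, and parameter values below are purely illustrative. Draw parameters from the prior, simulate a stochastic SEIR epidemic, and keep the draws whose simulated daily case counts fall within a tolerance of the observed series.

```python
import numpy as np

def seir_sim(beta, sigma, gamma, n=1000, i0=5, days=60, rng=None):
    """Discrete-time stochastic SEIR; returns daily new-infectious counts."""
    rng = rng or np.random.default_rng()
    s, e, i, r = n - i0, 0, i0, 0
    new_cases = []
    for _ in range(days):
        se = rng.binomial(s, 1 - np.exp(-beta * i / n))   # S -> E
        ei = rng.binomial(e, 1 - np.exp(-sigma))          # E -> I
        ir = rng.binomial(i, 1 - np.exp(-gamma))          # I -> R
        s, e, i, r = s - se, e + se - ei, i + ei - ir, r + ir
        new_cases.append(ei)
    return np.array(new_cases)

def abc_rejection(observed, n_draws=5000, eps=150.0, rng=None):
    """ABC rejection: keep prior draws whose simulated epidemic lies
    within eps of the data (Euclidean distance on daily case counts)."""
    rng = rng or np.random.default_rng(0)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.1, 1.0)     # illustrative flat priors
        sigma = rng.uniform(0.1, 1.0)
        gamma = rng.uniform(0.05, 0.5)
        sim = seir_sim(beta, sigma, gamma, rng=rng)
        if np.linalg.norm(sim - observed) < eps:
            accepted.append((beta, sigma, gamma))
    return np.array(accepted)

obs = seir_sim(0.5, 0.3, 0.2, rng=np.random.default_rng(42))
post = abc_rejection(obs, n_draws=2000)
print(len(post), post.mean(axis=0) if len(post) else "no acceptances")
```

Each prior draw is independent, which is what makes the technique "highly parallelizable" in the sense the abstract describes.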
Dynamic Transfers Of Tasks Among Computers
NASA Technical Reports Server (NTRS)
Liu, Howard T.; Silvester, John A.
1989-01-01
Allocation scheme gives jobs to idle computers. An ideal resource-sharing algorithm should be dynamic, decentralized, and heterogeneous. The proposed enhanced receiver-initiated dynamic algorithm (ERIDA) for resource sharing fulfills all of these criteria. It provides a method for balancing the workload among hosts, resulting in improvements in the response time and throughput performance of the total system, and adjusts dynamically to the traffic load of each station.
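Generic receiver-initiated load sharing, of which ERIDA is an enhanced variant, can be sketched as follows; the threshold, probe count, and class structure are illustrative assumptions, not details from the summary above.

```python
import random
from collections import deque

class Host:
    """Sketch of receiver-initiated load sharing: idle hosts pull work."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()   # pending jobs

    def try_pull(self, peers, threshold=1, probes=3):
        """An under-loaded host probes a few random peers (excluding
        itself) and pulls one waiting job from the busiest of them."""
        if len(self.queue) > threshold:
            return None        # busy enough; stay passive
        candidates = random.sample(peers, min(probes, len(peers)))
        donor = max(candidates, key=lambda h: len(h.queue))
        if len(donor.queue) > threshold:
            job = donor.queue.pop()   # migrate the newest waiting job
            self.queue.append(job)
            return job
        return None
```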
Finite-fault source inversion using adjoint methods in 3D heterogeneous media
NASA Astrophysics Data System (ADS)
Somala, Surendra Nadh; Ampuero, Jean-Paul; Lapusta, Nadia
2018-04-01
Accounting for lateral heterogeneities in the 3D velocity structure of the crust is known to improve earthquake source inversion, compared to results based on 1D velocity models which are routinely assumed to derive finite-fault slip models. The conventional approach to include known 3D heterogeneity in source inversion involves pre-computing 3D Green's functions, which requires a number of 3D wave propagation simulations proportional to the number of stations or to the number of fault cells. The computational cost of such an approach is prohibitive for the dense datasets that could be provided by future earthquake observation systems. Here, we propose an adjoint-based optimization technique to invert for the spatio-temporal evolution of slip velocity. The approach does not require pre-computed Green's functions. The adjoint method provides the gradient of the cost function, which is used to improve the model iteratively employing an iterative gradient-based minimization method. The adjoint approach is shown to be computationally more efficient than the conventional approach based on pre-computed Green's functions in a broad range of situations. We consider data up to 1 Hz from a Haskell source scenario (a steady pulse-like rupture) on a vertical strike-slip fault embedded in an elastic 3D heterogeneous velocity model. The velocity model comprises a uniform background and a 3D stochastic perturbation with the von Karman correlation function. Source inversions based on the 3D velocity model are performed for two different station configurations, a dense and a sparse network with 1 km and 20 km station spacing, respectively. These reference inversions show that our inversion scheme adequately retrieves the rise time when the velocity model is exactly known, and illustrates how dense coverage improves the inference of peak slip velocities. We investigate the effects of uncertainties in the velocity model by performing source inversions based on an incorrect, homogeneous velocity model. We find that, for velocity uncertainties that have standard deviation and correlation length typical of available 3D crustal models, the inverted sources can be severely contaminated by spurious features even if the station density is high. When data from thousand or more receivers is used in source inversions in 3D heterogeneous media, the computational cost of the method proposed in this work is at least two orders of magnitude lower than source inversion based on pre-computed Green's functions.
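Stripped of the wave physics, the computational pattern is ordinary adjoint-based minimization: each iteration costs one forward simulation and one adjoint simulation, and no Green's functions are precomputed or stored. The linear-algebra toy below, with a matrix G standing in for the wave-propagation operator, shows the skeleton.

```python
import numpy as np

def invert(G, d, n_iter=200, step=None):
    """Gradient-based inversion of J(m) = 0.5 * ||G m - d||^2.

    Each iteration uses one 'forward' product (G m) and one 'adjoint'
    product (G^T residual); in the seismic setting each product is a
    wave-propagation simulation.
    """
    m = np.zeros(G.shape[1])
    # step below 1/||G||^2 guarantees convergence for this quadratic
    step = step or 1.0 / np.linalg.norm(G, 2) ** 2
    for _ in range(n_iter):
        residual = G @ m - d        # forward simulation
        grad = G.T @ residual       # adjoint simulation
        m -= step * grad
    return m
```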
HGIMDA: Heterogeneous graph inference for miRNA-disease association prediction.
Chen, Xing; Yan, Chenggang Clarence; Zhang, Xu; You, Zhu-Hong; Huang, Yu-An; Yan, Gui-Ying
2016-10-04
Recently, microRNAs (miRNAs) have drawn more and more attention because accumulating experimental studies have indicated that miRNAs can play critical roles in multiple biological processes as well as the development and progression of human complex diseases. Using the huge number of known heterogeneous biological datasets to predict potential associations between miRNAs and diseases is an important topic in the field of biology, medicine, and bioinformatics. In this study, considering the limitations in the previous computational methods, we developed the computational model of Heterogeneous Graph Inference for MiRNA-Disease Association prediction (HGIMDA) to uncover potential miRNA-disease associations by integrating miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity, and experimentally verified miRNA-disease associations into a heterogeneous graph. HGIMDA obtained AUCs of 0.8781 and 0.8077 based on global and local leave-one-out cross validation, respectively. Furthermore, HGIMDA was applied to three important human cancers for performance evaluation. As a result, 90% (Colon Neoplasms), 88% (Esophageal Neoplasms) and 88% (Kidney Neoplasms) of the top 50 predicted miRNAs are confirmed by recent experimental reports. Furthermore, HGIMDA can be effectively applied to new diseases and new miRNAs without any known associations, which overcomes an important limitation of many previous computational models.
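A schematic of the kind of iterative inference on a heterogeneous graph that such a model performs, assuming a generic propagation update in which association scores flow through the miRNA-similarity and disease-similarity sides of the graph; the update rule, the parameter alpha, and the normalization are illustrative and not necessarily the exact HGIMDA recursion.

    import numpy as np

    def heterogeneous_graph_inference(S_m, S_d, A0, alpha=0.4, tol=1e-6, max_iter=200):
        # Propagate known miRNA-disease associations A0 over a heterogeneous
        # graph built from miRNA similarity S_m and disease similarity S_d.
        # Row-normalize the similarities so the fixed-point iteration converges.
        S_m = S_m / S_m.sum(axis=1, keepdims=True)
        S_d = S_d / S_d.sum(axis=1, keepdims=True)
        A = A0.copy()
        for _ in range(max_iter):
            A_next = alpha * S_m @ A @ S_d.T + (1 - alpha) * A0
            if np.abs(A_next - A).max() < tol:
                break
            A = A_next
        return A  # scores for candidate miRNA-disease pairs

    rng = np.random.default_rng(1)
    S_m, S_d = rng.random((30, 30)), rng.random((20, 20))
    A0 = (rng.random((30, 20)) < 0.05).astype(float)  # sparse known associations
    scores = heterogeneous_graph_inference(S_m, S_d, A0)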
Comparison of statistical tests for association between rare variants and binary traits.
Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C
2012-01-01
Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to binary traits in a way that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that (i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and (ii) SKAT and EREC have good performance under all tested scenarios, but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy of (i) selecting promising genes with faster methods and (ii) analyzing the selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of trait variability, and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.
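A hedged sketch of the two-step strategy suggested above: a cheap burden-style statistic screens all genes, and the expensive test (SKAT/EREC or the proposed trend and heterogeneity test) is run only on the shortlist. The scoring functions and names here are illustrative placeholders, not the authors' implementations.

    import numpy as np

    def fast_burden_score(genotypes, phenotype):
        # Cheap screening statistic: absolute correlation between each
        # subject's rare minor-allele count in the gene and the binary trait.
        burden = genotypes.sum(axis=1)
        b = burden - burden.mean()
        p = phenotype - phenotype.mean()
        return abs(b @ p) / (np.linalg.norm(b) * np.linalg.norm(p) + 1e-12)

    def two_step_screen(gene_data, phenotype, expensive_test, top_k=10):
        # Step 1: rank all genes with the fast statistic.
        scores = {g: fast_burden_score(X, phenotype) for g, X in gene_data.items()}
        shortlist = sorted(scores, key=scores.get, reverse=True)[:top_k]
        # Step 2: spend the costly test (e.g., SKAT/EREC) on the top hits only.
        return {g: expensive_test(gene_data[g], phenotype) for g in shortlist}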
An incremental database access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, Nicholas; Sellis, Timos
1994-01-01
We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods for heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectivities, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values.
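A small sketch of query-feedback-driven selectivity estimation in the spirit described above: observed selectivities reported by executed queries are regressed onto predicate values with least squares, so the optimizer's estimates track actual data rather than static off-line statistics. The class, the polynomial model, and the defaults are illustrative assumptions.

    import numpy as np

    class FeedbackSelectivityEstimator:
        def __init__(self, degree=3):
            self.degree = degree
            self.values, self.observed = [], []
            self.coeffs = None

        def record_feedback(self, predicate_value, observed_selectivity):
            # Each executed query reports the selectivity it actually saw.
            self.values.append(predicate_value)
            self.observed.append(observed_selectivity)
            if len(self.values) > self.degree:
                # Least-squares fit of the selectivity curve to the feedback.
                self.coeffs = np.polyfit(self.values, self.observed, self.degree)

        def estimate(self, predicate_value, default=0.1):
            if self.coeffs is None:
                return default  # no feedback yet: fall back to a static guess
            return float(np.clip(np.polyval(self.coeffs, predicate_value), 0.0, 1.0))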
Hussein, Esam M A; Agbogun, H M D; Al, Tom A
2015-03-01
A method is presented for interpreting the values of x-ray attenuation coefficients reconstructed in computed tomography of porous media, while overcoming the ambiguity caused by the multichromatic nature of x-rays, dilution by void, and material heterogeneity. The method enables determination of porosity without relying on calibration or image segmentation or thresholding to discriminate pores from solid material. It distinguishes between solution-accessible and inaccessible pores, and provides the spatial and frequency distributions of solid-matrix material in a heterogeneous medium. This is accomplished by matching an image of a sample saturated with a contrast solution with that saturated with a transparent solution. Voxels occupied by solid material and inaccessible pores are identified by the fact that they maintain the same location and image attributes in both images, with voxels containing inaccessible pores appearing empty in both images. Fully porous and accessible voxels exhibit the maximum contrast, while the rest are porous voxels containing mixtures of pore solutions and solid. This matching process is performed with an image registration computer code, and image processing software that requires only simple subtraction and multiplication (scaling) processes. The process is demonstrated on dolomite (non-uniform void distribution, homogeneous solid matrix) and sandstone (nearly uniform void distribution, heterogeneous solid matrix) samples, and its overall performance is shown to compare favorably with a method based on calibration and thresholding.
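A minimal sketch of the voxel-wise matching idea: subtract the registered transparent-solution image from the contrast-solution image and scale by the difference expected for a fully porous, fully accessible voxel. The synthetic arrays and the use of the maximum difference as the scaling factor are illustrative assumptions.

    import numpy as np

    def voxel_porosity(img_contrast, img_transparent):
        # Attenuation difference in a voxel scales with the volume of
        # accessible pore space it contains.
        diff = img_contrast.astype(float) - img_transparent.astype(float)
        full_contrast = diff.max()  # fully porous, fully accessible voxel
        # Solid voxels and voxels with only inaccessible pores look the same
        # in both images, so their difference (and porosity) is ~zero.
        return np.clip(diff / full_contrast, 0.0, 1.0)

    rng = np.random.default_rng(2)
    base = rng.uniform(100, 200, size=(64, 64, 64))  # transparent-solution scan
    contrast = base + 50 * rng.random((64, 64, 64))  # contrast-solution scan
    porosity_map = voxel_porosity(contrast, base)
    print("bulk accessible porosity:", porosity_map.mean())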
GeauxDock: Accelerating Structure-Based Virtual Screening with Heterogeneous Computing
Fang, Ye; Ding, Yun; Feinstein, Wei P.; Koppelman, David M.; Moreno, Juana; Jarrell, Mark; Ramanujam, J.; Brylinski, Michal
2016-01-01
Computational modeling of drug binding to proteins is an integral component of direct drug design. Particularly, structure-based virtual screening is often used to perform large-scale modeling of putative associations between small organic molecules and their pharmacologically relevant protein targets. Because of the large number of drug candidates to be evaluated, an accurate and fast docking engine is a critical element of virtual screening. Consequently, highly optimized docking codes are of paramount importance for the effectiveness of virtual screening methods. In this communication, we describe the implementation, tuning and performance characteristics of GeauxDock, a recently developed molecular docking program. GeauxDock is built upon the Monte Carlo algorithm and features a novel scoring function combining physics-based energy terms with statistical and knowledge-based potentials. Developed specifically for heterogeneous computing platforms, the current version of GeauxDock can be deployed on modern, multi-core Central Processing Units (CPUs) as well as massively parallel accelerators, Intel Xeon Phi and NVIDIA Graphics Processing Unit (GPU). First, we carried out a thorough performance tuning of the high-level framework and the docking kernel to produce a fast serial code, which was then ported to shared-memory multi-core CPUs yielding a near-ideal scaling. Further, using Xeon Phi gives a 1.9× performance improvement over a dual 10-core Xeon CPU, whereas the best GPU accelerator, GeForce GTX 980, achieves a speedup as high as 3.5×. On that account, GeauxDock can take advantage of modern heterogeneous architectures to considerably accelerate structure-based virtual screening applications. GeauxDock is open-sourced and publicly available at www.brylinski.org/geauxdock and https://figshare.com/articles/geauxdock_tar_gz/3205249.
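Since GeauxDock is built on a Monte Carlo search, a generic Metropolis pose-sampling loop illustrates the core control flow; the toy quadratic energy below stands in for GeauxDock's actual scoring function, and all parameters are illustrative.

    import numpy as np

    def metropolis_docking(energy, pose0, n_steps=10000, temperature=1.0, step=0.1):
        rng = np.random.default_rng(3)
        pose, e = pose0.copy(), energy(pose0)
        best_pose, best_e = pose.copy(), e
        for _ in range(n_steps):
            trial = pose + step * rng.normal(size=pose.shape)  # perturb the pose
            e_trial = energy(trial)
            # Accept downhill moves always, uphill moves with Boltzmann probability.
            if e_trial < e or rng.random() < np.exp(-(e_trial - e) / temperature):
                pose, e = trial, e_trial
                if e < best_e:
                    best_pose, best_e = pose.copy(), e
        return best_pose, best_e

    # Toy quadratic "binding funnel" as a placeholder energy surface.
    best, score = metropolis_docking(lambda x: ((x - 1.5) ** 2).sum(), np.zeros(6))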
Seismic signal processing on heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Fichtner, Andreas
2015-04-01
The processing of seismic signals - including the correlation of massive ambient noise data sets - represents an important part of a wide range of seismological applications. It is characterized by large data volumes as well as high computational and input/output intensity. Development of efficient approaches towards seismic signal processing on emerging high performance computing systems is therefore essential. Heterogeneous supercomputing systems introduced in recent years provide numerous computing nodes interconnected via high throughput networks, every node containing a mix of processing elements of different architectures, like several sequential processor cores and one or a few graphics processing units (GPUs) serving as accelerators. A typical representative of such computing systems is "Piz Daint", a supercomputer of the Cray XC 30 family operated by the Swiss National Supercomputing Centre (CSCS), which we used in this research. Heterogeneous supercomputers provide an opportunity for manifold increases in application performance and are more energy-efficient; however, they have much higher hardware complexity and are therefore much more difficult to program. The programming effort may be substantially reduced by the introduction of modular libraries of software components that can be reused for a wide class of seismology applications. The ultimate goal of this research is the design of a prototype of such a library, suitable for implementing various seismic signal processing applications on heterogeneous systems. As a representative use case we have chosen an ambient noise correlation application. Ambient noise interferometry has developed into one of the most powerful tools to image and monitor the Earth's interior. Future applications will require the extraction of increasingly small details from noise recordings. To meet this demand, more advanced correlation techniques combined with very large data volumes are needed. This poses new computational problems that require dedicated HPC solutions. The chosen application uses a wide range of common signal processing methods, which include various IIR filter designs, amplitude and phase correlation, computing the analytic signal, and discrete Fourier transforms. Furthermore, various processing methods specific to seismology, like rotation of seismic traces, are used. Efficient implementation of all these methods on GPU-accelerated systems presents several challenges. In particular, it requires a careful distribution of work between the sequential processors and accelerators. Furthermore, since the application is designed to process very large volumes of data, special attention had to be paid to the efficient use of the available memory and networking hardware resources in order to reduce the intensity of data input and output. In our contribution we will explain the software architecture as well as the principal engineering decisions used to address these challenges. We will also describe the programming model based on C++ and CUDA that we used to develop the software. Finally, we will demonstrate performance improvements achieved by using the heterogeneous computing architecture. This work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d26.
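The computational core of such an ambient noise correlation application is the frequency-domain cross-correlation of station pairs. A minimal NumPy sketch follows; the sampling rate, the window length, and the omitted preprocessing (spectral whitening, one-bit normalization, trace rotation) are assumptions for illustration.

    import numpy as np

    def noise_cross_correlation(trace_a, trace_b, max_lag):
        n = len(trace_a) + len(trace_b) - 1
        nfft = 1 << (n - 1).bit_length()  # next power of two for the FFT
        spec = np.fft.rfft(trace_a, nfft) * np.conj(np.fft.rfft(trace_b, nfft))
        cc = np.fft.irfft(spec, nfft)
        # Rearrange the circular correlation into lags [-max_lag, +max_lag].
        return np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])

    fs = 20.0  # sampling rate in Hz (assumed)
    rng = np.random.default_rng(4)
    n = int(600 * fs)
    a = rng.normal(size=n)
    b = np.roll(a, 40) + 0.5 * rng.normal(size=n)  # delayed, noisy copy of a
    max_lag = int(5 * fs)
    cc = noise_cross_correlation(a, b, max_lag)
    print("lag of correlation peak (s):", (np.argmax(cc) - max_lag) / fs)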
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swaminarayan, Sriram; Germann, Timothy C; Kadau, Kai
2008-01-01
The authors present timing and performance numbers for a short-range parallel molecular dynamics (MD) code, SPaSM, that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two AMD Opteron dual-core microprocessors and four PowerXCell 8i enhanced Cell microprocessors, so that there are four MPI ranks per node, each with one Opteron and one Cell. The interatomic forces are computed on the Cells (each with one PPU and eight SPU cores), while the Opterons are used to direct inter-rank communication and perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard-Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), corresponding to 124 MFlop/Watt/s at a price of approximately 3.69 MFlops/dollar. They demonstrate an initial target application, the jetting and ejection of material from a shocked surface.
CQPSO scheduling algorithm for heterogeneous multi-core DAG task model
NASA Astrophysics Data System (ADS)
Zhai, Wenzheng; Hu, Yue-Li; Ran, Feng
2017-07-01
Efficient task scheduling is critical to achieving high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list is built, and the processor with the minimum cumulative earliest finish time (EFT) is selected as the target of each task assignment, so that task precedence relationships are satisfied and the total execution time of all tasks is minimized. The experimental results show that the proposed algorithm is simple and feasible, converges quickly, has strong optimization ability, and can be applied to task scheduling optimization in other heterogeneous and distributed environments.
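The EFT-based assignment that such a scheduler evaluates inside its search loop can be sketched directly; the CQPSO part (the swarm search over priority orderings) is omitted, and the cost matrix, the absence of communication delays, and the assumption that the priority list already respects precedence are all simplifications.

    import numpy as np

    def schedule_by_eft(exec_cost, priority_order):
        # exec_cost[t, p]: run time of task t on processor p.
        n_tasks, n_procs = exec_cost.shape
        ready = np.zeros(n_procs)           # when each processor becomes free
        placement, finish = {}, {}
        for t in priority_order:            # tasks in priority-list order
            eft = ready + exec_cost[t]      # candidate finish time per processor
            p = int(np.argmin(eft))         # processor with minimum cumulative EFT
            placement[t], finish[t] = p, float(eft[p])
            ready[p] = eft[p]
        return placement, max(finish.values())  # mapping and makespan

    cost = np.array([[3.0, 5.0], [2.0, 4.0], [6.0, 3.0], [4.0, 4.0]])
    mapping, makespan = schedule_by_eft(cost, priority_order=[0, 1, 2, 3])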
Heterogeneity in Health Care Computing Environments
Sengupta, Soumitra
1989-01-01
This paper discusses issues of heterogeneity in computer systems, networks, databases, and presentation techniques, and the problems it creates in developing integrated medical information systems. The need for institutional, comprehensive goals is emphasized. Using the Columbia-Presbyterian Medical Center's computing environment as the case study, various steps to solve the heterogeneity problem are presented.
A data colocation grid framework for big data medical image processing: backend design
NASA Astrophysics Data System (ADS)
Bao, Shunxing; Huo, Yuankai; Parvathaneni, Prasanna; Plassard, Andrew J.; Bermudez, Camilo; Yao, Yuang; Lyu, Ilwoo; Gokhale, Aniruddha; Landman, Bennett A.
2018-03-01
When processing large medical imaging studies, adopting high performance grid computing resources rapidly becomes important. We recently presented a "medical image processing-as-a-service" grid framework that offers promise in utilizing the Apache Hadoop ecosystem and HBase for data colocation by moving computation close to medical image storage. However, the framework has not yet been proven easy to use in a heterogeneous hardware environment. Furthermore, the system has not yet been validated across the variety of multi-level analyses used in medical imaging. Our target design criteria are (1) improving the framework's performance in a heterogeneous cluster, (2) performing population based summary statistics on large datasets, and (3) introducing a table design scheme for rapid NoSQL query. In this paper, we present a heuristic backend interface application program interface (API) design for Hadoop and HBase for Medical Image Processing (HadoopBase-MIP). The API includes: Upload, Retrieve, Remove, Load balancer (for heterogeneous clusters) and MapReduce templates. A dataset summary statistic model is discussed and implemented via the MapReduce paradigm, as sketched below. We introduce an HBase table scheme for fast data query to better utilize the MapReduce model. Briefly, 5153 T1 images were retrieved from a university secure, shared web database and used to empirically assess an in-house grid with 224 heterogeneous CPU cores. The results of three empirical experiments are presented and discussed: (1) a load balancer wall-time improvement of 1.5-fold compared with a framework with a built-in data allocation strategy, (2) the summary statistic model is empirically verified on the grid framework and compared with the same cluster deployed with a standard Sun Grid Engine (SGE), reducing wall clock time 8-fold and resource time 14-fold, and (3) the proposed HBase table scheme improves MapReduce computation with a 7-fold reduction in wall time compared with a naïve scheme when datasets are relatively small. The source code and interfaces have been made publicly available.
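The summary-statistic model maps naturally onto MapReduce because the mean and standard deviation can be accumulated from constant-size partial states. A framework-agnostic sketch, assuming plain Python functions in place of the actual HadoopBase-MIP templates:

    import math

    def map_partial(values):
        # Map phase: reduce one shard of measurements to (count, sum, sum of squares).
        n = s = ss = 0.0
        for v in values:
            n += 1
            s += v
            ss += v * v
        return n, s, ss

    def combine(p, q):
        # Reduce phase: partial states merge associatively, in any order.
        return p[0] + q[0], p[1] + q[1], p[2] + q[2]

    def finalize(state):
        n, s, ss = state
        mean = s / n
        return mean, math.sqrt(max(ss / n - mean * mean, 0.0))  # mean, std

    shards = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    state = (0.0, 0.0, 0.0)
    for shard in shards:
        state = combine(state, map_partial(shard))
    print(finalize(state))  # population mean and standard deviation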
Yim, Wen-Wai; Chien, Shu; Kusumoto, Yasuyuki; Date, Susumu; Haga, Jason
2010-01-01
Large-scale in-silico screening is a necessary part of drug discovery and Grid computing is one answer to this demand. A disadvantage of using Grid computing is the heterogeneous computational environment characteristic of a Grid. In our study, we have found that for the molecular docking simulation program DOCK, different clusters within a Grid organization can yield inconsistent results. Because DOCK in-silico virtual screening (VS) is currently used to help select chemical compounds to test with in-vitro experiments, such differences have little effect on the validity of using virtual screening before subsequent steps in the drug discovery process. However, it is difficult to predict whether the accumulation of these discrepancies over sequentially repeated VS experiments will significantly alter the results if VS is used as the primary means for identifying potential drugs. Moreover, such discrepancies may be unacceptable for other applications requiring more stringent thresholds. This highlights the need for establishing a more complete solution to provide the best scientific accuracy when executing an application across Grids. One possible solution to platform heterogeneity in DOCK performance explored in our study involved the use of virtual machines as a layer of abstraction. This study investigated the feasibility and practicality of using virtual machine and recent cloud computing technologies in a biological research application. We examined the differences and variations of DOCK VS variables across a Grid environment composed of different clusters, with and without virtualization. The uniform computing environment provided by virtual machines eliminated the inconsistent DOCK VS results caused by heterogeneous clusters; however, the execution time for the DOCK VS increased. In our particular experiments, overhead costs were found to average 41% and 2% in execution time for two different clusters, while the actual magnitudes of the execution time costs were minimal. Despite the increase in overhead, virtual clusters are an ideal solution for Grid heterogeneity. With greater development of virtual cluster technology in Grid environments, the problem of platform heterogeneity may be eliminated through virtualization, allowing greater usage of VS and benefiting all Grid applications in general.
DOE Office of Scientific and Technical Information (OSTI.GOV)
García-Melchor, Max; Vilella, Laia; López, Núria
2016-04-29
An attractive strategy to improve the performance of water oxidation catalysts would be to anchor a homogeneous molecular catalyst on a heterogeneous solid surface to create a hybrid catalyst. The idea of this combined system is to take advantage of the individual properties of each of the two catalyst components. We use Density Functional Theory to determine the stability and activity of a model hybrid water oxidation catalyst consisting of a dimeric Ir complex attached to the IrO2(110) surface through two oxygen atoms. We find that homogeneous catalysts can be bound to their matrix oxide without losing significant activity. Hence, designing hybrid systems that benefit from both the high tunability of activity of homogeneous catalysts and the stability of heterogeneous systems seems feasible.
NASA Astrophysics Data System (ADS)
Libera, A.; Henri, C.; de Barros, F.
2017-12-01
Heterogeneities in natural porous formations, mainly manifested through the hydraulic conductivity (K) and, to a lesser degree, the porosity (Φ), largely control subsurface flow and solute transport. The influence of the heterogeneous structure of K on flow and solute transport processes has been widely studied, whereas less attention is dedicated to the joint heterogeneity of conductivity and porosity fields. Our study employs computational tools to investigate the joint effect of the spatial variabilities of K and Φ on the transport behavior of a solute plume. We explore multiple scenarios, characterized by different levels of heterogeneity of the geological system, and compare the computational results from the joint K and Φ heterogeneous system with the results originating from the generally adopted constant porosity case. In our work, we assume that the heterogeneous porosity is positively correlated to hydraulic conductivity. We perform numerical Monte Carlo simulations of conservative and reactive contaminant transport in a 3D aquifer. Contaminant mass and plume arrival times at multiple control planes and/or pumping wells operating under different extraction rates are analyzed. We employ different probabilistic metrics to quantify the risk at the monitoring locations, e.g., increased lifetime cancer risk and exceedance of Maximum Contaminant Levels (MCLs), under multiple transport scenarios (i.e., different levels of heterogeneity, conservative or reactive solutes and different contaminant species). Results show that early and late arrival times of the solute mass at the selected sensitive locations (i.e. control planes/pumping wells) as well as risk metrics are strongly influenced by the spatial variability of the Φ field.
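A minimal sketch of jointly heterogeneous K and Φ fields of the kind the study describes, assuming a lognormal conductivity field and a porosity positively correlated with it through a linear link in log K; the smoothing-based field generator, the link coefficients, and the bounds are illustrative stand-ins for a proper geostatistical simulator.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    rng = np.random.default_rng(5)

    # Spatially correlated log-conductivity: smoothed white noise is a crude
    # stand-in for a variogram-based generator.
    logK = gaussian_filter(rng.normal(size=(60, 60, 30)), sigma=4)
    logK = 2.0 * (logK - logK.mean()) / logK.std() - 11.0  # assumed variance/mean
    K = np.exp(logK)                                       # hydraulic conductivity

    # Porosity positively correlated with K via a linear link in log K.
    phi = np.clip(0.25 + 0.03 * (logK - logK.mean()), 0.05, 0.45)

    print("corr(log K, phi):", np.corrcoef(logK.ravel(), phi.ravel())[0, 1])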
NASA Astrophysics Data System (ADS)
Niño, Alfonso; Muñoz-Caro, Camelia; Reyes, Sebastián
2015-11-01
The last decade witnessed a great development of the structural and dynamic study of complex systems described as networks of elements. Such systems can be described as a set of, possibly, heterogeneous entities or agents (the network nodes) interacting in, possibly, different ways (defining the network edges). In this context, it is of practical interest to model and handle not only static and homogeneous networks but also dynamic, heterogeneous ones. Depending on the size and type of the problem, these networks may require different computational approaches involving sequential, parallel or distributed systems with or without the use of disk-based data structures. In this work, we develop an Application Programming Interface (APINetworks) for the modeling and treatment of general networks in arbitrary computational environments. To minimize dependency between components, we decouple the network structure from its function using different packages for grouping sets of related tasks. The structural package, the one in charge of building and handling the network structure, is the core element of the system. In this work, we focus on this structural component of the API. We apply an object-oriented approach that makes use of inheritance and polymorphism. In this way, we can model static and dynamic networks with heterogeneous elements in the nodes and heterogeneous interactions in the edges. In addition, this approach permits a unified treatment of different computational environments. Tests performed on a C++11 version of the structural package show that, on current standard computers, the system can handle, in main memory, directed and undirected linear networks formed by tens of millions of nodes and edges. Our results compare favorably to those of existing tools.
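The inheritance-based structural design can be suggested with a few classes; APINetworks itself is implemented in C++11, so the Python rendering below is only a sketch of the idea, with illustrative class names.

    class Node:
        # Base node; subclasses add domain-specific state (heterogeneous nodes).
        def __init__(self, node_id):
            self.node_id = node_id

    class Edge:
        # Base edge; subclasses model heterogeneous interaction types.
        def __init__(self, source, target, weight=1.0):
            self.source, self.target, self.weight = source, target, weight

    class Network:
        # Structure-only container: building and handling the topology is
        # decoupled from whatever function runs on top of it.
        def __init__(self, directed=False):
            self.directed = directed
            self.nodes, self.adjacency = {}, {}

        def add_node(self, node):
            self.nodes[node.node_id] = node
            self.adjacency.setdefault(node.node_id, [])

        def add_edge(self, edge):
            self.adjacency[edge.source].append(edge)
            if not self.directed:
                self.adjacency[edge.target].append(Edge(edge.target, edge.source, edge.weight))

    # Heterogeneity through inheritance: different node kinds coexist in one network.
    class PersonNode(Node): pass
    class ServerNode(Node): pass

    net = Network(directed=True)
    net.add_node(PersonNode("alice"))
    net.add_node(ServerNode("s1"))
    net.add_edge(Edge("alice", "s1", weight=0.7))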
Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda
2017-01-01
Cloud computing infrastructure is suitable for meeting the computational needs of large task sizes. Optimal scheduling of tasks in a cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving a task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule-based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing; a sketch of Min-min follows.
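A compact rendering of the Min-min heuristic named above, under the usual expected-time-to-compute (ETC) formulation; the ETC matrix is illustrative.

    import numpy as np

    def min_min(etc):
        # etc[t, m]: expected time to compute task t on machine m.
        n_tasks, n_machines = etc.shape
        ready = np.zeros(n_machines)        # machine availability times
        unscheduled = set(range(n_tasks))
        schedule = {}
        while unscheduled:
            # For each task, its minimum completion time over all machines;
            # pick the task whose minimum is smallest and bind it there.
            t, m = min(
                ((t, int(np.argmin(ready + etc[t]))) for t in unscheduled),
                key=lambda tm: ready[tm[1]] + etc[tm[0], tm[1]],
            )
            ready[m] += etc[t, m]
            schedule[t] = m
            unscheduled.remove(t)
        return schedule, float(ready.max())  # assignment and makespan

    etc = np.array([[4.0, 6.0, 9.0], [7.0, 3.0, 5.0], [2.0, 8.0, 4.0]])
    schedule, makespan = min_min(etc)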
Robust analysis of an underwater navigational strategy in electrically heterogeneous corridors.
Dimble, Kedar D; Ranganathan, Badri N; Keshavan, Jishnu; Humbert, J Sean
2016-08-01
Obstacles and other global stimuli provide relevant navigational cues to a weakly electric fish. In this work, robust analysis of a control strategy based on electrolocation for performing obstacle avoidance in electrically heterogeneous corridors is presented and validated. Static output feedback control is shown to achieve the desired goal of reflexive obstacle avoidance in such environments in simulation and experimentation. The proposed approach is computationally inexpensive and readily implementable on a small scale underwater vehicle, making underwater autonomous navigation feasible in real-time.
Extreme-scale Algorithms and Solver Resilience
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, Jack
A widening gap exists between the peak performance of high-performance computers and the performance achieved by complex applications running on these platforms. Over the next decade, extreme-scale systems will present major new challenges to algorithm development that could amplify this mismatch in such a way that it prevents the productive use of future DOE Leadership computers, due to the following: extreme levels of parallelism due to multicore processors; an increase in system fault rates, requiring algorithms to be resilient beyond just checkpoint/restart; complex memory hierarchies and costly data movement in both energy and performance; heterogeneous system architectures (mixing CPUs, GPUs, etc.); and conflicting goals of performance, resilience, and power requirements.
Efficient parallelization of analytic bond-order potentials for large-scale atomistic simulations
NASA Astrophysics Data System (ADS)
Teijeiro, C.; Hammerschmidt, T.; Drautz, R.; Sutmann, G.
2016-07-01
Analytic bond-order potentials (BOPs) provide a way to compute atomistic properties with controllable accuracy. For large-scale computations of heterogeneous compounds at the atomistic level, both the computational efficiency and memory demand of BOP implementations have to be optimized. Since the evaluation of BOPs is a local operation within a finite environment, the parallelization concepts known from short-range interacting particle simulations can be applied to improve the performance of these simulations. In this work, several efficient parallelization methods for BOPs that use three-dimensional domain decomposition schemes are described. The schemes are implemented into the bond-order potential code BOPfox, and their performance is measured in a series of benchmarks. Systems of up to several millions of atoms are simulated on a high performance computing system, and parallel scaling is demonstrated for up to thousands of processors.
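The bookkeeping behind a three-dimensional domain decomposition of this kind is pure index arithmetic: each rank owns one block of the periodic simulation box and exchanges halo atoms with its 26 neighbors. A small sketch, independent of any particular MPI binding or of the BOPfox implementation:

    import itertools

    def rank_to_coords(rank, dims):
        # Map a rank to (x, y, z) block coordinates in a 3-D Cartesian grid.
        nx, ny, nz = dims
        return rank % nx, (rank // nx) % ny, rank // (nx * ny)

    def neighbor_ranks(rank, dims):
        # The 26 periodic neighbors whose halo regions a local evaluation
        # needs for atoms near subdomain faces, edges, and corners.
        nx, ny, nz = dims
        x, y, z = rank_to_coords(rank, dims)
        neighbors = set()
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            if (dx, dy, dz) == (0, 0, 0):
                continue
            cx, cy, cz = (x + dx) % nx, (y + dy) % ny, (z + dz) % nz
            neighbors.add(cx + cy * nx + cz * nx * ny)
        return sorted(neighbors)

    print(neighbor_ranks(0, (4, 4, 2)))  # halo-exchange partners of rank 0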
Xia, Jun; Huang, Chao; Maslov, Konstantin; Anastasio, Mark A; Wang, Lihong V
2013-08-15
Photoacoustic computed tomography (PACT) is a hybrid technique that combines optical excitation and ultrasonic detection to provide high-resolution images in deep tissues. In the image reconstruction, a constant speed of sound (SOS) is normally assumed. This assumption, however, is often not strictly satisfied in deep tissue imaging, due to acoustic heterogeneities within the object and between the object and the coupling medium. If these heterogeneities are not accounted for, they will cause distortions and artifacts in the reconstructed images. In this Letter, we incorporated ultrasonic computed tomography (USCT), which measures the SOS distribution within the object, into our full-ring array PACT system. Without the need for ultrasonic transmitting electronics, USCT was performed using the same laser beam as for PACT measurement. By scanning the laser beam on the array surface, we can sequentially fire different elements. As a first demonstration of the system, we studied the effect of acoustic heterogeneities on photoacoustic vascular imaging. We verified that constant SOS is a reasonable approximation when the SOS variation is small. When the variation is large, distortion will be observed in the periphery of the object, especially in the tangential direction.
Pollitz, F.F.
2002-01-01
I present a new algorithm for calculating seismic wave propagation through a three-dimensional heterogeneous medium using the framework of mode coupling theory originally developed to perform very low frequency (f < ~0.01-0.05 Hz) seismic wavefield computation. It is a Green's function approach for multiple scattering within a defined volume and employs a truncated traveling wave basis set using the locked mode approximation. Interactions between incident and scattered wavefields are prescribed by mode coupling theory and account for the coupling among surface waves, body waves, and evanescent waves. The described algorithm is, in principle, applicable to global and regional wave propagation problems, but I focus on higher frequency (typically f ≳ 0.25 Hz) applications at regional and local distances where the locked mode approximation is best utilized and which involve wavefields strongly shaped by propagation through a highly heterogeneous crust. Synthetic examples are shown for P-SV-wave propagation through a semi-ellipsoidal basin and SH-wave propagation through a fault zone.
Optimization and real-time control for laser treatment of heterogeneous soft tissues.
Feng, Yusheng; Fuentes, David; Hawkins, Andrea; Bass, Jon M; Rylander, Marissa Nichole
2009-01-01
Predicting the outcome of thermotherapies in cancer treatment requires an accurate characterization of the bioheat transfer processes in soft tissues. Due to the biological and structural complexity of tumor (soft tissue) composition and vasculature, it is often very difficult to obtain reliable tissue properties, which are among the key factors for accurate prediction of treatment outcome. Efficient algorithms employing in vivo thermal measurements to determine heterogeneous thermal tissue properties, in conjunction with a detailed sensitivity analysis, can produce essential information for model development and optimal control. The goals of this paper are to present a general formulation of the bioheat transfer equation for heterogeneous soft tissues, review models and algorithms developed for cell damage, heat shock proteins, and soft tissues with nanoparticle inclusions, and demonstrate an overall computational strategy for developing a laser treatment framework with the ability to perform real-time robust calibrations and optimal control. This computational strategy can be applied to other thermotherapies using heat sources such as radio frequency or high intensity focused ultrasound.
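For reference, the classical Pennes form of the bioheat transfer equation, which general formulations for heterogeneous tissue typically extend by letting the coefficients vary in space; in one common convention,

    \rho(x)\, c(x)\, \frac{\partial T}{\partial t}
      = \nabla \cdot \big( k(x)\, \nabla T \big)
      + \omega_b(x)\, \rho_b\, c_b\, \big( T_a - T \big)
      + Q_m(x) + Q_{ext}(x, t),

where T is the tissue temperature, \rho, c, and k are tissue density, specific heat, and thermal conductivity, \omega_b is the blood perfusion rate with \rho_b, c_b, and T_a the blood density, specific heat, and arterial temperature, Q_m is metabolic heat generation, and Q_{ext} is the applied source (here, laser heating).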
Accelerating Subsurface Transport Simulation on Heterogeneous Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Gawande, Nitin A.; Tumeo, Antonino
Reactive transport numerical models simulate chemical and microbiological reactions that occur along a flowpath. These models have to compute reactions for a large number of locations. They solve the set of ordinary differential equations (ODEs) that describes the reactions for each location through the Newton-Raphson technique. This technique involves computing a Jacobian matrix and a residual vector for each set of equations, and then solving the linearized system iteratively by performing Gaussian elimination and LU decomposition until convergence. STOMP, a well known subsurface flow simulation tool, employs matrices with sizes on the order of 100x100 elements and, for numerical accuracy, LU factorization with full pivoting instead of the faster partial pivoting. Modern high performance computing systems are heterogeneous machines whose nodes integrate both CPUs and GPUs, exposing unprecedented amounts of parallelism. To exploit all their computational power, applications must use both types of processing elements. For the case of subsurface flow simulation, this mainly requires implementing efficient batched LU-based solvers and identifying efficient solutions for enabling load balancing among the different processors of the system. In this paper we discuss two approaches that allow scaling STOMP's performance on heterogeneous clusters. We initially identify the challenges in implementing batched LU-based solvers for small matrices on GPUs, and propose an implementation that fulfills STOMP's requirements. We compare this implementation to other existing solutions. Then, we combine the batched GPU solver with an OpenMP-based CPU solver, and present an adaptive load balancer that dynamically distributes the linear systems to solve between the two components inside a node. We show how these approaches, integrated into the full application, provide speed-ups from 6 to 7 times on large problems, executed on up to 16 nodes of a cluster with two AMD Opteron 6272 and a Tesla M2090 per node.
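LU factorization with full pivoting, the kernel singled out above, searches the entire trailing submatrix for the largest pivot at every step (rather than just the current column, as in partial pivoting). A compact reference version, with a sequential "batch" loop standing in for the concurrent GPU execution:

    import numpy as np

    def lu_full_pivot(a):
        # Returns L and U packed in one array plus row/column permutations,
        # so that A[p][:, q] = L @ U.
        a = a.astype(float).copy()
        n = a.shape[0]
        p, q = np.arange(n), np.arange(n)
        for k in range(n - 1):
            # Largest entry of the trailing submatrix becomes the pivot.
            sub = np.abs(a[k:, k:])
            i, j = np.unravel_index(np.argmax(sub), sub.shape)
            i += k; j += k
            a[[k, i], :] = a[[i, k], :]; p[[k, i]] = p[[i, k]]
            a[:, [k, j]] = a[:, [j, k]]; q[[k, j]] = q[[j, k]]
            a[k + 1:, k] /= a[k, k]
            a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])
        return a, p, q

    # A batch of small chemistry systems, factorized one after another; a GPU
    # implementation would run these factorizations concurrently.
    rng = np.random.default_rng(6)
    factors = [lu_full_pivot(m) for m in rng.normal(size=(32, 8, 8))]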
Rudroff, Thorsten; Kindred, John H.; Benson, John-Michael; Tracy, Brian L.; Kalliokoski, Kari K.
2014-01-01
We used positron emission tomography/computed tomography (PET/CT) and [18F]-FDG to test the hypothesis that glucose uptake (GU) heterogeneity in skeletal muscles, as a measure of heterogeneity in muscle activity, is greater in old than young men when they perform isometric contractions. Six young (26 ± 6 years) and six old (77 ± 6 years) men performed two types of submaximal isometric contractions that required either force or position control. [18F]-FDG was injected during the task and PET/CT scans were performed immediately after the task. Within-muscle heterogeneity of the knee muscles was determined by calculating the coefficient of variation (CV) of GU in PET image voxels within the muscles of interest. The average GU heterogeneity (mean ± SD) for knee extensors and flexors was greater for the old (35.3 ± 3.3%) than the young (28.6 ± 2.4%) men (P = 0.006). Muscle volume of the knee extensors was greater for the young compared to the old men (1016 ± 163 vs. 598 ± 70 cm3, P = 0.004). In a multiple regression model, knee extensor muscle volume was a predictor (partial r = −0.87; P = 0.001) of GU heterogeneity for old men (R2 = 0.78; P < 0.001), and MVC force predicted GU heterogeneity for young men (partial r = −0.95, P < 0.001). The findings demonstrate that GU is more spatially variable for old than young men, especially for old men who exhibit greater muscle atrophy.
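The within-muscle heterogeneity metric used here is simply the coefficient of variation of GU over the voxels inside a muscle's segmentation mask; a one-function sketch with synthetic data in place of real PET volumes:

    import numpy as np

    def within_muscle_cv(gu_image, muscle_mask):
        # Coefficient of variation (%) of glucose uptake over masked PET voxels.
        voxels = gu_image[muscle_mask]
        return 100.0 * voxels.std() / voxels.mean()

    rng = np.random.default_rng(7)
    pet = rng.gamma(shape=4.0, scale=0.5, size=(40, 40, 40))  # synthetic GU map
    mask = np.zeros(pet.shape, dtype=bool)
    mask[10:25, 10:25, 5:35] = True                           # toy muscle mask
    print(f"GU heterogeneity: {within_muscle_cv(pet, mask):.1f}%")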
OCCAM: a flexible, multi-purpose and extendable HPC cluster
NASA Astrophysics Data System (ADS)
Aldinucci, M.; Bagnasco, S.; Lusso, S.; Pasteris, P.; Rabellino, S.; Vallero, S.
2017-10-01
The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multipurpose flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Sezione di Torino of the Istituto Nazionale di Fisica Nucleare. It is aimed at providing a flexible, reconfigurable and extendable infrastructure to cater to a wide range of different scientific computing use cases, including ones from solid-state chemistry, high-energy physics, computer science, big data analytics, computational biology, genomics and many others. Furthermore, it will serve as a platform for R&D activities on computational technologies themselves, with topics ranging from GPU acceleration to Cloud Computing technologies. A heterogeneous and reconfigurable system like this poses a number of challenges related to the frequency at which heterogeneous hardware resources might change their availability and shareability status, which in turn affect methods and means to allocate, manage, optimize, bill, monitor VMs, containers, virtual farms, jobs, interactive bare-metal sessions, etc. This work describes some of the use cases that prompted the design and construction of the HPC cluster, its architecture and resource provisioning model, along with a first characterization of its performance by some synthetic benchmark tools and a few realistic use-case tests.
NASA Astrophysics Data System (ADS)
Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto
2012-11-01
In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of the effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C and subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near-linear speed-up when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
NASA Astrophysics Data System (ADS)
Rudianto, Indra; Sudarmaji
2018-04-01
We present an implementation of the spectral-element method for simulation of two-dimensional elastic wave propagation in fully heterogeneous media. We have incorporated most of the realistic geological features in the model, including surface topography, curved layer interfaces, and 2-D wave-speed heterogeneity. To accommodate such complexity, we use an unstructured quadrilateral meshing technique. The simulation was performed on a GPU cluster, which consists of 24 Intel Xeon CPU cores and 4 NVIDIA Quadro graphics cards, using CUDA and MPI. We speed up the computation by a factor of about 5 compared to MPI only, and by a factor of about 40 compared to the serial implementation.
NASA Astrophysics Data System (ADS)
Wang, Neng; Xia, Shuman
2017-01-01
A combined modeling and experimental effort is made in this work to examine the cohesive fracture mechanisms of heterogeneous elastic solids. A two-phase laminated composite, which mimics the key microstructural features of many tough engineering and biological materials, is selected as a model material system. Theoretical and finite element analyses with cohesive zone modeling are performed to study the effective fracture resistance of the heterogeneous material associated with unstable crack propagation and arrest. A crack-tip-position controlled algorithm is implemented in the finite element analysis to overcome the inherent instability issues resulting from crack pinning and depinning at local heterogeneities. Systematic parametric studies are carried out to investigate the effects of various material and geometrical parameters, including the modulus mismatch ratio, phase volume fraction, cohesive zone size, and cohesive law shape. Concurrently, a novel stereolithography-based three-dimensional (3D) printing system is developed and used for fabricating heterogeneous test specimens with well-controlled structural and material properties. Fracture testing of the specimens is performed using the tapered double-cantilever beam (TDCB) test method. With optimal material and geometrical parameters, heterogeneous TDCB specimens are shown to exhibit enhanced effective fracture energy and effective fracture toughness than their homogeneous counterparts, which is in good agreement with the modeling predictions. The integrative computational and experimental study presented here provides a fundamental mechanistic understanding of the fracture mechanisms in brittle heterogeneous materials and sheds light on the rational design of tough materials through patterned heterogeneities.
Efficient Process Migration for Parallel Processing on Non-Dedicated Networks of Workstations
NASA Technical Reports Server (NTRS)
Chanchio, Kasidit; Sun, Xian-He
1996-01-01
This paper presents the design and preliminary implementation of MpPVM, a software system that supports process migration for PVM application programs in a non-dedicated heterogeneous computing environment. New concepts of migration points as well as migration point analysis and necessary data analysis are introduced. In MpPVM, process migrations occur only at previously inserted migration points. Migration point analysis determines appropriate locations to insert migration points, whereas necessary data analysis provides a minimum set of variables to be transferred at each migration point. A new methodology to perform reliable point-to-point data communications in a migration environment is also discussed. Finally, a preliminary implementation of MpPVM and its experimental results are presented, showing the correctness and promising performance of our process migration mechanism in a scalable non-dedicated heterogeneous computing environment. While MpPVM is developed on top of PVM, the process migration methodology introduced in this study is general and can be applied to any distributed software environment.
Parallelization of Finite Element Analysis Codes Using Heterogeneous Distributed Computing
NASA Technical Reports Server (NTRS)
Ozguner, Fusun
1996-01-01
Performance gains in computer design are quickly consumed as users seek to analyze larger problems to a higher degree of accuracy. Innovative computational methods, such as parallel and distributed computing, seek to multiply the power of existing hardware technology to satisfy the computational demands of large applications. In the early stages of this project, experiments were performed using two large, coarse-grained applications, CSTEM and METCAN. These applications were parallelized on an Intel iPSC/860 hypercube. It was found that the overall speedup was very low, due to large, inherently sequential code segments present in the applications. The overall parallel execution time, T_par, of the application depends on these sequential segments. If these segments make up a significant fraction of the overall code, the application will have a poor speedup measure.
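The relation invoked here is Amdahl's law: if a fraction s of the work is inherently sequential and the rest parallelizes perfectly over p processors, then

    T_{par} = T_1 \left( s + \frac{1 - s}{p} \right), \qquad
    S(p) = \frac{T_1}{T_{par}} = \frac{1}{s + (1 - s)/p} \le \frac{1}{s},

so even modest sequential segments cap the achievable speedup regardless of processor count (for example, s = 0.2 limits the speedup to 5).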
Multidisciplinary High-Fidelity Analysis and Optimization of Aerospace Vehicles. Part 1; Formulation
NASA Technical Reports Server (NTRS)
Walsh, J. L.; Townsend, J. C.; Salas, A. O.; Samareh, J. A.; Mukhopadhyay, V.; Barthelemy, J.-F.
2000-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity, finite element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a high-speed civil transport configuration. The paper describes the engineering aspects of formulating the optimization by integrating these analysis codes and associated interface codes into the system. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture (CORBA) compliant software product. A companion paper presents currently available results.
NASA Technical Reports Server (NTRS)
Walsh, J. L.; Weston, R. P.; Samareh, J. A.; Mason, B. H.; Green, L. L.; Biedron, R. T.
2000-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity finite-element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a high-speed civil transport configuration. The paper describes both the preliminary results from implementing and validating the multidisciplinary analysis and the results from an aerodynamic optimization. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture compliant software product. A companion paper describes the formulation of the multidisciplinary analysis and optimization system.
A world-wide databridge supported by a commercial cloud provider
NASA Astrophysics Data System (ADS)
Tat Cheung, Kwong; Field, Laurence; Furano, Fabrizio
2017-10-01
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. One of the challenges with exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources. However, high energy physics applications require more data input and output than the CPU-intensive applications that are typically used by other volunteer computing projects. While the so-called databridge has already been successfully proposed as a method to span the untrusted and trusted domains of volunteer computing and Grid computing respectively, globally transferring data between potentially poor-performing residential networks and CERN could be unreliable, leading to wasted resource usage. The expectation is that by placing a storage endpoint that is part of a wider, flexible geographical databridge deployment closer to the volunteers, the transfer success rate and the overall performance can be improved. This contribution investigates the provision of a globally distributed databridge implemented upon a commercial cloud provider.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using the Message Passing Interface (MPI) standard. The MPI implementation of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures, making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on the massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. Preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
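No source is shown in the abstract; the following is a minimal mpi4py sketch of the pattern such an MPI port rests on, 1-D domain decomposition with ghost-cell exchange each iteration, using a hypothetical heat-equation stand-in rather than the PHOENICS/EARTH solver:

```python
# 1-D domain decomposition with ghost-cell exchange (mpi4py sketch).
# Run with, e.g.: mpiexec -n 4 python halo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100                      # interior cells owned by this rank
u = np.zeros(n_local + 2)          # +2 ghost cells for neighbor data
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(1000):
    if rank == 0:
        u[1] = 1.0                 # fixed boundary value on the left edge
    # Swap ghost cells with neighbors (PROC_NULL edges are no-ops).
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # Explicit Jacobi update on the interior cells.
    u[1:-1] += 0.25 * (u[:-2] - 2.0 * u[1:-1] + u[2:])
```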
Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka
2014-11-15
The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at http://www.bi.cs.titech.ac.jp/megadock. Contact: akiyama@cs.titech.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
Heterogeneous customer satisfaction index for evaluating university food service
NASA Astrophysics Data System (ADS)
Aziz, Nazrina; Zain, Zakiyah; Syarifi, Nadia Asyikin Mohammad; Klivon, Julia; Ap, Nurasiah Che; Zaki, Mahirah
2017-11-01
This paper aims to measure the performance of university food service based on students' perception. Two cafeterias were chosen for comparison: one located at a student residential hall (Café 1) and another at the university administration centre (Café 2). By considering the components of importance and satisfaction, the Heterogeneous Customer Satisfaction Index (HCSI) was computed to measure the performance of quality items in both cafeterias. A stratified sampling method was used to select 278 students, and the DINESERV instrument was used to assess customer perception of service quality. The findings show that customers rate these two cafeterias as only quite satisfied, with the HCSI for Café 1 slightly higher than that for Café 2.
CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.
Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan
2017-06-24
The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, the existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the characteristics of users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and are generally unknown before submission, which previous work has unfortunately ignored. Second, the center star strategy is suited to aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given a heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software. CMSA focuses on multiple similar RNA/DNA sequence alignment and proposes a novel bitmap-based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPUs is a promising approach to accelerating multiple sequence alignment. Besides, adopting the co-run computation model can significantly improve entire-system utilization. The source code is available at https://github.com/wangvsa/CMSA.
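The bitmap algorithm itself is not given in the abstract; the sketch below shows only the general idea behind O(mn) center selection, scoring each sequence in a single pass over shared k-mer content instead of via all-pairs alignment (the function names and scoring rule are invented for illustration):

```python
# Center-star candidate selection in O(mn)-style time: score sequences
# by k-mer content shared with the dataset, not by all-pairs alignment.
from collections import Counter

def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def pick_center(seqs: list[str], k: int = 8) -> str:
    table = Counter()                      # pass 1: global k-mer counts
    for s in seqs:
        table.update(set(kmers(s, k)))
    # Pass 2: a sequence scores highly when many other sequences share
    # its k-mers, making it a good center-star candidate.
    return max(seqs, key=lambda s: sum(table[km] - 1 for km in set(kmers(s, k))))

reads = ["ACGTACGTGGTT", "ACGTACGTGGTA", "TTGTACGTGGTT"]
print(pick_center(reads, k=4))             # the read most similar to the rest
```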
2015-06-01
The anomaly recognition and detection (AnRAD) system was built as a cogent confabulation network; it represented road networks over square-kilometer areas. The AnRAD system was also generalized, using a self-structuring technique, for the additional application of network intrusion detection, improving system accuracy.
Pagan, Marino
2014-01-01
The responses of high-level neurons tend to be mixtures of many different types of signals. While this diversity is thought to allow for flexible neural processing, it presents a challenge for understanding how neural responses relate to task performance and to neural computation. To address these challenges, we have developed a new method to parse the responses of individual neurons into weighted sums of intuitive signal components. Our method computes the weights by projecting a neuron's responses onto a predefined orthonormal basis. Once determined, these weights can be combined into measures of signal modulation; however, in their raw form these signal modulation measures are biased by noise. Here we introduce and evaluate two methods for correcting this bias, and we report that an analytically derived approach produces performance that is robust and superior to a bootstrap procedure. Using neural data recorded from inferotemporal cortex and perirhinal cortex as monkeys performed a delayed-match-to-sample target search task, we demonstrate how the method can be used to quantify the amounts of task-relevant signals in heterogeneous neural populations. We also demonstrate how these intuitive quantifications of signal modulation can be related to single-neuron measures of task performance (d′). PMID:24920017
Semi-supervised Machine Learning for Analysis of Hydrogeochemical Data and Models
NASA Astrophysics Data System (ADS)
Vesselinov, Velimir; O'Malley, Daniel; Alexandrov, Boian; Moore, Bryan
2017-04-01
Data- and model-based analyses such as uncertainty quantification, sensitivity analysis, and decision support using complex physics models with numerous model parameters typically require a huge number of model evaluations (on the order of 10^6). Furthermore, model simulations of complex physics may require substantial computational time. For example, accounting for simultaneously occurring physical processes such as fluid flow and biogeochemical reactions in a heterogeneous porous medium may require several hours of wall-clock computational time. To address these issues, we have developed a novel methodology for semi-supervised machine learning based on Non-negative Matrix Factorization (NMF) coupled with customized k-means clustering. The algorithm allows for automated, robust Blind Source Separation (BSS) of groundwater types (contamination sources) based on model-free analyses of observed hydrogeochemical data. We have also developed reduced order modeling tools, which couple support vector regression (SVR), genetic algorithms (GA), and artificial and convolutional neural networks (ANN/CNN). SVR is applied to predict the model behavior within prior uncertainty ranges associated with the model parameters. ANN and CNN procedures are applied to upscale heterogeneity of the porous medium. In the upscaling process, fine-scale high-resolution models of heterogeneity are applied to inform coarse-resolution models, which have improved computational efficiency while capturing the impact of fine-scale effects at the coarse scale of interest. These techniques are tested independently on a series of synthetic problems. We also present a decision analysis related to contaminant remediation where the developed reduced order models are applied to reproduce groundwater flow and contaminant transport in a synthetic heterogeneous aquifer. The tools are coded in Julia and are a part of the MADS high-performance computational framework (https://github.com/madsjulia/Mads.jl).
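As a rough illustration of the BSS step, this sketch factorizes synthetic well observations with NMF and clusters the mixing weights with plain k-means; scikit-learn stands in for the authors' customized k-means and their Julia/MADS implementation:

```python
# NMF-based blind source separation of synthetic groundwater data,
# followed by k-means on the per-well mixing weights.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
sources = rng.random((3, 20))        # 3 source signatures x 20 species
mixing = rng.random((50, 3))         # 50 wells x 3 sources
X = mixing @ sources                 # observed concentrations (wells x species)

model = NMF(n_components=3, init="nndsvda", max_iter=2000)
W = model.fit_transform(X)           # estimated per-well source contributions
H = model.components_                # estimated source signatures

labels = KMeans(n_clusters=3, n_init=10).fit_predict(W)
print(labels[:10])                   # groundwater "type" assigned to each well
```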
A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU
NASA Astrophysics Data System (ADS)
Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha
2018-03-01
Since the Graphics Processing Unit (GPU) has a strong floating-point computation ability and high memory bandwidth for data parallelism, it has been widely used in common computing areas such as molecular dynamics (MD) and computational fluid dynamics (CFD). The emergence of the compute unified device architecture (CUDA), which reduces the complexity of program compilation, brings great opportunities to CFD. There are three different modes for the parallel solution of the NS equations: a parallel solver based on the CPU, a parallel solver based on the GPU, and a heterogeneous parallel solver based on collaborating CPU and GPU. GPUs are relatively rich in compute capacity but poor in memory capacity, and CPUs are the opposite. To make full use of both the GPUs and CPUs, a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver's computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experimental results, demonstrating that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow; it decreases for turbulent flow but can still reach more than 20. What's more, the speedup increases as the grid size becomes larger.
MaRIE theory, modeling and computation roadmap executive summary
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lookman, Turab
The confluence of MaRIE (Matter-Radiation Interactions in Extreme) and extreme (exascale) computing timelines offers a unique opportunity in co-designing the elements of materials discovery, with theory and high performance computing, itself co-designed by constrained optimization of hardware and software, and experiments. MaRIE's theory, modeling, and computation (TMC) roadmap efforts have paralleled 'MaRIE First Experiments' science activities in the areas of materials dynamics, irradiated materials and complex functional materials in extreme conditions. The documents that follow this executive summary describe in detail for each of these areas the current state of the art, the gaps that exist and the roadmap to MaRIE and beyond. Here we integrate the various elements to articulate an overarching theme related to the role and consequences of heterogeneities which manifest as competing states in a complex energy landscape. MaRIE experiments will locate, measure and follow the dynamical evolution of these heterogeneities. Our TMC vision spans the various pillar science and highlights the key theoretical and experimental challenges. We also present a theory, modeling and computation roadmap of the path to and beyond MaRIE in each of the science areas.
Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI
Donato, David I.
2017-01-01
In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
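A minimal mpi4py rendering of the worker-initiated pull pattern the study recommends (the reference implementation is in C with MPI; run() below is a hypothetical stand-in for one modelling run):

```python
# On-demand (pull) bag-of-tasks: idle workers ask the master for work.
from mpi4py import MPI

TASK, STOP = 1, 2
comm = MPI.COMM_WORLD

def run(task_id):                       # placeholder for one modelling run
    return task_id * task_id

if comm.Get_rank() == 0:                # master: serve tasks on request
    tasks = list(range(100))
    status = MPI.Status()
    stopped = 0
    while stopped < comm.Get_size() - 1:
        comm.recv(source=MPI.ANY_SOURCE, status=status)   # "I am idle"
        if tasks:
            comm.send(tasks.pop(), dest=status.Get_source(), tag=TASK)
        else:
            comm.send(None, dest=status.Get_source(), tag=STOP)
            stopped += 1
else:                                   # worker: pull whenever idle
    status = MPI.Status()
    while True:
        comm.send(None, dest=0)
        task = comm.recv(source=0, status=status)
        if status.Get_tag() == STOP:
            break
        run(task)
```

Because workers request work only when idle, faster nodes naturally take more tasks, which is exactly why the scheme tolerates heterogeneity both within and between nodes.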
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; McDonald, Elizabeth S; Rosen, Mark; Mies, Carolyn; Feldman, Michael; Kontos, Despina
2015-06-01
Heterogeneity in cancer can affect response to therapy and patient prognosis. Histologic measures have classically been used to measure heterogeneity, although a reliable noninvasive measurement is needed both to establish baseline risk of recurrence and to monitor response to treatment. Here, we propose using spatiotemporal wavelet kinetic features from dynamic contrast-enhanced magnetic resonance imaging to quantify intratumor heterogeneity in breast cancer. Tumor pixels are first partitioned into homogeneous subregions using pharmacokinetic measures. Heterogeneity wavelet kinetic (HetWave) features are then extracted from these partitions to obtain spatiotemporal patterns of the wavelet coefficients and the contrast agent uptake. The HetWave features are evaluated in terms of their prognostic value using a logistic regression classifier with genetic algorithm wrapper-based feature selection to classify breast cancer recurrence risk as determined by a validated gene expression assay. Receiver operating characteristic analysis and area under the curve (AUC) are computed to assess classifier performance using leave-one-out cross validation. The HetWave features outperform other commonly used features (AUC = 0.88 for HetWave versus 0.70 for standard features). The combination of HetWave and standard features further increases classifier performance (AUC = 0.94). The rate of the spatial frequency pattern over the pharmacokinetic partitions can provide valuable prognostic information. HetWave could be a powerful feature extraction approach for characterizing tumor heterogeneity, providing valuable prognostic information.
A Parallel Neuromorphic Text Recognition System and its…
2013-01-01
…research in computational intelligence has entered a new era. In this paper, we present an HPC-based context-aware intelligent text recognition…
Tools and Techniques for Measuring and Improving Grid Performance
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Frumkin, M.; Smith, W.; VanderWijngaart, R.; Wong, P.; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation provides information on NASA's geographically dispersed computing resources, and the various methods by which the disparate technologies are integrated within a nationwide computational grid. Many large-scale science and engineering projects are accomplished through the interaction of people, heterogeneous computing resources, information systems and instruments at different locations. The overall goal is to facilitate the routine interactions of these resources to reduce the time spent in design cycles, particularly for NASA's mission critical projects. The IPG (Information Power Grid) seeks to implement NASA's diverse computing resources in a fashion similar to the way in which electric power is made available.
A review of predictive nonlinear theories for multiscale modeling of heterogeneous materials
NASA Astrophysics Data System (ADS)
Matouš, Karel; Geers, Marc G. D.; Kouznetsova, Varvara G.; Gillman, Andrew
2017-02-01
Since the beginning of the industrial age, material performance and design have been in the midst of innovation of many disruptive technologies. Today's electronics, space, medical, transportation, and other industries are enriched by development, design and deployment of composite, heterogeneous and multifunctional materials. As a result, materials innovation is now considerably outpaced by other aspects from component design to product cycle. In this article, we review predictive nonlinear theories for multiscale modeling of heterogeneous materials. Deeper attention is given to multiscale modeling in space and to computational homogenization in addressing challenging materials science questions. Moreover, we discuss a state-of-the-art platform in predictive image-based, multiscale modeling with co-designed simulations and experiments that executes on the world's largest supercomputers. Such a modeling framework consists of experimental tools, computational methods, and digital data strategies. Once fully completed, this collaborative and interdisciplinary framework can be the basis of Virtual Materials Testing standards and aid in the development of new material formulations. Moreover, it will decrease the time to market of innovative products.
Utility functions and resource management in an oversubscribed heterogeneous computing environment
Khemka, Bhavesh; Friese, Ryan; Briceno, Luis Diego; ...
2014-09-26
We model an oversubscribed heterogeneous computing system where tasks arrive dynamically and a scheduler maps the tasks to machines for execution. The environment and workloads are based on those being investigated by the Extreme Scale Systems Center at Oak Ridge National Laboratory. Utility functions that are designed based on specifications from the system owner and users are used to create a metric for the performance of resource allocation heuristics. Each task has a time-varying utility (importance) that the enterprise will earn based on when the task successfully completes execution. We design multiple heuristics, which include a technique to drop low utility-earning tasks, to maximize the total utility that can be earned by completing tasks. The heuristics are evaluated using simulation experiments with two levels of oversubscription. The results show the benefit of having fast heuristics that account for the importance of a task and the heterogeneity of the environment when making allocation decisions in an oversubscribed environment. Furthermore, the ability to drop low utility-earning tasks allows the heuristics to tolerate the high oversubscription as well as earn significant utility.
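A toy version of such a heuristic appears below; the utility shape, threshold, and names are assumptions for illustration, not the actual specifications used in the paper:

```python
# Greedy utility-aware mapping with a drop rule for low-earning tasks.
def utility(task, finish_time):
    # Time-varying utility: full value up to a soft deadline, then
    # linear decay to zero over a decay window (assumed shape).
    if finish_time <= task["soft_deadline"]:
        return task["max_utility"]
    late = (finish_time - task["soft_deadline"]) / task["decay_window"]
    return max(0.0, task["max_utility"] * (1.0 - late))

def schedule(tasks, machine_free, exec_time, drop_frac=0.1):
    """Map each arriving task where it earns the most utility."""
    earned = 0.0
    for t in tasks:
        best = max(range(len(machine_free)),
                   key=lambda m: utility(t, machine_free[m] + exec_time[m][t["type"]]))
        finish = machine_free[best] + exec_time[best][t["type"]]
        u = utility(t, finish)
        if u < drop_frac * t["max_utility"]:
            continue                # drop: not worth occupying a machine
        machine_free[best] = finish
        earned += u
    return earned

tasks = [{"type": 0, "max_utility": 10.0, "soft_deadline": 5.0, "decay_window": 5.0}]
print(schedule(tasks, machine_free=[0.0, 2.0], exec_time=[{0: 3.0}, {0: 1.0}]))
```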
NASA Astrophysics Data System (ADS)
Kuo, Y.-H.; Leung, J. M. Y.; Graham, C. A.
2015-05-01
In this paper, we present a case study of modelling and analyzing the patient flow of a hospital emergency department in Hong Kong. The emergency department is facing the challenge of overcrowding, and patients there usually experience a long waiting time. Our project team was requested by a senior consultant of the emergency department to analyze the patient flow and provide a decision support tool to help improve their operations. We adopt a simulation approach to mimic the department's daily operations. With the simulation model, we conduct a computational study to examine the effect of physician heterogeneity on emergency department performance. We found that physician heterogeneity has a great impact on operational efficiency and thus should be considered when developing simulation models. Our computational results show that, with the same average service rate among the physicians, variation in the individual rates can improve the overcrowding situation. This suggests that emergency departments may consider having some efficient physicians to speed up the overall service rate in return for more time for patients who need extra medical care.
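A minimal SimPy sketch of this kind of model, with two physicians of different service rates drawing from one queue (all parameter values are illustrative, not the studied department's):

```python
# Discrete-event ED model with heterogeneous physician service rates.
import random
import simpy

RATES = [1.0, 2.0]                 # patients/hour per physician (heterogeneous)
waits = []

def patient(env, docs):
    arrival = env.now
    doc = yield docs.get()         # wait for any free physician
    waits.append(env.now - arrival)
    yield env.timeout(random.expovariate(RATES[doc]))
    yield docs.put(doc)            # physician becomes free again

def arrivals(env, docs, rate=2.5):
    while True:
        yield env.timeout(random.expovariate(rate))
        env.process(patient(env, docs))

env = simpy.Environment()
docs = simpy.Store(env)
for i in range(len(RATES)):
    docs.put(i)                    # store holds the ids of free physicians
env.process(arrivals(env, docs))
env.run(until=1000)
print(f"mean wait: {sum(waits) / len(waits):.2f} h over {len(waits)} patients")
```

Keeping the mean of RATES fixed while widening their spread is then a one-line change, which is the experiment behind the paper's finding.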
Performance of a Heterogeneous Grid Partitioner for N-body Applications
NASA Technical Reports Server (NTRS)
Harvey, Daniel J.; Das, Sajal K.; Biswas, Rupak
2003-01-01
An important characteristic of distributed grids is that they allow geographically separated multicomputers to be tied together in a transparent virtual environment to solve large-scale computational problems. However, many of these applications require effective runtime load balancing for the resulting solutions to be viable. Recently, we developed a latency-tolerant partitioner, called MinEX, specifically for use in distributed grid environments. This paper compares the performance of MinEX to that of METIS, a popular multilevel family of partitioners, using simulated heterogeneous grid configurations. A solver for the classical N-body problem is implemented to provide a framework for the comparisons. Experimental results show that MinEX provides superior-quality partitions while being competitive with METIS in speed of execution.
NASA Astrophysics Data System (ADS)
Monteiller, Vadim; Beller, Stephen; Operto, Stephane; Virieux, Jean
2015-04-01
The current development of dense seismic arrays and high performance computing makes the application of full-waveform inversion (FWI) to teleseismic data for high-resolution lithospheric imaging feasible today. In a teleseismic configuration, the source is often considered, to first order, as a planar wave that impinges on the base of the lithospheric target located below the receiver array. Recently, injection methods coupling global propagation in 1D or axisymmetric earth models with regional 3D methods (Discontinuous Galerkin finite element methods, spectral element methods or finite differences) have allowed us to consider more realistic teleseismic phases. Those teleseismic phases can be propagated inside a 3D regional model in order to exploit not only the forward-scattered waves propagating up to the receivers but also second-order arrivals that are back-scattered from the free surface and the reflectors before being recorded at the surface. However, those computations are performed assuming simple global models. In this presentation, we review some key specifications that might be considered for mitigating the effect on FWI of heterogeneities situated outside the regional domain. We consider synthetic models and data computed using our recently developed hybrid method AxiSEM/SEM. The global simulation is done by the AxiSEM code, which allows us to consider axisymmetric anomalies. The 3D regional computation is performed by the Spectral Element Method. We investigate the effect of external anomalies on the regional model obtained by FWI when one neglects them by considering only 1D global propagation. We also investigate the effect of the source time function and the focal mechanism on the results of the FWI approach.
NASA Astrophysics Data System (ADS)
Negrut, Dan; Lamb, David; Gorsich, David
2011-06-01
This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so-called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up of a combination of CPUs and Graphics Processing Units (GPUs). The computational dynamics applications targeted to execute on such a hardware topology include many-body dynamics, smoothed-particle hydrodynamics (SPH) fluid simulation, and fluid-solid interaction analysis. The underlying theme of the solution approach embraced by HCT is that of partitioning the domain of interest into a number of subdomains that are each managed by a separate core/accelerator (CPU/GPU) pair. Four components at the core of HCT enable the envisioned distributed computing approach to large-scale dynamical system simulation: (a) the ability to partition the problem according to the one-to-one mapping, i.e., spatial subdivision, discussed above (pre-processing); (b) a protocol for passing data between any two co-processors; (c) algorithms for element proximity computation; and (d) the ability to carry out post-processing in a distributed fashion. In this contribution, components (a) and (b) of the HCT are demonstrated via the example of the Discrete Element Method (DEM) for rigid body dynamics with friction and contact. The collision detection task required in frictional-contact dynamics (task (c) above) is shown to benefit from a two-order-of-magnitude efficiency gain on the GPU when compared to traditional sequential implementations. Note: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise does not imply its endorsement, recommendation, or favoring by the United States Army. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Army, and shall not be used for advertising or product endorsement purposes.
Pandey, Vaibhav; Saini, Poonam
2018-06-01
The MapReduce (MR) computing paradigm and its open-source implementation Hadoop have become a de facto standard for processing big data in a distributed environment. Initially, the Hadoop system was homogeneous in three significant aspects, namely, user, workload, and cluster (hardware). However, with the growing variety of MR jobs and the inclusion of differently configured nodes in existing clusters, heterogeneity has become an essential part of Hadoop systems. The heterogeneity factors adversely affect the performance of a Hadoop scheduler and limit the overall throughput of the system. To overcome this problem, various heterogeneous Hadoop schedulers have been proposed in the literature. Existing survey works in this area mostly cover homogeneous schedulers and classify them on the basis of the quality-of-service parameters they optimize. Hence, there is a need to study heterogeneous Hadoop schedulers on the basis of the various heterogeneity factors they consider. In this survey article, we first discuss the different heterogeneity factors that typically exist in a Hadoop system and then explore various challenges that arise while designing schedulers in the presence of such heterogeneity. Afterward, we present a comparative study of the heterogeneous scheduling algorithms available in the literature and classify them by the aforementioned heterogeneity factors. Lastly, we investigate the different methods and environments used for evaluating the discussed Hadoop schedulers.
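As a toy example of one heterogeneity factor such schedulers handle, the sketch below splits map tasks across nodes in proportion to measured node speed instead of assuming identical workers (the numbers and function are invented for illustration):

```python
# Heterogeneity-aware split of map tasks by relative node speed.
def split_map_tasks(n_tasks: int, node_speeds: dict[str, float]) -> dict[str, int]:
    total = sum(node_speeds.values())
    shares = {node: int(n_tasks * s / total) for node, s in node_speeds.items()}
    leftover = n_tasks - sum(shares.values())      # rounding remainder
    for node in sorted(node_speeds, key=node_speeds.get, reverse=True)[:leftover]:
        shares[node] += 1                          # fastest nodes absorb it
    return shares

# A cluster whose newer nodes are ~3x faster than the original ones:
print(split_map_tasks(100, {"old-1": 1.0, "old-2": 1.0, "new-1": 3.0, "new-2": 3.0}))
# {'old-1': 12, 'old-2': 12, 'new-1': 38, 'new-2': 38}
```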
NASA Astrophysics Data System (ADS)
Khebbab, Mohamed; Feliachi, Mouloud; El Hadi Latreche, Mohamed
2018-03-01
In this paper, a simulation of eddy current non-destructive testing (EC NDT) on a unidirectional carbon fiber reinforced polymer is performed; for this, a magneto-dynamic formulation in terms of the magnetic vector potential is solved using the finite element heterogeneous multi-scale method (FE HMM). The goal of FE HMM is to compute the homogenized solution without calculating the homogenized tensor explicitly; the solution is based only on the physical characteristics known in the micro domain. This feature is well adapted to EC NDT for evaluating defects in carbon composite material at the microscopic scale, where defect detection is performed by coil impedance measurement, a value intimately linked to the material characteristics at the microscopic level. On this basis, our model can handle different defects such as cracks, inclusions, internal electrical conductivity changes, heterogeneities, etc. The simulation results were compared with the solution obtained with homogenized material using a mixture law, and good agreement was found.
Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong
2017-01-01
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986
USDA-ARS?s Scientific Manuscript database
Comparing performance of a large number of accessions simultaneously is not always possible. Typically, only subsets of all accessions are tested in separate trials with only some (or none) of the accessions overlapping between subsets. Using standard statistical approaches to combine data from such...
OpenCL-based vicinity computation for 3D multiresolution mesh compression
NASA Astrophysics Data System (ADS)
Hachicha, Soumaya; Elkefi, Akram; Ben Amar, Chokri
2017-03-01
3D multiresolution mesh compression systems are still widely addressed in many domains. These systems increasingly require volumetric data to be processed in real time, so performance is becoming constrained by material resource usage and the overall computational time. In this paper, our contribution lies entirely in computing, in real time, the triangle neighborhoods of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of this latter algorithm is to compute the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is poor in terms of computational complexity. To address this, this work exploits the GPU to accelerate the computation, using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method can achieve a speedup factor of 5 compared to the sequential CPU implementation.
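The scan-based pipeline itself is not reproduced in the abstract; below is a minimal PyOpenCL sketch in the same spirit, brute-forcing, for each triangle, one neighbor that shares an edge (two vertices). It records only the first neighbor found, to stay short:

```python
# GPU vicinity computation: first edge-sharing neighbor per triangle.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, """
__kernel void edge_neighbors(__global const int4 *tris, const int n,
                             __global int *neighbor) {
    int i = get_global_id(0);
    if (i >= n) return;
    neighbor[i] = -1;
    int4 a = tris[i];
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        int4 b = tris[j];
        int shared = (a.x == b.x || a.x == b.y || a.x == b.z)
                   + (a.y == b.x || a.y == b.y || a.y == b.z)
                   + (a.z == b.x || a.z == b.y || a.z == b.z);
        if (shared == 2) { neighbor[i] = j; break; }   // shared edge found
    }
}
""").build()

tris = np.zeros((3, 4), dtype=np.int32)          # int4: 3 vertex ids + padding
tris[:, :3] = [[0, 1, 2], [1, 2, 3], [3, 4, 5]]
mf = cl.mem_flags
tris_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=tris)
out = np.empty(len(tris), dtype=np.int32)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)
prg.edge_neighbors(queue, (len(tris),), None, tris_buf, np.int32(len(tris)), out_buf)
cl.enqueue_copy(queue, out, out_buf)
print(out)                                       # [1, 0, -1]
```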
Whelan, Nathan V; Halanych, Kenneth M
2017-03-01
As phylogenetic datasets have increased in size, site-heterogeneous substitution models such as CAT-F81 and CAT-GTR have been advocated in favor of other models because they purportedly suppress long-branch attraction (LBA). These models are two of the most commonly used models in phylogenomics, and they have been applied to a variety of taxa, ranging from Drosophila to land plants. However, many arguments in favor of CAT models have been based on tenuous assumptions about the true phylogeny, rather than rigorous testing with known trees via simulation. Moreover, CAT models have not been compared to other approaches for handling substitutional heterogeneity such as data partitioning with site-homogeneous substitution models. We simulated amino acid sequence datasets with substitutional heterogeneity on a variety of tree shapes including those susceptible to LBA. Data were analyzed with both CAT models and partitioning to explore model performance; in total over 670,000 CPU hours were used, of which over 97% was spent running analyses with CAT models. In many cases, all models recovered branching patterns that were identical to the known tree. However, CAT-F81 consistently performed worse than other models in inferring the correct branching patterns, and both CAT models often overestimated substitutional heterogeneity. Additionally, reanalysis of two empirical metazoan datasets supports the notion that CAT-F81 tends to recover less accurate trees than data partitioning and CAT-GTR. Given these results, we conclude that partitioning and CAT-GTR perform similarly in recovering accurate branching patterns. However, computation time can be orders of magnitude less for data partitioning, with commonly used implementations of CAT-GTR often failing to reach completion in a reasonable time frame (i.e., for Bayesian analyses to converge). Practices such as removing constant sites and parsimony uninformative characters, or using CAT-F81 when CAT-GTR is deemed too computationally expensive, cannot be logically justified. Given clear problems with CAT-F81, phylogenies previously inferred with this model should be reassessed. [Data partitioning; phylogenomics, simulation, site-heterogeneity, substitution models.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
SCEAPI: A unified Restful Web API for High-Performance Computing
NASA Astrophysics Data System (ADS)
Rongqiang, Cao; Haili, Xiao; Shasha, Lu; Yining, Zhao; Xiaoning, Wang; Xuebin, Chi
2017-10-01
The development of scientific computing is increasingly moving to collaborative web and mobile applications. All these applications need high-quality programming interfaces for accessing heterogeneous computing resources consisting of clusters, grid computing or cloud computing. In this paper, we introduce our high-performance computing environment that integrates computing resources from 16 HPC centers across China. Then we present a bundle of web services called SCEAPI and describe how it can be used to access HPC resources with HTTP or HTTPS protocols. We discuss SCEAPI from several aspects including architecture, implementation and security, and address specific challenges in designing compatible interfaces and protecting sensitive data. We describe the functions of SCEAPI, including authentication, file transfer and job management for creating, submitting and monitoring jobs, and how to use SCEAPI in an easy-to-use way. Finally, we discuss how to quickly exploit more HPC resources for the ATLAS experiment by implementing a custom ARC compute element based on SCEAPI, and our work shows that SCEAPI is an easy-to-use and effective solution for extending opportunistic HPC resources.
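A hypothetical client-side view of such a unified REST API (the base URL, endpoint paths, and field names below are invented for illustration; the paper defines the actual SCEAPI interface):

```python
# Authenticate, submit a job, and poll its state over a REST API.
import time
import requests

BASE = "https://hpc-api.example.cn/v1"          # placeholder base URL
resp = requests.post(f"{BASE}/auth", json={"user": "alice", "key": "..."})
headers = {"Authorization": f"Bearer {resp.json()['token']}"}

job = {"app": "atlas-sim", "args": ["run.cfg"], "cores": 128}
job_id = requests.post(f"{BASE}/jobs", json=job, headers=headers).json()["id"]

while True:                                      # poll until the job settles
    state = requests.get(f"{BASE}/jobs/{job_id}", headers=headers).json()["state"]
    if state in ("FINISHED", "FAILED"):
        break
    time.sleep(30)
print(state)
```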
NASA Astrophysics Data System (ADS)
Sayre, George Anthony
The purpose of this dissertation was to develop the C++ program Emergency Dose to calculate transport of radionuclides through indoor spaces using intermediate-fidelity physics that provides improved spatial heterogeneity over well-mixed models such as MELCOR™ and much lower computation times than CFD codes such as FLUENT™. Modified potential flow theory (MPFT), an original formulation of potential flow theory with added turbulent-jet and natural-convection approximations, calculates spatially heterogeneous velocity fields that well-mixed models cannot predict. Other original contributions of MPFT are: (1) generation of high-fidelity boundary conditions relative to well-mixed-CFD coupling methods (conflation), (2) broadening of potential flow applications to arbitrary indoor spaces, previously restricted to specific applications such as exhaust hood studies, and (3) great reduction of computation time relative to CFD codes without total loss of heterogeneity. Additionally, the Lagrangian transport module, discussed in Sections 1.3 and 2.4, showcases an ensemble-based formulation thought to be original to interior studies. Velocity and concentration transport benchmarks against analogous formulations in COMSOL™ produced favorable results, with discrepancies resulting from the tetrahedral meshing used in COMSOL™ outperforming the Cartesian method used by Emergency Dose. A performance comparison of the concentration transport modules against MELCOR™ showed that Emergency Dose held advantages over the well-mixed model, especially in scenarios with many interior partitions and varied source positions. A performance comparison of the velocity module against FLUENT™ showed that viscous drag provided the largest error between Emergency Dose and CFD velocity calculations, but that Emergency Dose's turbulent jets well approximated the corresponding CFD jets. Overall, Emergency Dose was found to provide a viable intermediate solution method for concentration transport with relatively low computation times.
Improving Design Efficiency for Large-Scale Heterogeneous Circuits
NASA Astrophysics Data System (ADS)
Gregerson, Anthony
Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particle accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of developing the large-scale, heterogeneous circuits needed to enable large-scale applications in high-energy physics and other important areas.
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms, and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework, which provides a portable, vendor-agnostic and high-throughput solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
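A deliberately naive stand-in for the paper's scheme, showing only the shape of the update for the two-component spinor (forward Euler with central differences and periodic boundaries, v_F = ħ = 1; the actual implementation uses a proper FDTD discretization and OpenCL kernels):

```python
# Toy stepper for the 2-D massless Dirac equation i d(psi)/dt = (sigma.p) psi.
import numpy as np

N, dx, dt, steps = 128, 1.0, 0.05, 200
x = np.arange(N) * dx
X, Y = np.meshgrid(x, x, indexing="ij")
# Gaussian wave packet with momentum along x in the first spinor component.
psi1 = np.exp(-((X - 40) ** 2 + (Y - 64) ** 2) / 50) * np.exp(1j * 0.5 * X)
psi2 = np.zeros_like(psi1)

def d_dx(f): return (np.roll(f, -1, 0) - np.roll(f, 1, 0)) / (2 * dx)
def d_dy(f): return (np.roll(f, -1, 1) - np.roll(f, 1, 1)) / (2 * dx)

for _ in range(steps):
    # d(psi1)/dt = -(dx - i dy) psi2 ;  d(psi2)/dt = -(dx + i dy) psi1
    dpsi1 = -(d_dx(psi2) - 1j * d_dy(psi2))
    dpsi2 = -(d_dx(psi1) + 1j * d_dy(psi1))
    psi1, psi2 = psi1 + dt * dpsi1, psi2 + dt * dpsi2

density = np.abs(psi1) ** 2 + np.abs(psi2) ** 2   # probability density
print(density.sum() * dx * dx)                    # roughly conserved norm
```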
NASA Astrophysics Data System (ADS)
Victor, Rodolfo A.; Prodanović, Maša.; Torres-Verdín, Carlos
2017-12-01
We develop a new Monte Carlo-based inversion method for estimating electron density and effective atomic number from 3-D dual-energy computed tomography (CT) core scans. The method accounts for uncertainties in X-ray attenuation coefficients resulting from the polychromatic nature of X-ray beam sources of medical and industrial scanners, in addition to delivering uncertainty estimates of inversion products. Estimation of electron density and effective atomic number from CT core scans enables direct deterministic or statistical correlations with salient rock properties for improved petrophysical evaluation; this condition is specifically important in media such as vuggy carbonates where CT resolution better captures core heterogeneity that dominates fluid flow properties. Verification tests of the inversion method performed on a set of highly heterogeneous carbonate cores yield very good agreement with in situ borehole measurements of density and photoelectric factor.
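A sketch of the Monte Carlo inversion idea: draw candidate (electron density, effective atomic number) pairs, push them through a simplified two-energy forward model, and keep the samples consistent with the measured attenuations; the surviving spread is the uncertainty estimate. The forward model and every constant below are illustrative, not the paper's calibrated beam spectra:

```python
# Monte Carlo rejection inversion of dual-energy attenuation data.
import numpy as np

rng = np.random.default_rng(1)

def mu(rho_e, zeff, energy_kev, a=0.02, b=0.15, n=3.8):
    # attenuation ~ electron density * (Compton + photoelectric Z^n/E^3)
    return rho_e * (a + b * zeff**n / energy_kev**3)

truth = (3.4, 12.0)                          # hidden (rho_e, Zeff)
meas = np.array([mu(*truth, 80.0), mu(*truth, 140.0)])
sigma = 0.02 * meas                          # assumed measurement noise

samples = np.column_stack([rng.uniform(1, 6, 200_000),     # rho_e candidates
                           rng.uniform(5, 20, 200_000)])   # Zeff candidates
pred = np.stack([mu(samples[:, 0], samples[:, 1], E) for E in (80.0, 140.0)], axis=1)
misfit = np.sum(((pred - meas) / sigma) ** 2, axis=1)
keep = samples[misfit < 2.0]                 # crude acceptance region

print("estimate:", keep.mean(axis=0))        # ~[3.4, 12.0]
print("uncertainty:", keep.std(axis=0))      # Zeff is the weakly constrained one
```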
Lima, Jakelyne; Cerdeira, Louise Teixeira; Bol, Erick; Schneider, Maria Paula Cruz; Silva, Artur; Azevedo, Vasco; Abelém, Antônio Jorge Gomes
2012-01-01
Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence of this progress, the genome assembly stage demands even more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines; however, this requires hardware and software solutions specially configured for the purpose. Grid computing tries to simplify this process of aggregating resources but does not always offer the best possible performance, due to the heterogeneity and decentralized management of its resources. Thus, it is necessary to develop software that takes these peculiarities into account. To achieve this, we developed an algorithm to adapt the de novo assembly software ABySS for efficient operation in grids. We ran ABySS with and without our algorithm in the grid simulator SimGrid. Tests showed that our algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved genome assembly time in computational grids without changing the quality of the result. PMID:22461785
Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven
2010-11-01
The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sunderam, Vaidy S.
2007-01-09
The Harness project has developed novel software frameworks for the execution of high-end simulations in a fault-tolerant manner on distributed resources. The H2O subsystem comprises the kernel of the Harness framework, and controls the key functions of resource management across multiple administrative domains, especially issues of access and allocation. It is based on a “pluggable” architecture that enables the aggregated use of distributed heterogeneous resources for high performance computing. The major contributions of the Harness II project result in significantly enhancing the overall computational productivity of high-end scientific applications by enabling robust, failure-resilient computations on cooperatively pooled resource collections.
NASA Astrophysics Data System (ADS)
Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em
2017-09-01
Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited-accuracy but relatively costless auxiliary simulator, we can effectively fill in the missing spatial data at the required times by a statistical learning technique, multi-level Gaussian process regression, on the fly; this has been demonstrated in previous work [1]. Building on that work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, that detects computational redundancy in time and hence accelerates the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in exascale simulations.
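A single-level stand-in for the fill-in step: predict a failed subdomain's values from the surviving points with Gaussian process regression (scikit-learn here; the cited work uses multi-level GP regression informed by the coarse auxiliary simulator):

```python
# Fill in data lost to a simulated processor failure with GP regression.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

x = np.linspace(0.0, 1.0, 50)[:, None]
field = np.sin(2 * np.pi * x).ravel()        # stand-in simulation field
alive = np.ones(50, dtype=bool)
alive[20:30] = False                         # one subdomain's data are lost

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-6)
gp.fit(x[alive], field[alive])
filled, std = gp.predict(x[~alive], return_std=True)
print(np.abs(filled - field[~alive]).max())  # reconstruction error
print(std.max())                             # the GP's own uncertainty
```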
Coarse-Grained Clustering Dynamics of Heterogeneously Coupled Neurons.
Moon, Sung Joon; Cook, Katherine A; Rajendran, Karthikeyan; Kevrekidis, Ioannis G; Cisternas, Jaime; Laing, Carlo R
2015-12-01
The formation of oscillating phase clusters in a network of identical Hodgkin-Huxley neurons is studied, along with their dynamic behavior. The neurons are synaptically coupled in an all-to-all manner, yet the synaptic coupling characteristic time is heterogeneous across the connections. In a network of N neurons where this heterogeneity is characterized by a prescribed random variable, the oscillatory single-cluster state can transition, through [Formula: see text] (possibly perturbed) period-doubling and subsequent bifurcations, to a variety of multiple-cluster states. The clustering dynamic behavior is computationally studied both at the detailed and the coarse-grained levels, and a numerical approach that can enable studying the coarse-grained dynamics in a network of arbitrarily large size is suggested. Among a number of cluster states formed, double clusters, composed of nearly equal sub-network sizes, are seen to be stable; interestingly, the heterogeneity parameter in each of the double-cluster components tends to be consistent with the random variable over the entire network: given a double-cluster state, permuting the dynamical variables of the neurons can lead to a combinatorially large number of different, yet similar "fine" states that appear practically identical at the coarse-grained level. For weak heterogeneity we find that correlations rapidly develop, within each cluster, between the neuron's "identity" (its own value of the heterogeneity parameter) and its dynamical state. For single- and double-cluster states we demonstrate an effective coarse-graining approach that uses the Polynomial Chaos expansion to succinctly describe the dynamics by these quickly established "identity-state" correlations. This coarse-graining approach is utilized, within the equation-free framework, to perform efficient computations of the neuron ensemble dynamics.
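The "identity-state" regression at the heart of that coarse-graining fits in a few lines: project neuron states onto orthogonal polynomials (Legendre here) of each neuron's heterogeneity parameter, so a handful of expansion coefficients summarize a whole cluster. Synthetic data stand in for the Hodgkin-Huxley states:

```python
# Polynomial-chaos-style fit of neuron state vs. heterogeneity parameter.
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)
tau = rng.uniform(-1, 1, 500)                # heterogeneity parameter per neuron
state = 0.3 + 0.8 * tau - 0.5 * tau**2 + 0.02 * rng.standard_normal(500)

coeffs = legendre.legfit(tau, state, deg=3)  # degree-3 Legendre expansion
print(np.round(coeffs, 3))                   # a few modes capture the cluster

# The coarse description evolves these coefficients instead of all 500
# states; any neuron's state is reconstructed from its own tau:
print(legendre.legval(0.25, coeffs))
```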
A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations
NASA Astrophysics Data System (ADS)
Smith, Luke; Liang, Qiuhua
2015-04-01
Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed on a 2 m grid within a few hours. In the context of a rapid pluvial flood event in Newcastle upon Tyne during 2012, the technique allows simulation of inundation for 31 km² of the city centre in less than an hour on a 2 m grid; however, further grid refinement is required to fully capture important smaller flow pathways. Good agreement between the model and observed inundation is achieved for a variety of dam failure, slow fluvial inundation, rapid pluvial inundation, and defence breach scenarios in the UK.
BrainFrame: a node-level heterogeneous accelerator platform for neuron simulations
NASA Astrophysics Data System (ADS)
Smaragdos, Georgios; Chatzikonstantis, Georgios; Kukreja, Rahul; Sidiropoulos, Harry; Rodopoulos, Dimitrios; Sourdis, Ioannis; Al-Ars, Zaid; Kachris, Christoforos; Soudris, Dimitrios; De Zeeuw, Chris I.; Strydis, Christos
2017-12-01
Objective. The advent of high-performance computing (HPC) in recent years has led to its increasing use in brain studies through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit a homogeneous acceleration platform to effectively address the complete array of modeling requirements. Approach. In this paper we propose and build BrainFrame, a heterogeneous acceleration platform that incorporates three distinct acceleration technologies, an Intel Xeon-Phi CPU, a NVidia GP-GPU and a Maxeler Dataflow Engine. The PyNN software framework is also integrated into the platform. As a challenging proof of concept, we analyze the performance of BrainFrame on different experiment instances of a state-of-the-art neuron model, representing the inferior-olivary nucleus using a biophysically-meaningful, extended Hodgkin-Huxley representation. The model instances take into account not only the neuronal-network dimensions but also different network-connectivity densities, which can drastically affect the workload’s performance characteristics. Main results. The combined use of different HPC technologies demonstrates that BrainFrame is better able to cope with the modeling diversity encountered in realistic experiments while at the same time running on significantly lower energy budgets. Our performance analysis clearly shows that the model directly affects performance and all three technologies are required to cope with all the model use cases. Significance. The BrainFrame framework is designed to transparently configure and select the appropriate back-end accelerator technology for use per simulation run. The PyNN integration provides a familiar bridge to the vast number of models already available. Additionally, it gives a clear roadmap for extending the platform support beyond the proof of concept, with improved usability and directly useful features to the computational-neuroscience community, paving the way for wider adoption.
Characterizing the heterogeneity of tumor tissues from spatially resolved molecular measures
Zavodszky, Maria I.
2017-01-01
Background: Tumor heterogeneity can manifest itself by sub-populations of cells having distinct phenotypic profiles expressed as diverse molecular, morphological and spatial distributions. This inherent heterogeneity poses challenges in terms of diagnosis, prognosis and efficient treatment. Consequently, tools and techniques are being developed to properly characterize and quantify tumor heterogeneity. Multiplexed immunofluorescence (MxIF) is one such technology that offers molecular insight into both inter-individual and intratumor heterogeneity. It enables the quantification of both the concentration and spatial distribution of 60+ proteins across a tissue section. Upon bioimage processing, protein expression data can be generated for each cell from a tissue field of view. Results: The Multi-Omics Heterogeneity Analysis (MOHA) tool was developed to compute tissue heterogeneity metrics from MxIF spatially resolved tissue imaging data. This technique computes the molecular state of each cell in a sample based on a pathway or gene set. Spatial states are then computed based on the spatial arrangements of the cells as distinguished by their respective molecular states. MOHA computes tissue heterogeneity metrics from the distributions of these molecular and spatially defined states. A colorectal cancer cohort of approximately 700 subjects with MxIF data is presented to demonstrate the MOHA methodology. Within this dataset, statistically significant correlations were found between the intratumor AKT pathway state diversity and cancer stage and histological tumor grade. Furthermore, intratumor spatial diversity metrics were found to correlate with cancer recurrence. Conclusions: MOHA provides a simple and robust approach to characterize molecular and spatial heterogeneity of tissues. Research projects that generate spatially resolved tissue imaging data can take full advantage of this useful technique. The MOHA algorithm is implemented as a freely available R script (see supplementary information). PMID:29190747
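As one concrete example of a state-diversity metric in the spirit of MOHA, here is a sketch using Shannon entropy over discrete per-cell molecular states; the state labels and counts are invented, and MOHA's actual metric definitions live in its published R script.

```python
import numpy as np

def state_diversity(cell_states):
    """Shannon entropy of the distribution of discrete per-cell states --
    one plausible intratumor heterogeneity metric of the kind MOHA computes."""
    _, counts = np.unique(cell_states, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Hypothetical per-cell AKT pathway states ("low"/"mid"/"high") for one tumor.
states = np.array(["low"] * 60 + ["mid"] * 25 + ["high"] * 15)
print(f"state diversity: {state_diversity(states):.3f} bits")
```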
NASA Astrophysics Data System (ADS)
Maloney, C.; Toon, B.; Bardeen, C.
2017-12-01
Recent studies indicate that heterogeneous nucleation may play a large role in cirrus cloud formation in the UT/LS, a region previously thought to be primarily dominated by homogeneous nucleation. As a result, it is beneficial to ensure that general circulation models properly represent heterogeneous nucleation in ice cloud simulations. Our work strives towards addressing this issue in the NSF/DOE Community Earth System Model's atmospheric model, CAM. More specifically, we are addressing the role of heterogeneous nucleation in the coupled sectional microphysics cloud model, CARMA. Currently, our CAM/CARMA cirrus model only performs homogeneous ice nucleation while ignoring heterogeneous nucleation. In our work, we couple the CAM/CARMA cirrus model with the Modal Aerosol Model (MAM). By combining the aerosol model with CAM/CARMA we can both account for heterogeneous nucleation and directly link the sulfates used for homogeneous nucleation to computed fields instead of the static field currently utilized. Here we present our initial results and compare our findings to observations from the long-running CALIPSO and MODIS satellite missions.
Effect of homogenous-heterogeneous reactions on MHD Prandtl fluid flow over a stretching sheet
NASA Astrophysics Data System (ADS)
Khan, Imad; Malik, M. Y.; Hussain, Arif; Salahuddin, T.
An analysis is performed to explore the effects of homogeneous-heterogeneous reactions on two-dimensional flow of a Prandtl fluid over a stretching sheet. In the present analysis, we use the developed model of homogeneous-heterogeneous reactions in boundary layer flow. The mathematical formulation of the presented flow phenomenon yields nonlinear partial differential equations. Using scaling transformations, the governing partial differential equations (the momentum equation and the homogeneous-heterogeneous reaction equations) are transformed into non-linear ordinary differential equations (ODEs). The resulting non-linear ODEs are then solved by a computational scheme known as the shooting method. The quantitative and qualitative behaviour of the physical quantities of interest (velocity, concentration and drag force coefficient) is examined under the prescribed physical constraints through figures and tables. It is observed that the velocity profile increases with the fluid parameters α and β, while the Hartmann number reduces it. The homogeneous and heterogeneous reaction parameters have opposite effects on the concentration profile. The concentration profile shows retarding behavior for large values of the Schmidt number. The skin friction coefficient increases with increments in the Hartmann number H and the fluid parameter α.
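The shooting method named above reduces a two-point boundary-value problem to root-finding on an unknown initial slope. Here is a minimal sketch on a linear stand-in BVP (the paper's transformed Prandtl-fluid ODEs are nonlinear and coupled, but the mechanics are identical):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Stand-in two-point BVP solved by shooting:
# y'' = -y on [0, 1] with y(0) = 0 and y(1) = 1.
def endpoint_mismatch(slope):
    sol = solve_ivp(lambda t, y: [y[1], -y[0]], (0.0, 1.0), [0.0, slope],
                    rtol=1e-10, atol=1e-12)
    return sol.y[0, -1] - 1.0          # y(1) should equal 1

slope = brentq(endpoint_mismatch, 0.1, 5.0)   # root-find the unknown y'(0)
print(f"shooting slope y'(0) = {slope:.6f}  (exact: 1/sin(1) = {1/np.sin(1):.6f})")
```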
Heterogeneous Compression of Large Collections of Evolutionary Trees.
Matthews, Suzanne J
2015-01-01
Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections, nor one that enables their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TreeZip compressed (TRZ) file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TRZ file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections and will enable scientists to develop novel methods for relating heterogeneous collections of trees.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoon, Hongkyu
The purpose of the project was to perform multiscale characterization of low permeability rocks to determine the effect of physical and chemical heterogeneity on the poromechanical and flow responses of shales and carbonate rocks with a broad range of physical and chemical heterogeneity. An integrated multiscale imaging of shale and carbonate rocks from nanometer to centimeter scales includes dual focused ion beam-scanning electron microscopy (FIB-SEM), micro computed tomography (micro-CT), optical and confocal microscopy, and 2D and 3D energy dispersive spectroscopy (EDS). In addition, mineralogical mapping and backscattered imaging with nanoindentation testing advanced the quantitative evaluation of the relationship between material heterogeneity and mechanical behavior. The spatial distribution of compositional heterogeneity, anisotropic bedding patterns, and mechanical anisotropy were employed as inputs for brittle fracture simulations using a phase field model. Comparison of experimental and numerical simulations revealed that proper incorporation of additional material information, such as bedding layer thickness and other geometrical attributes of the microstructures, can yield improvements on the numerical prediction of the mesoscale fracture patterns and hence the macroscopic effective toughness. Overall, a comprehensive framework to evaluate the relationship between mechanical response and micro-lithofacial features can allow us to make more accurate predictions of reservoir performance by developing a multi-scale understanding of poromechanical response to coupled chemical and mechanical interactions for subsurface energy related activities.
Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python
Laura, Jason R.; Rey, Sergio J.
2017-01-01
Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.
NeuroManager: a workflow analysis based simulation management engine for computational neuroscience
Stockton, David B.; Santamaria, Fidel
2015-01-01
We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to supercomputer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175
Costa - Introduction to 2015 Annual Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Costa, James E.
In parallel with Sandia National Laboratories having two major locations (NM and CA), along with a number of smaller facilities across the nation, so too is the distribution of scientific, engineering and computing resources. As a part of Sandia’s Institutional Computing Program, CA site-based Sandia computer scientists and engineers have been providing mission and research staff with local CA resident expertise on computing options while also focusing on two growing high performance computing research problems. The first is how to increase system resilience to failure as machines grow larger, more complex and heterogeneous. The second is how to ensure that computer hardware and configurations are optimized for specialized data analytical mission needs within the overall Sandia computing environment, including the HPC subenvironment. All of these activities support the larger Sandia effort in accelerating development and integration of high performance computing into national security missions. Sandia continues to both promote national R&D objectives, including the recent Presidential Executive Order establishing the National Strategic Computing Initiative, and work to ensure that the full range of computing services and capabilities are available for all mission responsibilities, from national security to energy to homeland defense.
An Architecture for Cross-Cloud System Management
NASA Astrophysics Data System (ADS)
Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad
The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in a homogeneous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
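A minimal sketch of the homogenizing-interface idea, using an adapter pattern; class and method names here are illustrative inventions, not the paper's actual management API.

```python
from abc import ABC, abstractmethod

class ComputeProvider(ABC):
    """Homogeneous management interface over heterogeneous cloud back-ends,
    in the spirit of the architecture described above."""
    @abstractmethod
    def launch(self, image: str, count: int) -> list[str]: ...
    @abstractmethod
    def terminate(self, instance_id: str) -> None: ...

class EC2Provider(ComputeProvider):
    def launch(self, image, count):
        # would call the EC2 API here; stubbed for illustration
        return [f"ec2-{i}" for i in range(count)]
    def terminate(self, instance_id):
        print(f"terminating {instance_id} via EC2")

class OtherCloudProvider(ComputeProvider):
    def launch(self, image, count):
        return [f"oc-{i}" for i in range(count)]
    def terminate(self, instance_id):
        print(f"terminating {instance_id} via other provider")

# Callers manage resources uniformly, regardless of the underlying provider.
for provider in (EC2Provider(), OtherCloudProvider()):
    for instance in provider.launch("base-image", 2):
        provider.terminate(instance)
```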
NASA Astrophysics Data System (ADS)
O'Malley, D.; Le, E. B.; Vesselinov, V. V.
2015-12-01
We present a fast, scalable, and highly-implementable stochastic inverse method for characterization of aquifer heterogeneity. The method utilizes recent advances in randomized matrix algebra and exploits the structure of the Quasi-Linear Geostatistical Approach (QLGA), without requiring a structured grid like Fast-Fourier Transform (FFT) methods. The QLGA framework is a more stable version of Gauss-Newton iterates for a large number of unknown model parameters, and provides unbiased estimates. The methods are matrix-free and do not require derivatives or adjoints, and are thus ideal for complex models and black-box implementation. We also incorporate randomized least-square solvers and data-reduction methods, which speed up computation and simulate missing data points. The new inverse methodology is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. Inversion results based on a series of synthetic problems with steady-state and transient calibration data are presented.
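One of the randomized ingredients alluded to above, shown in isolation: sketch-and-solve least squares, where a Gaussian sketching matrix compresses the system before solving. The problem sizes and the 4n sketch-size heuristic are assumptions for the demo; the QLGA machinery itself is considerably richer.

```python
import numpy as np

# Sketch-and-solve least squares: solve min ||Ax - b|| on a compressed system.
rng = np.random.default_rng(1)
m, n = 20000, 100
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

k = 4 * n                                      # sketch size, a common heuristic
S = rng.standard_normal((k, m)) / np.sqrt(k)   # Gaussian sketching matrix
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error vs full solve:",
      np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full))
```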
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Y; Huang, H; Su, T
Purpose: Texture-based quantification of image heterogeneity has been a popular topic for imaging studies in recent years. As previous studies mainly focus on oncological applications, we report our recent efforts of applying such techniques on cardiac perfusion imaging. A fully automated procedure has been developed to perform texture analysis for measuring the image heterogeneity. Clinical data were used to evaluate the preliminary performance of such methods. Methods: Myocardial perfusion images of Thallium-201 scans were collected from 293 patients with suspected coronary artery disease. Each subject underwent a Tl-201 scan and a percutaneous coronary intervention (PCI) within three months. The PCI result was used as the gold standard of coronary ischemia of more than 70% stenosis. Each Tl-201 scan was spatially normalized to an image template for fully automatic segmentation of the LV. The segmented voxel intensities were then carried into the texture analysis with our open-source software Chang Gung Image Texture Analysis toolbox (CGITA). To evaluate the clinical performance of the image heterogeneity for detecting the coronary stenosis, receiver operating characteristic (ROC) analysis was used to compute the overall accuracy, sensitivity and specificity as well as the area under curve (AUC). Those indices were compared to those obtained from the commercially available semi-automatic software QPS. Results: With the fully automatic procedure to quantify heterogeneity from Tl-201 scans, we were able to achieve a good discrimination with good accuracy (74%), sensitivity (73%), specificity (77%) and AUC of 0.82. Such performance is similar to that obtained from the semi-automatic QPS software, which gives a sensitivity of 71% and specificity of 77%. Conclusion: Based on fully automatic procedures of data processing, our preliminary data indicate that the image heterogeneity of myocardial perfusion imaging can provide useful information for automatic determination of myocardial ischemia.
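A minimal sketch of the ROC workflow described above, using scikit-learn; the cohort numbers below are simulated stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# ROC analysis of a heterogeneity index against a PCI-derived gold standard,
# with made-up numbers standing in for the 293-patient cohort.
rng = np.random.default_rng(2)
stenosis = rng.integers(0, 2, 293)                         # 1 = >70% stenosis
heterogeneity = stenosis * 1.0 + rng.standard_normal(293)  # toy texture index

auc = roc_auc_score(stenosis, heterogeneity)
fpr, tpr, thresholds = roc_curve(stenosis, heterogeneity)
best = np.argmax(tpr - fpr)                                # Youden's J cutoff
print(f"AUC = {auc:.2f}, sensitivity = {tpr[best]:.2f}, "
      f"specificity = {1 - fpr[best]:.2f}")
```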
NASA Astrophysics Data System (ADS)
Poat, M. D.; Lauret, J.; Betts, W.
2015-12-01
The STAR online computing environment is an intensive ever-growing system used for real-time data collection and analysis. Composed of heterogeneous and sometimes groups of custom-tuned machines, the computing infrastructure was previously managed by manual configurations and inconsistently monitored by a combination of tools. This situation led to configuration inconsistency and an overload of repetitive tasks, along with lackluster communication between personnel and machines. Globally securing this heterogeneous cyberinfrastructure was tedious at best, so an agile, policy-driven system ensuring consistency was pursued. Three configuration management tools, Chef, Puppet, and CFEngine, have been compared in reliability, versatility and performance, along with a comparison of the infrastructure monitoring tools Nagios and Icinga. STAR has selected the CFEngine configuration management tool and the Icinga infrastructure monitoring system, leading to a versatile and sustainable solution. By leveraging these two tools STAR can now swiftly upgrade and modify the environment to its needs with ease, as well as promptly react to cyber-security requests. By creating a sustainable long term monitoring solution, the detection of failures was reduced from days to minutes, allowing rapid actions before the issues become dire problems, potentially causing loss of precious experimental data or uptime.
HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
Dongarra, Jack; Gates, Mark; Haidar, Azzam; ...
2015-01-01
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK-compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
Magnetoelastic shear wave propagation in pre-stressed anisotropic media under gravity
NASA Astrophysics Data System (ADS)
Kumari, Nirmala; Chattopadhyay, Amares; Singh, Abhishek K.; Sahu, Sanjeev A.
2017-03-01
The present study investigates the propagation of a shear wave (horizontally polarized) in two initially stressed heterogeneous anisotropic (magnetoelastic transversely isotropic) layers in the crust overlying a transversely isotropic gravitating semi-infinite medium. Heterogeneities in both the anisotropic layers are caused by exponential variation (case-I) and linear variation (case-II) in the elastic constants with respect to the space variable pointing positively downwards. The dispersion relations have been established in closed form using Whittaker's asymptotic expansion and were found to be in good agreement with the classical Love wave equations. The substantial effects of magnetoelastic coupling parameters, heterogeneity parameters, horizontal compressive initial stresses, Biot's gravity parameter, and wave number on the phase velocity of shear waves have been computed and depicted by means of graphs. As a special case, dispersion equations have been deduced for when the two layers and half-space are isotropic and homogeneous. The comparative study for both cases of heterogeneity of the layers has been performed and also depicted by means of graphical illustrations.
In vivo quantitative bioluminescence tomography using heterogeneous and homogeneous mouse models.
Liu, Junting; Wang, Yabin; Qu, Xiaochao; Li, Xiangsi; Ma, Xiaopeng; Han, Runqiang; Hu, Zhenhua; Chen, Xueli; Sun, Dongdong; Zhang, Rongqing; Chen, Duofang; Chen, Dan; Chen, Xiaoyuan; Liang, Jimin; Cao, Feng; Tian, Jie
2010-06-07
Bioluminescence tomography (BLT) is a new optical molecular imaging modality, which can monitor both physiological and pathological processes by using bioluminescent light-emitting probes in small living animals. In particular, this technology possesses great potential in drug development, early detection, and therapy monitoring in preclinical settings. In the present study, we developed a dual modality BLT prototype system with a micro-computed tomography (MicroCT) registration approach, and improved the quantitative reconstruction algorithm based on an adaptive hp finite element method (hp-FEM). Detailed comparisons of source reconstruction between the heterogeneous and homogeneous mouse models were performed. The models include mice with an implanted luminescence source and tumor-bearing mice with a firefly luciferase reporter gene. Our data suggest that reconstruction based on the heterogeneous mouse model is more accurate in localization and quantification than the homogeneous mouse model with appropriate optical parameters, and that BLT allows super-early tumor detection in vivo based on tomographic reconstruction of the heterogeneous mouse model signal.
NASA Astrophysics Data System (ADS)
Johnson, Ryan Federick; Chelliah, Harsha Kumar
2017-01-01
For a range of flow and chemical timescales, numerical simulations of two-dimensional laminar flow over a reacting carbon surface were performed to further understand the complex coupling between heterogeneous and homogeneous reactions. An open-source computational package (OpenFOAM®) was used with previously developed lumped heterogeneous reaction models for carbon surfaces and a detailed homogeneous reaction model for CO oxidation. The influence of finite-rate chemical kinetics was explored by varying the surface temperatures from 1800 to 2600 K, while flow residence time effects were explored by varying the free-stream velocity up to 50 m/s. The dependence of the reacting boundary layer structure on the residence time was analysed by extracting the ratio of the chemical source and species diffusion terms. The important contributions of radical species reactions to the overall carbon removal rate, which are often neglected in multi-dimensional simulations, are highlighted. The results provide a framework for future development and validation of lumped heterogeneous reaction models based on multi-dimensional reacting flow configurations.
NASA Astrophysics Data System (ADS)
Leskiw, Donald M.; Zhau, Junmei
2000-06-01
This paper reports on results from an ongoing project to develop methodologies for representing and managing multiple, concurrent levels of detail and enabling high performance computing using parallel arrays within distributed object-based simulation frameworks. At this time we present the methodology for representing and managing multiple, concurrent levels of detail and modeling accuracy by using a representation based on the Kalman approach for estimation. The Kalman System Model equations are used to represent model accuracy, Kalman Measurement Model equations provide transformations between heterogeneous levels of detail, and interoperability among disparate abstractions is provided using a form of the Kalman Update equations.
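The Kalman machinery referenced above is compact enough to show directly. Here is a minimal measurement update, where the observation matrix H plays the role of a transformation between a fine and a coarse level of detail; the matrices and numbers are illustrative only, not the paper's formulation.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """Standard Kalman measurement update. In the scheme described above,
    H stands in for the transformation between levels of detail and
    P encodes the accuracy of a model abstraction."""
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Coarse 1-D observation of a fine 2-D state (illustrative numbers only).
x = np.array([1.0, 0.5]); P = np.eye(2)
H = np.array([[1.0, 1.0]])                # aggregate fine state to coarse measure
z = np.array([2.0]); R = np.array([[0.1]])
print(kalman_update(x, P, z, H, R))
```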
Thin client performance for remote 3-D image display.
Lai, Albert; Nieh, Jason; Laine, Andrew; Starren, Justin
2003-01-01
Several trends in biomedical computing are converging in a way that will require new approaches to telehealth image display. Image viewing is becoming an "anytime, anywhere" activity. In addition, organizations are beginning to recognize that healthcare providers are highly mobile and optimal care requires providing information wherever the provider and patient are. Thin-client computing is one way to support image viewing in this complex environment. However, little is known about the behavior of thin-client systems in supporting image transfer in modern heterogeneous networks. Our results show that thin clients can deliver acceptable performance over conditions commonly seen in wireless networks if newer protocols optimized for these conditions are used.
A Comparative Study of High and Low Fidelity Fan Models for Turbofan Engine System Simulation
NASA Technical Reports Server (NTRS)
Reed, John A.; Afjeh, Abdollah A.
1991-01-01
In this paper, a heterogeneous propulsion system simulation method is presented. The method is based on the formulation of a cycle model of a gas turbine engine. The model includes the nonlinear characteristics of the engine components via use of empirical data. The potential to simulate the entire engine operation on a computer without the aid of data is demonstrated by numerically generating "performance maps" for a fan component using two flow models of varying fidelity. The suitability of the fan models was evaluated by comparing the computed performance with experimental data. A discussion of the potential benefits and/or difficulties in connecting simulation solutions of differing fidelity is given.
Exascale computing and what it means for shock physics
NASA Astrophysics Data System (ADS)
Germann, Timothy
2015-06-01
The U.S. Department of Energy is preparing to launch an Exascale Computing Initiative, to address the myriad challenges required to deploy and effectively utilize an exascale-class supercomputer (i.e., one capable of performing 10^18 operations per second) in the 2023 timeframe. Since physical (power dissipation) requirements limit clock rates to at most a few GHz, this will necessitate the coordination of on the order of a billion concurrent operations, requiring sophisticated system and application software, and underlying mathematical algorithms, that may differ radically from traditional approaches. Even at the smaller workstation or cluster level of computation, the massive concurrency and heterogeneity within each processor will impact computational scientists. Through the multi-institutional, multi-disciplinary Exascale Co-design Center for Materials in Extreme Environments (ExMatEx), we have initiated an early and deep collaboration between domain (computational materials) scientists, applied mathematicians, computer scientists, and hardware architects, in order to establish the relationships between algorithms, software stacks, and architectures needed to enable exascale-ready materials science application codes within the next decade. In my talk, I will discuss these challenges, and what it will mean for exascale-era electronic structure, molecular dynamics, and engineering-scale simulations of shock-compressed condensed matter. In particular, we anticipate that the emerging hierarchical, heterogeneous architectures can be exploited to achieve higher physical fidelity simulations using adaptive physics refinement. This work is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research.
NASA Technical Reports Server (NTRS)
Stroupe, Ashley W.; Okon, Avi; Robinson, Matthew; Huntsberger, Terry; Aghazarian, Hrand; Baumgartner, Eric
2004-01-01
Robotic Construction Crew (RCC) is a heterogeneous multi-robot system for autonomous acquisition, transport, and precision mating of components in construction tasks. RCC minimizes use of resources that are constrained in a space environment, such as computation, power, communication, and sensing. A behavior-based architecture provides adaptability and robustness despite low computational requirements. RCC successfully performs several construction related tasks in an emulated outdoor environment despite high levels of uncertainty in motions and sensing. Quantitative results are provided for formation keeping in component transport, precision instrument placement, and construction tasks.
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Hardware accelerated high performance neutron transport computation based on AGENT methodology
NASA Astrophysics Data System (ADS)
Xiao, Shanjie
The spatial heterogeneity of the next generation Gen-IV nuclear reactor core designs brings challenges to the neutron transport analysis. The Arbitrary Geometry Neutron Transport (AGENT) code is a three-dimensional neutron transport analysis code being developed at the Laboratory for Neutronics and Geometry Computation (NEGE) at Purdue University. It can accurately describe the spatial heterogeneity in a hierarchical structure through the R-function solid modeler. The previous version of AGENT coupled the 2D transport MOC solver and the 1D diffusion NEM solver to solve the three-dimensional Boltzmann transport equation. In this research, the 2D/1D coupling methodology was expanded to couple two transport solvers, the radial 2D MOC solver and the axial 1D MOC solver, for better accuracy. The expansion was benchmarked with the widely applied C5G7 benchmark models and two fast breeder reactor models, and showed good agreement with the reference Monte Carlo results. In practice, accurate neutron transport analysis for a full reactor core is still time-consuming and thus limits its application. Therefore, another part of my research is focused on designing specific hardware based on the reconfigurable computing technique in order to accelerate AGENT computations. This is the first application of this type to reactor physics and neutron transport for reactor design. The most time-consuming part of the AGENT algorithm was identified, and the architecture of the AGENT acceleration system was designed based on this analysis. Through parallel computation on the specially designed, highly efficient architecture, the acceleration design on FPGA achieves high performance at a much lower working frequency than CPUs. Whole-design simulations show that the acceleration design would be able to speed up large-scale AGENT computations by about 20 times. The high-performance AGENT acceleration system will drastically shorten the computation time for 3D full-core neutron transport analysis, making the AGENT methodology unique and advantageous, and opening the possibility of extending the application range of neutron transport analysis in both industrial engineering and academic research.
Zhang, Xiaotian; Yin, Jian; Zhang, Xu
2018-03-02
Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Currently, many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict underlying miRNA-disease association types, a semi-supervised model, the network-based label propagation algorithm for multiple types of miRNA-disease associations (NLPMMDA), is proposed to infer association types using mutual information derived from a heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity information of miRNAs and diseases to construct a heterogeneous network. NLPMMDA is a semi-supervised model which does not require verified negative samples. Leave-one-out cross validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed the effective performance of NLPMMDA in predicting novel miRNA-disease associations and their association types.
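A generic sketch of network-based label propagation of the kind NLPMMDA builds on; the symmetric normalization and update rule below are the textbook form, and the toy similarity matrix and seed labels are invented (NLPMMDA's exact construction integrates the similarity sources listed above).

```python
import numpy as np

def label_propagation(S, Y, alpha=0.8, iters=200):
    """Iterate F <- alpha * S_norm @ F + (1 - alpha) * Y on a similarity
    network S, with Y holding the known association-type labels."""
    d = S.sum(axis=1)
    S_norm = S / np.sqrt(np.outer(d, d))  # symmetric normalization
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S_norm @ F + (1 - alpha) * Y
    return F

# Toy network: 5 nodes, 3 association types, one labeled seed per type.
rng = np.random.default_rng(3)
S = rng.random((5, 5)); S = (S + S.T) / 2; np.fill_diagonal(S, 0)
Y = np.zeros((5, 3)); Y[0, 0] = Y[1, 1] = Y[2, 2] = 1
print(np.round(label_propagation(S, Y), 3))
```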
A high-speed DAQ framework for future high-level trigger and event building clusters
NASA Astrophysics Data System (ADS)
Caselle, M.; Ardila Perez, L. E.; Balzer, M.; Dritschler, T.; Kopmann, A.; Mohr, H.; Rota, L.; Vogelgesang, M.; Weber, M.
2017-03-01
Modern data acquisition and trigger systems require a throughput of several GB/s and latencies of the order of microseconds. To satisfy such requirements, a heterogeneous readout system based on FPGA readout cards and GPU-based computing nodes coupled by InfiniBand has been developed. The incoming data from the back-end electronics is delivered directly into the internal memory of GPUs through a dedicated peer-to-peer PCIe communication. High performance DMA engines have been developed for direct communication between FPGAs and GPUs using "DirectGMA (AMD)" and "GPUDirect (NVIDIA)" technologies. The proposed infrastructure is a candidate for future generations of event building clusters, high-level trigger filter farms and low-level trigger systems. In this paper the heterogeneous FPGA-GPU architecture will be presented and its performance discussed.
Arcade: A Web-Java Based Framework for Distributed Computing
NASA Technical Reports Server (NTRS)
Chen, Zhikai; Maly, Kurt; Mehrotra, Piyush; Zubair, Mohammad; Bushnell, Dennis M. (Technical Monitor)
2000-01-01
Distributed heterogeneous environments are being increasingly used to execute a variety of large size simulations and computational problems. We are developing Arcade, a web-based environment to design, execute, monitor, and control distributed applications. These targeted applications consist of independent heterogeneous modules which can be executed on a distributed heterogeneous environment. In this paper we describe the overall design of the system and discuss the prototype implementation of the core functionalities required to support such a framework.
Optimization and large scale computation of an entropy-based moment closure
NASA Astrophysics Data System (ADS)
Kristopher Garrett, C.; Hauck, Cory; Hill, Judith
2015-12-01
We present computational advances and results in the implementation of an entropy-based moment closure, M_N, in the context of linear kinetic equations, with an emphasis on heterogeneous and large-scale computing platforms. Entropy-based closures are known in several cases to yield more accurate results than closures based on standard spectral approximations, such as P_N, but the computational cost is generally much higher and often prohibitive. Several optimizations are introduced to improve the performance of entropy-based algorithms over previous implementations. These optimizations include the use of GPU acceleration and the exploitation of the mathematical properties of spherical harmonics, which are used as test functions in the moment formulation. To test the emerging high-performance computing paradigm of communication bound simulations, we present timing results at the largest computational scales currently available. These results show, in particular, load balancing issues in scaling the M_N algorithm that do not appear for the P_N algorithm. We also observe that in weak scaling tests, the ratio in time to solution of M_N to P_N decreases.
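To make the cost asymmetry concrete: each M_N evaluation requires solving a small convex dual optimization for the entropy ansatz, while P_N needs only linear algebra. Below is a hedged sketch of that dual Newton solve for the simplest one-dimensional M_1 case; the quadrature order, target moments, and tolerance are chosen for illustration, not taken from the paper.

```python
import numpy as np

# Dual Newton solve for the 1-D M_1 entropy closure: find multipliers alpha
# so that the ansatz psi(mu) = exp(alpha . m(mu)), m = (1, mu), reproduces
# prescribed moments rho over mu in [-1, 1]. M_N repeats this solve in every
# cell, which is why it is so much costlier than P_N.
mu, w = np.polynomial.legendre.leggauss(40)     # quadrature on [-1, 1]
m = np.vstack([np.ones_like(mu), mu])           # moment basis, shape (2, 40)
rho = np.array([1.0, 0.4])                      # target moments (realizable)

alpha = np.zeros(2)
for _ in range(50):
    psi = np.exp(alpha @ m)                     # entropy ansatz at quad points
    grad = m @ (w * psi) - rho                  # moment mismatch
    hess = (m * (w * psi)) @ m.T                # 2x2 Hessian of the dual
    alpha -= np.linalg.solve(hess, grad)
    if np.linalg.norm(grad) < 1e-12:
        break
print("alpha =", alpha)
print("reconstructed moments =", m @ (w * np.exp(alpha @ m)))
```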
Development of a Cloud Resolving Model for Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.
2017-12-01
A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection, such as convective storm systems. This research describes the porting effort to enable the SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphical Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization, including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results and lessons learned, as well as optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes wherein each node has 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, the ACME-MMF component of the U.S. Department of Energy (DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of the global climate model. Superparameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME and explore its full potential to scientifically and computationally advance climate simulation and prediction.
Computational biomedicine: a challenge for the twenty-first century.
Coveney, Peter V; Shublaq, Nour W
2012-01-01
With the relentless increase of computer power and the widespread availability of digital patient-specific medical data, we are now entering an era when it is becoming possible to develop predictive models of human disease and pathology, which can be used to support and enhance clinical decision-making. The approach amounts to a grand challenge to computational science insofar as we need to be able to provide seamless yet secure access to large scale heterogeneous personal healthcare data in a facile way, typically integrated into complex workflows (some parts of which may need to be run on high performance computers) and into clinical decision support software. In this paper, we review the state of the art in terms of case studies drawn from neurovascular pathologies and HIV/AIDS. These studies are representative of a large number of projects currently being performed within the Virtual Physiological Human initiative. They make demands of information technology at many scales, from the desktop to national and international infrastructures for data storage and processing, linked by high performance networks.
ADAPTIVE-GRID SIMULATION OF GROUNDWATER FLOW IN HETEROGENEOUS AQUIFERS. (R825689C068)
The prediction of contaminant transport in porous media requires the computation of the flow velocity. This work presents a methodology for high-accuracy computation of flow in a heterogeneous isotropic formation, employing a dual-flow formulation and adaptive...
An approach for heterogeneous and loosely coupled geospatial data distributed computing
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui
2010-07-01
Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a concept termed the Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing, as well as a method for equivalent transformation from a global geospatial query to distributed local queries at the SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources, are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
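A toy illustration of the global-to-local query transformation described above, assuming the "equivalent distributed program" amounts to running a rewritten SQL query on each autonomous peer and merging results; the table schema, peer names, and bounding-box predicate are all invented.

```python
import sqlite3

# Two autonomous "peers", each holding its own slice of the geospatial data.
peers = []
for name, rows in [("peer_a", [(1, 10.0, 20.0), (2, 55.0, 60.0)]),
                   ("peer_b", [(3, 12.5, 21.0), (4, 80.0, 5.0)])]:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE poi (id INTEGER, x REAL, y REAL)")
    db.executemany("INSERT INTO poi VALUES (?, ?, ?)", rows)
    peers.append(db)

# Global query: all points inside a bounding box. Each peer runs the same
# equivalent local SQL; the coordinator merges the partial results.
local_sql = ("SELECT id, x, y FROM poi "
             "WHERE x BETWEEN 5 AND 30 AND y BETWEEN 15 AND 25")
merged = [row for db in peers for row in db.execute(local_sql)]
print(sorted(merged))   # [(1, 10.0, 20.0), (3, 12.5, 21.0)]
```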
Finite Dimensional Approximations for Continuum Multiscale Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berlyand, Leonid
2017-01-24
The completed research project concerns the development of novel computational techniques for modeling nonlinear multiscale physical and biological phenomena. Specifically, it addresses the theoretical development and applications of the homogenization theory (coarse graining) approach to calculation of the effective properties of highly heterogeneous biological and bio-inspired materials with many spatial scales and nonlinear behavior. This theory studies properties of strongly heterogeneous media in problems arising in materials science, geoscience, biology, etc. Modeling of such media raises fundamental mathematical questions, primarily in partial differential equations (PDEs) and calculus of variations, the subject of the PI’s research. The focus of the completed research was on mathematical models of biological and bio-inspired materials with the common theme of multiscale analysis and coarse grain computational techniques. Biological and bio-inspired materials offer the unique ability to create environmentally clean functional materials used for energy conversion and storage. These materials are intrinsically complex, with hierarchical organization occurring on many nested length and time scales. The potential to rationally design and tailor the properties of these materials for broad energy applications has been hampered by the lack of computational techniques which are able to bridge from the molecular to the macroscopic scale. The project addressed the challenge of computational treatments of such complex materials by the development of a synergistic approach that combines innovative multiscale modeling/analysis techniques with high performance computing.
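A one-line instance of the homogenization idea above, for the simplest case where the effective property has a closed form: transport across a randomly layered 1-D medium, whose effective conductivity is the harmonic (not arithmetic) mean of the layer values. The numbers are illustrative.

```python
import numpy as np

# Classical 1-D homogenization result: for layers in series, the effective
# conductivity is the harmonic mean of the fine-scale conductivities.
rng = np.random.default_rng(4)
k = rng.uniform(0.1, 10.0, 10000)          # fine-scale layer conductivities

k_arithmetic = k.mean()
k_effective = 1.0 / np.mean(1.0 / k)       # harmonic mean (series layers)
print(f"arithmetic mean: {k_arithmetic:.3f}, "
      f"effective (harmonic): {k_effective:.3f}")
```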
Maintenance of ventricular fibrillation in heterogeneous ventricle.
Arevalo, Hamenegild J; Trayanova, Natalia A
2006-01-01
Although ventricular fibrillation (VF) is the prevalent cause of sudden cardiac death, the mechanisms that underlie VF remain elusive. One possible explanation is that VF is driven by a single robust rotor that is the source of wavefronts that break up due to functional heterogeneities. Previous 2D computer simulations have proposed that a heterogeneity in background potassium current (IK1) can serve as the substrate for the formation of mother rotor activity. This study incorporates IK1 heterogeneity between the left and right ventricle in a realistic 3D rabbit ventricle model to examine its effects on the organization of VF. Computer simulations show that the IK1 heterogeneity contributes to the initiation and maintenance of VF by providing regions of different refractoriness which serve as sites of wave break and rotor formation. A single rotor that drives the fibrillatory activity in the ventricle is not found in this study. Instead, multiple sites of reentry are recorded throughout the ventricle. Calculation of dominant frequencies for each myocardial node yields no significant difference between the dominant frequency of the LV and the RV. The 3D computer simulations suggest that IK1 spatial heterogeneity alone cannot lead to the formation of a stable rotor.
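The dominant-frequency calculation mentioned above is straightforward to sketch; the trace below is a synthetic stand-in for a nodal transmembrane potential, not simulation output from the study.

```python
import numpy as np

def dominant_frequency(v, dt):
    """Dominant frequency of a transmembrane-potential trace: the peak of
    the power spectrum, excluding the DC component."""
    power = np.abs(np.fft.rfft(v - v.mean()))**2
    freqs = np.fft.rfftfreq(len(v), dt)
    return freqs[np.argmax(power[1:]) + 1]   # skip DC bin, re-offset index

# Toy trace: 12 Hz rotor-like activity plus noise, sampled at 1 kHz for 2 s.
t = np.arange(0, 2.0, 1e-3)
v = (np.sin(2 * np.pi * 12 * t)
     + 0.3 * np.random.default_rng(5).standard_normal(t.size))
print(f"dominant frequency: {dominant_frequency(v, 1e-3):.1f} Hz")
```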
Theoretical foundation for measuring the groundwater age distribution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, William Payton; Arnold, Bill Walter
2014-01-01
In this study, we use PFLOTRAN, a highly scalable, parallel, flow and reactive transport code to simulate the concentrations of 3H, 3He, CFC-11, CFC-12, CFC-113, SF6, 39Ar, 81Kr, 4He and the mean groundwater age in heterogeneous fields on grids with an excess of 10 million nodes. We utilize this computational platform to simulate the concentration of multiple tracers in high-resolution, heterogeneous 2-D and 3-D domains, and calculate tracer-derived ages. Tracer-derived ages show systematic biases toward younger ages when the groundwater age distribution contains water older than the maximum tracer age. The deviation of the tracer-derived age distribution from the true groundwater age distribution increases with increasing heterogeneity of the system. However, the effect of heterogeneity is diminished as the mean travel time gets closer to the tracer age limit. Age distributions in 3-D domains differ significantly from those in 2-D domains. 3-D simulations show decreased mean age and less variance in the age distribution for identical heterogeneity statistics. High-performance computing allows for investigation of tracer and groundwater age systematics in high-resolution domains, providing a platform for understanding and utilizing environmental tracer and groundwater age information in heterogeneous 3-D systems. Groundwater environmental tracers can provide important constraints for the calibration of groundwater flow models. Direct simulation of environmental tracer concentrations in models has the additional advantage of avoiding assumptions associated with using calculated groundwater age values. This study quantifies model uncertainty reduction resulting from the addition of environmental tracer concentration data. The analysis uses a synthetic heterogeneous aquifer and the calibration of a flow and transport model using the pilot point method. Results indicate a significant reduction in the uncertainty in permeability with the addition of environmental tracer data, relative to the use of hydraulic measurements alone. Anthropogenic tracers and their decay products, such as CFC-11, 3H, and 3He, provide significant constraint on input permeability values in the model. Tracer data for 39Ar provide even more complete information on the heterogeneity of permeability and variability in the flow system than the anthropogenic tracers, leading to greater parameter uncertainty reduction.
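A small numeric illustration of the young-age bias described above, under the usual piston-flow interpretation of a single decaying tracer; the end-member ages, mixing fractions, and 39Ar half-life value are assumptions for the demo.

```python
import numpy as np

# Mix young water with water older than the tracer's dating range, then
# "date" the mixture with a decaying tracer (39Ar, half-life ~269 yr).
half_life = 269.0
lam = np.log(2) / half_life

ages = np.array([50.0, 2000.0])     # true ages of the two end-members (yr)
fractions = np.array([0.5, 0.5])    # mixing fractions

true_mean_age = fractions @ ages
c_mix = fractions @ np.exp(-lam * ages)   # normalized 39Ar concentration
apparent_age = -np.log(c_mix) / lam       # piston-flow tracer age
print(f"true mean age: {true_mean_age:.0f} yr, "
      f"39Ar apparent age: {apparent_age:.0f} yr")   # apparent age is younger
```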
Ogawa, S.; Komini Babu, S.; Chung, H. T.; ...
2016-08-22
The nano/micro-scale geometry of polymer electrolyte fuel cell (PEFC) catalyst layers critically affects cell performance. The small length scales and complex structure of these composite layers make it challenging to analyze cell performance and physics at the particle scale by experiment. We present a computational method to simulate transport and chemical reaction phenomena at the pore/particle scale and apply it to a PEFC cathode with a platinum-group-metal-free (PGM-free) catalyst. Here, we numerically solve the governing equations for the physics with a heterogeneous oxygen diffusion coefficient and proton conductivity evaluated using the actual electrode structure and ionomer distribution obtained using nano-scale-resolution X-ray computed tomography (nano-CT). Using this approach, the oxygen concentration and electrolyte potential distributions imposed by the oxygen reduction reaction are solved, and the impact of the catalyst layer structure on performance is evaluated.
Rinkevicius, Zilvinas; Li, Xin; Sandberg, Jaime A R; Mikkelsen, Kurt V; Ågren, Hans
2014-03-11
We introduce a density functional theory/molecular mechanical approach for the computation of linear response properties of molecules in heterogeneous environments, such as metal surfaces or nanoparticles embedded in solvents. The heterogeneous embedding environment, consisting of metallic and nonmetallic parts, is described by combined force fields, where conventional force fields are used for the nonmetallic part and capacitance-polarization-based force fields are used for the metallic part. The presented approach enables studies of properties and spectra of systems embedded in or placed at arbitrarily shaped metallic surfaces, clusters, or nanoparticles. The capability and performance of the proposed approach are illustrated by sample calculations of optical absorption spectra of thymidine adsorbed on gold surfaces in an aqueous environment, where we study how different organizations of the gold surface and the combined, nonadditive effect of the two environments are reflected in the optical absorption spectrum.
Zhong, Qing; Rüschoff, Jan H.; Guo, Tiannan; Gabrani, Maria; Schüffler, Peter J.; Rechsteiner, Markus; Liu, Yansheng; Fuchs, Thomas J.; Rupp, Niels J.; Fankhauser, Christian; Buhmann, Joachim M.; Perner, Sven; Poyet, Cédric; Blattner, Miriam; Soldini, Davide; Moch, Holger; Rubin, Mark A.; Noske, Aurelia; Rüschoff, Josef; Haffner, Michael C.; Jochum, Wolfram; Wild, Peter J.
2016-01-01
Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility. PMID:27052161
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhard Strydom; Su-Jong Yoon
2014-04-01
A Computational Fluid Dynamics (CFD) evaluation of homogeneous and heterogeneous fuel models was performed as part of the Phase I calculations of the International Atomic Energy Agency (IAEA) Coordinated Research Program (CRP) on High Temperature Reactor (HTR) Uncertainties in Modeling (UAM). This study focused on the nominal localized stand-alone fuel thermal response, as defined in Ex. I-3 and I-4 of the HTR UAM. The aim of the stand-alone thermal unit-cell simulation is to isolate the effect of material and boundary input uncertainties on a very simplified problem, before these uncertainties are propagated in the subsequent coupled neutronics/thermal-fluids phases of the benchmark. In many previous studies of high temperature gas cooled reactors, the volume-averaged homogeneous mixture model of a single fuel compact has been applied. In the homogeneous model, the Tristructural Isotropic (TRISO) fuel particles in the fuel compact are not modeled directly and an effective thermal conductivity is employed for the thermo-physical properties of the fuel compact. In contrast, in the heterogeneous model, the uranium carbide (UCO), inner and outer pyrolytic carbon (IPyC/OPyC) and silicon carbide (SiC) layers of the TRISO fuel particles are explicitly modeled. The fuel compact is modeled as a heterogeneous mixture of TRISO fuel kernels embedded in H-451 matrix graphite. In this study, steady-state and transient CFD simulations were performed with both homogeneous and heterogeneous models to compare their thermal characteristics. The nominal values of the input parameters are used for this CFD analysis. In a future study, the effects of input uncertainties in the material properties and boundary parameters will be investigated and reported.
Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations
NASA Astrophysics Data System (ADS)
Detrixhe, Miles; Gibou, Frédéric
2016-10-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
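As context for the serial kernel being parallelized, here is a minimal sketch (an illustration of the standard method under textbook assumptions, not the authors' hybrid code) of the classic fast sweeping iteration for the 2-D eikonal equation |grad u| = 1/speed, using Gauss-Seidel passes in the four sweep orderings.

    # A minimal 2-D fast sweeping sketch for the eikonal equation (Zhao-style update).
    import numpy as np

    def fast_sweep_eikonal(speed, h, n_iters=8):
        """Travel time u from a point source; speed: (ny, nx) array; h: grid spacing."""
        ny, nx = speed.shape
        u = np.full((ny, nx), 1e10)
        u[ny // 2, nx // 2] = 0.0                       # point source at the center
        for _ in range(n_iters):
            for sy, sx in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:   # 4 sweep orderings
                for i in range(ny)[::sy]:
                    for j in range(nx)[::sx]:
                        fh = h / speed[i, j]            # local slowness * spacing
                        a = min(u[max(i - 1, 0), j], u[min(i + 1, ny - 1), j])
                        b = min(u[i, max(j - 1, 0)], u[i, min(j + 1, nx - 1)])
                        if abs(a - b) >= fh:            # causal one-sided update
                            cand = min(a, b) + fh
                        else:                           # two-sided quadratic update
                            cand = 0.5 * (a + b + np.sqrt(2 * fh * fh - (a - b) ** 2))
                        u[i, j] = min(u[i, j], cand)
        return u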
Rate decline curves analysis of multiple-fractured horizontal wells in heterogeneous reservoirs
NASA Astrophysics Data System (ADS)
Wang, Jiahang; Wang, Xiaodong; Dong, Wenxiu
2017-10-01
In heterogeneous reservoirs with multiple-fractured horizontal wells (MFHWs), due to the high-density network of artificial hydraulic fractures, the fluid flow around fracture tips behaves like non-linear flow. Moreover, the production behaviors of different artificial hydraulic fractures also differ. A rigorous semi-analytical model for MFHWs in heterogeneous reservoirs is presented by combining the source function with the boundary element method. The model is first validated against both an analytical model and a simulation model. Then new Blasingame type curves are established. Finally, the effects of critical parameters on the rate decline characteristics of MFHWs are discussed. The results show that heterogeneity has a significant influence on the rate decline characteristics of MFHWs; the parameters related to the MFHWs, such as fracture conductivity and length, can also affect the rate characteristics of MFHWs. One novelty of this model is the consideration of the elliptical flow around artificial hydraulic fracture tips. Therefore, our model can be used to predict rate performance more accurately for MFHWs in heterogeneous reservoirs. The other novelty is the ability to model the different production behavior at different fracture stages. Compared to numerical and analytical methods, this model not only reduces extensive computation but also shows high accuracy.
Anchang, Benedict; Davis, Kara L.; Fienberg, Harris G.; Bendall, Sean C.; Karacosta, Loukia G.; Tibshirani, Robert; Nolan, Garry P.; Plevritis, Sylvia K.
2018-01-01
An individual malignant tumor is composed of a heterogeneous collection of single cells with distinct molecular and phenotypic features, a phenomenon termed intratumoral heterogeneity. Intratumoral heterogeneity poses challenges for cancer treatment, motivating the need for combination therapies. Single-cell technologies are now available to guide effective drug combinations by accounting for intratumoral heterogeneity through the analysis of the signaling perturbations of an individual tumor sample screened by a drug panel. In particular, Mass Cytometry Time-of-Flight (CyTOF) is a high-throughput single-cell technology that enables the simultaneous measurements of multiple (>40) intracellular and surface markers at the level of single cells for hundreds of thousands of cells in a sample. We developed a computational framework, entitled Drug Nested Effects Models (DRUG-NEM), to analyze CyTOF single-drug perturbation data for the purpose of individualizing drug combinations. DRUG-NEM optimizes drug combinations by choosing the minimum number of drugs that produce the maximal desired intracellular effects based on nested effects modeling. We demonstrate the performance of DRUG-NEM using single-cell drug perturbation data from tumor cell lines and primary leukemia samples. PMID:29654148
NASA Technical Reports Server (NTRS)
DeBaca, Richard C.; Sarkissian, Edwin; Madatyan, Mariyetta; Shepard, Douglas; Gluck, Scott; Apolinski, Mark; McDuffie, James; Tremblay, Dennis
2006-01-01
TES L1B Subsystem is a computer program that performs several functions for the Tropospheric Emission Spectrometer (TES). The term "L1B" (an abbreviation of "level 1B") refers to data, specific to the TES, on radiometrically calibrated spectral radiances and their corresponding noise equivalent spectral radiances (NESRs), plus ancillary geolocation, quality, and engineering data. The functions performed by TES L1B Subsystem include shear analysis, monitoring of signal levels, detection of ice build-up, and phase correction and radiometric and spectral calibration of TES target data. Also, the program computes NESRs for target spectra, writes scientific TES level-1B data to hierarchical-data-format (HDF) files for public distribution, computes brightness temperatures, and quantifies interpixel signal variability for the purpose of first-order cloud and heterogeneous-land screening by the level-2 software summarized in the immediately following article. This program uses an in-house-developed algorithm, called "NUSRT," to correct instrument line-shape factors.
A computer-assisted study of pulse dynamics in anisotropic media
NASA Astrophysics Data System (ADS)
Krishnan, J.; Engelborghs, K.; Bär, M.; Lust, K.; Roose, D.; Kevrekidis, I. G.
2001-06-01
This study focuses on the computer-assisted stability analysis of travelling pulse-like structures in spatially periodic heterogeneous reaction-diffusion media. The physical motivation comes from pulse propagation in thin annular domains on a diffusionally anisotropic catalytic surface. The study was performed by computing the travelling pulse-like structures as limit cycles of the spatially discretized PDE, which in turn is performed in two ways: a Newton method based on a pseudospectral discretization of the PDE, and a Newton-Picard method based on a finite difference discretization. Details about the spectra of these modulated pulse-like structures are discussed, including how they may be compared with the spectra of pulses in homogeneous media. The effects of anisotropy on the dynamics of pulses and pulse pairs are studied. Beyond shifting the location of bifurcations present in homogeneous media, anisotropy can also introduce certain new instabilities.
Carlos Varas, Álvaro E; Peters, E A J F; Kuipers, J A M
2017-05-17
We report a computational fluid dynamics-discrete element method (CFD-DEM) simulation study on the interplay between mass transfer and a heterogeneous catalyzed chemical reaction in cocurrent gas-particle flows as encountered in risers. Slip velocity, axial gas dispersion, gas bypassing, and particle mixing phenomena have been evaluated under riser flow conditions to study the complex system behavior in detail. The most important factors are found to be directly related to particle cluster formation. Low air-to-solids flux ratios lead to more heterogeneous systems, where the cluster formation is more pronounced and mass transfer more influenced. Falling clusters can be partially circumvented by the gas phase, which therefore does not fully interact with the cluster particles, leading to poor gas-solid contact efficiencies. Cluster gas-solid contact efficiencies are quantified at several gas superficial velocities, reaction rates, and dilution factors in order to gain more insight regarding the influence of clustering phenomena on the performance of riser reactors.
High performance computation of radiative transfer equation using the finite element method
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Favennec, Y.
2018-05-01
This article deals with an efficient strategy for numerically simulating radiative transfer phenomena using distributed computing. The finite element method alongside the discrete ordinate method is used for the spatio-angular discretization of the monochromatic steady-state radiative transfer equation in an anisotropically scattering medium. Two very different methods of parallelization, angular and spatial decomposition, are presented. To do so, the finite element method is used in a vectorial way. A detailed comparison of scalability, performance, and efficiency on thousands of processors is established for two- and three-dimensional heterogeneous test cases. Timings show that both algorithms scale well when using proper preconditioners. It is also observed that our angular decomposition scheme outperforms our domain decomposition method. Overall, we perform numerical simulations at scales that were previously unattainable by standard radiative transfer equation solvers.
CROPPER: a metagene creator resource for cross-platform and cross-species compendium studies.
Paananen, Jussi; Storvik, Markus; Wong, Garry
2006-09-22
Current genomic research methods provide researchers with enormous amounts of data. Combining data from different high-throughput research technologies commonly available in biological databases can lead to novel findings and increase research efficiency. However, combining data from different heterogeneous sources is often a very arduous task. These sources can be different microarray technology platforms, genomic databases, or experiments performed on various species. Our aim was to develop a software program that could facilitate the combining of data from heterogeneous sources, and thus allow researchers to perform genomic cross-platform/cross-species studies and to use existing experimental data for compendium studies. We have developed a web-based software resource, called CROPPER that uses the latest genomic information concerning different data identifiers and orthologous genes from the Ensembl database. CROPPER can be used to combine genomic data from different heterogeneous sources, allowing researchers to perform cross-platform/cross-species compendium studies without the need for complex computational tools or the requirement of setting up one's own in-house database. We also present an example of a simple cross-platform/cross-species compendium study based on publicly available Parkinson's disease data derived from different sources. CROPPER is a user-friendly and freely available web-based software resource that can be successfully used for cross-species/cross-platform compendium studies.
A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.
Wei, Wei; Gao, Bin; Liu, Tie-Yan; Wang, Taifeng; Li, Guohui; Li, Hang
2016-04-01
Graph-based ranking has been extensively studied and frequently applied in many applications, such as webpage ranking. It aims at mining potentially valuable information from raw graph-structured data. Recently, with the proliferation of rich heterogeneous information (e.g., node/edge features and prior knowledge) available in many real-world graphs, how to effectively and efficiently leverage all of this information to improve ranking performance has become a new and challenging problem. Previous methods utilize only part of such information and attempt to rank graph nodes using link-based methods, whose ranking performance is severely affected by several well-known issues, e.g., over-fitting or high computational complexity, especially when the scale of the graph is very large. In this paper, we address the large-scale graph-based ranking problem and focus on how to effectively exploit the rich heterogeneous information of the graph to improve ranking performance. Specifically, we propose an innovative and effective semi-supervised PageRank (SSP) approach to parameterize the derived information within a unified semi-supervised learning framework (SSLF-GR), and then simultaneously optimize the parameters and the ranking scores of graph nodes. Experiments on real-world large-scale graphs demonstrate that our method significantly outperforms algorithms that consider such graph information only partially.
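For orientation, a minimal sketch of the link-based baseline with node-level prior knowledge folded in as a personalization vector; this is generic personalized PageRank by power iteration (my illustration), not the paper's SSP parameterization.

    # Personalized PageRank by power iteration; s is a stand-in for the kind of
    # per-node prior information a semi-supervised variant could exploit.
    import numpy as np

    def personalized_pagerank(A, s, damping=0.85, tol=1e-10, max_iter=200):
        """A: (n, n) adjacency matrix; s: personalization vector summing to 1."""
        out_deg = A.sum(axis=1, keepdims=True)
        # Row-normalize to a transition matrix; dangling rows stay all-zero here.
        P = np.divide(A, out_deg, out=np.zeros_like(A, dtype=float), where=out_deg > 0)
        r = np.full(A.shape[0], 1.0 / A.shape[0])
        for _ in range(max_iter):
            r_new = damping * (P.T @ r) + (1 - damping) * s
            if np.abs(r_new - r).sum() < tol:
                break
            r = r_new
        return r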
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processing Units (GPUs), exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing, where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance portability can be reached.
NASA Astrophysics Data System (ADS)
Li, Hechao
An accurate knowledge of the complex microstructure of a heterogeneous material is crucial for establishing quantitative structure-property relations and for predicting and optimizing its performance. X-ray tomography has provided a non-destructive means for microstructure characterization in both 3D and 4D (i.e., structural evolution over time). Traditional reconstruction algorithms like the filtered-back-projection (FBP) method or algebraic reconstruction techniques (ART) require a huge number of tomographic projections and a segmentation process before microstructural quantification can be conducted. This can be quite time consuming and computationally intensive. In this thesis, a novel procedure is first presented that allows one to directly extract key structural information, in the form of spatial correlation functions, from limited X-ray tomography data. The key component of the procedure is the computation of a "probability map", which provides the probability that an arbitrary point in the material system belongs to a specific phase. The correlation functions of interest are then readily computed from the probability map. Using effective medium theory, accurate predictions of physical properties (e.g., elastic moduli) can be obtained. Secondly, a stochastic optimization procedure is presented that enables one to accurately reconstruct material microstructure from a small number of X-ray tomographic projections (e.g., 20-40). Moreover, a stochastic procedure for multi-modal data fusion is proposed, in which both X-ray projections and correlation functions computed from limited 2D optical images are fused to accurately reconstruct complex heterogeneous materials in 3D. This multi-modal reconstruction algorithm is shown to integrate the complementary data effectively, indicating its high efficiency in using limited structural information. Finally, the accuracy of the stochastic reconstruction procedure using limited X-ray projection data is ascertained by analyzing the microstructural degeneracy and the roughness of the energy landscape associated with different numbers of projections. The ground-state degeneracy of a microstructure is found to decrease with an increasing number of projections, indicating a higher probability that the reconstructed configurations match the actual microstructure. The roughness of the energy landscape can also provide information about the complexity and convergence behavior of the reconstruction for a given microstructure and projection number.
Using Theory and Simulation to Design Self-Healing Surfaces
2007-11-16
Balazs, Anna C. (University of Pittsburgh)
[Report documentation page; only fragments of the abstract are recoverable.] A novel computational approach is used to simulate the rolling motion of fluid-driven, particle-filled microcapsules along heterogeneous, adhesive substrates, establishing guidelines for designing particle-filled microcapsules that perform a "repair and go" function and could ultimately be used to restore …
An adaptive Gaussian process-based iterative ensemble smoother for data assimilation
NASA Astrophysics Data System (ADS)
Ju, Lei; Zhang, Jiangjiang; Meng, Long; Wu, Laosheng; Zeng, Lingzao
2018-05-01
Accurate characterization of subsurface hydraulic conductivity is vital for modeling of subsurface flow and transport. The iterative ensemble smoother (IES) has been proposed to estimate heterogeneous parameter fields. As a Monte Carlo-based method, IES requires a relatively large ensemble size to guarantee its performance. To improve the computational efficiency, we propose an adaptive Gaussian process (GP)-based iterative ensemble smoother (GPIES) in this study. At each iteration, the GP surrogate is adaptively refined by adding a few new base points chosen from the updated parameter realizations. Then the sensitivity information between model parameters and measurements is calculated from a large number of realizations generated by the GP surrogate with virtually no computational cost. Since the original model evaluations are required only for the base points, whose number is much smaller than the ensemble size, the computational cost is significantly reduced. The applicability of GPIES in estimating heterogeneous conductivity is evaluated on saturated and unsaturated flow problems, respectively. Without sacrificing estimation accuracy, GPIES achieves about an order of magnitude speed-up compared with the standard IES. Although subsurface flow problems are considered in this study, the proposed method can be equally applied to other hydrological models.
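To make the smoother step concrete, here is a minimal sketch of one Kalman-type ensemble smoother update (a generic textbook formulation, not GPIES itself); in the scheme the abstract describes, the prediction ensemble `preds` would come almost for free from the GP surrogate rather than from the full model.

    # One ensemble smoother update with perturbed observations.
    import numpy as np

    def es_update(params, preds, obs, obs_var):
        """params: (n_ens, n_par); preds: (n_ens, n_obs); obs_var: observation variances."""
        dp = params - params.mean(axis=0)
        dd = preds - preds.mean(axis=0)
        n = params.shape[0] - 1
        C_pd = dp.T @ dd / n                       # parameter/prediction cross-covariance
        C_dd = dd.T @ dd / n + np.diag(obs_var)    # prediction covariance + obs noise
        K = C_pd @ np.linalg.inv(C_dd)             # Kalman-type gain
        perturbed = obs + np.sqrt(obs_var) * np.random.randn(*preds.shape)
        return params + (perturbed - preds) @ K.T  # updated parameter ensemble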
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources, have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in recent years combine conventional multi-core CPUs with GPU accelerators and provide an opportunity for a manifold increase in computing performance. Therefore, these systems represent an efficient platform for implementing a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to a few weeks, depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide interested external researchers with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition, responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks; (2) a high-performance noise source mapping application, responsible for the generation of source maps using cross-correlation of seismic records; (3) back-end infrastructure for the coordination of various tasks and computations; (4) a front-end Web interface providing the service to the end-users; and (5) a data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of the logarithmic energy ratio, and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular the selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, the design of a general service-oriented architecture for coordination of the various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dreher, M.; Peterka, T.
Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a message-driven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steering based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components has been demonstrated in situ on HPC systems.
Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, David H; Dubois, Andrew J; Boorman, Thomas M
2009-01-01
This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm on a variety of systems to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, the SRC Computers, Inc. MAPStation SRC-6 FPGA-enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations, wall clock time is measured, including all transfer overhead and compute timings.
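For reference, the algorithm being benchmarked is short enough to state in full; the following is a minimal NumPy sketch of non-preconditioned conjugate gradient for a symmetric positive-definite system (the Cell/FPGA-specific data movement the abstract measures is omitted).

    # Non-preconditioned conjugate gradient for A x = b, A symmetric positive-definite.
    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
        n = b.size
        x = np.zeros(n)
        r = b.copy()                 # residual r = b - A x, with x starting at 0
        p = r.copy()                 # initial search direction
        rs = r @ r
        for _ in range(max_iter or n):
            Ap = A @ p
            alpha = rs / (p @ Ap)    # exact line search along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p   # new A-conjugate direction
            rs = rs_new
        return x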
A new ChainMail approach for real-time soft tissue simulation.
Zhang, Jinao; Zhong, Yongmin; Smith, Julian; Gu, Chengfan
2016-07-03
This paper presents a new ChainMail method for real-time soft tissue simulation. This method enables the use of different material properties for chain elements to accommodate various materials. Based on the ChainMail bounding region, a new time-saving scheme is developed to improve computational efficiency for isotropic materials. The proposed method also conserves volume and strain energy. Experimental results demonstrate that the proposed ChainMail method can not only accommodate isotropic, anisotropic and heterogeneous materials but also model incompressibility and relaxation behaviors of soft tissues. Further, the proposed method can achieve real-time computational performance.
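As background, the essence of ChainMail-style constraint propagation can be sketched in one dimension (a generic illustration under assumed min/max spacing constraints, not the paper's 3-D heterogeneous formulation; per-element gap bounds are where different material properties could enter).

    # 1-D ChainMail sketch: moving one element drags neighbours only as far as
    # needed to keep inter-element spacing within [min_gap, max_gap].
    def chainmail_1d(x, moved_idx, new_pos, min_gap, max_gap):
        x = list(x)
        x[moved_idx] = new_pos
        for i in range(moved_idx + 1, len(x)):        # propagate to the right
            gap = x[i] - x[i - 1]
            if gap > max_gap:
                x[i] = x[i - 1] + max_gap
            elif gap < min_gap:
                x[i] = x[i - 1] + min_gap
            else:
                break                                  # constraint satisfied; stop early
        for i in range(moved_idx - 1, -1, -1):         # propagate to the left
            gap = x[i + 1] - x[i]
            if gap > max_gap:
                x[i] = x[i + 1] - max_gap
            elif gap < min_gap:
                x[i] = x[i + 1] - min_gap
            else:
                break
        return x

    # Example: chainmail_1d([0, 1, 2, 3], 0, 1.5, 0.5, 1.2) -> [1.5, 2.0, 2.5, 3];
    # the disturbance stops once a neighbour's spacing is already admissible.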
Influencing Trust for Human-Automation Collaborative Scheduling of Multiple Unmanned Vehicles.
Clare, Andrew S; Cummings, Mary L; Repenning, Nelson P
2015-11-01
We examined the impact of priming on operator trust and system performance when supervising a decentralized network of heterogeneous unmanned vehicles (UVs). Advances in autonomy have enabled a future vision of single-operator control of multiple heterogeneous UVs. Real-time scheduling for multiple UVs in uncertain environments requires the computational ability of optimization algorithms combined with the judgment and adaptability of human supervisors. Because of system and environmental uncertainty, appropriate operator trust will be instrumental to maintain high system performance and prevent cognitive overload. Three groups of operators experienced different levels of trust priming prior to conducting simulated missions in an existing, multiple-UV simulation environment. Participants who play computer and video games frequently were found to have a higher propensity to overtrust automation. By priming gamers to lower their initial trust to a more appropriate level, system performance was improved by 10% as compared to gamers who were primed to have higher trust in the automation. Priming was successful at adjusting the operator's initial and dynamic trust in the automated scheduling algorithm, which had a substantial impact on system performance. These results have important implications for personnel selection and training for futuristic multi-UV systems under human supervision. Although gamers may bring valuable skills, they may also be potentially prone to automation bias. Priming during training and regular priming throughout missions may be one potential method for overcoming this propensity to overtrust automation.
Buscail, L; Escourrou, J; Moreau, J; Delvaux, M; Louvel, D; Lapeyre, F; Tregant, P; Frexinos, J
1995-04-01
The usefulness and accuracy rate of endoscopic ultrasonography (EUS) in the diagnosis of chronic pancreatitis (CP) were prospectively evaluated in 81 patients with suspected pancreatic disease. All underwent EUS, abdominal ultrasonography (AUS), and computed tomography (CT), and endoscopic retrograde cholangiopancreatography (ERCP) was performed in 55 of the cases. The diagnosis of CP was established in 44 patients (CP group), including 24 with a calcified form. No pancreatic disease was observed in 18 patients (control group), and 19 patients had a pancreatic tumor. In the CP group, AUS was less accurate than EUS in visualizing the pancreas, the performance of CT being identical to that of EUS in this respect. A good correlation was observed between EUS and ERCP for visualization and measurement of the Wirsung duct. The most significant changes observed by EUS in the CP group were dilatation of the main pancreatic duct, heterogeneous echogenicity of the pancreatic parenchyma, and cysts < 20 mm in size, even in noncalcified CP or with normal pancreatograms. The sensitivity of EUS for the diagnosis of CP was 88% (AUS, 58%; ERCP, 74%; CT, 75%), the specificity being 100% for ERCP and EUS, 95% for CT, and 75% for AUS. The good performance of EUS allows early diagnosis of CP in symptomatic patients, since heterogeneous echogenicity of the pancreatic parenchyma seems to be almost specifically associated with the disease.
Scaling of flow and transport behavior in heterogeneous groundwater systems
NASA Astrophysics Data System (ADS)
Scheibe, Timothy; Yabusaki, Steven
1998-11-01
Three-dimensional numerical simulations using a detailed synthetic hydraulic conductivity field developed from geological considerations provide insight into the scaling of subsurface flow and transport processes. Flow and advective transport in the highly resolved heterogeneous field were modeled using massively parallel computers, providing a realistic baseline for evaluation of the impacts of parameter scaling. Upscaling of hydraulic conductivity was performed at a variety of scales using a flexible power law averaging technique. A series of tests were performed to determine the effects of varying the scaling exponent on a number of metrics of flow and transport behavior. Flow and transport simulation on high-performance computers and three-dimensional scientific visualization combine to form a powerful tool for gaining insight into the behavior of complex heterogeneous systems. Many quantitative groundwater models utilize upscaled hydraulic conductivity parameters, either implicitly or explicitly. These parameters are designed to reproduce the bulk flow characteristics at the grid or field scale while not requiring detailed quantification of local-scale conductivity variations. An example from applied groundwater modeling is the common practice of calibrating grid-scale model hydraulic conductivity or transmissivity parameters so as to approximate observed hydraulic head and boundary flux values. Such parameterizations, perhaps with a bulk dispersivity imposed, are then sometimes used to predict transport of reactive or non-reactive solutes. However, this work demonstrates that those parameters that lead to the best upscaling for hydraulic conductivity and head do not necessarily correspond to the best upscaling for prediction of a variety of transport behaviors. This result reflects the fact that transport is strongly impacted by the existence and connectedness of extreme-valued hydraulic conductivities, in contrast to bulk flow, which depends more strongly on mean values. It provides motivation for continued research into upscaling methods for transport that directly address advection in heterogeneous porous media. An electronic version of this article is available online at the journal's homepage at http://www.elsevier.nl/locate/advwatres or http://www.elsevier.com/locate/advwatres (see "Special section on visualization"). The online version contains additional supporting information, graphics, and a 3D animation of simulated particle movement.
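The flexible power-law averaging mentioned above has a compact form; here is a minimal sketch (an illustration of the standard technique, with made-up block values) showing how the scaling exponent p sweeps the upscaled conductivity between the harmonic, geometric, and arithmetic means.

    # Power-law averaging: K_up = (mean(K^p))^(1/p); p = 1 arithmetic, p = -1
    # harmonic, and p -> 0 recovers the geometric mean.
    import numpy as np

    def power_law_average(K, p):
        K = np.asarray(K, dtype=float)
        if abs(p) < 1e-12:                     # p -> 0 limit: geometric mean
            return np.exp(np.mean(np.log(K)))
        return np.mean(K ** p) ** (1.0 / p)

    # Varying p spans the classical bounds on an upscaled block conductivity:
    # harmonic (flow across layers) <= geometric <= arithmetic (flow along layers).
    block = [1e-5, 3e-4, 2e-3, 7e-4]
    for p in (-1.0, 0.0, 1.0):
        print(p, power_law_average(block, p))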
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
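The master/worker pattern the abstract describes is easy to caricature locally; below is a minimal sketch using Python multiprocessing in place of a virtual cluster and MPI (the function names and the trivial "transport" kernel are illustrative stand-ins, not the EGS5 deployment).

    # Distribute independent Monte Carlo histories across workers; aggregate on the master.
    import multiprocessing as mp
    import random

    def simulate_histories(args):
        n_histories, seed = args
        rng = random.Random(seed)          # independent stream per worker
        tally = 0.0
        for _ in range(n_histories):       # stand-in for photon/electron transport
            tally += rng.random()
        return tally

    if __name__ == "__main__":
        n_workers, total = 4, 1_000_000
        chunks = [(total // n_workers, seed) for seed in range(n_workers)]
        with mp.Pool(n_workers) as pool:
            partial = pool.map(simulate_histories, chunks)
        print("aggregated tally:", sum(partial))   # results gathered on the master

Because the histories are independent, run time scales inversely with the number of workers until communication overhead dominates, which is the behavior the abstract reports for its 100-node cloud cluster.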
Two-stage atlas subset selection in multi-atlas based image segmentation.
Zhao, Tingting; Ruan, Dan
2015-06-01
Fast growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, and also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs that arise in the face of a large atlas collection of varied quality, so that high-accuracy segmentation can be achieved with low computational cost. An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates comparable end-to-end segmentation performance as the conventional single-stage selection method, but with significant computation reduction. Compared with the alternative computation reduction method, their scheme improves the mean and median Dice similarity coefficient values from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.
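Stripped of the registration machinery, the selection logic reduces to two ranked trims; a minimal sketch follows, with `cheap_score` and `costly_score` as illustrative stand-ins for the preliminary and refined relevance metrics.

    # Two-stage atlas selection: a cheap metric trims the full collection to an
    # augmented subset; an expensive metric refines it to the final fusion set.
    def two_stage_select(atlases, target, cheap_score, costly_score,
                         augmented_size, fusion_size):
        # Stage 1: preliminary selection with a low-cost similarity metric
        prelim = sorted(atlases, key=lambda a: cheap_score(a, target), reverse=True)
        augmented = prelim[:augmented_size]
        # Stage 2: full-fledged registration only on the stage-1 survivors
        refined = sorted(augmented, key=lambda a: costly_score(a, target), reverse=True)
        return refined[:fusion_size]

The cost saving comes from running `costly_score` on only `augmented_size` atlases instead of the whole collection; the paper's inference model is what sizes the augmented subset so the truly relevant atlases survive stage 1 with high probability.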
Cheng, Tao; Zhang, Guoyou; Zhang, Xianlong
2011-12-01
The aim of computer-assisted surgery is to improve accuracy and limit the range of surgical variability. However, a worldwide debate exists regarding the importance and usefulness of computer-assisted navigation for total knee arthroplasty (TKA). The main purpose of this study is to summarize and compare the radiographic outcomes of TKA performed using imageless computer-assisted navigation with those of conventional techniques. An electronic search of the PubMed, EMBASE, Web of Science, and Cochrane Library databases was made, in addition to a manual search of major orthopedic journals. A meta-analysis of 29 quasi-randomized/randomized controlled trials (quasi-RCTs/RCTs) and 11 prospective comparative studies was conducted through a random effects model. Additional a priori sources of clinical heterogeneity were evaluated by subgroup analysis with regard to radiographic methods. When the outlier cut-off value of the lower limb axis was defined as ±2° or ±3° from neutral, the postoperative full-length radiographs demonstrated a risk ratio of 0.54 or 0.39, respectively, in favor of the navigated group. When the cut-off value used for the alignment in the coronal and sagittal planes was 2° or 3°, imageless navigation significantly reduced the outlier rate of the femoral and tibial components compared with the conventional group. Notably, computed tomography scans demonstrated no statistically significant differences between the two groups regarding outliers in the rotational alignment of the femoral and tibial components; however, there was strong statistical heterogeneity. Our results indicate that imageless computer-assisted navigation systems improve the lower limb axis and component orientation in the coronal and sagittal planes, but not the rotational alignment, in TKA. Further multi-center clinical trials with long-term follow-up are needed to determine differences in the clinical and functional outcomes of knee arthroplasties performed using computer-assisted techniques.
MultiPhyl: a high-throughput phylogenomics webserver using distributed computing
Keane, Thomas M.; Naughton, Thomas J.; McInerney, James O.
2007-01-01
With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php. PMID:17553837
Can we model solute transfer in heterogeneous soils with MIM model?
NASA Astrophysics Data System (ADS)
Ben Slimene, Erij; Lassabatere, Laurent; Winiarski, Thierry; Gourdon, Remy
2017-04-01
The fate of pollutants in the vadose zone must be understood, in particular underneath infiltration basins, for optimum management of these facilities. Stormwater carries pollutants (heavy metals, organics, emerging pollutants such as nanoparticles, etc.) and thus constitutes a risk for groundwater and soil quality. Most infiltration basins are settled over highly permeable soils that exhibit a strong lithological heterogeneity. The impact of such lithological heterogeneity on flow and solute transfer has already been questioned. Previous studies have proved that lithological heterogeneity is prone to the establishment of preferential flows. In more detail, the concomitance of several materials with contrasting hydraulic properties induces funneled flow at the interfaces between less permeable and more permeable lithofacies. Solutes are then carried quickly by water fluxes along preferential flow pathways and have restricted access to zones far from these pathways. Such a pattern could plausibly be modeled by a MIM model that partitions water into two fractions, one mobile and the other immobile, with solute transport by convection and dispersion in the mobile water fraction and solute diffusion at the interface between the mobile and immobile water fractions (the standard formulation is recalled below). Applying the MIM approach to solute transport in strongly heterogeneous soils may be quite advantageous: simplification of the problem, fewer parameters, ease of modeling and numerical computation, gains in computation time, etc. However, this consistency has never been investigated in detail. In this paper, we focus on the possibility of modeling solute transport in a strongly heterogeneous deposit using the MIM model. The deposit has been the subject of intensive campaigns of characterization of its lithology and of the hydraulic and hydrodispersive properties of its lithofacies. Numerical computations were performed for a section of deposit 13.5 m wide and 2.5 m deep. Numerical results clearly show the establishment of preferential flows, with funneling mostly under unsaturated conditions. Solute elution at 2.5 m depth was characterized and discussed as a function of solute reactivity. Solute breakthrough curves show clear evidence of a MIM-like pattern. We demonstrate that the MIM model accurately reproduces solute elution not only at 2.5 m depth but also at intermediate depths. The accuracy of the MIM approach is ensured provided that the related parameters are optimized as a function of depth, hydric and hydraulic conditions, and the contrast in hydraulic parameters of the lithofacies that constitute the deposit.
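For reference, a standard one-dimensional form of the mobile-immobile (MIM) transport model of the kind invoked above (a textbook formulation in the spirit of van Genuchten and Wierenga; the study's exact parameterization may differ) reads:

    \theta_m \frac{\partial C_m}{\partial t} + \theta_{im} \frac{\partial C_{im}}{\partial t}
      = \theta_m D \frac{\partial^2 C_m}{\partial x^2} - q \frac{\partial C_m}{\partial x},
    \qquad
    \theta_{im} \frac{\partial C_{im}}{\partial t} = \alpha \left( C_m - C_{im} \right),

where \theta_m and \theta_{im} are the mobile and immobile water contents, C_m and C_{im} the corresponding concentrations, D the dispersion coefficient in the mobile fraction, q the Darcy flux, and \alpha the first-order mobile-immobile mass transfer coefficient.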
Real-time video streaming in mobile cloud over heterogeneous wireless networks
NASA Astrophysics Data System (ADS)
Abdallah-Saleh, Saleh; Wang, Qi; Grecos, Christos
2012-06-01
Recently, the concept of Mobile Cloud Computing (MCC) has been proposed to offload the resource requirements in computational capabilities, storage and security from mobile devices into the cloud. Internet video applications such as real-time streaming are expected to be ubiquitously deployed and supported over the cloud for mobile users, who typically encounter a range of wireless networks of diverse radio access technologies during their roaming. However, real-time video streaming for mobile cloud users across heterogeneous wireless networks presents multiple challenges. The network-layer quality of service (QoS) provision to support high-quality mobile video delivery in this demanding scenario remains an open research question, and this in turn affects the application-level visual quality and impedes mobile users' perceived quality of experience (QoE). In this paper, we devise a framework to support real-time video streaming in this new mobile video networking paradigm and evaluate the performance of the proposed framework empirically through a lab-based yet realistic testing platform. One particular issue we focus on is the effect of users' mobility on the QoS of video streaming over the cloud. We design and implement a hybrid platform comprising a test-bed and an emulator, on which our concepts of mobile cloud computing, video streaming and heterogeneous wireless networks are implemented and integrated to allow the testing of our framework. As representative heterogeneous wireless networks, the popular WLAN (Wi-Fi) and MAN (WiMAX) networks are incorporated in order to evaluate the effects of handovers between these different radio access technologies. The H.264/AVC (Advanced Video Coding) standard is employed for real-time video streaming from a server to mobile users (client nodes) in the networks. Mobility support is introduced to enable a continuous streaming experience for a mobile user across the heterogeneous wireless network. Real-time video stream packets are captured for analytical purposes on the mobile user node. Experimental results are obtained and analysed. Future work is identified towards further improvement of the current design and implementation. With this new mobile video networking concept and paradigm implemented and evaluated, the results and observations obtained from this study form the basis of a more in-depth, comprehensive understanding of the various challenges and opportunities in supporting high-quality real-time video streaming in mobile cloud over heterogeneous wireless networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, C.; Yu, G.; Wang, K.
The physical design of new-concept reactors, which have complex structures, various materials, and varied neutron energy spectra, has greatly increased the demands on calculation methods and the corresponding computing hardware. Along with widely used parallel algorithms, heterogeneous platform architectures have been introduced into numerical computation in reactor physics. Because of its natural parallel characteristics, the CPU-FPGA architecture is often used to accelerate numerical computation. This paper studies the application and features of this kind of heterogeneous platform in numerical calculations of reactor physics through practical examples. The neutron diffusion module designed on a CPU-FPGA architecture achieves a speed-up factor of 11.2, demonstrating the feasibility of applying this kind of heterogeneous platform to reactor physics.
SU-E-J-128: Two-Stage Atlas Selection in Multi-Atlas-Based Image Segmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, T; Ruan, D
2015-06-15
Purpose: In the new era of big data, multi-atlas-based image segmentation is challenged by heterogeneous atlas quality and the high computation burden from extensive atlas collections, demanding efficient identification of the most relevant atlases. This study aims to develop a two-stage atlas selection scheme to achieve computational economy with a performance guarantee. Methods: We develop a low-cost fusion set selection scheme by introducing a preliminary selection to trim the full atlas collection into an augmented subset, alleviating the need for extensive full-fledged registrations. More specifically, fusion set selection is performed in two successive steps: preliminary selection and refinement. An augmented subset is first roughly selected from the whole atlas collection with a simple registration scheme and the corresponding preliminary relevance metric; the augmented subset is further refined into the desired fusion set size, using full-fledged registration and the associated relevance metric. The main novelty of this work is the introduction of an inference model relating the preliminary and refined relevance metrics, based on which the augmented subset size is rigorously derived to ensure the desired atlases survive the preliminary selection with high probability. Results: The performance and complexity of the proposed two-stage atlas selection method were assessed using a collection of 30 prostate MR images. It achieved comparable segmentation accuracy as the conventional one-stage method with full-fledged registration, but significantly reduced computation time to 1/3 (from 30.82 to 11.04 min per segmentation). Compared with an alternative one-stage cost-saving approach, the proposed scheme yielded superior performance, with mean and median DSC of (0.83, 0.85) compared to (0.74, 0.78). Conclusion: This work has developed a model-guided two-stage atlas selection scheme to achieve significant cost reduction while guaranteeing high segmentation accuracy. The benefit in both complexity and performance is expected to be most pronounced with large-scale heterogeneous data.
Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.
Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng
2014-10-01
A push-based database management system (DBMS) is a new type of data-processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power from the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query operations and to enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize its main kernel scheduling disciplines. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
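The occupancy-based prediction idea can be caricatured with a toy throughput model; the functional form and all numbers here are illustrative assumptions, not the paper's actual model:

```python
def predicted_kernel_time(work_items, flops_per_item, occupancy,
                          peak_flops=4.3e12):
    # Toy model: a compute-bound kernel's effective throughput scales
    # with the fraction of GPU resources it occupies (0 < occupancy <= 1).
    return work_items * flops_per_item / (peak_flops * occupancy)

# Two kernels sharing the device via CUDA streams each see a lower
# occupancy than either would running alone.
t_alone  = predicted_kernel_time(1e8, 200, occupancy=0.9)
t_shared = predicted_kernel_time(1e8, 200, occupancy=0.45)
```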
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boris, J.P.; Picone, J.M.; Lambrakos, S.G.
The Surveillance, Correlation, and Tracking (SCAT) problem is the computation-limited kernel of the future battle-management systems currently being developed, for example, under the Strategic Defense Initiative (SDI). This report shows how high-performance SCAT can be performed in this decade. Estimates suggest that an increase in computational capacity by a factor of at least one thousand will be necessary to track 10⁵ SDI objects in real time. This large improvement is needed because standard algorithms for data organization in important segments of the SCAT problem scale as N² and N³, where N is the number of perceived objects. It is shown that the required speed-up factor can now be achieved because of two new developments: 1) a heterogeneous-element supercomputer system based on available parallel-processing technology can account for over an order of magnitude performance improvement today over existing supercomputers; and 2) algorithmic innovations developed recently by the NRL Laboratory for Computational Physics will account for another two orders of magnitude improvement. Based on these advances, a comprehensive, high-performance kernel for a simulator/system to perform the SCAT portion of SDI battle management is described.
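The N² bottleneck referred to above arises from all-pairs data organization: every perceived object is tested against every other. A standard cell-list scheme illustrates how spatial data organization cuts the candidate pairs to near-linear cost; this is a generic sketch of the idea, not the NRL algorithm:

```python
from collections import defaultdict
import numpy as np

def candidate_pairs(points, cell_size):
    # Bin 3-D points into cells, then test only points in the same or
    # neighbouring cells instead of all N*(N-1)/2 pairs.
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple((p // cell_size).astype(int))].append(i)
    pairs = []
    for (cx, cy, cz), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for i in members:
                        for j in grid.get((cx + dx, cy + dy, cz + dz), []):
                            if i < j:          # count each pair once
                                pairs.append((i, j))
    return pairs
```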
NASA Technical Reports Server (NTRS)
Fauchez, Thomas; Platnick, Steven; Meyer, Kerry; Cornet, Celine; Szczap, Frederic; Varnai, Tamas
2017-01-01
This paper presents a study on the impact of cirrus cloud heterogeneities on MODIS simulated thermal infrared (TIR) brightness temperatures (BTs) at the top of the atmosphere (TOA) as a function of spatial resolution from 50 meters to 10 kilometers. A realistic 3-D (three-dimensional) cirrus field is generated by the 3DCLOUD model (average optical thickness of 1.4, cloud-base and cloud-top altitudes at 10 and 12 kilometers, respectively, consisting of aggregate column crystals with D_eff = 20 microns), and 3-D thermal infrared radiative transfer (RT) is simulated with the 3DMCPOL (3-D Monte Carlo Polarized) code. According to previous studies, differences between 3-D BT computed from a heterogeneous pixel and 1-D (one-dimensional) RT computed from a homogeneous pixel depend at nadir on two effects: (i) the optical thickness horizontal heterogeneity, leading to the plane-parallel homogeneous bias (PPHB); and (ii) horizontal radiative transport (HRT), leading to the independent pixel approximation error (IPAE). A single but realistic cirrus case is simulated and, as expected, the PPHB mainly impacts the low-spatial-resolution results (above approximately 250 meters), with averaged values of up to 5-7 K, while the IPAE mainly impacts the high-spatial-resolution results (below approximately 250 meters), with average values of up to 1-2 K. A sensitivity study has been performed in order to extend these results to various cirrus optical thicknesses and heterogeneities by sampling the cirrus in several ranges of parameters. For four optical thickness classes and four optical heterogeneity classes, we have found that, for nadir observations, the spatial resolution at which the combination of PPHB and HRT effects is smallest falls between 100 and 250 meters. These spatial resolutions thus appear to be the best choice to retrieve cirrus optical properties with the smallest cloud-heterogeneity-related total bias in the thermal infrared. For off-nadir observations, the average total effect is increased and the minimum is shifted to coarser spatial resolutions.
Processing of the WLCG monitoring data using NoSQL
NASA Astrophysics Data System (ADS)
Andreeva, J.; Beche, A.; Belov, S.; Dzhunov, I.; Kadochnikov, I.; Karavakis, E.; Saiz, P.; Schovancova, J.; Tuckett, D.
2014-06-01
The Worldwide LHC Computing Grid (WLCG) today includes more than 150 computing centres where more than 2 million jobs are being executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments, over such a huge heterogeneous infrastructure, is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems. While existing solutions are traditionally based on Oracle for data storage and processing, recent developments evaluate NoSQL for processing large-scale monitoring datasets. NoSQL databases are getting increasingly popular for processing datasets at the terabyte and petabyte scale using commodity hardware. In this contribution, the integration of NoSQL data processing in the Experiment Dashboard framework is described along with first experiences of using this technology for monitoring the LHC computing activities.
Mobility in hospital work: towards a pervasive computing hospital environment.
Morán, Elisa B; Tentori, Monica; González, Víctor M; Favela, Jesus; Martínez-Garcia, Ana I
2007-01-01
Handheld computers are increasingly being used by hospital workers. With the integration of wireless networks into hospital information systems, handheld computers can provide the basis for a pervasive computing hospital environment; to develop this, designers need empirical information to understand how hospital workers interact with information while moving around. To characterise these phenomena, we report the results of a workplace study conducted in a hospital. We found that individuals spend about half of their time at their base location, where most of their interactions occur. On average, our informants spent 23% of their time performing information management tasks, followed by coordination (17.08%), clinical case assessment (15.35%) and direct patient care (12.6%). We discuss how our results offer insights for the design of pervasive computing technology, and directions for further research and development in this field, such as transferring information between heterogeneous devices and integrating the physical and digital domains.
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Detrixhe, Miles; Gibou, Frédéric (University of California Santa Barbara, Santa Barbara, CA 93106)
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine- and coarse-grained components of the algorithm take advantage of the heterogeneous computer architecture common in high-performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence and parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
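For readers unfamiliar with the method, a serial 2-D sketch for the eikonal equation |∇u| = 1/f, the scalar prototype of the static Hamilton–Jacobi problems above, is shown below. The paper's hybrid parallel decomposition is not reproduced, and the point-source boundary condition is an assumption for illustration:

```python
import numpy as np

def fast_sweep_eikonal(speed, h, n_sweeps=8):
    # Serial 2-D fast sweeping for |grad u| = 1/speed on a grid with
    # spacing h; alternating sweep orderings propagate information from
    # an assumed point source at the grid centre.
    ny, nx = speed.shape
    u = np.full((ny, nx), np.inf)
    u[ny // 2, nx // 2] = 0.0
    orders = [(range(ny), range(nx)),
              (range(ny), range(nx - 1, -1, -1)),
              (range(ny - 1, -1, -1), range(nx)),
              (range(ny - 1, -1, -1), range(nx - 1, -1, -1))]
    for _ in range(n_sweeps):
        for rows, cols in orders:
            for i in rows:
                for j in cols:
                    a = min(u[max(i - 1, 0), j], u[min(i + 1, ny - 1), j])
                    b = min(u[i, max(j - 1, 0)], u[i, min(j + 1, nx - 1)])
                    if a > b:
                        a, b = b, a            # ensure a <= b
                    if np.isinf(a):
                        continue               # no informed neighbour yet
                    f = h / speed[i, j]
                    if b - a >= f:             # one-sided update
                        cand = a + f
                    else:                      # two-sided quadratic update
                        cand = 0.5 * (a + b + np.sqrt(2 * f * f - (a - b) ** 2))
                    u[i, j] = min(u[i, j], cand)
    return u
```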
Intratumor heterogeneity of DCE-MRI reveals Ki-67 proliferation status in breast cancer
NASA Astrophysics Data System (ADS)
Cheng, Hu; Fan, Ming; Zhang, Peng; Liu, Bin; Shao, Guoliang; Li, Lihua
2018-03-01
Breast cancer is a highly heterogeneous disease both biologically and clinically, and certain pathologic parameters, e.g., Ki-67 expression, are useful in predicting the prognosis of patients. The aim of this study is to identify intratumor heterogeneity of breast cancer for predicting Ki-67 proliferation status in estrogen receptor (ER)-positive breast cancer patients. A dataset of 77 patients who underwent dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) examination was collected. Of these patients, 51 had high Ki-67 expression and 26 had low Ki-67 expression. We partitioned the breast tumor into subregions using two methods based on the values of time to peak (TTP) and peak enhancement rate (PER). Within each tumor subregion, image features were extracted, including statistical and morphological features from DCE-MRI. The classification models were applied to each region separately to assess whether classifiers based on features extracted from the various subregions differed in predictive performance. The area under the receiver operating characteristic curve (AUC) was computed using the leave-one-out cross-validation (LOOCV) method. The classifier using features related to moderate time to peak achieved the best performance, with an AUC of 0.826, compared with those based on the other regions. With a multi-classifier fusion method, the AUC value was significantly (P=0.03) increased to 0.858±0.032, compared to the classifier with an AUC of 0.778 using features from the entire tumor. The results demonstrate that features reflecting heterogeneity in intratumoral subregions can improve classifier performance in predicting Ki-67 proliferation status relative to a classifier using features from the entire tumor alone.
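As an illustration of the evaluation protocol described above, a leave-one-out AUC can be computed as follows; the feature matrix, labels and the logistic-regression classifier are stand-ins (the paper does not state scikit-learn or this particular classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins for the study's data: X holds subregion image
# features, y the Ki-67 labels (51 high, 26 low, as in the dataset).
rng = np.random.default_rng(0)
X = rng.standard_normal((77, 10))
y = np.r_[np.ones(51, dtype=int), np.zeros(26, dtype=int)]

# Leave-one-out cross-validated probabilities, then one AUC over all folds.
scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("LOOCV AUC:", roc_auc_score(y, scores))
```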
Liu, Yang; Chiaromonte, Francesca; Li, Bing
2017-06-01
In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach. © 2016, The International Biometric Society.
Opportunities and choice in a new vector era
NASA Astrophysics Data System (ADS)
Nowak, A.
2014-06-01
This work discusses the significant changes in computing landscape related to the progression of Moore's Law, and the implications on scientific computing. Particular attention is devoted to the High Energy Physics domain (HEP), which has always made good use of threading, but levels of parallelism closer to the hardware were often left underutilized. Findings of the CERN openlab Platform Competence Center are reported in the context of expanding "performance dimensions", and especially the resurgence of vectors. These suggest that data oriented designs are feasible in HEP and have considerable potential for performance improvements on multiple levels, but will rarely trump algorithmic enhancements. Finally, an analysis of upcoming hardware and software technologies identifies heterogeneity as a major challenge for software, which will require more emphasis on scalable, efficient design.
Analysis and Modeling of Realistic Compound Channels in Transparent Relay Transmissions
Kanjirathumkal, Cibile K.; Mohammed, Sameer S.
2014-01-01
Analytical approaches for the characterisation of the compound channels in transparent multihop relay transmissions over independent fading channels are considered in this paper. Compound channels with homogeneous links are considered first. Using the Mellin transform technique, exact expressions are derived for the moments of cascaded Weibull distributions. Subsequently, two performance metrics, namely, the coefficient of variation and the amount of fade, are derived using the computed moments. These metrics quantify the possible variations in the channel gain and signal-to-noise ratio from their respective average values and can be used to characterise the achievable receiver performance. This approach is suitable for analysing more realistic compound channel models that capture the scattering density variations of the environment experienced in multihop relay transmissions. The performance metrics for such heterogeneous compound channels, having a distinct distribution in each hop, are computed and compared with those having identical constituent component distributions. The moments and the coefficient of variation computed are then used to develop computationally efficient estimators for the distribution parameters and the optimal hop count. The metrics and estimators proposed are complemented with numerical and simulation results that demonstrate the accuracy and impact of the approaches. PMID:24701175
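For reference, the moment expressions that the Mellin-transform route yields are standard. For a cascade $Y=\prod_{i=1}^N X_i$ of independent Weibull hops with scale $\lambda_i$ and shape $k_i$ (a parameterization assumed here; the paper's may differ):

$$\mathbb{E}[Y^n]=\prod_{i=1}^{N}\lambda_i^{\,n}\,\Gamma\!\left(1+\frac{n}{k_i}\right),\qquad \mathrm{CV}=\sqrt{\frac{\mathbb{E}[Y^2]}{\mathbb{E}[Y]^2}-1}=\sqrt{\prod_{i=1}^{N}\frac{\Gamma(1+2/k_i)}{\Gamma^{2}(1+1/k_i)}-1}.$$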
Performance characteristics of a nanoscale double-gate reconfigurable array
NASA Astrophysics Data System (ADS)
Beckett, Paul
2008-12-01
The double gate transistor is a promising device applicable to deep sub-micron design due to its inherent resistance to short-channel effects and superior subthreshold performance. Using both TCAD and SPICE circuit simulation, it is shown that the characteristics of fully depleted dual-gate thin-body Schottky barrier silicon transistors will not only uncouple the conflicting requirements of high performance and low standby power in digital logic, but will also allow the development of a locally-connected reconfigurable computing mesh. The magnitude of the threshold shift effect will scale with device dimensions and will remain compatible with oxide reliability constraints. A field-programmable architecture based on the double gate transistor is described in which the operating point of the circuit is biased via one gate while the other gate is used to form the logic array, such that complex heterogeneous computing functions may be developed from this homogeneous, mesh-connected organization.
NASA Technical Reports Server (NTRS)
Moorhead, Robert J., II; Smith, Wayne
1992-01-01
This mid-year report presents design concepts for the communication network for the Advanced Solid Rocket Motor (ASRM) facility being built at Yellow Creek near Iuka, MS. The overall network is to include heterogeneous computers, to use various protocols, and to have different bandwidths. Performance consideration must be given to the potential network applications in this environment. The performance evaluation of X window applications is given the major emphasis in this report; a simulation study using Bones will be included later. The report has three parts: Part 1 is an investigation of X window traffic using TCP/IP over Ethernet networks; Part 2 is a survey of performance concepts of X window applications with Macintosh computers; and the last part is a tutorial on DECnet protocols. The results of this report should be useful in the design and operation of the ASRM communication network.
Autonomic Management of Application Workflows on Hybrid Computing Infrastructure
Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...
2011-01-01
In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.
NASA Technical Reports Server (NTRS)
Vidal Melo, M. F.; Loeppky, J. A.; Caprihan, A.; Luft, U. C.
1993-01-01
This study describes a two-compartment model of pulmonary gas exchange in which alveolar ventilation to perfusion (VA/Q) heterogeneity and impairment of pulmonary diffusing capacity (D) are simultaneously taken into account. The mathematical model uses as input data measurements usually obtained in the lung function laboratory. It consists of two compartments and an anatomical shunt. Each compartment receives fractions of alveolar ventilation and blood flow. Mass balance equations and integration of Fick's law of diffusion are used to compute alveolar and blood O2 and CO2 values compatible with input O2 uptake and CO2 elimination. Two applications are presented. The first is a method to partition O2 and CO2 alveolar-arterial gradients into VA/Q and D components. The technique is evaluated in data of patients with chronic obstructive pulmonary disease (COPD). The second is a theoretical analysis of the effects of blood flow variation in alveolar and blood O2 partial pressures. The results show the importance of simultaneous consideration of D to estimate VA/Q heterogeneity in patients with diffusion impairment. This factor plays an increasing role in gas alveolar-arterial gradients as severity of COPD increases. Association of VA/Q heterogeneity and D may produce an increase of O2 arterial pressure with decreasing QT which would not be observed if only D were considered. We conclude that the presented computer model is a useful tool for description and interpretation of data from COPD patients and for performing theoretical analysis of variables involved in the gas exchange process.
Aumiller, William M; Davis, Bradley W; Hashemian, Negar; Maranas, Costas; Armaou, Antonios; Keating, Christine D
2014-03-06
The intracellular environment in which biological reactions occur is crowded with macromolecules and subdivided into microenvironments that differ in both physical properties and chemical composition. The work described here combines experimental and computational model systems to help understand the consequences of this heterogeneous reaction media on the outcome of coupled enzyme reactions. Our experimental model system for solution heterogeneity is a biphasic polyethylene glycol (PEG)/sodium citrate aqueous mixture that provides coexisting PEG-rich and citrate-rich phases. Reaction kinetics for the coupled enzyme reaction between glucose oxidase (GOX) and horseradish peroxidase (HRP) were measured in the PEG/citrate aqueous two-phase system (ATPS). Enzyme kinetics differed between the two phases, particularly for the HRP. Both enzymes, as well as the substrates glucose and H2O2, partitioned to the citrate-rich phase; however, the Amplex Red substrate necessary to complete the sequential reaction partitioned strongly to the PEG-rich phase. Reactions in ATPS were quantitatively described by a mathematical model that incorporated measured partitioning and kinetic parameters. The model was then extended to new reaction conditions, i.e., higher enzyme concentration. Both experimental and computational results suggest mass transfer across the interface is vital to maintain the observed rate of product formation, which may be a means of metabolic regulation in vivo. Although outcomes for a specific system will depend on the particulars of the enzyme reactions and the microenvironments, this work demonstrates how coupled enzymatic reactions in complex, heterogeneous media can be understood in terms of a mathematical model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators in the LAMMPS molecular dynamics software for distributed-memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.
PREPARING FOR EXASCALE: ORNL Leadership Computing Application Requirements and Strategy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Kothe, Douglas B; Nam, Hai Ah
2009-12-01
In 2009 the Oak Ridge Leadership Computing Facility (OLCF), a U.S. Department of Energy (DOE) facility at the Oak Ridge National Laboratory (ORNL) National Center for Computational Sciences (NCCS), elicited petascale computational science requirements from leading computational scientists in the international science community. This effort targeted science teams whose projects received large computer allocation awards on OLCF systems. A clear finding of this process was that in order to reach their science goals over the next several years, multiple projects will require computational resources in excess of an order of magnitude more powerful than those currently available. Additionally, for the longer term, next-generation science will require computing platforms of exascale capability in order to reach DOE science objectives over the next decade. It is generally recognized that achieving exascale in the proposed time frame will require disruptive changes in computer hardware and software. Processor hardware will become necessarily heterogeneous and will include accelerator technologies. Software must undergo the concomitant changes needed to extract the available performance from this heterogeneous hardware. This disruption portends to be substantial, not unlike the change to the message-passing paradigm in the computational science community over 20 years ago. Since technological disruptions take time to assimilate, we must aggressively embark on this course of change now, to ensure that science applications and their underlying programming models are mature and ready when exascale computing arrives. This includes initiation of application readiness efforts to adapt existing codes to heterogeneous architectures, support of relevant software tools, and procurement of next-generation hardware testbeds for porting and testing codes. The 2009 OLCF requirements process identified numerous actions necessary to meet this challenge: (1) Hardware capabilities must be advanced on multiple fronts, including peak flops, node memory capacity, interconnect latency, interconnect bandwidth, and memory bandwidth. (2) Effective parallel programming interfaces must be developed to exploit the power of emerging hardware. (3) Science application teams must now begin to adapt and reformulate application codes to the new hardware and software, typified by hierarchical and disparate layers of compute, memory and concurrency. (4) Algorithm research must be realigned to exploit this hierarchy. (5) When possible, mathematical libraries must be used to encapsulate the required operations in an efficient and useful way. (6) Software tools must be developed to make the new hardware more usable. (7) Science application software must be improved to cope with the increasing complexity of computing systems. (8) Data management efforts must be readied for the larger quantities of data generated by larger, more accurate science models. Requirements elicitation, analysis, validation, and management comprise a difficult and inexact process, particularly in periods of technological change. Nonetheless, the OLCF requirements modeling process is becoming increasingly quantitative and actionable, as the process becomes more developed and mature, and the process this year has identified clear and concrete steps to be taken.
This report discloses (1) the fundamental science case driving the need for the next generation of computer hardware, (2) application usage trends that illustrate the science need, (3) application performance characteristics that drive the need for increased hardware capabilities, (4) resource and process requirements that make the development and deployment of science applications on next-generation hardware successful, and (5) summary recommendations for the required next steps within the computer and computational science communities.
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
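The space-time kernel density itself has a simple form: a product of a spatial and a temporal kernel, normalized by the bandwidths. A minimal serial sketch follows (Epanechnikov kernels assumed; the parallel octree decomposition and buffering above are not shown):

```python
import numpy as np

def st_kernel_density(events, grid_pts, hs, ht):
    # events: (n, 3) array of (x, y, t); grid_pts: (m, 3) evaluation points.
    # Product of a 2-D spatial and a 1-D temporal Epanechnikov kernel;
    # hs and ht are the bandwidths that also set the subdomain buffer widths.
    n = len(events)
    dens = np.zeros(len(grid_pts))
    for k, (x, y, t) in enumerate(grid_pts):
        ds2 = ((events[:, 0] - x) ** 2 + (events[:, 1] - y) ** 2) / hs ** 2
        dt2 = ((events[:, 2] - t) / ht) ** 2
        ks = np.where(ds2 < 1, (2 / np.pi) * (1 - ds2), 0.0)
        kt = np.where(dt2 < 1, 0.75 * (1 - dt2), 0.0)
        dens[k] = (ks * kt).sum() / (n * hs ** 2 * ht)
    return dens
```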
High performance computing and communications: Advancing the frontiers of information technology
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1997-12-31
This report, which supplements the President's Fiscal Year 1997 Budget, describes the interagency High Performance Computing and Communications (HPCC) Program. The HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of accomplishments to its credit. Over its five-year history, the HPCC Program has focused on developing high performance computing and communications technologies that can be applied to computation-intensive applications. Major highlights for FY 1996: (1) High performance computing systems enable practical solutions to complex problems with accuracies not possible five years ago; (2) HPCC-funded research in very large scale networking techniques has been instrumental in the evolution of the Internet, which continues exponential growth in size, speed, and availability of information; (3) The combination of hardware capability measured in gigaflop/s, networking technology measured in gigabit/s, and new computational science techniques for modeling phenomena has demonstrated that very large scale accurate scientific calculations can be executed across heterogeneous parallel processing systems located thousands of miles apart; (4) Federal investments in HPCC software R and D support researchers who pioneered the development of parallel languages and compilers, high performance mathematical, engineering, and scientific libraries, and software tools--technologies that allow scientists to use powerful parallel systems to focus on Federal agency mission applications; and (5) HPCC support for virtual environments has enabled the development of immersive technologies, where researchers can explore and manipulate multi-dimensional scientific and engineering problems. Educational programs fostered by the HPCC Program have brought into classrooms new science and engineering curricula designed to teach computational science. This document contains a small sample of the significant HPCC Program accomplishments in FY 1996.
The Research and Implementation of MUSER CLEAN Algorithm Based on OpenCL
NASA Astrophysics Data System (ADS)
Feng, Y.; Chen, K.; Deng, H.; Wang, F.; Mei, Y.; Wei, S. L.; Dai, W.; Yang, Q. P.; Liu, Y. B.; Wu, J. P.
2017-03-01
There is an urgent need to carry out high-performance data processing on a single machine in the development of astronomical software. However, due to differing machine configurations, traditional programming techniques such as multi-threading and CUDA (Compute Unified Device Architecture)+GPU (Graphic Processing Unit) have obvious limitations in portability and seamlessness across different operating systems. The OpenCL (Open Computing Language) used in the development of the MUSER (MingantU SpEctral Radioheliograph) data processing system is introduced, and the Högbom CLEAN algorithm is re-implemented as a parallel CLEAN algorithm using the Python language and the PyOpenCL extension package. The experimental results show that the CLEAN algorithm based on OpenCL has approximately equal operating efficiency compared with the former CLEAN algorithm based on CUDA. More importantly, data processing in a CPU-only (Central Processing Unit) environment can also achieve high performance, which solves the problem of the environmental dependence of CUDA+GPU. Overall, this research improves the adaptability of the system, with emphasis on the performance of MUSER image CLEAN computing. In the meantime, the realization of OpenCL in MUSER demonstrates its applicability in scientific data processing. In view of the high-performance computing features of OpenCL in heterogeneous environments, it will probably become a preferred technology in future high-performance astronomical software development.
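For context, the serial Högbom CLEAN loop that the system parallelizes with PyOpenCL looks roughly as follows; this is a numpy sketch of the textbook algorithm, not the MUSER code:

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, niter=500, threshold=0.0):
    # Textbook Hogbom CLEAN: find the residual peak, subtract a
    # gain-scaled shifted PSF (assumed to peak at 1), and record the
    # component; a parallel version accelerates the peak search and
    # the subtraction across work-items.
    res = dirty.copy()
    comps = np.zeros_like(dirty)
    cy, cx = np.unravel_index(np.argmax(psf), psf.shape)
    for _ in range(niter):
        py, px = np.unravel_index(np.argmax(np.abs(res)), res.shape)
        peak = res[py, px]
        if abs(peak) <= threshold:
            break
        comps[py, px] += gain * peak
        y0, x0 = py - cy, px - cx              # PSF offset in image coords
        ys = slice(max(y0, 0), min(y0 + psf.shape[0], res.shape[0]))
        xs = slice(max(x0, 0), min(x0 + psf.shape[1], res.shape[1]))
        res[ys, xs] -= gain * peak * psf[ys.start - y0:ys.stop - y0,
                                         xs.start - x0:xs.stop - x0]
    return comps, res
```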
Beating the tyranny of scale with a private cloud configured for Big Data
NASA Astrophysics Data System (ADS)
Lawrence, Bryan; Bennett, Victoria; Churchill, Jonathan; Juckes, Martin; Kershaw, Philip; Pepler, Sam; Pritchard, Matt; Stephens, Ag
2015-04-01
The Joint Analysis System, JASMIN, consists of five significant hardware components: a batch computing cluster, a hypervisor cluster, bulk disk storage, high-performance disk storage, and access to a tape robot. Each of the computing clusters consists of a heterogeneous set of servers supporting a range of possible data analysis tasks, and a unique network environment makes it relatively trivial to migrate servers between the two clusters. The high-performance disk storage will include the world's largest (publicly visible) deployment of the Panasas parallel disk system. Initially deployed in April 2012, JASMIN has already undergone two major upgrades, culminating in a system which, by April 2015, will have in excess of 16 PB of disk and 4000 cores. Layered on the basic hardware is a range of services, from managed services, such as the curated archives of the Centre for Environmental Data Archival or the data analysis environment for the National Centres for Atmospheric Science and Earth Observation, to a generic Infrastructure as a Service (IaaS) offering for the UK environmental science community. Here we present examples of some of the big data workloads being supported in this environment, ranging from data management tasks, such as checksumming 3 PB of data held in over one hundred million files, to science tasks, such as re-processing satellite observations with new algorithms or calculating new diagnostics on petascale climate simulation outputs. We will demonstrate how the provision of a cloud environment closely coupled to a batch computing environment, all sharing the same high-performance disk system, allows massively parallel processing without the necessity to shuffle data excessively, even as it supports many different virtual communities, each with guaranteed performance. We will discuss the advantages of having a heterogeneous range of servers with available memory from tens of GB at the low end to (currently) two TB at the high end. There are some limitations of the JASMIN environment: the high-performance disk environment is not fully available in the IaaS environment, and a planned ability to burst compute-heavy jobs into the public cloud is not yet fully available. There are load balancing and performance issues that need to be understood. We will conclude with projections for future usage, and our plans to meet those requirements.
NASA Astrophysics Data System (ADS)
Fang, Y.; Hou, J.; Engel, D.; Lin, G.; Yin, J.; Han, B.; Fang, Z.; Fountoulakis, V.
2011-12-01
In this study, we introduce an uncertainty quantification (UQ) software framework for carbon sequestration, the focus of the study being the effect of spatial heterogeneity of reservoir properties on CO2 migration. We use a sequential Gaussian simulation method (SGSIM) to generate realizations of permeability fields with various spatial statistical attributes. To deal with the computational difficulties, we integrate the following ideas/approaches: 1) we use three different sampling approaches (probabilistic collocation, quasi-Monte Carlo, and adaptive sampling) to reduce the required forward calculations while exploring the parameter space and quantifying the input uncertainty; 2) we use eSTOMP as the forward modeling simulator. eSTOMP is implemented using the Global Arrays toolkit (GA), which is based on one-sided inter-processor communication and supports a shared-memory programming style on distributed-memory platforms, providing highly scalable performance. It uses a data model to partition most of the large-scale data structures into a relatively small number of distinct classes, and the lower-level simulator infrastructure (e.g., meshing support, associated data structures, and data mapping to processors) is separated from the higher-level physics and chemistry algorithmic routines using a grid component interface; and 3) beyond the faster model and more efficient algorithms that speed up the forward calculation, we built an adaptive system infrastructure to select the best possible data transfer mechanisms, to optimally allocate system resources to improve performance, and to integrate software packages and data for composing carbon sequestration simulation, computation, analysis, estimation and visualization. We will demonstrate the framework with a given CO2 injection scenario in a heterogeneous sandstone reservoir.
Quasi-heterogeneous efficient 3-D discrete ordinates CANDU calculations using Attila
DOE Office of Scientific and Technical Information (OSTI.GOV)
Preeti, T.; Rulko, R.
2012-07-01
In this paper, 3-D quasi-heterogeneous large-scale parallel Attila calculations of a generic CANDU test problem, consisting of 42 complete fuel channels and a reactivity device perpendicular to the fuel, are presented. The solution method is discrete ordinates SN, and the computational model is quasi-heterogeneous, i.e. the fuel bundle is partially homogenized into five homogeneous rings, consistently with the DRAGON code model used by the industry for incremental cross-section generation. In the calculations, a HELIOS-generated 45-group macroscopic cross-section library was used. This approach to CANDU calculations has the following advantages: 1) it allows detailed bundle (and eventually channel) power calculations for each fuel ring in a bundle, 2) it allows exact representation of the reactivity device for a precise calculation of its reactivity worth, and 3) it eliminates the need for incremental cross-sections. Our results are compared to a reference Monte Carlo MCNP solution. In addition, the performance of the Attila SN method in CANDU calculations, which are characterized by significant upscattering, is discussed.
Analytical studies on the instabilities of heterogeneous intelligent traffic flow
NASA Astrophysics Data System (ADS)
Ngoduy, D.
2013-10-01
It has been widely reported in the literature that a small perturbation in traffic flow, such as a sudden deceleration of a vehicle, can lead to the formation of traffic jams without a clear bottleneck. These traffic jams are usually related to instabilities in traffic flow. The application of intelligent traffic systems is a potential solution to reduce the amplitude of such traffic instabilities or to eliminate their formation. A lot of research has been conducted to theoretically study the effect of intelligent vehicles, for example adaptive cruise control vehicles, using either computer simulation or analytical methods. However, most current analytical research has only been applied to single-class traffic flow. To this end, the main topic of this paper is a linear stability analysis to find the stability threshold of heterogeneous traffic flow using microscopic models, in particular the effect of intelligent vehicles on heterogeneous (or multi-class) traffic flow instabilities. The analytical results show how the percentage of intelligent vehicles affects the stability of multi-class traffic flow.
Deformation of Surface Nanobubbles Induced by Substrate Hydrophobicity.
Wei, Jiachen; Zhang, Xianren; Song, Fan
2016-12-13
Recent experimental measurements have shown that there exists a population of nanobubbles with different curvature radii, whereas both computer simulations and theoretical analyses indicate that the curvature radii of different nanobubbles should be the same at a given supersaturation. To resolve this inconsistency, we perform molecular dynamics simulations of surface nanobubbles that are stabilized by heterogeneous substrates, either in the geometrical heterogeneity model (GHM) or in the chemical heterogeneity model (CHM), and propose that the inconsistency can be ascribed to substrate-induced nanobubble deformation. We find that, as expected from theory and computer simulation, for either the GHM or the CHM there exists a universal upper limit of contact angle for the nanobubbles, which is determined by the degree of supersaturation alone. By analyzing the evolution of the shape of nanobubbles as a function of substrate hydrophobicity, which is controlled here by the liquid-solid interaction, two different origins of nanobubble deformation are identified. For substrates in the GHM, where the contact line is pinned by surface roughness, variation in the liquid-solid interaction changes only the location of the contact line and the measured contact angle, without causing a change in the nanobubble curvature. For substrates in the CHM, however, the liquid-solid interaction exerted by the bottom substrate can deform the vapor-liquid interface, resulting in variations in both the curvature of the vapor-liquid interface and the contact angle.
NASA Astrophysics Data System (ADS)
Tian, Liang; Wilkinson, Richard; Yang, Zhibing; Power, Henry; Fagerlund, Fritjof; Niemi, Auli
2017-08-01
We explore the use of Gaussian process emulators (GPE) in the numerical simulation of CO2 injection into a deep heterogeneous aquifer. The model domain is a two-dimensional, log-normally distributed stochastic permeability field. We first estimate the cumulative distribution functions (CDFs) of the CO2 breakthrough time and the total CO2 mass using a computationally expensive Monte Carlo (MC) simulation. We then show that we can accurately reproduce these CDF estimates with a GPE, using only a small fraction of the computational cost required by traditional MC simulation. In order to build a GPE that can predict the simulator output from a permeability field consisting of thousands of values, we use a truncated Karhunen-Loève (K-L) expansion of the permeability field, which enables the application of the Bayesian functional regression approach. We perform a cross-validation exercise to give insight into the optimization of the experiment design for selected scenarios: we find that a training set of a few hundred runs is sufficient, and that as few as 15 K-L components are adequate. Our work demonstrates that GPE with truncated K-L expansion can be effectively applied to uncertainty analysis associated with modelling of multiphase flow and transport processes in heterogeneous media.
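A minimal sketch of the truncated K-L step follows, assuming an exponential covariance model on a unit-square grid (both illustrative assumptions; the study's covariance and domain differ in detail):

```python
import numpy as np

def kl_log_permeability(n=32, corr_len=0.2, sigma=1.0, n_modes=15, seed=0):
    # Draw a log-permeability field on an n x n unit-square grid from a
    # truncated K-L expansion: eigendecompose the covariance, keep the
    # n_modes leading modes (15 mirrors the paper's finding), and weight
    # them with independent standard normal coefficients.
    g = np.linspace(0.0, 1.0, n)
    xy = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    C = sigma ** 2 * np.exp(-d / corr_len)     # assumed covariance model
    vals, vecs = np.linalg.eigh(C)             # ascending eigenvalues
    vals = np.maximum(vals[::-1][:n_modes], 0.0)
    vecs = vecs[:, ::-1][:, :n_modes]
    xi = np.random.default_rng(seed).standard_normal(n_modes)
    return (vecs @ (np.sqrt(vals) * xi)).reshape(n, n)
```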
Estimating crustal heterogeneity from double-difference tomography
Got, J.-L.; Monteiller, V.; Virieux, J.; Okubo, P.
2006-01-01
Seismic velocity parameters in limited but heterogeneous volumes can be inferred using a double-difference tomographic algorithm, but to obtain meaningful results, accuracy must be maintained at every step of the computation. MONTEILLER et al. (2005) devised a double-difference tomographic algorithm that takes full advantage of the accuracy of cross-spectral time-delays of large correlated event sets. This algorithm performs an accurate computation of theoretical travel-time delays in heterogeneous media and applies a suitable inversion scheme based on optimization theory. When applied to Kilauea Volcano, in Hawaii, the double-difference tomography approach shows significant and coherent changes to the velocity model in the well-resolved volumes beneath the Kilauea caldera and the upper east rift. In this paper, we first compare the results obtained using MONTEILLER et al.'s algorithm with those obtained using the classic travel-time tomographic approach. Then, we evaluate the effect of using data series of different accuracies, such as handpicked arrival-time differences ("picking differences"), on the results produced by double-difference tomographic algorithms. We show that picking differences have a non-Gaussian probability density function (pdf). Using a hyperbolic secant pdf instead of a Gaussian pdf improves the double-difference tomographic result when using picking-difference data. We completed our study by investigating the use of spatially discontinuous time-delay data. © Birkhäuser Verlag, Basel, 2006.
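In standardized form, the hyperbolic secant density used in place of the Gaussian is (one common parameterization with scale $s$; the paper's exact parameterization may differ):

$$p(x)=\frac{1}{2s}\,\operatorname{sech}\!\left(\frac{\pi x}{2s}\right),$$

whose heavier tails downweight outlying picking differences relative to a Gaussian misfit.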
WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers
NASA Astrophysics Data System (ADS)
Andreeva, J.; Beche, A.; Belov, S.; Kadochnikov, I.; Saiz, P.; Tuckett, D.
2014-06-01
The Worldwide LHC Computing Grid provides resources for the four main virtual organizations. Along with data processing, data distribution is the key computing activity on the WLCG infrastructure. The scale of this activity is very large: the ATLAS virtual organization (VO) alone generates and distributes more than 40 PB of data in 100 million files per year. Another challenge is the heterogeneity of data transfer technologies. Currently there are two main alternatives for data transfers on the WLCG: the File Transfer Service and the XRootD protocol. Each LHC VO has its own monitoring system, which is limited to the scope of that particular VO. There is a need for a global system which would provide a complete cross-VO and cross-technology picture of all WLCG data transfers. We present a unified monitoring tool, the WLCG Transfers Dashboard, in which all the VOs and technologies coexist and are monitored together. The scale of the activity and the heterogeneity of the system raise a number of technical challenges. Each technology comes with its own monitoring specificities, and some of the VOs use several of these technologies. This paper describes the implementation of the system with particular focus on the design principles applied to ensure the necessary scalability and performance, and to easily integrate any new technology, providing additional functionality which might be specific to that technology.
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated, scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of the problems involved or the lessons learned in using individual Grid resources, as these have already been published in our paper on the genome analysis research environment (GNARE); it focuses primarily on the architecture that makes GADU resource-independent and interoperable across heterogeneous Grid resources.
High Performance Radiation Transport Simulations on TITAN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Christopher G; Davidson, Gregory G; Evans, Thomas M
2012-01-01
In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for inverting the transport operator, a new multilevel energy decomposition method scaling to hundreds of thousands of processing cores, and a modern, novel code architecture that supports straightforward integration of new features. In this paper we discuss the performance of Denovo on the 10-20 petaflop ORNL GPU-based system, Titan. We describe algorithms and techniques used to exploit the capabilities of Titan's heterogeneous compute node architecture and the challenges of obtaining good parallel performance for this sparse hyperbolic PDE solver containing inherently sequential computations. Numerical results demonstrating Denovo performance on early Titan hardware are presented.
Particle simulation on heterogeneous distributed supercomputers
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Dagum, Leonardo
1993-01-01
We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.
AWE-WQ: fast-forwarding molecular dynamics using the accelerated weighted ensemble.
Abdul-Wahid, Badi'; Feng, Haoyun; Rajan, Dinesh; Costaouec, Ronan; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2014-10-27
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods has emerged, which can flexibly and efficiently sample conformational space without being trapped and allows calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation, called AWE-WQ, of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein which simulated 1.5 ms over 8 months with a peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy.
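The core WE bookkeeping step, splitting and merging weighted walkers per bin while conserving total probability, can be sketched as follows; this is a generic weight-proportional resampler, not AWE-WQ's exact split/merge scheme:

```python
import numpy as np

def we_resample(walkers, weights, bins, target_per_bin=4):
    # Per-bin weight-proportional resampling: every occupied bin ends up
    # with target_per_bin walkers of equal weight, and the total
    # probability weight in each bin (and overall) is conserved exactly.
    rng = np.random.default_rng()
    new_walkers, new_weights = [], []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        w = weights[idx]
        W = w.sum()
        picks = rng.choice(idx, size=target_per_bin, p=w / W)
        new_walkers += [walkers[i] for i in picks]
        new_weights += [W / target_per_bin] * target_per_bin
    return new_walkers, np.array(new_weights)
```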
Heterogeneous scalable framework for multiphase flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morris, Karla Vanessa
2013-09-01
Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computer platforms. This work created an application based on a software architecture in which the physics and software concerns are separated in a way that adds flexibility to both. The developed spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other applications. The API allows object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects on central processing unit (CPU) and graphics processing unit (GPU) nodes for applications using C++.
Ship Detection from Ocean SAR Image Based on Local Contrast Variance Weighted Information Entropy
Huang, Yulin; Pei, Jifang; Zhang, Qian; Gu, Qin; Yang, Jianyu
2018-01-01
Ship detection from synthetic aperture radar (SAR) images is one of the crucial issues in maritime surveillance. However, due to the varying ocean waves and the strong echo of the sea surface, it is very difficult to detect ships against heterogeneous and strong clutter backgrounds. In this paper, an innovative ship detection method is proposed to effectively distinguish vessels from complex backgrounds in a SAR image. First, the input SAR image is pre-screened by the maximally stable extremal region (MSER) method, which obtains the ship candidate regions with low computational complexity. Then, the proposed local contrast variance weighted information entropy (LCVWIE) is adopted to evaluate the complexity of those candidate regions and their dissimilarity from their neighborhoods. Finally, the LCVWIE values of the candidate regions are compared with an adaptive threshold to obtain the final detection result. Experimental results based on measured ocean SAR images show that the proposed method obtains stable detection performance in both strong-clutter and heterogeneous backgrounds. Meanwhile, it has a low computational complexity compared with some existing detection methods. PMID:29652863
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of interconnected autonomous coprocessors, called the Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
Gálvez, Sergio; Ferusic, Adis; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan A; Dorado, Gabriel
2016-10-01
The Smith-Waterman algorithm has great sensitivity when used for biological sequence-database searches, but at the expense of high computing-power requirements. To overcome this problem, there are implementations in the literature that exploit the different hardware architectures available in a standard PC, such as GPU, CPU, and coprocessors. We introduce an application that splits the original database-search problem into smaller parts, resolves each of them by executing the most efficient implementation of the Smith-Waterman algorithm on a different hardware architecture, and finally unifies the generated results. Using non-overlapping hardware allows simultaneous execution, and yields up to a 2.58-fold performance gain when compared with any other algorithm to search sequence databases. Even the performance of the popular BLAST heuristic is exceeded in 78% of the tests. The application has been tested with standard hardware: an Intel i7-4820K CPU, Intel Xeon Phi 31S1P coprocessors, and nVidia GeForce GTX 960 graphics cards. An important increase in performance has been obtained in a wide range of situations, effectively exploiting the available hardware.
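A throughput-proportional split is one plausible way to realize the described divide-and-unify strategy. In this hypothetical Python sketch, the searchers dict maps each device to its own Smith-Waterman implementation; both names are placeholders.

from concurrent.futures import ThreadPoolExecutor

def split_database(seqs, throughputs):
    """Partition seqs proportionally to each device's measured throughput."""
    devices = list(throughputs)
    total = sum(throughputs.values())
    chunks, start = {}, 0
    for dev in devices[:-1]:
        end = start + round(len(seqs) * throughputs[dev] / total)
        chunks[dev], start = seqs[start:end], end
    chunks[devices[-1]] = seqs[start:]        # remainder to the last device
    return chunks

def search_all(query, seqs, searchers, throughputs):
    chunks = split_database(seqs, throughputs)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(searchers[d], query, c) for d, c in chunks.items()]
        hits = [h for f in futures for h in f.result()]
    return sorted(hits, key=lambda h: h["score"], reverse=True)  # unify results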
Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments
Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand
2013-01-01
We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on optimization constrained by time or cost, without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and to present a hybrid particle swarm optimization (PSO) algorithm to optimize scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate at different voltage supply levels by sacrificing clock frequency; this multiple-voltage operation involves a trade-off between schedule quality and energy consumption. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361
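The DVFS trade-off can be made concrete with a toy cost model: runtime stretches as frequency drops, while dynamic power falls roughly with V^2 * f. The sketch below, with illustrative voltage/frequency levels, computes the two objectives a multi-objective PSO would trade off; it is not the paper's model.

def task_cost(base_time, level):
    v, f = level                      # supply voltage, relative frequency
    time = base_time / f              # runtime stretches as frequency drops
    power = v ** 2 * f                # normalized dynamic power model
    return time, power * time         # (runtime, energy)

LEVELS = [(1.2, 1.0), (1.0, 0.8), (0.8, 0.6)]   # hypothetical DVFS states

def schedule_objectives(tasks, assignment):
    """tasks: base runtimes; assignment: DVFS level index per task (serial chain)."""
    costs = [task_cost(t, LEVELS[i]) for t, i in zip(tasks, assignment)]
    makespan = sum(t for t, _ in costs)
    energy = sum(e for _, e in costs)
    return makespan, energy           # a PSO would search over `assignment`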
Yip, Connie; Tacelli, Nunzia; Remy-Jardin, Martine; Scherpereel, Arnaud; Cortot, Alexis; Lafitte, Jean-Jacques; Wallyn, Frederic; Remy, Jacques; Bassett, Paul; Siddique, Musib; Cook, Gary J R; Landau, David B; Goh, Vicky
2015-09-01
We aimed to assess computed tomography (CT) intratumoral heterogeneity changes, and to compare the prognostic ability of the Response Evaluation Criteria in Solid Tumors (RECIST) 1.1, an alternate response method (Crabb), and CT heterogeneity in non-small cell lung cancer treated with chemotherapy with and without bevacizumab. Forty patients treated with chemotherapy (group C) or chemotherapy and bevacizumab (group BC) underwent contrast-enhanced CT at baseline and after 1, 3, and 6 cycles of chemotherapy. Radiologic response was assessed using RECIST 1.1 and an alternate method. CT heterogeneity analysis generating global and locoregional parameters depicting tumor image spatial intensity characteristics was performed. Heterogeneity parameters between the 2 groups were compared using the Mann-Whitney U test. Associations of heterogeneity parameters and radiologic response with overall survival were assessed using Cox regression. Global and locoregional heterogeneity parameters changed with treatment, with increased tumor heterogeneity in group BC. Entropy [group C: median -0.2% (interquartile range -2.2, 1.7) vs. group BC: 0.7% (-0.7, 3.5), P=0.10] and busyness [-27.7% (-62.2, -5.0) vs. -11.5% (-29.1, 92.4), P=0.10] showed a greater reduction in group C, whereas uniformity [1.9% (-8.0, 9.8) vs. -5.0% (-13.9, 5.6), P=0.10] showed a relative increase after 1 cycle, but these differences did not reach statistical significance. Two (9%) and one (6%) additional responders were identified using the alternate method compared with RECIST in group C and group BC, respectively. Heterogeneity parameters were not significant prognostic factors. The alternate response method described by Crabb identified more responders than RECIST; however, neither criteria nor baseline imaging heterogeneity parameters were prognostic of survival.
Heterogeneous Systems for Information-Variable Environments (HIVE)
2017-05-01
ARL-TR-8027, May 2017. US Army Research Laboratory, Computational and Information Sciences Directorate. Approved for public release; distribution is unlimited.
Pérez-Beteta, Julián; Luque, Belén; Arregui, Elena; Calvo, Manuel; Borrás, José M; López, Carlos; Martino, Juan; Velasquez, Carlos; Asenjo, Beatriz; Benavides, Manuel; Herruzo, Ismael; Martínez-González, Alicia; Pérez-Romasanta, Luis; Arana, Estanislao; Pérez-García, Víctor M
2016-01-01
Objective: The main objective of this retrospective work was the study of three-dimensional (3D) heterogeneity measures of post-contrast pre-operative MR images acquired with T1 weighted sequences of patients with glioblastoma (GBM) as predictors of clinical outcome. Methods: 79 patients from 3 hospitals were included in the study. 16 3D textural heterogeneity measures were computed, including run-length matrix (RLM) features (regional heterogeneity) and co-occurrence matrix (CM) features (local heterogeneity). The significance of the results was studied using Kaplan–Meier curves and Cox proportional hazards analysis. Correlation between the variables of the study was assessed using Spearman's correlation coefficient. Results: Kaplan–Meier survival analysis showed that 4 of the 11 RLM features and 4 of the 5 CM features considered were robust predictors of survival. The median survival differences in the most significant cases were over 6 months. Conclusion: Heterogeneity measures computed on the post-contrast pre-operative T1 weighted MR images of patients with GBM are predictors of survival. Advances in knowledge: Texture analysis to assess tumour heterogeneity has been widely studied. However, most works develop a two-dimensional analysis, focusing only on one MRI slice to state tumour heterogeneity. The study of fully 3D heterogeneity textural features as predictors of clinical outcome is more robust and is not dependent on the selected slice of the tumour. PMID:27319577
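As a two-dimensional illustration of the local (co-occurrence matrix) features involved, the following sketch computes a few CM statistics on one image slice with scikit-image; the study itself uses fully 3D analogues. (In scikit-image versions before 0.19 the functions are spelled greycomatrix/greycoprops.)

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def cm_features(slice_u8):
    """Co-occurrence features of a uint8 slice, pooled across two offsets."""
    glcm = graycomatrix(slice_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    p = glcm[glcm > 0]
    return {
        "contrast": graycoprops(glcm, "contrast").mean(),
        "homogeneity": graycoprops(glcm, "homogeneity").mean(),
        "entropy": float(-np.sum(p * np.log2(p))),  # not provided by graycoprops
    }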
Towards ubiquitous access of computer-assisted surgery systems.
Liu, Hui; Lufei, Hanping; Shi, Weishong; Chaudhary, Vipin
2006-01-01
Traditional stand-alone computer-assisted surgery (CAS) systems impede ubiquitous and simultaneous access by multiple users. With advances in computing and networking technologies, ubiquitous access to CAS systems becomes possible and promising. Based on our preliminary work, CASMIL, a stand-alone CAS server developed at Wayne State University, we propose a novel mobile CAS system, UbiCAS, which allows surgeons to retrieve, review and interpret multimodal medical images, and to perform some critical neurosurgical procedures, on heterogeneous devices from anywhere at any time. Furthermore, various optimization techniques, including caching, prefetching, a pseudo-streaming model, and compression, are used to guarantee the QoS of the UbiCAS system. UbiCAS enables doctors at remote locations to actively participate in remote surgeries and to share patient information in real time before, during, and after the surgery.
Uchida, Yuichiro; Masui, Toshihiko; Sato, Asahi; Nagai, Kazuyuki; Anazawa, Takayuki; Takaori, Kyoichi; Uemoto, Shinji
2018-03-27
Peripancreatic collections occur frequently after distal pancreatectomy. However, the sequelae of peripancreatic collections vary from case to case, and their clinical impact is uncertain. In this study, the correlations between CT findings of peripancreatic collections and complications after distal pancreatectomy were investigated. Ninety-six consecutive patients who had undergone distal pancreatectomy between 2010 and 2015 were retrospectively investigated. The extent and heterogeneity of peripancreatic collections and background clinicopathological characteristics were analyzed. The extent of peripancreatic collections was calculated based on three-dimensional computed tomography images, and the degree of heterogeneity of peripancreatic collections was assessed based on the standard deviation of their density on computed tomography. Of 85 patients who underwent postoperative computed tomography imaging, a peripancreatic collection was detected in 77 (91%). Patients with either a large extent or a high degree of heterogeneity of peripancreatic collection had a significantly higher rate of clinically relevant pancreatic fistula than those without (odds ratio 5.95, 95% confidence interval 2.12-19.72, p = 0.001; odds ratio 8.0, 95% confidence interval 2.87-24.19, p = 0.0001, respectively). A large and heterogeneous peripancreatic collection was significantly associated with postoperative complications, especially clinically relevant postoperative pancreatic fistula. A small and homogeneous peripancreatic collection could be safely observed.
NASA Astrophysics Data System (ADS)
Yue, S. S.; Wen, Y. N.; Lv, G. N.; Hu, D.
2013-10-01
In recent years, the increasing development of cloud computing technologies has laid a critical foundation for efficiently solving complicated geographic issues. However, it is still difficult to realize the cooperative operation of massive heterogeneous geographical models. Traditional cloud architecture is apt to provide a centralized solution to end users, while all the required resources are often offered by large enterprises or special agencies. Thus, it is a closed framework from the perspective of resource utilization. Solving comprehensive geographic issues requires integrating multifarious heterogeneous geographical models and data. In this case, an open computing platform is needed, with which model owners can package and deploy their models into the cloud conveniently, while model users can search, access and utilize those models with cloud facilities. Based on this concept, open cloud service strategies for the sharing of heterogeneous geographic analysis models are studied in this article. The key technologies, namely a unified cloud interface strategy, a sharing platform based on cloud services, and a computing platform based on cloud services, are discussed in detail, and related experiments are conducted for further verification.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
NASA Astrophysics Data System (ADS)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten
2017-11-01
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
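The central idea, placing subdomain walls according to estimated work rather than uniformly in space, can be sketched in a few lines. Here particle count along one axis is used as a stand-in for the per-subdomain force-computation cost estimate.

import numpy as np

def heterogeneous_walls(x_positions, n_subdomains):
    """Wall coordinates giving each subdomain ~equal particle counts."""
    interior = np.linspace(0, 1, n_subdomains + 1)[1:-1]
    return np.quantile(x_positions, interior)

# e.g. a dense droplet in sparse vapor: walls crowd around the dense region,
# balancing per-rank work a priori instead of after a load imbalance appears.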
SU-E-T-422: Fast Analytical Beamlet Optimization for Volumetric Intensity-Modulated Arc Therapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Kenny S K; Lee, Louis K Y; Xing, L
2015-06-15
Purpose: To implement a fast optimization algorithm on a CPU/GPU heterogeneous computing platform and to obtain an optimal fluence for a given target dose distribution from the pre-calculated beamlets in an analytical approach. Methods: The 2D target dose distribution was modeled as an n-dimensional vector and estimated by a linear combination of independent basis vectors. The basis set was composed of the pre-calculated beamlet dose distributions at every 6 degrees of gantry angle, and the cost function was set as the magnitude square of the vector difference between the target and the estimated dose distribution. The optimal weighting of the basis, which corresponds to the optimal fluence, was obtained analytically by the least square method. Those basis vectors with a positive weighting were selected for entering into the next level of optimization. In total, 7 levels of optimization were implemented in the study. Ten head-and-neck and ten prostate carcinoma cases were selected for the study and mapped to a round water phantom with a diameter of 20 cm. The Matlab computation was performed in a heterogeneous programming environment with an Intel i7 CPU and an NVIDIA Geforce 840M GPU. Results: In all selected cases, the estimated dose distribution was in good agreement with the given target dose distribution, and their correlation coefficients were found to be in the range of 0.9992 to 0.9997. Their root-mean-square error was monotonically decreasing and converging after 7 cycles of optimization. The computation took only about 10 seconds, and the optimal fluence maps at each gantry angle throughout an arc were quickly obtained. Conclusion: An analytical approach is derived for finding the optimal fluence for a given target dose distribution, and a fast optimization algorithm implemented on the CPU/GPU heterogeneous computing environment greatly reduces the optimization time.
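A hedged sketch of the described optimization loop: solve an unconstrained least-squares problem for the beamlet weights, keep the positive ones, and re-solve on the reduced basis for a fixed number of levels. Matrix and variable names are illustrative.

import numpy as np

def iterative_fluence(B, d, levels=7):
    """B: beamlet dose vectors as columns; d: target dose as a vector."""
    active = np.arange(B.shape[1])
    w = np.zeros(0)
    for _ in range(levels):
        w, *_ = np.linalg.lstsq(B[:, active], d, rcond=None)
        keep = w > 0
        if keep.all():
            break
        active, w = active[keep], w[keep]   # drop non-positive beamlets
    weights = np.zeros(B.shape[1])
    weights[active] = w
    return weights                          # fluence weight per beamlet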
Probability-based collaborative filtering model for predicting gene-disease associations.
Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan
2017-12-28
Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene-disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expense. We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop modified model II with personal heterogeneous regularization to enhance the accuracy of the aforementioned models. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. We compared the results of PCFM with the results of four state-of-the-art approaches. The results show that PCFM performs better than other advanced approaches. The PCFM model can be leveraged for prediction of disease genes, especially for new human genes or diseases with no known relationships.
NASA Technical Reports Server (NTRS)
Tessler, Alexander; DiSciuva, Marco; Gherlone, Marco
2010-01-01
The Refined Zigzag Theory (RZT) for homogeneous, laminated composite, and sandwich plates is presented from a multi-scale formalism starting with the in-plane displacement field expressed as a superposition of coarse and fine contributions. The coarse kinematic field is that of first-order shear-deformation theory, whereas the fine kinematic field has a piecewise-linear zigzag distribution through the thickness. The condition of limiting homogeneity of transverse-shear properties is proposed and yields four distinct sets of zigzag functions. By examining elastostatic solutions for highly heterogeneous sandwich plates, the best-performing zigzag functions are identified. The RZT predictive capabilities to model homogeneous and highly heterogeneous sandwich plates are critically assessed, demonstrating its superior efficiency, accuracy, and wide range of applicability. The present theory, which is derived from the virtual work principle, is well-suited for developing computationally efficient C0-continuous finite elements, and is thus appropriate for the analysis and design of high-performance load-bearing aerospace structures.
Charge heterogeneity of surfaces: mapping and effects on surface forces.
Drelich, Jaroslaw; Wang, Yu U
2011-07-11
The DLVO theory treats the total interaction force between two surfaces in a liquid medium as an arithmetic sum of two components: Lifshitz-van der Waals and electric double layer forces. Despite the success of the DLVO model developed for homogeneous surfaces, a vast majority of surfaces of particles and materials in technological systems are of a heterogeneous nature with a mosaic structure composed of microscopic and sub-microscopic domains of different surface characteristics. In such systems, the heterogeneity of the surface can be more important than the average surface character. Attractions can be stronger, by orders of magnitude, than would be expected from the classical mean-field DLVO model when area-averaged surface charge or potential is employed. Heterogeneity also introduces anisotropy of interactions into colloidal systems, vastly ignored in the past. To detect surface heterogeneities, analytical tools which provide accurate and spatially resolved information about material surface chemistry and potential - particularly at microscopic and sub-microscopic resolutions - are needed. Atomic force microscopy (AFM) offers the opportunity to locally probe not only changes in material surface characteristic but also charges of heterogeneous surfaces through measurements of force-distance curves in electrolyte solutions. Both diffuse-layer charge densities and potentials can be calculated by fitting the experimental data with a DLVO theoretical model. The surface charge characteristics of the heterogeneous substrate as recorded by AFM allow the charge variation to be mapped. Based on the obtained information, computer modeling and simulation can be performed to study the interactions among an ensemble of heterogeneous particles and their collective motions. In this paper, the diffuse-layer charge mapping by the AFM technique is briefly reviewed, and a new Diffuse Interface Field Approach to colloid modeling and simulation is briefly discussed.
Application-oriented integrated control center (AICC) for heterogeneous optical networks
NASA Astrophysics Data System (ADS)
Zhao, Yongli; Zhang, Jie; Cao, Xuping; Wang, Dajiang; Wu, Koubo; Cai, Yinxiang; Gu, Wanyi
2011-12-01
Various broad-bandwidth services, such as data center applications and cloud computing, have been consuming the bandwidth resources of optical networks. Although the available bandwidth keeps increasing with the development of transmission technologies, challenges remain for future optical networks, and the relationship between the upper application layer and the lower network resource layer needs further study. In order to improve the efficiency of network resources and the capability of service provisioning, heterogeneous optical network resources can be abstracted as unified Application Programming Interfaces (APIs) that can be opened to various upper applications through the Application-oriented Integrated Control Center (AICC) proposed in this paper. A novel OpenFlow-based unified control architecture is proposed for the optimization of cross-layer resources. Numerical results from simulation experiments show the good performance of AICC.
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.
Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros
2014-06-25
The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
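For orientation, here is a toy serial version of the exhaustive pairwise scan that such accelerators speed up: every SNP pair is scored with an entropy-style statistic against the phenotype. At GWAS scale this loop is exactly what must be parallelized; the scoring function is illustrative, not SNPsyn's actual statistic.

import itertools
import numpy as np

def pair_score(snp_a, snp_b, phenotype):
    """Conditional-entropy-style score over the 3x3 genotype table."""
    cond_entropy = 0.0
    for ga in range(3):
        for gb in range(3):
            mask = (snp_a == ga) & (snp_b == gb)
            if not mask.any():
                continue
            p_cell = mask.mean()
            p_case = phenotype[mask].mean()
            for p in (p_case, 1 - p_case):
                if p > 0:
                    cond_entropy -= p_cell * p * np.log2(p)
    return -cond_entropy   # lower conditional entropy => stronger interaction

def scan(snps, phenotype):
    """snps: list of genotype arrays (values 0/1/2). Returns per-pair scores."""
    return {(i, j): pair_score(snps[i], snps[j], phenotype)
            for i, j in itertools.combinations(range(len(snps)), 2)}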
Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing
Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng
2015-01-01
Push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels. PMID:26566545
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure
NASA Astrophysics Data System (ADS)
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-01
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
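The master/worker pattern and the quoted speed-up are simple to sketch. In the hypothetical Python fragment below, run_histories stands in for the EGS5 work unit; the final line reproduces the reported 47x arithmetic (2.58 h versus 3.3 min).

from concurrent.futures import ProcessPoolExecutor

def run_histories(seed_and_count):
    seed, n = seed_and_count
    # ... run n photon/electron histories with an MC engine, return tallies ...
    return {"dose": 0.0, "histories": n}          # placeholder tallies

def distribute(total_histories, n_workers):
    per = total_histories // n_workers
    jobs = [(seed, per) for seed in range(n_workers)]  # independent RNG streams
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_histories, jobs))
    return {"dose": sum(r["dose"] for r in results),
            "histories": sum(r["histories"] for r in results)}

print(2.58 * 60 / 3.3)   # ~46.9, the ~47x speed-up quoted above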
NASA Astrophysics Data System (ADS)
de Barros, Felipe P. J.
2018-07-01
Quantifying the uncertainty in solute mass discharge at an environmentally sensitive location is key to assess the risks due to groundwater contamination. Solute mass fluxes are strongly affected by the spatial variability of hydrogeological properties as well as release conditions at the source zone. This paper provides a methodological framework to investigate the interaction between the ubiquitous heterogeneity of the hydraulic conductivity and the mass release rate at the source zone on the uncertainty of mass discharge. Through the use of perturbation theory, we derive analytical and semi-analytical expressions for the statistics of the solute mass discharge at a control plane in a three-dimensional aquifer while accounting for the solute mass release rates at the source. The derived solutions are limited to aquifers displaying low-to-mild heterogeneity. Results illustrate the significance of the source zone mass release rate in controlling the mass discharge uncertainty. The relative importance of the mass release rate on the mean solute discharge depends on the distance between the source and the control plane. On the other hand, we find that the solute release rate at the source zone has a strong impact on the variance of the mass discharge. Within a risk context, we also compute the peak mean discharge as a function of the parameters governing the spatial heterogeneity of the hydraulic conductivity field and mass release rates at the source zone. The proposed physically-based framework is application-oriented, computationally efficient and capable of propagating uncertainty from different parameters onto risk metrics. Furthermore, it can be used for preliminary screening purposes to guide site managers to perform system-level sensitivity analysis and better allocate resources.
Jackson, Dan; Bowden, Jack
2016-09-07
Confidence intervals for the between study variance are useful in random-effects meta-analyses because they quantify the uncertainty in the corresponding point estimates. Methods for calculating these confidence intervals have been developed that are based on inverting hypothesis tests using generalised heterogeneity statistics. Whilst, under the random effects model, these new methods furnish confidence intervals with the correct coverage, the resulting intervals are usually very wide, making them uninformative. We discuss a simple strategy for obtaining 95 % confidence intervals for the between-study variance with a markedly reduced width, whilst retaining the nominal coverage probability. Specifically, we consider the possibility of using methods based on generalised heterogeneity statistics with unequal tail probabilities, where the tail probability used to compute the upper bound is greater than 2.5 %. This idea is assessed using four real examples and a variety of simulation studies. Supporting analytical results are also obtained. Our results provide evidence that using unequal tail probabilities can result in shorter 95 % confidence intervals for the between-study variance. We also show some further results for a real example that illustrates how shorter confidence intervals for the between-study variance can be useful when performing sensitivity analyses for the average effect, which is usually the parameter of primary interest. We conclude that using unequal tail probabilities when computing 95 % confidence intervals for the between-study variance, when using methods based on generalised heterogeneity statistics, can result in shorter confidence intervals. We suggest that those who find the case for using unequal tail probabilities convincing should use the '1-4 % split', where greater tail probability is allocated to the upper confidence bound. The 'width-optimal' interval that we present deserves further investigation.
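Below is a sketch of how such an interval could be computed with a Q-profile-style method and the suggested 1-4% split, assuming the generalized Q statistic is monotonically decreasing in tau^2 and that tau2_max brackets the roots; the published method's details may differ.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def gen_q(tau2, y, v):
    """Generalized heterogeneity statistic for effects y, within-study variances v."""
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return np.sum(w * (y - mu) ** 2)

def q_profile_ci(y, v, lower_tail=0.01, upper_tail=0.04, tau2_max=100.0):
    k = len(y)
    q_hi = chi2.ppf(1 - lower_tail, k - 1)   # quantile defining the lower bound
    q_lo = chi2.ppf(upper_tail, k - 1)       # quantile defining the upper bound
    f = lambda t2, target: gen_q(t2, y, v) - target
    lo = 0.0 if gen_q(0, y, v) < q_hi else brentq(f, 0, tau2_max, args=(q_hi,))
    hi = 0.0 if gen_q(0, y, v) < q_lo else brentq(f, 0, tau2_max, args=(q_lo,))
    return lo, hi   # 95% CI for the between-study variance tau^2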
Mobile high-performance computing (HPC) for synthetic aperture radar signal processing
NASA Astrophysics Data System (ADS)
Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen
2018-04-01
The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy-efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. Presently, it is very difficult to achieve the required processing power by aggregating emerging heterogeneous many-core platforms consisting of CPU, field-programmable gate array (FPGA), and graphics processor cores under power and performance constraints. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). These DNN models are typically trained using GPUs with gigabytes of external memory and rely heavily on 32-bit floating-point operations; as a result, DNNs do not run efficiently on hardware appropriate for low-power or mobile applications. To address this limitation, we propose a framework for compressing DNN models for ATR suited to deployment on resource-constrained hardware. The proposed compression framework utilizes promising DNN compression techniques, including pruning and weight quantization, while also focusing on processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.
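The two named compression steps are easy to illustrate on a single weight matrix: magnitude pruning to a target sparsity followed by uniform 8-bit quantization. Real pipelines interleave retraining between steps, which this sketch omits.

import numpy as np

def prune(weights, sparsity=0.9):
    """Zero the smallest-magnitude weights up to the target sparsity."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_int8(weights):
    """Uniform symmetric 8-bit quantization; dequantize with q * scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)
w_q, s = quantize_int8(prune(w))
print((w_q == 0).mean(), w_q.dtype)   # ~0.9 sparsity, int8 storage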
Dunlop, R; Arbona, A; Rajasekaran, H; Lo Iacono, L; Fingberg, J; Summers, P; Benkner, S; Engelbrecht, G; Chiarini, A; Friedrich, C M; Moore, B; Bijlenga, P; Iavindrasana, J; Hose, R D; Frangi, A F
2008-01-01
This paper presents an overview of computerised decision support for clinical practice. The concept of computer-interpretable guidelines is introduced in the context of the @neurIST project, which aims at supporting the research and treatment of asymptomatic unruptured cerebral aneurysms by bringing together heterogeneous data, computing and complex processing services. The architecture is generic enough to adapt it to the treatment of other diseases beyond cerebral aneurysms. The paper reviews the generic requirements of the @neurIST system and presents the innovative work in distributing executable clinical guidelines.
NASA Astrophysics Data System (ADS)
Kort-Kamp, W. J. M.; Cordes, N. L.; Ionita, A.; Glover, B. B.; Duque, A. L. Higginbotham; Perry, W. L.; Patterson, B. M.; Dalvit, D. A. R.; Moore, D. S.
2016-04-01
Electromagnetic stimulation of energetic materials provides a noninvasive and nondestructive tool for detecting and identifying explosives. We combine structural information based on x-ray computed tomography, experimental dielectric data, and electromagnetic full-wave simulations to study microscale electromagnetic heating of realistic three-dimensional heterogeneous explosives. We analyze the formation of electromagnetic hot spots and thermal gradients in the explosive-binder mesostructures and compare the heating rate for various binder systems.
Two-stage atlas subset selection in multi-atlas based image segmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhao, Tingting, E-mail: tingtingzhao@mednet.ucla.edu; Ruan, Dan, E-mail: druan@mednet.ucla.edu
2015-06-15
Purpose: Fast growing access to large databases and cloud-stored data presents a unique opportunity for multi-atlas based image segmentation, and also presents challenges in heterogeneous atlas quality and computation burden. This work aims to develop a novel two-stage method tailored to the special needs in the face of a large atlas collection with varied quality, so that high-accuracy segmentation can be achieved with low computational cost. Methods: An atlas subset selection scheme is proposed to substitute a significant portion of the computationally expensive full-fledged registration in the conventional scheme with a low-cost alternative. More specifically, the authors introduce a two-stage atlas subset selection method. In the first stage, an augmented subset is obtained based on a low-cost registration configuration and a preliminary relevance metric; in the second stage, the subset is further narrowed down to a fusion set of desired size, based on full-fledged registration and a refined relevance metric. An inference model is developed to characterize the relationship between the preliminary and refined relevance metrics, and a proper augmented subset size is derived to ensure that the desired atlases survive the preliminary selection with high probability. Results: The performance of the proposed scheme has been assessed with cross validation based on two clinical datasets consisting of manually segmented prostate and brain magnetic resonance images, respectively. The proposed scheme demonstrates comparable end-to-end segmentation performance as the conventional single-stage selection method, but with significant computation reduction. Compared with the alternative computation reduction method, their scheme improves the mean and median Dice similarity coefficient values from (0.74, 0.78) to (0.83, 0.85) and from (0.82, 0.84) to (0.95, 0.95) for prostate and corpus callosum segmentation, respectively, with statistical significance. Conclusions: The authors have developed a novel two-stage atlas subset selection scheme for multi-atlas based segmentation. It achieves good segmentation accuracy with significantly reduced computation cost, making it a suitable configuration in the presence of extensive heterogeneous atlases.
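The two-stage logic reduces to cheap ranking followed by expensive re-ranking. In this sketch, cheap_score and full_score are placeholders for the low-cost and full-fledged registration pipelines plus their relevance metrics; m is the augmented subset size the paper's inference model derives.

def two_stage_select(atlases, target, cheap_score, full_score, m, n):
    # Stage 1: low-cost registration and preliminary metric over all atlases.
    stage1 = sorted(atlases, key=lambda a: cheap_score(a, target), reverse=True)
    augmented = stage1[:m]      # m chosen so good atlases survive w.h.p.
    # Stage 2: full-fledged registration only on the augmented subset.
    stage2 = sorted(augmented, key=lambda a: full_score(a, target), reverse=True)
    return stage2[:n]           # fusion set used for label propagation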
Multi-scale image segmentation and numerical modeling in carbonate rocks
NASA Astrophysics Data System (ADS)
Alves, G. C.; Vanorio, T.
2016-12-01
Numerical methods based on computational simulations can be an important tool in estimating physical properties of rocks. These can complement experimental results, especially when time constraints and sample availability are a problem. However, computational models created at different scales can yield conflicting results with respect to the physical laboratory. This problem is exacerbated in carbonate rocks due to their heterogeneity at all scales. We developed a multi-scale approach performing segmentation of the rock images and numerical modeling across several scales, accounting for those heterogeneities. As a first step, we measured the porosity and the elastic properties of a group of carbonate samples with varying micrite content. Then, samples were imaged by Scanning Electron Microscope (SEM) as well as optical microscope at different magnifications. We applied three different image segmentation techniques to create numerical models from the SEM images and performed numerical simulations of the elastic wave-equation. Our results show that a multi-scale approach can efficiently account for micro-porosities in tight micrite-supported samples, yielding acoustic velocities comparable to those obtained experimentally. Nevertheless, in high-porosity samples characterized by larger grain/micrite ratio, results show that SEM scale images tend to overestimate velocities, mostly due to their inability to capture macro- and/or intragranular- porosity. This suggests that, for high-porosity carbonate samples, optical microscope images would be more suited for numerical simulations.
Thermal Performance Analysis of a Geologic Borehole Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reagin, Lauren
2016-08-16
The Brazilian Nuclear Research Institute (IPEN) proposed a design for the disposal of Disused Sealed Radioactive Sources (DSRS) based on the IAEA Borehole Disposal of Sealed Radioactive Sources (BOSS) design that would allow the entirety of Brazil’s inventory of DSRS to be disposed in a single borehole. The proposed IPEN design allows for 170 waste packages (WPs) containing DSRS (such as Co-60 and Cs-137) to be stacked on top of each other inside the borehole. The primary objective of this work was to evaluate the thermal performance of a conservative approach to the IPEN proposal with the equivalent of two WPs and two different inside configurations using Co-60 as the radioactive heat source. The current WP configuration (heterogeneous) for the IPEN proposal has 60% of the WP volume being occupied by a nuclear radioactive heat source and the remaining 40% as vacant space. The second configuration (homogeneous) considered for this project was a homogeneous case where 100% of the WP volume was occupied by a nuclear radioactive heat source. The computational models for the thermal analyses of the WP configurations with the Co-60 heat source considered three different cooling mechanisms (conduction, radiation, and convection) and the effect of mesh size on the results from the thermal analysis. The results of the analyses yielded maximum temperatures inside the WPs for both of the WP configurations and various mesh sizes. The heterogeneous WP considered the cooling mechanisms of conduction, convection, and radiation. The temperature results from the heterogeneous WP analysis suggest that the model is cooled predominantly by conduction with the effect of radiation and natural convection on cooling being negligible. From the thermal analysis comparing the two WP configurations, the results suggest that either WP configuration could be used for the design. The mesh sensitivity results verify the meshes used, and results obtained from the thermal analyses were close to being independent of mesh size. The results from the computational case and analytically-calculated case for the homogeneous WP in benchmarking were almost identical, which indicates that the computational approach used here was successfully verified by the analytical solution.
NASA Astrophysics Data System (ADS)
Dagrau, Franck; Coulouvrat, François; Marchiano, Régis; Héron, Nicolas
2008-06-01
Dassault Aviation as a civil aircraft manufacturer is studying the feasibility of a supersonic business jet with the target of an "acceptable" sonic boom at the ground level, in particular in case of focusing. A sonic boom computational process has been performed that takes into account meteorological effects and aircraft manoeuvres. Turn manoeuvres and aircraft acceleration create zones of convergence of rays (caustics) which are the place of sound amplification. Therefore two elements have to be evaluated: firstly the geometrical position of the caustics, and secondly the noise level in the neighbourhood of the caustics. The modelling of the sonic boom propagation is based essentially on the assumptions of geometrical acoustics. Ray tracing is obtained according to Fermat's principle as paths that minimise the propagation time between the source (the aircraft) and the receiver. Wave amplitude and time waveform result from the solution of the inviscid Burgers' equation written along each individual ray. The "age variable" measuring the cumulative nonlinear effects is linked to the ray tube area. Caustics are located as the place where the ray tube area vanishes. Since geometrical acoustics does not take into account diffraction effects, it breaks down in the neighbourhood of caustics, where it would predict unphysical infinite pressure amplitudes. The aim of this study is to describe an original method for computing the focused noise level. The approach involves three main steps that can be summarised as follows. The propagation equation is solved by a forward marching procedure split into three successive steps: linear propagation in a homogeneous medium, linear perturbation due to the weak heterogeneity of the medium, and non-linear effects. The first step is solved using an "exact" angular spectrum algorithm. The parabolic approximation is applied only to the weak perturbation due to the heterogeneities. Finally, non-linear effects are accounted for by solving the inviscid Burgers' equation. As this equation is valid for a plane wave, its propagation direction is not prescribed a priori, but is computed in a self-adaptive way using an efficient numerical solver of the non-linear eikonal equation (Fast Marching Method).
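For reference, the nonlinear step refers to the inviscid Burgers' equation, which along a ray can be written in a generic retarded-time form (with sigma the distance along the ray, tau the retarded time, and beta a medium-dependent nonlinearity coefficient; the paper's exact variables and scaling may differ):

\frac{\partial p}{\partial \sigma} = \beta \, p \, \frac{\partial p}{\partial \tau}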
Perpetuation of torsade de pointes in heterogeneous hearts: competing foci or re-entry?
Vandersickel, Nele; de Boer, Teun P; Vos, Marc A; Panfilov, Alexander V
2016-12-01
The underlying mechanism of torsade de pointes (TdP) remains a matter of debate: perpetuation may be due to (1) focal activity or (2) re-entrant activity. The onset of TdP correlates with action potential heterogeneities in different regions of the heart. We studied the mechanism of perpetuation of TdP in silico using a 2D model of human cardiac tissue and an anatomically accurate model of the ventricles of the human heart. We found that the mechanism of perpetuation of TdP depends on the degree of heterogeneity. If the degree of heterogeneity is large, focal activity alone can sustain a TdP; otherwise re-entrant activity emerges. This result can help to understand the relationship between the mechanisms of TdP and tissue properties, and may help in developing new drugs against it. Torsade de pointes can be the consequence of cardiac remodelling, drug effects or a combination of both. The mechanism underlying TdP is unclear, and may involve triggered focal activity or re-entry. Recent work by our group has indicated that both cases may exist, i.e. TdPs induced in the chronic atrioventricular block (CAVB) dog model may have a focal origin or are due to re-entry. It was also found that heterogeneities might play an important role. In the current study we have used computational modelling to further investigate the mechanisms involved in TdP initiation and perpetuation, especially in the CAVB dog model, by adding heterogeneities with reduced repolarization reserve in comparison with the surrounding tissue. For this, the TNNP computer model was used for the computations. We demonstrated in 2D and 3D simulations that ECGs with the typical TdP morphology can be caused by both multiple competing foci and re-entry circuits as a result of the introduction of heterogeneities, depending on whether the heterogeneities have a large or a smaller reduced repolarization reserve in comparison with the surrounding tissue. Large heterogeneities can produce ectopic TdP, while smaller heterogeneities will produce re-entry-type TdP.
Compute as Fast as the Engineers Can Think! ULTRAFAST COMPUTING TEAM FINAL REPORT
NASA Technical Reports Server (NTRS)
Biedron, R. T.; Mehrotra, P.; Nelson, M. L.; Preston, M. L.; Rehder, J. J.; Rogers, J. L.; Rudy, D. H.; Sobieski, J.; Storaasli, O. O.
1999-01-01
This report documents findings and recommendations by the Ultrafast Computing Team (UCT). In the period 10-12/98, UCT reviewed design case scenarios for a supersonic transport and a reusable launch vehicle to derive computing requirements necessary for support of a design process with efficiency so radically improved that human thought rather than the computer paces the process. Assessment of the present computing capability against the above requirements indicated a need for further improvement in computing speed by several orders of magnitude to reduce time to solution from tens of hours to seconds in major applications. Evaluation of the trends in computer technology revealed a potential to attain the postulated improvement by further increases of single processor performance combined with massively parallel processing in a heterogeneous environment. However, utilization of massively parallel processing to its full capability will require redevelopment of the engineering analysis and optimization methods, including invention of new paradigms. To that end UCT recommends initiation of a new activity at LaRC called Computational Engineering for development of new methods and tools geared to the new computer architectures in disciplines, their coordination, and validation and benefit demonstration through applications.
Parallel computing in experimental mechanics and optical measurement: A review (II)
NASA Astrophysics Data System (ADS)
Wang, Tianyi; Kemao, Qian
2018-05-01
With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have been successfully applied to the measurement of various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in the pursuit of higher image resolutions for higher accuracy, the computational burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM and OM applications in a broad scope, including digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
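A compact sketch of the TMCMC stage loop at the framework's core follows; a flat prior and a fixed tempering increment are assumed for brevity, whereas production codes (including, presumably, Π4U) adapt the increment from the weights' coefficient of variation. The expensive log_like evaluations are the embarrassingly parallel part that gets scheduled across the heterogeneous cluster.

import numpy as np

def tmcmc(log_like, prior_sample, n=1000, rng=np.random.default_rng()):
    theta = prior_sample(n)                     # stage 0: sample the prior
    ll = np.array([log_like(t) for t in theta])
    p = 0.0
    while p < 1.0:
        dp = min(0.1, 1.0 - p)                  # fixed increment for brevity
        w = np.exp(dp * (ll - ll.max()))        # importance weights
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)        # resample
        theta, ll = theta[idx], ll[idx]
        cov = 0.04 * np.atleast_2d(np.cov(theta.T))   # scaled proposal covariance
        for i in range(n):                      # one MH perturbation step each
            prop = rng.multivariate_normal(theta[i], cov)
            ll_prop = log_like(prop)
            # flat prior assumed: accept on tempered likelihood ratio only
            if np.log(rng.random()) < (p + dp) * (ll_prop - ll[i]):
                theta[i], ll[i] = prop, ll_prop
        p += dp
    return theta                                # approximate posterior samples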
NASA Astrophysics Data System (ADS)
Manfredi, Sabato
2016-06-01
Large-scale dynamic systems are becoming highly pervasive, with applications ranging from systems biology, environment monitoring, and sensor networks to power systems. They are characterised by high dimensionality, complexity, and uncertainty in the node dynamics/interactions, and they require increasingly computationally demanding methods for their analysis and control design as the network size and node/interaction complexity grow. Therefore, it is a challenging problem to find scalable computational methods for the distributed control design of large-scale networks. In this paper, we investigate the robust distributed stabilisation problem of large-scale nonlinear multi-agent systems (briefly, MASs) composed of non-identical (heterogeneous) linear dynamical systems coupled by uncertain nonlinear time-varying interconnections. By employing Lyapunov stability theory and the linear matrix inequality (LMI) technique, new conditions are given for the distributed control design of large-scale MASs that can be easily solved by the MATLAB toolbox. The stabilisability of each node dynamic is a sufficient assumption to design a globally stabilising distributed control. The proposed approach improves some of the existing LMI-based results on MASs by both overcoming their computational limits and extending the applicative scenario to large-scale nonlinear heterogeneous MASs. Additionally, the proposed LMI conditions are further reduced in terms of computational requirements in the case of weakly heterogeneous MASs, which is a common scenario in real applications where the network nodes and links are affected by parameter uncertainties. One of the main advantages of the proposed approach is to allow a move from a centralised towards a distributed computing architecture, so that the expensive computation workload spent solving LMIs may be shared among processors located at the networked nodes, thus increasing the scalability of the approach with the network size. Finally, a numerical example shows the applicability of the proposed method and its advantage in terms of computational complexity when compared with the existing approaches.
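For a flavor of the kind of feasibility problem such conditions reduce to, here is a small Lyapunov LMI for a single node's dynamics, solved with cvxpy rather than the MATLAB toolbox the paper uses; the matrix A is an arbitrary stable example, not taken from the paper.

import cvxpy as cp
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])           # example stable node dynamics
n = A.shape[0]
P = cp.Variable((n, n), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(n),                 # P positive definite
               A.T @ P + P @ A << -eps * np.eye(n)]  # Lyapunov inequality
prob = cp.Problem(cp.Minimize(0), constraints)       # pure feasibility problem
prob.solve()
print(prob.status)                     # 'optimal' => the LMI is feasible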
de Vargas Roditi, Laura; Claassen, Manfred
2015-08-01
Novel technological developments enable single cell population profiling with respect to spatial and molecular setup. These include single cell sequencing, flow cytometry and multiparametric imaging approaches, and they open unprecedented possibilities to learn about the heterogeneity, dynamics and interplay of the different cell types which constitute tissues and multicellular organisms. Statistical and dynamic systems theory approaches have been applied to quantitatively describe a variety of cellular processes, such as transcription and cell signaling. Machine learning approaches have been developed to define cell types, their mutual relationships, and the differentiation hierarchies shaping heterogeneous cell populations, yielding insights into topics such as immune cell differentiation and tumor cell type composition. This combination of experimental and computational advances has opened perspectives towards learning predictive multi-scale models of heterogeneous cell populations. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Voecks, G. E.
1983-01-01
Insufficient theoretical definition of heterogeneous catalysts is the major difficulty confronting industrial suppliers who seek catalyst systems that are more active, selective, and stable than those currently available. In contrast, progress has been made in tailoring homogeneous catalysts to specific reactions because more is known about the reaction intermediates promoted and/or stabilized by these catalysts during the course of a reaction. However, modeling heterogeneous catalysts on a microscopic scale requires compiling and verifying complex information on reaction intermediates and pathways. This can be achieved by adapting homogeneously catalyzed reaction intermediate species, applying theoretical quantum chemistry and computer technology, and developing a better understanding of heterogeneous catalyst system environments. Research in microscopic reaction modeling is now at a stage where computer modeling, supported by physical experimental verification, could provide information about the dynamics of the reactions that will lead to the design of supported catalysts with improved selectivity and stability.
Quantum Heterogeneous Computing for Satellite Positioning Optimization
NASA Astrophysics Data System (ADS)
Bass, G.; Kumar, V.; Dulny, J., III
2016-12-01
Hard optimization problems occur in many fields of academic study and practical situations. We present results in which quantum heterogeneous computing is used to solve a real-world optimization problem: satellite positioning. Optimization problems like this can scale very rapidly with problem size and become unsolvable with traditional brute-force methods. Typically, such problems have been approximately solved with heuristic approaches; however, these methods can take a long time to run and are not guaranteed to find optimal solutions. Quantum computing offers the possibility of significant speed-up and improved solution quality. There are now commercially available quantum annealing (QA) devices that are designed to solve difficult optimization problems. These devices have 1000+ quantum bits, but they have significant hardware size and connectivity limitations. We present a novel heterogeneous computing stack that combines QA and classical machine learning and allows the use of QA on problems larger than the quantum hardware could solve in isolation. We begin by analyzing the satellite positioning problem with a heuristic solver, the genetic algorithm. The classical computer's comparatively large available memory can explore the full problem space and converge to a solution relatively close to the true optimum. The QA device can then evolve directly to the optimal solution within this more limited space. Preliminary experiments, using the Quantum Monte Carlo (QMC) algorithm to simulate QA hardware, have produced promising results. Working with problem instances with known global minima, we find a solution within 8% of the optimum in a matter of seconds, and within 5% in a few minutes. Future studies include replacing QMC with commercially available quantum hardware and exploring more problem sets and model parameters. Our results have important implications for how heterogeneous quantum computing can be used to solve difficult optimization problems in any field.
Luo, Jiawei; Xiao, Qiu
2017-02-01
MicroRNAs (miRNAs) play a critical role by regulating their targets at the post-transcriptional level. Identification of potential miRNA-disease associations will aid in deciphering the pathogenesis of human polygenic diseases. Several computational models have been developed to uncover novel miRNA-disease associations based on predicted target genes. However, due to the insufficient number of experimentally validated miRNA-target interactions, as well as the relatively high false-positive and false-negative rates of predicted target genes, it is still challenging for these prediction models to obtain remarkable performance. The purpose of this study is to prioritize miRNA candidates for diseases. We first construct a heterogeneous network, which consists of a disease similarity network, a miRNA functional similarity network and a known miRNA-disease association network. Then, an unbalanced bi-random walk-based algorithm on the heterogeneous network (BRWH) is adopted to discover potential associations by exploiting bipartite subgraphs. Based on 5-fold cross validation, the proposed network-based method achieves AUC values ranging from 0.782 to 0.907 for the 22 human diseases and an average AUC of almost 0.846. The experiments indicate that BRWH can achieve better performance than several popular methods. In addition, case studies of some common diseases further demonstrate the superior performance of our proposed method in prioritizing disease-related miRNA candidates. Copyright © 2017 Elsevier Inc. All rights reserved.
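The core of a bi-random walk on such a heterogeneous network can be sketched as follows. This is a minimal, generic sketch with illustrative matrix names and an illustrative averaging step, not the authors' exact BRWH formulation:

```python
import numpy as np

def row_normalize(W):
    """Row-normalize a similarity matrix so each row sums to 1."""
    s = W.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0
    return W / s

def bi_random_walk(Wd, Wm, A, alpha=0.7, left=2, right=2):
    """Alternate random-walk steps on the disease network (Wd, left walk)
    and the miRNA network (Wm, right walk), restarting from the known
    association matrix A; returns a dense association-score matrix."""
    Wd, Wm = row_normalize(Wd), row_normalize(Wm)
    R = A / A.sum()
    for step in range(max(left, right)):
        walks = []
        if step < left:    # walk on the disease side
            walks.append(alpha * Wd @ R + (1 - alpha) * A)
        if step < right:   # walk on the miRNA side
            walks.append(alpha * R @ Wm + (1 - alpha) * A)
        R = sum(walks) / len(walks)
    return R
```

The unbalanced aspect of BRWH corresponds to allowing different numbers of steps on the two sub-networks (the `left` and `right` parameters here), reflecting their different topologies.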
Kharche, Sanjay R.; So, Aaron; Salerno, Fabio; Lee, Ting-Yim; Ellis, Chris; Goldman, Daniel; McIntyre, Christopher W.
2018-01-01
Dialysis prolongs life but augments cardiovascular mortality. Imaging data suggest that dialysis increases myocardial blood flow (BF) heterogeneity, but its causes remain poorly understood. A biophysical model of human coronary vasculature was used to explain the imaging observations and highlight causes of coronary BF heterogeneity. Post-dialysis CT images from patients under control, pharmacological stress (adenosine), therapy (cooled dialysate), and combined adenosine and cooled dialysate conditions were obtained. The data presented disparate phenotypes. To dissect vascular mechanisms, a 3D human vasculature model based on known experimental coronary morphometry and a space filling algorithm was implemented. Steady state simulations were performed to investigate the effects of altered aortic pressure and blood vessel diameters on myocardial BF heterogeneity. Imaging showed that stress and therapy potentially increased mean and total BF, while reducing heterogeneity. BF histograms of one patient showed multi-modality. Using the model, it was found that total coronary BF increased as coronary perfusion pressure was increased. BF heterogeneity was differentially affected by large or small vessel blocking, and was found to be inversely related to small blood vessel diameters. Simulation of large artery stenosis indicated that BF became more heterogeneous (increased relative dispersion) and gave multi-modal histograms. Total transmural BF as well as transmural BF heterogeneity were reduced by large artery stenosis, generating large patches of very low BF regions downstream. Blocking of arteries at various orders showed that blocking larger arteries results in multi-modal BF histograms and large patches of low BF, whereas smaller artery blocking results in augmented relative dispersion and fractal dimension. Transmural heterogeneity was also affected. Finally, augmented aortic pressure in the presence of blood vessel blocking showed differential effects on BF heterogeneity as well as transmural BF. Improved aortic blood pressure may improve total BF. Stress and therapy may be effective if they dilate small vessels. The observed complex BF distributions (multi-modal BF histograms) may indicate existing large vessel stenosis. The intuitive BF heterogeneity methods used here can be readily applied in clinical studies. Further development of the model and methods will permit personalized assessment of patient BF status. PMID:29867555
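The relative dispersion used here as a heterogeneity measure is simply the coefficient of variation of the flow field. A minimal illustration with synthetic flow values (the lognormal sample is an illustrative stand-in, not patient data):

```python
import numpy as np

def relative_dispersion(bf):
    """Relative dispersion (standard deviation / mean) of a blood-flow field."""
    bf = np.asarray(bf, dtype=float)
    return bf.std() / bf.mean()

# Synthetic myocardial BF sample (mL/min/g), lognormal for illustration only.
rng = np.random.default_rng(0)
flows = rng.lognormal(mean=0.0, sigma=0.4, size=10_000)
print(f"RD = {relative_dispersion(flows):.3f}")
```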
Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model
NASA Astrophysics Data System (ADS)
Kumar, M.; Duffy, C.
2006-05-01
Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface and subsurface properties, and meteorological forcings. The computational cost and complexity associated with these models increase with their tendency to accurately simulate the large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with a limited number of physical processes imposes a smaller computational load, but this negatively affects the accuracy of model results and restricts physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio-temporal resolution), accuracy and predictive uncertainty in relation to various approximations of physical processes, (b) which can be applied adaptively at different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables, and (c) which is flexible enough to incorporate different numbers and approximations of process equations depending on model purpose and computational constraints. An efficient implementation of this strategy becomes all the more important for the Great Salt Lake river basin, which is relatively large (~89000 sq. km) and complex in terms of hydrologic and geomorphic conditions. Also, the types and time scales of the dominant hydrologic processes differ across the basin: part of the snow melt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries, and the adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch front. Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with the least computational burden. However, performing these simulations and the related calibration of these models over a large basin at higher spatio-temporal resolutions is computationally intensive and requires increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of a large number of processors, thereby reducing the time needed to obtain a solution. The paper also discusses the implementation of the integrated model on parallel processors: the mapping of the problem onto a multi-processor environment, methods to incorporate coupling between hydrologic processes using interprocessor communication models, model data structures, and parallel numerical algorithms to obtain high performance.
Kort-Kamp, W. J. M.; Cordes, N. L.; Ionita, A.; ...
2016-04-01
Electromagnetic stimulation of energetic materials provides a noninvasive and nondestructive tool for detecting and identifying explosives. We combine structural information based on x-ray computed tomography, experimental dielectric data, and electromagnetic full-wave simulations to study microscale electromagnetic heating of realistic three-dimensional heterogeneous explosives. We analyze the formation of electromagnetic hot spots and thermal gradients in the explosive-binder mesostructures and compare the heating rate for various binder systems.
A Component-based Programming Model for Composite, Distributed Applications
NASA Technical Reports Server (NTRS)
Eidson, Thomas M.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
The nature of scientific programming is evolving to larger, composite applications that are composed of smaller element applications. These composite applications are more frequently being targeted for distributed, heterogeneous networks of computers, and they are most likely programmed by a group of developers. Software component technology and computational frameworks are being proposed and developed to meet the programming requirements of these new applications. Historically, programming systems have had a hard time being accepted by the scientific programming community. In this paper, a programming model is outlined that attempts to organize the software component concepts and fundamental programming entities into programming abstractions that will be better understood by application developers. The programming model is designed to support computational frameworks that manage many of the tedious programming details, but that also allow sufficient programmer control to design an accurate, high-performance application.
Computer Simulations of Intrinsically Disordered Proteins
NASA Astrophysics Data System (ADS)
Chong, Song-Ho; Chatterjee, Prathit; Ham, Sihyun
2017-05-01
The investigation of intrinsically disordered proteins (IDPs) is a new frontier in structural and molecular biology that requires a new paradigm to connect structural disorder to function. Molecular dynamics simulations and statistical thermodynamics potentially offer ideal tools for atomic-level characterizations and thermodynamic descriptions of this fascinating class of proteins that will complement experimental studies. However, IDPs display sensitivity to inaccuracies in the underlying molecular mechanics force fields. Thus, achieving an accurate structural characterization of IDPs via simulations is a challenge. It is also daunting to perform a configuration-space integration over heterogeneous structural ensembles sampled by IDPs to extract, in particular, protein configurational entropy. In this review, we summarize recent efforts devoted to the development of force fields and the critical evaluations of their performance when applied to IDPs. We also survey recent advances in computational methods for protein configurational entropy that aim to provide a thermodynamic link between structural disorder and protein activity.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...
2017-11-27
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also demonstrate the capabilities of the new approach by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work: an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
NASA Astrophysics Data System (ADS)
DeBeer, Chris M.; Pomeroy, John W.
2017-10-01
The spatial heterogeneity of mountain snow cover and ablation is important in controlling patterns of snow cover depletion (SCD), meltwater production, and runoff, yet is not well-represented in most large-scale hydrological models and land surface schemes. Analyses were conducted in this study to examine the influence of various representations of snow cover and melt energy heterogeneity on both simulated SCD and stream discharge from a small alpine basin in the Canadian Rocky Mountains. Simulations were performed using the Cold Regions Hydrological Model (CRHM), where point-scale snowmelt computations were made using a snowpack energy balance formulation and applied to spatial frequency distributions of snow water equivalent (SWE) on individual slope-, aspect-, and landcover-based hydrological response units (HRUs) in the basin. Hydrological routines were added to represent the vertical and lateral transfers of water through the basin and channel system. From previous studies it is understood that the heterogeneity of late winter SWE is a primary control on patterns of SCD. The analyses here showed that spatial variation in applied melt energy, mainly due to differences in net radiation, has an important influence on SCD at multiple scales and on basin discharge, and cannot be neglected without serious error in the prediction of these variables. A single basin SWE distribution using the basin-wide mean SWE and coefficient of variation (CV; standard deviation/mean) was found to represent the fine-scale spatial heterogeneity of SWE sufficiently well. Simulations that accounted for differences in mean SWE among HRUs but neglected the sub-HRU heterogeneity of SWE were found to yield similar discharge results as simulations that included this heterogeneity, while SCD was poorly represented, even at the basin level. Finally, applying point-scale snowmelt computations based on a single SWE depth for each HRU (thereby neglecting spatial differences in internal snowpack energetics over the distributions) was found to yield similar SCD and discharge results as simulations that resolved internal energy differences. Spatial/internal snowpack melt energy effects are more pronounced earlier in spring, before the main period of snowmelt and SCD, as shown in previously published work. The paper discusses the importance of these findings as they apply to the warranted complexity of snowmelt process simulation in cold mountain environments, and shows how the end-of-winter SWE distribution represents an effective means of resolving snow cover heterogeneity at multiple scales for modelling, even in steep and complex terrain.
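A parameterization of this kind, where sub-grid SWE is represented by a distribution with a given mean and CV, can be illustrated with a short sketch. The lognormal form and all numbers below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lognormal_swe(mean_swe, cv, n=100_000, seed=0):
    """Sample an SWE field (mm) from a basin-wide mean and CV, assuming a
    lognormal form: sigma^2 = ln(1 + CV^2), mu = ln(mean) - sigma^2 / 2."""
    sigma2 = np.log(1.0 + cv**2)
    mu = np.log(mean_swe) - 0.5 * sigma2
    rng = np.random.default_rng(seed)
    return rng.lognormal(mu, np.sqrt(sigma2), n)

def snow_covered_fraction(swe, cumulative_melt):
    """Snow cover depletion: fraction of points with SWE remaining after a
    spatially uniform cumulative melt is applied."""
    return float(np.mean(swe > cumulative_melt))

swe = lognormal_swe(mean_swe=300.0, cv=0.45)
for melt in (0, 150, 300, 450):  # mm of cumulative melt
    print(melt, round(snow_covered_fraction(swe, melt), 3))
```

Sweeping the cumulative melt traces out the SCD curve; distributing the melt energy spatially, as the study does, changes which parts of the distribution deplete first.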
NASA Astrophysics Data System (ADS)
Zhu, J.; Winter, C. L.; Wang, Z.
2015-08-01
Computational experiments are performed to evaluate the effects of locally heterogeneous conductivity fields on regional exchanges of water between stream and aquifer systems in the Middle Heihe River Basin (MHRB) of northwestern China. The effects are found to be nonlinear in the sense that simulated discharges from aquifers to streams are systematically lower than discharges produced by a base model parameterized with relatively coarse effective conductivity. A similar, but weaker, effect is observed for stream leakage. The study is organized around three hypotheses: (H1) small-scale spatial variations of conductivity significantly affect regional exchanges of water between streams and aquifers in river basins, (H2) aggregating small-scale heterogeneities into regional effective parameters systematically biases estimates of stream-aquifer exchanges, and (H3) the biases result from slow-paths in groundwater flow that emerge due to small-scale heterogeneities. The hypotheses are evaluated by comparing stream-aquifer fluxes produced by the base model to fluxes simulated using realizations of the MHRB characterized by local (grid-scale) heterogeneity. Levels of local heterogeneity are manipulated as control variables by adjusting coefficients of variation. All models are implemented using the MODFLOW simulation environment, and the PEST tool is used to calibrate effective conductivities defined over 16 zones within the MHRB. The effective parameters are also used as expected values to develop log-normally distributed conductivity (K) fields on local grid scales. Stream-aquifer exchanges are simulated with K fields at both scales and then compared. Results show that small-scale heterogeneities significantly influence exchanges, with simulations based on local-scale heterogeneities always producing discharges that are less than those produced by the base model. Although aquifer heterogeneities are uncorrelated at local scales, they appear to induce coherent slow-paths in groundwater fluxes that in turn reduce aquifer-stream exchanges. Since surface water-groundwater exchanges are critical hydrologic processes in basin-scale water budgets, these results also have implications for water resources management.
NASA Astrophysics Data System (ADS)
Yamada, Takashi
2017-12-01
This study computationally examines (1) how the behaviors of subjects are represented, (2) whether the classification of subjects is related to the scale of the game, and (3) what kind of behavioral models are successful in small-sized lowest unique integer games (LUIGs). In a LUIG, N (>= 3) players submit a positive integer up to M (> 1), and the player choosing the smallest number not chosen by anyone else wins. For this purpose, the author considers four LUIGs with N = {3, 4} and M = {3, 4} and uses the behavioral data obtained in the laboratory experiment by Yamada and Hanaki (Physica A 463, pp. 88–102, 2016). For computational experiments, the author calibrates the parameters of typical learning models for each subject and then runs round-robin competitions. The main findings are as follows: First, the subjects who played not differently from the mixed-strategy Nash equilibrium (MSE) prediction tended to make use of not only their own choices but also the game outcomes, while those who deviated from the MSE prediction attended only to their own choices as the complexity of the game increased. Second, the heterogeneity of player strategies depends on both the number of players (N) and the upper limit (M). Third, when groups consist of different agents, as in the earlier laboratory experiment, sticking to one number is quite effective for winning.
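The winner determination in a LUIG is simple to state precisely; a minimal sketch (illustrative code, not from the study):

```python
from collections import Counter

def luig_winner(choices):
    """Return the index of the winning player: the one who submitted the
    smallest integer chosen by no one else, or None if every choice repeats."""
    counts = Counter(choices)
    unique = [c for c in choices if counts[c] == 1]
    return choices.index(min(unique)) if unique else None

print(luig_winner([1, 1, 3]))  # 2: the player who chose 3 wins
print(luig_winner([2, 2, 2]))  # None: no unique choice
```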
NASA Astrophysics Data System (ADS)
Jain, Madhu; Meena, Rakesh Kumar
2018-03-01
A Markov model of a multi-component machining system comprising two unreliable heterogeneous servers and a mixed type of standby support is studied. The repair of broken-down machines is governed by a bi-level threshold policy for the activation of the servers: a server returns to repair duty when a pre-specified workload of failed machines has built up, with the first (second) repairman turning on only when a workload of N1 (N2) failed machines has accumulated in the system. Both servers may go on vacation when all the machines are in good condition and there are no pending repair jobs for the repairmen. The Runge-Kutta method is implemented to solve the set of governing equations used to formulate the Markov model. Various system metrics, including the mean queue length, machine availability, and throughput, are derived to determine the performance of the machining system. A numerical illustration demonstrates the computational tractability of the present investigation. A cost function is also constructed to determine the optimal repair rate of the server by minimizing the expected cost incurred on the system. A hybrid soft computing method is used to develop an adaptive neuro-fuzzy inference system (ANFIS), and the numerical results obtained by the Runge-Kutta approach are validated against the computational results generated by ANFIS.
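Solving the governing (Chapman-Kolmogorov) equations of such a Markov model numerically amounts to integrating dp/dt = pQ with a Runge-Kutta scheme. A generic sketch with a toy generator matrix, not the paper's model:

```python
import numpy as np

def rk4_step(p, Q, dt):
    """One classical fourth-order Runge-Kutta step for dp/dt = p @ Q."""
    k1 = p @ Q
    k2 = (p + 0.5 * dt * k1) @ Q
    k3 = (p + 0.5 * dt * k2) @ Q
    k4 = (p + dt * k3) @ Q
    return p + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Toy 3-state generator matrix (rows sum to zero); illustrative only.
Q = np.array([[-0.5,  0.5,  0.0],
              [ 0.2, -0.6,  0.4],
              [ 0.0,  0.3, -0.3]])
p = np.array([1.0, 0.0, 0.0])  # start in state 0
for _ in range(2000):
    p = rk4_step(p, Q, dt=0.01)
print(p)  # transient probabilities approaching steady state
```

Performance metrics such as the mean queue length then follow as expectations over the state probabilities p.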
Federated data storage and management infrastructure
NASA Astrophysics Data System (ADS)
Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.
2016-10-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs by orders of magnitude; this will require new approaches to data storage organization and data handling. In our project we address the fundamental problem of designing an architecture to integrate distributed heterogeneous disk resources for LHC experiments and other data-intensive science applications, and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St. Petersburg and Gatchina, as well as a Russian / CERN federation. We have conducted extensive tests of the underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including ones running on supercomputing platforms, cloud computing and the Grid for the ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally to demonstrate our ability to efficiently use federated data storage experiment-wide within National Academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bio-informatics.
Mapping the Conjugate Gradient Algorithm onto High Performance Heterogeneous Computers
2014-05-01
Matrix Storage Formats. According to J. Dongarra (Dongarra 2000), the efficiency of most iterative methods, such as CG, can be attributed to the … (val_h = a_ij) ⇒ (col_h = j). The ptr integer vector is of length n + 1 and contains the index in val where each matrix row starts. For example, the first nonzero element of matrix row m is found at index ptr_m of val. By convention, ptr_{n+1} ≡ nz + 1. Notice that (val_h = a_ij) ⇒ (ptr_i ≤ h < ptr_{i+1}) for all i. …
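This fragment describes the standard compressed sparse row (CSR) layout. A minimal sketch of the layout and the matrix-vector kernel at the heart of CG (0-based indexing here, versus the 1-based convention in the excerpt; illustrative code, not from the report):

```python
import numpy as np

# CSR storage: val holds the nonzeros row by row, col the column index of
# each value in val, and ptr[i] the index in val where row i starts
# (so ptr[n] == nz, the 0-based analogue of ptr_{n+1} = nz + 1 above).
def csr_matvec(val, col, ptr, x):
    """y = A @ x for a CSR matrix."""
    n = len(ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for h in range(ptr[i], ptr[i + 1]):
            y[i] += val[h] * x[col[h]]
    return y

# 3x3 example: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
val = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col = np.array([0, 2, 1, 0, 2])
ptr = np.array([0, 2, 3, 5])
print(csr_matvec(val, col, ptr, np.ones(3)))  # [5. 3. 7.]
```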
Simulation of Etching in Chlorine Discharges Using an Integrated Feature Evolution-Plasma Model
NASA Technical Reports Server (NTRS)
Hwang, Helen H.; Bose, Deepak; Govindan, T. R.; Meyyappan, M.; Biegel, Bryan (Technical Monitor)
2002-01-01
To better utilize its vast collection of heterogeneous resources that are geographically distributed across the United States, NASA is constructing a computational grid called the Information Power Grid (IPG). This paper describes various tools and techniques that we are developing to measure and improve the performance of a broad class of NASA applications when run on the IPG. In particular, we are investigating the areas of grid benchmarking, grid monitoring, user-level application scheduling, and decentralized system-level scheduling.
The relationship between species detection probability and local extinction probability
Alpizar-Jara, R.; Nichols, J.D.; Hines, J.E.; Sauer, J.R.; Pollock, K.H.; Rosenberry, C.S.
2004-01-01
In community-level ecological studies, generally not all species present in sampled areas are detected. Many authors have proposed the use of estimation methods that allow detection probabilities that are < 1 and that are heterogeneous among species. These methods can also be used to estimate community-dynamic parameters such as species local extinction probability and turnover rates (Nichols et al. Ecol Appl 8:1213-1225; Conserv Biol 12:1390-1398). Here, we present an ad hoc approach to estimating community-level vital rates in the presence of joint heterogeneity of detection probabilities and vital rates. The method consists of partitioning the number of species into two groups using the detection frequencies and then estimating vital rates (e.g., local extinction probabilities) for each group. Estimators from each group are combined in a weighted estimator of vital rates that accounts for the effect of heterogeneity. Using data from the North American Breeding Bird Survey, we computed such estimates and tested the hypothesis that detection probabilities and local extinction probabilities were negatively related. Our analyses support the hypothesis that species detection probability covaries negatively with local probability of extinction and turnover rates. A simulation study was conducted to assess the performance of vital parameter estimators as well as other estimators relevant to questions about heterogeneity, such as coefficient of variation of detection probabilities and proportion of species in each group. Both the weighted estimator suggested in this paper and the original unweighted estimator for local extinction probability performed fairly well and provided no basis for preferring one to the other.
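The weighted combination described here can be illustrated with a toy computation (all numbers hypothetical; the actual estimator also carries detection-probability corrections and variance terms from the capture-recapture models):

```python
def weighted_vital_rate(est_low, n_low, est_high, n_high):
    """Combine group-specific vital-rate estimates (e.g., local extinction
    probabilities) weighted by the estimated number of species in each
    detectability group."""
    return (n_low * est_low + n_high * est_high) / (n_low + n_high)

# 40 low-detectability species with extinction estimate 0.25,
# 110 high-detectability species with estimate 0.10 (hypothetical):
print(weighted_vital_rate(0.25, 40, 0.10, 110))  # ~0.14
```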
NASA Astrophysics Data System (ADS)
Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.
2015-12-01
Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs, as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models at sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling work. Two-phase flow simulations in heterogeneous media usually require much longer computational time than those in homogeneous media, and uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with thousands of processors and more have become available to the scientific and engineering communities. Such supercomputers may attract attention from geoscientists and reservoir engineers for solving large, non-linear models at higher resolutions within a reasonable time. However, to make them a useful tool, several practical obstacles have to be tackled so that general-purpose reservoir simulators can use large numbers of processors effectively. We have implemented massively-parallel versions of two TOUGH2 family codes (the multi-phase flow simulator TOUGH2 and the chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing the implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones at reservoir scale. The performance measurements confirmed that both simulators exhibit excellent scalability, showing almost linear speedup with the number of processors up to over ten thousand cores. Generally, this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million-cell grids in a practical time (e.g., less than a second per time step).
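Scalability claims of this kind are usually quantified as speedup and parallel efficiency relative to a reference run; a small sketch (all timings hypothetical, not measurements from this work):

```python
def parallel_efficiency(t_ref, p_ref, t_p, p):
    """Efficiency of a run on p cores relative to a reference run on p_ref
    cores: ideal (linear) scaling gives 1.0."""
    return (t_ref * p_ref) / (t_p * p)

# Hypothetical wall-clock times per time step (seconds):
timings = {128: 5.10, 1024: 0.66, 10240: 0.071}
for p, t in sorted(timings.items()):
    print(p, round(parallel_efficiency(5.10, 128, t, p), 2))
```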
High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.
2017-12-01
The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
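The basic measurement named here, a logarithmic amplitude ratio between the two branches of a noise correlation, can be sketched as follows (a generic illustration of the measurement, not the project's processing code; the centered-zero-lag layout is an assumption):

```python
import numpy as np

def log_energy_ratio(cc):
    """Logarithmic energy ratio between the causal (positive-lag) and
    acausal (negative-lag) branches of a cross-correlation trace whose
    zero lag sits at the center sample; asymmetry points toward the
    dominant noise-source direction."""
    lag0 = len(cc) // 2
    causal = np.asarray(cc[lag0 + 1:], dtype=float)
    acausal = np.asarray(cc[:lag0], dtype=float)[::-1]
    n = min(len(causal), len(acausal))
    return np.log(np.sum(causal[:n] ** 2) / np.sum(acausal[:n] ** 2))
```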
Automated inverse computer modeling of borehole flow data in heterogeneous aquifers
NASA Astrophysics Data System (ADS)
Sawdey, J. R.; Reeve, A. S.
2012-09-01
A computer model has been developed to simulate borehole flow in heterogeneous aquifers where the vertical distribution of permeability may vary significantly. In crystalline fractured aquifers, flow into or out of a borehole occurs at discrete locations of fracture intersection. Under these circumstances, flow simulations are defined by independent variables of transmissivity and far-field heads for each flow contributing fracture intersecting the borehole. The computer program, ADUCK (A Downhole Underwater Computational Kit), was developed to automatically calibrate model simulations to collected flowmeter data providing an inverse solution to fracture transmissivity and far-field head. ADUCK has been tested in variable borehole flow scenarios, and converges to reasonable solutions in each scenario. The computer program has been created using open-source software to make the ADUCK model widely available to anyone who could benefit from its utility.
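To see what such an inverse solution involves, consider a Thiem-type model in which each fracture contributes an inflow q = a (H - h_w) with a = 2πT / ln(R/r_w); inflows measured under two different borehole heads then determine T and H per fracture in closed form. This is a generic sketch under those stated assumptions, not necessarily ADUCK's formulation, and all symbols and numbers are illustrative:

```python
import numpy as np

def fracture_params(q1, q2, hw1, hw2, R=100.0, rw=0.1):
    """Two-condition solution for one fracture: q = a*(H - hw) with
    a = 2*pi*T/ln(R/rw). q1, q2 are inflows (m^3/d) measured under
    borehole heads hw1, hw2 (m). Returns (T, H)."""
    a = (q1 - q2) / (hw2 - hw1)   # slope from the two flow conditions
    H = hw1 + q1 / a              # far-field head
    T = a * np.log(R / rw) / (2.0 * np.pi)
    return T, H

# Hypothetical flowmeter-derived inflows for one fracture zone:
print(fracture_params(q1=2.4, q2=5.0, hw1=10.0, hw2=8.0))
```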
Improving the Aircraft Design Process Using Web-Based Modeling and Simulation
NASA Technical Reports Server (NTRS)
Reed, John A.; Follen, Gregory J.; Afjeh, Abdollah A.; Follen, Gregory J. (Technical Monitor)
2000-01-01
Designing and developing new aircraft systems is time-consuming and expensive. Computational simulation is a promising means for reducing design cycle times, but requires a flexible software environment capable of integrating advanced multidisciplinary and multifidelity analysis methods, dynamically managing data across heterogeneous computing platforms, and distributing computationally complex tasks. Web-based simulation, with its emphasis on collaborative composition of simulation models, distributed heterogeneous execution, and dynamic multimedia documentation, has the potential to meet these requirements. This paper outlines the current aircraft design process, highlighting its problems and complexities, and presents our vision of an aircraft design process using Web-based modeling and simulation.
Improving the Aircraft Design Process Using Web-based Modeling and Simulation
NASA Technical Reports Server (NTRS)
Reed, John A.; Follen, Gregory J.; Afjeh, Abdollah A.
2003-01-01
Designing and developing new aircraft systems is time-consuming and expensive. Computational simulation is a promising means for reducing design cycle times, but requires a flexible software environment capable of integrating advanced multidisciplinary and multifidelity analysis methods, dynamically managing data across heterogeneous computing platforms, and distributing computationally complex tasks. Web-based simulation, with its emphasis on collaborative composition of simulation models, distributed heterogeneous execution, and dynamic multimedia documentation, has the potential to meet these requirements. This paper outlines the current aircraft design process, highlighting its problems and complexities, and presents our vision of an aircraft design process using Web-based modeling and simulation.
Analytical effective tensor for flow-through composites
Sviercoski, Rosangela De Fatima [Los Alamos, NM
2012-06-19
A machine, method and computer-usable medium for modeling the average flow of a substance through a composite material. Such modeling includes an analytical calculation of an effective tensor K^a suitable for use with a variety of media. The analytical calculation corresponds to an approximation to the tensor K, and proceeds by first computing the diagonal values and then identifying symmetries of the heterogeneity distribution. Additional calculations include determining the center of mass of the heterogeneous cell and its angle with respect to a defined Cartesian system, and inserting this angle into a rotation formula to compute the off-diagonal values and determine their sign.
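One standard way such a rotation step looks in two dimensions: the diagonal values k1, k2 found along the principal axes are rotated by the cell's angle θ, which populates the off-diagonal entries and fixes their sign. This is a generic sketch of the textbook formula K = R(θ) diag(k1, k2) R(θ)^T, not the patented procedure itself:

```python
import numpy as np

def rotated_effective_tensor(k1, k2, theta):
    """Rotate a diagonal 2-D tensor diag(k1, k2) by angle theta (radians):
    K = R @ diag(k1, k2) @ R.T; the off-diagonals vanish when theta = 0."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])
    return R @ np.diag([k1, k2]) @ R.T

print(rotated_effective_tensor(2.0, 0.5, np.pi / 6))
```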
Guo, Xuesong; Zhou, Xin; Chen, Qiuwen; Liu, Junxin
2013-04-01
In the Orbal oxidation ditch, denitrification is primarily accomplished in the outer channel. However, the detailed characteristics of the flow field and dissolved oxygen (DO) distribution in the outer channel are not well understood. Therefore, in this study, the flow velocity and DO concentration in the outer channel of an Orbal oxidation ditch system in a wastewater treatment plant in Beijing (China) were monitored under actual operating conditions. The flow field and DO concentration distributions were analyzed by computational fluid dynamics (CFD) modeling. In situ monitoring and modeling both showed that the flow velocity was heterogeneous in the outer channel. As a result, the DO was also heterogeneously distributed in the outer channel, with concentration gradients occurring along the flow direction as well as in the cross-section. This heterogeneous DO distribution created many anoxic and aerobic zones, which may have facilitated simultaneous nitrification-denitrification in the channel. These findings may provide supporting information for rational optimization of the performance of the Orbal oxidation ditch.
Roy, Rajarshi; Desai, Jaydev P.
2016-01-01
This paper outlines a comprehensive parametric approach for quantifying the mechanical properties of spatially heterogeneous thin biological specimens, such as human breast tissue, using contact-mode Atomic Force Microscopy. Using inverse finite element (FE) analysis of spherical nanoindentation, the force response from hyperelastic material models is compared with the predicted force response from existing analytical contact models, and a sensitivity study is carried out to assess the uniqueness of the inverse FE solution. Furthermore, an automation strategy is proposed to analyze AFM force curves with varying levels of material nonlinearity with minimal user intervention. Implementation of our approach on an elastic map acquired from raster AFM indentation of breast tissue specimens indicates that a judicious combination of analytical and numerical techniques allows more accurate interpretation of AFM indentation data compared to relying on purely analytical contact models, while keeping the computational cost associated with an inverse FE solution within reasonable limits. The results reported in this study have several implications for performing unsupervised data analysis on AFM indentation measurements on a wide variety of heterogeneous biomaterials. PMID:25015130
NASA Astrophysics Data System (ADS)
Wetterling, F.; Liehr, M.; Schimpf, P.; Liu, H.; Haueisen, J.
2009-09-01
The non-invasive localization of focal heart activity via body surface potential measurements (BSPM) could greatly benefit the understanding and treatment of arrhythmic heart diseases. However, the in vivo validation of source localization algorithms is rather difficult with currently available measurement techniques. In this study, we used a physical torso phantom composed of different conductive compartments and seven dipoles, which were placed in the anatomical position of the human heart, in order to assess the performance of the Recursively Applied and Projected Multiple Signal Classification (RAP-MUSIC) algorithm. Electric potentials were measured on the torso surface for single dipoles with and without further uncorrelated or correlated dipole activity. The localization error averaged 11 ± 5 mm over 22 dipoles, which shows the ability of RAP-MUSIC to distinguish an uncorrelated dipole from surrounding source activity. For the first time, real computational modelling errors could be included within the validation procedure owing to the physically modelled heterogeneities. In conclusion, the introduced heterogeneous torso phantom can be used to validate state-of-the-art algorithms under nearly realistic measurement conditions.
Message Efficient Checkpointing and Rollback Recovery in Heterogeneous Mobile Networks
NASA Astrophysics Data System (ADS)
Jaggi, Parmeet Kaur; Singh, Awadhesh Kumar
2016-06-01
Heterogeneous networks provide an appealing way of expanding the computing capability of mobile networks by combining infrastructure-less mobile ad-hoc networks with infrastructure-based cellular mobile networks. The nodes in such a network range from low-power nodes to macro base stations and thus vary greatly in capabilities such as computation power and battery power. The nodes are susceptible to different types of transient and permanent failures and, therefore, the algorithms designed for such networks need to be fault-tolerant. This article presents a checkpointing algorithm for the rollback recovery of mobile hosts in a heterogeneous mobile network. Checkpointing is a well-established approach to provide fault tolerance in static and cellular mobile distributed systems; however, its use for fault tolerance in a heterogeneous environment remains to be explored. The proposed protocol is based on Netzer and Xu's results on zigzag paths and zigzag cycles. Considering the heterogeneity prevalent in the network, an uncoordinated checkpointing technique is employed, yet useless checkpoints are avoided without incurring a high message overhead.
CBRAIN: a web-based, distributed computing platform for collaborative neuroimaging research
Sherif, Tarek; Rioux, Pierre; Rousseau, Marc-Etienne; Kassis, Nicolas; Beck, Natacha; Adalat, Reza; Das, Samir; Glatard, Tristan; Evans, Alan C.
2014-01-01
The Canadian Brain Imaging Research Platform (CBRAIN) is a web-based collaborative research platform developed in response to the challenges raised by data-heavy, compute-intensive neuroimaging research. CBRAIN offers transparent access to remote data sources, distributed computing sites, and an array of processing and visualization tools within a controlled, secure environment. Its web interface is accessible through any modern browser and uses graphical interface idioms to reduce the technical expertise required to perform large-scale computational analyses. CBRAIN's flexible meta-scheduling has allowed the incorporation of a wide range of heterogeneous computing sites, currently including nine national research High Performance Computing (HPC) centers in Canada, one in Korea, one in Germany, and several local research servers. CBRAIN leverages remote computing cycles and facilitates resource-interoperability in a transparent manner for the end-user. Compared with typical grid solutions available, our architecture was designed to be easily extendable and deployed on existing remote computing sites with no tool modification, administrative intervention, or special software/hardware configuration. As of October 2013, CBRAIN serves over 200 users spread across 53 cities in 17 countries. The platform is built as a generic framework that can accept data and analysis tools from any discipline. However, its current focus is primarily on neuroimaging research and studies of neurological diseases such as Autism, Parkinson's and Alzheimer's diseases, and Multiple Sclerosis, as well as on normal brain structure and development. This technical report presents the CBRAIN Platform, its current deployment and usage, and future directions. PMID:24904400
Consolidating WLCG topology and configuration in the Computing Resource Information Catalogue
NASA Astrophysics Data System (ADS)
Alandes, Maria; Andreeva, Julia; Anisenkov, Alexey; Bagliesi, Giuseppe; Belforte, Stephano; Campana, Simone; Dimou, Maria; Flix, Jose; Forti, Alessandra; di Girolamo, A.; Karavakis, Edward; Lammel, Stephan; Litmaath, Maarten; Sciaba, Andrea; Valassi, Andrea
2017-10-01
The Worldwide LHC Computing Grid infrastructure links about 200 participating computing centres affiliated with several partner projects. It is built by integrating heterogeneous computer and storage resources in diverse data centres all over the world and provides CPU and storage capacity to the LHC experiments to perform data processing and physics analysis. In order to be used by the experiments, these distributed resources should be well described, which implies easy service discovery and detailed description of service configuration. Currently this information is scattered over multiple generic information sources like GOCDB, OIM, BDII and experiment-specific information systems. Such a model does not make it easy to validate topology and configuration information, and information in the various sources is not always consistent. Finally, the evolution of computing technologies introduces new challenges: experiments increasingly rely on opportunistic resources, which by their nature are more dynamic and should also be well described in the WLCG information system. This contribution describes the new WLCG configuration service CRIC (Computing Resource Information Catalogue), which collects information from various information providers, performs validation and provides a consistent set of UIs and APIs to the LHC VOs for service discovery and usage configuration. The main requirements for CRIC are simplicity, agility and robustness. CRIC should be able to be quickly adapted to new types of computing resources and new information sources, and allow new data structures to be implemented easily, following the evolution of the computing models and operations of the experiments.
NASA Technical Reports Server (NTRS)
Lawrence, Charles; Putt, Charles W.
1997-01-01
The Visual Computing Environment (VCE) is a NASA Lewis Research Center project to develop a framework for intercomponent and multidisciplinary computational simulations. Many current engineering analysis codes simulate various aspects of aircraft engine operation. For example, existing computational fluid dynamics (CFD) codes can model the airflow through individual engine components such as the inlet, compressor, combustor, turbine, or nozzle. Currently, these codes are run in isolation, making intercomponent and complete system simulations very difficult to perform. In addition, management and utilization of these engineering codes for coupled component simulations is a complex, laborious task, requiring substantial experience and effort. To facilitate multicomponent aircraft engine analysis, the CFD Research Corporation (CFDRC) is developing the VCE system. This system, which is part of NASA's Numerical Propulsion Simulation System (NPSS) program, can couple various engineering disciplines, such as CFD, structural analysis, and thermal analysis. The objectives of VCE are to (1) develop a visual computing environment for controlling the execution of individual simulation codes that are running in parallel and are distributed on heterogeneous host machines in a networked environment, (2) develop numerical coupling algorithms for interchanging boundary conditions between codes with arbitrary grid matching and different levels of dimensionality, (3) provide a graphical interface for simulation setup and control, and (4) provide tools for online visualization and plotting. VCE was designed to provide a distributed, object-oriented environment. Mechanisms are provided for creating and manipulating objects, such as grids, boundary conditions, and solution data. This environment includes parallel virtual machine (PVM) for distributed processing. Users can interactively select and couple any set of codes that have been modified to run in a parallel distributed fashion on a cluster of heterogeneous workstations. A scripting facility allows users to dictate the sequence of events that make up the particular simulation.
Overview of the LINCS architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fletcher, J.G.; Watson, R.W.
1982-01-13
Computing at the Lawrence Livermore National Laboratory (LLNL) has evolved over the past 15 years into a computer-network-based resource-sharing environment. The increasing use of low-cost, high-performance micro, mini and midi computers and commercially available local networking systems will accelerate this trend. Further, even the large-scale computer systems, on which much of the LLNL scientific computing depends, are evolving into multiprocessor systems. It is our belief that the most cost-effective use of this environment will depend on the development of application systems structured into cooperating concurrent program modules (processes) distributed appropriately over different nodes of the environment. A node is defined as one or more processors with a local (shared) high-speed memory. Given the latter view, the environment can be characterized as consisting of: multiple nodes communicating over noisy channels with arbitrary delays and throughput, heterogeneous base resources and information encodings, no single administration controlling all resources, distributed system state, and no uniform time base. The system design problem is how to turn the heterogeneous base hardware/firmware/software resources of this environment into a coherent set of resources that facilitate development of cost-effective, reliable, and human-engineered applications. We believe the answer lies in developing a layered, communication-oriented distributed system architecture; layered and modular to support ease of understanding, reconfiguration, extensibility, and hiding of implementation or nonessential local details; communication oriented because that is a central feature of the environment. The Livermore Interactive Network Communication System (LINCS) is a hierarchical architecture designed to meet the above needs. While having characteristics in common with other architectures, it differs in several respects.
The effect of material heterogeneities in long term multiscale seismic cycle simulations
NASA Astrophysics Data System (ADS)
Kyriakopoulos, C.; Richards-Dinger, K. B.; Dieterich, J. H.
2016-12-01
A fundamental part of the simulation of earthquake cycles in large-scale multicycle earthquake simulators is the pre-computation of elastostatic Green's functions collected into the stiffness matrix (K). The stiffness matrices are typically based on the elastostatic solutions of Okada (1992), Gimbutas et al. (2012), or similar. While these analytic solutions are computationally very fast, they are limited to modeling a homogeneous isotropic half-space. It is thus unknown how such simulations may be affected by the material heterogeneity characterizing the earth medium. We are currently working on the estimation of the effects of heterogeneous material properties in the earthquake simulator RSQSim (Richards-Dinger and Dieterich, 2012). To do so, we are calculating elastostatic solutions in a heterogeneous medium using the Finite Element (FE) method instead of any of the analytical solutions. The investigated region is a 400 x 400 km area centered on the Anza zone in southern California. The fault system geometry is based on that of the UCERF3 deformation models in the area of interest, which we then implement in a finite element mesh using Trelis 15. The heterogeneous elastic structure is based on available tomographic data (seismic wavespeeds and density) for the region (SCEC CVM and Allam et al., 2014). For computation of the Green's functions we are using the open source FE code Defmod (https://bitbucket.org/stali/defmod/wiki/Home) to calculate the elastostatic solutions due to unit slip on each patch. Earthquake slip on the fault plane is implemented through linear constraint equations (Ali et al., 2014, Kyriakopoulos et al., 2013, Aagard et al., 2015), and more specifically with the adjunction of Lagrange multipliers. The elementary responses are collected into the "heterogeneous" stiffness matrix Khet and used in RSQSim instead of the ones generated with Okada. Finally, we compare the RSQSim results based on the "heterogeneous" Khet with results from Khom (stiffness matrix generated from the same mesh as Khet but using homogeneous material properties). The estimation of the effect of heterogeneous material properties in the seismic cycles simulated by RSQSim is a necessary experiment that will allow us to evaluate the impact of heterogeneities in earthquake simulators.
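In outline, the stiffness-matrix construction described above is one elastostatic solve per fault patch, with each elementary response stored as a column of K. A minimal Python sketch, assuming a hypothetical callback solve_unit_slip(j) that stands in for one Defmod run and returns the traction change induced on every patch by unit slip on patch j:

import numpy as np

def assemble_stiffness(n_patches, solve_unit_slip):
    # Column j of K is the response of all patches to unit slip on patch j.
    K = np.zeros((n_patches, n_patches))
    for j in range(n_patches):
        K[:, j] = solve_unit_slip(j)   # one FE (or Okada-type) solve per patch
    return K

# The stress change from an arbitrary slip vector is then just K @ slip;
# running the same loop with a homogeneous-medium solver on the same mesh
# yields Khom alongside Khet, as in the comparison described above.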
On salesmen and tourists: Two-step optimization in deterministic foragers
NASA Astrophysics Data System (ADS)
Maya, Miguel; Miramontes, Octavio; Boyer, Denis
2017-02-01
We explore a two-step optimization problem in random environments, the so-called restaurant-coffee shop problem, where a walker aims at visiting the nearest and better restaurant in an area and then moving to the nearest and better coffee shop. This is an extension of the Tourist Problem, a one-step optimization dynamics that can be viewed as a deterministic walk in a random medium. A certain amount of heterogeneity in the values of the resources to be visited causes the emergence of power-law distributions for the steps performed by the walker, similar to a Lévy flight. The fluctuations of the step lengths tend to decrease as a consequence of multiple-step planning, thus reducing the foraging uncertainty. We find that the first and second steps of each planned movement play very different roles in heterogeneous environments. The two-step process improves the foraging efficiency only slightly compared to the one-step optimization, at a much higher computational cost. We discuss the implications of these findings for animal and human mobility, in particular in relation to the computational effort that informed agents should deploy to solve search problems.
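As an illustration of the difference between the one-step and two-step strategies, here is a minimal Python sketch under stated assumptions: sites are random points in the unit square with random qualities, "better" is read as quality above a fixed threshold, and the two-step plan minimises the total walker-restaurant-cafe path length. The names and the threshold rule are illustrative, not taken from the paper:

import numpy as np

rng = np.random.default_rng(0)
restaurants, r_q = rng.random((200, 2)), rng.random(200)
cafes, c_q = rng.random((200, 2)), rng.random(200)

def one_step(pos, q):
    # Greedy: nearest admissible restaurant, then nearest admissible cafe.
    r = restaurants[r_q > q]
    r_best = r[np.argmin(np.linalg.norm(r - pos, axis=1))]
    c = cafes[c_q > q]
    c_best = c[np.argmin(np.linalg.norm(c - r_best, axis=1))]
    return np.linalg.norm(r_best - pos) + np.linalg.norm(c_best - r_best)

def two_step(pos, q):
    # Joint plan: minimise total walker -> restaurant -> cafe length.
    r, c = restaurants[r_q > q], cafes[c_q > q]
    d1 = np.linalg.norm(r - pos, axis=1)
    d2 = np.linalg.norm(r[:, None] - c[None], axis=2)
    return (d1[:, None] + d2).min()

pos = np.array([0.5, 0.5])
print(one_step(pos, 0.8), two_step(pos, 0.8))  # two_step is never longer

The joint plan scans |R| x |C| pairs instead of |R| + |C| sites, which is the higher computational cost the abstract weighs against the modest efficiency gain.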
A characterization of workflow management systems for extreme-scale applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia
2017-02-16
The automation of the execution of computational tasks is at the heart of improving scientific productivity. In recent years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.
Scharfe, Michael; Pielot, Rainer; Schreiber, Falk
2010-01-11
Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high-performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch-reduction and loop-unrolling techniques, with special attention to 32-bit multiplies and the limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computationally intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low-cost CBE-based platform offers an efficient alternative to conventional hardware for solving computational problems in image processing and bioinformatics.
Toward performance portability of the Albany finite element analysis code using the Kokkos library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.
2018-02-05
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA Graphics Processing Units (GPUs), Intel Xeon Phis, and multicore CPUs.
Heterogeneous variances in multi-environment yield trials for corn hybrids
USDA-ARS?s Scientific Manuscript database
Recent developments in statistics and computing have enabled much greater levels of complexity in statistical models of multi-environment yield trial data. One particular feature of interest to breeders is simultaneously modeling heterogeneity of variances among environments and cultivars. Our obj...
Hielscher, Andreas H; Bartel, Sebastian
2004-02-01
Optical tomography (OT) is a fast-developing imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computationally intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
An Overview of MSHN: The Management System for Heterogeneous Networks
1999-04-01
Hensgen, Debra A.; Kidd, Taylor; St. John, David; Schnaidt, Matthew C.; Howard...
Research on detecting heterogeneous fibre from cotton based on linear CCD camera
NASA Astrophysics Data System (ADS)
Zhang, Xian-bin; Cao, Bing; Zhang, Xin-peng; Shi, Wei
2009-07-01
Heterogeneous fibres in cotton have a great impact on the production of cotton textiles: they degrade product quality and thereby hurt the economic benefit and market competitiveness of a company. Detecting and eliminating heterogeneous fibres is therefore particularly important for improving cotton processing, raising the quality of cotton textiles and reducing production costs, and this technology has favourable market value and development prospects. Optical detecting systems have found widespread application here. In such a system, we use a linear CCD camera to scan the running cotton; the video signals are then fed into a computer and processed according to differences in grayscale, and if heterogeneous fibre is present in the cotton, the computer sends a command that drives a gas nozzle to eliminate it. In this paper, we adopt a monochrome LED array as the new detecting light source; its flicker, luminous-intensity stability, lumen depreciation and useful life are all superior to those of fluorescent light. We first analyse the reflection spectra of cotton and various heterogeneous fibres, then select an appropriate frequency for the light source, finally adopting a violet LED array as the new detecting light source. The overall hardware structure and software design are also introduced in this paper.
NASA Astrophysics Data System (ADS)
Jung, Jin Woo; Lee, Jung-Seob; Cho, Dong-Woo
2016-02-01
Recently, much attention has focused on replacement and/or enhancement of biological tissues via the use of cell-laden hydrogel scaffolds with an architecture that mimics the tissue matrix and with the desired three-dimensional (3D) external geometry. However, mimicking the heterogeneous tissues that most organs and tissues are formed of is challenging. Although multiple-head 3D printing systems have been proposed for fabricating heterogeneous cell-laden hydrogel scaffolds, to date only simple exterior forms have been realized. Here we describe a computer-aided design and manufacturing (CAD/CAM) system for this application. We aim to develop an algorithm to enable easy, intuitive design and fabrication of heterogeneous cell-laden hydrogel scaffolds with a free-form 3D geometry. The printing paths of the scaffold are automatically generated from the 3D CAD model, and the scaffold is then printed by dispensing four materials, i.e., a frame, two kinds of cell-laden hydrogel and a support. We demonstrated printing of heterogeneous tissue models formed of hydrogel scaffolds using this approach, including outer ear, kidney and tooth tissue. These results indicate that this approach is particularly promising for tissue engineering and 3D printing applications to regenerate heterogeneous organs and tissues with tailored geometries to treat specific defects or injuries.
The Transition to a Many-core World
NASA Astrophysics Data System (ADS)
Mattson, T. G.
2012-12-01
The need to increase performance within a fixed energy budget has pushed the computer industry to many-core processors. This is grounded in the physics of computing and is not a trend that will just go away. It is hard to overestimate the profound impact of many-core processors on software developers. Virtually every facet of the software development process will need to change to adapt to these new processors. In this talk, we will look at many-core hardware and consider its evolution from a perspective grounded in the CPU. We will show that the number of cores will inevitably increase, but in addition, a quest to maximize performance per watt will push these cores to be heterogeneous. We will show that the inevitable result of these changes is a computing landscape where the distinction between the CPU and the GPU is blurred. We will then consider the much more pressing problem of software in a many-core world. Writing software for heterogeneous many-core processors is well beyond the ability of current programmers. One solution is to support a software development process where programmer teams are split into two distinct groups: a large group of domain-expert productivity programmers and a much smaller team of computer-scientist efficiency programmers. The productivity programmers work in terms of high-level frameworks to express the concurrency in their problems while avoiding any details of how that concurrency is exploited. The second group, the efficiency programmers, map applications expressed in terms of these frameworks onto the target many-core system. In other words, we can solve the many-core software problem by creating a software infrastructure that only requires a small subset of programmers to become master parallel programmers. This is different from the discredited dream of automatic parallelism. Note that productivity programmers still need to define the architecture of their software in a way that exposes the concurrency inherent in their problem. We submit that domain-expert programmers understand "what is concurrent". The parallel programming problem emerges from the complexity of "how that concurrency is utilized" on real hardware. The research described in this talk was carried out in collaboration with the ParLab at UC Berkeley. We use a design pattern language to define the high-level frameworks exposed to domain-expert productivity programmers. We then use tools from the SEJITS project (Selective Embedded Just-In-Time Specializers) to build the software transformation tool chains that turn these framework-oriented designs into highly efficient code. The final ingredient is a software platform to serve as a target for these tools. One such platform is the OpenCL industry standard for programming heterogeneous systems. We will briefly describe OpenCL and show how it provides a vendor-neutral software target for current and future many-core systems: CPU-based, GPU-based, and heterogeneous combinations of the two.
Shamwell, E Jared; Nothwang, William D; Perlis, Donald
2018-05-04
Aimed at improving size, weight, and power (SWaP)-constrained robotic vision-aided state estimation, we describe our unsupervised, deep convolutional-deconvolutional sensor fusion network, Multi-Hypothesis DeepEfference (MHDE). MHDE learns to intelligently combine noisy heterogeneous sensor data to predict several probable hypotheses for the dense, pixel-level correspondence between a source image and an unseen target image. We show how our multi-hypothesis formulation provides increased robustness against dynamic, heteroscedastic sensor and motion noise by computing hypothesis image mappings and predictions at 76–357 Hz depending on the number of hypotheses being generated. MHDE fuses noisy, heterogeneous sensory inputs using two parallel, inter-connected architectural pathways and n (1–20 in this work) multi-hypothesis generating sub-pathways to produce n global correspondence estimates between a source and a target image. We evaluated MHDE on the KITTI Odometry dataset and benchmarked it against the vision-only DeepMatching and Deformable Spatial Pyramids algorithms and were able to demonstrate a significant runtime decrease and a performance increase compared to the next-best performing method.
NASA Astrophysics Data System (ADS)
Zimoń, Małgorzata; Sawko, Robert; Emerson, David; Thompson, Christopher
2017-11-01
Uncertainty quantification (UQ) is increasingly becoming an indispensable tool for assessing the reliability of computational modelling. Efficient handling of stochastic inputs, such as boundary conditions, physical properties or geometry, increases the utility of model results significantly. We discuss the application of non-intrusive generalised polynomial chaos techniques in the context of fluid engineering simulations. Deterministic and Monte Carlo integration rules are applied to a set of problems, including ordinary differential equations and the computation of aerodynamic parameters subject to random perturbations. In particular, we analyse acoustic wave propagation in a heterogeneous medium to study the effects of mesh resolution, transients, number and variability of stochastic inputs. We consider variants of multi-level Monte Carlo and perform a novel comparison of the methods with respect to numerical and parametric errors, as well as computational cost. The results provide a comprehensive view of the necessary steps in UQ analysis and demonstrate some key features of stochastic fluid flow systems.
Accelerating the discovery of space-time patterns of infectious diseases using parallel computing.
Hohl, Alexander; Delmelle, Eric; Tang, Wenwu; Casas, Irene
2016-11-01
Infectious diseases have complex transmission cycles, and effective public health responses require the ability to monitor outbreaks in a timely manner. Space-time statistics facilitate the discovery of disease dynamics, including rate of spread and seasonal cyclic patterns, but are computationally demanding, especially for datasets of increasing size, diversity and availability. High-performance computing reduces the effort required to identify these patterns; however, heterogeneity in the data must be accounted for. We develop an adaptive space-time domain decomposition approach for parallel computation of the space-time kernel density. We apply our methodology to individual reported dengue cases from 2010 to 2011 in the city of Cali, Colombia. The parallel implementation reaches significant speedup compared to sequential counterparts. Density values are visualized in an interactive 3D environment, which facilitates the identification and communication of uneven space-time distribution of disease events. Our framework has the potential to enhance the timely monitoring of infectious diseases. Copyright © 2016 Elsevier Ltd. All rights reserved.
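The space-time kernel density at the core of the method treats every evaluation point independently, which is what makes the domain decomposition parallelisation natural: the evaluation grid can be split among workers with no communication. A minimal Python sketch with Epanechnikov-style kernels (the authors' exact kernels and normalisation may differ):

import numpy as np

def stkde(points, times, events, hs, ht):
    # events: (n, 3) rows of (x, y, t) cases; points: (m, 2); times: (m,).
    d_s = np.linalg.norm(points[:, None, :] - events[None, :, :2], axis=2) / hs
    d_t = np.abs(times[:, None] - events[None, :, 2]) / ht
    k_s = np.clip(1.0 - d_s**2, 0.0, None)  # spatial kernel, support d_s < 1
    k_t = np.clip(1.0 - d_t**2, 0.0, None)  # temporal kernel, support d_t < 1
    return (k_s * k_t).sum(axis=1) / (len(events) * hs**2 * ht)

# Each output value depends only on its own (point, time) pair, so rows of
# `points` can be partitioned across processes; an adaptive decomposition
# balances the work by splitting more finely where events cluster.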
Exploring Asynchronous Many-Task Runtime Systems toward Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Knight, Samuel; Baker, Gavin Matthew; Gamell, Marc
2015-10-01
Major exascale computing reports indicate a number of software challenges to meet the dramatic change of system architectures in the near future. While a several-orders-of-magnitude increase in parallelism is the most commonly cited of those, hurdles also include performance heterogeneity of compute nodes across the system, increased imbalance between computational capacity and I/O capabilities, frequent system interrupts, and complex hardware architectures. Asynchronous task-parallel programming models show great promise in addressing these issues, but are neither fully understood nor sufficiently developed for computational science and engineering application codes. We address these knowledge gaps through quantitative and qualitative exploration of leading candidate solutions in the context of engineering applications at Sandia. In this poster, we evaluate the MiniAero code ported to three leading candidate programming models (Charm++, Legion and UINTAH) to examine the feasibility of inserting new programming-model elements into an existing code base.
Martínez, Carlos Alberto; Khare, Kshitij; Banerjee, Arunava; Elzo, Mauricio A
2017-03-21
It is important to consider heterogeneity of marker effects and allelic frequencies in across-population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook randomness of genotypes. In this study, a family of hierarchical Bayesian models to perform across-population genome-wide prediction, modeling genotypes as random variables and allowing population-specific effects for each marker, was developed. Models shared a common structure and differed in the priors used and the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information that made it possible not only to account for heterogeneity of allelic frequencies, but also to include individuals with missing genotypes at some or all loci without the need for previous imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version ignoring population structure, but still accounting for randomness of genotypes, was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues along with possible applications, extensions and refinements were discussed. Some features of the models developed in this study make them promising for genome-wide prediction; the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies to assess the performance of the models proposed here and also to compare them with conventional models used in genome-wide prediction are needed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Gariani, Joanna; Westerland, Olwen; Natas, Sarah; Verma, Hema; Cook, Gary; Goh, Vicky
2018-04-01
To undertake a systematic review to determine the diagnostic performance of whole-body MRI (WBMRI), including diffusion-weighted sequences (DWI), compared to whole-body computed tomography (WBCT) or 18F-fluorodeoxyglucose positron emission tomography/CT (18F-FDG PET/CT) in patients with myeloma. Two researchers searched the primary literature independently for WBMRI studies of myeloma. Data were extracted focusing on the diagnostic ability of WBMRI versus WBCT and 18F-FDG PET/CT. Meta-analysis was intended. Six of 2857 articles, including 147 patients and published from 2008 to 2016, were eligible. Studies were heterogeneous, including both newly diagnosed and relapsed patients. All were single-centre studies. Four of the six studies (66.7%) accrued prospectively and 5/6 (83.3%, 3 prospective) included WBMRI and 18F-FDG PET/CT. Three of seven (42.9%) included DWI. The lack of an independent reference standard for individual lesions was noted in 5/6 (83.3%) studies. Studies reported that WBMRI detected more lesions than 18F-FDG PET/CT (sensitivity 68-100% versus 47-100%) but was less specific (specificity 37-83% versus 62-85.7%). No paper assessed impact on management. Studies were heterogeneous, the majority lacking an independent reference standard. Future prospective trials should address these limitations and assess the impact of WBMRI on management. Copyright © 2018. Published by Elsevier B.V.
Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A
2010-01-01
Especially in the life-science and health-care sectors, huge IT requirements are imminent due to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role here for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters with very fast network interconnects within grid infrastructures now allows efficient parallel high-performance grid computing, and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands of this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combine classic with parallel high-performance grid usage, but, more importantly, can also increase the efficiency of IT resource providers. Consequently, the mere "yes we can" becomes a real opportunity for, e.g., the life-science and health-care sectors as well as for grid infrastructures, by reaching a higher level of resource efficiency.
Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments
Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.
2012-01-01
Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications implies the use of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSN infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621
Distributed computing for membrane-based modeling of action potential propagation.
Porras, D; Rogers, J M; Smith, W M; Pollard, A E
2000-08-01
Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
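The relationship between speedup and the serially executed code portions invoked above is the standard Amdahl bound; as a reminder (not a formula from the paper), with parallelisable fraction $p$ on $n$ workstations,

S(n) = \frac{1}{(1 - p) + p/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p},

and the 89% and 75% figures quoted are the ratios of measured speedup to this theoretical S(n) for the homogeneous and heterogeneous clusters, respectively.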
An interactive parallel programming environment applied in atmospheric science
NASA Technical Reports Server (NTRS)
vonLaszewski, G.
1996-01-01
This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.
Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu
2015-01-01
The power of cloud computing and distributed computing has been harnessed to handle the vast and heterogeneous data required to be processed in any virtual screening protocol. A cloud computing platform, ChemInfoCloud, was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It also provides bioinformatics functionalities including sequence alignment, active site pose prediction and protein-ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem-solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users, using container-based virtualization (OpenVZ).
DeCourcy, Kelly; Hostnik, Eric T; Lorbach, Josh; Knoblaugh, Sue
2016-12-01
An adult leopard gecko ( Eublepharis macularius ) presented for lethargy, hyporexia, weight loss, decreased passage of waste, and a palpable caudal coelomic mass. Computed tomography showed a heterogeneous hyperattenuating (∼143 Hounsfield units) structure within the right caudal coelom. The distal colon-coprodeum lumen or urinary bladder was hypothesized as the most likely location for the heterogeneous structure. Medical support consisted of warm water and lubricant enema, as well as a heated environment. Medical intervention aided the passage of a plug comprised centrally of cholesterol and urates with peripheral stratified layers of fibrin, macrophages, heterophils, and bacteria. Within 24 hr, a follow-up computed tomography scan showed resolution of the pelvic canal plug.
Performance related issues in distributed database systems
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
The key elements of research performed during the year-long effort of this project are: investigate the effects of heterogeneity in distributed real-time systems; study the requirements to TRAC towards building a heterogeneous database system; study the effects of performance modeling on distributed database performance; and experiment with an ORACLE-based heterogeneous system.
A computational model was developed to simulate aquifer remediation by pump and treat for a confined, perfectly stratified aquifer. A split-operator finite element numerical technique was utilized to incorporate flow field heterogeneity and nonequilibrium sorption into a two-dimensi...
An approach for drag correction based on the local heterogeneity for gas-solid flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Tingwen; Wang, Limin; Rogers, William
2016-09-22
The drag models typically used for gas-solids interaction are mainly developed based on homogeneous systems of flow passing a fixed particle assembly. It has been shown that the heterogeneous structures, i.e., clusters and bubbles in fluidized beds, need to be resolved to account for their effect in numerical simulations. Since the heterogeneity is essentially captured through the local concentration gradient in the computational cells, this study proposes a simple approach to account for the non-uniformity of the solids spatial distribution inside a computational cell and its effect on the interaction between the gas and solid phases. Finally, to validate this approach, the predicted drag coefficient has been compared to the results from direct numerical simulations. In addition, the need to account for this type of heterogeneity is discussed for a periodic riser flow simulation with highly resolved numerical grids, and the impact of the proposed correction for drag is demonstrated.
Distributed MRI reconstruction using Gadgetron-based cloud computing.
Xue, Hui; Inati, Souheil; Sørensen, Thomas Sangild; Kellman, Peter; Hansen, Michael S
2015-03-01
To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency. The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (e.g., for cardiac and neuroimaging applications). The proposed setup was able to handle acquisition and ℓ1-SPIRiT reconstruction of nine high temporal resolution real-time, cardiac short axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm³ isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min. A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. © 2014 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2013-05-23
Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low-energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for complex fenestration systems. Depending on the configuration, the simulation can take hours or even days on a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation by conducting parallel computing on the heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting simulation, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.
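In the three-phase method, the matrix multiplication portion being accelerated is the daylight-coefficient chain i = V T D s, where V is the view matrix, T the fenestration transmission (BSDF) matrix, D the daylight matrix and s a sky vector. A NumPy sketch with illustrative dimensions (the real matrices come from Radiance runs; the paper implements the product in OpenCL rather than NumPy):

import numpy as np

n_sensors, n_window, n_sky = 10_000, 145, 146   # illustrative sizes only
V = np.random.rand(n_sensors, n_window)         # sensor points <- window
T = np.random.rand(n_window, n_window)          # fenestration BSDF matrix
D = np.random.rand(n_window, n_sky)             # window <- sky patches

def illuminance(sky):
    # Associate as V @ (T @ (D @ sky)) so the intermediates stay small;
    # for an annual run, `sky` holds one column per timestep.
    return V @ (T @ (D @ sky))

annual = illuminance(np.random.rand(n_sky, 8760))  # 8760 hourly sky vectors

These large dense products are exactly the kind of regular, data-parallel work that maps well onto OpenCL devices.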
Performance of the engineering analysis and data system 2 common file system
NASA Technical Reports Server (NTRS)
Debrunner, Linda S.
1993-01-01
The Engineering Analysis and Data System (EADS) was used from April 1986 to July 1993 to support large scale scientific and engineering computation (e.g. computational fluid dynamics) at Marshall Space Flight Center. The need for an updated system resulted in a RFP in June 1991, after which a contract was awarded to Cray Grumman. EADS II was installed in February 1993, and by July 1993 most users were migrated. EADS II is a network of heterogeneous computer systems supporting scientific and engineering applications. The Common File System (CFS) is a key component of this system. The CFS provides a seamless, integrated environment to the users of EADS II including both disk and tape storage. UniTree software is used to implement this hierarchical storage management system. The performance of the CFS suffered during the early months of the production system. Several of the performance problems were traced to software bugs which have been corrected. Other problems were associated with hardware. However, the use of NFS in UniTree UCFM software limits the performance of the system. The performance issues related to the CFS have led to a need to develop a greater understanding of the CFS organization. This paper will first describe the EADS II with emphasis on the CFS. Then, a discussion of mass storage systems will be presented, and methods of measuring the performance of the Common File System will be outlined. Finally, areas for further study will be identified and conclusions will be drawn.
Shi, Longxiang; Li, Shijian; Yang, Xiaoran; Qi, Jiaheng; Pan, Gang; Zhou, Binbin
2017-01-01
With the explosion of healthcare information, there has been a tremendous amount of heterogeneous textual medical knowledge (TMK), which plays an essential role in healthcare information systems. Existing works for integrating and utilizing the TMK mainly focus on establishing straightforward connections and pay less attention to making computers interpret and retrieve knowledge correctly and quickly. In this paper, we explore a novel model to organize and integrate the TMK into conceptual graphs. We then employ a framework to automatically retrieve knowledge in knowledge graphs with high precision. In order to perform reasonable inference on knowledge graphs, we propose a contextual inference pruning algorithm to achieve efficient chain inference. Our algorithm achieves a better inference result with precision and recall of 92% and 96%, respectively, which can avoid most of the meaningless inferences. In addition, we implement two prototypes and provide services, and the results show our approach is practical and effective. PMID:28299322
Is a matrix exponential specification suitable for the modeling of spatial correlation structures?
Strauß, Magdalena E.; Mezzetti, Maura; Leorato, Samantha
2018-01-01
This paper investigates the adequacy of the matrix exponential spatial specifications (MESS) as an alternative to the widely used spatial autoregressive models (SAR). To provide as complete a picture as possible, we extend the analysis to all the main spatial models governed by matrix exponentials comparing them with their spatial autoregressive counterparts. We propose a new implementation of Bayesian parameter estimation for the MESS model with vague prior distributions, which is shown to be precise and computationally efficient. Our implementations also account for spatially lagged regressors. We further allow for location-specific heterogeneity, which we model by including spatial splines. We conclude by comparing the performances of the different model specifications in applications to a real data set and by running simulations. Both the applications and the simulations suggest that the spatial splines are a flexible and efficient way to account for spatial heterogeneities governed by unknown mechanisms. PMID:29492375
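For reference, the two specifications under comparison are, in their simplest forms,

\text{SAR: } y = \rho W y + X\beta + \varepsilon, \qquad \text{MESS: } e^{\alpha W} y = X\beta + \varepsilon, \quad e^{\alpha W} = \sum_{k=0}^{\infty} \frac{\alpha^k W^k}{k!},

with W the spatial weight matrix. A well-known computational attraction of MESS is that, since W conventionally has zero trace, \det(e^{\alpha W}) = e^{\alpha \, \mathrm{tr}(W)} = 1, so the log-determinant term that burdens SAR likelihoods drops out of the MESS likelihood.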
A Method for the Alignment of Heterogeneous Macromolecules from Electron Microscopy
Shatsky, Maxim; Hall, Richard J.; Brenner, Steven E.; Glaeser, Robert M.
2009-01-01
We propose a feature-based image alignment method for single-particle electron microscopy that is able to accommodate various similarity scoring functions while efficiently sampling the two-dimensional transformational space. We use this image alignment method to evaluate the performance of a scoring function that is based on the Mutual Information (MI) of two images rather than one that is based on the cross-correlation function. We show that alignment using MI for the scoring function has far less model-dependent bias than is found with cross-correlation based alignment. We also demonstrate that MI improves the alignment of some types of heterogeneous data, provided that the signal to noise ratio is relatively high. These results indicate, therefore, that use of MI as the scoring function is well suited for the alignment of class-averages computed from single particle images. Our method is tested on data from three model structures and one real dataset. PMID:19166941
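The MI scoring function contrasted with cross-correlation above can be computed from the joint grey-level histogram of the two images; a minimal Python sketch:

import numpy as np

def mutual_information(a, b, bins=32):
    # Joint histogram of the two (already aligned candidate) images.
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of image b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

Because MI rewards any consistent intensity correspondence rather than only a linear one, the reference model leaks less of its own intensity pattern into the alignment, which is the reduced model-dependent bias reported above.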
Computational homogenisation for thermoviscoplasticity: application to thermally sprayed coatings
NASA Astrophysics Data System (ADS)
Berthelsen, Rolf; Denzer, Ralf; Oppermann, Philip; Menzel, Andreas
2017-11-01
Metal forming processes require wear-resistant tool surfaces in order to ensure a long life cycle of the expensive tools together with a constant high quality of the produced components. Thermal spraying is a relatively widely applied coating technique for depositing wear-protection coatings. During these coating processes, heterogeneous coatings are deposited at high temperatures, followed by quenching, during which residual stresses occur that strongly influence the performance of the coated tools. The objective of this article is to discuss and apply a thermo-mechanically coupled simulation framework which captures the heterogeneity of the deposited coating material. To this end, a two-scale finite element framework for the solution of nonlinear thermo-mechanically coupled problems is elaborated and applied to the simulation of thermoviscoplastic material behaviour including nonlinear thermal softening in a geometrically linearised setting. The finite element framework and material model are demonstrated by means of numerical examples.
Dong, Xinzhe; Wu, Peipei; Sun, Xiaorong; Li, Wenwu; Wan, Honglin; Yu, Jinming; Xing, Ligang
2015-06-01
This study aims to explore whether intra-tumour (18)F-fluorodeoxyglucose (FDG) uptake heterogeneity affects the reliability of target volume definition with FDG positron emission tomography/computed tomography (PET/CT) imaging for non-small cell lung cancer (NSCLC) and squamous cell oesophageal cancer (SCEC). Patients with NSCLC (n = 50) or SCEC (n = 50) who received (18)F-FDG PET/CT scanning before treatment were included in this retrospective study. Intra-tumour FDG uptake heterogeneity was assessed by visual scoring, the coefficient of variation (COV) of the standardised uptake value (SUV) and an image texture feature (entropy). Tumour volumes (gross tumour volume (GTV)) were delineated on the CT images (GTV(CT)), the fused PET/CT images (GTV(PET-CT)) and the PET images, using a threshold at 40% SUV(max) (GTV(PET40%)) or an SUV cut-off value of 2.5 (GTV(PET2.5)). The correlation between the FDG uptake heterogeneity parameters and the differences in tumour volumes among GTV(CT), GTV(PET-CT), GTV(PET40%) and GTV(PET2.5) was analysed. For both NSCLC and SCEC, clear correlations were found between uptake heterogeneity, SUV and tumour volumes. The three types of heterogeneity parameters were consistent and closely related to each other. Substantial differences between the four methods of GTV definition were found. The differences between the GTVs correlated significantly with PET heterogeneity defined by the visual score, the COV or the textural feature entropy for NSCLC and SCEC. In tumours with high FDG uptake heterogeneity, a larger GTV delineation difference was found. Advanced image segmentation algorithms dealing with tracer uptake heterogeneity should be incorporated into the treatment planning system. © 2015 The Royal Australian and New Zealand College of Radiologists.
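Of the three heterogeneity measures used, the two quantitative ones are easy to state; a sketch, assuming suv holds the voxel SUVs of the segmented tumour (first-order histogram entropy stands in here for the study's texture entropy, whose exact definition may differ):

import numpy as np

def suv_heterogeneity(suv, bins=64):
    # Coefficient of variation: dimensionless spread of uptake values.
    cov = suv.std() / suv.mean()
    # First-order entropy of the discretised SUV histogram (bits).
    counts, _ = np.histogram(suv, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = -(p * np.log2(p)).sum()
    return cov, entropy

Higher values of either index flag the tumours for which, per the findings above, threshold-based PET contours diverge most from CT- and fusion-based contours.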
A heterogeneous hierarchical architecture for real-time computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skroch, D.A.; Fornaro, R.J.
The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.
Case retrieval in medical databases by fusing heterogeneous information.
Quellec, Gwénolé; Lamard, Mathieu; Cazuguel, Guy; Roux, Christian; Cochener, Béatrice
2011-01-01
A novel content-based heterogeneous information retrieval framework, particularly well suited to browse medical databases and support new generation computer aided diagnosis (CADx) systems, is presented in this paper. It was designed to retrieve possibly incomplete documents, consisting of several images and semantic information, from a database; more complex data types such as videos can also be included in the framework. The proposed retrieval method relies on image processing, in order to characterize each individual image in a document by their digital content, and information fusion. Once the available images in a query document are characterized, a degree of match, between the query document and each reference document stored in the database, is defined for each attribute (an image feature or a metadata). A Bayesian network is used to recover missing information if need be. Finally, two novel information fusion methods are proposed to combine these degrees of match, in order to rank the reference documents by decreasing relevance for the query. In the first method, the degrees of match are fused by the Bayesian network itself. In the second method, they are fused by the Dezert-Smarandache theory: the second approach lets us model our confidence in each source of information (i.e., each attribute) and take it into account in the fusion process for a better retrieval performance. The proposed methods were applied to two heterogeneous medical databases, a diabetic retinopathy database and a mammography screening database, for computer aided diagnosis. Precisions at five of 0.809 ± 0.158 and 0.821 ± 0.177, respectively, were obtained for these two databases, which is very promising.
NASA Astrophysics Data System (ADS)
Hao, Y.; Smith, M. M.; Mason, H. E.; Carroll, S.
2015-12-01
It has long been appreciated that chemical interactions have a major effect on rock porosity and permeability evolution and may alter the behavior or performance of both natural and engineered reservoir systems. Such reaction-induced permeability evolution is of particular importance for geological CO2 sequestration and storage associated with enhanced oil recovery. In this study we used a three-dimensional Darcy scale reactive transport model to simulate CO2 core flood experiments in which the CO2-equilibrated brine was injected into dolostone cores collected from the Arbuckle carbonate reservoir, Wellington, Kansas. Heterogeneous distributions of macro pores, fractures, and mineral phases inside the cores were obtained from X-ray computed microtomography (XCMT) characterization data, and then used to construct initial model macroscopic properties including porosity, permeability, and mineral compositions. The reactive transport simulations were performed by using the Nonisothermal Unsaturated Flow and Transport (NUFT) code, and their results were compared with experimental data. It was observed both experimentally and numerically that the dissolution fronts became unstable in highly heterogeneous and less permeable formations, leading to the development of highly porous flow paths or wormholes. Our model results indicate that the continuum-scale reactive transport models are able to adequately capture the evolution of distinct dissolution fronts as observed in carbonate rocks at a core scale. The impacts of rock heterogeneity, chemical kinetics and porosity-permeability relationships were also examined in this study. The numerical model developed in this study will not only help improve understanding of coupled physical and chemical processes controlling carbonate dissolution, but also provide a useful basis for upscaling transport and reaction properties from core scale to field scale. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Real-time simulation of contact and cutting of heterogeneous soft-tissues.
Courtecuisse, Hadrien; Allard, Jérémie; Kerfriden, Pierre; Bordas, Stéphane P A; Cotin, Stéphane; Duriez, Christian
2014-02-01
This paper presents a numerical method for interactive (real-time) simulations which considerably improves the accuracy of the response of heterogeneous soft-tissue models undergoing contact, cutting and other topological changes. We provide an integrated methodology able to deal with the ill-conditioning issues associated with material heterogeneities, with contact boundary conditions, which are one of the main sources of inaccuracy, and with cutting, which is one of the most challenging issues in interactive simulations. Our approach is based on an implicit time integration of a non-linear finite element model. To enable real-time computations, we propose a new preconditioning technique based on an asynchronous update at low frequency. The preconditioner is not only used to improve the computation of the deformation of the tissues, but also to simulate the contact response of homogeneous and heterogeneous bodies with the same accuracy. We also address the problem of cutting heterogeneous structures and propose a method to update the preconditioner according to the topological modifications. Finally, we apply our approach to three challenging demonstrators: (i) a simulation of cataract surgery, (ii) a simulation of laparoscopic hepatectomy and (iii) a brain tumor surgery. Copyright © 2013 Elsevier B.V. All rights reserved.
Assessing the Performance of a Computer-Based Policy Model of HIV and AIDS
Rydzak, Chara E.; Cotich, Kara L.; Sax, Paul E.; Hsu, Heather E.; Wang, Bingxia; Losina, Elena; Freedberg, Kenneth A.; Weinstein, Milton C.; Goldie, Sue J.
2010-01-01
Background Model-based analyses, conducted within a decision analytic framework, provide a systematic way to combine information about the natural history of disease and effectiveness of clinical management strategies with demographic and epidemiological characteristics of the population. Challenges with disease-specific modeling include the need to identify influential assumptions and to assess the face validity and internal consistency of the model. Methods and Findings We describe a series of exercises involved in adapting a computer-based simulation model of HIV disease to the Women's Interagency HIV Study (WIHS) cohort and assess model performance as we re-parameterized the model to address policy questions in the U.S. relevant to HIV-infected women using data from the WIHS. Empiric calibration targets included 24-month survival curves stratified by treatment status and CD4 cell count. The most influential assumptions in untreated women included chronic HIV-associated mortality following an opportunistic infection, and in treated women, the 'clinical effectiveness' of HAART and the ability of HAART to prevent HIV complications independent of virologic suppression. Good-fitting parameter sets required reductions in the clinical effectiveness of 1st and 2nd line HAART and improvements in 3rd and 4th line regimens. Projected rates of treatment regimen switching using the calibrated cohort-specific model closely approximated independent analyses published using data from the WIHS. Conclusions The model demonstrated good internal consistency and face validity, and supported cohort heterogeneities that have been reported in the literature. Iterative assessment of model performance can provide information about the relative influence of uncertain assumptions and provide insight into heterogeneities within and between cohorts. Description of calibration exercises can enhance the transparency of disease-specific models. PMID:20844741
A computational model was developed to simulate aquifer remediation by pump and treat for a confined, perfectly stratified aquifer. A split-operator finite element numerical technique was utilized to incorporate flow field heterogeneity and nonequilibrium sorption into a two-dime...
Seismic waves in heterogeneous material: subcell resolution of the discontinuous Galerkin method
NASA Astrophysics Data System (ADS)
Castro, Cristóbal E.; Käser, Martin; Brietzke, Gilbert B.
2010-07-01
We present an important extension of the arbitrary high-order discontinuous Galerkin (DG) finite-element method to model 2-D elastic wave propagation in highly heterogeneous material. In this new approach we include space-variable coefficients to describe smooth or discontinuous material variations inside each element, using the same numerical approximation strategy as for the velocity-stress variables in the formulation of the elastic wave equation. The combination of the DG method with a time integration scheme based on the solution of arbitrary accuracy derivatives Riemann problems still provides an explicit, one-step scheme which achieves arbitrary high-order accuracy in space and time. Compared to previous formulations, the new scheme contains two additional terms in the form of volume integrals. We show that the increased computational cost per element can be more than offset by the improved material representation inside each element, as coarser meshes can be used, which reduces the total number of elements and therefore the computational time needed to reach a desired error level. We confirm the accuracy of the proposed scheme by performing convergence tests and several numerical experiments considering smooth and highly heterogeneous material. As the approximation of the velocity and stress variables in the wave equation and of the material properties in the model can be chosen independently, we investigate the influence of the polynomial material representation on the accuracy of the synthetic seismograms with respect to computational cost. Moreover, we study the behaviour of the new method on strong material discontinuities in the case where the mesh is not aligned with such a material interface. In this case a second-order linear material approximation seems to be the best choice, with higher-order intra-cell approximation leading to potentially unstable behaviour. For all test cases we validate our solution against the well-established standard fourth-order finite difference and spectral element methods.
Schneider, Frank; Bludau, Frederic; Clausen, Sven; Fleckenstein, Jens; Obertacke, Udo; Wenz, Frederik
2017-05-01
To date, IORT has been performed under eye and hand guidance, without treatment planning or tissue heterogeneity correction. This limits the precision of the application and the precise documentation of the location and of the dose deposited in the tissue. Here we present a set-up that uses image guidance by intraoperative cone beam computed tomography (CBCT) for precise online Monte Carlo treatment planning, including tissue heterogeneity correction. An IORT was performed during balloon kyphoplasty using a dedicated Needle Applicator. An intraoperative CBCT was registered with a pre-op CT. Treatment planning was performed in Radiance using a hybrid Monte Carlo algorithm simulating dose in homogeneous (MCwater) and heterogeneous media (MChet). Dose distributions on the CBCT and the pre-op CT were compared with each other. Spinal cord and metastasis doses were evaluated. The MCwater calculations showed a spherical dose distribution, as expected. The minimum target dose for the MChet simulations on the pre-op CT was increased by 40%, while the maximum spinal cord dose was decreased by 35%. Owing to the artefacts on the CBCT, the comparison between MChet simulations on the CBCT and the pre-op CT showed differences of up to 50% in dose. Image-guided IORT and online treatment planning improve the accuracy of IORT. However, the current set-up is limited by CT artefacts. Fusing an intraoperative CBCT with a pre-op CT allows the combination of an accurate dose calculation with knowledge of the correct source/applicator position. This method can also be used for pre-operative treatment planning followed by image-guided surgery. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Puckett, E. G.; Turcotte, D. L.; He, Y.; Lokavarapu, H. V.; Robey, J.; Kellogg, L. H.
2017-12-01
Geochemical observations of mantle-derived rocks favor a nearly homogeneous upper mantle, the source of mid-ocean ridge basalts (MORB), and heterogeneous lower mantle regions. Plumes that generate ocean island basalts are thought to sample the lower mantle regions and exhibit more heterogeneity than MORB. These regions have been associated with lower mantle structures known as large low shear velocity provinces below Africa and the South Pacific. The isolation of these regions is attributed to compositional differences and density stratification that, consequently, have been the subject of computational and laboratory modeling designed to determine the parameter regime in which layering is stable and to understand how layering evolves. Mathematical models of persistent compositional interfaces in the Earth's mantle may be inherently unstable, at least in some regions of the parameter space relevant to the mantle. Computing approximations to solutions of such problems presents severe challenges, even to state-of-the-art numerical methods. Some numerical algorithms for modeling the interface between distinct compositions smear the interface at the boundary between compositions, such as methods that add numerical diffusion or 'artificial viscosity' in order to stabilize the algorithm. We present two new algorithms for maintaining high-resolution and sharp computational boundaries in computations of these types of problems: a discontinuous Galerkin method with a bound-preserving limiter and a Volume-of-Fluid interface tracking algorithm. We compare these new methods with two approaches widely used for modeling the advection of two distinct thermally driven compositional fields in mantle convection computations: a high-order accurate finite element advection algorithm with entropy viscosity and a particle method. We compare the performance of these four algorithms on three problems, including computing an approximation to the solution of an initially compositionally stratified fluid at Ra = 10^5 with buoyancy numbers B that vary from no stratification at B = 0 to stratified flow at large B.
Monte Carlo-based treatment planning system calculation engine for microbeam radiation therapy.
Martinez-Rovira, I; Sempau, J; Prezado, Y
2012-05-01
Microbeam radiation therapy (MRT) is a synchrotron radiotherapy technique that explores the limits of the dose-volume effect. Preclinical studies have shown that MRT irradiations (arrays of 25-75-μm-wide microbeams spaced by 200-400 μm) are able to eradicate highly aggressive animal tumor models while healthy tissue is preserved. These promising results have provided the basis for the forthcoming clinical trials at the ID17 Biomedical Beamline of the European Synchrotron Radiation Facility (ESRF). The first step includes irradiation of pets (cats and dogs) as a milestone before treatment of human patients. Within this context, accurate dose calculations are required. The distinct features of both beam generation and irradiation geometry in MRT with respect to conventional techniques require the development of a specific MRT treatment planning system (TPS). In particular, a Monte Carlo (MC)-based calculation engine for the MRT TPS has been developed in this work. Experimental verification in heterogeneous phantoms and optimization of the computation time have also been performed. The penelope/penEasy MC code was used to compute dose distributions from a realistic beam source model. Experimental verification was carried out by means of radiochromic films placed within heterogeneous slab-phantoms. Once validation was completed, dose computations in a virtual model of a patient, reconstructed from computed tomography (CT) images, were performed. To this end, decoupling of the CT image voxel grid (a few cubic millimeter volume) from the dose bin grid, which has micrometer dimensions in the transversal direction of the microbeams, was performed. Optimization of the simulation parameters, the use of variance-reduction (VR) techniques, and other methods, such as the parallelization of the simulations, were applied in order to speed up the dose computation. Good agreement between MC simulations and experimental results was achieved, even at the interfaces between two different media. Optimization of the simulation parameters and the use of VR techniques saved a significant amount of computation time. Finally, parallelization of the simulations improved even further the calculation time, which reached 1 day for a typical irradiation case envisaged in the forthcoming clinical trials in MRT. An example of MRT treatment in a dog's head is presented, showing the performance of the calculation engine. The development of the first MC-based calculation engine for the future TPS devoted to MRT has been accomplished. This will constitute an essential tool for the future clinical trials on pets at the ESRF. The MC engine is able to calculate dose distributions in micrometer-sized bins in complex voxelized CT structures in a reasonable amount of time. Minimization of the computation time by using several approaches has led to timings that are adequate for pet radiotherapy at synchrotron facilities. The next step will consist in its integration into a user-friendly graphical front-end.
Computers and Cooperative Learning. Tech Use Guide: Using Computer Technology.
ERIC Educational Resources Information Center
Council for Exceptional Children, Reston, VA. Center for Special Education Technology.
This guide focuses on the use of computers and cooperative learning techniques in classrooms that include students with disabilities. The guide outlines the characteristics of cooperative learning such as goal interdependence, individual accountability, and heterogeneous groups, emphasizing the value of each group member. Several cooperative…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baumann, K; Weber, U; Simeonov, Y
2015-06-15
Purpose: The aim of this study was to analyze the modulating, broadening effect on the Bragg peak due to heterogeneous geometries, such as multi-wire chambers, in the beam path of a particle therapy beam line. The effect was described by a mathematical model that was implemented in the Monte Carlo code FLUKA via user routines, in order to reduce the computation time of the simulations. Methods: The depth-dose curve of 80 MeV/u C12 ions in a water phantom was calculated using the Monte Carlo code FLUKA (reference curve). The modulating effect on this dose distribution behind eleven mesh-like foils (periodicity ~80 microns), as occur in a typical set of multi-wire and dose chambers, was described mathematically by optimizing a normal distribution such that the reference curve convolved with this distribution equals the modulated dose curve. This distribution describes a displacement in water and was transformed into a probability distribution of the thickness of the eleven foils using the water-equivalent thickness of the foil material. From this, the thickness distribution of a single foil was determined inversely. In FLUKA the heterogeneous foils were replaced by homogeneous foils, and a user routine was programmed that varies the thickness of the homogeneous foils for each simulated particle using this distribution. Results: Using the mathematical model and user routine in FLUKA, the broadening effect could be reproduced exactly when replacing the heterogeneous foils by homogeneous ones. The computation time was reduced by 90 percent. Conclusion: In this study the broadening effect on the Bragg peak due to heterogeneous structures was analyzed, described by a mathematical model and implemented in FLUKA via user routines. Applying these routines, the computing time was reduced by 90 percent. The developed tool can be used for any heterogeneous structure with dimensions of microns to millimeters, in principle even for organic materials such as lung tissue.
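The convolution model summarized above lends itself to a compact numerical sketch. Everything below is illustrative: the reference curve is an idealized stand-in for the FLUKA depth-dose output, sigma_mm is a hypothetical width of the fitted displacement distribution, and the single-foil width assumes the eleven foils contribute independently, so variances add:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

z = np.linspace(0.0, 30.0, 3000)                 # depth in mm of water
# idealized stand-in for the simulated reference Bragg curve
reference = 0.2 * (z < 24.0) + np.exp(-(z - 24.0) ** 2 / 0.1)

sigma_mm = 0.3                                   # hypothetical displacement width
dz = z[1] - z[0]
modulated = gaussian_filter1d(reference, sigma=sigma_mm / dz)  # convolve with N(0, sigma)

# per-particle thickness sampling for the equivalent homogeneous foils
rng = np.random.default_rng(0)
n_foils = 11
sigma_one = sigma_mm / np.sqrt(n_foils)          # assumes independent foil contributions
thickness_jitter = rng.normal(0.0, sigma_one, size=100_000)
```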
Congestion game scheduling for virtual drug screening optimization
NASA Astrophysics Data System (ADS)
Nikitina, Natalia; Ivashko, Evgeny; Tchernykh, Andrei
2018-02-01
In virtual drug screening, the chemical diversity of hits is an important factor, along with their predicted activity. Moreover, interim results are of interest for directing further research, and their diversity is also desirable. In this paper, we consider the problem of obtaining a diverse set of virtual screening hits in a short time. To this end, we propose a mathematical model of task scheduling for virtual drug screening in high-performance computing systems as a congestion game between computational nodes, finding the equilibrium solutions that best balance the number of interim hits with their chemical diversity. The model considers a heterogeneous environment with workload uncertainty, processing time uncertainty, and limited knowledge about the input dataset structure. We perform computational experiments and evaluate the performance of the developed approach using the organic molecule database GDB-9. This set of molecules is rich enough to demonstrate the feasibility and practicability of the proposed solutions. We compare the algorithm with two heuristics used in practice and observe that game-based scheduling outperforms them in hit discovery rate and chemical diversity during the early steps. Based on these results, we use a social utility metric to assess the efficiency of our equilibrium solutions and show that they achieve the highest social utility values.
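The congestion-game framing can be illustrated with a toy model (not the paper's actual formulation): compute nodes repeatedly best-respond by picking the task batch with the lowest congested cost. Congestion games admit a potential function, so this converges to a pure Nash equilibrium; linear costs and all parameters below are assumptions:

```python
import random

def best_response_schedule(n_nodes=8, n_batches=4, iters=100, seed=1):
    """Toy congestion game: each node picks a batch; cost grows with load."""
    rng = random.Random(seed)
    base = [1.0 + 0.5 * b for b in range(n_batches)]   # per-batch base cost (assumed)
    choice = [rng.randrange(n_batches) for _ in range(n_nodes)]
    for _ in range(iters):
        moved = False
        for i in range(n_nodes):
            load = [choice.count(b) for b in range(n_batches)]
            # cost node i would incur on batch b after moving there
            def cost(b):
                return base[b] * (load[b] + (0 if choice[i] == b else 1))
            best = min(range(n_batches), key=cost)
            if cost(best) < cost(choice[i]):
                choice[i], moved = best, True
        if not moved:
            break        # pure Nash equilibrium reached
    return choice

print(best_response_schedule())  # cheaper batches absorb more nodes
```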
NASA Astrophysics Data System (ADS)
Indra, Sandipa; Guchhait, Biswajit; Biswas, Ranjit
2016-03-01
We have performed steady state UV-visible absorption and time-resolved fluorescence measurements and computer simulations to explore the cosolvent mole fraction induced changes in structural and dynamical properties of water/dioxane (Diox) and water/tetrahydrofuran (THF) binary mixtures. Diox is a quadrupolar solvent whereas THF is a dipolar one, although both are cyclic molecules and represent cycloethers. The focus here is on whether these cycloethers can induce stiffening and transition of the water H-bond network structure and, if they do, whether such structural modification differentiates the chemical nature (dipolar or quadrupolar) of the cosolvent molecules. Composition dependent measured fluorescence lifetimes and rotation times of a dissolved dipolar solute (Coumarin 153, C153) suggest cycloether mole-fraction (XTHF/Diox) induced structural transition for both of these aqueous binary mixtures in the 0.1 ≤ XTHF/Diox ≤ 0.2 regime with no specific dependence on the chemical nature. Interestingly, absorption measurements reveal stiffening of the water H-bond structure in the presence of both the cycloethers at a nearly equal mole-fraction, XTHF/Diox ~ 0.05. Measurements near the critical solution temperature or concentration indicate no role for the solution criticality on the anomalous structural changes. Evidence for cycloether aggregation at very dilute concentrations has been found. Simulated radial distribution functions reflect abrupt changes in respective peak heights at those mixture compositions around which fluorescence measurements revealed structural transition. Simulated water coordination numbers (for a dissolved C153) and number of H-bonds also exhibit minima around these cosolvent concentrations. In addition, several dynamic heterogeneity parameters have been simulated for both the mixtures to explore the effects of structural transition and chemical nature of cosolvent on heterogeneous dynamics of these systems. Simulated four-point dynamic susceptibility suggests formation of clusters inducing local heterogeneity in the solution structure.
Speckle contrast diffuse correlation tomography of complex turbid medium flow
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Chong; Irwin, Daniel; Lin, Yu
2015-07-15
Purpose: Developed herein is a three-dimensional (3D) flow contrast imaging system leveraging advancements in the extension of laser speckle contrast imaging theories to deep tissues along with our recently developed finite-element diffuse correlation tomography (DCT) reconstruction scheme. This technique, termed speckle contrast diffuse correlation tomography (scDCT), enables incorporation of complex optical property heterogeneities and sample boundaries. When combined with a reflectance-based design, this system facilitates a rapid segue into flow contrast imaging of larger, in vivo applications such as humans. Methods: A highly sensitive CCD camera was integrated into a reflectance-based optical system. Four long-coherence laser source positions were coupled to an optical switch for sequencing of tomographic data acquisition, providing multiple projections through the sample. This system was investigated through incorporation of liquid and solid tissue-like phantoms exhibiting optical properties and flow characteristics typical of human tissues. Computer simulations were also performed for comparison. A uniquely encountered smear correction algorithm was employed to correct point-source illumination contributions during image capture with the frame-transfer CCD and reflectance setup. Results: Measurements with scDCT on a homogeneous liquid phantom showed that speckle contrast-based deep flow indices were within 12% of those from standard DCT. Inclusion of a solid phantom submerged below the liquid phantom surface allowed for heterogeneity detection and validation. The heterogeneity was identified successfully by reconstructed 3D flow contrast tomography with scDCT. The heterogeneity's center, dimensions, and averaged relative flow (within 3%) were in agreement with actuality, and its localization agreed with computer simulations. Conclusions: A custom cost-effective CCD-based reflectance 3D flow imaging system demonstrated rapid acquisition of dense boundary data and, with further studies, a high potential for translatability to real tissues with arbitrary boundaries. A requisite correction was also found for measurements in the fashion of scDCT to recover accurate speckle contrast of deep tissues.
Verde, Franco; Hruban, Ralph H; Fishman, Elliot K
Small bowel gastrointestinal stromal tumors (SB-GISTs) are rare lesions with a variable appearance on computed tomography (CT). This case series analyzes the CT enhancement pattern against the histologic risk assessment of tumor progression. The local institutional pathology database was searched for SB-GISTs from 2000 to 2015. Pathology reports and clinical notes were reviewed. Imaging was qualitatively reviewed for pattern of enhancement, categorized into homogeneous or heterogeneous groups. Nonparametric statistical analysis was performed comparing enhancement to the segment of bowel involved, presence of necrosis, tumor size, histologic grade (i.e., G1 or G2), and histologic risk of progression (i.e., low, moderate, high). For simplicity, risk of progression was binned into low-risk and non-low-risk groups. Twenty-six pathology-proven, first-presentation, nonmetastatic SB-GISTs were included in the study. Seventeen were located in the duodenum, 7 in the jejunum, and 2 in the ileum. Dual-phase (arterial and venous) CT imaging was available for 22 cases; four cases did not have dual-phase imaging (three venous phase and one arterial phase only). Seventeen cases demonstrated heterogeneous enhancement and 9 cases homogeneous enhancement. A statistically significant difference in size was found between the enhancement groups (3.1 cm for homogeneous versus 6.8 cm for heterogeneous; Mann-Whitney U test, n = 26, P = 0.002). The association between presence of necrosis and enhancement group was statistically significant (Pearson χ², P = 0.001), as was that between the low-risk/non-low-risk groups and the enhancement groups (P = 0.001). Bowel segment involvement and histologic grade versus enhancement group did not reach statistical significance (P = 0.174 and P = 0.07, respectively). This case series reveals an important significant association between heterogeneous enhancement and non-low-risk (i.e., moderate/high) SB-GISTs. Beyond simply describing the tumor, the interpreting radiologist can use the enhancement pattern to suggest additional prognostic information preoperatively, which is potentially helpful for surgical planning.
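The headline size comparison rests on a Mann-Whitney U test; a minimal sketch with made-up tumour sizes (the individual measurements are not given in the abstract):

```python
from scipy.stats import mannwhitneyu

# hypothetical maximum tumour diameters (cm) by CT enhancement pattern
homogeneous = [2.1, 2.8, 3.0, 3.4, 2.5, 3.9, 3.2, 2.9, 3.3]
heterogeneous = [4.5, 6.0, 7.2, 5.8, 8.1, 6.9, 7.5, 5.2, 6.4,
                 7.0, 8.3, 5.9, 6.1, 7.7, 6.6, 5.4, 7.9]

u_stat, p_value = mannwhitneyu(homogeneous, heterogeneous, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```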
Heterogeneous compute in computer vision: OpenCL in OpenCV
NASA Astrophysics Data System (ADS)
Gasparakis, Harris
2014-02-01
We explore the relevance of Heterogeneous System Architecture (HSA) in Computer Vision, both as a long-term vision and as a near-term emerging reality via the recently ratified OpenCL 2.0 Khronos standard. After a brief review of OpenCL 1.2 and 2.0, including HSA features such as Shared Virtual Memory (SVM) and platform atomics, we identify what genres of Computer Vision workloads stand to benefit by leveraging those features, and we suggest a new mental framework that replaces GPU compute with hybrid HSA APU compute. As a case in point, we discuss, in some detail, popular object recognition algorithms (part-based models), emphasizing the interplay and concurrent collaboration between the GPU and CPU. We conclude by describing how OpenCL has been incorporated in OpenCV, a popular open source computer vision library, emphasizing recent work on the Transparent API, to appear in OpenCV 3.0, which unifies the native CPU and OpenCL execution paths under a single API, allowing the same code to execute either on the CPU or on an OpenCL-enabled device, without even recompiling.
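In OpenCV's Python bindings the Transparent API is reached by wrapping images in cv2.UMat: the same calls then run through the OpenCL path when a capable device is present and fall back to the CPU otherwise. A minimal sketch (input file name hypothetical):

```python
import cv2

img = cv2.imread("frame.png")                 # hypothetical input image
u_img = cv2.UMat(img)                         # data may now live on the OpenCL device
u_gray = cv2.cvtColor(u_img, cv2.COLOR_BGR2GRAY)
u_blur = cv2.GaussianBlur(u_gray, (7, 7), 1.5)
u_edges = cv2.Canny(u_blur, 50, 150)          # still a UMat; no host round-trips so far
edges = u_edges.get()                         # copy the result back to a numpy array
```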
Execution of parallel algorithms on a heterogeneous multicomputer
NASA Astrophysics Data System (ADS)
Isenstein, Barry S.; Greene, Jonathon
1995-04-01
Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friese, Ryan; Khemka, Bhavesh; Maciejewski, Anthony A
Rising costs of energy consumption and an ongoing effort for increases in computing performance are leading to a significant need for energy-efficient computing. Before systems such as supercomputers, servers, and datacenters can begin operating in an energy-efficient manner, the energy consumption and performance characteristics of the system must be analyzed. In this paper, we provide an analysis framework that will allow a system administrator to investigate the tradeoffs between system energy consumption and utility earned by a system (as a measure of system performance). We model these trade-offs as a bi-objective resource allocation problem. We use a popular multi-objective genetic algorithm to construct Pareto fronts to illustrate how different resource allocations can cause a system to consume significantly different amounts of energy and earn different amounts of utility. We demonstrate our analysis framework using real data collected from online benchmarks, and further provide a method to create larger data sets that exhibit similar heterogeneity characteristics to real data sets. This analysis framework can provide system administrators with insight to make intelligent scheduling decisions based on the energy and utility needs of their systems.
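The Pareto fronts described above are sets of non-dominated allocations: no other allocation uses less energy while earning at least as much utility. A minimal non-dominated filter over synthetic (energy, utility) pairs, standing in for the benchmark-derived data:

```python
import numpy as np

def pareto_front(energy, utility):
    """Indices of allocations minimizing energy while maximizing utility."""
    e, u = np.asarray(energy), np.asarray(utility)
    keep = []
    for i in range(len(e)):
        # i is dominated if some j is no worse on both axes and better on one
        dominated = np.any((e <= e[i]) & (u >= u[i]) & ((e < e[i]) | (u > u[i])))
        if not dominated:
            keep.append(i)
    return keep

rng = np.random.default_rng(1)
energy = rng.uniform(10.0, 100.0, 200)                      # synthetic energy costs
utility = 50.0 * np.log1p(energy) + rng.normal(0, 10, 200)  # noisy trade-off
print(len(pareto_front(energy, utility)), "non-dominated allocations")
```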
Artificial Intelligence (AI) Based Tactical Guidance for Fighter Aircraft
NASA Technical Reports Server (NTRS)
McManus, John W.; Goodrich, Kenneth H.
1990-01-01
A research program investigating the use of Artificial Intelligence (AI) techniques to aid in the development of a Tactical Decision Generator (TDG) for Within Visual Range (WVR) air combat engagements is discussed. The application of AI programming and problem solving methods in the development and implementation of the Computerized Logic For Air-to-Air Warfare Simulations (CLAWS), a second generation TDG, is presented. The Knowledge-Based Systems used by CLAWS to aid in the tactical decision-making process are outlined in detail, and the results of tests to evaluate the performance of CLAWS versus a baseline TDG developed in FORTRAN to run in real-time in the Langley Differential Maneuvering Simulator (DMS), are presented. To date, these test results have shown significant performance gains with respect to the TDG baseline in one-versus-one air combat engagements, and the AI-based TDG software has proven to be much easier to modify and maintain than the baseline FORTRAN TDG programs. Alternate computing environments and programming approaches, including the use of parallel algorithms and heterogeneous computer networks are discussed, and the design and performance of a prototype concurrent TDG system are presented.
Moutsatsos, Ioannis K; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J; Jenkins, Jeremy L; Holway, Nicholas; Tallarico, John; Parker, Christian N
2017-03-01
High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.
NASA Astrophysics Data System (ADS)
Hyde, B. C.; Tait, K. T.; Nicklin, I.; Day, J. M. D.; Ash, R. D.; Moser, D. E.
2013-09-01
Sectioning of meteorites is usually done in an arbitrary manner. We used micro-computed tomography to view the interior of brachinite NWA 4872. A cut was then made through an area of interest. Heterogeneity and modal abundance are discussed.
Jung, Jin Woo; Lee, Jung-Seob; Cho, Dong-Woo
2016-01-01
Recently, much attention has focused on replacement or/and enhancement of biological tissues via the use of cell-laden hydrogel scaffolds with an architecture that mimics the tissue matrix and with the desired three-dimensional (3D) external geometry. However, mimicking the heterogeneous composition of most organs and tissues is challenging. Although multiple-head 3D printing systems have been proposed for fabricating heterogeneous cell-laden hydrogel scaffolds, to date only simple exterior forms have been realized. Here we describe a computer-aided design and manufacturing (CAD/CAM) system for this application. We aim to develop an algorithm to enable easy, intuitive design and fabrication of heterogeneous cell-laden hydrogel scaffolds with a free-form 3D geometry. The printing paths of the scaffold are automatically generated from the 3D CAD model, and the scaffold is then printed by dispensing four materials: a frame, two kinds of cell-laden hydrogel and a support. We demonstrated printing of heterogeneous tissue models formed of hydrogel scaffolds using this approach, including outer ear, kidney and tooth tissue. These results indicate that this approach is particularly promising for tissue engineering and 3D printing applications to regenerate heterogeneous organs and tissues with tailored geometries to treat specific defects or injuries. PMID:26899876
Intrinsic islet heterogeneity and gap junction coupling determine spatiotemporal Ca²⁺ wave dynamics.
Benninger, Richard K P; Hutchens, Troy; Head, W Steven; McCaughey, Michael J; Zhang, Min; Le Marchand, Sylvain J; Satin, Leslie S; Piston, David W
2014-12-02
Insulin is released from the islets of Langerhans in discrete pulses that are linked to synchronized oscillations of intracellular free calcium ([Ca(2+)]i). Associated with each synchronized oscillation is a propagating calcium wave mediated by Connexin36 (Cx36) gap junctions. A computational islet model predicted that waves emerge due to heterogeneity in β-cell function throughout the islet. To test this, we applied defined patterns of glucose stimulation across the islet using a microfluidic device and measured how these perturbations affect calcium wave propagation. We further investigated how gap junction coupling regulates spatiotemporal [Ca(2+)]i dynamics in the face of heterogeneous glucose stimulation. Calcium waves were found to originate in regions of the islet having elevated excitability, and this heterogeneity is an intrinsic property of islet β-cells. The extent of [Ca(2+)]i elevation across the islet in the presence of heterogeneity is gap-junction dependent, which reveals a glucose dependence of gap junction coupling. To better describe these observations, we had to modify the computational islet model to consider the electrochemical gradient between neighboring β-cells. These results reveal how the spatiotemporal [Ca(2+)]i dynamics of the islet depend on β-cell heterogeneity and cell-cell coupling, and are important for understanding the regulation of coordinated insulin release across the islet. Copyright © 2014 Biophysical Society. Published by Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Johnston, William E.; Gannon, Dennis; Nitzberg, Bill; Feiereisen, William (Technical Monitor)
2000-01-01
The term "Grid" refers to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. The vision for NASN's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks that will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: The scientist / design engineer whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user if the tool designer: The computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. This paper describes the current state of IPG (the operational testbed), the set of capabilities being put into place for the operational prototype IPG, as well as some of the longer term R&D tasks.
Cooperation and heterogeneity of the autistic mind.
Yoshida, Wako; Dziobek, Isabel; Kliemann, Dorit; Heekeren, Hauke R; Friston, Karl J; Dolan, Ray J
2010-06-30
Individuals with autism spectrum conditions (ASCs) have a core difficulty in recursively inferring the intentions of others. The precise cognitive dysfunctions that determine the heterogeneity at the heart of this spectrum, however, remain unclear. Furthermore, it remains possible that impairment in social interaction is not a fundamental deficit but a reflection of deficits in distinct cognitive processes. To better understand heterogeneity within ASCs, we employed a game-theoretic approach to characterize unobservable computational processes implicit in social interactions. Using a social hunting game with autistic adults, we found that a selective difficulty representing the level of strategic sophistication of others, namely inferring others' mindreading strategy, specifically predicts symptom severity. In contrast, a reduced ability in iterative planning was predicted by overall intellectual level. Our findings provide the first quantitative approach that can reveal the underlying computational dysfunctions that generate the autistic "spectrum."
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets
2010-01-01
Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
A framework supporting the development of a Grid portal for analysis based on ROI.
Ichikawa, K; Date, S; Kaishima, T; Shimojo, S
2005-01-01
In our research on brain function analysis, users require two different simultaneous types of processing: interactive processing of a specific part of the data and high-performance batch processing of an entire dataset. The difference between these two types of processing lies in whether or not the analysis is of data in the region of interest (ROI). In this study, we propose a Grid portal that has a mechanism to freely assign computing resources to users on a Grid environment according to these two different processing requirements. We constructed a Grid portal which integrates interactive processing and batch processing through the following two mechanisms. First, a job steering mechanism controls job execution based on user-tagged priority among organizations with heterogeneous computing resources; interactive jobs are processed in preference to batch jobs by this mechanism. Second, a priority-based result delivery mechanism administers a ranking of data significance. The portal ensures the turn-around time of interactive processing through the priority-based job control mechanism, and provides users with quality of service (QoS) for interactive processing. Users can access the analysis results of interactive jobs in preference to those of batch jobs. The Grid portal has also achieved high-performance computation of MEG analysis with batch processing on the Grid environment. The priority-based job control mechanism makes it possible to assign computing resources freely according to users' requirements. Furthermore, the achievement of high-performance computation contributes greatly to the overall progress of brain science. The portal has thus made it possible for users to flexibly bring large computational power to bear on what they want to analyze.
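A toy version of the job steering idea, interactive ROI jobs always dispatched ahead of queued batch jobs, with FIFO order inside each class; job names and the two-level priority are illustrative only:

```python
import heapq
import itertools

INTERACTIVE, BATCH = 0, 1      # lower value = higher dispatch priority
_counter = itertools.count()   # FIFO tie-break within a priority class
_queue = []

def submit(name, kind):
    heapq.heappush(_queue, (kind, next(_counter), name))

def dispatch():
    _, _, name = heapq.heappop(_queue)
    return name

submit("batch-full-dataset", BATCH)
submit("roi-user1", INTERACTIVE)
submit("roi-user2", INTERACTIVE)
print(dispatch(), dispatch(), dispatch())  # roi-user1 roi-user2 batch-full-dataset
```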
A proto-Data Processing Center for LISA
NASA Astrophysics Data System (ADS)
Cavet, Cécile; Petiteau, Antoine; Le Jeune, Maude; Plagnol, Eric; Marin-Martholaz, Etienne; Bayle, Jean-Baptiste
2017-05-01
Preparation for the LISA project requires studying and defining a new data analysis framework, capable of dealing with highly heterogeneous CPU needs and of exploiting emergent information technologies. In this context, a prototype of the mission's Data Processing Center (DPC) has been initiated. The DPC is designed to efficiently manage computing constraints and to offer a common infrastructure where the whole collaboration can contribute to development work. Several tools, such as continuous integration (CI), have already been delivered to the collaboration and are presently used for simulations and performance studies. This article presents the progress made on this collaborative environment and also discusses possible next steps towards an on-demand computing infrastructure. This activity is supported by CNES as part of the French contribution to LISA.
Computer-Assisted Diagnosis of the Sleep Apnea-Hypopnea Syndrome: A Review
Alvarez-Estevez, Diego; Moret-Bonillo, Vicente
2015-01-01
Automatic diagnosis of the Sleep Apnea-Hypopnea Syndrome (SAHS) has become an important area of research due to the growing interest in the field of sleep medicine and the costs associated with its manual diagnosis. The growing number and heterogeneity of techniques, however, make it somewhat difficult to follow recent developments adequately. A literature review within the area of computer-assisted diagnosis of SAHS has been performed comprising the last 15 years of research in the field. Screening approaches, methods for the detection and classification of respiratory events, comprehensive diagnostic systems, and an outline of current commercial approaches are reviewed. An overview of the different methods is presented together with validation analysis and critical discussion of the current state of the art. PMID:26266052
Losh, Molly; Gordon, Peter C.
2014-01-01
Autism Spectrum Disorder (ASD) is characterized by difficulties with social communication and functioning, and ritualistic/repetitive behaviors (American Psychiatric Association, 2013). While substantial heterogeneity exists in symptom expression, impairments in language discourse skills, including narrative, are universally observed (Tager-Flusberg, Paul, & Lord, 2005). This study applied a computational linguistic tool, Latent Semantic Analysis (LSA), to objectively characterize narrative performance in ASD across two narrative contexts differing in interpersonal and cognitive demands. Results indicated that individuals with ASD produced narratives comparable in semantic content to those from controls when narrating from a picture book, but produced narratives diminished in semantic quality in a more demanding narrative recall task. Results are discussed in terms of the utility of LSA as a quantitative, objective, and efficient measure of narrative ability. PMID:24915929
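LSA compares documents in a low-rank latent space built from a term-document matrix; a minimal sketch with scikit-learn and a made-up three-document corpus (the study's narratives are not reproduced here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [  # hypothetical reference narrative followed by two retellings
    "the frog escaped from the jar and the boy searched the forest",
    "a boy looked everywhere in the woods for his missing frog",
    "the dog knocked over a beehive while the boy called for the frog",
]
tfidf = TfidfVectorizer().fit_transform(docs)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
# semantic similarity of each retelling to the reference (row 0)
print(cosine_similarity(lsa[0:1], lsa[1:]))
```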
NASA Astrophysics Data System (ADS)
Zhu, J.; Winter, C. L.; Wang, Z.
2015-11-01
Computational experiments are performed to evaluate the effects of locally heterogeneous conductivity fields on regional exchanges of water between stream and aquifer systems in the Middle Heihe River basin (MHRB) of northwestern China. The effects are found to be nonlinear in the sense that simulated discharges from aquifers to streams are systematically lower than discharges produced by a base model parameterized with relatively coarse effective conductivity. A similar, but weaker, effect is observed for stream leakage. The study is organized around three hypotheses: (H1) small-scale spatial variations of conductivity significantly affect regional exchanges of water between streams and aquifers in river basins, (H2) aggregating small-scale heterogeneities into regional effective parameters systematically biases estimates of stream-aquifer exchanges, and (H3) the biases result from slow paths in groundwater flow that emerge due to small-scale heterogeneities. The hypotheses are evaluated by comparing stream-aquifer fluxes produced by the base model to fluxes simulated using realizations of the MHRB characterized by local (grid-scale) heterogeneity. Levels of local heterogeneity are manipulated as control variables by adjusting coefficients of variation. All models are implemented using the MODFLOW (Modular Three-dimensional Finite-difference Groundwater Flow Model) simulation environment, and the PEST (parameter estimation) tool is used to calibrate effective conductivities defined over 16 zones within the MHRB. The effective parameters are also used as expected values to develop lognormally distributed conductivity (K) fields on local grid scales. Stream-aquifer exchanges are simulated with K fields at both scales and then compared. Results show that the effects of small-scale heterogeneities significantly influence exchanges with simulations based on local-scale heterogeneities always producing discharges that are less than those produced by the base model. Although aquifer heterogeneities are uncorrelated at local scales, they appear to induce coherent slow paths in groundwater fluxes that in turn reduce aquifer-stream exchanges. Since surface water-groundwater exchanges are critical hydrologic processes in basin-scale water budgets, these results also have implications for water resources management.
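The grid-scale fields described above are lognormal, with the zone's calibrated effective conductivity as the expected value and the coefficient of variation as the control variable. A minimal sketch of generating one uncorrelated field, using the standard lognormal moment relations to set the parameters of ln K:

```python
import numpy as np

def lognormal_k_field(k_eff, cv, shape, seed=0):
    """Cell-by-cell conductivity field with E[K] = k_eff and the given CV."""
    sigma2 = np.log(1.0 + cv**2)          # Var[ln K] from the target CV
    mu = np.log(k_eff) - 0.5 * sigma2     # ensures E[K] = exp(mu + sigma2/2) = k_eff
    rng = np.random.default_rng(seed)
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=shape)

K = lognormal_k_field(k_eff=5.0, cv=1.0, shape=(100, 100))
print(K.mean(), K.std() / K.mean())       # approximately 5.0 and 1.0
```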
Accelerating a three-dimensional eco-hydrological cellular automaton on GPGPU with OpenCL
NASA Astrophysics Data System (ADS)
Senatore, Alfonso; D'Ambrosio, Donato; De Rango, Alessio; Rongo, Rocco; Spataro, William; Straface, Salvatore; Mendicino, Giuseppe
2016-10-01
This work presents an effective implementation of a numerical model for complete eco-hydrological Cellular Automata modeling on Graphical Processing Units (GPU) with OpenCL (Open Computing Language) for heterogeneous computation (i.e., on CPUs and/or GPUs). Different types of parallel implementation were carried out (e.g., use of fast local memory, loop unrolling, etc.), showing increasing performance improvements in terms of speedup, and some original optimization strategies were also adopted. Moreover, numerical analysis of the results (i.e., comparison of CPU and GPU outcomes in terms of rounding errors) has proven satisfactory. Experiments were carried out on a workstation with two CPUs (Intel Xeon E5440 at 2.83 GHz), one AMD R9 280X GPU and one nVIDIA Tesla K20c GPU. Results have been extremely positive, but further testing should be performed to assess the functionality of the adopted strategies on other complete models and their ability to fruitfully exploit the resources of parallel systems.
Modelling brain emergent behaviours through coevolution of neural agents.
Maniadakis, Michail; Trahanias, Panos
2006-06-01
Recently, many research efforts have focused on modelling partial brain areas with the long-term goal of supporting the cognitive abilities of artificial organisms. Existing models usually suffer from heterogeneity, which makes their integration very difficult. The present work introduces a computational framework to address brain modelling tasks, emphasizing the integrative performance of substructures. Moreover, implemented models are embedded in a robotic platform to support its behavioural capabilities. We follow an agent-based approach in the design of substructures to support the autonomy of partial brain structures. Agents are formulated to allow the emergence of a desired behaviour after a certain amount of interaction with the environment. An appropriate collaborative coevolutionary algorithm, able to emphasize both the speciality of brain areas and their cooperative performance, is employed to support design specification of agent structures. The effectiveness of the proposed approach is illustrated through the implementation of computational models for the motor cortex and hippocampus, which are successfully tested on a simulated mobile robot.
Chaisangmongkon, Warasinee; Swaminathan, Sruthi K.; Freedman, David J.; Wang, Xiao-Jing
2017-01-01
Decision making involves dynamic interplay between internal judgements and external perception, which has been investigated in delayed match-to-category (DMC) experiments. Our analysis of neural recordings shows that, during DMC tasks, LIP and PFC neurons demonstrate mixed, time-varying, and heterogeneous selectivity, but previous theoretical work has not established the link between these neural characteristics and population-level computations. We trained a recurrent network model to perform DMC tasks and found that the model reproduces key features of neuronal selectivity at the single-neuron and population levels remarkably well. Analysis of the trained networks reveals that robust transient trajectories of the neural population are the key driver of sequential categorical decisions. The directions of the trajectories are governed by the network's self-organized connectivity, defining a 'neural landscape' consisting of a task-tailored arrangement of slow states and dynamical tunnels. With this model, we can identify functionally relevant circuit motifs and generalize the framework to solve other categorization tasks. PMID:28334612
Sihong Chen; Jing Qin; Xing Ji; Baiying Lei; Tianfu Wang; Dong Ni; Jie-Zhi Cheng
2017-03-01
The gap between computational and semantic features is one of the major factors keeping computer-aided diagnosis (CAD) performance from clinical usage. To bridge this gap, we exploit three multi-task learning (MTL) schemes to leverage heterogeneous computational features derived from deep learning models of stacked denoising autoencoder (SDAE) and convolutional neural network (CNN), as well as hand-crafted Haar-like and HoG features, for the description of 9 semantic features for lung nodules in CT images. We posit that relations may exist among semantic features such as "spiculation", "texture", and "margin" that can be explored with MTL. The Lung Image Database Consortium (LIDC) data is adopted in this study for its rich annotation resources. The LIDC nodules were quantitatively scored w.r.t. the 9 semantic features by 12 radiologists from several institutes in the U.S. By treating each semantic feature as an individual task, the MTL schemes select and map the heterogeneous computational features toward the radiologists' ratings, with cross-validation evaluation schemes on 2400 randomly selected nodules from the LIDC dataset. The experimental results suggest that the predicted semantic scores from the three MTL schemes are closer to the radiologists' ratings than the scores from single-task LASSO and elastic net regression methods. The proposed semantic attribute scoring scheme may provide richer quantitative assessments of nodules to better support diagnostic decisions and management. Meanwhile, our method's capability to automatically associate medical image content with clinical semantic terms may also assist the development of medical search engines.
NASA Astrophysics Data System (ADS)
Barreiro, F. H.; Borodin, M.; De, K.; Golubkov, D.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Padolski, S.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The second generation of the ATLAS Production System, called ProdSys2, is a distributed workload manager that runs hundreds of thousands of jobs daily, from dozens of different ATLAS-specific workflows, across more than a hundred heterogeneous sites. It achieves high utilization by combining dynamic job definition based on many criteria, such as input and output size, memory requirements and CPU consumption, with manageable scheduling policies, and by supporting different kinds of computational resources, such as GRID, clouds, supercomputers and volunteer computers. The system dynamically assigns a group of jobs (a task) to a group of geographically distributed computing resources. Dynamic assignment and resource utilization is one of the major features of the system; it did not exist in the earliest versions of the production system, where the Grid resource topology was predefined using national and/or geographical patterns. The Production System has a sophisticated job fault-recovery mechanism, which allows multi-terabyte tasks to run efficiently without human intervention. We have implemented a "train" model and open-ended production, which allow tasks to be submitted automatically as soon as a new set of data is available and allow physics groups' data processing and analysis to be chained with central production by the experiment. We present an overview of the ATLAS Production System and the features and architecture of its major components: task definition, the web user interface and monitoring. We describe the important design decisions and lessons learned from operational experience during the first year of LHC Run 2. We also report the performance of the designed system and how various workflows, such as data (re)processing, Monte Carlo and physics group production, and user analysis, are scheduled and executed within one production system on heterogeneous computing resources.
Wen, Yanhua; Wei, Yanjun; Zhang, Shumei; Li, Song; Liu, Hongbo; Wang, Fang; Zhao, Yue; Zhang, Dongwei; Zhang, Yan
2017-05-01
Tumour heterogeneity describes the coexistence of divergent tumour cell clones within tumours, which is often caused by underlying epigenetic changes. DNA methylation is commonly regarded as a significant regulator that differs across cells and tissues. In this study, we comprehensively reviewed research progress on estimating tumour heterogeneity. Bioinformatics-based analysis of DNA methylation has revealed the evolutionary relationships between breast cancer cell lines and tissues. Further analysis of the DNA methylation profiles in 33 breast cancer-related cell lines identified cell line-specific methylation patterns. Next, we reviewed computational methods for inferring the clonal evolution of tumours from different perspectives and then proposed a deconvolution strategy for modelling the dynamics of cell subclonal populations in breast cancer tissues based on DNA methylation. Further analysis of simulated cancer tissues and real cell lines revealed that this approach exhibits satisfactory performance and relative stability in estimating the composition and proportions of cellular subpopulations. The application of this strategy to breast cancer cases from The Cancer Genome Atlas identified different cellular subpopulations with distinct molecular phenotypes. Moreover, the current and potential future applications of this deconvolution strategy to clinical breast cancer research are discussed, with emphasis placed on the DNA methylation-based recognition of intra-tumour heterogeneity. The wide use of these methods for estimating heterogeneity in further clinical cohorts will improve our understanding of neoplastic progression and the design of therapeutic interventions for treating breast cancer and other malignancies.
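Deconvolutions of this kind are often posed as non-negative least squares: given reference methylation profiles for candidate subpopulations, solve for non-negative mixing proportions. The sketch below shows the basic mechanics with synthetic profiles; it is a simplified stand-in, not the exact strategy proposed in the paper.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
S = rng.random((1000, 4))            # reference profiles: CpGs x subpopulations
w_true = np.array([0.5, 0.3, 0.15, 0.05])
m = S @ w_true                       # bulk methylation profile of a sample

w, _ = nnls(S, m)                    # non-negative mixing weights
w /= w.sum()                         # normalise to proportions
print(w)                             # recovers ~[0.5, 0.3, 0.15, 0.05]
```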
Treeby, Bradley E; Jaros, Jiri; Rendell, Alistair P; Cox, B T
2012-06-01
The simulation of nonlinear ultrasound propagation through tissue-realistic media has a wide range of practical applications. However, this is a computationally difficult problem due to the large size of the computational domain compared to the acoustic wavelength. Here, the k-space pseudospectral method is used to reduce the number of grid points required per wavelength for accurate simulations. The model is based on coupled first-order acoustic equations valid for nonlinear wave propagation in heterogeneous media with power law absorption. These are derived from the equations of fluid mechanics and include a pressure-density relation that incorporates the effects of nonlinearity, power law absorption, and medium heterogeneities. The additional terms accounting for convective nonlinearity and power law absorption are expressed as spatial gradients, making them efficient to encode numerically. The governing equations are then discretized using a k-space pseudospectral technique in which the spatial gradients are computed using the Fourier-collocation method. This increases the accuracy of the gradient calculation and thus relaxes the requirement for dense computational grids compared to conventional finite difference methods. The accuracy and utility of the developed model are demonstrated via several numerical experiments, including the 3D simulation of the beam pattern from a clinical ultrasound probe.
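The Fourier-collocation gradient at the heart of the scheme is compact to write down. This sketch differentiates a 1-D field spectrally and checks it against the analytic derivative; it shows only the gradient step, not the full coupled k-space solver.

```python
import numpy as np

def spectral_gradient(p, dx):
    """d/dx via the Fourier-collocation method: exact for band-limited
    fields, which is what lets k-space schemes use coarse grids."""
    k = 2.0 * np.pi * np.fft.fftfreq(p.size, d=dx)    # angular wavenumbers
    return np.real(np.fft.ifft(1j * k * np.fft.fft(p)))

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
p = np.sin(3 * x)
err = np.max(np.abs(spectral_gradient(p, x[1] - x[0]) - 3 * np.cos(3 * x)))
print(err)   # ~1e-13, versus ~1e-3 for a 2nd-order finite difference
```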
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Gavin Matthew; Bettencourt, Matthew Tyler; Bova, Steven W.
2015-09-01
This report provides in-depth information and analysis to help create a technical road map for developing next-generation programming models and runtime systems that support Advanced Simulation and Computing (ASC) workload requirements. The focus herein is on asynchronous many-task (AMT) programming models and runtime systems, which are of great interest in the context of exascale computing, as they hold the promise to address key issues associated with future extreme-scale computer architectures. This report includes a thorough qualitative and quantitative examination of three best-of-class AMT runtime systems: Charm++, Legion, and Uintah, all of which are in use as part of the Predictive Science Academic Alliance Program II (PSAAP-II) Centers. The studies focus on each runtime's programmability, performance, and mutability. Through the experiments and analysis presented, several overarching findings emerge. From a performance perspective, AMT runtimes show tremendous potential for addressing extreme-scale challenges. Empirical studies show that an AMT runtime can mitigate performance heterogeneity inherent to the machine itself, and that Message Passing Interface (MPI) and AMT runtimes perform comparably under balanced conditions. From a programmability and mutability perspective, however, none of the runtimes in this study are currently ready for use in developing production-ready Sandia ASC applications. The report concludes by recommending a co-design path forward, wherein application, programming model, and runtime system developers work together to define requirements and solutions. Such a requirements-driven co-design approach benefits the community as a whole, with widespread community engagement mitigating risk for both application developers and high-performance computing runtime system developers.
Use of Graph Database for the Integration of Heterogeneous Biological Data.
Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young
2017-03-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
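For a flavour of the querying style being compared, the sketch below runs a multi-hop relationship query through the official Neo4j Python driver. The connection details, node labels, and property names are hypothetical, since the abstract does not publish the schema.

```python
from neo4j import GraphDatabase

# hypothetical connection and schema, for illustration only
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

query = """
MATCH (d:Drug)-[:TARGETS]->(g:Gene)-[:ASSOCIATED_WITH]->(dis:Disease)
WHERE dis.name = $disease
RETURN d.name, g.symbol
"""

with driver.session() as session:
    # a single pattern match replaces what would be two SQL joins
    for record in session.run(query, disease="breast carcinoma"):
        print(record["d.name"], record["g.symbol"])
driver.close()
```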
Scuba: scalable kernel-based gene prioritization.
Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio
2018-01-25
The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
NASA Astrophysics Data System (ADS)
Pfeil, Thomas; Jordan, Jakob; Tetzlaff, Tom; Grübl, Andreas; Schemmel, Johannes; Diesmann, Markus; Meier, Karlheinz
2016-04-01
High-level brain function, such as memory, classification, or reasoning, can be realized by means of recurrent networks of simplified model neurons. Analog neuromorphic hardware constitutes a fast and energy-efficient substrate for the implementation of such neural computing architectures in technical applications and neuroscientific research. The functional performance of neural networks is often critically dependent on the level of correlations in the neural activity. In finite networks, correlations are typically inevitable due to shared presynaptic input. Recent theoretical studies have shown that inhibitory feedback, abundant in biological neural networks, can actively suppress these shared-input correlations and thereby enable neurons to fire nearly independently. For networks of spiking neurons, the decorrelating effect of inhibitory feedback has so far been explicitly demonstrated only for homogeneous networks of neurons with linear subthreshold dynamics. Theory, however, suggests that the effect is a general phenomenon, present in any system with sufficient inhibitory feedback, irrespective of the details of the network structure or the neuronal and synaptic properties. Here, we investigate the effect of network heterogeneity on correlations in sparse, random networks of inhibitory neurons with nonlinear, conductance-based synapses. Emulations of these networks on the analog neuromorphic-hardware system Spikey allow us to test the efficiency of decorrelation by inhibitory feedback in the presence of hardware-specific heterogeneities. The configurability of the hardware substrate enables us to modulate the extent of heterogeneity in a systematic manner. We selectively study the effects of shared input and recurrent connections on correlations in membrane potentials and spike trains. Our results confirm that shared-input correlations are actively suppressed by inhibitory feedback also in highly heterogeneous networks exhibiting broad, heavy-tailed firing-rate distributions. In line with former studies, cell heterogeneities reduce shared-input correlations. Overall, however, correlations in the recurrent system can increase with the level of heterogeneity as a consequence of diminished effective negative feedback.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, Lirong; Oostrom, Martinus; Wietsma, Thomas W.
2008-07-29
Heterogeneity is often encountered in subsurface contamination characterization and remediation. Low-permeability zones are typically bypassed when remedial fluids are injected into subsurface heterogeneous aquifer systems. Therefore, contaminants in the bypassed areas may not be contacted by the amendments in the remedial fluid, which may significantly prolong remediation operations. Laboratory experiments and numerical studies have been conducted to develop the Mobility-Controlled Flood (MCF) technology for subsurface remediation and to demonstrate the capability of this technology to enhance the delivery of remedial amendments to the lower-permeability zones in heterogeneous systems. Xanthan gum, a bio-polymer, was used to modify the viscosity of the amendment-containing remedial solutions. Sodium mono-phosphate and surfactant were the remedial amendments used in this work. The enhanced delivery of the amendments was demonstrated in two-dimensional (2-D) flow cell experiments packed with heterogeneous systems. The impact of polymer concentration, fluid injection rate, and permeability contrast in the heterogeneous systems has been studied. The Subsurface Transport over Multiple Phases (STOMP) simulator was modified to include polymer-induced shear-thinning effects. Shear rates of polymer solutions were computed from pore-water velocities using a relationship proposed in the literature. Viscosity data were subsequently obtained from empirical viscosity-shear rate relationships derived from laboratory data. The experimental and simulation results clearly show that the MCF technology is capable of enhancing the delivery of remedial amendments to subsurface lower-permeability zones. The enhanced delivery significantly improved the NAPL removal from these zones, and the sweeping efficiency in a heterogeneous system was remarkably increased when a polymer fluid was applied. MCF technology is also able to stabilize the fluid displacing front when there is a density difference between the fluids. The modified STOMP simulator was able to predict the experimentally observed fluid displacing behavior. The simulator may be used to predict subsurface remediation performance when a shear-thinning fluid is used to remediate a heterogeneous system.
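The shear-thinning closure can be sketched compactly. Below, apparent viscosity follows a power-law model evaluated at a shear rate estimated from pore-water velocity; the shear-rate relation, exponents, and coefficients are illustrative placeholders for the empirical fits derived from the study's laboratory data.

```python
import numpy as np

def apparent_viscosity(v, k_perm, phi, K_cons=0.05, n=0.4,
                       mu_water=1.0e-3, alpha=4.0):
    """Power-law (shear-thinning) apparent viscosity [Pa s] from pore-water
    velocity v [m/s], permeability k_perm [m^2] and porosity phi.
    All coefficients here are illustrative, not the study's fitted values."""
    shear = alpha * v / np.sqrt(k_perm * phi)   # equivalent pore shear rate
    mu = K_cons * shear**(n - 1.0)              # thins as shear increases
    return np.maximum(mu, mu_water)             # floor at solvent viscosity

v = np.array([1.0e-6, 1.0e-5, 1.0e-4])          # slow vs fast flow paths
print(apparent_viscosity(v, k_perm=1.0e-11, phi=0.35))
```

With n < 1 the viscosity drops where shear is high and stays elevated in slow zones, which is the qualitative behaviour the modified STOMP simulator needed to capture.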
Tixier, Florent; Hatt, Mathieu; Le Rest, Catherine Cheze; Le Pogam, Adrien; Corcos, Laurent; Visvikis, Dimitris
2012-05-01
(18)F-FDG PET measurement of standardized uptake value (SUV) is increasingly used for monitoring therapy response and predicting outcome. Alternative parameters computed through textural analysis were recently proposed to quantify the heterogeneity of tracer uptake by tumors as a significant predictor of response. The primary objective of this study was to evaluate the reproducibility of these heterogeneity measurements. Double baseline (18)F-FDG PET scans were acquired within 4 d of each other for 16 patients before any treatment was considered. A Bland-Altman analysis was performed on 8 parameters based on histogram measurements and 17 parameters based on textural heterogeneity features after discretization with values between 8 and 128. The reproducibility of maximum and mean SUV was similar to that in previously reported studies, with a mean percentage difference of 4.7% ± 19.5% and 5.5% ± 21.2%, respectively. By comparison, better reproducibility was measured for some textural features describing local heterogeneity of tracer uptake, such as entropy and homogeneity, with a mean percentage difference of -2% ± 5.4% and 1.8% ± 11.5%, respectively. Several regional heterogeneity parameters such as variability in the intensity and size of regions of homogeneous activity distribution had reproducibility similar to that of SUV measurements, with 95% confidence intervals of -22.5% to 3.1% and -1.1% to 23.5%, respectively. These parameters were largely insensitive to the discretization range. Several parameters derived from textural analysis describing heterogeneity of tracer uptake by tumors on local and regional scales had reproducibility similar to or better than that of simple SUV measurements. These reproducibility results suggest that these (18)F-FDG PET-derived parameters, which have already been shown to have predictive and prognostic value in certain cancer models, may be used to monitor therapy response and predict patient outcome.
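For readers unfamiliar with the reproducibility metric used here, this sketch computes the Bland-Altman mean percentage difference and 95% limits of agreement for double-baseline measurements. The SUV values are invented for illustration.

```python
import numpy as np

def bland_altman_pct(test, retest):
    """Mean percentage difference and 95% limits of agreement between
    paired test-retest measurements."""
    diff_pct = 100.0 * (retest - test) / ((test + retest) / 2.0)
    mean, sd = diff_pct.mean(), diff_pct.std(ddof=1)
    return mean, (mean - 1.96 * sd, mean + 1.96 * sd)

suv_scan1 = np.array([5.1, 7.3, 4.2, 9.8, 6.0])   # illustrative SUVs only
suv_scan2 = np.array([5.6, 6.9, 4.6, 10.4, 5.7])
print(bland_altman_pct(suv_scan1, suv_scan2))
```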
NASA Technical Reports Server (NTRS)
Townsend, James C.; Weston, Robert P.; Eidson, Thomas M.
1993-01-01
The Framework for Interdisciplinary Design Optimization (FIDO) is a general programming environment for automating the distribution of complex computing tasks over a networked system of heterogeneous computers. For example, instead of manually passing a complex design problem between its diverse specialty disciplines, the FIDO system provides for automatic interactions between the discipline tasks and facilitates their communications. The FIDO system networks all the computers involved into a distributed heterogeneous computing system, so they have access to centralized data and can work on their parts of the total computation simultaneously in parallel whenever possible. Thus, each computational task can be done by the most appropriate computer. Results can be viewed as they are produced and variables changed manually for steering the process. The software is modular in order to ease migration to new problems: different codes can be substituted for each of the current code modules with little or no effect on the others. The potential for commercial use of FIDO rests in the capability it provides for automatically coordinating diverse computations on a networked system of workstations and computers. For example, FIDO could provide the coordination required for the design of vehicles or electronics or for modeling complex systems.
Abreu, Rui Mv; Froufe, Hugo Jc; Queiroz, Maria João Rp; Ferreira, Isabel Cfr
2010-10-28
Virtual screening of small molecules using molecular docking has become an important tool in drug discovery. However, large-scale virtual screening is time demanding and usually requires dedicated computer clusters. There are a number of software tools that perform virtual screening using AutoDock4, but they require access to dedicated Linux computer clusters. Also, no software is available for performing virtual screening with Vina using computer clusters. In this paper we present MOLA, an easy-to-use graphical user interface tool that automates parallel virtual screening using AutoDock4 and/or Vina in bootable non-dedicated computer clusters. MOLA automates several tasks including: ligand preparation, distribution of parallel AutoDock4/Vina jobs and result analysis. When the virtual screening project finishes, an OpenOffice spreadsheet file opens with the ligands ranked by binding energy and distance to the active site. All result files can automatically be recorded on a USB flash drive or on the hard-disk drive using VirtualBox. MOLA works inside a customized Live CD GNU/Linux operating system, developed by us, that bypasses the original operating system installed on the computers used in the cluster. This operating system boots from a CD on the master node and then clusters other computers as slave nodes via ethernet connections. MOLA is an ideal virtual screening tool for non-experienced users with a limited number of multi-platform heterogeneous computers available and no access to dedicated Linux computer clusters. When a virtual screening project finishes, the computers can simply be restarted to their original operating system. The originality of MOLA lies in the fact that any available computer, regardless of platform, can be added to the cluster, without ever using the computer's hard-disk drive and without interfering with the installed operating system. With a cluster of 10 processors, and a potential maximum speed-up of 10×, the parallel algorithm of MOLA performed with a speed-up of 8.64× using AutoDock4 and 8.60× using Vina.
Quantum transport modelling of silicon nanobeams using heterogeneous computing scheme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harb, M., E-mail: harbm@physics.mcgill.ca; Michaud-Rioux, V., E-mail: vincentm@physics.mcgill.ca; Guo, H., E-mail: guo@physics.mcgill.ca
We report the development of a powerful method for quantum transport calculations of nanowire/nanobeam structures with large cross-sectional area. Our approach to quantum transport is based on Green's functions and tight-binding potentials. A linear algebraic formulation allows us to harness the massively parallel nature of Graphics Processing Units (GPUs), and our implementation is based on a heterogeneous parallel computing scheme with traditional processors and GPUs working together. Using our software tool, the electronic and quantum transport properties of silicon nanobeams with a realistic cross-sectional area of ∼22.7 nm² and a length of ∼81.5 nm—comprising 105,000 Si atoms and 24,000 passivating H atoms in the scattering region—are investigated. The method also allows us to perform significant averaging over impurity configurations—all possible configurations were considered in the case of single impurities. The effect of the position and number of vacancy defects on the transport properties was also considered. It is found that configurations with vacancies lying closer to the local density of states (LDOS) maxima have lower transmission functions than configurations with vacancies located at LDOS minima or far away from LDOS maxima, suggesting both a qualitative method to tune or estimate optimal impurity configurations and a physical picture that accounts for device variability. Finally, we provide performance benchmarks for structures as large as ∼42.5 nm² cross section and ∼81.5 nm length.
ERIC Educational Resources Information Center
DeVillar, Robert A.; Faltis, Christian J.
This book offers an alternative conceptual framework for effectively incorporating computer use within the heterogeneous classroom. The framework integrates Vygotskian social-learning theory with Allport's contact theory and the principles of cooperative learning. In Part 1 an essential element is identified for each of these areas. These are, in…
Best, Katharine; Oakes, Theres; Heather, James M.; Shawe-Taylor, John; Chain, Benny
2015-01-01
The polymerase chain reaction (PCR) is one of the most widely used techniques in molecular biology. In combination with High Throughput Sequencing (HTS), PCR is widely used to quantify transcript abundance for RNA-seq, and in the context of analysis of T and B cell receptor repertoires. In this study, we combine DNA barcoding with HTS to quantify PCR output from individual target molecules. We develop computational tools that simulate both the PCR branching process itself, and the subsequent subsampling which typically occurs during HTS sequencing. We explore the influence of different types of heterogeneity on sequencing output, and compare them to experimental results where the efficiency of amplification is measured by barcodes uniquely identifying each molecule of starting template. Our results demonstrate that the PCR process introduces substantial amplification heterogeneity, independent of primer sequence and bulk experimental conditions. This heterogeneity can be attributed both to inherited differences between different template DNA molecules, and the inherent stochasticity of the PCR process. The results demonstrate that PCR heterogeneity arises even when reaction and substrate conditions are kept as constant as possible, and therefore single molecule barcoding is essential in order to derive reproducible quantitative results from any protocol combining PCR with HTS. PMID:26459131
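The two simulated stages described above, branching amplification followed by read subsampling, fit in a few lines. This toy rendition uses an invented efficiency range and cycle count; it reproduces the qualitative effect (wide copy-number spread from identical starting conditions), not the study's calibrated model.

```python
import random

rng = random.Random(0)

def pcr_branching(efficiencies, cycles):
    """Branching process: in each cycle, every existing copy of template i
    duplicates with probability efficiencies[i]."""
    counts = [1] * len(efficiencies)
    for _ in range(cycles):
        counts = [c + sum(rng.random() < e for _ in range(c))
                  for c, e in zip(counts, efficiencies)]
    return counts

def subsample_reads(counts, depth):
    """Mimic HTS subsampling: draw reads proportional to amplified copies."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    return [rng.choice(pool) for _ in range(depth)]

effs = [rng.uniform(0.7, 0.95) for _ in range(20)]  # inherited differences
counts = pcr_branching(effs, cycles=15)
reads = subsample_reads(counts, depth=10_000)
print(min(counts), max(counts))   # wide spread despite shared conditions
```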
Axelrod, David E; Vedula, Sudeepti; Obaniyi, James
2017-05-01
The effectiveness of cancer chemotherapy is limited by intra-tumor heterogeneity, the emergence of spontaneous and induced drug-resistant mutant subclones, and the maximum dose to which normal tissues can be exposed without adverse side effects. The goal of this project was to determine if intermittent schedules of the maximum dose that allows colon crypt maintenance could overcome these limitations, specifically by eliminating mixtures of drug-resistant mutants from heterogeneous early colon adenomas while maintaining colon crypt function. A computer model of cell dynamics in human colon crypts was calibrated with measurements of human biopsy specimens. The model allowed simulation of continuous and intermittent dose schedules of a cytotoxic chemotherapeutic drug, as well as the drug's effect on the elimination of mutant cells and the maintenance of crypt function. Colon crypts can tolerate a tenfold greater intermittent dose than constant dose. This allows elimination of a mixture of relatively drug-sensitive and drug-resistant mutant subclones from heterogeneous colon crypts. Mutants can be eliminated whether they arise spontaneously or are induced by the cytotoxic drug. An intermittent dose, at the maximum that allows colon crypt maintenance, can be effective in eliminating a heterogeneous mixture of mutant subclones before they fill the crypt and form an adenoma.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haack, Jeffrey; Shohet, Gil
2016-12-02
The software implements a heterogeneous multiscale method (HMM), which involves solving a classical molecular dynamics (MD) problem and then computing the entropy production in order to obtain the relaxation times towards equilibrium for use in a Bhatnagar-Gross-Krook (BGK) solver.
Measuring the effects of heterogeneity on distributed systems
NASA Technical Reports Server (NTRS)
El-Toweissy, Mohamed; Zeineldine, Osman; Mukkamala, Ravi
1991-01-01
Distributed computer systems in daily use are becoming more and more heterogeneous. Currently, much of the design and analysis of such systems assumes homogeneity. This assumption of homogeneity has been driven mainly by the resulting simplicity in modeling and analysis. A simulation study is presented which investigated the effects of heterogeneity on scheduling algorithms for hard real-time distributed systems. In contrast to previous results, which indicate that random scheduling may be as good as a more complex scheduler, the studied algorithm is shown to be consistently better than a random scheduler. This conclusion is more pronounced at high workloads as well as at high levels of heterogeneity.
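The gap between a random and a load-aware scheduler is easy to reproduce in miniature. The sketch below is a toy model, not the paper's simulator: tasks go to nodes of differing speeds, and the makespans of the two policies are compared.

```python
import random

def simulate(n_tasks, speeds, pick):
    """Makespan when each task is sent to the node chosen by `pick`.
    Heterogeneity is the spread of `speeds` (work units per second)."""
    loads = [0.0] * len(speeds)
    for _ in range(n_tasks):
        work = random.expovariate(1.0)          # random task size
        i = pick(loads, speeds)
        loads[i] += work / speeds[i]
    return max(loads)

speeds = [1.0, 1.0, 4.0, 8.0]                   # heterogeneous node pool
rand_pick = lambda loads, s: random.randrange(len(s))
greedy_pick = lambda loads, s: min(range(len(s)),
                                   key=lambda i: loads[i] + 1.0 / s[i])
random.seed(0)
print(simulate(2000, speeds, rand_pick))    # random policy: larger makespan
print(simulate(2000, speeds, greedy_pick))  # load/speed-aware: smaller makespan
```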
A Joint Method of Envelope Inversion Combined with Hybrid-domain Full Waveform Inversion
NASA Astrophysics Data System (ADS)
CUI, C.; Hou, W.
2017-12-01
Full waveform inversion (FWI) aims to construct high-precision subsurface models by fully using the information in seismic records, including amplitude, travel time, phase and so on. However, high non-linearity and the absence of low-frequency information in seismic data lead to the well-known cycle-skipping problem and make the inversion easily fall into local minima. In addition, 3D inversion methods that are based on the acoustic approximation ignore the elastic effects of the real seismic field and make the inversion harder. As a result, the accuracy of the final inversion results relies strongly on the quality of the initial model. To improve the stability and quality of inversion results, multi-scale inversion, which reconstructs the subsurface model from low to high frequencies, is applied. However, the absence of very low frequencies (< 3 Hz) in field data is still a bottleneck in FWI. By extracting ultra-low-frequency data from field data, envelope inversion is able to recover a low-wavenumber model with a demodulation operator (the envelope operator), even though such low-frequency data do not really exist in the field data. To improve the efficiency and viability of the inversion, in this study we propose a joint method of envelope inversion combined with hybrid-domain FWI. First, we developed 3D elastic envelope inversion, and the misfit function and the corresponding gradient operator were derived. Then we performed hybrid-domain FWI using the envelope inversion result as the initial model, which provides the low-wavenumber component of the model. Here, forward modeling is implemented in the time domain and inversion in the frequency domain. To accelerate the inversion, we adopt CPU/GPU heterogeneous computing techniques with two levels of parallelism. At the first level, the inversion tasks are decomposed and assigned to each computation node by shot number. At the second level, GPU multithreaded programming is used for the computation tasks in each node, including forward modeling, envelope extraction, DFT (discrete Fourier transform) calculation and gradient calculation. Numerical tests demonstrated that the combined envelope inversion + hybrid-domain FWI obtains a much more faithful and accurate result than conventional hybrid-domain FWI, and that CPU/GPU heterogeneous parallel computation substantially improves performance.
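The demodulation step itself is standard signal processing: the envelope of a trace is the magnitude of its analytic signal. The sketch below shows why it helps, using a synthetic 30 Hz wavelet whose envelope varies slowly even though the trace has no energy near 0 Hz.

```python
import numpy as np
from scipy.signal import hilbert

def envelope(trace):
    """Instantaneous amplitude (envelope) of a seismic trace."""
    return np.abs(hilbert(trace))

t = np.linspace(0.0, 1.0, 1000)
trace = np.sin(2 * np.pi * 30 * t) * np.exp(-((t - 0.5) ** 2) / 0.01)
env = envelope(trace)   # smooth, low-frequency-rich demodulated signal
```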
Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce
Pratx, Guillem; Xing, Lei
2011-01-01
Monte Carlo simulation is considered the most reliable method for modeling photon migration in heterogeneous media. However, its widespread use is hindered by the high computational cost. The purpose of this work is to report on our implementation of a simple MapReduce method for performing fault-tolerant Monte Carlo computations in a massively parallel cloud computing environment. We ported the MC321 Monte Carlo package to Hadoop, an open-source MapReduce framework. In this implementation, Map tasks compute photon histories in parallel while a Reduce task scores photon absorption. The distributed implementation was evaluated on a commercial compute cloud. The simulation time was found to be linearly dependent on the number of photons and inversely proportional to the number of nodes. For a cluster size of 240 nodes, the simulation of 100 billion photon histories took 22 min, a 1258× speed-up compared to the single-threaded Monte Carlo program. The overall computational throughput was 85,178 photon histories per node per second, with a latency of 100 s. The distributed simulation produced the same output as the original implementation and was resilient to hardware failure: the correctness of the simulation was unaffected by the shutdown of 50% of the nodes. PMID:22191916
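The map/reduce split described above can be mimicked locally with a process pool: map tasks simulate independent photon batches from distinct seeds, and a reduce step merges their tallies. The absorption model below is a toy single-interaction probability, not the MC321 physics.

```python
import random
from multiprocessing import Pool

def map_task(args):
    """Map: simulate one batch of photon histories, tally absorptions."""
    seed, n_photons, p_absorb = args
    rng = random.Random(seed)
    return sum(rng.random() < p_absorb for _ in range(n_photons))

def reduce_task(partials):
    """Reduce: merge the partial absorption tallies from all map tasks."""
    return sum(partials)

if __name__ == "__main__":
    jobs = [(seed, 100_000, 0.3) for seed in range(8)]  # toy parameters
    with Pool() as pool:
        total = reduce_task(pool.map(map_task, jobs))
    print(total)
```

Fault tolerance in the real system comes from Hadoop re-executing failed map tasks, which the deterministic per-task seeding above would also permit.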
NASA Astrophysics Data System (ADS)
Puckett, Elbridge Gerry; Turcotte, Donald L.; He, Ying; Lokavarapu, Harsha; Robey, Jonathan M.; Kellogg, Louise H.
2018-03-01
Geochemical observations of mantle-derived rocks favor a nearly homogeneous upper mantle, the source of mid-ocean ridge basalts (MORB), and heterogeneous lower mantle regions. Plumes that generate ocean island basalts are thought to sample the lower mantle regions and exhibit more heterogeneity than MORB. These regions have been associated with lower mantle structures known as large low shear velocity provinces (LLSVPs) below Africa and the South Pacific. The isolation of these regions is attributed to compositional differences and density stratification that, consequently, have been the subject of computational and laboratory modeling designed to determine the parameter regime in which layering is stable and to understand how layering evolves. Mathematical models of persistent compositional interfaces in the Earth's mantle may be inherently unstable, at least in some regions of the parameter space relevant to the mantle. Computing approximations to solutions of such problems presents severe challenges, even to state-of-the-art numerical methods. Some numerical algorithms for modeling the interface between distinct compositions smear the interface at the boundary between compositions, such as methods that add numerical diffusion or 'artificial viscosity' in order to stabilize the algorithm. We present two new algorithms for maintaining high-resolution and sharp computational boundaries in computations of these types of problems: a discontinuous Galerkin method with a bound-preserving limiter and a Volume-of-Fluid interface tracking algorithm. We compare these new methods with two approaches widely used for modeling the advection of two distinct thermally driven compositional fields in mantle convection computations: a high-order accurate finite element advection algorithm with entropy viscosity and a particle method that carries a scalar quantity representing the location of each compositional field. All four algorithms are implemented in the open source finite element code ASPECT, which we use to compute the velocity, pressure, and temperature associated with the underlying flow field. We compare the performance of these four algorithms on three problems, including computing an approximation to the solution of an initially compositionally stratified fluid at Ra = 10^5 with buoyancy numbers B that vary from no stratification at B = 0 to stratified flow at large B.
Characterizing heterogeneous cellular responses to perturbations.
Slack, Michael D; Martinez, Elisabeth D; Wu, Lani F; Altschuler, Steven J
2008-12-09
Cellular populations have been widely observed to respond heterogeneously to perturbation. However, interpreting the observed heterogeneity is an extremely challenging problem because of the complexity of possible cellular phenotypes, the large dimension of potential perturbations, and the lack of methods for separating meaningful biological information from noise. Here, we develop an image-based approach to characterize cellular phenotypes based on patterns of signaling marker colocalization. Heterogeneous cellular populations are characterized as mixtures of phenotypically distinct subpopulations, and responses to perturbations are summarized succinctly as probabilistic redistributions of these mixtures. We apply our method to characterize the heterogeneous responses of cancer cells to a panel of drugs. We find that cells treated with drugs of (dis-)similar mechanism exhibit (dis-)similar patterns of heterogeneity. Despite the observed phenotypic diversity of cells observed within our data, low-complexity models of heterogeneity were sufficient to distinguish most classes of drug mechanism. Our approach offers a computational framework for assessing the complexity of cellular heterogeneity, investigating the degree to which perturbations induce redistributions of a limited, but nontrivial, repertoire of underlying states and revealing functional significance contained within distinct patterns of heterogeneous responses.
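The mixture-of-subpopulations summary lends itself to a compact sketch: fit a Gaussian mixture to reference (untreated) cell features, then report a perturbation as the redistribution of cells across the fitted subpopulations. The features, component count, and shift below are synthetic stand-ins for the study's image-derived colocalization patterns.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
untreated = rng.normal(0.0, 1.0, size=(500, 5))        # rows: cells
treated = np.vstack([rng.normal(0.0, 1.0, size=(250, 5)),
                     rng.normal(2.0, 1.0, size=(250, 5))])

gmm = GaussianMixture(n_components=3, random_state=0).fit(untreated)

# a drug response summarised as a shift in subpopulation proportions
p_before = np.bincount(gmm.predict(untreated), minlength=3) / len(untreated)
p_after = np.bincount(gmm.predict(treated), minlength=3) / len(treated)
print(p_before, p_after)
```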
Response variance in functional maps: neural darwinism revisited.
Takahashi, Hirokazu; Yokota, Ryo; Kanzaki, Ryohei
2013-01-01
The mechanisms by which functional maps and map plasticity contribute to cortical computation remain controversial. Recent studies have revisited the theory of neural Darwinism to interpret the learning-induced map plasticity and neuronal heterogeneity observed in the cortex. Here, we hypothesize that the Darwinian principle provides a substrate to explain the relationship between neuron heterogeneity and cortical functional maps. We demonstrate in the rat auditory cortex that the degree of response variance is closely correlated with the size of its representational area. Further, we show that the response variance within a given population is altered through training. These results suggest that larger representational areas may help to accommodate heterogeneous populations of neurons. Thus, functional maps and map plasticity are likely to play essential roles in Darwinian computation, serving as effective, but not absolutely necessary, structures to generate diverse response properties within a neural population.
Ontology based heterogeneous materials database integration and semantic query
NASA Astrophysics Data System (ADS)
Zhao, Shuai; Qian, Quan
2017-10-01
Materials digital data, high-throughput experiments and high-throughput computations are regarded as three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data have become urgent and have gradually become a hot topic in materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be issued using SPARQL. In the experiments, two well-known first-principles computational databases, OQMD and Materials Project, are used as the integration targets, which shows the availability and effectiveness of our method.
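Once the databases are lifted into an ontology, queries run against triples rather than tables. The sketch below builds a toy graph with rdflib and filters it with SPARQL; the namespace and properties are invented, and the real schema extracted from OQMD or Materials Project would be far richer.

```python
from rdflib import Graph, Literal, Namespace, RDF

MAT = Namespace("http://example.org/materials#")   # hypothetical ontology
g = Graph()
g.add((MAT.Fe2O3, RDF.type, MAT.Compound))
g.add((MAT.Fe2O3, MAT.bandGapEV, Literal(2.0)))
g.add((MAT.Si, RDF.type, MAT.Compound))
g.add((MAT.Si, MAT.bandGapEV, Literal(1.1)))

results = g.query("""
    PREFIX mat: <http://example.org/materials#>
    SELECT ?c ?gap WHERE {
        ?c a mat:Compound ; mat:bandGapEV ?gap .
        FILTER (?gap > 1.5)
    }""")
for row in results:
    print(row.c, row.gap)    # only Fe2O3 passes the filter
```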
Dedicated heterogeneous node scheduling including backfill scheduling
Wood, Robert R [Livermore, CA; Eckert, Philip D [Livermore, CA; Hommes, Gregg [Pleasanton, CA
2006-07-25
A method and system for backfill scheduling of jobs on dedicated heterogeneous nodes in a multi-node computing environment. Heterogeneous nodes are grouped into homogeneous node sub-pools. For each sub-pool, a free node schedule (FNS) is created to chart the number of free nodes over time. For each prioritized job, the FNSs of the sub-pools having nodes usable by that job are used to determine the earliest time range (ETR) capable of running the job. Once the ETR is determined for a particular job, the job is scheduled to run in that ETR. If the ETR determined for a lower-priority job (LPJ) has a start time earlier than that of a higher-priority job (HPJ), then the LPJ is scheduled in that ETR if it would not disturb the anticipated start times of any HPJ previously scheduled for a future time. Thus, efficient utilization and throughput of such computing environments may be increased by using resources that would otherwise remain idle.
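A heavily simplified version of the backfill test can be stated in a few lines: a lower-priority job may start now only if, across the window it would occupy, the free-node schedule can cover it after honouring every higher-priority reservation. The sketch ignores sub-pools and HPJ end times, so it illustrates the idea rather than the patented method.

```python
def can_backfill(lpj_nodes, lpj_end, free_node_schedule, hpj_reservations):
    """free_node_schedule: time-ordered (time, free_nodes) pairs from now on.
    hpj_reservations: (start_time, nodes) pairs for already-scheduled HPJs.
    Returns True if the LPJ fits without touching reserved capacity."""
    for t, free in free_node_schedule:
        if t >= lpj_end:
            break
        reserved = sum(n for s, n in hpj_reservations if s <= t)
        if free - reserved < lpj_nodes:
            return False
    return True

schedule = [(0, 10), (5, 6), (9, 2)]        # free nodes over time
reservations = [(9, 2)]                     # an HPJ holds 2 nodes from t=9
print(can_backfill(4, 8, schedule, reservations))   # True: ends before t=9
print(can_backfill(4, 12, schedule, reservations))  # False: would collide
```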
Buettner, Florian; Natarajan, Kedar N; Casale, F Paolo; Proserpio, Valentina; Scialdone, Antonio; Theis, Fabian J; Teichmann, Sarah A; Marioni, John C; Stegle, Oliver
2015-02-01
Recent technical developments have enabled the transcriptomes of hundreds of cells to be assayed in an unbiased manner, opening up the possibility that new subpopulations of cells can be found. However, the effects of potential confounding factors, such as the cell cycle, on the heterogeneity of gene expression and therefore on the ability to robustly identify subpopulations remain unclear. We present and validate a computational approach that uses latent variable models to account for such hidden factors. We show that our single-cell latent variable model (scLVM) allows the identification of otherwise undetectable subpopulations of cells that correspond to different stages during the differentiation of naive T cells into T helper 2 cells. Our approach can be used not only to identify cellular subpopulations but also to tease apart different sources of gene expression heterogeneity in single-cell transcriptomes.
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection
Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad
2014-01-01
Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices. PMID:25177107
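The ranking itself is a short loop: score each feature by the cross-validated performance of a classifier trained on that feature alone. The sketch below uses logistic regression and AUC as one concrete choice; the paper's point is precisely that this choice of classifier can bias the resulting ranking.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def svc_ranking(X, y, clf=None):
    """Single Variable Classifier ranking: order features by the
    cross-validated AUC of a classifier fit on each feature alone."""
    clf = clf or LogisticRegression()
    aucs = [cross_val_score(clf, X[:, [j]], y, scoring="roc_auc", cv=5).mean()
            for j in range(X.shape[1])]
    return np.argsort(aucs)[::-1], aucs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 3] + 0.5 * rng.normal(size=200) > 0).astype(int)
order, aucs = svc_ranking(X, y)
print(order[:3])   # feature 3 should rank first
```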
NASA Astrophysics Data System (ADS)
Ibrahima, Fayadhoi; Meyer, Daniel; Tchelepi, Hamdi
2016-04-01
Because geophysical data are inexorably sparse and incomplete, stochastic treatments of simulated responses are crucial to explore possible scenarios and assess risks in subsurface problems. In particular, nonlinear two-phase flows in porous media are essential, yet challenging, in reservoir simulation and hydrology. Adding highly heterogeneous and uncertain input, such as the permeability and porosity fields, turns the estimation of the flow response into a difficult stochastic problem for which computationally expensive Monte Carlo (MC) simulations remain the preferred option. We propose an alternative approach to evaluate the probability distribution of the (water) saturation for the stochastic Buckley-Leverett problem when the probability distributions of the permeability and porosity fields are available. We give a computationally efficient and numerically accurate method to estimate the one-point probability density function (PDF) and cumulative distribution function (CDF) of the (water) saturation. The distribution method draws inspiration from a Lagrangian approach to the stochastic transport problem and expresses the saturation PDF and CDF essentially in terms of a deterministic mapping and the distribution and statistics of scalar random fields. In a large class of applications these random fields can be estimated at low computational cost (a few MC runs), thus making the distribution method attractive. Even though the method relies on a key assumption of fixed streamlines, we show that it performs well for high input variances, which is the case of interest. Once the saturation distribution is determined, any one-point statistics thereof can be obtained, especially the saturation average and standard deviation. Moreover, the probability of rare events and saturation quantiles (e.g. P10, P50 and P90) can be efficiently derived from the distribution method. These statistics can then be used for risk assessment, as well as data assimilation and uncertainty reduction in the prior knowledge of input distributions. We provide various examples and comparisons with MC simulations to illustrate the performance of the method.
Igarashi, Jun; Shouno, Osamu; Fukai, Tomoki; Tsujino, Hiroshi
2011-11-01
Real-time simulation of a biologically realistic spiking neural network is necessary for evaluation of its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs that arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic neurons. In order to address these problems, mainly the latter, we chose to use general purpose computing on graphics processing units (GPGPUs) for simulation of such a neural network, taking advantage of the powerful computational capability of a graphics processing unit (GPU). As a target for real-time simulation, we used a model of the basal ganglia that has been developed according to electrophysiological and anatomical knowledge. The model consists of heterogeneous populations of 370 spiking model neurons, including computationally heavy conductance-based models, connected by 11,002 synapses. Simulation of the model has not yet been performed in real-time using a general computing server. By parallelization of the model on the NVIDIA Geforce GTX 280 GPU in data-parallel and task-parallel fashion, faster-than-real-time simulation was robustly realized with only one-third of the GPU's total computational resources. Furthermore, we used the GPU's full computational resources to perform faster-than-real-time simulation of three instances of the basal ganglia model; these instances consisted of 1100 neurons and 33,006 synapses and were synchronized at each calculation step. Finally, we developed software for simultaneous visualization of faster-than-real-time simulation output. These results suggest the potential power of GPGPU techniques in real-time simulation of realistic neural networks.
Federated data storage system prototype for LHC experiments and data intensive science
NASA Astrophysics Data System (ADS)
Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Ryabinkin, E.; Zarochentsev, A.
2017-10-01
Rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) prompted the physics computing community to evaluate new data handling and processing solutions. Russian grid sites and university clusters scattered over a large area aim at the task of uniting their resources for future productive work, at the same time giving an opportunity to support large physics collaborations. In our project we address the fundamental problem of designing a computing architecture to integrate distributed storage resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. Studies include development and implementation of a federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and university clusters within one National Cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. This project intends to implement a federated distributed storage for all kinds of operations, such as read/write/transfer and access via WAN from Grid centres, university clusters, supercomputers, academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests, including real data processing and analysis workflows from the ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present the topology and architecture of the designed system, report performance and statistics for different access patterns, and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and reformations of computing style, for instance how a bioinformatics program running on supercomputers can read/write data from the federated storage.
Track classification within wireless sensor network
NASA Astrophysics Data System (ADS)
Doumerc, Robin; Pannetier, Benjamin; Moras, Julien; Dezert, Jean; Canevet, Loic
2017-05-01
In this paper, we present our study on track classification taking into account environmental information and target estimated states. The tracker uses several motion models adapted to different target dynamics (pedestrian, ground vehicle and SUAV, i.e. small unmanned aerial vehicle) and works in a centralized architecture. The main idea is to explore both the classification given by heterogeneous sensors and the classification obtained with our fusion module. The fusion module, presented in this paper, assigns a class to each track according to track location, velocity and associated uncertainty. To model the likelihood of each class, a fuzzy approach is used considering constraints on the target's capability to move in the environment. Then the evidential reasoning approach based on Dempster-Shafer Theory (DST) is used to perform a time integration of this classifier output. The fusion rules are tested and compared on real data obtained with our wireless sensor network. In order to handle realistic ground target tracking scenarios, we use an autonomous smart computer deposited in the surveillance area. After the calibration step of the heterogeneous sensor network, our system is able to handle real data from a wireless ground sensor network. The performance of this system is evaluated in a real exercise for intelligence operation ("hunter hunt" scenario).
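As a concrete illustration of the evidential reasoning step, the sketch below applies Dempster's rule of combination to two mass functions over the three target classes named above. The two-source setup and all mass values are illustrative assumptions, not figures from the paper.

```python
# Hedged sketch of Dempster's rule for fusing two classifier outputs.
# Mass functions are dicts keyed by frozensets of classes; total
# conflict (k == 0) is ignored for brevity.

def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset_of_classes: mass}."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    k = 1.0 - conflict            # Dempster's normalization constant
    return {s: v / k for s, v in combined.items()}

ped, veh, uav = frozenset({"pedestrian"}), frozenset({"vehicle"}), frozenset({"SUAV"})
m_sensor = {ped: 0.6, veh: 0.3, ped | veh | uav: 0.1}   # sensor-based classifier
m_motion = {ped: 0.5, veh: 0.4, ped | veh | uav: 0.1}   # motion-model classifier
print(dempster_combine(m_sensor, m_motion))
```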
Mobility-Aware Caching and Computation Offloading in 5G Ultra-Dense Cellular Networks
Chen, Min; Hao, Yixue; Qiu, Meikang; Song, Jeungeun; Wu, Di; Humar, Iztok
2016-01-01
Recent trends show that Internet traffic is increasingly dominated by content, which is accompanied by the exponential growth of traffic. To cope with this phenomenon, network caching is introduced to utilize the storage capacity of diverse network devices. In this paper, we first summarize four basic caching placement strategies, i.e., local caching, Device-to-Device (D2D) caching, Small cell Base Station (SBS) caching and Macrocell Base Station (MBS) caching. However, studies show that so far, much of the research has ignored the impact of user mobility. Therefore, taking the effect of user mobility into consideration, we propose a joint mobility-aware caching and SBS density placement scheme (MS caching). In addition, differences and relationships between caching and computation offloading are discussed. We present a design for hybrid computation offloading and support it with experimental results, which demonstrate improved performance in terms of energy cost. Finally, we discuss the design of an incentive mechanism by considering network dynamics, differentiated user's quality of experience (QoE) and the heterogeneity of mobile terminals in terms of caching and computing capabilities. PMID:27347975
Devi, D. Chitra; Uthariaraj, V. Rhymend
2016-01-01
Cloud computing uses the concepts of scheduling and load balancing to migrate tasks to underutilized VMs for effectively sharing the resources. The scheduling of nonpreemptive tasks in the cloud computing environment is an irrevocable constraint, and hence tasks have to be assigned to the most appropriate VMs at the initial placement itself. Practically, the arrived jobs consist of multiple interdependent tasks, which may execute the independent tasks in multiple VMs or in the same VM's multiple cores. Also, the jobs arrive during the run time of the server in varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources by static or dynamic scheduling to make cloud computing more efficient and thus improve user satisfaction. The objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm by considering the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. Performance of the proposed algorithm is studied by comparing with the existing methods. PMID:26955656
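A minimal sketch of the capability-aware initial placement this abstract describes is given below: each task is placed greedily on the VM with the earliest estimated finish time given its relative speed. Task interdependency, which the proposed algorithm also handles, is omitted here, and all names and numbers are illustrative.

```python
# Hedged sketch: greedy initial placement of tasks onto heterogeneous
# VMs, considering only VM speed and task length (no interdependency).

def place_tasks(tasks, vm_speeds):
    """Assign each task to the VM with the earliest estimated finish.

    tasks: {task_id: length}; vm_speeds: relative speed per VM."""
    finish = [0.0] * len(vm_speeds)          # current finish time per VM
    placement = {}
    # Longest tasks first (classic LPT heuristic for better balance).
    for tid, length in sorted(tasks.items(), key=lambda kv: -kv[1]):
        est = [finish[v] + length / vm_speeds[v] for v in range(len(vm_speeds))]
        best = min(range(len(vm_speeds)), key=est.__getitem__)
        finish[best] = est[best]
        placement[tid] = best
    return placement

print(place_tasks({"t1": 30, "t2": 10, "t3": 25}, [1.0, 2.0]))
```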
The emergence of spatial cyberinfrastructure
Wright, Dawn J.; Wang, Shaowen
2011-01-01
Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it now represents a paradigm shift in scientific research that has facilitated easy access to computational utilities and streamlined collaboration across distance and disciplines, thereby enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding complex problems of handling and analyzing massive and heterogeneous spatial datasets as well as the necessity and benefits of sharing spatial data flexibly and securely. This article provides an overview and potential future directions of spatial cyberinfrastructure. The remaining four articles of the special feature are introduced and situated in the context of providing empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice for improved synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analyses using distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge. PMID:21467227
DOE Office of Scientific and Technical Information (OSTI.GOV)
Radtke, M.A.
This paper will chronicle the activity at Wisconsin Public Service Corporation (WPSC) that resulted in the complete migration of a traditional, late 1970s vintage, Energy Management System (EMS). The new environment includes networked microcomputers, minicomputers, and the corporate mainframe, and provides on-line access to employees outside the energy control center and some WPSC customers. In the late 1980s, WPSC was forecasting an EMS computer upgrade or replacement to address both capacity and technology needs. Reasoning that access to diverse computing resources would best position the company to accommodate the uncertain needs of the energy industry in the 90s, WPSC chose to investigate an in-place migration to a network of computers able to support heterogeneous hardware and operating systems. The system was developed in a modular fashion, with individual modules being deployed as soon as they were completed. The functional and technical specification was continuously enhanced as operating experience was gained from each operational module. With the migration off the original EMS computers complete, the networked system called DEMAXX (Distributed Energy Management Architecture with eXtensive eXpandability) has exceeded expectations in the areas of cost, performance, flexibility, and reliability.
Scheduling Operations for Massive Heterogeneous Clusters
NASA Technical Reports Server (NTRS)
Humphrey, John; Spagnoli, Kyle
2013-01-01
High-performance computing (HPC) programming has become increasingly difficult with the advent of hybrid supercomputers consisting of multicore CPUs and accelerator boards such as the GPU. Manual tuning of software to achieve high performance on this type of machine has been performed by programmers. This is needlessly difficult and prone to being invalidated by new hardware, new software, or changes in the underlying code. A system was developed for task-based representation of programs, which, when coupled with a scheduler and runtime system, allows for many benefits, including higher performance and utilization of computational resources, easier programming and porting, and adaptation of code during runtime. The system consists of a method of representing computer algorithms as a series of data-dependent tasks. The series forms a graph, which can be scheduled for execution on many nodes of a supercomputer efficiently by a computer algorithm. The schedule is executed by a dispatch component, which is tailored to understand all of the hardware types that may be available within the system. The scheduler is informed by a cluster mapping tool, which generates a topology of available resources and their strengths and communication costs. Software is decoupled from its hardware, which aids in porting to future architectures. A computer algorithm schedules all operations, which, for systems of high complexity (i.e., most NASA codes), cannot be performed optimally by a human. The system aids in reducing repetitive code, such as communication code, and aids in the reduction of redundant code across projects. It adds new features to code automatically, such as recovering from a lost node or the ability to modify the code while running. In this project, the innovators at the time of this reporting intend to develop two distinct technologies that build upon each other, both of which serve as building blocks for more efficient HPC usage. First is the scheduling and dynamic execution framework, and the second is scalable linear algebra libraries built directly on the former.
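The core data structure described here, a graph of data-dependent tasks, can be sketched in a few lines. The example below builds such a graph and produces a dependency-respecting execution order; the task names, costs, and the simple topological dispatch are illustrative assumptions, not the system's actual scheduler.

```python
# Hedged sketch of the task-graph idea: an algorithm expressed as
# data-dependent tasks, dispatched in an order that respects dependencies.
from collections import deque

def schedule(tasks, deps):
    """tasks: {name: cost}; deps: {name: set of prerequisite names}.
    Returns tasks in a dependency-respecting execution order."""
    indeg = {t: len(deps.get(t, set())) for t in tasks}
    children = {t: [] for t in tasks}
    for t, prereqs in deps.items():
        for p in prereqs:
            children[p].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while ready:                       # Kahn's topological sort
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:          # all inputs of c are now ready
                ready.append(c)
    return order

print(schedule({"load": 1, "fft": 5, "gemm": 9, "store": 1},
               {"fft": {"load"}, "gemm": {"load"}, "store": {"fft", "gemm"}}))
```

A real scheduler would additionally weigh task costs against the cluster-topology information described in the abstract when choosing among ready tasks.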
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems
NASA Technical Reports Server (NTRS)
Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.
2000-01-01
The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many sub-domains called blocks, and solve the governing equations over these blocks. The dynamic load-balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load-balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the changes in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load-sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA-based code, ADPAC, to demonstrate the developed tools for dynamic load balancing.
Bertalan, Tom; Wu, Yan; Laing, Carlo; Gear, C. William; Kevrekidis, Ioannis G.
2017-01-01
Finding accurate reduced descriptions for large, complex, dynamically evolving networks is a crucial enabler to their simulation, analysis, and ultimately design. Here, we propose and illustrate a systematic and powerful approach to obtaining good collective coarse-grained observables—variables successfully summarizing the detailed state of such networks. Finding such variables can naturally lead to successful reduced dynamic models for the networks. The main premise enabling our approach is the assumption that the behavior of a node in the network depends (after a short initial transient) on the node identity: a set of descriptors that quantify the node properties, whether intrinsic (e.g., parameters in the node evolution equations) or structural (imparted to the node by its connectivity in the particular network structure). The approach creates a natural link with modeling and “computational enabling technology” developed in the context of Uncertainty Quantification. In our case, however, we will not focus on ensembles of different realizations of a problem, each with parameters randomly selected from a distribution. We will instead study many coupled heterogeneous units, each characterized by randomly assigned (heterogeneous) parameter value(s). One could then coin the term Heterogeneity Quantification for this approach, which we illustrate through a model dynamic network consisting of coupled oscillators with one intrinsic heterogeneity (oscillator individual frequency) and one structural heterogeneity (oscillator degree in the undirected network). The computational implementation of the approach, its shortcomings and possible extensions are also discussed. PMID:28659781
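The setting described above, many coupled units with one intrinsic and one structural heterogeneity, can be made concrete with a short simulation. The sketch below integrates Kuramoto-type phase oscillators with random natural frequencies on a random undirected network; it illustrates the class of model studied, with all parameter values assumed for illustration.

```python
# Illustrative sketch: phase oscillators with one intrinsic heterogeneity
# (natural frequency) and one structural heterogeneity (degree in a
# random undirected graph), integrated with forward Euler.
import numpy as np

rng = np.random.default_rng(0)
N, K, dt = 100, 0.5, 0.01
omega = rng.normal(0.0, 1.0, N)            # heterogeneous frequencies
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1); A = A + A.T             # undirected adjacency matrix
theta = rng.uniform(0, 2 * np.pi, N)

for _ in range(1000):
    # coupling[i] = sum_j A[i, j] * sin(theta[j] - theta[i])
    coupling = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    theta += dt * (omega + K * coupling)

# "Node identity" in the sense of the abstract: each node is described
# by its intrinsic frequency and its degree in the network.
identity = np.stack([omega, A.sum(axis=1)], axis=1)
```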
Application Portable Parallel Library
NASA Technical Reports Server (NTRS)
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
The Application Portable Parallel Library (APPL) computer program is a subroutine-based message-passing software library intended to provide a consistent interface to the variety of multiprocessor computers on the market today. It minimizes the effort needed to move an application program from one computer to another: the user develops an application program once and then easily moves it from the parallel computer on which it was created to another parallel computer. ("Parallel computer" here also includes a heterogeneous collection of networked computers.) APPL is written in the C language, with one FORTRAN 77 subroutine for UNIX-based computers, and is callable from application programs written in C or FORTRAN 77.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2007-01-01
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
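For readers unfamiliar with the kernel, the sketch below is a minimal compressed sparse row (CSR) SpMV in plain Python/NumPy; it shows the indirect, row-wise memory-access pattern that makes the kernel memory-bound, and is written for clarity rather than for the performance the paper targets.

```python
# Minimal CSR sparse matrix-vector multiply: y = A @ x.
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """values/col_idx: nonzeros and their columns; row_ptr: row offsets."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Gather x at the column indices of row i, then reduce.
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# 3x3 example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 1.0, 1.0])))  # [5. 2. 8.]
```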
Accelerating DNA analysis applications on GPU clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste
DNA analysis is an emerging application of high-performance bioinformatics. Modern sequencing machines are able to provide, in a few hours, large input streams of data which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges to current high performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI with pthreads based implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
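For reference, a compact sequential version of the Aho-Corasick automaton is sketched below: a trie with failure links is built once, and the input stream is then scanned in a single pass, reporting every occurrence of every pattern. This is the textbook serial algorithm, not the paper's GPU/MPI implementation.

```python
# Textbook Aho-Corasick: trie + BFS failure links + single-pass search.
from collections import deque

def build_automaton(patterns):
    """Build goto, fail, and output tables for the given patterns."""
    goto, fail, out = [{}], [0], [set()]       # state 0 is the trie root
    for p in patterns:
        s = 0
        for c in p:                            # extend the trie
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    queue = deque(goto[0].values())            # depth-1 states fail to root
    while queue:                               # BFS to compute failure links
        r = queue.popleft()
        for c, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and c not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(c, 0)
            out[s] |= out[fail[s]]             # inherit matches via fail link
    return goto, fail, out

def search(text, automaton):
    """Scan text once, returning (start_index, pattern) for every match."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits

ac = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", ac))   # occurrences of 'she', 'he', and 'hers'
```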
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Thomas Martin; Patton, Bruce W.; Weber, Charles F.
The primary goal of this project is to evaluate x-ray spectra generated within a scanning electron microscope (SEM) to determine elemental composition of small samples. This will be accomplished by performing Monte Carlo simulations of the electron and photon interactions in the sample and in the x-ray detector. The elemental inventories will be determined by an inverse process that progressively reduces the difference between the measured and simulated x-ray spectra by iteratively adjusting composition and geometric variables in the computational model. The intended benefit of this work will be to develop a method to perform quantitative analysis on substandard samples (heterogeneous phases, rough surfaces, small sizes, etc.) without involving standard elemental samples or empirical matrix corrections (i.e., true standardless quantitative analysis).
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), the Moving Particle Semi-implicit method (MPS), and the Discrete Element Method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in each column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domains frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture, which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using the Earth Simulator and K computer supercomputer systems.
ERIC Educational Resources Information Center
Abdullah, Sopiah; Shariff, Adilah
2008-01-01
The purpose of the study was to investigate the effects of inquiry-based computer simulation with heterogeneous-ability cooperative learning (HACL) and inquiry-based computer simulation with friendship cooperative learning (FCL) on (a) scientific reasoning (SR) and (b) conceptual understanding (CU) among Form Four students in Malaysian Smart…
2011-08-09
fastest 10 supercomputers in the world. Both systems rely on GPU co-processing, one using AMD cards, the second, called Nebulae, using NVIDIA Tesla… capability of almost 3 petaflop/s, the highest in TOP500, Nebulae only holds the No. 2 position on the TOP500 list of the
ERIC Educational Resources Information Center
Mueller, Julie; Wood, Eileen; Willoughby, Teena; Ross, Craig; Specht, Jacqueline
2008-01-01
Given the prevalence of computers in education today, it is critical to understand teachers' perspectives regarding computer integration in their classrooms. The current study surveyed a random sample of a heterogeneous group of 185 elementary and 204 secondary teachers in order to provide a comprehensive summary of teacher characteristics and…
Zhu, Yizhou; He, Xingfeng; Mo, Yifei
2015-12-11
All-solid-state Li-ion batteries based on ceramic solid electrolyte materials are a promising next-generation energy storage technology with high energy density and enhanced cycle life. The poor interfacial conductance is one of the key limitations in enabling all-solid-state Li-ion batteries. However, the origin of this poor conductance has not been understood, and there is limited knowledge about the solid electrolyte–electrode interfaces in all-solid-state Li-ion batteries. In this paper, we performed first principles calculations to evaluate the thermodynamics of the interfaces between solid electrolyte and electrode materials and to identify the chemical and electrochemical stabilities of these interfaces. Our computation results reveal that many solid electrolyte–electrode interfaces have limited chemical and electrochemical stability, and that the formation of interphase layers is thermodynamically favorable at these interfaces. These formed interphase layers with different properties significantly affect the electrochemical performance of all-solid-state Li-ion batteries. The mechanisms of applying interfacial coating layers to stabilize the interface and to reduce interfacial resistance are illustrated by our computation. This study demonstrates a computational scheme to evaluate the chemical and electrochemical stability of heterogeneous solid interfaces. Finally, the enhanced understanding of the interfacial phenomena provides the strategies of interface engineering to improve performances of all-solid-state Li-ion batteries.
NASA Astrophysics Data System (ADS)
Tereshin, Alexander A.; Usilin, Sergey A.; Arlazarov, Vladimir V.
2018-04-01
This paper aims to study the problem of multi-class object detection in a video stream with Viola-Jones cascades. An adaptive algorithm for selecting a Viola-Jones cascade, based on the greedy choice strategy for the N-armed bandit problem, is proposed. The efficiency of the algorithm is shown on the problem of detection and recognition of bank card logos in a video stream. The proposed algorithm can be effectively used in document localization and identification, recognition of road scene elements, localization and tracking of lengthy objects, and for solving other problems of rigid object detection in heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices based on processors with low power consumption.
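The greedy bandit strategy can be sketched as follows: each cascade is an arm, running it on a frame yields a reward (here, whether it produced a detection), and the next frame goes to the arm with the best empirical mean, with occasional exploration. The epsilon-greedy variant and the reward definition below are assumptions for illustration, not the paper's exact rule.

```python
# Hedged sketch of greedy cascade selection as an N-armed bandit.
import random

class GreedyCascadeSelector:
    def __init__(self, n_cascades, epsilon=0.1):
        self.counts = [0] * n_cascades     # times each cascade was run
        self.values = [0.0] * n_cascades   # empirical mean reward per cascade
        self.epsilon = epsilon

    def choose(self):
        """Pick the cascade for the next frame."""
        if random.random() < self.epsilon:          # occasional exploration
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        """Record the outcome (e.g., 1.0 if the cascade detected an object)."""
        self.counts[arm] += 1
        # Incremental mean update avoids storing per-frame history.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

sel = GreedyCascadeSelector(n_cascades=4)
arm = sel.choose()
sel.update(arm, reward=1.0)
```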
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak
2001-01-01
The Information Power Grid (IPG) concept developed by NASA is aimed to provide a metacomputing platform for large-scale distributed computations, by hiding the intricacies of a highly heterogeneous environment and yet maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG, and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
Next generation communications satellites: multiple access and network studies
NASA Technical Reports Server (NTRS)
Meadows, H. E.; Schwartz, M.; Stern, T. E.; Ganguly, S.; Kraimeche, B.; Matsuo, K.; Gopal, I.
1982-01-01
Efficient resource allocation and network design for satellite systems serving heterogeneous user populations with large numbers of small direct-to-user Earth stations are discussed. The focus is on TDMA systems involving a high degree of frequency reuse by means of satellite-switched multiple beams (SSMB) with varying degrees of onboard processing. Algorithms for the efficient utilization of the satellite resources were developed. The effects of skewed traffic, overlapping beams and batched arrivals in packet-switched SSMB systems, the integration of stream and bursty traffic, and optimal circuit scheduling in SSMB systems (performance bounds and computational complexity) are discussed.
Bhatt, Divesh; Zuckerman, Daniel M.
2010-01-01
We performed “weighted ensemble” path-sampling simulations of adenylate kinase, using several semi-atomistic protein models. The models have an all-atom backbone with various levels of residue interactions. The primary result is that full statistically rigorous path sampling required only a few weeks of single-processor computing time with these models, indicating the addition of further chemical detail should be readily feasible. Our semi-atomistic path ensembles are consistent with previous biophysical findings: the presence of two distinct pathways, identification of intermediates, and symmetry of forward and reverse pathways. PMID:21660120
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
NASA Astrophysics Data System (ADS)
Zhang, Yihuai; Lebedev, Maxim; Al-Yaseri, Ahmed; Yu, Hongyan; Nwidee, Lezorgia N.; Sarmadivaleh, Mohammad; Barifcani, Ahmed; Iglauer, Stefan
2018-03-01
Pore-scale analysis of carbonate rock is of great relevance to the oil and gas industry owing to their vast application potentials. Although, efficient fluid flow at pore scale is often disrupted owing to the tight rock matrix and complex heterogeneity of limestone microstructures, factors such as porosity, permeability and effective stress greatly impact the rock microstructures; as such an understanding of the effect of these variables is vital for various natural and engineered processes. In this study, the Savonnières limestone as a carbonate mineral was evaluated at micro scales using X-ray micro-computed tomography at high resolutions (3.43 μm and 1.25 μm voxel size) under different effective stress (0 MPa, 20 MPa) to ascertain limestone microstructure and gas permeability and porosity effect. The waterflooding (5 wt% NaCl) test was conducted using microCT in-situ scanning and nanoindentation test was also performed to evaluate microscale geomechanical heterogeneity of the rock. The nanoindentation test results showed that the nano/micro scale geomechanical properties are quite heterogeneous where the indentation modulus for the weak consolidated area was as low as 1 GPa. We observed that the fluid flow easily broke some less-consolidated areas (low indentation modulus) area, coupled with increase in porosity; and consistent with fines/particles migration and re-sedimentation were identified, although the effective stress showed only a minor effect on the rock microstructure.
Concurrent heterogeneous neural model simulation on real-time neuromimetic hardware.
Rast, Alexander; Galluppi, Francesco; Davies, Sergio; Plana, Luis; Patterson, Cameron; Sharp, Thomas; Lester, David; Furber, Steve
2011-11-01
Dedicated hardware is becoming increasingly essential to simulate emerging very-large-scale neural models. Equally, however, it needs to be able to support multiple models of the neural dynamics, possibly operating simultaneously within the same system. This may be necessary either to simulate large models with heterogeneous neural types, or to simplify simulation and analysis of detailed, complex models in a large simulation by isolating the new model to a small subpopulation of a larger overall network. The SpiNNaker neuromimetic chip is a dedicated neural processor able to support such heterogeneous simulations. Implementing these models on-chip uses an integrated library-based tool chain incorporating the emerging PyNN interface that allows a modeller to input a high-level description and use an automated process to generate an on-chip simulation. Simulations using both LIF and Izhikevich models demonstrate the ability of the SpiNNaker system to generate and simulate heterogeneous networks on-chip, while illustrating, through the network-scale effects of wavefront synchronisation and burst gating, methods that can provide effective behavioural abstractions for large-scale hardware modelling. SpiNNaker's asynchronous virtual architecture permits greater scope for model exploration, with scalable levels of functional and temporal abstraction, than conventional (or neuromorphic) computing platforms. The complete system illustrates a potential path to understanding the neural model of computation, by building (and breaking) neural models at various scales, connecting the blocks, then comparing them against the biology: computational cognitive neuroscience. Copyright © 2011 Elsevier Ltd. All rights reserved.
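To illustrate the library-based tool chain, the sketch below uses the PyNN API (0.8-style) to describe a small heterogeneous network mixing LIF and Izhikevich populations. The backend module name and all parameters are assumptions about a typical SpiNNaker installation, not code from the paper.

```python
# Minimal PyNN-style description of a heterogeneous two-population
# network (LIF + Izhikevich), of the kind the tool chain maps onto chip.
import pyNN.spiNNaker as sim   # or pyNN.nest / pyNN.neuron on other backends

sim.setup(timestep=1.0)        # 1 ms time step

lif = sim.Population(100, sim.IF_curr_exp(), label="lif_pop")
izh = sim.Population(100, sim.Izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0),
                     label="izh_pop")
noise = sim.Population(100, sim.SpikeSourcePoisson(rate=10.0))

sim.Projection(noise, lif, sim.OneToOneConnector(),
               synapse_type=sim.StaticSynapse(weight=2.0, delay=1.0))
sim.Projection(lif, izh, sim.FixedProbabilityConnector(0.1),
               synapse_type=sim.StaticSynapse(weight=0.5, delay=1.0))

izh.record("spikes")
sim.run(1000.0)                # simulate one second
spikes = izh.get_data("spikes")
sim.end()
```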
The Challenge of Big Data in Public Health: An Opportunity for Visual Analytics
Ola, Oluwakemi; Sedig, Kamran
2014-01-01
Public health (PH) data can generally be characterized as big data. The efficient and effective use of this data determines the extent to which PH stakeholders can sufficiently address societal health concerns as they engage in a variety of work activities. As stakeholders interact with data, they engage in various cognitive activities such as analytical reasoning, decision-making, interpreting, and problem solving. Performing these activities with big data is a challenge for the unaided mind as stakeholders encounter obstacles relating to the data’s volume, variety, velocity, and veracity. Such being the case, computer-based information tools are needed to support PH stakeholders. Unfortunately, while existing computational tools are beneficial in addressing certain work activities, they fall short in supporting cognitive activities that involve working with large, heterogeneous, and complex bodies of data. This paper presents visual analytics (VA) tools, a nascent category of computational tools that integrate data analytics with interactive visualizations, to facilitate the performance of cognitive activities involving big data. Historically, PH has lagged behind other sectors in embracing new computational technology. In this paper, we discuss the role that VA tools can play in addressing the challenges presented by big data. In doing so, we demonstrate the potential benefit of incorporating VA tools into PH practice, in addition to highlighting the need for further systematic and focused research. PMID:24678376
Chan, Sheng-Chieh; Chang, Kai-Ping; Fang, Yu-Hua Dean; Tsang, Ngan-Ming; Ng, Shu-Hang; Hsu, Cheng-Lung; Liao, Chun-Ta; Yen, Tzu-Chen
2017-01-01
Plasma Epstein-Barr virus (EBV) DNA concentrations predict prognosis in patients with nasopharyngeal carcinoma (NPC). Recent evidence also indicates that intratumor heterogeneity on F-18 fluorodeoxyglucose positron emission tomography (18F-FDG PET) scans is predictive of treatment outcomes in different solid malignancies. Here, we sought to investigate the prognostic value of heterogeneity parameters in patients with primary NPC. Retrospective cohort study. We examined 101 patients with primary NPC who underwent pretreatment 18F-FDG PET/computed tomography. Circulating levels of EBV DNA were measured in all participants. The following PET heterogeneity parameters were collected: histogram-based heterogeneity parameters, second-order texture features (uniformity, contrast, entropy, homogeneity, dissimilarity, inverse difference moment), and higher-order (coarseness, contrast, busyness, complexity, strength) texture features. The median follow-up time was 5.14 years. Total lesion glycolysis (TLG), tumor heterogeneity measured by the histogram-based parameter skewness, and the majority of second-order or higher-order texture features were significantly associated with overall survival (OS) and/or recurrence-free survival (RFS). In multivariate analysis, age (P = .005), EBV DNA load (P = .0002), and uniformity (P = .001) independently predicted OS. Only skewness retained independent prognostic significance for RFS. Tumor stage, standardized uptake value, and TLG did not show an independent association with survival endpoints. The combination of uniformity, EBV DNA load, and age resulted in a more reliable prognostic stratification (P < .001). Tumor heterogeneity is superior to traditional PET parameters for predicting outcomes in primary NPC. The combination of uniformity with EBV DNA load can improve prognostic stratification in this clinical entity. Laryngoscope, 127:E22-E28, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
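Two of the second-order texture features listed above, uniformity (also called energy) and entropy, can be computed from a gray-level co-occurrence matrix (GLCM) as sketched below. The horizontal one-pixel offset and 16-level quantization are illustrative assumptions; the study's exact feature definitions may differ in offset, binning, and normalization.

```python
# Hedged sketch: uniformity and entropy from a horizontal-offset GLCM.
import numpy as np

def glcm_features(img, levels=16):
    """Return (uniformity, entropy) of a 2-D image with values in [0, 1)."""
    q = np.clip((img * levels).astype(int), 0, levels - 1)
    glcm = np.zeros((levels, levels))
    # Count co-occurrences of gray levels one pixel apart horizontally.
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    p = glcm / glcm.sum()                 # normalize to joint probabilities
    nz = p[p > 0]
    uniformity = float((p ** 2).sum())    # high for homogeneous texture
    entropy = float(-(nz * np.log2(nz)).sum())
    return uniformity, entropy

img = np.random.default_rng(0).random((64, 64))  # stand-in for a PET slice
print(glcm_features(img))
```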
Spatial localization in heterogeneous systems
NASA Astrophysics Data System (ADS)
Kao, Hsien-Ching; Beaume, Cédric; Knobloch, Edgar
2014-01-01
We study spatial localization in the generalized Swift-Hohenberg equation with either quadratic-cubic or cubic-quintic nonlinearity subject to spatially heterogeneous forcing. Different types of forcing (sinusoidal or Gaussian) with different spatial scales are considered and the corresponding localized snaking structures are computed. The results indicate that spatial heterogeneity exerts a significant influence on the location of spatially localized structures in both parameter space and physical space, and on their stability properties. The results are expected to assist in the interpretation of experiments on localized structures where departures from spatial homogeneity are generally unavoidable.
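A minimal pseudo-spectral time-stepper for the quadratic-cubic variant with a weak sinusoidal heterogeneity in the forcing is sketched below. The specific form u_t = r(x)u - (1 + d_xx)^2 u + b u^2 - u^3 and all parameter values are illustrative assumptions consistent with this class of equations, not the paper's exact setup.

```python
# Semi-implicit Fourier time-stepping of a 1-D Swift-Hohenberg equation
# with spatially heterogeneous forcing r(x); parameters are illustrative.
import numpy as np

n, L = 512, 40 * np.pi
x = np.linspace(0, L, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
lin = -(1 - k ** 2) ** 2                        # stiff linear operator (Fourier)
r = -0.28 + 0.01 * np.sin(2 * np.pi * x / L)    # heterogeneous forcing
b, dt = 1.8, 0.1

u = 0.1 * np.exp(-((x - L / 2) ** 2) / 10.0)    # localized initial bump
for _ in range(5000):
    nonlin = r * u + b * u ** 2 - u ** 3        # treated explicitly...
    u_hat = np.fft.fft(u)
    # ...while the fourth-order term is handled implicitly for stability:
    u_hat = (u_hat + dt * np.fft.fft(nonlin)) / (1 - dt * lin)
    u = np.real(np.fft.ifft(u_hat))
```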
Modelling of strong heterogeneities in aerosol single scattering albedos over a polluted region
NASA Astrophysics Data System (ADS)
Mallet, M.; Pont, V.; Liousse, C.
2005-05-01
To date, most models dedicated to the investigation of aerosol direct or semi-direct radiative forcings have assumed the various aerosol components to be either completely externally mixed or homogeneously internally mixed. Some recent works have shown that a core-shell treatment of particles should be more realistic, leading to significant differences in the radiative impact as compared to only externally or well-internally mixed states. To account for these studies, an optical module, ORISAM-RAD, has been developed for computing aerosol radiative properties under the hypothesis of internally mixed particles with a n-layer spherical concentric structure. Mesoscale simulations using ORISAM-RAD, coupled with the 3D mesoscale model Meso-NH-C, have been performed for one selected day (06/24/2001) during the ESCOMPTE experiment in the Marseilles-Fos/Berre region, which illustrate the ability of this new module to reproduce spatial heterogeneities of measured single scattering albedo (ωo), due to industrial and/or urban pollution plumes.
Challenge Paper: Validation of Forensic Techniques for Criminal Prosecution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erbacher, Robert F.; Endicott-Popovsky, Barbara E.; Frincke, Deborah A.
2007-04-10
As in many domains, there is increasing agreement in the user and research community that digital forensics analysts would benefit from the extension, development and application of advanced techniques in performing large scale and heterogeneous data analysis. Modern digital forensics analysis of cyber-crimes and cyber-enabled crimes often requires scrutiny of massive amounts of data. For example, a case involving network compromise across multiple enterprises might require forensic analysis of numerous sets of network logs and computer hard drives, potentially involving 100s of gigabytes of heterogeneous data, or even terabytes or petabytes of data. Also, the goal for forensic analysis is to not only determine whether the illicit activity being considered is taking place, but also to identify the source of the activity and the full extent of the compromise or impact on the local network. Even after this analysis, there remains the challenge of using the results in subsequent criminal and civil processes.
Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization
Liu, Jin; Huang, Jian; Ma, Shuangge
2013-01-01
In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the “large d, small n” feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single datasets can be unsatisfactory. A cost-effective remedy is to conduct integrative analysis of multiple heterogeneous datasets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer datasets, show satisfactory performance of the proposed methods. PMID:24578589
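As background, the minimax concave penalty (MCP) used as the outer penalty has a simple closed form, as does its univariate thresholding operator in the orthonormal-design case that underlies coordinate descent. The sketch below states both; it is generic background on MCP, not the paper's composite two-level penalty.

```python
# MCP penalty and its coordinate-wise thresholding operator
# (orthonormal/standardized design, gamma > 1).
import numpy as np

def mcp(t, lam, gamma):
    """MCP penalty value at coefficient t."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2 * gamma),   # concave near zero
                    gamma * lam ** 2 / 2)             # flat beyond gamma*lam

def mcp_threshold(z, lam, gamma):
    """Univariate MCP update for an unpenalized estimate z."""
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # soft threshold
    return np.where(np.abs(z) <= gamma * lam,
                    soft / (1 - 1 / gamma),  # rescaled soft threshold
                    z)                       # large coefficients unshrunk

print(mcp_threshold(np.array([-0.5, 0.8, 3.0]), lam=0.5, gamma=3.0))
```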
The GPU implementation of micro - Doppler period estimation
NASA Astrophysics Data System (ADS)
Yang, Liyuan; Wang, Junling; Bi, Ran
2018-03-01
Aiming at the high computational complexity and poor real-time performance of processing the wideband radar echo signal, a program based on the CPU-GPU heterogeneous parallel structure is designed in this paper to improve the performance of real-time extraction of micro-motion features. Firstly, we discuss the principle of the micro-Doppler effect generated by the rolling of the scattering points on the orbiting satellite, and analyse how to use a Kalman filter to compensate the translational motion of a tumbling satellite and how to use joint time-frequency analysis and the inverse Radon transform to extract the micro-motion features from the echo after compensation. Secondly, the advantages of the GPU in terms of real-time processing and the working principle of CPU-GPU heterogeneous parallelism are analysed, and a program flow based on the GPU to extract the micro-motion features from the radar echo signal of a rolling satellite is designed. At the end of the article the results of extraction are given to verify the correctness of the program and algorithm.
NASA Astrophysics Data System (ADS)
Janardhanan, Vinod M.; Deutschmann, Olaf
Direct internal reforming in a solid oxide fuel cell (SOFC) results in increased overall efficiency of the system. The present study focuses on the chemical and electrochemical processes in an internally reforming anode-supported SOFC button cell running on humidified CH4 (3% H2O). The computational approach employs a detailed multi-step model for heterogeneous chemistry in the anode, a modified Butler-Volmer formalism for the electrochemistry, and the Dusty Gas Model (DGM) for the porous media transport. Two-dimensional elliptic model equations are solved for a button cell configuration. The electrochemical model assumes hydrogen as the only electrochemically active species. The predicted cell performances are compared with experimental reports. The results show that model predictions are in good agreement with experimental observations except for the open-circuit potentials. Furthermore, the steam content in the anode feed stream is found to have a remarkable effect on the resulting overpotential losses and surface coverages of various species at the three-phase boundary.
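For reference, the standard Butler-Volmer relation between current density and activation overpotential, i = i0 [exp(alpha_a F eta / RT) - exp(-alpha_c F eta / RT)], is sketched below with illustrative parameter values at a typical SOFC operating temperature. The paper uses a modified form whose details are not reproduced here.

```python
# Standard Butler-Volmer kinetics; all parameter values are illustrative.
import numpy as np

F, R = 96485.0, 8.314            # Faraday constant (C/mol), gas constant (J/mol/K)

def butler_volmer(eta, i0, alpha_a=0.5, alpha_c=0.5, T=1073.0):
    """Current density (A/m^2) at activation overpotential eta (V)."""
    return i0 * (np.exp(alpha_a * F * eta / (R * T))
                 - np.exp(-alpha_c * F * eta / (R * T)))

# Assumed exchange current density of 2000 A/m^2 at ~800 degrees C.
print(butler_volmer(eta=0.05, i0=2000.0))
```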
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...
2017-11-06
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using the SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline, from model construction to simulation output. We show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and the resource description framework (RDF) engine.
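To give a flavor of the query-bank idea, the sketch below poses a SPARQL query over an RDF representation of simulation metadata using the rdflib library. The epik: namespace, the predicate names, and the data file are invented for illustration; the paper's actual schema and vocabulary are not reproduced here.

```python
# Hypothetical SPARQL query over RDF-encoded simulation metadata.
from rdflib import Graph

g = Graph()
g.parse("epik_metadata.ttl", format="turtle")   # hypothetical data file

q = """
PREFIX epik: <http://example.org/epik#>
SELECT ?run ?region ?peakDay
WHERE {
  ?run epik:models ?disease ;
       epik:region ?region ;
       epik:peakInfectionDay ?peakDay .
  FILTER (?disease = "influenza")
}
ORDER BY ?peakDay
"""
for row in g.query(q):        # rows expose bound variables by name
    print(row.run, row.region, row.peakDay)
```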
Heterogeneous Monolithic Integration of Single-Crystal Organic Materials.
Park, Kyung Sun; Baek, Jangmi; Park, Yoonkyung; Lee, Lynn; Hyon, Jinho; Koo Lee, Yong-Eun; Shrestha, Nabeen K; Kang, Youngjong; Sung, Myung Mo
2017-02-01
Manufacturing high-performance organic electronic circuits requires the effective heterogeneous integration of different nanoscale organic materials with uniform morphology and high crystallinity in a desired arrangement. In particular, the development of high-performance organic electronic and optoelectronic devices relies on high-quality single crystals that show optimal intrinsic charge-transport properties and electrical performance. Moreover, the heterogeneous integration of organic materials on a single substrate in a monolithic way is highly demanded for the production of fundamental organic electronic components as well as complex integrated circuits. The various methods that have been designed to pattern multiple heterogeneous organic materials on a substrate, and the heterogeneous integration of organic single crystals via their crystal growth, are described here. Critical issues that have been encountered in the development of high-performance organic integrated electronics are also addressed. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Wahlberg, Nanna; Madsen, Anders Ø; Mikkelsen, Kurt V
2018-06-09
We have investigated the mechanism of the nucleation of acetaminophen on poly(methyl-methacrylate) and poly(vinyl-acetate) utilizing a combination of quantum mechanical computations and electrostatic models. We have used a heterogeneous dielectric solvation model to determine the stability of different orientations of acetaminophen on polymer surfaces. We find that for the nucleation of acetaminophen on the polymer surfaces in vacuum, the most stable orientation is a flat orientation. For the nucleation process in solution where acetaminophen and the polymer surface are surrounded by a solvent, we find that the heterogeneous dielectric solvation model predicts that a sideways orientation is the most stable orientation.
A FFT-based formulation for discrete dislocation dynamics in heterogeneous media
NASA Astrophysics Data System (ADS)
Bertin, N.; Capolungo, L.
2018-02-01
In this paper, an extension of the DDD-FFT approach presented in [1] is developed for heterogeneous elasticity. For such a purpose, an iterative spectral formulation in which convolutions are calculated in the Fourier space is developed to solve for the mechanical state associated with the discrete eigenstrain-based microstructural representation. With this, the heterogeneous DDD-FFT approach is capable of treating anisotropic and heterogeneous elasticity in a computationally efficient manner. In addition, a GPU implementation is presented to allow for further acceleration. As a first example, the approach is used to investigate the interaction between dislocations and second-phase particles, thereby demonstrating its ability to inherently incorporate image forces arising from elastic inhomogeneities.
Strain heterogeneity in sheared colloids revealed by neutron scattering
Chen, Kevin; Wu, Bin; He, Lilin; ...
2018-02-07
Recent computational and theoretical studies have shown that the deformation of colloidal suspensions under steady shear is highly heterogeneous at the particle level and critically influences the macroscopic deformation behavior. Despite its relevance to a wide variety of industrial applications of colloidal suspensions, scattering studies addressing the heterogeneity of the non-equilibrium colloidal structure have so far been scarce. Here we report the first experimental results on this question using small-angle neutron scattering. From the evolution of strain heterogeneity, we conclude that shear-induced deformation transforms from nearly affine behavior at low shear rates to plastic rearrangements when the shear rate is high.
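As a hedged illustration of what "heterogeneous at the particle level" means here (not the scattering analysis used in the paper), the sketch below separates particle displacements under simple shear into an affine prediction and a non-affine remainder.

```python
import numpy as np

def nonaffine_part(x0, x1, gamma):
    """Split displacements under simple shear (strain gamma, flow along x,
    gradient along y) into an affine prediction and a non-affine remainder.
    x0, x1: (N, 2) particle positions before and after deformation."""
    affine = x0.copy()
    affine[:, 0] += gamma * x0[:, 1]          # affine map: x -> x + gamma*y
    d_na = x1 - affine                        # non-affine displacement
    rms = np.sqrt((d_na**2).sum(axis=1).mean())
    return d_na, rms

rng = np.random.default_rng(1)
x0 = rng.random((1000, 2))
x1 = x0.copy()
x1[:, 0] += 0.1 * x0[:, 1]                    # affine response...
x1 += rng.normal(0.0, 0.01, x1.shape)         # ...plus plastic rearrangements
print(nonaffine_part(x0, x1, 0.1)[1])         # RMS non-affine displacement
```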
Mägi, Reedik; Horikoshi, Momoko; Sofer, Tamar; Mahajan, Anubha; Kitajima, Hidetoshi; Franceschini, Nora; McCarthy, Mark I.; Morris, Andrew P.
2017-01-01
Trans-ethnic meta-analysis of genome-wide association studies (GWAS) across diverse populations can increase power to detect complex trait loci when the underlying causal variants are shared between ancestry groups. However, heterogeneity in allelic effects between GWAS at these loci can occur that is correlated with ancestry. Here, a novel approach is presented to detect SNP association and quantify the extent of heterogeneity in allelic effects that is correlated with ancestry. We employ trans-ethnic meta-regression to model allelic effects as a function of axes of genetic variation, derived from a matrix of mean pairwise allele frequency differences between GWAS, and implemented in the MR-MEGA software. Through detailed simulations, we demonstrate increased power to detect association for MR-MEGA over fixed- and random-effects meta-analysis across a range of scenarios of heterogeneity in allelic effects between ethnic groups. We also demonstrate improved fine-mapping resolution, in loci containing a single causal variant, compared to these meta-analysis approaches and PAINTOR, and equivalent performance to MANTRA at reduced computational cost. Application of MR-MEGA to trans-ethnic GWAS of kidney function in 71,461 individuals indicates stronger signals of association than fixed-effects meta-analysis when heterogeneity in allelic effects is correlated with ancestry. Application of MR-MEGA to fine-mapping four type 2 diabetes susceptibility loci in 22,086 cases and 42,539 controls highlights: (i) strong evidence for heterogeneity in allelic effects that is correlated with ancestry only at the index SNP for the association signal at the CDKAL1 locus; and (ii) 99% credible sets with six or fewer variants for five distinct association signals.
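A minimal per-SNP sketch of the meta-regression idea, assuming inverse-variance weights and precomputed axes of genetic variation; this illustrates the statistical model only and is not the MR-MEGA implementation.

```python
import numpy as np

def meta_regression(beta, se, axes):
    """beta, se : (K,) allelic effects and standard errors across K GWAS
    axes       : (K, T) axes of genetic variation for each GWAS
    Returns regression coefficients and a partition of Cochran's Q into an
    ancestry-correlated part (explained by the axes) and a residual part."""
    w = 1.0 / se**2                               # inverse-variance weights
    X = np.column_stack([np.ones_like(beta), axes])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * X, sw * beta, rcond=None)
    fitted = X @ coef
    beta_fe = np.sum(w * beta) / np.sum(w)        # fixed-effects estimate
    q_total = np.sum(w * (beta - beta_fe) ** 2)   # total heterogeneity
    q_resid = np.sum(w * (beta - fitted) ** 2)    # residual heterogeneity
    return coef, q_total - q_resid, q_resid
```

Under the null of no ancestry-correlated heterogeneity, the explained part of Q is asymptotically chi-square with as many degrees of freedom as there are axes, which is what makes the partition testable.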
NASA Astrophysics Data System (ADS)
De Lucia, Marco; Kempka, Thomas; Afanasyev, Andrey; Melnik, Oleg; Kühn, Michael
2016-04-01
Coupled reactive transport simulations, especially in heterogeneous settings with multiphase flow, are extremely time consuming and suffer from significant numerical issues compared to purely hydrodynamic simulations. This represents a major hurdle in the assessment of geological subsurface utilization, since it constrains practical reactive transport modelling to coarse spatial discretizations or oversimplified geological settings. To overcome these limitations, De Lucia et al. [1] developed and validated a one-way coupling approach between geochemistry and hydrodynamics, which is particularly well suited for CO2 storage simulations while being of general validity. In the present study, the models used to validate that one-way coupling approach, originally run with the TOUGHREACT simulator, are transferred to and benchmarked against the multiphase reservoir simulator MUFITS [2]. The geological model is loosely inspired by an existing CO2 storage site. Its grid comprises 2,950 elements enclosed in a single layer, but reflects a realistic three-dimensional anticline geometry. For the purpose of this comparison, homogeneous and heterogeneous scenarios in terms of porosity and permeability were investigated. In both cases, the results of the MUFITS simulator are in excellent agreement with those produced by the fully coupled TOUGHREACT simulator, while benefiting from significantly higher computational performance. This study demonstrates how a computationally efficient simulator such as MUFITS can be successfully included in a coupled process simulation framework, and it suggests improvements and specific strategies for coupling chemical processes with hydrodynamics and heat transport, aiming to tackle geoscientific problems beyond the storage of CO2. References: [1] De Lucia, M., Kempka, T., and Kühn, M. A coupling alternative to reactive transport simulations for long-term prediction of chemical reactions in heterogeneous CO2 storage systems, Geosci. Model Dev., 8, 279-294, 2015, doi:10.5194/gmd-8-279-2015. [2] Afanasyev, A.A. Application of the reservoir simulator MUFITS for 3D modeling of CO2 storage in geological formations, Energy Procedia, 40, 365-374, 2013, doi:10.1016/j.egypro.2013.08.042.
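A toy sketch of the one-way coupling pattern: chemistry is precomputed as a lookup table and applied to hydrodynamic output as a post-processing step rather than being solved inside the transport loop. The table, axes, and variable names are hypothetical stand-ins, not TOUGHREACT or MUFITS data.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical precomputed batch-geochemistry table: reaction extent as a
# function of elapsed time and CO2 saturation in a cell.
t_axis = np.linspace(0.0, 100.0, 11)        # years
s_axis = np.linspace(0.0, 1.0, 21)          # CO2 saturation
calcite_change = -0.5 * t_axis[:, None] * s_axis[None, :]  # toy values, mol/m^3

lookup = RegularGridInterpolator((t_axis, s_axis), calcite_change)

def postprocess_chemistry(t, co2_saturation):
    """Map hydrodynamic output onto geochemical change, cell by cell,
    without re-running any chemistry solver."""
    pts = np.column_stack([np.full_like(co2_saturation, t), co2_saturation])
    return lookup(pts)

print(postprocess_chemistry(50.0, np.array([0.0, 0.3, 0.9])))
```

The design choice is the one the abstract describes: trading the generality of a fully coupled solve for a hydrodynamics-only run plus a cheap table lookup.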
NEXUS - Resilient Intelligent Middleware
NASA Astrophysics Data System (ADS)
Kaveh, N.; Hercock, R. Ghanea
Service-oriented computing, a composition of distributed-object, component-based, and Web-based computing concepts, is becoming the widespread choice for developing dynamic heterogeneous software assets available as services across a network. One of the major strengths of service-oriented technologies is the high abstraction layer and large granularity at which software assets are viewed compared to traditional object-oriented technologies. Collaboration through encapsulated, separately defined service interfaces creates a service-oriented environment in which multiple services can be linked together through their interfaces to compose a functional system. This approach enables better integration of legacy and non-legacy services via wrapper interfaces and allows service composition at a more abstract level, especially in cases such as vertical market stacks. The heterogeneous nature of service-oriented technologies and the granularity of their software components make them a suitable computing model for the pervasive domain.
Strong Effects of Vs30 Heterogeneity on Physics-Based Scenario Ground-Shaking Computations
NASA Astrophysics Data System (ADS)
Louie, J. N.; Pullammanappallil, S. K.
2014-12-01
Hazard mapping and building codes worldwide use the vertically time-averaged shear-wave velocity between the surface and 30 meters depth, Vs30, as one predictor of earthquake ground shaking. Intensive field campaigns a decade ago in Reno, Los Angeles, and Las Vegas measured urban Vs30 transects with 0.3-km spacing. The Clark County, Nevada, Parcel Map includes urban Las Vegas and comprises over 10,000 site measurements across 1500 km2, completed in 2010. All of these data demonstrate fractal spatial statistics, with a fractal dimension of 1.5-1.8 at scale lengths from 0.5 km to 50 km. Vs measurements in boreholes up to 400 m deep show very similar statistics at 1 m to 200 m lengths. When included in physics-based earthquake-scenario ground-shaking computations, the highly heterogeneous Vs30 maps exhibit unexpectedly strong influence. In sensitivity tests, low-frequency computations at 0.1 Hz display amplifications (as well as de-amplifications) of 20% due solely to Vs30. In 0.5-1.0 Hz computations, the amplifications are a factor of two or more. At 0.5 Hz and higher frequencies the amplifications can be larger than what the 1-d Building Code equations would predict from the Vs30 variations. Vs30 heterogeneities at one location have strong influence on amplifications at other locations, stretching out in the predominant direction of wave propagation for that scenario. The sensitivity tests show that shaking and amplifications are highly scenario-dependent. Animations of computed ground motions and how they evolve with time suggest that the fractal Vs30 variance acts to trap wave energy and increases the duration of shaking. Validations of the computations against recorded ground motions, possible in Las Vegas Valley due to the measurements of the Clark County Parcel Map, show that ground motion levels and amplifications match, while recorded shaking has longer duration than computed shaking. Several mechanisms may explain the amplification and increased duration of shaking in the presence of heterogeneous spatial distributions of Vs: conservation of wave energy across velocity changes; geometric focusing of waves by low-velocity lenses; vertical resonance and trapping; horizontal resonance and trapping; and multiple conversion of P- to S-wave energy.
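Vs30 itself is a simple harmonic (travel-time) average; the sketch below computes it from a layered velocity profile, truncating at 30 m depth. The example profile is invented for illustration.

```python
def vs30(thickness_m, vs_mps):
    """Time-averaged shear-wave velocity over the top 30 m:
    Vs30 = 30 / sum_i(h_i / v_i), truncating the profile at 30 m.
    Assumes the layers, listed top down, reach at least 30 m depth."""
    depth, travel_time = 0.0, 0.0
    for h, v in zip(thickness_m, vs_mps):
        h = min(h, 30.0 - depth)      # clip the layer at 30 m depth
        travel_time += h / v
        depth += h
        if depth >= 30.0:
            break
    return 30.0 / travel_time

# 5 m of slow soil over stiffer layers: about 394 m/s
print(vs30([5.0, 10.0, 40.0], [180.0, 350.0, 760.0]))
```

Because the average is harmonic, a thin slow layer drags Vs30 down disproportionately, which is one reason fine-scale heterogeneity matters so much in the computations described above.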
Filters for Improvement of Multiscale Data from Atomistic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, David J.; Reynolds, Daniel R.
2017-01-05
Multiscale computational models strive to produce accurate and efficient numerical simulations of systems involving interactions across multiple spatial and temporal scales that typically differ by several orders of magnitude. Some such models utilize a hybrid continuum-atomistic approach combining continuum approximations with first-principles-based atomistic models to capture multiscale behavior. By following the heterogeneous multiscale method framework for developing multiscale computational models, unknown continuum scale data can be computed from an atomistic model. Concurrently coupling the two models requires performing numerous atomistic simulations, which can dominate the computational cost of the method. Furthermore, when the resulting continuum data is noisy due to sampling error, stochasticity in the model, or randomness in the initial conditions, filtering can result in significant accuracy gains in the computed multiscale data without increasing the size or duration of the atomistic simulations. In this work, we demonstrate the effectiveness of spectral filtering for increasing the accuracy of noisy multiscale data obtained from atomistic simulations. Moreover, we present a robust and automatic method for closely approximating the optimum level of filtering in the case of additive white noise. Improving the accuracy of the filtered simulation data yields dramatic computational savings by allowing shorter and smaller atomistic simulations to achieve the same desired multiscale simulation precision.
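A minimal sketch of the basic spectral filtering mechanism for additive white noise: transform, zero the high-frequency tail, transform back. The automatic cutoff selection described in the paper is not reproduced; keep_fraction is chosen by hand here.

```python
import numpy as np

def spectral_lowpass(signal, keep_fraction):
    """Zero the high-frequency tail of the DFT. For data that is smooth plus
    additive white noise, this removes mostly noise, because the true
    signal's energy concentrates in the low-frequency modes."""
    n = len(signal)
    fhat = np.fft.rfft(signal)
    cutoff = max(1, int(keep_fraction * len(fhat)))
    fhat[cutoff:] = 0.0
    return np.fft.irfft(fhat, n=n)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 512, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 7 * t)
noisy = clean + rng.normal(0.0, 0.4, t.size)
filtered = spectral_lowpass(noisy, keep_fraction=0.05)
# mean absolute error before and after filtering
print(np.abs(noisy - clean).mean(), np.abs(filtered - clean).mean())
```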
Jiang, Yuyi; Shao, Zhiqing; Guo, Yi
2014-01-01
A complex computing problem can be solved efficiently on a system with multiple computing nodes by dividing its implementation code into several parallel processing modules or tasks that can be formulated as directed acyclic graph (DAG) problems. The DAG jobs may be mapped to and scheduled on the computing nodes to minimize the total execution time. Finding an optimal DAG schedule is NP-complete. This paper proposes a tuple molecular structure-based chemical reaction optimization (TMSCRO) method for DAG scheduling on heterogeneous computing systems, based on the recently proposed metaheuristic chemical reaction optimization (CRO). Compared with other CRO-based algorithms for DAG scheduling, the design of TMSCRO's tuple reaction molecular structure and its four elementary reaction operators is more systematic. TMSCRO also applies the concepts of constrained critical paths (CCPs), the constrained-critical-path directed acyclic graph (CCPDAG), and a super molecule to accelerate convergence. Simulation experiments on a large set of randomly generated graphs, as well as on graphs from real-world problems, verify the effectiveness and efficiency of TMSCRO.
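TMSCRO itself involves CRO molecule encodings and reaction operators; as a baseline for the underlying problem, the sketch below runs a simple earliest-finish-time list scheduler over a DAG on heterogeneous processors, ignoring communication costs for brevity. All names are our own, not the paper's.

```python
def list_schedule(tasks, deps, cost):
    """tasks: task ids in topological order
    deps : dict task -> list of predecessor tasks
    cost : dict (task, proc) -> execution time on that processor
    Returns (makespan, task-to-processor assignment)."""
    procs = sorted({p for (_, p) in cost})
    proc_free = {p: 0.0 for p in procs}   # when each processor goes idle
    finish, where = {}, {}
    for t in tasks:
        ready = max((finish[d] for d in deps.get(t, [])), default=0.0)
        # earliest-finish-time rule over heterogeneous processors
        best = min(procs,
                   key=lambda p: max(ready, proc_free[p]) + cost[(t, p)])
        start = max(ready, proc_free[best])
        finish[t] = start + cost[(t, best)]
        proc_free[best] = finish[t]
        where[t] = best
    return max(finish.values()), where

cost = {("a", 0): 3, ("a", 1): 5, ("b", 0): 4, ("b", 1): 2,
        ("c", 0): 6, ("c", 1): 3}
print(list_schedule(["a", "b", "c"], {"c": ["a", "b"]}, cost))
```

Greedy schedulers like this are fast but can land in poor local optima on irregular graphs, which is the gap metaheuristics such as TMSCRO aim to close.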
Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Bailey, David (Technical Monitor)
1996-01-01
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the widespread introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitoring/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.
Design and implementation of a high performance network security processor
NASA Astrophysics Data System (ADS)
Wang, Haixin; Bai, Guoqiang; Chen, Hongyi
2010-03-01
The last few years have seen significant progress in the field of application-specific processors. One example is network security processors (NSPs), which perform various cryptographic operations specified by network security protocols and help to offload the computation-intensive burden from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps-rate NSP, which is programmable with domain-specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000-based FPGA chipset. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
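As a software analogue of the descriptor-based control flow (not the NSP hardware), the sketch below fragments a packet into descriptor-tagged chunks, dispatches them to a pool of parallel worker "engines", and reassembles the results in sequence order. The fragment size and the XOR stand-in cipher are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

FRAG = 4096  # bytes per fragment (hypothetical)

def make_descriptors(packet, op):
    """Tag each fragment with an operation and a sequence number."""
    return [{"op": op, "seq": i, "data": packet[o:o + FRAG]}
            for i, o in enumerate(range(0, len(packet), FRAG))]

def engine(desc):
    # stand-in for an AES/SHA engine; XOR keeps the sketch self-contained
    return desc["seq"], bytes(b ^ 0x5A for b in desc["data"])

packet = bytes(range(256)) * 64            # a 16 KiB "packet"
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(engine, make_descriptors(packet, "encrypt")))
ciphertext = b"".join(data for _, data in sorted(results))
assert len(ciphertext) == len(packet)
```

Fragmenting ahead of dispatch is what keeps all engines busy on a single large packet instead of serializing it through one engine, mirroring the throughput argument made in the abstract.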