NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
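As a rough illustration of one common device-level load-balancing strategy for such MC photon simulations (not necessarily the authors' exact scheme), the sketch below runs a small calibration batch per device and then splits the photon budget in proportion to the measured throughput. The function names, device labels, and numbers are illustrative assumptions.

```python
# Hypothetical sketch of device-level load balancing for a photon-transport MC run:
# each device first runs a short calibration batch, and the full photon budget is then
# split in proportion to the measured throughput. Names and numbers are illustrative.
import time


def measure_throughput(run_batch, devices, calib_photons=10_000):
    """Return photons/second for each device from a small calibration batch."""
    rates = {}
    for dev in devices:
        start = time.perf_counter()
        run_batch(dev, calib_photons)          # launch calib_photons on this device
        rates[dev] = calib_photons / (time.perf_counter() - start)
    return rates


def partition_photons(total_photons, rates):
    """Split the photon budget proportionally to device throughput."""
    total_rate = sum(rates.values())
    shares = {dev: int(total_photons * r / total_rate) for dev, r in rates.items()}
    # give any rounding remainder to the fastest device
    fastest = max(rates, key=rates.get)
    shares[fastest] += total_photons - sum(shares.values())
    return shares


if __name__ == "__main__":
    def fake_batch(dev, n):                    # stand-in for an OpenCL kernel launch
        time.sleep(n / {"cpu": 2e6, "gpu0": 2e7, "gpu1": 1.5e7}[dev])

    devs = ["cpu", "gpu0", "gpu1"]
    rates = measure_throughput(fake_batch, devs)
    print(partition_photons(100_000_000, rates))
```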
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network
NASA Astrophysics Data System (ADS)
Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan
2017-01-01
The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). Specialists in parallel and distributed computing pay particular attention to the scalability of applications for problem solving, yet effective management of a scalable application in a heterogeneous distributed computing environment remains a non-trivial issue; control systems that operate in networks are especially affected. We propose a new approach to multi-agent management of scalable applications in a heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming, and we have developed a dedicated framework to automate problem solving. The advantages of the proposed approach are demonstrated on an example: parametric synthesis of a static linear regulator for complex dynamic systems. The benefits of the scalable application for this problem include automation of multi-agent control of the systems in a parallel mode at various degrees of detail.
Job Scheduling in a Heterogeneous Grid Environment
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Smith, Warren; Oliker, Leonid; Biswas, Rupak
2004-01-01
Computational grids have the potential for solving large-scale scientific problems using heterogeneous and geographically distributed resources. However, a number of major technical hurdles must be overcome before this potential can be realized. One problem that is critical to effective utilization of computational grids is the efficient scheduling of jobs. This work addresses this problem by describing and evaluating a grid scheduling architecture and three job migration algorithms. The architecture is scalable and does not assume control of local site resources. The job migration policies use the availability and performance of computer systems, the network bandwidth available between systems, and the volume of input and output data associated with each job. An extensive performance comparison is presented using real workloads from leading computational centers. The results, based on several key metrics, demonstrate that the performance of our distributed migration algorithms is significantly greater than that of a local scheduling framework and comparable to a non-scalable global scheduling approach.
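To make the migration policies concrete, here is a toy version of the kind of cost model they rely on: the estimated completion time at each candidate site is the sum of queue wait, compute time at that site's relative speed, and the time to move the job's input and output data over the available bandwidth. The field names and numbers are illustrative assumptions, not the paper's actual model.

```python
# Toy cost model for grid job migration (illustrative only): pick the site that
# minimizes estimated completion time = queue wait + run time + data transfer time.
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    relative_speed: float      # >1.0 means faster than the reference system
    queue_wait_s: float        # current estimated wait in the local queue
    bandwidth_mbps: dict       # achievable bandwidth from other sites, in Mbit/s


@dataclass
class Job:
    origin: str
    base_runtime_s: float      # runtime on the reference system
    input_mb: float
    output_mb: float


def estimated_completion(job: Job, site: Site) -> float:
    compute = job.base_runtime_s / site.relative_speed
    if site.name == job.origin:
        transfer = 0.0
    else:
        mbps = site.bandwidth_mbps[job.origin]
        transfer = (job.input_mb + job.output_mb) * 8.0 / mbps
    return site.queue_wait_s + compute + transfer


def best_site(job: Job, sites: list) -> Site:
    return min(sites, key=lambda s: estimated_completion(job, s))


if __name__ == "__main__":
    sites = [
        Site("A", 1.0, 3600, {"B": 100, "C": 400}),
        Site("B", 2.0, 7200, {"A": 100, "C": 200}),
        Site("C", 1.5, 600,  {"A": 400, "B": 200}),
    ]
    job = Job(origin="A", base_runtime_s=10_000, input_mb=5_000, output_mb=1_000)
    print(best_site(job, sites).name)
```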
NASA Astrophysics Data System (ADS)
Manfredi, Sabato
2016-06-01
Large-scale dynamic systems are becoming highly pervasive, with applications ranging from systems biology and environment monitoring to sensor networks and power systems. They are characterised by high dimensionality, complexity, and uncertainty in the node dynamics/interactions, and they require increasingly computationally demanding methods for analysis and control design as the network size and the node system/interaction complexity grow. It is therefore a challenging problem to find scalable computational methods for the distributed control design of large-scale networks. In this paper, we investigate the robust distributed stabilisation problem of large-scale nonlinear multi-agent systems (briefly, MASs) composed of non-identical (heterogeneous) linear dynamical systems coupled by uncertain nonlinear time-varying interconnections. By employing Lyapunov stability theory and the linear matrix inequality (LMI) technique, new conditions are given for the distributed control design of large-scale MASs that can be easily solved with MATLAB toolboxes. Stabilisability of each node dynamic is a sufficient assumption to design a globally stabilising distributed control. The proposed approach improves some of the existing LMI-based results on MASs by both overcoming their computational limits and extending the applicable scenario to large-scale nonlinear heterogeneous MASs. Additionally, the proposed LMI conditions are further reduced in terms of computational requirements in the case of weakly heterogeneous MASs, a common scenario in real applications where the network nodes and links are affected by parameter uncertainties. One of the main advantages of the proposed approach is that it allows a move from a centralised towards a distributed computing architecture, so that the expensive computational workload spent solving LMIs may be shared among processors located at the networked nodes, thus increasing the scalability of the approach with the network size. Finally, a numerical example shows the applicability of the proposed method and its advantage in terms of computational complexity when compared with existing approaches.
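For reference, the stabilisability assumption on each node (with assumed node dynamics of the form $\dot{x}_i = A_i x_i + B_i u_i$) can be checked with the textbook state-feedback LMI obtained through the change of variables $Q = P^{-1}$, $Y = K_i Q$; this is only the single-node building block, not the paper's coupled condition, which also accounts for the uncertain nonlinear interconnections:

```latex
\[
\text{find } Q \succ 0,\; Y \quad \text{such that} \quad
A_i Q + Q A_i^{\top} + B_i Y + Y^{\top} B_i^{\top} \prec 0,
\qquad K_i = Y Q^{-1}.
\]
```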
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
Song, Fengguang; Dongarra, Jack
2014-10-01
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
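The coordination-free flavor of such a distributed task-assignment protocol can be sketched as follows: every process applies the same deterministic rule to a tile task's indices, so all processes agree on the owner without exchanging any messages. This is an illustration of the general principle (a 2D block-cyclic rule is assumed here), not the paper's actual protocol.

```python
# Illustrative, coordination-free task assignment for tile algorithms: every process
# applies the same deterministic rule to a task's tile indices, so ownership is agreed
# upon without any messages. 2D block-cyclic mapping shown; not the paper's protocol.
from collections import namedtuple

Task = namedtuple("Task", "kind i j deps")     # e.g. kind="GEMM", tile indices (i, j)


def owner(i, j, proc_rows, proc_cols):
    """Deterministic 2D block-cyclic owner of tile (i, j)."""
    return (i % proc_rows) * proc_cols + (j % proc_cols)


def my_tasks(all_tasks, my_rank, proc_rows, proc_cols):
    """Each process filters the same task list independently -- no coordination."""
    return [t for t in all_tasks if owner(t.i, t.j, proc_rows, proc_cols) == my_rank]


if __name__ == "__main__":
    # Tiny Cholesky-like task list over a 3x3 tile matrix (dependencies elided).
    tasks = [Task("POTRF", k, k, ()) for k in range(3)] + \
            [Task("TRSM", i, k, ()) for k in range(3) for i in range(k + 1, 3)]
    for rank in range(4):
        print(rank, [(t.kind, t.i, t.j) for t in my_tasks(tasks, rank, 2, 2)])
```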
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Fengguang; Dongarra, Jack
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics become urgent because many research activities are constrained by the inability of software or tool that even could not complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may be not processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environment to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high-performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.
Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi
2016-01-01
This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.
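PyEHR's actual driver API is not reproduced here; the sketch below only illustrates the general pattern of a common driver interface with pluggable persistence back ends (an in-memory driver stands in for the MongoDB and Elasticsearch drivers, and all class and method names are assumptions).

```python
# Generic sketch of a "common driver interface" with pluggable persistence back ends.
# The class and method names are illustrative, not PyEHR's actual API; an in-memory
# driver stands in for real MongoDB/Elasticsearch drivers.
from abc import ABC, abstractmethod
import uuid


class RecordDriver(ABC):
    @abstractmethod
    def add_record(self, record: dict) -> str: ...

    @abstractmethod
    def get_record(self, record_id: str) -> dict: ...

    @abstractmethod
    def search(self, **criteria) -> list: ...


class InMemoryDriver(RecordDriver):
    """Stand-in back end; a MongoDB or Elasticsearch driver would expose the same API."""
    def __init__(self):
        self._store = {}

    def add_record(self, record):
        record_id = str(uuid.uuid4())
        self._store[record_id] = dict(record)
        return record_id

    def get_record(self, record_id):
        return self._store[record_id]

    def search(self, **criteria):
        return [r for r in self._store.values()
                if all(r.get(k) == v for k, v in criteria.items())]


DRIVERS = {"memory": InMemoryDriver}            # e.g. also "mongodb", "elasticsearch"


def get_driver(name: str) -> RecordDriver:
    return DRIVERS[name]()


if __name__ == "__main__":
    db = get_driver("memory")
    rid = db.add_record({"archetype": "openEHR-EHR-OBSERVATION.blood_pressure.v1",
                         "systolic": 120})
    print(db.get_record(rid), db.search(systolic=120))
```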
A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data
Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi
2016-01-01
This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR’s formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called “Constant Load” and “Constant Number of Records”, with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes. PMID:27936191
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures are currently undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability, together with issues including data distribution, software heterogeneity, and ad hoc hardware availability, commonly forces scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
Scuba: scalable kernel-based gene prioritization.
Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio
2018-01-25
The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data are highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
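Scuba's actual optimization is margin-distribution based; the sketch below only illustrates the basic multiple-kernel idea it builds on: combine per-source kernel matrices with non-negative weights and rank candidate genes by their combined similarity to the known disease genes. The weights and toy data are assumptions for illustration.

```python
# Minimal illustration of the multiple-kernel idea behind kernel-based gene
# prioritization (not Scuba's margin-distribution optimization): combine one kernel
# per data source with non-negative weights, then rank candidate genes by their
# combined similarity to the known disease genes.
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices, with weights normalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))


def prioritize(kernels, weights, seed_idx):
    """Score every gene by its mean combined-kernel similarity to seed (disease) genes."""
    K = combine_kernels(kernels, weights)
    scores = K[:, seed_idx].mean(axis=1)
    scores[seed_idx] = -np.inf                    # do not re-rank the seeds themselves
    return np.argsort(-scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    # Two toy data sources -> two positive semidefinite kernels.
    X1, X2 = rng.normal(size=(n, 10)), rng.normal(size=(n, 5))
    K1, K2 = X1 @ X1.T, X2 @ X2.T
    ranking = prioritize([K1, K2], weights=[0.7, 0.3], seed_idx=np.array([0, 1, 2]))
    print(ranking[:10])
```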
A Domain-Specific Language for Aviation Domain Interoperability
ERIC Educational Resources Information Center
Comitz, Paul
2013-01-01
Modern information systems require a flexible, scalable, and upgradeable infrastructure that allows communication and collaboration between heterogeneous information processing and computing environments. Aviation systems from different organizations often use differing representations and distribution policies for the same data and messages,…
Gogoshin, Grigoriy; Boerwinkle, Eric
2017-01-01
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc. PMID:27681505
Gogoshin, Grigoriy; Boerwinkle, Eric; Rodin, Andrei S
2017-04-01
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs. To overcome these and other obstacles, we present BNOmics, an improved algorithm and software toolkit for inferring and analyzing BNs from omics datasets. BNOmics aims at comprehensive systems biology-type data exploration, including both generating new biological hypotheses and testing and validating the existing ones. Novel aspects of the algorithm center around increasing scalability and applicability to varying data types (with different explicit and implicit distributional assumptions) within the same analysis framework. An output and visualization interface to widely available graph-rendering software is also included. Three diverse applications are detailed. BNOmics was originally developed in the context of genetic epidemiology data and is being continuously optimized to keep pace with the ever-increasing inflow of available large-scale omics datasets. As such, the software scalability and usability on less than exotic computer hardware are a priority, as well as the applicability of the algorithm and software to heterogeneous datasets containing many data types: single-nucleotide polymorphisms and other genetic/epigenetic/transcriptome variables, metabolite levels, epidemiological variables, endpoints, and phenotypes, etc.
AWE-WQ: fast-forwarding molecular dynamics using the accelerated weighted ensemble.
Abdul-Wahid, Badi'; Feng, Haoyun; Rajan, Dinesh; Costaouec, Ronan; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2014-10-27
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently, the weighted ensemble (WE) family of methods has emerged; these methods can flexibly and efficiently sample conformational space without being trapped and allow calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and support for arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein, which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy.
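Below is a compact sketch of the weighted-ensemble resampling step that AWE-style methods are built on: bin the walkers, then split or merge within each bin so every bin holds a target number of walkers while the total probability weight is conserved. The short-trajectory propagation is stubbed out as a random walk, and the binning, target count, and structure are illustrative assumptions rather than AWE-WQ's implementation.

```python
# Compact sketch of a weighted-ensemble (WE) iteration: propagate many short walkers,
# bin them, then split/merge inside each bin to a target count while conserving total
# weight. The propagation is a stand-in random walk, not a real MD engine or AWE-WQ.
import random
from collections import defaultdict

TARGET_PER_BIN = 4


def propagate(x):
    """Stub for a short MD segment run by a worker (e.g. dispatched via Work Queue)."""
    return x + random.gauss(0.0, 0.1)


def bin_of(x):
    return int(x // 0.5)                       # simple 1D progress-coordinate bins


def resample(walkers):
    """Split/merge walkers per bin to TARGET_PER_BIN, conserving total weight."""
    bins = defaultdict(list)
    for w in walkers:
        bins[bin_of(w["x"])].append(w)
    new_walkers = []
    for members in bins.values():
        total_w = sum(w["weight"] for w in members)
        # sample TARGET_PER_BIN walkers with probability proportional to weight,
        # then give each an equal share of the bin's total weight
        chosen = random.choices(members, weights=[w["weight"] for w in members],
                                k=TARGET_PER_BIN)
        new_walkers += [{"x": c["x"], "weight": total_w / TARGET_PER_BIN}
                        for c in chosen]
    return new_walkers


if __name__ == "__main__":
    walkers = [{"x": 0.0, "weight": 1.0 / 16} for _ in range(16)]
    for _ in range(100):                        # WE iterations
        walkers = [{"x": propagate(w["x"]), "weight": w["weight"]} for w in walkers]
        walkers = resample(walkers)
    print(len(walkers), round(sum(w["weight"] for w in walkers), 6))  # weight stays ~1.0
```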
AWE-WQ: Fast-Forwarding Molecular Dynamics Using the Accelerated Weighted Ensemble
2015-01-01
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently, the weighted ensemble (WE) family of methods has emerged; these methods can flexibly and efficiently sample conformational space without being trapped and allow calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPUs) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and support for arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34-residue protein, which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolding rates were of comparable accuracy. PMID:25207854
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
NASA Astrophysics Data System (ADS)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten
2017-11-01
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
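The flavor of the idea can be shown in one dimension (this is an illustration, not the authors' algorithm): give each particle a cost that reflects its local resolution, then place the subdomain walls at equal-cost quantiles of the cumulative cost rather than at equal-width positions. The cost model and region boundaries below are assumptions.

```python
# One-dimensional illustration of a heterogeneity-aware domain decomposition (not the
# authors' algorithm): weight each particle by an estimated force-computation cost that
# depends on its local resolution, then place subdomain walls at equal-cost quantiles
# instead of equal-width positions.
import numpy as np


def cost_model(x, highres_region=(0.4, 0.6), highres_cost=10.0, lowres_cost=1.0):
    """Per-particle cost: e.g. atomistic (expensive) inside a region, coarse-grained outside."""
    lo, hi = highres_region
    return np.where((x >= lo) & (x <= hi), highres_cost, lowres_cost)


def equal_cost_walls(x, n_domains):
    """Return n_domains-1 wall positions that balance predicted cost per subdomain."""
    order = np.argsort(x)
    xs, cs = x[order], cost_model(x[order])
    cum = np.cumsum(cs)
    targets = cum[-1] * np.arange(1, n_domains) / n_domains
    idx = np.searchsorted(cum, targets)
    return xs[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, size=100_000)
    walls = equal_cost_walls(x, n_domains=8)
    print(np.round(walls, 3))   # walls crowd together inside the expensive region
```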
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on the genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.
Cheetah: A Framework for Scalable Hierarchical Collective Operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua S
2011-01-01
Collective communication operations, used by many scientific applications, tend to limit overall parallel application performance and scalability. Computer systems are becoming more heterogeneous with increasing node and core-per-node counts. Also, a growing number of data-access mechanisms, of varying characteristics, are supported within a single computer system. We describe a new hierarchical collective communication framework that takes advantage of hardware-specific data-access mechanisms. It is flexible, with run-time hierarchy specification and sharing of collective communication primitives between collective algorithms. Data buffers are shared between levels in the hierarchy, reducing collective communication management overhead. We have implemented several versions of the Message Passing Interface (MPI) collective operations, MPI_Barrier() and MPI_Bcast(), and run experiments using up to 49,152 processes on a Cray XT5 and a small InfiniBand-based cluster. At 49,152 processes our barrier implementation outperforms the optimized native implementation by 75%. 32-byte and one-megabyte broadcasts outperform it by 62% and 11%, respectively, with better scalability characteristics. Improvements relative to the default Open MPI implementation are much larger.
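Cheetah itself lives inside the MPI library; as a rough illustration of the hierarchical idea (an inter-node stage among node leaders followed by an intra-node stage), here is an mpi4py sketch that builds the two levels with communicator splitting. It mimics the general structure of hierarchy-aware collectives and is not the Cheetah implementation.

```python
# Rough mpi4py illustration of a two-level (hierarchical) broadcast: one communicator
# per shared-memory node, plus a "leaders" communicator across nodes. This mimics the
# general idea of hierarchy-aware collectives; it is not the Cheetah implementation.
# Run with: mpiexec -n <procs> python hier_bcast.py
from mpi4py import MPI


def hierarchical_bcast(obj, comm):
    node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)        # ranks on the same node
    is_leader = node_comm.rank == 0
    leader_comm = comm.Split(0 if is_leader else MPI.UNDEFINED, comm.rank)

    if is_leader:                                            # stage 1: across nodes
        obj = leader_comm.bcast(obj, root=0)
        leader_comm.Free()
    obj = node_comm.bcast(obj, root=0)                       # stage 2: within the node
    node_comm.Free()
    return obj


if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    data = {"step": 42} if comm.rank == 0 else None
    data = hierarchical_bcast(data, comm)
    print(comm.rank, data)
```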
Folding Proteins at 500 ns/hour with Work Queue.
Abdul-Wahid, Badi'; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2012-10-01
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants called Accelerated Weighted Ensemble Dynamics (AWE) and for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, grids, on multiple architectures (CPU/GPU, 32/64bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour.
Folding Proteins at 500 ns/hour with Work Queue
Abdul-Wahid, Badi’; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A.
2014-01-01
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants called Accelerated Weighted Ensemble Dynamics (AWE) and for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, grids, on multiple architectures (CPU/GPU, 32/64bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour. PMID:25540799
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Azevedo, Eduardo; Abbott, Stephen; Koskela, Tuomas
The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability. Such simulations have not been possible before because the time-to-solution fell short, by a factor of more than 10, of completing one physics case in less than 5 days of wall-clock time. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous application interactions, dynamic repartitioning for balancing the computational work of pushing particles and of grid-related work, scalable and accurate discretization algorithms for non-linear Coulomb collisions, and communication-avoiding subcycling technology for pushing particles on both CPUs and GPUs are also utilized to dramatically improve the scalability and time-to-solution, hence enabling the difficult kinetic ITER edge simulation on a present-day leadership-class computer.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...
2017-11-27
Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of a heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach's capabilities by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.
Using Computing and Data Grids for Large-Scale Science and Engineering
NASA Technical Reports Server (NTRS)
Johnston, William E.
2001-01-01
We use the term "Grid" to refer to a software system that provides uniform and location independent access to geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. These emerging data and computing Grids promise to provide a highly capable and scalable environment for addressing large-scale science problems. We describe the requirements for science Grids, the resulting services and architecture of NASA's Information Power Grid (IPG) and DOE's Science Grid, and some of the scaling issues that have come up in their implementation.
NASA Astrophysics Data System (ADS)
Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin
2016-06-01
CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software onto a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain a speedup of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
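The overlap strategy can be shown schematically in mpi4py/NumPy terms: post non-blocking halo exchanges, update the interior while the messages are in flight, then finish the boundary rows. The actual solver overlaps CUDA kernels with MPI on GPU data, which is not shown; grid sizes, the Jacobi-style stencil, and periodic wrap-around are assumptions for a self-contained example.

```python
# Schematic of overlapping halo communication with interior computation for a 1D-
# decomposed 2D grid (Jacobi-style update, periodic in both directions for brevity).
# The real solver overlaps CUDA kernels with MPI; this sketch only shows the structure.
# Run with: mpiexec -n <procs> python overlap.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.rank, comm.size
up, down = (rank - 1) % size, (rank + 1) % size

nx, ny = 64, 256                                   # local block (plus 2 halo rows)
u = np.random.default_rng(rank).random((nx + 2, ny))
u_new = np.empty_like(u)

send_top, send_bot = u[1].copy(), u[nx].copy()
recv_top, recv_bot = np.empty(ny), np.empty(ny)

# 1) start the non-blocking halo exchange
reqs = [comm.Isend(send_top, dest=up,   tag=0),
        comm.Isend(send_bot, dest=down, tag=1),
        comm.Irecv(recv_bot, source=down, tag=0),
        comm.Irecv(recv_top, source=up,   tag=1)]

# 2) update interior rows that need no halo data (overlapped with communication)
u_new[2:nx] = 0.25 * (u[1:nx - 1] + u[3:nx + 1]
                      + np.roll(u[2:nx], 1, axis=1) + np.roll(u[2:nx], -1, axis=1))

# 3) finish communication, then update the two halo-dependent boundary rows
MPI.Request.Waitall(reqs)
u[0], u[nx + 1] = recv_top, recv_bot
for i in (1, nx):
    u_new[i] = 0.25 * (u[i - 1] + u[i + 1] + np.roll(u[i], 1) + np.roll(u[i], -1))

print(rank, float(u_new[1:nx + 1].mean()))
```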
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Flexible services for the support of research.
Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John
2013-01-28
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.
NASA Astrophysics Data System (ADS)
Benini, Luca
2017-06-01
The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this goal, but technology scaling is losing steam. Energy efficiency improvement will increasingly hinge on architecture, circuits, design techniques such as heterogeneous 3D integration, mixed-signal preprocessing, event-based approximate computing and non-Von-Neumann architectures for scalable acceleration.
Query optimization for graph analytics on linked data using SPARQL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Seokyong; Lee, Sangkeun; Lim, Seung -Hwan
2015-07-01
Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
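As a small illustration of the general premise (graph metrics expressed directly as SPARQL over a triplestore), the sketch below counts directed 3-cycles with rdflib against a toy in-memory graph; the query, prefix, and data are assumptions and this is not the paper's optimized Cray-hosted implementation.

```python
# Small illustration of running a graph-theoretic query (triangle counting) directly
# through SPARQL, here against an in-memory rdflib graph rather than a Cray-hosted
# triplestore. Each directed 3-cycle is counted once by anchoring on its smallest node.
from rdflib import Graph

TTL = """
@prefix ex: <http://example.org/> .
ex:a ex:knows ex:b . ex:b ex:knows ex:c . ex:c ex:knows ex:a .
ex:b ex:knows ex:d . ex:d ex:knows ex:c .
"""

TRIANGLES = """
SELECT (COUNT(*) AS ?triangles) WHERE {
  ?a ?p1 ?b . ?b ?p2 ?c . ?c ?p3 ?a .
  FILTER(?b != ?c && STR(?a) < STR(?b) && STR(?a) < STR(?c))
}
"""

g = Graph()
g.parse(data=TTL, format="turtle")
for row in g.query(TRIANGLES):
    print("triangles:", int(row.triangles))   # the toy graph contains exactly one
```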
Leverage hadoop framework for large scale clinical informatics applications.
Dong, Xiao; Bahroos, Neil; Sadhu, Eugene; Jackson, Tommie; Chukhman, Morris; Johnson, Robert; Boyd, Andrew; Hynes, Denise
2013-01-01
In this manuscript, we present our experiences using the Apache Hadoop framework for high data volume and computationally intensive applications, and discuss some best practice guidelines in a clinical informatics setting. There are three main aspects in our approach: (a) process and integrate diverse, heterogeneous data sources using standard Hadoop programming tools and customized MapReduce programs; (b) after fine-grained aggregate results are obtained, perform data analysis using the Mahout data mining library; (c) leverage the column oriented features in HBase for patient centric modeling and complex temporal reasoning. This framework provides a scalable solution to meet the rapidly increasing, imperative "Big Data" needs of clinical and translational research. The intrinsic advantage of fault tolerance, high availability and scalability of Hadoop platform makes these applications readily deployable at the enterprise level cluster environment.
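The HBase modeling and Mahout analysis are not shown here; below is only a schematic of aspect (a), a custom MapReduce aggregation written as Hadoop Streaming scripts. The CSV field positions, file paths, and job invocation are hypothetical, included only to show the mapper/reducer shape.

```python
# Schematic Hadoop Streaming job for aspect (a): aggregate heterogeneous clinical CSV
# records into per-diagnosis counts. Field positions and paths are hypothetical.
# Example invocation (paths illustrative):
#   hadoop jar hadoop-streaming.jar -input /ehr/encounters -output /ehr/dx_counts \
#          -mapper "python3 dx_count.py map" -reducer "python3 dx_count.py reduce"
import sys


def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) > 3:
            dx_code = fields[3].strip()          # assumed: 4th column = diagnosis code
            if dx_code:
                print(f"{dx_code}\t1")


def reducer():
    current, total = None, 0
    for line in sys.stdin:                        # input arrives sorted by key
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(value)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    {"map": mapper, "reduce": reducer}[sys.argv[1]]()
```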
Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka
2014-11-15
The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at http://www.bi.cs.titech.ac.jp/megadock. Supplementary data are available at Bioinformatics online.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luszczek, Piotr R; Tomov, Stanimire Z; Dongarra, Jack J
We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given for algorithms that form the basis for solving linear systems - the LU, QR, and Cholesky factorizations. To generate the extreme level of parallelism needed for the efficient use of coprocessors, algorithms of interest are redesigned and then split into well-chosen computational tasks. Task execution is scheduled over the computational components of a hybrid system of multi-core CPUs and coprocessors using a lightweight runtime system. The use of lightweight runtime systems keeps scheduling overhead low, while enabling the expression of parallelism through otherwise sequential code. This simplifies the development efforts and allows the exploration of the unique strengths of the various hardware components.
NASA Astrophysics Data System (ADS)
Mapakshi, N. K.; Chang, J.; Nakshatrala, K. B.
2018-04-01
Mathematical models for flow through porous media typically enjoy the so-called maximum principles, which place bounds on the pressure field. It is highly desirable to preserve these bounds on the pressure field in predictive numerical simulations, that is, one needs to satisfy discrete maximum principles (DMP). Unfortunately, many of the existing formulations for flow through porous media models do not satisfy DMP. This paper presents a robust, scalable numerical formulation based on variational inequalities (VI), to model non-linear flows through heterogeneous, anisotropic porous media without violating DMP. VI is an optimization technique that places bounds on the numerical solutions of partial differential equations. To crystallize the ideas, a modification to Darcy equations by taking into account pressure-dependent viscosity will be discretized using the lowest-order Raviart-Thomas (RT0) and Variational Multi-scale (VMS) finite element formulations. It will be shown that these formulations violate DMP, and, in fact, these violations increase with an increase in anisotropy. It will be shown that the proposed VI-based formulation provides a viable route to enforce DMP. Moreover, it will be shown that the proposed formulation is scalable, and can work with any numerical discretization and weak form. A series of numerical benchmark problems are solved to demonstrate the effects of heterogeneity, anisotropy and non-linearity on DMP violations under the two chosen formulations (RT0 and VMS), and that of non-linearity on solver convergence for the proposed VI-based formulation. Parallel scalability on modern computational platforms will be illustrated through strong-scaling studies, which will prove the efficiency of the proposed formulation in a parallel setting. Algorithmic scalability as the problem size is scaled up will be demonstrated through novel static-scaling studies. The performed static-scaling studies can serve as a guide for users to be able to select an appropriate discretization for a given problem size.
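In schematic form (a generic statement of a bound-constrained problem, not the paper's exact weak form), the discrete problem is posed as a variational inequality over the admissible set defined by the maximum-principle bounds:

```latex
\[
\mathcal{K} = \{\, q \in V_h \;:\; p_{\min} \le q \le p_{\max} \ \text{a.e. in } \Omega \,\},
\qquad
\text{find } p \in \mathcal{K} \ \text{such that} \ \langle F(p),\, q - p \rangle \ge 0
\quad \forall\, q \in \mathcal{K},
\]
```

where $F(\cdot)$ denotes the residual of the chosen (RT0 or VMS) weak form of the modified Darcy model with pressure-dependent viscosity, and the bounds $p_{\min}$, $p_{\max}$ are determined by the boundary data and source terms.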
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Harold C.; Ibanez, Daniel Alejandro
This report documents the ASC/ATDM Kokkos deliverable "Production Portable Dy- namic Task DAG Capability." This capability enables applications to create and execute a dynamic task DAG ; a collection of heterogeneous computational tasks with a directed acyclic graph (DAG) of "execute after" dependencies where tasks and their dependencies are dynamically created and destroyed as tasks execute. The Kokkos task scheduler executes the dynamic task DAG on the target execution resource; e.g. a multicore CPU, a manycore CPU such as Intel's Knights Landing (KNL), or an NVIDIA GPU. Several major technical challenges had to be addressed during development of Kokkos' Taskmore » DAG capability: (1) portability to a GPU with it's simplified hardware and micro- runtime, (2) thread-scalable memory allocation and deallocation from a bounded pool of memory, (3) thread-scalable scheduler for dynamic task DAG, (4) usability by applications.« less
Scalable and cost-effective NGS genotyping in the cloud.
Souilmi, Yassine; Lancaster, Alex K; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared B; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter J; Wall, Dennis P
2015-10-15
While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at scales of economy in the 10's of dollars. We take a step towards addressing this challenge, by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows making optimal use of high-performance compute clusters. Here we show that the Amazon Web Service (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Our systematic benchmarking reveals important new insights and considerations to produce clinical turn-around of whole genome analysis optimization and workflow management including strategic batching of individual genomes and efficient cluster resource configuration.
NASA Astrophysics Data System (ADS)
O'Malley, D.; Le, E. B.; Vesselinov, V. V.
2015-12-01
We present a fast, scalable, and highly implementable stochastic inverse method for the characterization of aquifer heterogeneity. The method utilizes recent advances in randomized matrix algebra and exploits the structure of the Quasi-Linear Geostatistical Approach (QLGA), without requiring a structured grid like Fast Fourier Transform (FFT) methods. The QLGA framework is a more stable version of Gauss-Newton iterates for a large number of unknown model parameters, but provides unbiased estimates. The methods are matrix-free and do not require derivatives or adjoints, and are thus ideal for complex models and black-box implementation. We also incorporate randomized least-squares solvers and data-reduction methods, which speed up computation and simulate missing data points. The new inverse methodology is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. Inversion results based on a series of synthetic problems with steady-state and transient calibration data are presented.
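The "randomized matrix algebra" ingredient, in its simplest textbook form, is a randomized range finder followed by a truncated SVD (in the style of Halko et al.); the sketch below illustrates that building block in Python, whereas the authors' implementation is in Julia within MADS and is not reproduced here.

```python
# Textbook randomized range finder + truncated SVD (Halko et al. style), the kind of
# randomized matrix-algebra building block such inverse methods rely on to compress
# large covariance/sensitivity matrices. Illustration only, not the MADS implementation.
import numpy as np


def randomized_svd(A, rank, n_oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # 1) sketch the range of A with a Gaussian test matrix
    omega = rng.standard_normal((n, rank + n_oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # 2) project onto the small subspace and do an exact SVD there
    B = Q.T @ A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_hat)[:, :rank], s[:rank], Vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # synthetic low-rank-plus-noise "covariance-like" matrix
    L = rng.standard_normal((2000, 20))
    A = L @ L.T + 1e-3 * rng.standard_normal((2000, 2000))
    U, s, Vt = randomized_svd(A, rank=20, rng=1)
    err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
    print(f"relative approximation error: {err:.2e}")
```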
Lima, Jakelyne; Cerdeira, Louise Teixeira; Bol, Erick; Schneider, Maria Paula Cruz; Silva, Artur; Azevedo, Vasco; Abelém, Antônio Jorge Gomes
2012-01-01
Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence of this progress, the genome assembly stage demands even more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines; however, this requires hardware and software solutions specially configured for this purpose. Grid computing tries to simplify this process of aggregating resources, but does not always offer the best possible performance due to the heterogeneity and decentralized management of its resources. Thus, it is necessary to develop software that takes these peculiarities into account. To achieve this purpose, we developed an algorithm that optimizes the operation of the de novo assembly software ABySS in grids. We ran ABySS with and without our algorithm in the grid simulator SimGrid. Tests showed that our algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved genome assembly time in computational grids without changing assembly quality. PMID:22461785
Visual Analytics for Heterogeneous Geoscience Data
NASA Astrophysics Data System (ADS)
Pan, Y.; Yu, L.; Zhu, F.; Rilee, M. L.; Kuo, K. S.; Jiang, H.; Yu, H.
2017-12-01
Geoscience data obtained from diverse sources have been routinely leveraged by scientists to study various phenomena. The principal data sources include observations and model simulation outputs. These data are characterized by spatiotemporal heterogeneity originating from different instrument design specifications and/or computational model requirements used in the data generation processes. Such inherent heterogeneity poses several challenges in exploring and analyzing geoscience data. First, scientists often wish to identify features or patterns co-located among multiple data sources to derive and validate certain hypotheses. Heterogeneous data make it a tedious task to search for such features in dissimilar datasets. Second, features of geoscience data are typically multivariate. It is challenging to tackle the high dimensionality of geoscience data and explore the relations among multiple variables in a scalable fashion. Third, there is a lack of transparency in traditional automated approaches, such as feature detection or clustering, in that scientists cannot intuitively interact with their analysis processes and interpret results. To address these issues, we present a new scalable approach that can assist scientists in analyzing voluminous and diverse geoscience data. We expose a high-level query interface that allows users to easily express customized queries to search for features of interest across multiple heterogeneous datasets. For the identified features, we develop a visualization interface that enables interactive exploration and analytics in a linked-view manner. Specific visualization techniques, ranging from scatter plots to parallel coordinates, are employed in each view to allow users to explore various aspects of the features. Different views are linked and refreshed according to user interactions in any individual view. In this manner, a user can interactively and iteratively gain insight into the data through a variety of visual analytics operations. We demonstrate with use cases how scientists can combine the query and visualization interfaces to enable a customized workflow that facilitates studies using heterogeneous geoscience datasets.
Concurrent heterogeneous neural model simulation on real-time neuromimetic hardware.
Rast, Alexander; Galluppi, Francesco; Davies, Sergio; Plana, Luis; Patterson, Cameron; Sharp, Thomas; Lester, David; Furber, Steve
2011-11-01
Dedicated hardware is becoming increasingly essential to simulate emerging very-large-scale neural models. Equally, however, it needs to be able to support multiple models of the neural dynamics, possibly operating simultaneously within the same system. This may be necessary either to simulate large models with heterogeneous neural types, or to simplify simulation and analysis of detailed, complex models in a large simulation by isolating the new model to a small subpopulation of a larger overall network. The SpiNNaker neuromimetic chip is a dedicated neural processor able to support such heterogeneous simulations. Implementing these models on-chip uses an integrated library-based tool chain incorporating the emerging PyNN interface that allows a modeller to input a high-level description and use an automated process to generate an on-chip simulation. Simulations using both LIF and Izhikevich models demonstrate the ability of the SpiNNaker system to generate and simulate heterogeneous networks on-chip, while illustrating, through the network-scale effects of wavefront synchronisation and burst gating, methods that can provide effective behavioural abstractions for large-scale hardware modelling. SpiNNaker's asynchronous virtual architecture permits greater scope for model exploration, with scalable levels of functional and temporal abstraction, than conventional (or neuromorphic) computing platforms. The complete system illustrates a potential path to understanding the neural model of computation, by building (and breaking) neural models at various scales, connecting the blocks, then comparing them against the biology: computational cognitive neuroscience. Copyright © 2011 Elsevier Ltd. All rights reserved.
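For readers unfamiliar with the PyNN interface mentioned above, the following minimal sketch shows how a heterogeneous network mixing LIF and Izhikevich populations can be described. It is a generic PyNN example, not SpiNNaker tool-chain code, and the backend import, population sizes, and synaptic parameters are illustrative assumptions.

    import pyNN.nest as sim  # any installed PyNN backend could be substituted

    sim.setup(timestep=1.0)  # assumed 1 ms integration step

    # Two subpopulations with different neural dynamics in one network
    lif_pop = sim.Population(100, sim.IF_curr_exp(), label="LIF")
    izh_pop = sim.Population(100, sim.Izhikevich(), label="Izhikevich")

    # Background drive plus a sparse projection between the heterogeneous groups
    noise = sim.Population(100, sim.SpikeSourcePoisson(rate=20.0))
    sim.Projection(noise, lif_pop, sim.OneToOneConnector(),
                   synapse_type=sim.StaticSynapse(weight=0.5, delay=1.0))
    sim.Projection(lif_pop, izh_pop, sim.FixedProbabilityConnector(0.1),
                   synapse_type=sim.StaticSynapse(weight=0.3, delay=1.0))

    lif_pop.record("spikes")
    izh_pop.record("spikes")

    sim.run(1000.0)  # one second of model time
    sim.end()

The same high-level description could, in principle, be pointed at a SpiNNaker backend, which is the automated high-level-to-on-chip path the abstract describes.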
Efficient Process Migration for Parallel Processing on Non-Dedicated Networks of Workstations
NASA Technical Reports Server (NTRS)
Chanchio, Kasidit; Sun, Xian-He
1996-01-01
This paper presents the design and preliminary implementation of MpPVM, a software system that supports process migration for PVM application programs in a non-dedicated heterogeneous computing environment. The new concepts of migration points, migration point analysis, and necessary data analysis are introduced. In MpPVM, process migrations occur only at previously inserted migration points. Migration point analysis determines appropriate locations to insert migration points, whereas necessary data analysis provides a minimum set of variables to be transferred at each migration point. A new methodology to perform reliable point-to-point data communications in a migration environment is also discussed. Finally, a preliminary implementation of MpPVM and its experimental results are presented, showing the correctness and promising performance of our process migration mechanism in a scalable non-dedicated heterogeneous computing environment. While MpPVM is developed on top of PVM, the process migration methodology introduced in this study is general and can be applied to any distributed software environment.
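The migration-point idea can be made concrete with a small sketch (not MpPVM code): at an inserted migration point the process checks a migration flag and, if set, serializes only the minimal variable set identified by the necessary data analysis. All names below are hypothetical.

    import pickle

    MIGRATE = False  # in a real system this flag is set asynchronously by the scheduler

    def maybe_migrate(tag, **live_vars):
        """At a migration point, save only the variables that the
        necessary data analysis marked as needed beyond this point."""
        if MIGRATE:
            with open(f"checkpoint_{tag}.pkl", "wb") as f:
                pickle.dump(live_vars, f)
            raise SystemExit(f"state saved at migration point '{tag}'")

    def long_computation(n):
        total = 0.0
        for i in range(n):
            total += i * i
            if i % 10_000 == 0:
                # migration point: only `i` and `total` need to travel
                maybe_migrate("loop", i=i, total=total)
        return total

    print(long_computation(100_000))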
Execution of parallel algorithms on a heterogeneous multicomputer
NASA Astrophysics Data System (ADS)
Isenstein, Barry S.; Greene, Jonathon
1995-04-01
Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using the Message Passing Interface (MPI) standard. The MPI implementation of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures, making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on the massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. Preliminary testing of the developed program has shown scalable performance for reasonably sized computational domains.
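The abstract gives no implementation details, but the typical structure of such an MPI domain-decomposed solver can be sketched with mpi4py: each rank owns a slab of the grid and exchanges one-row halos with its neighbours every iteration. The grid size and the Jacobi-style update below are illustrative stand-ins, not PHOENICS code.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each rank holds a slab of rows plus two halo rows (top and bottom)
    local = np.zeros((100 // size + 2, 100))
    up = rank - 1 if rank > 0 else MPI.PROC_NULL
    down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for it in range(50):
        # exchange halo rows with neighbouring ranks
        comm.Sendrecv(local[1, :], dest=up, recvbuf=local[-1, :], source=down)
        comm.Sendrecv(local[-2, :], dest=down, recvbuf=local[0, :], source=up)
        # Jacobi-style interior update standing in for the real CFD kernel
        local[1:-1, 1:-1] = 0.25 * (local[:-2, 1:-1] + local[2:, 1:-1] +
                                    local[1:-1, :-2] + local[1:-1, 2:])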
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simunovic, S.; Zacharia, T.; Baltas, N.
1995-04-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using the Message Passing Interface (MPI) standard. The MPI implementation of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures, making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on the massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. Preliminary testing of the developed program has shown scalable performance for reasonably sized computational domains.
CFD Research, Parallel Computation and Aerodynamic Optimization
NASA Technical Reports Server (NTRS)
Ryan, James S.
1995-01-01
During the last five years, CFD has matured substantially. Pure CFD research remains to be done, but much of the focus has shifted to integration of CFD into the design process. The work under these cooperative agreements reflects this trend. The recent work, and the work that is planned, is designed to enhance the competitiveness of the US aerospace industry. CFD and optimization approaches are being developed and tested so that the industry can better choose which methods to adopt in their design processes. The range of computer architectures has been dramatically broadened, as the assumption that only huge vector supercomputers could be useful has faded. Today, researchers and industry can trade off time, cost, and availability, choosing vector supercomputers, scalable parallel architectures, networked workstations, or heterogeneous combinations of these to complete required computations efficiently.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.
Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240× speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...
2017-03-08
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Accelerating Climate Simulations Through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark
2009-01-01
Unconventional multi-core processors (e.g., the IBM Cell B/E and NVIDIA GPUs) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for this connection, we identified two challenges: (1) an identical MPI implementation is required on both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors and two IBM QS22 Cell blades, connected with InfiniBand), allowing compute-intensive functions to be seamlessly offloaded to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approximately 10% network overhead.
Solving global shallow water equations on heterogeneous supercomputers
Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen
2017-01-01
The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphics Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Array) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement in performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving close-to-ideal linear scaling efficiency, and sustained double-precision performances of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation of the programming paradigms of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, to form a picture of both the potential performance benefits and the programming efforts involved. PMID:28282428
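The generalized partition scheme described above amounts to sizing sub-domains in proportion to device throughput so that CPU and accelerator partitions finish at roughly the same time. The sketch below illustrates only that balancing idea; the device names and throughput figures are invented for the example and are not taken from the paper.

    def partition_columns(n_columns, measured_throughput):
        """Split a 1-D domain of n_columns among devices proportionally to
        their measured throughput (columns processed per second)."""
        total = sum(measured_throughput.values())
        widths, assigned = {}, 0
        for name, rate in measured_throughput.items():
            widths[name] = int(round(n_columns * rate / total))
            assigned += widths[name]
        # hand any rounding remainder to the fastest device
        fastest = max(measured_throughput, key=measured_throughput.get)
        widths[fastest] += n_columns - assigned
        return widths

    # hypothetical rates obtained from a short calibration run
    print(partition_columns(2048, {"cpu_12core": 1.0, "gpu": 9.0, "mic": 6.5}))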
High-Order/Low-Order methods for ocean modeling
Newman, Christopher; Womeldorff, Geoff; Chacón, Luis; ...
2015-06-01
In this study, we examine a High Order/Low Order (HOLO) approach for a z-level ocean model and show that the traditional semi-implicit and split-explicit methods, as well as a recent preconditioning strategy, can easily be cast in the framework of HOLO methods. The HOLO formulation admits an implicit-explicit method that is algorithmically scalable and second-order accurate, allowing timesteps much larger than the barotropic time scale. We show how HOLO approaches, in particular the implicit-explicit method, can provide a solid route for ocean simulation to heterogeneous computing and exascale environments.
Demand Response Resource Quantification with Detailed Building Energy Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hale, Elaine; Horsey, Henry; Merket, Noel
Demand response is a broad suite of technologies that enables changes in electrical load operations in support of power system reliability and efficiency. Although demand response is not a new concept, there is new appetite for comprehensively evaluating its technical potential in the context of renewable energy integration. The complexity of demand response makes this task difficult -- we present new methods for capturing the heterogeneity of potential responses from buildings, their time-varying nature, and metrics such as thermal comfort that help quantify likely acceptability of specific demand response actions. Computed with an automated software framework, the methods are scalable.
NASA Technical Reports Server (NTRS)
Sanyal, Soumya; Jain, Amit; Das, Sajal K.; Biswas, Rupak
2003-01-01
In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of interconnected autonomous coprocessors, called a Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of the Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
Challenges of Big Data Analysis.
Fan, Jianqing; Han, Fang; Liu, Han
2014-06-01
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features impact paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in high-confidence sets and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
Parallel computing in experimental mechanics and optical measurement: A review (II)
NASA Astrophysics Data System (ADS)
Wang, Tianyi; Kemao, Qian
2018-05-01
With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have been successfully applied to the measurement of various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
Challenges of Big Data Analysis
Fan, Jianqing; Han, Fang; Liu, Han
2014-01-01
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features impact paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in high-confidence sets and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions. PMID:25419469
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines on the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
OWL: A scalable Monte Carlo simulation suite for finite-temperature study of materials
NASA Astrophysics Data System (ADS)
Li, Ying Wai; Yuk, Simuck F.; Cooper, Valentino R.; Eisenbach, Markus; Odbadrakh, Khorgolkhuu
The OWL suite is a simulation package for performing large-scale Monte Carlo simulations. Its object-oriented, modular design enables it to interface with various external packages for energy evaluation. It is therefore applicable to studying the finite-temperature properties of a wide range of systems: from simple classical spin models to materials where the energy is evaluated by ab initio methods. This scheme not only allows for the study of thermodynamic properties based on first-principles statistical mechanics, but also provides a means for massive, multi-level parallelism to fully exploit the capacity of modern heterogeneous computer architectures. We will demonstrate how improved strong and weak scaling is achieved by employing novel, parallel and scalable Monte Carlo algorithms, as well as the application of OWL to a few selected frontier materials research problems. This research was supported by the Office of Science of the Department of Energy under contract DE-AC05-00OR22725.
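The modular design in which the Monte Carlo driver is agnostic to how the energy is computed can be illustrated with a short Metropolis sketch whose energy evaluator is passed in as a callable; the toy Ising-chain energy below is purely illustrative and is not OWL code.

    import math
    import random

    def metropolis(state, energy_fn, propose_fn, beta, n_steps):
        """Generic Metropolis driver: energy_fn may be a classical spin model
        or a wrapper around an external ab initio package."""
        e = energy_fn(state)
        for _ in range(n_steps):
            trial = propose_fn(state)
            e_trial = energy_fn(trial)
            if e_trial <= e or random.random() < math.exp(-beta * (e_trial - e)):
                state, e = trial, e_trial
        return state, e

    # toy 1-D Ising chain acting as the pluggable energy evaluator
    def ising_energy(spins):
        return -sum(s1 * s2 for s1, s2 in zip(spins, spins[1:]))

    def flip_one(spins):
        i = random.randrange(len(spins))
        return spins[:i] + [-spins[i]] + spins[i + 1:]

    spins = [random.choice([-1, 1]) for _ in range(64)]
    print(metropolis(spins, ising_energy, flip_one, beta=1.0, n_steps=5000)[1])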
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Plaza, Javier; Paz, Abel
2010-10-01
Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.
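The idea of shrinking pixel vectors before they traverse the network can be sketched with PyWavelets: small detail coefficients are thresholded away before transmission and the receiver reconstructs an approximation. The wavelet, decomposition level, and threshold below are arbitrary choices for illustration, and a real system would additionally encode the sparse coefficients compactly before sending them.

    import numpy as np
    import pywt

    def compress(pixel_vector, wavelet="db2", level=2, rel_threshold=0.01):
        """Return thresholded wavelet coefficients ready for transmission."""
        coeffs = pywt.wavedec(pixel_vector, wavelet, level=level)
        cutoff = rel_threshold * np.max(np.abs(pixel_vector))
        return [pywt.threshold(c, cutoff, mode="soft") for c in coeffs]

    def decompress(coeffs, wavelet="db2"):
        return pywt.waverec(coeffs, wavelet)

    spectrum = np.random.rand(224)      # a pixel vector with 224 spectral bands
    sent = compress(spectrum)           # what would cross the communication network
    recovered = decompress(sent)[:224]
    print(np.max(np.abs(recovered - spectrum)))  # reconstruction error of the lossy exchange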
Modular Universal Scalable Ion-trap Quantum Computer
2016-06-02
Final Report, 1 August 2010 to 31 January 2016 (approved for public release; distribution unlimited): Modular Universal Scalable Ion-trap Quantum Computer. The main goal of the original MUSIQC proposal was to construct and demonstrate a modular and universally-expandable ion-trap quantum computer. Keywords: ion trap quantum computation, scalable modular architectures.
A scalable neuroinformatics data flow for electrophysiological signals using MapReduce.
Jayapandian, Catherine; Wei, Annan; Ramesh, Priya; Zonjy, Bilal; Lhatoo, Samden D; Loparo, Kenneth; Zhang, Guo-Qiang; Sahoo, Satya S
2015-01-01
Data-driven neuroscience research is providing new insights into the progression of neurological disorders and supporting the development of improved treatment approaches. However, the volume, velocity, and variety of neuroscience data generated from sophisticated recording instruments and acquisition methods have exacerbated the limited scalability of existing neuroinformatics tools. This makes it difficult for neuroscience researchers to effectively leverage the growing multi-modal neuroscience data to advance research in serious neurological disorders, such as epilepsy. We describe the development of the Cloudwave data flow, which uses new data partitioning techniques to store and analyze electrophysiological signals in a distributed computing infrastructure. The Cloudwave data flow uses the MapReduce parallel programming model to implement an integrated signal data processing pipeline that scales with the large volume of data generated at high velocity. Using an epilepsy domain ontology together with an epilepsy-focused extensible data representation format called the Cloudwave Signal Format (CSF), the data flow addresses the challenge of data heterogeneity and is interoperable with existing neuroinformatics data representation formats, such as HDF5. The scalability of the Cloudwave data flow is evaluated using a 30-node cluster installed with the open source Hadoop software stack. The results demonstrate that the Cloudwave data flow can process increasing volumes of signal data by leveraging Hadoop Data Nodes to reduce the total data processing time. The Cloudwave data flow is a template for developing highly scalable neuroscience data processing pipelines that use MapReduce algorithms to support a variety of user applications.
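A minimal MapReduce-style sketch of such a signal pipeline is shown below: the mapper emits per-channel samples and the reducer aggregates a simple feature. It is a generic illustration of the pattern, not Cloudwave code, and the "channel,value" record format is an assumption.

    from collections import defaultdict

    def mapper(lines):
        """Each input record is assumed to be 'channel_id,sample_value'."""
        for line in lines:
            channel, value = line.strip().split(",")
            yield channel, float(value)

    def reducer(pairs):
        """Aggregate a simple per-channel feature (mean amplitude)."""
        sums, counts = defaultdict(float), defaultdict(int)
        for channel, value in pairs:
            sums[channel] += value
            counts[channel] += 1
        for channel in sorted(sums):
            yield channel, sums[channel] / counts[channel]

    if __name__ == "__main__":
        # In Hadoop streaming the mapper and reducer run as separate processes
        # over stdin/stdout; here they are chained over a toy record list.
        demo = ["EEG01,0.5", "EEG01,0.7", "EEG02,1.2", "EEG02,0.8"]
        for channel, mean in reducer(mapper(demo)):
            print(channel, mean)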
A scalable neuroinformatics data flow for electrophysiological signals using MapReduce
Jayapandian, Catherine; Wei, Annan; Ramesh, Priya; Zonjy, Bilal; Lhatoo, Samden D.; Loparo, Kenneth; Zhang, Guo-Qiang; Sahoo, Satya S.
2015-01-01
Data-driven neuroscience research is providing new insights into the progression of neurological disorders and supporting the development of improved treatment approaches. However, the volume, velocity, and variety of neuroscience data generated from sophisticated recording instruments and acquisition methods have exacerbated the limited scalability of existing neuroinformatics tools. This makes it difficult for neuroscience researchers to effectively leverage the growing multi-modal neuroscience data to advance research in serious neurological disorders, such as epilepsy. We describe the development of the Cloudwave data flow, which uses new data partitioning techniques to store and analyze electrophysiological signals in a distributed computing infrastructure. The Cloudwave data flow uses the MapReduce parallel programming model to implement an integrated signal data processing pipeline that scales with the large volume of data generated at high velocity. Using an epilepsy domain ontology together with an epilepsy-focused extensible data representation format called the Cloudwave Signal Format (CSF), the data flow addresses the challenge of data heterogeneity and is interoperable with existing neuroinformatics data representation formats, such as HDF5. The scalability of the Cloudwave data flow is evaluated using a 30-node cluster installed with the open source Hadoop software stack. The results demonstrate that the Cloudwave data flow can process increasing volumes of signal data by leveraging Hadoop Data Nodes to reduce the total data processing time. The Cloudwave data flow is a template for developing highly scalable neuroscience data processing pipelines that use MapReduce algorithms to support a variety of user applications. PMID:25852536
Opportunities and choice in a new vector era
NASA Astrophysics Data System (ADS)
Nowak, A.
2014-06-01
This work discusses the significant changes in the computing landscape related to the progression of Moore's Law, and their implications for scientific computing. Particular attention is devoted to the High Energy Physics (HEP) domain, which has always made good use of threading, but where levels of parallelism closer to the hardware were often left underutilized. Findings of the CERN openlab Platform Competence Center are reported in the context of expanding "performance dimensions", and especially the resurgence of vectors. These suggest that data-oriented designs are feasible in HEP and have considerable potential for performance improvements on multiple levels, but will rarely trump algorithmic enhancements. Finally, an analysis of upcoming hardware and software technologies identifies heterogeneity as a major challenge for software, which will require more emphasis on scalable, efficient design.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
The future of PanDA in ATLAS distributed computing
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
Experiments at the Large Hadron Collider (LHC) face unprecedented computing challenges. Heterogeneous resources are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, while data processing requires more than a few billion hours of computing usage per year. The PanDA (Production and Distributed Analysis) system was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. In the process, the old batch job paradigm of locally managed computing in HEP was discarded in favour of a far more automated, flexible and scalable model. The success of PanDA in ATLAS is leading to widespread adoption and testing by other experiments. PanDA is the first exascale workload management system in HEP, already operating at more than a million computing jobs per day, and processing over an exabyte of data in 2013. There are many new challenges that PanDA will face in the near future, in addition to new challenges of scale, heterogeneity and increasing user base. PanDA will need to handle rapidly changing computing infrastructure, will require factorization of code for easier deployment, will need to incorporate additional information sources including network metrics in decision making, be able to control network circuits, handle dynamically sized workload processing, provide improved visualization, and face many other challenges. In this talk we will focus on the new features, planned or recently implemented, that are relevant to the next decade of distributed computing workload management using PanDA.
Computer simulation of heterogeneous polymer photovoltaic devices
NASA Astrophysics Data System (ADS)
Kodali, Hari K.; Ganapathysubramanian, Baskar
2012-04-01
Polymer-based photovoltaic devices have the potential for widespread usage due to their low cost per watt and mechanical flexibility. Efficiencies close to 9.0% have been achieved recently in conjugated polymer based organic solar cells (OSCs). These devices were fabricated using solvent-based processing of electron-donating and electron-accepting materials into the so-called bulk heterojunction (BHJ) architecture. Experimental evidence suggests that a key property determining the power-conversion efficiency of such devices is the final morphological distribution of the donor and acceptor constituents. In order to understand the role of morphology in device performance, we develop a scalable computational framework that efficiently interrogates OSCs to investigate relationships between the nano-scale morphology and the device performance. In this work, we extend the Buxton and Clarke model (2007 Modelling Simul. Mater. Sci. Eng. 15 13-26) to simulate realistic devices with complex active layer morphologies using a dimensionally independent, scalable, finite-element method. We incorporate all stages involved in current generation, namely (1) exciton generation and diffusion, (2) charge generation, and (3) charge transport, in a modular fashion. The numerical challenges encountered during interrogation of realistic microstructures are detailed. We compare each stage of the photovoltaic process for two microstructures: a BHJ morphology and an idealized sawtooth morphology. The results are presented for both two- and three-dimensional structures.
Heterogeneous scalable framework for multiphase flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morris, Karla Vanessa
2013-09-01
Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for the current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computer platforms. This work created an application based on a software architecture in which the physics and software concerns are separated in a way that adds flexibility to both. The developed spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other applications. The API allows object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects on central processing unit (CPU) and graphics processing unit (GPU) nodes for applications using C++.
Scalable Quantum Networks for Distributed Computing and Sensing
2016-04-01
Final Report (AFRL-AFOSR-UK-TR-2016-0007), Ian Walmsley, The University of Oxford. Only fragments of the abstract are recoverable in this record: quantum memories and guided-wave implementations of same were developed, demonstrating controlled delay of a heralded single ...; second, fundamental scalability requires a method to synchronize protocols based on quantum measurements, which are inherently probabilistic.
WLCG Transfers Dashboard: a Unified Monitoring Tool for Heterogeneous Data Transfers
NASA Astrophysics Data System (ADS)
Andreeva, J.; Beche, A.; Belov, S.; Kadochnikov, I.; Saiz, P.; Tuckett, D.
2014-06-01
The Worldwide LHC Computing Grid provides resources for the four main virtual organizations. Along with data processing, data distribution is the key computing activity on the WLCG infrastructure. The scale of this activity is very large, the ATLAS virtual organization (VO) alone generates and distributes more than 40 PB of data in 100 million files per year. Another challenge is the heterogeneity of data transfer technologies. Currently there are two main alternatives for data transfers on the WLCG: File Transfer Service and XRootD protocol. Each LHC VO has its own monitoring system which is limited to the scope of that particular VO. There is a need for a global system which would provide a complete cross-VO and cross-technology picture of all WLCG data transfers. We present a unified monitoring tool - WLCG Transfers Dashboard - where all the VOs and technologies coexist and are monitored together. The scale of the activity and the heterogeneity of the system raise a number of technical challenges. Each technology comes with its own monitoring specificities and some of the VOs use several of these technologies. This paper describes the implementation of the system with particular focus on the design principles applied to ensure the necessary scalability and performance, and to easily integrate any new technology providing additional functionality which might be specific to that technology.
High performance computation of radiative transfer equation using the finite element method
NASA Astrophysics Data System (ADS)
Badri, M. A.; Jolivet, P.; Rousseau, B.; Favennec, Y.
2018-05-01
This article deals with an efficient strategy for numerically simulating radiative transfer phenomena using distributed computing. The finite element method alongside the discrete ordinate method is used for the spatio-angular discretization of the monochromatic steady-state radiative transfer equation in anisotropically scattering media. Two very different methods of parallelization, angular and spatial decomposition, are presented. To do so, the finite element method is used in a vectorial way. A detailed comparison of scalability, performance, and efficiency on thousands of processors is established for two- and three-dimensional heterogeneous test cases. Timings show that both algorithms scale well when using proper preconditioners. It is also observed that our angular decomposition scheme outperforms our domain decomposition method. Overall, we perform numerical simulations at scales that were previously unattainable by standard radiative transfer equation solvers.
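The angular decomposition strategy can be sketched with mpi4py: each rank owns a subset of the discrete ordinates, performs its per-direction solves, and the ranks then combine their partial contributions to the scalar flux with a reduction. The one-dimensional attenuation formula below is only a stand-in for the finite element transport solve, and all parameter values are assumptions.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_cells, n_ordinates = 200, 32
    sigma_t = np.full(n_cells, 0.5)              # assumed extinction coefficient
    my_dirs = range(rank, n_ordinates, size)     # cyclic ownership of ordinates

    partial_flux = np.zeros(n_cells)
    for d in my_dirs:
        mu = np.cos(np.pi * (d + 0.5) / n_ordinates)   # toy direction cosines
        # stand-in for the per-direction finite element transport solve
        intensity = np.exp(-np.cumsum(sigma_t) / max(abs(mu), 1e-3))
        partial_flux += intensity / n_ordinates

    # combine the angular contributions from all ranks into the scalar flux
    scalar_flux = np.zeros(n_cells)
    comm.Allreduce(partial_flux, scalar_flux, op=MPI.SUM)
    if rank == 0:
        print(scalar_flux[:5])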
Scalable and Resilient Middleware to Handle Information Exchange during Environment Crisis
NASA Astrophysics Data System (ADS)
Tao, R.; Poslad, S.; Moßgraber, J.; Middleton, S.; Hammitzsch, M.
2012-04-01
The EU FP7 TRIDEC project focuses on enabling real-time, intelligent information management of collaborative, complex, critical decision processes for earth management. A key challenge is to provide a communication infrastructure that facilitates interoperable environmental information services during environmental events and crises such as tsunamis and drilling incidents, during which increasing volumes and dimensionality of disparate information sources, both sensor-based and human-based, arise and need to be managed. Such a system needs to support: scalable, distributed messaging; asynchronous messaging; open messaging that handles changing clients, such as new and retired automated systems and human information sources coming online or going offline; flexible data filtering; and heterogeneous access networks (e.g., GSM, WLAN and LAN). In addition, the system needs to be resilient to ICT system problems, e.g. failures, degradation and overloads, during environmental events. There are several system middleware choices for TRIDEC based upon a Service-Oriented Architecture (SOA), Event-Driven Architecture (EDA), Cloud Computing, and an Enterprise Service Bus (ESB). In an SOA, everything is a service (e.g. data access, processing and exchange); clients can request on demand or subscribe to services registered by providers; more often, interaction is synchronous. In an EDA system, events that represent significant changes in state can be processed simply, as streams, or in more complex ways. Cloud computing is a virtualized, interoperable and elastic resource-allocation model. An ESB, a fundamental component for enterprise messaging, supports synchronous and asynchronous message exchange models and has inbuilt resilience against ICT failure. Our middleware proposal is an ESB-based hybrid architecture model: an SOA extension supports more synchronous workflows; EDA assists the ESB in handling more complex event processing; and Cloud computing can be used to increase and decrease the ESB resources on demand. To reify this hybrid ESB-centric architecture, we will adopt two complementary approaches: an open source one for scalability and resilience improvement, and a commercial one for ultra-fast messaging, with a bridge between the two to support interoperability. In TRIDEC, to manage such a hybrid messaging system, overlay and underlay management techniques will be adopted. The managers (both global and local) will collect, store and update status information (e.g. CPU utilization, free space, number of clients) and balance usage, throughput, and delays to improve resilience and scalability. The expected resilience improvements include dynamic failover, self-healing, pre-emptive load balancing, and bottleneck prediction, while the expected scalability improvements include capacity estimation, an HTTP bridge, and automatic configuration and reconfiguration (e.g. adding or deleting clients and servers).
Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin
2015-10-19
The feasibility of software-defined optical networking (SDON) for practical applications critically depends on the scalability of centralized control performance. In this paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for a proof-of-concept demonstration. Efficient RWA algorithms are proposed to achieve high network capacity with reduced computation cost, a significant attribute of a scalable centrally controlled SDON. The proposed heuristic RWA algorithms differ in the order in which requests are processed and in the procedures used for routing table updates. Combined with a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest network throughput and acceptable computation scalability. We further investigate the trade-off between network throughput and computation complexity in the routing table update procedure through a simulation study.
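A hottest-request-first RWA heuristic of the general kind evaluated here can be sketched with networkx: requests are ordered by a score combining demand intensity and end-to-end distance, routed on shortest paths, and given the first wavelength free on every link of the path. The topology, scores, and wavelength count are illustrative assumptions, not taken from the paper.

    import networkx as nx

    def first_fit_rwa(graph, requests, n_wavelengths=4):
        """requests: list of (src, dst, intensity); returns {(src, dst): (path, wavelength)}."""
        used = {}      # (u, v, wavelength) -> occupied wavelength-link
        result = {}

        def hotness(req):
            # weight demand intensity by end-to-end hop distance
            return req[2] * nx.shortest_path_length(graph, req[0], req[1])

        for src, dst, intensity in sorted(requests, key=hotness, reverse=True):
            path = nx.shortest_path(graph, src, dst)
            links = list(zip(path, path[1:]))
            for w in range(n_wavelengths):
                if all((min(u, v), max(u, v), w) not in used for u, v in links):
                    for u, v in links:
                        used[(min(u, v), max(u, v), w)] = True
                    result[(src, dst)] = (path, w)
                    break
            else:
                result[(src, dst)] = (None, None)   # blocked request
        return result

    G = nx.cycle_graph(6)                           # toy six-node ring topology
    demo = [(0, 3, 5.0), (1, 4, 2.0), (2, 5, 1.0)]
    print(first_fit_rwa(G, demo))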
Scalable asynchronous execution of cellular automata
NASA Astrophysics Data System (ADS)
Folino, Gianluigi; Giordano, Andrea; Mastroianni, Carlo
2016-10-01
The performance and scalability of cellular automata, when executed on parallel/distributed machines, are limited by the necessity of synchronizing all the nodes at each time step, i.e., a node can execute a step only after the previous step has been executed at all the other nodes. However, these synchronization requirements can be relaxed: a node can execute one step after synchronizing only with the adjacent nodes. In this fashion, different nodes can execute different time steps. This can be notably advantageous in many novel and increasingly popular applications of cellular automata, such as smart city applications and the simulation of natural phenomena, in which the execution times can be different and variable due to the heterogeneity of machines and/or data and/or executed functions. Indeed, a longer execution time at a node does not slow down the execution at all the other nodes but only at the neighboring nodes. This is particularly advantageous when the nodes that act as bottlenecks vary during the application execution. The goal of the paper is to analyze the benefits that can be achieved with the described asynchronous implementation of cellular automata, when compared to the classical all-to-all synchronization pattern. The performance and scalability have been evaluated through a Petri net model, as this model is very useful for representing the synchronization barrier among nodes. We examined the usual case in which the territory is partitioned into a number of regions, and the computation associated with a region is assigned to a computing node. We considered both mono-dimensional and two-dimensional partitioning. The results show that the advantage obtained through the asynchronous execution, when compared to the all-to-all synchronous approach, is notable and can be as large as 90% in terms of speedup.
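The relaxed synchronization rule can be made concrete with a small sketch: a region may advance to its next step as soon as every adjacent region has completed the step whose state it needs, so fast regions run ahead of slow ones instead of waiting for a global barrier. The region count and horizon are arbitrary, and buffering of older neighbour states is omitted for brevity.

    import random

    n_regions, horizon = 8, 20
    step = [0] * n_regions                 # completed time step of each region

    def neighbours(i):
        return [j for j in (i - 1, i + 1) if 0 <= j < n_regions]

    while min(step) < horizon:
        # a region is ready if its neighbours have reached at least its own step
        ready = [i for i in range(n_regions)
                 if step[i] < horizon and all(step[j] >= step[i] for j in neighbours(i))]
        i = random.choice(ready)
        # heterogeneous workload: a slow region only holds back its immediate
        # neighbours, not the whole automaton
        step[i] += 1

    print("all regions reached step", horizon)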
Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo
2018-02-01
Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
Workflow Management Systems for Molecular Dynamics on Leadership Computers
NASA Astrophysics Data System (ADS)
Wells, Jack; Panitkin, Sergey; Oleynik, Danila; Jha, Shantenu
Molecular Dynamics (MD) simulations play an important role in a range of disciplines from materials science to biophysical systems and account for a large fraction of the cycles consumed on computing resources. Increasingly, science problems require the successful execution of "many" MD simulations as opposed to a single MD simulation. There is a need to provide scalable and flexible approaches to the execution of this workload. We present preliminary results on the Titan computer at the Oak Ridge Leadership Computing Facility that demonstrate a general capability to manage workload execution agnostic of a specific MD simulation kernel or execution pattern, and in a manner that integrates disparate grid-based and supercomputing resources. Our results build upon our extensive experience of distributed workload management in the high-energy physics ATLAS project using PanDA (Production and Distributed Analysis System), coupled with recent conceptual advances in our understanding of workload management on heterogeneous resources. We will discuss how we will generalize these initial capabilities towards a more production-level service on DOE leadership resources. This research is sponsored by US DOE/ASCR and used resources of the OLCF computing facility.
Open release of the DCA++ project
NASA Astrophysics Data System (ADS)
Haehner, Urs; Solca, Raffaele; Staar, Peter; Alvarez, Gonzalo; Maier, Thomas; Summers, Michael; Schulthess, Thomas
We present the first open release of the DCA++ project, a highly scalable and efficient research code to solve quantum many-body problems with cutting edge quantum cluster algorithms. The implemented dynamical cluster approximation (DCA) and its DCA+ extension with a continuous self-energy capture nonlocal correlations in strongly correlated electron systems thereby allowing insight into high-Tc superconductivity. With the increasing heterogeneity of modern machines, DCA++ provides portable performance on conventional and emerging new architectures, such as hybrid CPU-GPU and Xeon Phi, sustaining multiple petaflops on ORNL's Titan and CSCS' Piz Daint. Moreover, we will describe how best practices in software engineering can be applied to make software development sustainable and scalable in a research group. Software testing and documentation not only prevent productivity collapse, but more importantly, they are necessary for correctness, credibility and reproducibility of scientific results. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF) awarded by the INCITE program, and of the Swiss National Supercomputing Center. OLCF is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Field-Effect Biosensors for On-Site Detection: Recent Advances and Promising Targets.
Choi, Jaebin; Seong, Tae Wha; Jeun, Minhong; Lee, Kwan Hyi
2017-10-01
There is explosive interest in the immediate and cost-effective analysis of field-collected biological samples, as many advanced biodetection tools are highly sensitive yet immobile. On-site biosensors are portable and convenient sensors that provide detection results at the point of care. They are designed to secure precision in highly ionic and heterogeneous solutions with minimal hardware. Among the various methods capable of such analysis, field-effect biosensors are promising candidates due to their unique sensitivity, manufacturing scalability, and integrability with computational circuitry. Recent developments in nanotechnological surface modification show promising results in sensing from blood, serum, and urine. This report places particular emphasis on the on-site efficacy of recently published field-effect biosensors, specifically their detection limits in physiological solutions, response times, and scalability. The survey of the properties and existing detection methods of four promising biotargets (exosomes, bacteria, viruses, and metabolites) aims at providing a roadmap for future field-effect and other on-site biosensors. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Hadoop-MCC: Efficient Multiple Compound Comparison Algorithm Using Hadoop.
Hua, Guan-Jie; Hung, Che-Lun; Tang, Chuan Yi
2018-01-01
In the past decade, drug design technologies have improved enormously. Computer-aided drug design (CADD) has played an important role in analysis and prediction in drug development, making the procedure more economical and efficient. However, computation with big data, such as ZINC, which contains more than 60 million compounds, and GDB-13, with more than 930 million small molecules, poses a notable time-consumption problem. Therefore, we propose a novel heterogeneous high performance computing method, named Hadoop-MCC, integrating Hadoop and GPU, to cope with big chemical structure data efficiently. Hadoop-MCC gains high availability and fault tolerance from Hadoop, as Hadoop is used to scatter input data to GPU devices and gather the results from them. The Hadoop framework adopts a mapper/reducer computation model. In the proposed method, mappers are responsible for fetching SMILES data segments and performing the LINGO method on GPU, and reducers then collect all comparison results produced by the mappers. Due to the high availability of Hadoop, all of the LINGO computational jobs on the mappers can be completed even if some of the mappers encounter problems. LINGO comparisons are performed on each GPU device in parallel. According to the experimental results, the proposed method on multiple GPU devices achieves better computational performance than CUDA-MCC on a single GPU device. Hadoop-MCC achieves scalability, high availability, and fault tolerance granted by Hadoop, as well as high performance, by integrating the computational power of both Hadoop and GPU. It has been shown that using a heterogeneous architecture such as Hadoop-MCC can effectively deliver better computational performance than a single GPU device. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Low-complexity transcoding algorithm from H.264/AVC to SVC using data mining
NASA Astrophysics Data System (ADS)
Garrido-Cantos, Rosario; De Cock, Jan; Martínez, Jose Luis; Van Leuven, Sebastian; Cuenca, Pedro; Garrido, Antonio
2013-12-01
Nowadays, networks and terminals with diverse bandwidth characteristics and capabilities coexist. To ensure a good quality of experience, this diverse environment demands adaptability of the video stream. In general, video content is compressed to save storage capacity and to reduce the bandwidth required for its transmission. If these video streams were encoded using scalable video coding schemes, they would be able to adapt to those heterogeneous networks and a wide range of terminals. Since the majority of multimedia content is compressed using H.264/AVC, it cannot benefit from that scalability. This paper proposes a low-complexity algorithm to convert an H.264/AVC bitstream without scalability into scalable bitstreams with temporal scalability in the baseline and main profiles, by accelerating the mode decision task of the scalable video coding encoding stage using machine learning tools. The results show that when our technique is applied, complexity is reduced by 87% while coding efficiency is maintained.
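As a rough illustration of the machine learning step, the sketch below trains a decision tree to predict a coding mode from a few macroblock features, which is the general mechanism used to shortcut an exhaustive mode decision. The features, labels, and training data here are hypothetical stand-ins, not the statistics actually mined in the paper.

```python
# Hedged sketch: a decision tree shortcuts the SVC mode decision, in the
# spirit of the paper. Features and labels are hypothetical stand-ins for
# the H.264/AVC macroblock statistics the authors mine.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical per-macroblock features: residual energy, motion-vector
# magnitude, and the AVC mode of the co-located macroblock.
X_train = rng.random((500, 3))
y_train = rng.integers(0, 4, 500)          # 4 candidate SVC modes (toy labels)

clf = DecisionTreeClassifier(max_depth=6).fit(X_train, y_train)

def fast_mode_decision(mb_features):
    """Predict the SVC mode directly instead of exhaustive RD evaluation."""
    return int(clf.predict(np.asarray(mb_features).reshape(1, -1))[0])

print(fast_mode_decision([0.2, 0.7, 1.0]))
```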
Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, Lawrence; Bishop, Martin; Hoetzl, Elena
2010-09-30
Despite a steep increase in available compute power, in-silico experimentation with highly detailed models of the heart remains challenging due to the high computational cost involved. It is hoped that next-generation high performance computing (HPC) resources will lead to significant reductions in execution times and enable a new class of in-silico applications. However, performance gains with these new platforms can only be achieved by engaging a much larger number of compute cores, necessitating strongly scalable numerical techniques. So far, strong scalability has been demonstrated only for a moderate number of cores, orders of magnitude below the range required to achieve the desired performance boost. In this study, the strong scalability of currently used techniques to solve the bidomain equations is investigated. Benchmark results suggest that scalability is limited to 512-4096 cores within the range of relevant problem sizes, even when systems are carefully load-balanced and advanced IO strategies are employed.
A scalable healthcare information system based on a service-oriented architecture.
Yang, Tzu-Hsiang; Sun, Yeali S; Lai, Feipei
2011-06-01
Many existing healthcare information systems are composed of a number of heterogeneous systems and face the important issue of system scalability. This paper first describes the comprehensive healthcare information systems used in National Taiwan University Hospital (NTUH) and then presents a service-oriented architecture (SOA)-based healthcare information system (HIS) based on the HL7 service standard. The proposed architecture focuses on system scalability, in terms of both hardware and software. Moreover, we describe how scalability is implemented in rightsizing, service groups, databases, and hardware scalability. Although SOA-based systems sometimes display poor performance, a performance evaluation of our SOA-based HIS shows that the average response times for the outpatient, inpatient, and emergency HL7Central systems are 0.035, 0.04, and 0.036 s, respectively. The outpatient, inpatient, and emergency WebUI average response times are 0.79, 1.25, and 0.82 s. The rightsizing project and our evaluation results provide evidence that the proposed SOA-based HIS can deliver system scalability and sustainability in a highly demanding healthcare information system.
Porting AMG2013 to Heterogeneous CPU+GPU Nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samfass, Philipp
LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).
Distributed MRI reconstruction using Gadgetron-based cloud computing.
Xue, Hui; Inati, Souheil; Sørensen, Thomas Sangild; Kellman, Peter; Hansen, Michael S
2015-03-01
To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency. The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (e.g., for cardiac and neuroimaging applications). The proposed setup was able to handle acquisition and l1-SPIRiT reconstruction of nine high temporal resolution real-time, cardiac short axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm(3) isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min. A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. © 2014 Wiley Periodicals, Inc.
Siksik, May; Krishnamurthy, Vikram
2017-09-01
This paper proposes a multi-dielectric Brownian dynamics simulation framework for design-space-exploration (DSE) studies of ion-channel permeation. The goal of such DSE studies is to estimate the channel modeling parameters that minimize the mean-squared error between the simulated and expected "permeation characteristics." To address this computational challenge, we use a methodology based on statistical inference that utilizes knowledge of the channel structure to prune the design space. We demonstrate the proposed framework and DSE methodology using a case study based on the KcsA ion channel, in which the design space is successfully reduced from a 6-D space to a 2-D space. Our results show that the channel dielectric map computed using the framework matches the one computed directly using molecular dynamics with an error of 7%. Finally, the scalability and resolution of the model are explored, and it is shown that the memory requirements needed for DSE remain constant as the number of parameters (degree of heterogeneity) increases.
Near-Nash targeting strategies for heterogeneous teams of autonomous combat vehicles
NASA Astrophysics Data System (ADS)
Galati, David G.; Simaan, Marwan A.
2008-04-01
Military strategists are currently seeking methodologies to control large numbers of autonomous assets. Automated planners based upon the Nash equilibrium concept in non-zero-sum games are one option. Because such planners inherently consider possible adversarial actions, assets are able to adapt to, and to some extent predict, potential enemy actions. However, these planners must function properly both in cases in which a pure Nash strategy does not exist and in scenarios possessing multiple Nash equilibria. Another issue that needs to be overcome is the scalability of the Nash equilibrium: as the dimensionality of the problem increases, the Nash strategies become infeasible to compute using traditional methodologies. In this paper we introduce the concept of near-Nash strategies as a mechanism to overcome these difficulties. We then illustrate this concept by deriving the near-Nash strategies and using them as the basis for an intelligent battle plan for heterogeneous teams of autonomous combat air vehicles in the Multi-Team Dynamic Weapon Target Assignment model.
LDRD final report : leveraging multi-way linkages on heterogeneous data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dunlavy, Daniel M.; Kolda, Tamara Gibson
2010-09-01
This report summarizes the accomplishments of the 'Leveraging Multi-way Linkages on Heterogeneous Data' project, which ran from FY08 through FY10. The goal was to investigate scalable and robust methods for multi-way data analysis. We developed a new optimization-based method called CPOPT for fitting a particular type of tensor factorization to data; CPOPT was compared against existing methods and found to be more accurate than any faster method and faster than any equally accurate method. We extended this method to computing tensor factorizations for problems with incomplete data; our results show that scientifically meaningful factorizations can be recovered with large amounts of missing data (50% or more). The project has involved 5 members of the technical staff, 2 postdocs, and 1 summer intern. It has resulted in a total of 13 publications, 2 software releases, and over 30 presentations. Several follow-on projects have already begun, with more potential projects in development.
NASA Astrophysics Data System (ADS)
Augustin, Christoph M.; Neic, Aurel; Liebmann, Manfred; Prassl, Anton J.; Niederer, Steven A.; Haase, Gundolf; Plank, Gernot
2016-01-01
Electromechanical (EM) models of the heart have been used successfully to study fundamental mechanisms underlying a heart beat in health and disease. However, in all modeling studies reported so far numerous simplifications were made in terms of representing biophysical details of cellular function and its heterogeneity, gross anatomy and tissue microstructure, as well as the bidirectional coupling between electrophysiology (EP) and tissue distension. One limiting factor is the employed spatial discretization methods which are not sufficiently flexible to accommodate complex geometries or resolve heterogeneities, but, even more importantly, the limited efficiency of the prevailing solver techniques which are not sufficiently scalable to deal with the resulting increase in degrees of freedom (DOF) when modeling cardiac electromechanics at high spatio-temporal resolution. This study reports on the development of a novel methodology for solving the nonlinear equation of finite elasticity using human whole organ models of cardiac electromechanics, discretized at a high para-cellular resolution. Three patient-specific, anatomically accurate, whole heart EM models were reconstructed from magnetic resonance (MR) scans at resolutions of 220 μm, 440 μm and 880 μm, yielding meshes of approximately 184.6, 24.4 and 3.7 million tetrahedral elements and 95.9, 13.2 and 2.1 million displacement DOF, respectively. The same mesh was used for discretizing the governing equations of both electrophysiology (EP) and nonlinear elasticity. A novel algebraic multigrid (AMG) preconditioner for an iterative Krylov solver was developed to deal with the resulting computational load. The AMG preconditioner was designed under the primary objective of achieving favorable strong scaling characteristics for both setup and solution runtimes, as this is key for exploiting current high performance computing hardware. Benchmark results using the 220 μm, 440 μm and 880 μm meshes demonstrate efficient scaling up to 1024, 4096 and 8192 compute cores which allowed the simulation of a single heart beat in 44.3, 87.8 and 235.3 minutes, respectively. The efficiency of the method allows fast simulation cycles without compromising anatomical or biophysical detail.
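The general pattern of pairing an AMG preconditioner with a Krylov solver can be illustrated with off-the-shelf components; the sketch below uses pyamg's smoothed aggregation hierarchy as a preconditioner for conjugate gradients on a model Poisson matrix. It is only a generic illustration, not the custom preconditioner developed in this study.

```python
# Generic illustration of AMG-preconditioned Krylov iteration (not the
# paper's custom preconditioner): pyamg smoothed aggregation used as a
# preconditioner for conjugate gradients on a 2D Poisson test matrix.
import numpy as np
import pyamg
from scipy.sparse.linalg import cg

A = pyamg.gallery.poisson((200, 200), format='csr')   # SPD test matrix
b = np.random.default_rng(0).random(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)              # AMG setup phase
M = ml.aspreconditioner(cycle='V')                     # one V-cycle per application

x, info = cg(A, b, M=M, maxiter=100)                   # preconditioned solve phase
print("cg info:", info, " residual:", np.linalg.norm(b - A @ x))
```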
NASA Astrophysics Data System (ADS)
Frey, Davide; Guerraoui, Rachid; Kermarrec, Anne-Marie; Koldehofe, Boris; Mogensen, Martin; Monod, Maxime; Quéma, Vivien
Gossip-based information dissemination protocols are considered easy to deploy, scalable and resilient to network dynamics. Load-balancing is inherent in these protocols as the dissemination work is evenly spread among all nodes. Yet, large-scale distributed systems are usually heterogeneous with respect to network capabilities such as bandwidth. In practice, a blind load-balancing strategy might significantly hamper the performance of the gossip dissemination.
A Semantic Big Data Platform for Integrating Heterogeneous Wearable Data in Healthcare.
Mezghani, Emna; Exposito, Ernesto; Drira, Khalil; Da Silveira, Marcos; Pruski, Cédric
2015-12-01
Advances supported by emerging wearable technologies in healthcare promise patients a provision of high-quality care. Wearable computing systems represent one of the main thrust areas for transforming traditional healthcare systems into active systems able to continuously monitor and control the patients' health in order to manage their care at an early stage. However, their proliferation creates challenges related to data management and integration. The diversity and variety of wearable data related to healthcare, their huge volume, and their distribution make data processing and analytics more difficult. In this paper, we propose a generic semantic big data architecture based on the "Knowledge as a Service" approach to cope with heterogeneity and scalability challenges. Our main contribution focuses on enriching the NIST Big Data model with semantics in order to smartly understand the collected data, and to generate more accurate and valuable information by correlating scattered medical data stemming from multiple wearable devices or/and from other distributed data sources. We have implemented and evaluated a Wearable KaaS platform to smartly manage heterogeneous data coming from wearable devices in order to assist physicians in supervising the patient's health evolution and keep the patient up-to-date about his/her status.
Theoretical foundation for measuring the groundwater age distribution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, William Payton; Arnold, Bill Walter
2014-01-01
In this study, we use PFLOTRAN, a highly scalable, parallel, flow and reactive transport code, to simulate the concentrations of 3H, 3He, CFC-11, CFC-12, CFC-113, SF6, 39Ar, 81Kr and 4He, and the mean groundwater age in heterogeneous fields on grids with in excess of 10 million nodes. We utilize this computational platform to simulate the concentration of multiple tracers in high-resolution, heterogeneous 2-D and 3-D domains, and calculate tracer-derived ages. Tracer-derived ages show systematic biases toward younger ages when the groundwater age distribution contains water older than the maximum tracer age. The deviation of the tracer-derived age distribution from the true groundwater age distribution increases with increasing heterogeneity of the system. However, the effect of heterogeneity is diminished as the mean travel time gets closer to the tracer age limit. Age distributions in 3-D domains differ significantly from those in 2-D domains: 3-D simulations show decreased mean age and less variance in the age distribution for identical heterogeneity statistics. High-performance computing allows for investigation of tracer and groundwater age systematics in high-resolution domains, providing a platform for understanding and utilizing environmental tracer and groundwater age information in heterogeneous 3-D systems. Groundwater environmental tracers can provide important constraints for the calibration of groundwater flow models. Direct simulation of environmental tracer concentrations in models has the additional advantage of avoiding assumptions associated with using calculated groundwater age values. This study quantifies the model uncertainty reduction resulting from the addition of environmental tracer concentration data. The analysis uses a synthetic heterogeneous aquifer and the calibration of a flow and transport model using the pilot point method. Results indicate a significant reduction in the uncertainty in permeability with the addition of environmental tracer data, relative to the use of hydraulic measurements alone. Anthropogenic tracers and their decay products, such as CFC-11, 3H, and 3He, provide significant constraints on input permeability values in the model. Tracer data for 39Ar provide even more complete information on the heterogeneity of permeability and variability in the flow system than the anthropogenic tracers, leading to greater parameter uncertainty reduction.
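For reference, the sketch below shows the standard piston-flow apparent-age relation for a decaying tracer such as 39Ar, the kind of tracer-derived age the simulated concentrations are converted into. It is a textbook formula, not the PFLOTRAN transport workflow itself.

```python
# Hedged sketch: piston-flow "apparent age" from a decaying tracer such as
# 39Ar (half-life ~269 yr). This is the textbook relation behind the
# tracer-derived ages, not the paper's PFLOTRAN simulation.
import numpy as np

def apparent_age(c_sample, c_input, half_life_years):
    """t = -(1/lambda) * ln(C/C0) for a radioactive tracer under piston flow."""
    lam = np.log(2.0) / half_life_years
    return -np.log(c_sample / c_input) / lam

# Example: a sample holding 60% of the modern atmospheric 39Ar activity.
print(f"{apparent_age(0.60, 1.00, 269.0):.1f} years")
```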
NASA Astrophysics Data System (ADS)
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied to the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using the OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available to a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming-type computer.
NASA Astrophysics Data System (ADS)
Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.
2015-12-01
The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from the EOS ClearingHouse (ECHO) and the Global Change Master Directory (GCMD) systems. This flagship catalog has been developed against several key requirements: fast search and ingest performance; the ability to integrate heterogeneous external inputs and outputs; high availability and resiliency; scalability; and evolvability and expandability. This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely-coupled, individually-scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.
Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM
2009-03-17
A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lorenz, Daniel; Wolf, Felix
2016-02-17
The PRIMA-X (Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing) project is the successor of the DOE PRIMA (Performance Refactoring of Instrumentation, Measurement, and Analysis Technologies for Petascale Computing) project, which addressed the challenge of creating a core measurement infrastructure that would serve as a common platform for both integrating leading parallel performance systems (notably TAU and Scalasca) and developing next-generation scalable performance tools. The PRIMA-X project shifts the focus away from refactorization of robust performance tools towards a re-targeting of the parallel performance measurement and analysis architecture for extreme scales. The massive concurrency, asynchronous execution dynamics, hardware heterogeneity, and multi-objective prerequisites (performance, power, resilience) that identify exascale systems introduce fundamental constraints on the ability to carry forward existing performance methodologies. In particular, there must be a deemphasis of per-thread observation techniques to significantly reduce the otherwise unsustainable flood of redundant performance data. Instead, it will be necessary to assimilate multi-level resource observations into macroscopic performance views, from which resilient performance metrics can be attributed to the computational features of the application. This requires a scalable framework for node-level and system-wide monitoring and runtime analyses of dynamic performance information. Also, the interest in optimizing parallelism parameters with respect to performance and energy drives the integration of tool capabilities in the exascale environment further. Initially, PRIMA-X was a collaborative project between the University of Oregon (lead institution) and the German Research School for Simulation Sciences (GRS). Because Prof. Wolf, the PI at GRS, accepted a position as full professor at Technische Universität Darmstadt (TU Darmstadt) starting February 1st, 2015, the project ended at GRS on January 31st, 2015. This report reflects the work accomplished at GRS until then. The work of GRS is expected to be continued at TU Darmstadt. The first main accomplishment of GRS is the design of different thread-level aggregation techniques. We created a prototype capable of aggregating the thread-level information in performance profiles using these techniques. The next step will be the integration of the most promising techniques into the Score-P measurement system and their evaluation. The second main accomplishment is a substantial increase of Score-P's scalability, achieved by improving the design of the system-tree representation in Score-P's profile format. We developed a new representation and a distributed algorithm to create the scalable system tree representation. Finally, we developed a lightweight approach to MPI wait-state profiling. Former algorithms either needed piggy-backing, which can cause significant runtime overhead, or tracing, which comes with its own set of scaling challenges. Our approach works with local data only and, thus, is scalable and has very little overhead.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives: validate, assess the scalability of, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhance an electromagnetics code (CHARGE) to effectively model antenna problems; apply lessons learned from high-order/spectral solutions of swirling 3D jets to the electromagnetics project; transition a high-order fluids code, FDL3DI, to solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation-absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
This proposal develops scalable R / Bioconductor software infrastructure and data resources to integrate complex, heterogeneous, and large cancer genomic experiments. The falling cost of genomic assays facilitates collection of multiple data types (e.g., gene and transcript expression, structural variation, copy number, methylation, and microRNA data) from a set of clinical specimens. Furthermore, substantial resources are now available from large consortium activities like The Cancer Genome Atlas (TCGA).
Research on distributed heterogeneous data PCA algorithm based on cloud platform
NASA Astrophysics Data System (ADS)
Zhang, Jin; Huang, Gang
2018-05-01
Principal component analysis (PCA) of distributed heterogeneous data sets can overcome the limited scalability of centralized data processing. In order to reduce the generation of intermediate data and the error components of distributed heterogeneous data sets, a principal component analysis algorithm for heterogeneous data sets on a cloud platform is proposed. The algorithm performs the eigenvalue computation using Householder tridiagonalization and QR factorization, and calculates the error component of the heterogeneous database associated with the public key to obtain the intermediate data set and the lost information. Experiments on distributed DBM heterogeneous datasets show that the method is feasible and reliable in terms of execution time and accuracy.
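The eigenvalue step described above can be illustrated on a single machine as follows: Householder-based reduction of a covariance matrix to tridiagonal form, followed by an eigenvalue solve on the tridiagonal matrix. This is only a local sketch of the linear algebra, not the distributed cloud algorithm proposed in the paper.

```python
# Single-machine illustration of the eigenvalue step (not the distributed
# cloud algorithm): Householder reduction of the covariance matrix to
# tridiagonal form, then eigenvalues of that tridiagonal matrix.
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

rng = np.random.default_rng(1)
X = rng.random((1000, 6))
C = np.cov(X, rowvar=False)                 # 6x6 symmetric covariance matrix

# For a symmetric matrix, Householder-based Hessenberg reduction is tridiagonal;
# Q accumulates the reflections (not needed for the eigenvalues themselves).
T, Q = hessenberg(C, calc_q=True)
d, e = np.diag(T), np.diag(T, k=1)          # main and off-diagonal of T

eigvals = eigh_tridiagonal(d, e, eigvals_only=True)
print("principal component variances:", np.sort(eigvals)[::-1])
```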
A novel processing platform for post tape out flows
NASA Astrophysics Data System (ADS)
Vu, Hien T.; Kim, Soohong; Word, James; Cai, Lynn Y.
2018-03-01
As the computational requirements for post tape out (PTO) flows increase at the 7nm and below technology nodes, there is a need to increase the scalability of the computational tools in order to reduce the turn-around time (TAT) of the flows. Utilization of design hierarchy has been one proven method to provide sufficient partitioning to enable PTO processing. However, as the data is processed through the PTO flow, its effective hierarchy is reduced. The reduction is necessary to achieve the desired accuracy. Also, the sequential nature of the PTO flow is inherently non-scalable. To address these limitations, we are proposing a quasi-hierarchical solution that combines multiple levels of parallelism to increase the scalability of the entire PTO flow. In this paper, we describe the system and present experimental results demonstrating the runtime reduction through scalable processing with thousands of computational cores.
Applying a cloud computing approach to storage architectures for spacecraft
NASA Astrophysics Data System (ADS)
Baldor, Sue A.; Quiroz, Carlos; Wood, Paul
As sensor technologies, processor speeds, and memory densities increase, spacecraft command, control, processing, and data storage systems have grown in complexity to take advantage of these improvements and expand the possible missions of spacecraft. Spacecraft systems engineers are increasingly looking for novel ways to address this growth in complexity and mitigate the associated risks. In conventional computing, many solutions have been developed to address both the complexity and the heterogeneity of such systems. In particular, the cloud-based paradigm provides a solution for distributing applications and storage capabilities across multiple platforms. In this paper, we propose utilizing a cloud-like architecture to provide a scalable mechanism for providing mass storage in spacecraft networks that can be reused on multiple spacecraft systems. By presenting a consistent interface to applications and devices that request data to be stored, complex systems designed by multiple organizations may be more readily integrated. Behind the abstraction, the cloud storage capability would manage wear-leveling, power consumption, and other attributes related to the physical memory devices, critical components in any mass storage solution for spacecraft. Our approach employs SpaceWire networks and SpaceWire-capable devices, although the concept could easily be extended to non-heterogeneous networks consisting of multiple spacecraft and potentially the ground segment.
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), the Moving Particle Semi-implicit method (MPS), and the Discrete Element Method (DEM). These are needed to handle billions of particles modeled on large distributed-memory computer systems. Our method utilizes a flexible orthogonal domain decomposition, allowing the sub-domain boundaries in a column to differ for each row. The imbalances in execution time between parallel logical processes are treated as a nonlinear residual, and load balancing is achieved by minimizing this residual within the framework of an iterative nonlinear solver combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domains frequently by monitoring the performance of each computational process, because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture, which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using the Earth Simulator and K computer supercomputer systems.
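A minimal one-dimensional sketch of the feedback idea, measuring per-rank load and nudging sub-domain boundaries toward the heavier side, is given below. The real method uses a 2D/3D orthogonal decomposition driven by a nonlinear solver with a multigrid smoother; the step size and the particle-count cost proxy here are illustrative.

```python
# Hedged 1D sketch of iterative boundary adjustment for load balancing
# (the paper uses a 2D/3D orthogonal decomposition with a nonlinear solver;
# this only illustrates the feedback loop).
import numpy as np

rng = np.random.default_rng(2)
particles = np.concatenate([rng.normal(0.3, 0.05, 80000),
                            rng.normal(0.7, 0.15, 20000)])   # non-uniform cloud
nproc = 8
bounds = np.linspace(0.0, 1.0, nproc + 1)                    # initial equal slabs

for _ in range(50):
    counts = np.histogram(particles, bins=bounds)[0].astype(float)
    cost = counts / counts.mean()            # proxy for measured execution time
    # Move each interior boundary toward the heavier neighbour's side.
    for i in range(1, nproc):
        imbalance = cost[i - 1] - cost[i]
        width = bounds[i + 1] - bounds[i - 1]
        bounds[i] -= 0.05 * imbalance * width / 2.0
    bounds = np.clip(bounds, 0.0, 1.0)
    bounds.sort()

print("particles per rank:", np.histogram(particles, bins=bounds)[0])
```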
Scalable Robust Principal Component Analysis Using Grassmann Averages.
Hauberg, Søren; Feragen, Aasa; Enficiaud, Raffi; Black, Michael J
2016-11-01
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.
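A minimal sketch of the Grassmann Average iteration for the leading principal direction is given below (without the trimming that yields TGA); the weighting and stopping rule are simplified relative to the published algorithm.

```python
# Minimal sketch of the Grassmann Average (GA) iteration for the leading
# principal direction, following the idea described above (no trimming, so
# this is GA rather than the robust TGA variant).
import numpy as np

def grassmann_average(X, n_iter=100, seed=0):
    """X: (n_samples, n_features), assumed zero-mean. Returns a unit vector."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1)
    keep = norms > 0
    U, w = X[keep] / norms[keep, None], norms[keep]   # unit observations + weights
    q = rng.standard_normal(X.shape[1])
    q /= np.linalg.norm(q)
    for _ in range(n_iter):
        signs = np.sign(U @ q)
        signs[signs == 0] = 1.0
        q_new = (signs * w) @ U                       # weighted average of aligned subspaces
        q = q_new / np.linalg.norm(q_new)
    return q

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5)) @ np.diag([5, 2, 1, 1, 1])
X -= X.mean(axis=0)
print("GA direction:", np.round(grassmann_average(X), 3))    # ~ +/- e_1 for this data
```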
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the most cost-effective and expandable platforms for high-end scientific computing. Scalable numerical models are therefore important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of the geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model's scalability is tested on Linux PC clusters with up to 128 nodes. The model is also benchmarked against a published numerical dynamo model solution.
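A small mpi4py sketch of the "master-slave" pattern mentioned above follows: rank 0 hands out work units and collects results while worker ranks compute. It illustrates the communication pattern only and is unrelated to the MoSST code itself; the task payloads are placeholders.

```python
# Hedged sketch of a master-slave MPI pattern with mpi4py (illustrative
# placeholder tasks, not the MoSST core dynamics model).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TASK, RESULT, STOP = 1, 2, 3                      # message tags

if rank == 0:                                     # master
    tasks, results, active = list(range(20)), [], 0
    for dest in range(1, size):                   # seed each worker with one task
        if tasks:
            comm.send(tasks.pop(), dest=dest, tag=TASK)
            active += 1
        else:
            comm.send(None, dest=dest, tag=STOP)
    while active:
        status = MPI.Status()
        results.append(comm.recv(source=MPI.ANY_SOURCE, tag=RESULT, status=status))
        src = status.Get_source()
        if tasks:
            comm.send(tasks.pop(), dest=src, tag=TASK)
        else:
            comm.send(None, dest=src, tag=STOP)
            active -= 1
    print("collected", len(results), "results")
else:                                             # worker
    while True:
        status = MPI.Status()
        task = comm.recv(source=0, status=status)
        if status.Get_tag() == STOP:
            break
        comm.send(task * task, dest=0, tag=RESULT)   # stand-in for real work
```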
Unbiased, scalable sampling of protein loop conformations from probabilistic priors.
Zhang, Yajia; Hauser, Kris
2013-01-01
Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences. Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints. Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion.
Innovative HPC architectures for the study of planetary plasma environments
NASA Astrophysics Data System (ADS)
Amaya, Jorge; Wolf, Anna; Lembège, Bertrand; Zitz, Anke; Alvarez, Damian; Lapenta, Giovanni
2016-04-01
DEEP-ER is a European Commission-funded project that develops a new type of high-performance computer architecture. The revolutionary system is currently used by KU Leuven to study the effects of the solar wind on the global environments of the Earth and Mercury. The new architecture combines the versatility of Intel Xeon computing nodes with the power of the upcoming Intel Xeon Phi accelerators. Contrary to classical heterogeneous HPC architectures, where it is customary to find CPUs and accelerators in the same computing nodes, in the DEEP-ER system the CPU nodes are grouped together (the Cluster) independently of the accelerator nodes (the Booster). The system is equipped with a state-of-the-art interconnection network, highly scalable and fast I/O, and a fail-recovery resiliency system. The final objective of the project is to introduce a scalable system that can be used to create the next generation of exascale supercomputers. The code iPic3D from KU Leuven is being adapted to this new architecture. This particle-in-cell code can now perform the computation of the electromagnetic fields on the Cluster while the particles are moved on the Booster side. Using fast and scalable Xeon Phi accelerators in the Booster, we can introduce many more particles per cell in the simulation than is possible in the current generation of HPC systems, allowing fully kinetic plasmas to be calculated with very low interpolation noise. The system will be used to perform fully kinetic, low-noise, 3D simulations of the interaction of the solar wind with the magnetospheres of the Earth and Mercury. Preliminary simulations have been performed at other HPC centers in order to compare the results across different systems. In this presentation we show the complexity of the plasma flow around the planets, including the development of hydrodynamic instabilities at the flanks, the presence of the collisionless shock, the magnetosheath, the magnetopause, reconnection zones, the formation of the plasma sheet and the magnetotail, and the variation of ion/electron plasma flows when crossing these frontiers. The simulations also give access to detailed information about the particle dynamics and their velocity distribution at locations that can be used for comparison with satellite data.
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
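One of the building blocks named above, the inexact Newton method, can be sketched as follows: each Newton step solves its linear system only approximately (here with a capped number of GMRES iterations) before updating the state. The toy two-equation system stands in for the discretized black oil equations; this is not the simulator's implementation.

```python
# Hedged sketch of an inexact Newton iteration: the inner linear solve is
# only approximate, which is the defining idea of the method.
import numpy as np
from scipy.sparse.linalg import gmres

def fd_jacobian(F, x, h=1e-7):
    """Forward-difference Jacobian; adequate for this toy example."""
    f0 = F(x)
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (F(xp) - f0) / h
    return J

def inexact_newton(F, x0, tol=1e-10, max_iter=50, inner_iters=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = F(x)
        if np.linalg.norm(r) < tol:
            break
        J = fd_jacobian(F, x)
        dx, _ = gmres(J, -r, maxiter=inner_iters)   # approximate inner solve
        x = x + dx
    return x

# Toy nonlinear system standing in for the discretized black oil equations.
F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])
print(np.round(inexact_newton(F, [1.0, 1.0]), 6))   # converges to (1, 2)
```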
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
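The work farm pattern can be expressed compactly in ordinary multiprocessing terms, as in the sketch below: a pool of identical workers consumes one input stream and produces one output stream. This only mirrors the pattern; the MPPA realizes it with structural objects mapped onto hardware processors, and the per-frame kernel here is a placeholder.

```python
# Hedged sketch of the "work farm" pattern: parallel workers, one input
# stream, one output stream (placeholder kernel, not an MPPA program).
from multiprocessing import Pool

def worker(frame):
    """Stand-in for a per-frame kernel (e.g., a compression or filter stage)."""
    return sum(frame) % 251          # toy computation

if __name__ == "__main__":
    input_stream = [list(range(i, i + 64)) for i in range(100)]   # toy frames
    with Pool(processes=8) as farm:                               # homogeneous workers
        output_stream = farm.map(worker, input_stream, chunksize=4)
    print(output_stream[:5])
```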
Wanted: Scalable Tracers for Diffusion Measurements
2015-01-01
Scalable tracers are potentially a useful tool to examine diffusion mechanisms and to predict diffusion coefficients, particularly for hindered diffusion in complex, heterogeneous, or crowded systems. Scalable tracers are defined as a series of tracers varying in size but with the same shape, structure, surface chemistry, deformability, and diffusion mechanism. Both chemical homology and constant dynamics are required. In particular, branching must not vary with size, and there must be no transition between ordinary diffusion and reptation. Measurements using scalable tracers yield the mean diffusion coefficient as a function of size alone; measurements using nonscalable tracers yield the variation due to differences in the other properties. Candidate scalable tracers are discussed for two-dimensional (2D) diffusion in membranes and three-dimensional diffusion in aqueous solutions. Correlations to predict the mean diffusion coefficient of globular biomolecules from molecular mass are reviewed briefly. Specific suggestions for the 3D case include the use of synthetic dendrimers or random hyperbranched polymers instead of dextran and the use of core–shell quantum dots. Another useful tool would be a series of scalable tracers varying in deformability alone, prepared by varying the density of crosslinking in a polymer to make say “reinforced Ficoll” or “reinforced hyperbranched polyglycerol.” PMID:25319586
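As an example of the kind of mass-based correlation mentioned above, the sketch below combines the Stokes-Einstein relation with a hydrodynamic radius estimated from molecular mass at an assumed globular-protein density. The numbers are generic textbook values, not results from this paper.

```python
# Hedged illustration: Stokes-Einstein estimate of D for a globular tracer,
# with the radius taken from molecular mass assuming a typical protein
# density. Generic textbook values, not results from the paper.
import math

def diffusion_coefficient(mass_kda, temperature_k=298.15, viscosity_pa_s=8.9e-4,
                          density_kg_m3=1350.0):
    k_b = 1.380649e-23                      # Boltzmann constant, J/K
    n_a = 6.02214076e23                     # Avogadro's number, 1/mol
    mass_kg = mass_kda * 1e3 * 1e-3 / n_a   # kDa -> kg per molecule
    radius = (3.0 * mass_kg / (4.0 * math.pi * density_kg_m3)) ** (1.0 / 3.0)
    return k_b * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius)

# ~66 kDa globular protein (serum-albumin-sized) in water at 25 C:
print(f"D ~ {diffusion_coefficient(66.0) * 1e4:.2e} cm^2/s")
```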
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
Comprehensive Materials and Morphologies Study of Ion Traps (COMMIT) for scalable Quantum Computation - Final Report
2012-04-21
Trapped ion systems are extremely promising for large-scale quantum computation, but face a vexing problem with motional quantum ... the photoelectric effect. The typical shortest wavelengths needed for ion traps range from 194 nm for Hg+ to 493 nm for Ba+, corresponding to 6.4-2.5...
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Seekhao, Nuttiiya; Shung, Caroline; JaJa, Joseph; Mongeau, Luc; Li-Jessen, Nicole Y K
2016-05-01
We present an efficient and scalable scheme for implementing agent-based modeling (ABM) simulation with in situ visualization of large complex systems on heterogeneous computing platforms. The scheme is designed to make optimal use of the resources available on a heterogeneous platform consisting of a multicore CPU and a GPU, resulting in minimal to no resource idle time. Furthermore, the scheme was implemented under a client-server paradigm that enables remote users to visualize and analyze simulation data as it is being generated at each time step of the model. Performance of a simulation case study of vocal fold inflammation and wound healing with 3.8 million agents shows 35× and 7× speedup in execution time over single-core and multi-core CPU execution, respectively. Each iteration of the model took less than 200 ms to simulate, visualize and send the results to the client. This enables users to monitor the simulation in real-time and modify its course as needed.
Uncertainty analyses of CO2 plume expansion subsequent to wellbore CO2 leakage into aquifers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hou, Zhangshuan; Bacon, Diana H.; Engel, David W.
2014-08-01
In this study, we apply an uncertainty quantification (UQ) framework to CO2 sequestration problems. In one scenario, we look at the risk of wellbore leakage of CO2 into a shallow unconfined aquifer in an urban area; in another scenario, we study the effects of reservoir heterogeneity on CO2 migration. We combine various sampling approaches (quasi-Monte Carlo, probabilistic collocation, and adaptive sampling) in order to reduce the number of forward calculations while trying to fully explore the input parameter space and quantify the input uncertainty. The CO2 migration is simulated using the PNNL-developed simulator STOMP-CO2e (the water-salt-CO2 module). For computationally demanding simulations with 3D heterogeneity fields, we combined the framework with a scalable version module, eSTOMP, as the forward modeling simulator. We built response curves and response surfaces of model outputs with respect to input parameters, to look at the individual and combined effects, and identify and rank the significance of the input parameters.
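The quasi-Monte Carlo sampling step can be sketched generically as below, drawing scrambled Sobol points and mapping them onto two hypothetical reservoir parameters; each row would then drive one forward simulation. The parameter names and ranges are illustrative, and the sketch does not call STOMP-CO2e or eSTOMP.

```python
# Generic quasi-Monte Carlo sampling sketch (not the PNNL framework):
# scrambled Sobol points mapped onto two hypothetical input parameters.
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=2, scramble=True, seed=42)
unit_samples = sampler.random_base2(m=7)          # 2**7 = 128 low-discrepancy points

# Hypothetical ranges: log10 permeability [m^2] and porosity [-]
lower, upper = [-14.0, 0.05], [-11.0, 0.35]
inputs = qmc.scale(unit_samples, lower, upper)

for log10_k, phi in inputs[:3]:                   # each row would drive one forward run
    print(f"run with k = 10^{log10_k:.2f} m^2, porosity = {phi:.3f}")
```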
Scalable cloud without dedicated storage
NASA Astrophysics Data System (ADS)
Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.
2015-05-01
We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.
The Scalable Checkpoint/Restart Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.
The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to write out and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes. It has been benchmarked as high as 720 GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster than the parallel file system.
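A rough Python sketch of the caching idea only (this is not SCR's actual C API): write each checkpoint to fast node-local storage and copy every k-th one to the shared parallel file system. The paths and flush interval are illustrative assumptions.

```python
# Cache checkpoints node-locally, flush occasionally to the shared file system.
import shutil
from pathlib import Path

NODE_LOCAL = Path("/tmp/ckpt_cache")        # e.g. RAM disk or local SSD
PARALLEL_FS = Path("./shared_ckpt")         # stand-in for the parallel file system
FLUSH_EVERY = 10                            # flush every 10th checkpoint

def write_checkpoint(step, payload: bytes):
    NODE_LOCAL.mkdir(parents=True, exist_ok=True)
    local = NODE_LOCAL / f"ckpt_{step:06d}.bin"
    local.write_bytes(payload)              # fast, fully node-local write
    if step % FLUSH_EVERY == 0:             # occasional flush guards against node loss
        PARALLEL_FS.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local, PARALLEL_FS / local.name)
    return local

if __name__ == "__main__":
    for step in range(25):
        write_checkpoint(step, b"application state" * 100)
```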
TomoMiner and TomoMinerCloud: A software platform for large-scale subtomogram structural analysis
Frazier, Zachary; Xu, Min; Alber, Frank
2017-01-01
Cryo-electron tomography (cryoET) captures the 3D electron density distribution of macromolecular complexes in a close-to-native state. With the rapid advance of cryoET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample, due to variations in the structure and composition of a complex in its in situ form or because the particles are a mixture of different complexes. In this case the subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. Additionally, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiner's high-performance features. PMID:28552576
Towards Scalable Graph Computation on Mobile Devices.
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2014-10-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scaling up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ MacBook Pro. Through creating a real-world iOS app with this technique, we demonstrate the strong potential for scalable graph computation on a single mobile device using our approach.
Towards Scalable Graph Computation on Mobile Devices
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2015-01-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scaling up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ MacBook Pro. Through creating a real-world iOS app with this technique, we demonstrate the strong potential for scalable graph computation on a single mobile device using our approach. PMID:25859564
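A small sketch of the memory-mapping idea behind this work, under assumed conventions (a flat binary file of (src, dst) uint32 pairs): the OS pages edges in on demand, so a graph larger than RAM can still be scanned. This is illustrative Python, not the authors' iOS/Android code.

```python
# Scan a binary edge list through mmap and count out-degrees.
import mmap
import struct
import numpy as np

def write_edges(path, edges):
    with open(path, "wb") as f:
        np.asarray(edges, dtype=np.uint32).tofile(f)

def out_degrees(path, num_nodes):
    deg = np.zeros(num_nodes, dtype=np.int64)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        edge_size = 8                                   # two uint32 values
        for off in range(0, len(mm), edge_size):
            src, _dst = struct.unpack_from("<II", mm, off)
            deg[src] += 1                               # the OS handles paging, not us
    return deg

if __name__ == "__main__":
    write_edges("toy.edges", [(0, 1), (0, 2), (1, 2), (2, 0)])
    print(out_degrees("toy.edges", 3))                  # [2 1 1]
```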
Bussery, Justin; Denis, Leslie-Alexandre; Guillon, Benjamin; Liu, Pengfeï; Marchetti, Gino; Rahal, Ghita
2018-04-01
We describe the genesis, design and evolution of a computing platform designed and built to improve the success rate of biomedical translational research. The eTRIKS project platform was developed with the aim of securely hosting heterogeneous types of data and providing an optimal environment to run tranSMART analytical applications. Many types of data can now be hosted, including multi-OMICS data, preclinical laboratory data and clinical information, including longitudinal data sets. During the last two years, the platform has matured into a robust translational research knowledge management system that is able to host other data mining applications and support the development of new analytical tools.
Scalable parallel distance field construction for large-scale applications
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; ...
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial location. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. In conclusion, our work greatly extends the usability of distance fields for demanding applications.
Scalable Parallel Distance Field Construction for Large-Scale Applications.
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial location. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
NASA Astrophysics Data System (ADS)
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficient for quickly obtaining very detailed Earth surface data over large spatial extents. Such data are important for scientific discovery in the Earth and ecological sciences and for natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies have achieved notable success in parallel processing of LiDAR data to address these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, the framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools, and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides a valuable reference for developing a collaborative cyberinfrastructure for processing big Earth science data in a highly scalable environment.
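One possible shape for the tile-based index described above, sketched in Python under assumed conventions (a fixed tile size and a (col, row) key): points falling in the same tile are grouped so they can be processed independently, e.g. as a Hadoop grouping key.

```python
# Map each LiDAR point to a fixed-size tile key and group points by tile.
def tile_key(x, y, tile_size=500.0, origin=(0.0, 0.0)):
    """Return a (col, row) tile identifier for a point."""
    col = int((x - origin[0]) // tile_size)
    row = int((y - origin[1]) // tile_size)
    return col, row

points = [(120.5, 87.3, 10.2), (620.0, 90.1, 12.8), (130.2, 610.7, 9.9)]
tiles = {}
for x, y, z in points:
    tiles.setdefault(tile_key(x, y), []).append((x, y, z))

for key, pts in sorted(tiles.items()):
    print(key, len(pts), "points")        # each tile can be processed in parallel
```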
UTM Safely Enabling UAS Operations in Low-Altitude Airspace
NASA Technical Reports Server (NTRS)
Kopardekar, Parimal
2017-01-01
Conduct research, development, and testing to identify airspace operations requirements that enable large-scale visual and beyond-visual-line-of-sight UAS operations in low-altitude airspace. Use a build-a-little-test-a-little strategy, moving from remote areas to urban areas. Low density: no traffic management required, but an understanding of airspace constraints is needed. Cooperative traffic management: understanding of airspace constraints and other operations. Manned and unmanned traffic management: scalable and heterogeneous operations. The UTM construct is consistent with the FAA's risk-based strategy. The UTM research platform is used for simulations and tests. UTM offers a path towards scalability.
UTM Safely Enabling UAS Operations in Low-Altitude Airspace
NASA Technical Reports Server (NTRS)
Kopardekar, Parimal H.
2016-01-01
Conduct research, development, and testing to identify airspace operations requirements that enable large-scale visual and beyond-visual-line-of-sight UAS operations in low-altitude airspace. Use a build-a-little-test-a-little strategy, moving from remote areas to urban areas. Low density: no traffic management required, but an understanding of airspace constraints is needed. Cooperative traffic management: understanding of airspace constraints and other operations. Manned and unmanned traffic management: scalable and heterogeneous operations. The UTM construct is consistent with the FAA's risk-based strategy. The UTM research platform is used for simulations and tests. UTM offers a path towards scalability.
Toward Scalable Trustworthy Computing Using the Human-Physiology-Immunity Metaphor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hively, Lee M; Sheldon, Frederick T
The cybersecurity landscape consists of an ad hoc patchwork of solutions. Optimal cybersecurity is difficult for various reasons: complexity, immense data and processing requirements, resource-agnostic cloud computing, practical time-space-energy constraints, inherent flaws in 'Maginot Line' defenses, and the growing number and sophistication of cyberattacks. This article defines the high-priority problems and examines the potential solution space. In that space, achieving scalable trustworthy computing and communications is possible through real-time knowledge-based decisions about cyber trust. This vision is based on the human-physiology-immunity metaphor and the human brain's ability to extract knowledge from data and information. The article outlines future steps toward scalable trustworthy systems requiring a long-term commitment to solve the well-known challenges.
Scalable digital hardware for a trapped ion quantum computer
NASA Astrophysics Data System (ADS)
Mount, Emily; Gaultney, Daniel; Vrijsen, Geert; Adams, Michael; Baek, So-Young; Hudek, Kai; Isabella, Louis; Crain, Stephen; van Rynbach, Andre; Maunz, Peter; Kim, Jungsang
2016-12-01
Many of the challenges of scaling quantum computer hardware lie at the interface between the qubits and the classical control signals used to manipulate them. Modular ion trap quantum computer architectures address scalability by constructing individual quantum processors interconnected via a network of quantum communication channels. Successful operation of such quantum hardware requires a fully programmable classical control system capable of frequency stabilizing the continuous wave lasers necessary for loading, cooling, initialization, and detection of the ion qubits, stabilizing the optical frequency combs used to drive logic gate operations on the ion qubits, providing a large number of analog voltage sources to drive the trap electrodes, and a scheme for maintaining phase coherence among all the controllers that manipulate the qubits. In this work, we describe scalable solutions to these hardware development challenges.
Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.
Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios
2016-01-01
The discrete periodic Radon transform (DPRT) has been used extensively in applications that involve image reconstruction from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoid the floating-point arithmetic associated with the use of the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and the need for a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles and do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N² additions per clock cycle. Compared with previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251 × 251 image, with approximately 25% fewer flip-flops than required for a systolic implementation, the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just 2N + ⌈log₂ N⌉ + 1 and 2N + 3⌈log₂ N⌉ + B + 2 cycles, respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than the systolic implementation and thus provides a tradeoff between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language (VHDL) and validated using a Field-Programmable Gate Array (FPGA) implementation.
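For reference, one common formulation of the forward DPRT (notation varies across papers) can be written directly in Python: for an N × N image with N prime, there are N + 1 projections of N sums each. This reference implementation only illustrates the transform the hardware above accelerates; it says nothing about the adder-tree architecture itself.

```python
# Straightforward forward DPRT; each pixel contributes once per projection.
import numpy as np

def forward_dprt(f):
    n = f.shape[0]
    assert f.shape == (n, n)
    r = np.zeros((n + 1, n), dtype=f.dtype)
    for m in range(n):                       # "sloped" projections
        for d in range(n):
            r[m, d] = sum(f[i, (d + m * i) % n] for i in range(n))
    r[n, :] = f.sum(axis=1)                  # the remaining (row-sum) projection
    return r

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(7, 7))  # N = 7 is prime
    proj = forward_dprt(img)
    # every projection preserves the total mass of the image
    assert np.all(proj.sum(axis=1) == img.sum())
    print(proj.shape)                        # (8, 7)
```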
Scalable Optical-Fiber Communication Networks
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Peterson, John C.
1993-01-01
Scalable arbitrary fiber extension network (SAFEnet) is conceptual fiber-optic communication network passing digital signals among variety of computers and input/output devices at rates from 200 Mb/s to more than 100 Gb/s. Intended for use with very-high-speed computers and other data-processing and communication systems in which message-passing delays must be kept short. Inherent flexibility makes it possible to match performance of network to computers by optimizing configuration of interconnections. In addition, interconnections made redundant to provide tolerance to faults.
Blueprint for a microwave trapped ion quantum computer.
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K
2017-02-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine, the highly scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits the adjoint-source approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of the available computational resources for a given problem setup. To parameterize the inverse domain, a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments, carried out on platforms ranging from modern laptops to high-performance clusters, demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
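A generic sketch of the regularized gradient-type inversion loop (not the authors' extrEMe-based code): minimize a data misfit plus regularization with a quasi-Newton method, where the toy linear operator G stands in for the expensive 3-D MT forward problem and the hand-coded gradient plays the role of the adjoint-source calculation.

```python
# Regularized least-squares inversion with L-BFGS as a quasi-Newton stand-in.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
G = rng.normal(size=(40, 20))            # toy "forward operator"
m_true = rng.normal(size=20)
d_obs = G @ m_true + 0.01 * rng.normal(size=40)
lam = 0.1                                # regularization weight

def objective(m):
    r = G @ m - d_obs
    return 0.5 * r @ r + 0.5 * lam * m @ m

def gradient(m):                         # in practice supplied by adjoint sources
    return G.T @ (G @ m - d_obs) + lam * m

result = minimize(objective, x0=np.zeros(20), jac=gradient, method="L-BFGS-B")
print("converged:", result.success, "final objective:", float(result.fun))
```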
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Loikith, P.; Lee, H.; McGibbney, L. J.; Whitehall, K. D.
2014-12-01
Remote sensing data and climate model output are multi-dimensional arrays of massive size locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning-fast Big Data technology called SciSpark based on Apache Spark. Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed; it therefore outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk, and makes iterative algorithms feasible. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 100 to 1000 compute nodes. This 2nd-generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning (ML) based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesoscale Convective Complexes. The goals of SciSpark are to: (1) decrease the time to compute comparison statistics and plots from minutes to seconds; (2) allow for interactive exploration of time-series properties over seasons and years; (3) decrease the time for satellite data ingestion into RCMES to hours; (4) allow for Level-2 comparisons with higher-order statistics or PDFs in minutes to hours; and (5) move RCMES into a near-real-time decision-making platform. We will report on: the architecture and design of SciSpark, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning (sharding) of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDFs, and first metrics quantifying parallel speedups and memory and disk usage.
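A hedged sketch of the in-memory map-reduce style SciSpark builds on, using plain PySpark (assumes pyspark is installed); the record layout is an illustrative stand-in for gridded observation/model data, not RCMES code.

```python
# Per-region mean temperature computed entirely in memory with PySpark.
from pyspark import SparkContext

sc = SparkContext("local[*]", "climate-metrics-sketch")

# (region, temperature) pairs, e.g. extracted upstream from NetCDF/HDF grids
records = [("tropics", 299.1), ("tropics", 300.4), ("midlat", 287.2),
           ("midlat", 285.9), ("polar", 253.0), ("polar", 255.6)]

pairs = sc.parallelize(records)
sums = pairs.mapValues(lambda t: (t, 1)) \
            .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
means = sums.mapValues(lambda s: s[0] / s[1]).collect()

print(dict(means))        # e.g. {'tropics': 299.75, 'midlat': 286.55, 'polar': 254.3}
sc.stop()
```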
Method and system for benchmarking computers
Gustafson, John L.
1993-09-14
A testing system and method for benchmarking computer systems. The system includes a store containing a scalable set of tasks to be performed to produce a solution in ever-increasing degrees of resolution as a larger number of the tasks are performed. A timing and control module allots to each computer a fixed benchmarking interval in which to perform the stored tasks. Means are provided for determining, after completion of the benchmarking interval, the degree of progress through the scalable set of tasks and for producing a benchmarking rating relating to the degree of progress for each computer.
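The fixed-interval idea above can be illustrated with a small Python sketch (not the patented system itself): every machine gets the same time budget and is rated by how far it progresses through a scalable task; the task here, refining a series estimate of pi, is an arbitrary placeholder.

```python
# Fixed time, variable work: rate a machine by progress through a scalable task.
import time

def run_benchmark(interval_s=1.0, chunk=100_000):
    deadline = time.perf_counter() + interval_s
    terms = 0
    total = 0.0
    while time.perf_counter() < deadline:        # fixed benchmarking interval
        for k in range(terms, terms + chunk):    # next chunk of the scalable task
            total += (-1.0) ** k / (2 * k + 1)   # Leibniz series for pi/4
        terms += chunk
    return terms, 4.0 * total                    # progress is the rating

if __name__ == "__main__":
    completed, pi_estimate = run_benchmark()
    print(f"terms completed in interval: {completed}, pi ~ {pi_estimate:.6f}")
```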
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osei-Kuffuor, Daniel; Fattebert, Jean-Luc
2014-01-01
Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N³) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
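A toy illustration (not the production algorithm) of the block-inversion idea: approximate selected entries of the overlap-matrix inverse by inverting only a local principal submatrix around the orbitals of interest, and compare against the exact global inverse.

```python
# Approximate selected inverse entries by inverting a local principal submatrix.
import numpy as np

def selected_inverse_entries(S, local_idx):
    """Invert S restricted to local_idx; accuracy improves as the block grows."""
    block = S[np.ix_(local_idx, local_idx)]
    return np.linalg.inv(block)

rng = np.random.default_rng(0)
n = 200
# Build a well-conditioned, banded "overlap-like" matrix (near identity).
A = np.eye(n) + 0.05 * np.triu(np.tril(rng.normal(size=(n, n)), 5), -5)
S = 0.5 * (A + A.T)

local = np.arange(40, 60)                 # orbitals in one localization region
approx = selected_inverse_entries(S, local)
exact = np.linalg.inv(S)[np.ix_(local, local)]
print("max abs error on selected block:", np.abs(approx - exact).max())
```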
Towards a Cloud Based Smart Traffic Management Framework
NASA Astrophysics Data System (ADS)
Rahimi, M. M.; Hakimpour, F.
2017-09-01
Traffic big data has brought many opportunities for traffic management applications. However, several challenges, such as the heterogeneity, storage, management, processing and analysis of traffic big data, may hinder its efficient and real-time application. All these challenges call for a well-adapted distributed framework for smart traffic management that can efficiently handle big traffic data integration, indexing, query processing, mining and analysis. In this paper, we present a novel, distributed, scalable and efficient framework for traffic management applications. The proposed cloud-computing-based framework can answer the technical challenges of efficient and real-time storage, management, processing and analysis of traffic big data. For evaluation of the framework, we have used OpenStreetMap (OSM) real trajectories and road network in a distributed environment. Our evaluation results indicate that the speed of data import into this framework exceeds 8,000 records per second when the dataset size is near 5 million records. We also evaluated the performance of data retrieval in our proposed framework; the retrieval speed exceeds 15,000 records per second at the same dataset size. We have also evaluated the scalability and performance of the proposed framework using parallelisation of a critical pre-analysis step in transportation applications. The results show that the proposed framework achieves considerable performance and efficiency in traffic management applications.
Xu, Min; Chai, Xiaoqi; Muthakana, Hariank; Liang, Xiaodan; Yang, Ge; Zeev-Ben-Mordehai, Tzviya; Xing, Eric P.
2017-01-01
Motivation: Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization at near-native state and in sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organization inside single cells. However, the high degree of structural complexity, together with practical imaging limitations, makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although it is no longer difficult to acquire CECT data containing such numbers of subtomograms due to advances in data acquisition automation, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing such amounts of data. Results: To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computation-intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing unsupervised rotation-invariant feature and pose-normalization based approaches, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in training data. Availability and Implementation: Source code freely available at http://www.cs.cmu.edu/∼mxu1/software. Contact: mxu1@cs.cmu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28881965
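A sketch of the final subdivision step only, under assumed inputs: given feature vectors already extracted for each subtomogram (random stand-ins here), split the set into smaller, more homogeneous subsets with k-means. The supervised CNN feature extraction itself is not shown and the numbers are illustrative.

```python
# Subdivide "subtomograms" into homogeneous subsets by clustering feature vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
features = np.vstack([rng.normal(loc=c, scale=0.5, size=(500, 32))
                      for c in (0.0, 3.0, -3.0)])        # 1500 feature vectors

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
for label in range(3):
    subset = np.where(km.labels_ == label)[0]
    print(f"subset {label}: {subset.size} subtomograms")  # passed on for averaging
```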
NASA Astrophysics Data System (ADS)
Fang, Y.; Hou, J.; Engel, D.; Lin, G.; Yin, J.; Han, B.; Fang, Z.; Fountoulakis, V.
2011-12-01
In this study, we introduce an uncertainty quantification (UQ) software framework for carbon sequestration, with the focus on studying the effect of spatial heterogeneity of reservoir properties on CO2 migration. We use a sequential Gaussian simulation method (SGSIM) to generate realizations of permeability fields with various spatial statistical attributes. To deal with the computational difficulties, we integrate the following ideas/approaches: 1) we use three different sampling approaches (probabilistic collocation, quasi-Monte Carlo, and adaptive sampling) to reduce the required forward calculations while trying to explore the parameter space and quantify the input uncertainty; 2) we use eSTOMP as the forward modeling simulator. eSTOMP is implemented using the Global Arrays toolkit (GA), which is based on one-sided inter-processor communication and supports a shared-memory programming style on distributed-memory platforms, providing highly scalable performance. It uses a data model to partition most of the large-scale data structures into a relatively small number of distinct classes, and the lower-level simulator infrastructure (e.g., meshing support, associated data structures, and data mapping to processors) is separated from the higher-level physics and chemistry algorithmic routines using a grid component interface; and 3) besides the faster model and more efficient algorithms to speed up the forward calculation, we built an adaptive system infrastructure to select the best possible data transfer mechanisms, to optimally allocate system resources to improve performance, and to integrate software packages and data for composing carbon sequestration simulation, computation, analysis, estimation and visualization. We will demonstrate the framework with a given CO2 injection scenario in a heterogeneous sandstone reservoir.
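As a simplified stand-in for the SGSIM step (not the SGSIM algorithm itself), the sketch below generates one correlated log-permeability realization on a small grid via Cholesky factorization of an exponential covariance. Grid size, variance, correlation length, and the permeability scale are illustrative assumptions.

```python
# Generate a correlated log-permeability field on a 20 x 20 grid.
import numpy as np

nx = ny = 20
corr_len = 5.0          # correlation length in grid cells (assumed)
sigma2 = 1.0            # log-permeability variance (assumed)

xs, ys = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
cov = sigma2 * np.exp(-dist / corr_len)              # exponential covariance model

L = np.linalg.cholesky(cov + 1e-8 * np.eye(nx * ny)) # jitter for numerical safety
rng = np.random.default_rng(7)
log_k = (L @ rng.standard_normal(nx * ny)).reshape(nx, ny)
perm = 1e-13 * np.exp(log_k)                         # one permeability realization (m^2)
print(perm.min(), perm.max())
```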
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trujillo, Angelina Michelle
Strategy, planning, and acquisition: very large-scale computing platforms come and go, and planning for immensely scalable machines often precedes actual procurement by three years; procurement can take another year or more. Integration: after acquisition, machines must be integrated into the computing environments at LANL, including connection to scalable storage via large-scale storage networking and assurance of correct and secure operations. Management and utilization: ongoing operations, maintenance, and troubleshooting of the hardware and systems software at massive scale are required.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory.
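A serial simulation of the iterated processor-pair-wise balancing idea (not the authors' MPI implementation): in round k, each rank pairs with the rank whose id differs in bit k and the pair evens out its particle counts, reaching a near-uniform load after log2(N) rounds.

```python
# Hypercube-style dimension-exchange balancing, simulated in one process.
import random

def pairwise_balance(loads):
    n = len(loads)                      # assume n is a power of two
    rounds = n.bit_length() - 1
    for k in range(rounds):             # O(log N) balancing rounds
        for rank in range(n):
            partner = rank ^ (1 << k)
            if partner > rank:          # each pair balances once per round
                total = loads[rank] + loads[partner]
                loads[rank] = total // 2
                loads[partner] = total - total // 2
    return loads

random.seed(0)
loads = [random.randint(0, 10_000) for _ in range(1024)]
balanced = pairwise_balance(loads[:])
print("before: max/min =", max(loads), "/", min(loads))
print("after:  max/min =", max(balanced), "/", min(balanced))
```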
Jackson, Rebecca D; Best, Thomas M; Borlawsky, Tara B; Lai, Albert M; James, Stephen; Gurcan, Metin N
2012-01-01
The conduct of clinical and translational research regularly involves the use of a variety of heterogeneous and large-scale data resources. Scalable methods for the integrative analysis of such resources, particularly when attempting to leverage computable domain knowledge in order to generate actionable hypotheses in a high-throughput manner, remain an open area of research. In this report, we describe both a generalizable design pattern for such integrative knowledge-anchored hypothesis discovery operations and our experience in applying that design pattern in the experimental context of a set of driving research questions related to the publicly available Osteoarthritis Initiative data repository. We believe that this ‘test bed’ project and the lessons learned during its execution are both generalizable and representative of common clinical and translational research paradigms. PMID:22647689
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crozier, Paul; Howard, Micah; Rider, William J.
The SPARC (Sandia Parallel Aerodynamics and Reentry Code) will provide nuclear weapon qualification evidence for the random vibration and thermal environments created by re-entry of a warhead into the Earth's atmosphere. SPARC incorporates the innovative approaches of ATDM projects on several fronts, including: effective harnessing of heterogeneous compute nodes using Kokkos, exascale-ready parallel scalability through asynchronous multi-tasking, uncertainty quantification through Sacado integration, implementation of state-of-the-art reentry physics and multiscale models, use of advanced verification and validation methods, and enabling of improved workflows for users. SPARC is being developed primarily for the Department of Energy nuclear weapon program, with additional development and use of the code being supported by the Department of Defense for conventional weapons programs.
NASA Technical Reports Server (NTRS)
Kopardekar, Parimal H.
2017-01-01
Conduct research, development, and testing to identify airspace operations requirements that enable large-scale visual and beyond-visual-line-of-sight UAS operations in low-altitude airspace. Use a build-a-little-test-a-little strategy, moving from remote areas to urban areas. Low density: no traffic management required, but an understanding of airspace constraints is needed. Cooperative traffic management: understanding of airspace constraints and other operations. Manned and unmanned traffic management: scalable and heterogeneous operations. The UTM construct is consistent with the FAA's risk-based strategy. The UTM research platform is used for simulations and tests. UTM offers a path towards scalability.
Scalability and Validation of Big Data Bioinformatics Software.
Yang, Andrian; Troup, Michael; Ho, Joshua W K
2017-01-01
This review examines two important aspects that are central to modern big data bioinformatics analysis: software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability of a program to scale with workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless, the surge in volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
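A small example in the spirit of the metamorphic testing described above (a generic illustration, not code from the review): without an oracle for the "right" answer, we check relations that must hold between runs of a toy variant-counting function.

```python
# Metamorphic relations for a toy variant caller: the count must be invariant
# under permuting and duplicating the input reads.
from collections import Counter

def count_variants(reads, reference):
    """Count positions where the majority base across reads differs from the
    reference (a deliberately simple stand-in for a real pipeline step)."""
    calls = [Counter(col).most_common(1)[0][0] for col in zip(*reads)]
    return sum(1 for c, r in zip(calls, reference) if c != r)

reads = ["ACGTAC", "ACGTCC", "ACCTAC"]
reference = "ACGTAA"
baseline = count_variants(reads, reference)

# Relation 1: permuting the input reads must not change the count.
assert count_variants(list(reversed(reads)), reference) == baseline
# Relation 2: duplicating every read must not change the count.
assert count_variants(reads + reads, reference) == baseline
print("metamorphic relations hold; variant count =", baseline)
```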
Visual Analytics for Power Grid Contingency Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wong, Pak C.; Huang, Zhenyu; Chen, Yousu
2014-01-20
Contingency analysis is the process of employing different measures to model scenarios, analyze them, and then derive the best response to remove the threats. This application paper focuses on a class of contingency analysis problems found in the power grid management system. A power grid is a geographically distributed interconnected transmission network that transmits and delivers electricity from generators to end users. The power grid contingency analysis problem is increasingly important because of both the growing size of the underlying raw data that need to be analyzed and the urgency to deliver working solutions in an aggressive timeframe. Failure to do so may bring significant financial, economic, and security impacts to all parties involved and the society at large. The paper presents a scalable visual analytics pipeline that transforms about 100 million contingency scenarios to a manageable size and form for grid operators to examine different scenarios and come up with preventive or mitigation strategies to address the problems in a predictive and timely manner. Great attention is given to the computational scalability, information scalability, visual scalability, and display scalability issues surrounding the data analytics pipeline. Most of the large-scale computation requirements of our work are conducted on a Cray XMT multi-threaded parallel computer. The paper demonstrates a number of examples using western North American power grid models and data.
Blueprint for a microwave trapped ion quantum computer
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G.; Mølmer, Klaus; Devitt, Simon J.; Wunderlich, Christof; Hensinger, Winfried K.
2017-01-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion–based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation–based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error–threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects. PMID:28164154
Coupled Physics Environment (CouPE) library - Design, Implementation, and Release
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mahadevan, Vijay S.
Over several years, high fidelity, validated mono-physics solvers with proven scalability on peta-scale architectures have been developed independently. Based on a unified component-based architecture, these existing codes can be coupled with a unified mesh-data backplane and a flexible coupling-strategy-based driver suite to produce a viable tool for analysts. In this report, we present details on the design decisions and developments of CouPE, the Coupled Physics Environment, which orchestrates a coupled physics solver through the interfaces exposed by the MOAB array-based unstructured mesh; both are part of the SIGMA (Scalable Interfaces for Geometry and Mesh-Based Applications) toolkit. The SIGMA toolkit contains libraries that enable scalable geometry and unstructured mesh creation and handling in a memory- and computationally efficient implementation. The CouPE version being prepared for a full open-source release, along with updated documentation, will contain several useful examples that will enable users to start developing their applications natively using the native MOAB mesh and to couple their models to existing physics applications to analyze and solve real-world problems of interest. An integrated multi-physics simulation capability for the design and analysis of current and future nuclear reactor models is also being investigated as part of the NEAMS RPL, to tightly couple neutron transport, thermal-hydraulics and structural mechanics physics under the SHARP framework. This report summarizes the efforts that have been invested in CouPE to bring together several existing physics applications, namely PROTEUS (neutron transport code), Nek5000 (computational fluid-dynamics code) and Diablo (structural mechanics code). The goal of the SHARP framework is to perform fully resolved coupled physics analysis of a reactor on heterogeneous geometry, in order to reduce the overall numerical uncertainty while leveraging available computational resources. The design of CouPE, along with the motivations that led to implementation choices, is also discussed. The first release of the library will be different from the current version of the code that integrates the components in SHARP, and an explanation of the need for forking the source base will also be provided. Enhancements in the functionality and improved user guides will be available as part of the release. CouPE v0.1 is scheduled for an open-source release in December 2014 along with SIGMA v1.1 components that provide support for language-agnostic mesh loading, traversal and query interfaces along with scalable solution transfer of fields between different physics codes. The coupling methodology and software interfaces of the library are presented, along with verification studies on two representative fast sodium-cooled reactor demonstration problems to prove the usability of the CouPE library.
Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.
Cloud computing applications for biomedical science: A perspective
2018-01-01
Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176
A generic, cost-effective, and scalable cell lineage analysis platform
Biezuner, Tamir; Spiro, Adam; Raz, Ofir; Amir, Shiran; Milo, Lilach; Adar, Rivka; Chapal-Ilani, Noa; Berman, Veronika; Fried, Yael; Ainbinder, Elena; Cohen, Galit; Barr, Haim M.; Halaban, Ruth; Shapiro, Ehud
2016-01-01
Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we show an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that inputs individual cells, produces targeted single-cell sequencing data, and uses it to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery. PMID:27558250
Scalable Domain Decomposed Monte Carlo Particle Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, Matthew Joseph
2013-12-05
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.
Moutsatsos, Ioannis K; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J; Jenkins, Jeremy L; Holway, Nicholas; Tallarico, John; Parker, Christian N
2017-03-01
High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.
Moutsatsos, Ioannis K.; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J.; Jenkins, Jeremy L.; Holway, Nicholas; Tallarico, John; Parker, Christian N.
2016-01-01
High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an “off-the-shelf,” open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community. PMID:27899692
Continuous Heterogeneous Photocatalysis in Serial Micro-Batch Reactors.
Pieber, Bartholomäus; Shalom, Menny; Antonietti, Markus; Seeberger, Peter H; Gilmore, Kerry
2018-01-29
Solid reagents, leaching catalysts, and heterogeneous photocatalysts are commonly employed in batch processes but are ill-suited for continuous-flow chemistry. Heterogeneous catalysts for thermal reactions are typically used in packed-bed reactors, which cannot be penetrated by light and thus are not suitable for photocatalytic reactions involving solids. We demonstrate that serial micro-batch reactors (SMBRs) allow for the continuous utilization of solid materials together with liquids and gases in flow. This technology was utilized to develop selective and efficient fluorination reactions using a modified graphitic carbon nitride heterogeneous catalyst instead of costly homogeneous metal polypyridyl complexes. The merger of this inexpensive, recyclable catalyst and the SMBR approach enables sustainable and scalable photocatalysis.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is enabling fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
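For readers interpreting scalability numbers like those above, the textbook form of Amdahl's law is easy to evaluate directly (this helper is generic, not code from the platform): speedup for parallel fraction f on n cores.

```python
# Amdahl's law: speedup = 1 / ((1 - f) + f / n).
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for f in (0.90, 0.95, 0.99):
    print(f"f={f:.2f}: 12 cores -> {amdahl_speedup(f, 12):.1f}x, "
          f"48 cores -> {amdahl_speedup(f, 48):.1f}x")
```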
Rodrigues, Miguel A; Balzan, Gustavo; Rosa, Mónica; Gomes, Diana; de Azevedo, Edmundo G; Singh, Satish K; Matos, Henrique A; Geraldes, Vítor
2013-01-01
Freezing is an important operation in the biotherapeutics industry. However, water crystallization in solutions containing electrolytes, sugars and proteins is difficult to control and usually leads to substantial spatial solute heterogeneity. Herein, we address the influence of the freezing geometry (axial or radial) on the heterogeneity of the frozen matrix, in terms of local concentration of solutes and thermal history. Solutions of hemoglobin were frozen radially and axially using small-scale and pilot-scale freezing systems. Concentrations of hemoglobin and sucrose and pH values were measured by ice-core sampling, and temperature profiles were measured at several locations. The results showed that natural convection is the major source of the cryoconcentration heterogeneity of solutes across the geometry of the container. A significant improvement in this spatial heterogeneity was observed when the freezing geometry was nonconvective, i.e., when the freezing front progressed unidirectionally from bottom to top. Using this geometry, less than 10% variation in solute concentrations was obtained throughout the frozen solutions. This result was reproducible even when the volume was increased by two orders of magnitude (from 30 mL to 3 L). The temperature profiles obtained for the nonconvective freezing geometry were predicted using a relatively simple computational fluid dynamics model. The reproducible solute distribution, predictable temperature profiles, and scalability demonstrate that the bottom-to-top freezing geometry enables extended control over the freezing process. This geometry has therefore shown the potential to contribute to a better understanding and control of the risks inherent to frozen storage.
An overview of the heterogeneous telescope network system: Concept, scalability and operation
NASA Astrophysics Data System (ADS)
White, R. R.; Allan, A.
2008-03-01
In the coming decade there will be an avalanche of data streams devoted to astronomical exploration, opening new windows of scientific discovery. The sheer volume of data and the diversity of event types (Kantor 2006; Kaiser 2004; Vestrand, Theiler & Wozniak 2004) will necessitate a move to a common language for the communication of event data, and will require telescope systems able not just to respond, but to act independently in order to take full advantage of available resources in a timely manner. Developed over the past three years, the Virtual Observatory Event (VOEvent) provides the best format for carrying these diverse event messages (White et al. 2006a; Seaman & Warner 2006). However, in order for the telescopes to be able to act independently, a system of interoperable network nodes must be in place that will allow the astronomical assets not only to issue event notifications, but also to coordinate and request specific observations. The Heterogeneous Telescope Network (HTN) is a network architecture that can achieve these goals and provide a scalable design to match the needs of both fully autonomous and manual telescope systems (Allan et al. 2006a; White et al. 2006b; Hessman 2006b). In this paper we show the design concept of this meta-network and its nodes, their scalable architecture and complexity, and how this concept can meet the needs of institutions in the near future.
NASA Technical Reports Server (NTRS)
Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas
2000-01-01
An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.
Final Report. Center for Scalable Application Development Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
2014-10-26
The Center for Scalable Application Development Software (CScADS) was established as a partnership between Rice University, Argonne National Laboratory, University of California Berkeley, University of Tennessee – Knoxville, and University of Wisconsin – Madison. CScADS pursued an integrated set of activities with the aim of increasing the productivity of DOE computational scientists by catalyzing the development of systems software, libraries, compilers, and tools for leadership computing platforms. Principal Center activities were workshops to engage the research community in the challenges of leadership computing, research and development of open-source software, and work with computational scientists to help them develop codes for leadership computing platforms. This final report summarizes CScADS activities at Rice University in these areas.
Enabling a high throughput real time data pipeline for a large radio telescope array with GPUs
NASA Astrophysics Data System (ADS)
Edgar, R. G.; Clark, M. A.; Dale, K.; Mitchell, D. A.; Ord, S. M.; Wayth, R. B.; Pfister, H.; Greenhill, L. J.
2010-10-01
The Murchison Widefield Array (MWA) is a next-generation radio telescope currently under construction in the remote Western Australia Outback. Raw data will be generated continuously at 5 GiB/s, grouped into 8 s cadences. This high throughput motivates the development of on-site, real-time processing and reduction in preference to archiving, transport and off-line processing. Each batch of 8 s data must be completely reduced before the next batch arrives. Maintaining real-time operation will require a sustained performance of around 2.5 TFLOP/s (including convolutions, FFTs, interpolations and matrix multiplications). We describe a scalable heterogeneous computing pipeline implementation, exploiting both the high computing density and FLOP-per-Watt ratio of modern GPUs. The architecture is highly parallel within and across nodes, with all major processing elements performed by GPUs. Necessary scatter-gather operations along the pipeline are loosely synchronized between the nodes hosting the GPUs. The MWA will be a frontier scientific instrument and a pathfinder for planned peta- and exa-scale facilities.
Agent-Based Intelligent Interface for Wheelchair Movement Control
Barriuso, Alberto L.; De Paz, Juan F.
2018-01-01
People who suffer from any kind of motor difficulty face serious complications in moving autonomously in their daily lives. However, a growing number of research projects propose different powered wheelchair control systems. Despite the interest of the research community in the area, there is no platform that allows easy integration of various control methods that make use of heterogeneous sensors and computationally demanding algorithms. In this work, an architecture based on virtual organizations of agents is proposed that makes use of a flexible and scalable communication protocol allowing the deployment of embedded agents in computationally limited devices. In order to validate the proper functioning of the proposed system, it has been integrated into a conventional wheelchair, and a set of alternative control interfaces has been developed and deployed, including a portable electroencephalography system, a voice interface, and a specifically designed smartphone application. A set of tests was conducted to assess both the platform's adequacy and the accuracy and ease of use of the proposed control systems, yielding positive results that can be useful in further wheelchair interface design and implementation. PMID:29751603
Processing Diabetes Mellitus Composite Events in MAGPIE.
Brugués, Albert; Bromuri, Stefano; Barry, Michael; Del Toro, Óscar Jiménez; Mazurkiewicz, Maciej R; Kardas, Przemyslaw; Pegueroles, Josep; Schumacher, Michael
2016-02-01
The focus of this research is the definition of programmable expert Personal Health Systems (PHS) to monitor patients affected by chronic diseases, using agent-oriented programming and mobile computing to represent the interactions among the components of the system. The paper also discusses issues of knowledge representation within the medical domain when dealing with temporal patterns concerning the physiological values of the patient. In the presented agent-based PHS, doctors can personalize the monitoring rules for each patient, and these rules can be defined graphically. Furthermore, to achieve better scalability, the computations for monitoring the patients are distributed among their devices rather than being performed on a centralized server. The system is evaluated using data from 21 diabetic patients to detect temporal patterns according to a set of defined monitoring rules. The system's scalability is evaluated by comparing it with a centralized approach. The evaluation concerning the detection of temporal patterns highlights the system's ability to monitor chronic patients affected by diabetes. Regarding scalability, the results show that an approach exploiting mobile computing is more scalable than a centralized approach and is therefore more likely to satisfy the needs of next-generation PHSs. PHSs are becoming an adopted technology to deal with the surge of patients affected by chronic illnesses. This paper discusses architectural choices to make an agent-based PHS more scalable by using a distributed mobile computing approach. It also discusses how to model the medical knowledge in the PHS in such a way that it is modifiable at run time. The evaluation highlights the necessity of distributing the reasoning to the mobile part of the system and shows that modifiable rules are able to deal with changes in the lifestyle of patients affected by chronic illnesses.
A software methodology for compiling quantum programs
NASA Astrophysics Data System (ADS)
Häner, Thomas; Steiger, Damian S.; Svore, Krysta; Troyer, Matthias
2018-04-01
Quantum computers promise to transform our notions of computation by offering a completely new paradigm. To achieve scalable quantum computation, optimizing compilers and a corresponding software design flow will be essential. We present a software architecture for compiling quantum programs from a high-level language program to hardware-specific instructions. We describe the necessary layers of abstraction and their differences and similarities to classical layers of a computer-aided design flow. For each layer of the stack, we discuss the underlying methods for compilation and optimization. Our software methodology facilitates more rapid innovation among quantum algorithm designers, quantum hardware engineers, and experimentalists. It enables scalable compilation of complex quantum algorithms and can be targeted to any specific quantum hardware implementation.
Scalable Multiprocessor for High-Speed Computing in Space
NASA Technical Reports Server (NTRS)
Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard
2004-01-01
A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard real-time applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at hundreds of pulses per second, each pulse requiring millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with the analog instrumentation and controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
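The broadcast-offset synchronization described in this abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the flight software: the class name, the broadcast callback, and the neglect of propagation delay are all assumptions introduced here.

```python
# Minimal sketch of master/slave clock-offset synchronization (illustrative only).
# Assumptions: each processor can read a free-running local clock and receives
# periodic broadcasts carrying the master clock time; names are hypothetical.

import time


class SlaveClock:
    """Tracks the offset between a local clock and a broadcast master clock."""

    def __init__(self):
        self.offset = 0.0  # master_time - local_time, updated on every broadcast

    def on_master_broadcast(self, master_time, local_receive_time):
        # Propagation delay of the broadcast link is ignored here (an assumption);
        # a real system would calibrate or bound this delay.
        self.offset = master_time - local_receive_time

    def now(self, local_time):
        # Convert a local timestamp into the shared (master) timebase.
        return local_time + self.offset


if __name__ == "__main__":
    clock = SlaveClock()
    # Simulate a broadcast: the master is 2.5 s ahead of this node's local clock.
    local = time.monotonic()
    clock.on_master_broadcast(master_time=local + 2.5, local_receive_time=local)
    print("synchronized time:", clock.now(time.monotonic()))
```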
A Disciplined Architectural Approach to Scaling Data Analysis for Massive, Scientific Data
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Braverman, A. J.; Cinquini, L.; Turmon, M.; Lee, H.; Law, E.
2014-12-01
Data collections across remote sensing and ground-based instruments in astronomy, Earth science, and planetary science are outpacing scientists' ability to analyze them. Furthermore, the distribution, structure, and heterogeneity of the measurements themselves pose challenges that limit the scalability of data analysis using traditional approaches. Methods for developing science data processing pipelines, distributing scientific datasets, and performing analysis will require innovative approaches that integrate cyber-infrastructure, algorithms, and data into more systematic approaches that can more efficiently compute and reduce data, particularly distributed data. This requires the integration of computer science, machine learning, statistics, and domain expertise to identify scalable architectures for data analysis. The size of data returned from Earth science observing satellites and the magnitude of data from climate model output are predicted to grow into the tens of petabytes, challenging current data analysis paradigms. The same kind of growth is present in astronomy and planetary science data. One of the major challenges in data science and related disciplines is defining new approaches to scaling systems and analysis in order to increase scientific productivity and yield. Specific needs include: 1) identification of optimized system architectures for analyzing massive, distributed data sets; 2) algorithms for systematic analysis of massive data sets in distributed environments; and 3) the development of software infrastructures that are capable of performing massive, distributed data analysis across a comprehensive data science framework. NASA/JPL has begun an initiative in data science to address these challenges. Our goal is to evaluate how scientific productivity can be improved through optimized architectural topologies that identify how to deploy and manage the access, distribution, computation, and reduction of massive, distributed data, while managing the uncertainties of scientific conclusions derived from such capabilities. This talk will provide an overview of JPL's efforts in developing a comprehensive architectural approach to data science.
Xu, Min; Chai, Xiaoqi; Muthakana, Hariank; Liang, Xiaodan; Yang, Ge; Zeev-Ben-Mordehai, Tzviya; Xing, Eric P
2017-07-15
Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization at near-native state and in sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organization inside single cells. However, the high degree of structural complexity together with practical imaging limitations makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although it is no longer difficult to acquire CECT data containing such numbers of subtomograms, thanks to advances in data acquisition automation, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing such amounts of data. To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computation-intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing unsupervised rotation-invariant feature and pose-normalization based approaches, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in the training data. Source code freely available at http://www.cs.cmu.edu/∼mxu1/software . mxu1@cs.cmu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
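A rough sketch of the "extract features, then subdivide into homogeneous subsets" idea follows. It is not the authors' pipeline: a PCA stands in for the supervised deep feature extractor, k-means stands in for the subsequent clustering, and random arrays stand in for real subtomograms; all of these substitutions are assumptions made to keep the example small and runnable.

```python
# Illustrative sketch: map subtomograms to feature vectors, then cluster them
# into smaller, more homogeneous subsets for downstream heavy-weight processing.
# PCA and random data are stand-ins, not the method described in the abstract.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_subtomograms, voxels = 500, 16 * 16 * 16
subtomograms = rng.normal(size=(n_subtomograms, voxels))  # placeholder data

# Step 1: compact feature vector per subtomogram (stand-in for a learned encoder).
features = PCA(n_components=32, random_state=0).fit_transform(subtomograms)

# Step 2: subdivide into subsets that can be recovered independently by
# computation-intensive averaging/classification methods.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(features)

subsets = [np.flatnonzero(labels == k) for k in range(10)]
print("subset sizes:", [len(s) for s in subsets])
```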
NASA Astrophysics Data System (ADS)
Magnoni, L.; Suthakar, U.; Cordeiro, C.; Georgiou, M.; Andreeva, J.; Khan, A.; Smith, D. R.
2015-12-01
Monitoring the WLCG infrastructure requires the gathering and analysis of a high volume of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks, to provide a uniform and flexible interface for scientists and sites. The current architecture, where relational database systems are used to store, process and serve monitoring data, has limitations in coping with the foreseen increase in the volume (e.g. higher LHC luminosity) and the variety (e.g. new data-transfer protocols and new resource types, such as cloud computing) of WLCG monitoring events. This paper presents a new scalable data store and analytics platform designed by the Support for Distributed Computing (SDC) group at the CERN IT department, which uses a variety of technologies, each one targeting specific aspects of large-scale distributed data processing (commonly referred to as the lambda-architecture approach). Results of data processing on Hadoop for WLCG data activities monitoring are presented, showing how the new architecture can easily analyze hundreds of millions of transfer logs in a few minutes. Moreover, a comparison of data partitioning, compression and file formats (e.g. CSV, Avro) is presented, with particular attention given to how the file structure impacts the overall MapReduce performance. In conclusion, the evolution of the current implementation, which focuses on data storage and batch processing, towards a complete lambda architecture is discussed, with consideration of candidate technology for the serving layer (e.g. Elasticsearch) and a description of a proof-of-concept implementation, based on Apache Spark and Esper, for the real-time part, which compensates for batch-processing latency and automates problem detection and failures.
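The batch-layer idea of the lambda architecture (aggregate huge numbers of transfer logs in one distributed pass) can be sketched as follows. This is only an illustration: the CSV schema, the HDFS path, and the aggregation key are hypothetical and not the platform's actual data model.

```python
# Minimal PySpark sketch of batch aggregation over transfer logs, in the spirit
# of the batch layer described above. Schema (timestamp, source, destination,
# bytes) and paths are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("transfer-log-aggregation").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///monitoring/transfers/*.csv")  # hypothetical location


def parse(line):
    ts, src, dst, nbytes = line.split(",")
    return (src, dst), int(nbytes)


totals = (lines.map(parse)
               .reduceByKey(lambda a, b: a + b)   # total bytes per (src, dst) link
               .collect())

for (src, dst), nbytes in totals:
    print(src, "->", dst, nbytes)

spark.stop()
```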
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2016-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
NASA Astrophysics Data System (ADS)
Konduri, Aditya
Many natural and engineering systems are governed by nonlinear partial differential equations (PDEs) which result in multiscale phenomena, e.g. turbulent flows. Numerical simulations of these problems are computationally very expensive and demand extreme levels of parallelism. At realistic conditions, simulations are being carried out on massively parallel computers with hundreds of thousands of processing elements (PEs). It has been observed that communication between PEs, as well as their synchronization, at these extreme scales takes up a significant portion of the total simulation time and results in poor scalability of codes. This issue is likely to pose a bottleneck in the scalability of codes on future Exascale systems. In this work, we propose an asynchronous computing algorithm based on widely used finite difference methods to solve PDEs in which synchronization between PEs due to communication is relaxed at a mathematical level. We show that while stability is conserved when schemes are used asynchronously, accuracy is greatly degraded. Since message arrivals at PEs are random processes, so is the behavior of the error. We propose a new statistical framework in which we show that average errors always drop to first order, regardless of the original scheme. We propose new asynchrony-tolerant schemes that maintain accuracy when synchronization is relaxed. The quality of the solution is shown to depend not only on the physical phenomena and numerical schemes, but also on the characteristics of the computing machine. A novel algorithm using remote memory access communications has been developed to demonstrate excellent scalability of the method for large-scale computing. Finally, we present a path to extend this method to solving complex multi-scale problems on Exascale machines.
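The setting studied here (a finite-difference update that uses possibly stale neighbor values when messages have not arrived) can be mimicked with a small toy problem. The sketch below is an assumption-laden illustration of that setting only: it solves a 1D diffusion equation with a randomly delayed halo value at one boundary, using a standard central difference; the asynchrony-tolerant schemes of the work, which add extra time levels to cancel the resulting error, are not implemented here.

```python
# Toy illustration of asynchronous halo exchange for an explicit 1D diffusion
# solver. The left-boundary neighbor value may be several time steps old,
# mimicking a message that has not yet arrived.

import numpy as np

rng = np.random.default_rng(1)
nx, nt = 64, 200
dx = 1.0 / nx
alpha = 1.0
dt = 0.4 * dx ** 2 / alpha          # r = alpha*dt/dx^2 = 0.4 < 0.5, stable

u = np.sin(2 * np.pi * np.linspace(0, 1, nx, endpoint=False))
halo_history = [float(u[-1])]        # past values of the "neighbor" edge point

for n in range(nt):
    # Random message delay: use a halo value up to 3 steps old, if available.
    delay = min(int(rng.integers(0, 4)), len(halo_history) - 1)
    left_halo = halo_history[-1 - delay]

    un = u.copy()
    lap = np.empty_like(u)
    lap[1:-1] = un[2:] - 2 * un[1:-1] + un[:-2]
    lap[0] = un[1] - 2 * un[0] + left_halo      # possibly stale neighbor value
    lap[-1] = un[0] - 2 * un[-1] + un[-2]       # periodic wrap, kept synchronous
    u = un + alpha * dt / dx ** 2 * lap

    halo_history.append(float(un[-1]))          # "send" this step's edge value

print("final max |u|:", float(np.abs(u).max()))
```

Running this with and without the random delay gives a concrete feel for the accuracy degradation the statistical framework above quantifies.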
A scalable plant-resolving radiative transfer model based on optimized GPU ray tracing
USDA-ARS?s Scientific Manuscript database
A new model for radiative transfer in participating media and its application to complex plant canopies is presented. The goal was to be able to efficiently solve complex canopy-scale radiative transfer problems while also representing sub-plant heterogeneity. In the model, individual leaf surfaces ...
Learning Analytics Platform, towards an Open Scalable Streaming Solution for Education
ERIC Educational Resources Information Center
Lewkow, Nicholas; Zimmerman, Neil; Riedesel, Mark; Essa, Alfred
2015-01-01
Next generation digital learning environments require delivering "just-in-time feedback" to learners and those who support them. Unlike traditional business intelligence environments, streaming data requires resilient infrastructure that can move data at scale from heterogeneous data sources, process the data quickly for use across…
Jaschob, Daniel; Riffle, Michael
2012-07-30
Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. JobCenter is a client-server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls or "in the cloud") and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, which provides tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multistep workflows. JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/.
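A client-driven worker of the kind described in this abstract can be sketched as a simple poll-and-execute loop. The sketch below is not JobCenter's actual wire protocol: the endpoint URLs, JSON fields, and shell-command job type are hypothetical stand-ins chosen to show why a purely client-initiated design works behind firewalls and balances load naturally.

```python
# Sketch of a client-driven worker loop: the worker initiates every exchange,
# so it can run anywhere (behind firewalls, in the cloud) and idle workers
# simply ask for more work. Endpoints and payloads are hypothetical.

import subprocess
import time

import requests

SERVER = "https://jobs.example.org/api"   # hypothetical server
WORKER_TYPES = ["blast", "shell"]         # job types this worker is configured to run


def run_forever(poll_interval=30):
    while True:
        # Ask the server for work; the server never contacts the worker directly.
        resp = requests.post(f"{SERVER}/request-job",
                             json={"types": WORKER_TYPES}, timeout=10)
        job = resp.json() if resp.status_code == 200 else None

        if not job:
            time.sleep(poll_interval)     # nothing to do: back off and poll again
            continue

        result = subprocess.run(job["command"], shell=True,
                                capture_output=True, text=True)

        # Report completion; retries and error handling omitted for brevity.
        requests.post(f"{SERVER}/complete-job",
                      json={"job_id": job["id"],
                            "exit_code": result.returncode,
                            "stdout": result.stdout},
                      timeout=10)


if __name__ == "__main__":
    run_forever()
```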
A scalable quantum computer with ions in an array of microtraps
Cirac; Zoller
2000-04-06
Quantum computers require the storage of quantum information in a set of two-level systems (called qubits), the processing of this information using quantum gates and a means of final readout. So far, only a few systems have been identified as potentially viable quantum computer models--accurate quantum control of the coherent evolution is required in order to realize gate operations, while at the same time decoherence must be avoided. Examples include quantum optical systems (such as those utilizing trapped ions or neutral atoms, cavity quantum electrodynamics and nuclear magnetic resonance) and solid state systems (using nuclear spins, quantum dots and Josephson junctions). The most advanced candidates are the quantum optical and nuclear magnetic resonance systems, and we expect that they will allow quantum computing with about ten qubits within the next few years. This is still far from the numbers required for useful applications: for example, the factorization of a 200-digit number requires about 3,500 qubits, rising to 100,000 if error correction is implemented. Scalability of proposed quantum computer architectures to many qubits is thus of central importance. Here we propose a model for an ion trap quantum computer that combines scalability (a feature usually associated with solid state proposals) with the advantages of quantum optical systems (in particular, quantum control and long decoherence times).
Quantum interference in heterogeneous superconducting-photonic circuits on a silicon chip.
Schuck, C; Guo, X; Fan, L; Ma, X; Poot, M; Tang, H X
2016-01-21
Quantum information processing holds great promise for communicating and computing data efficiently. However, scaling current photonic implementation approaches to larger system size remains an outstanding challenge for realizing disruptive quantum technology. Two main ingredients of quantum information processors are quantum interference and single-photon detectors. Here we develop a hybrid superconducting-photonic circuit system to show how these elements can be combined in a scalable fashion on a silicon chip. We demonstrate the suitability of this approach for integrated quantum optics by interfering and detecting photon pairs directly on the chip with waveguide-coupled single-photon detectors. Using a directional coupler implemented with silicon nitride nanophotonic waveguides, we observe 97% interference visibility when measuring photon statistics with two monolithically integrated superconducting single-photon detectors. The photonic circuit and detector fabrication processes are compatible with standard semiconductor thin-film technology, making it possible to implement more complex and larger scale quantum photonic circuits on silicon chips.
Privacy enabling technology for video surveillance
NASA Astrophysics Data System (ADS)
Dufaux, Frédéric; Ouaret, Mourad; Abdeljaoued, Yousri; Navarro, Alfonso; Vergnenègre, Fabrice; Ebrahimi, Touradj
2006-05-01
In this paper, we address the problem of privacy in video surveillance. We propose an efficient solution based on transform-domain scrambling of regions of interest in a video sequence. More specifically, the sign of selected transform coefficients is flipped during encoding. We address more specifically the case of Motion JPEG 2000. Simulation results show that the technique can be successfully applied to conceal information in regions of interest in the scene while providing a good level of security. Furthermore, the scrambling is flexible and allows adjusting the amount of distortion introduced. This is achieved with a small impact on coding performance and a negligible increase in computational complexity. In the proposed video surveillance system, heterogeneous clients can remotely access the system through the Internet or 2G/3G mobile phone networks. Thanks to the inherently scalable Motion JPEG 2000 codestream, the server is able to adapt the resolution and bandwidth of the delivered video depending on the usage environment of the client.
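The core scrambling operation (pseudorandom sign flipping of selected coefficients) is simple enough to sketch. The example below works on a plain array standing in for the wavelet coefficients of a region of interest; integrating it into an actual Motion JPEG 2000 encoder is not shown, and the seeded PRNG plays the role of the shared secret. These simplifications are assumptions of the sketch, not claims about the paper's implementation.

```python
# Minimal sketch of transform-domain scrambling by sign flipping.

import numpy as np


def scramble_signs(coeffs, seed, fraction=0.5):
    """Flip the sign of a pseudo-randomly selected subset of coefficients.

    Applying the same function with the same seed undoes the scrambling,
    since flipping a sign twice restores the original value.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(coeffs.shape) < fraction
    out = coeffs.copy()
    out[mask] = -out[mask]
    return out


if __name__ == "__main__":
    roi_coeffs = np.random.default_rng(0).normal(size=(8, 8))
    protected = scramble_signs(roi_coeffs, seed=1234)
    recovered = scramble_signs(protected, seed=1234)   # authorized descrambling
    assert np.allclose(recovered, roi_coeffs)
    print("mean distortion seen without the key:",
          float(np.abs(protected - roi_coeffs).mean()))
```

The `fraction` parameter gives the kind of adjustable distortion the abstract mentions: flipping more coefficients conceals more, flipping fewer conceals less.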
High-resolution coupled physics solvers for analysing fine-scale nuclear reactor design problems.
Mahadevan, Vijay S; Merzari, Elia; Tautges, Timothy; Jain, Rajeev; Obabko, Aleksandr; Smith, Michael; Fischer, Paul
2014-08-06
An integrated multi-physics simulation capability for the design and analysis of current and future nuclear reactor models is being investigated, to tightly couple neutron transport and thermal-hydraulics physics under the SHARP framework. Over several years, high-fidelity, validated mono-physics solvers with proven scalability on petascale architectures have been developed independently. Based on a unified component-based architecture, these existing codes can be coupled with a mesh-data backplane and a flexible coupling-strategy-based driver suite to produce a viable tool for analysts. The goal of the SHARP framework is to perform fully resolved coupled physics analysis of a reactor on heterogeneous geometry, in order to reduce the overall numerical uncertainty while leveraging available computational resources. The coupling methodology and software interfaces of the framework are presented, along with verification studies on two representative fast sodium-cooled reactor demonstration problems to prove the usability of the SHARP framework.
High-resolution coupled physics solvers for analysing fine-scale nuclear reactor design problems
Mahadevan, Vijay S.; Merzari, Elia; Tautges, Timothy; Jain, Rajeev; Obabko, Aleksandr; Smith, Michael; Fischer, Paul
2014-01-01
An integrated multi-physics simulation capability for the design and analysis of current and future nuclear reactor models is being investigated, to tightly couple neutron transport and thermal-hydraulics physics under the SHARP framework. Over several years, high-fidelity, validated mono-physics solvers with proven scalability on petascale architectures have been developed independently. Based on a unified component-based architecture, these existing codes can be coupled with a mesh-data backplane and a flexible coupling-strategy-based driver suite to produce a viable tool for analysts. The goal of the SHARP framework is to perform fully resolved coupled physics analysis of a reactor on heterogeneous geometry, in order to reduce the overall numerical uncertainty while leveraging available computational resources. The coupling methodology and software interfaces of the framework are presented, along with verification studies on two representative fast sodium-cooled reactor demonstration problems to prove the usability of the SHARP framework. PMID:24982250
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
NASA Astrophysics Data System (ADS)
Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.
2014-06-01
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
Bringing the CMS distributed computing system into scalable operations
NASA Astrophysics Data System (ADS)
Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.
2010-04-01
Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.
A look at scalable dense linear algebra libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, J.J.; Van de Geijn, R.A.; Walker, D.W.
1992-01-01
We discuss the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object-oriented interface to the library permits more portable applications to be written, and is easy to learn and use, since details of the parallel implementation are hidden from the user. Experiments on the Intel Touchstone Delta system with a prototype code that uses the square block scattered decomposition to perform LU factorization are presented and analyzed. It was found that the code was both scalable and efficient, performing at about 14 GFLOPS (double precision) for the largest problem considered.
A look at scalable dense linear algebra libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, J.J.; Van de Geijn, R.A.; Walker, D.W.
1992-08-01
We discuss the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object-oriented interface to the library permits more portable applications to be written, and is easy to learn and use, since details of the parallel implementation are hidden from the user. Experiments on the Intel Touchstone Delta system with a prototype code that uses the square block scattered decomposition to perform LU factorization are presented and analyzed. It was found that the code was both scalable and efficient, performing at about 14 GFLOPS (double precision) for the largest problem considered.
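The square block scattered decomposition mentioned in these two records is commonly realized as a two-dimensional block-cyclic mapping of matrix blocks onto a process grid. The sketch below illustrates that mapping under illustrative assumptions (block counts, grid shape, and function names are chosen for the example, not taken from the library itself).

```python
# Sketch of a 2D block-cyclic ("square block scattered") mapping of matrix
# blocks onto a process grid, the mapping used by scalable dense linear
# algebra libraries to balance work and keep communication regular.

def owner_of_block(bi, bj, p_rows, p_cols):
    """Process grid coordinates that own global block (bi, bj)."""
    return bi % p_rows, bj % p_cols


def local_block_index(bi, bj, p_rows, p_cols):
    """Where block (bi, bj) sits in its owner's local block array."""
    return bi // p_rows, bj // p_cols


if __name__ == "__main__":
    n_blocks, p_rows, p_cols = 8, 2, 3   # 8x8 blocks scattered over a 2x3 grid
    for bi in range(n_blocks):
        print([owner_of_block(bi, bj, p_rows, p_cols) for bj in range(n_blocks)])
```

Printing the ownership table shows why the scattering helps: every process appears in every few rows and columns of blocks, so the trailing submatrix of an LU factorization stays spread across the whole grid.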
Modelling heterogeneous interfaces for solar water splitting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham, Tuan Anh; Ping, Yuan; Galli, Giulia
2017-01-09
The generation of hydrogen from water and sunlight offers a promising approach for producing scalable and sustainable carbon-free energy. The key to a successful solar-to-fuel technology is the design of efficient, long-lasting and low-cost photoelectrochemical cells, which are responsible for absorbing sunlight and driving water splitting reactions. To this end, a detailed understanding and control of heterogeneous interfaces between photoabsorbers, electrolytes and catalysts present in photoelectrochemical cells is essential. Here we review recent progress and open challenges in predicting physicochemical properties of heterogeneous interfaces for solar water splitting applications using first-principles-based approaches, and highlight the key role of these calculations in interpreting increasingly complex experiments.
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes - neural connectivity maps of the brain - using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems - reads to parallel disk arrays and writes to solid-state storage - to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.
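One common way to "distribute data to cluster nodes by partitioning a spatial index" is to order image cuboids by a space-filling curve and hand contiguous ranges to nodes. The sketch below uses a Morton (Z-order) code for this purpose; it is an illustration of the general idea, not a claim about the production system's exact indexing scheme, and all names are assumptions.

```python
# Illustrative sketch: assign 3-d image cuboids to cluster nodes by a spatial
# index. Interleaving the bits of (x, y, z) gives a Morton (Z-order) code, and
# contiguous code ranges map to nodes, preserving spatial locality.

def morton3d(x, y, z, bits=10):
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def node_for_cuboid(x, y, z, n_nodes, bits=10):
    # Split the Z-order range evenly across nodes.
    return morton3d(x, y, z, bits) * n_nodes >> (3 * bits)


if __name__ == "__main__":
    print("cuboid (512, 300, 40) ->", node_for_cuboid(512, 300, 40, n_nodes=16))
```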
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework
2012-01-01
Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909
NASA Astrophysics Data System (ADS)
Wei, Hai-Rui; Deng, Fu-Guo
2014-12-01
Quantum logic gates are the key elements in quantum computing. Here we investigate the possibility of achieving a scalable and compact quantum computing based on stationary electron-spin qubits, by using the giant optical circular birefringence induced by quantum-dot spins in double-sided optical microcavities as a result of cavity quantum electrodynamics. We design the compact quantum circuits for implementing universal and deterministic quantum gates for electron-spin systems, including the two-qubit CNOT gate and the three-qubit Toffoli gate. They are compact and economic, and they do not require additional electron-spin qubits. Moreover, our devices have good scalability and are attractive as they both are based on solid-state quantum systems and the qubits are stationary. They are feasible with the current experimental technology, and both high fidelity and high efficiency can be achieved when the ratio of the side leakage to the cavity decay is low.
Wei, Hai-Rui; Deng, Fu-Guo
2014-12-18
Quantum logic gates are the key elements in quantum computing. Here we investigate the possibility of achieving a scalable and compact quantum computing based on stationary electron-spin qubits, by using the giant optical circular birefringence induced by quantum-dot spins in double-sided optical microcavities as a result of cavity quantum electrodynamics. We design the compact quantum circuits for implementing universal and deterministic quantum gates for electron-spin systems, including the two-qubit CNOT gate and the three-qubit Toffoli gate. They are compact and economic, and they do not require additional electron-spin qubits. Moreover, our devices have good scalability and are attractive as they both are based on solid-state quantum systems and the qubits are stationary. They are feasible with the current experimental technology, and both high fidelity and high efficiency can be achieved when the ratio of the side leakage to the cavity decay is low.
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes—neural connectivity maps of the brain—using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems—reads to parallel disk arrays and writes to solid-state storage—to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992
Low-power, transparent optical network interface for high bandwidth off-chip interconnects.
Liboiron-Ladouceur, Odile; Wang, Howard; Garg, Ajay S; Bergman, Keren
2009-04-13
The recent emergence of multicore architectures and chip multiprocessors (CMPs) has accelerated the bandwidth requirements in high-performance processors for both on-chip and off-chip interconnects. For next generation computing clusters, the delivery of scalable power efficient off-chip communications to each compute node has emerged as a key bottleneck to realizing the full computational performance of these systems. The power dissipation is dominated by the off-chip interface and the necessity to drive high-speed signals over long distances. We present a scalable photonic network interface approach that fully exploits the bandwidth capacity offered by optical interconnects while offering significant power savings over traditional E/O and O/E approaches. The power-efficient interface optically aggregates electronic serial data streams into a multiple WDM channel packet structure at time-of-flight latencies. We demonstrate a scalable optical network interface with 70% improvement in power efficiency for a complete end-to-end PCI Express data transfer.
Digital quantum simulators in a scalable architecture of hybrid spin-photon qubits
Chiesa, Alessandro; Santini, Paolo; Gerace, Dario; Raftery, James; Houck, Andrew A.; Carretta, Stefano
2015-01-01
Resolving quantum many-body problems represents one of the greatest challenges in physics and physical chemistry, due to the prohibitively large computational resources that would be required by using classical computers. A solution has been foreseen by directly simulating the time evolution through sequences of quantum gates applied to arrays of qubits, i.e. by implementing a digital quantum simulator. Superconducting circuits and resonators are emerging as an extremely promising platform for quantum computation architectures, but a digital quantum simulator proposal that is straightforwardly scalable, universal, and realizable with state-of-the-art technology is presently lacking. Here we propose a viable scheme to implement a universal quantum simulator with hybrid spin-photon qubits in an array of superconducting resonators, which is intrinsically scalable and allows for local control. As representative examples we consider the transverse-field Ising model, a spin-1 Hamiltonian, and the two-dimensional Hubbard model and we numerically simulate the scheme by including the main sources of decoherence. PMID:26563516
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
2012-12-05
For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
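The MapReduce structure of such a search (map each spectrum against candidate peptides, then reduce per spectrum to keep the best match) can be illustrated with a pure-Python toy. The scoring function below is a dummy stand-in for the K-score algorithm, and the data are fabricated; the example only shows how the problem decomposes across map, shuffle, and reduce phases.

```python
# Toy map/shuffle/reduce pipeline illustrating how a spectrum-vs-peptide search
# parallelizes on a MapReduce framework. The "score" is a dummy stand-in.

from collections import defaultdict


def map_phase(spectrum, peptides):
    # Emit (spectrum_id, (peptide, score)) pairs for every candidate match.
    for pep in peptides:
        score = abs(spectrum["precursor_mass"] - pep["mass"])   # dummy "score"
        yield spectrum["id"], (pep["sequence"], score)


def reduce_phase(spectrum_id, candidates):
    # Keep the best candidate per spectrum (here: smallest mass error).
    return spectrum_id, min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    peptides = [{"sequence": "PEPTIDEK", "mass": 927.5},
                {"sequence": "SAMPLER", "mass": 803.4}]
    spectra = [{"id": "s1", "precursor_mass": 927.6},
               {"id": "s2", "precursor_mass": 803.3}]

    shuffled = defaultdict(list)
    for spec in spectra:                      # map
        for key, value in map_phase(spec, peptides):
            shuffled[key].append(value)       # shuffle: group by spectrum id

    for key, values in shuffled.items():      # reduce
        print(reduce_phase(key, values))
```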
XPRESS: eXascale PRogramming Environment and System Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brightwell, Ron; Sterling, Thomas; Koniges, Alice
The XPRESS Project is one of four major projects of the DOE Office of Science Advanced Scientific Computing Research X-stack Program, initiated in September 2012. The purpose of XPRESS is to devise an innovative system software stack to enable practical and useful exascale computing around the end of the decade, with near-term contributions to efficient and scalable operation of trans-petaflops performance systems in the next two to three years, both for DOE mission-critical applications. To this end, XPRESS directly addresses critical challenges in computing of efficiency, scalability, and programmability through introspective methods of dynamic adaptive resource management and task scheduling.
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Palamuttam, R. S.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; Verma, R.; Waliser, D. E.; Lee, H.
2015-12-01
Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning-fast Big Data technology called SciSpark based on Apache Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 10 to 1000 compute nodes. This 2nd-generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesoscale Convective Complexes. We have implemented a parallel data ingest capability in which the user specifies desired variables (arrays) as several time-sorted lists of URLs (i.e. using OPeNDAP model.nc?varname, or local files). The specified variables are partitioned by time/space and then each Spark node pulls its bundle of arrays into memory to begin a computation pipeline. We also investigated the performance of several N-dimensional array libraries (scala breeze, java jblas & netlib-java, and ND4J). We are currently developing science codes using ND4J and studying memory behavior on the JVM. On the pyspark side, many of our science codes already use the numpy and SciPy ecosystems. The talk will cover: the architecture of SciSpark, the design of the scientific RDD (sRDD) data structure, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDFs, and first metrics quantifying parallel speedups and memory & disk usage.
Towards a Next-Generation Catalogue Cross-Match Service
NASA Astrophysics Data System (ADS)
Pineau, F.; Boch, T.; Derriere, S.; Arches Consortium
2015-09-01
In the past we have developed several catalogue cross-match tools. On one hand, there is the CDS XMatch service (Pineau et al. 2011), able to perform basic but very efficient cross-matches, scalable to the largest catalogues on a single regular server. On the other hand, as part of the European project ARCHES, we have been developing a generic and flexible tool which performs potentially complex multi-catalogue cross-matches and which computes probabilities of association based on a novel statistical framework. Although the two approaches have been managed so far as different tracks, the need for next-generation cross-match services dealing with both efficiency and complexity is becoming pressing with forthcoming projects which will produce huge high-quality catalogues. We are addressing this challenge, which is both theoretical and technical. In ARCHES we generalize to N catalogues the candidate selection criteria - based on the chi-square distribution - described in Pineau et al. (2011). We formulate and test a number of Bayesian hypotheses, a number which necessarily increases dramatically with the number of catalogues. To assign a probability to each hypothesis, we rely on estimated priors which account for local densities of sources. We validated our developments by comparing the theoretical curves we derived with the results of Monte-Carlo simulations. The current prototype is able to take into account heterogeneous positional errors, object extension and proper motion. The technical complexity is managed by OO programming design patterns and SQL-like functionalities. Large tasks are split into smaller independent pieces for scalability. Performance is achieved by resorting to multi-threading, sequential reads and several tree data structures. In addition to kd-trees, we account for heterogeneous positional errors and object extension using M-trees. Proper motions are supported using a modified M-tree we developed, inspired by Time Parametrized R-trees (TPR-trees). Quantitative tests in comparison with the basic cross-match will be presented.
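The chi-square candidate selection mentioned above can be sketched for the simplest two-catalogue case: under the "same object" hypothesis, with circular Gaussian positional errors, the squared angular separation normalized by the combined error variance follows a chi-square distribution with two degrees of freedom. The sketch below uses a flat-sky approximation and illustrative error values; it shows the criterion only, not the N-catalogue generalization or the Bayesian probabilities.

```python
# Sketch of chi-square candidate selection for positional cross-matching
# (two catalogues, circular Gaussian errors, flat-sky approximation).

import numpy as np
from scipy.stats import chi2


def is_candidate(ra1, dec1, sig1, ra2, dec2, sig2, completeness=0.9973):
    """Keep the pair if it passes the chi-square test at the given completeness."""
    # Angular separation components in arcsec.
    d_ra = (ra1 - ra2) * 3600.0 * np.cos(np.radians(0.5 * (dec1 + dec2)))
    d_dec = (dec1 - dec2) * 3600.0
    # Normalized squared separation ~ chi2 with 2 degrees of freedom
    # under the "same object" hypothesis.
    x2 = (d_ra ** 2 + d_dec ** 2) / (sig1 ** 2 + sig2 ** 2)
    return x2 <= chi2.ppf(completeness, df=2)


if __name__ == "__main__":
    # Positions in degrees, positional errors in arcsec (illustrative values).
    print(is_candidate(150.00010, 2.20001, 0.3, 150.00015, 2.20003, 0.5))
```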
Performance Models for the Spike Banded Linear System Solver
Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...
2011-01-01
With the availability of large-scale parallel platforms comprised of tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing the performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing the performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners, compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent prediction capabilities of our model, based on which we argue for the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters - platforms for which performance prediction is particularly challenging. Finally, we use our model to predict the scalability of the Spike algorithm on up to 65,536 cores. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.
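As a rough illustration of what a phase-wise pseudo-analytical performance model can look like (a generic sketch, not the authors' actual formulation), the parallel run time can be written as a sum over solver phases, each with a compute term, a synchronization term, and a communication term:

```latex
% Generic phase-wise performance model (illustrative sketch only).
T(p) \;=\; \sum_{k \in \mathrm{phases}}
      \left[ \frac{W_k}{p\,R_k} \;+\; \alpha_k \log p \;+\; \beta_k\, m_k(p) \right]
```

Here W_k is the work in phase k, R_k the measured single-core rate, the log p term a synchronization cost, and m_k(p) the per-process communication volume; the constants are calibrated from runtime measurements on the target platform, which is the sense in which such a model is "pseudo-analytical".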
: A Scalable and Transparent System for Simulating MPI Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2010-01-01
The system is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of the system are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source code is available. The set of source-code interfaces supported by the system is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, the system has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of a purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, sik. In the largest runs, the system has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.
Scalability of Findability: Decentralized Search and Retrieval in Large Information Networks
ERIC Educational Resources Information Center
Ke, Weimao
2010-01-01
Amid the rapid growth of information today is the increasing challenge for people to survive and navigate its magnitude. Dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. Collection of information in advance and centralization of IR operations are hardly possible because…
Surviving the Glut: The Management of Event Streams in Cyberphysical Systems
NASA Astrophysics Data System (ADS)
Buchmann, Alejandro
Alejandro Buchmann is Professor in the Department of Computer Science, Technische Universität Darmstadt, where he heads the Databases and Distributed Systems Group. He received his MS (1977) and PhD (1980) from the University of Texas at Austin. He was an Assistant/Associate Professor at the Institute for Applied Mathematics and Systems IIMAS/UNAM in Mexico, doing research on databases for CAD, geographic information systems, and object-oriented databases. At Computer Corporation of America (later Xerox Advanced Information Systems) in Cambridge, Mass., he worked in the areas of active databases and real-time databases, and at GTE Laboratories, Waltham, in the areas of distributed object systems and the integration of heterogeneous legacy systems. In 1991 he returned to academia and joined T.U. Darmstadt. His current research interests are at the intersection of middleware, databases, event-based distributed systems, ubiquitous computing, and very large distributed systems (P2P, WSN). Much of the current research is concerned with guaranteeing quality of service and reliability properties in these systems, for example, scalability, performance, transactional behaviour, consistency, and end-to-end security. Many research projects imply collaboration with industry and cover a broad spectrum of application domains. Further information can be found at http://www.dvs.tu-darmstadt.de
Scalable Domain Decomposed Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
O'Brien, Matthew Joseph
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are: • Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node. • Load Balancing: keeps the workload per processor as even as possible so the calculation runs efficiently. • Global Particle Find: if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain. • Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed and spatial redecomposition. These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms up to 2 million MPI processes on the Sequoia supercomputer.
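The "global particle find" idea above (any processor can compute, from a particle's coordinates alone, which spatial domain and owning rank the particle belongs to) can be sketched with a uniform-grid decomposition. That uniform grid and the lexicographic domain-to-rank map are illustrative simplifications, not the dissertation's actual decomposition of constructive solid geometry.

```python
# Sketch of locating a particle's owning domain and rank from its coordinates,
# assuming a regular nx x ny x nz decomposition of a rectangular bounding box.

def owning_domain(x, y, z, bounds, nx, ny, nz):
    """Return the (i, j, k) domain indices containing point (x, y, z)."""
    (x0, x1), (y0, y1), (z0, z1) = bounds
    i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
    j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
    k = min(int((z - z0) / (z1 - z0) * nz), nz - 1)
    return i, j, k


def owning_rank(domain, nx, ny, nz, ranks_per_domain=1):
    i, j, k = domain
    # Simple lexicographic domain-to-rank map; replicated domains would map
    # each domain to a set of ranks instead of a single one.
    return ((i * ny + j) * nz + k) * ranks_per_domain


if __name__ == "__main__":
    bounds = ((0.0, 10.0), (0.0, 10.0), (0.0, 10.0))
    dom = owning_domain(7.3, 2.1, 9.9, bounds, nx=4, ny=4, nz=4)
    print(dom, "-> rank", owning_rank(dom, 4, 4, 4))
```

A misrouted particle can then be forwarded to the rank this function returns, which is the essence of resolving particle locations globally after a redecomposition or a streaming step.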
Performance prediction: A case study using a multi-ring KSR-1 machine
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1995-01-01
While computers with tens of thousands of processors have successfully delivered high performance power for solving some of the so-called 'grand-challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR-1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, hierarchical architecture can be predicted and the best algorithm-machine combination can be selected for a given application.
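The abstract does not reproduce the formula itself. As context, one widely used way to express such a relation is the isospeed scalability metric associated with the same line of work; the form below is a hedged reconstruction of that idea, not a quotation from the paper.

```latex
% Hedged reconstruction (assumption, not quoted from the paper): isospeed
% scalability for scaling from p to p' processors, where W and W' are the
% problem sizes needed to keep the average speed per processor fixed.
% psi = 1 indicates perfect scalability; smaller values indicate degradation
% of parallelism.
\[
  \psi(p, p') \;=\; \frac{p' \, W}{p \, W'} .
\]
```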
Effect of asynchrony on numerical simulations of fluid flow phenomena
NASA Astrophysics Data System (ADS)
Konduri, Aditya; Mahoney, Bryan; Donzis, Diego
2015-11-01
Designing scalable CFD codes on massively parallel computers is a challenge. This is mainly due to the large number of communications between processing elements (PEs) and their synchronization, leading to idling of PEs. Indeed, communication will likely be the bottleneck in the scalability of codes on Exascale machines. Our recent work on asynchronous computing for PDEs based on finite differences has shown that it is possible to relax synchronization between PEs at a mathematical level. Computations then proceed regardless of the status of communication, reducing the idle time of PEs and improving the scalability. However, the accuracy of the schemes is greatly affected. We have proposed asynchrony-tolerant (AT) schemes to address this issue. In this work, we study the effect of asynchrony on the solution of fluid flow problems using standard and AT schemes. We show that asynchrony creates additional scales with low energy content. The specific wavenumbers affected can be shown to be due to two distinct effects: the randomness in the arrival of messages and the corresponding switching between schemes. Understanding these errors allows us to effectively control them, demonstrating the method's feasibility for solving turbulent flows at realistic conditions on future computing systems.
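A toy computation makes the studied effect concrete: if the value arriving from a neighbouring PE is allowed to be one step stale, a standard finite-difference update still proceeds but accumulates additional error. The Python sketch below mimics a delayed halo value on a 1-D periodic diffusion problem; it only illustrates the phenomenon and is not the asynchrony-tolerant schemes proposed in the work.

```python
import random

# 1-D explicit diffusion step (u_t = alpha * u_xx) on a periodic grid, comparing a
# synchronous update with one whose left "halo" value may be one step stale,
# mimicking a late message from a neighbouring PE. Illustrative only.

random.seed(0)
N, alpha, dt, dx = 64, 1.0, 1e-4, 1.0 / 64
r = alpha * dt / dx**2                      # stable: r < 0.5

u_sync = [1.0 if N // 4 <= i < N // 2 else 0.0 for i in range(N)]
u_async = list(u_sync)
stale_left = u_async[-1]                    # last value received from the "left PE"

for step in range(200):
    new_sync = [u_sync[i] + r * (u_sync[(i + 1) % N] - 2 * u_sync[i] + u_sync[i - 1])
                for i in range(N)]
    # Asynchronous variant: with 30% probability the halo value is not refreshed.
    if random.random() > 0.3:
        stale_left = u_async[-1]
    left_halo = [stale_left] + u_async[:-1]  # only index 0 crosses the PE boundary
    new_async = [u_async[i] + r * (u_async[(i + 1) % N] - 2 * u_async[i] + left_halo[i])
                 for i in range(N)]
    u_sync, u_async = new_sync, new_async

err = max(abs(a - b) for a, b in zip(u_sync, u_async))
print(f"max pointwise difference after 200 steps: {err:.3e}")
```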
Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Richard E; Zhang, Hao; Parker, Lynne Edwards
2013-01-01
Kernel methods have difficulties scaling to large modern data sets. The scalability issues are based on computational and memory requirements for working with a large matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving the solvers' scalability. However, Least Squares Support Vector Machines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity for solving a single model, and the overall computational complexity associated with tuning hyperparameters, are still major problems. We address these problems by introducing an O(n log n) approximate l-fold cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm's computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method's effectiveness at selecting hyperparameters on real world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.
2012-01-01
Background Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. Results JobCenter is a client–server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls or “in the cloud”) and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, which provides tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multistep workflows. Conclusions JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/. PMID:22846423
Experimental realization of universal geometric quantum gates with solid-state spins.
Zu, C; Wang, W-B; He, L; Zhang, W-G; Dai, C-Y; Wang, F; Duan, L-M
2014-10-02
Experimental realization of a universal set of quantum logic gates is the central requirement for the implementation of a quantum computer. In an 'all-geometric' approach to quantum computation, the quantum gates are implemented using Berry phases and their non-Abelian extensions, holonomies, from geometric transformation of quantum states in the Hilbert space. Apart from its fundamental interest and rich mathematical structure, the geometric approach has some built-in noise-resilience features. On the experimental side, geometric phases and holonomies have been observed in thermal ensembles of liquid molecules using nuclear magnetic resonance; however, such systems are known to be non-scalable for the purposes of quantum computing. There are proposals to implement geometric quantum computation in scalable experimental platforms such as trapped ions, superconducting quantum bits and quantum dots, and a recent experiment has realized geometric single-bit gates in a superconducting system. Here we report the experimental realization of a universal set of geometric quantum gates using the solid-state spins of diamond nitrogen-vacancy centres. These diamond defects provide a scalable experimental platform with the potential for room-temperature quantum computing, which has attracted strong interest in recent years. Our experiment shows that all-geometric and potentially robust quantum computation can be realized with solid-state spin quantum bits, making use of recent advances in the coherent control of this system.
Scalability problems of simple genetic algorithms.
Thierens, D
1999-01-01
Scalable evolutionary computation has become an intensively studied research topic in recent years. The issue of scalability is predominant in any field of algorithmic design, but it became particularly relevant for the design of competent genetic algorithms once the scalability problems of simple genetic algorithms were understood. Here we present some of the work that has aided in getting a clear insight in the scalability problems of simple genetic algorithms. Particularly, we discuss the important issue of building block mixing. We show how the need for mixing places a boundary in the GA parameter space that, together with the boundary from the schema theorem, delimits the region where the GA converges reliably to the optimum in problems of bounded difficulty. This region shrinks rapidly with increasing problem size unless the building blocks are tightly linked in the problem coding structure. In addition, we look at how straightforward extensions of the simple genetic algorithm, namely elitism, niching, and restricted mating, do not significantly improve its scalability.
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2017-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments. PMID:28190948
Providing scalable system software for high-end simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greenberg, D.
1997-12-31
Detailed, full-system, complex physics simulations have been shown to be feasible on systems containing thousands of processors. In order to manage these computer systems it has been necessary to create scalable system services. In this talk Sandia's research on scalable systems will be described. The key concepts of low overhead data movement through portals and of flexible services through multi-partition architectures will be illustrated in detail. The talk will conclude with a discussion of how these techniques can be applied outside of the standard monolithic MPP system.
NASA Technical Reports Server (NTRS)
Luke, Edward Allen
1993-01-01
Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.
Scalability improvements to NRLMOL for DFT calculations of large molecules
NASA Astrophysics Data System (ADS)
Diaz, Carlos Manuel
Advances in high performance computing (HPC) have provided a way to treat large, computationally demanding tasks using thousands of processors. With the development of more powerful HPC architectures, the need to create efficient and scalable code has grown more important. Electronic structure calculations are valuable in understanding experimental observations and are routinely used for new materials predictions. For the electronic structure calculations, the memory and computation time are proportional to the number of atoms. Memory requirements for these calculations scale as N², where N is the number of atoms. While the recent advances in HPC offer platforms with large numbers of cores, the limited amount of memory available on a given node and poor scalability of the electronic structure code hinder the efficient usage of these platforms. This thesis will present some developments to overcome these bottlenecks in order to study large systems. These developments, which are implemented in the NRLMOL electronic structure code, involve the use of sparse matrix storage formats and the use of linear algebra with sparse and distributed matrices. These developments, along with other related work, now allow ground state density functional calculations using up to 25,000 basis functions and excited state calculations using up to 17,000 basis functions while utilizing all cores on a node. An example on a light-harvesting triad molecule is described. Finally, future plans to further improve the scalability will be presented.
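The storage idea behind these developments can be illustrated with generic tools: keeping only the non-zero entries of a large, mostly sparse matrix (here in compressed sparse row format via SciPy) reduces memory from O(N²) toward the number of non-zeros. The snippet below is a stand-in illustration, not NRLMOL code; the matrix size and fill fraction are invented.

```python
import numpy as np
from scipy import sparse

# Memory comparison between dense and compressed sparse row (CSR) storage for a
# matrix that is mostly zeros. Generic illustration; not taken from NRLMOL itself.

n = 2000
rng = np.random.default_rng(42)
dense = np.zeros((n, n))
# Fill roughly 0.25% of the entries with random values.
idx = rng.integers(0, n, size=(2, 10_000))
dense[idx[0], idx[1]] = rng.standard_normal(10_000)

csr = sparse.csr_matrix(dense)
dense_mb = dense.nbytes / 1e6
csr_mb = (csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes) / 1e6
print(f"dense storage: {dense_mb:8.1f} MB")
print(f"CSR storage:   {csr_mb:8.1f} MB  ({csr.nnz} non-zeros)")
```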
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases
Hasan, S. M. Shamimul; Fox, Edward A.; Bisset, Keith; ...
2017-11-06
Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models as well as the data produced by these models present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy to use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK—a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks—aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of RDF as a representation language is motivated by the diversity and growth of the datasets that need to be integrated. A query bank is developed—the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over the EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline: from model construction to simulation output. As a result, we show that the performance of benchmark queries varies significantly with respect to the choice of hardware underlying the database and resource description framework (RDF) engine.
Scalable metagenomic taxonomy classification using a reference genome database
Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.
2013-01-01
Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782
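The taxonomy/genome index described here boils down to a precomputed map from sequence fragments (k-mers) to taxa that the on-line classifier queries for each read. The Python sketch below shows that off-line/on-line split on toy data; it is a schematic of the idea only, not LMAT's data structures or scoring, and all sequences and names are invented.

```python
from collections import Counter

# Toy k-mer index: map each k-mer of the reference genomes to the taxa it came
# from, then classify a read by majority vote over its k-mers. Schematic of the
# off-line index / on-line query split only; not LMAT's algorithm.

K = 5
references = {
    "taxon_A": "ACGTACGTACGGTTACG",
    "taxon_B": "TTGGCCAATTGGCCAAT",
}

def kmers(seq, k=K):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

# Off-line step: build the k-mer -> taxa index once.
index = {}
for taxon, genome in references.items():
    for kmer in kmers(genome):
        index.setdefault(kmer, set()).add(taxon)

# On-line step: classify a read by voting over the taxa its k-mers hit.
read = "ACGTACGGTTACG"
votes = Counter()
for kmer in kmers(read):
    for taxon in index.get(kmer, ()):
        votes[taxon] += 1

print("classification:", votes.most_common(1)[0] if votes else "unclassified")
```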
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.
Schumacher, André; Pireddu, Luca; Niemenmaa, Matti; Kallio, Aleksi; Korpelainen, Eija; Zanetti, Gianluigi; Heljanko, Keijo
2014-01-01
Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts. Available under the open source MIT license at http://sourceforge.net/projects/seqpig/
Scalable Implementation of Finite Elements by NASA - Implicit (ScIFEi)
NASA Technical Reports Server (NTRS)
Warner, James E.; Bomarito, Geoffrey F.; Heber, Gerd; Hochhalter, Jacob D.
2016-01-01
Scalable Implementation of Finite Elements by NASA (ScIFEN) is a parallel finite element analysis code written in C++. ScIFEN is designed to provide scalable solutions to computational mechanics problems. It supports a variety of finite element types, nonlinear material models, and boundary conditions. This report provides an overview of ScIFEi ("Sci-Fi"), the implicit solid mechanics driver within ScIFEN. A description of ScIFEi's capabilities is provided, including an overview of the tools and features that accompany the software as well as a description of the input and output file formats. Results from several problems are included, demonstrating the efficiency and scalability of ScIFEi by comparing to finite element analysis using a commercial code.
Urrios, Arturo; de Nadal, Eulàlia; Solé, Ricard; Posas, Francesc
2016-01-01
Engineered synthetic biological devices have been designed to perform a variety of functions from sensing molecules and bioremediation to energy production and biomedicine. Nevertheless, a major limitation of in vivo circuit implementation is the constraint associated with the use of standard methodologies for circuit design. Thus, future success of these devices depends on obtaining circuits with scalable complexity and reusable parts. Here we show how to build complex computational devices using multicellular consortia and space as key computational elements. This spatial modular design grants scalability since its general architecture is independent of the circuit's complexity, minimizes wiring requirements and allows component reusability with minimal genetic engineering. The potential use of this approach is demonstrated by implementation of complex logical functions with up to six inputs, thus demonstrating the scalability and flexibility of this method. The potential implications of our results are outlined. PMID:26829588
Next Generation Distributed Computing for Cancer Research
Agarwal, Pankaj; Owzar, Kouros
2014-01-01
Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing. PMID:25983539
Next generation distributed computing for cancer research.
Agarwal, Pankaj; Owzar, Kouros
2014-01-01
Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing.
NASA Astrophysics Data System (ADS)
Yan, Beichuan; Regueiro, Richard A.
2018-02-01
A three-dimensional (3D) DEM code for simulating complex-shaped granular particles is parallelized using the message-passing interface (MPI). The concepts of link-block, ghost/border layer, and migration layer are put forward for the design of the parallel algorithm, and a theoretical function describing the scalability and memory usage of the 3-D DEM is derived. Many performance-critical implementation details are managed optimally to achieve high performance and scalability, such as: minimizing communication overhead, maintaining dynamic load balance, handling particle migrations across block borders, transmitting C++ dynamic objects of particles between MPI processes efficiently, and eliminating redundant contact information between adjacent MPI processes. The code executes on multiple US Department of Defense (DoD) supercomputers and is tested on up to 2048 compute nodes, simulating 10 million three-axis ellipsoidal particles. Performance analyses of the code, including speedup, efficiency, scalability, and granularity across five orders of magnitude of simulation scale (number of particles), are provided, and they demonstrate high speedup and excellent scalability. It is also discovered that communication time is a decreasing function of the number of compute nodes in strong scaling measurements. The code's capability of simulating a large number of complex-shaped particles on modern supercomputers will be of value in both laboratory studies on micromechanical properties of granular materials and many realistic engineering applications involving granular materials.
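The ghost/border-layer concept can be illustrated without MPI: each block exports the particles lying within one interaction cutoff of its edges and imports the corresponding particles of its neighbours as read-only ghosts, so contact detection near block borders needs no further communication during the force phase. The serial Python sketch below uses a hypothetical 1-D decomposition and invented names; it is not the paper's implementation.

```python
# Serial illustration of the border/ghost-layer idea for a 1-D block
# decomposition of particles: each block exports the particles within one
# cutoff of its edges (its "border layer") and imports the neighbours' borders
# as a read-only "ghost layer". Invented names; not the paper's MPI code.

cutoff = 0.5
block_edges = [0.0, 5.0, 10.0]               # two blocks: [0, 5) and [5, 10)
particles = [0.2, 1.1, 4.7, 4.9, 5.1, 6.3, 9.8]

blocks = []
for lo, hi in zip(block_edges[:-1], block_edges[1:]):
    owned = [x for x in particles if lo <= x < hi]
    border = [x for x in owned if x < lo + cutoff or x >= hi - cutoff]
    blocks.append({"range": (lo, hi), "owned": owned, "border": border, "ghost": []})

# Exchange border layers between adjacent blocks (would be MPI sends/receives).
for left, right in zip(blocks[:-1], blocks[1:]):
    left["ghost"].extend(x for x in right["border"] if x < right["range"][0] + cutoff)
    right["ghost"].extend(x for x in left["border"] if x >= left["range"][1] - cutoff)

for b in blocks:
    print(f"block {b['range']}: owned={b['owned']} ghost={b['ghost']}")
```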
Defect Genome of Cubic Perovskites for Fuel Cell Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.
Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows, to study formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopant), in over 80 cubic perovskites, for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain, insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.
Defect Genome of Cubic Perovskites for Fuel Cell Applications
Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.; ...
2017-10-10
Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence in principle can be identified and quantified through development of large defect data sets which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows, to study formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopant), in over 80 cubic perovskites, for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets also enable us to identify and explain, insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and resultant correlations are necessary to build statistical machine learning models, which are required to accelerate discovery of new materials.
Computational Thinking Patterns
ERIC Educational Resources Information Center
Ioannidou, Andri; Bennett, Vicki; Repenning, Alexander; Koh, Kyu Han; Basawapatna, Ashok
2011-01-01
The iDREAMS project aims to reinvent Computer Science education in K-12 schools, by using game design and computational science for motivating and educating students through an approach we call Scalable Game Design, starting at the middle school level. In this paper we discuss the use of Computational Thinking Patterns as the basis for our…
Scalable quantum computation scheme based on quantum-actuated nuclear-spin decoherence-free qubits
NASA Astrophysics Data System (ADS)
Dong, Lihong; Rong, Xing; Geng, Jianpei; Shi, Fazhan; Li, Zhaokai; Duan, Changkui; Du, Jiangfeng
2017-11-01
We propose a novel theoretical scheme of quantum computation. Nuclear spin pairs are utilized to encode decoherence-free (DF) qubits. A nitrogen-vacancy center serves as a quantum actuator to initialize, read out, and control the DF qubits. The realization of CNOT gates between two DF qubits is also presented. Numerical simulations show high fidelities for all of these processes. Additionally, we discuss the potential for scalability. Our scheme reduces the challenge of classical interfaces from controlling and observing complex quantum systems down to a simple quantum actuator. It also provides a novel way to handle complex quantum systems.
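A standard way to picture a decoherence-free qubit in a nuclear spin pair is the zero-magnetization encoding shown below, on which collective dephasing acts trivially. This is a textbook illustration offered as context; the specific encoding and control sequences of the proposed scheme may differ.

```latex
% A common decoherence-free encoding for a pair of spins: both logical states
% carry zero total magnetization, so a collective phase-noise term acts
% identically on them (textbook illustration; the cited scheme may differ).
\[
  |0_L\rangle = |\uparrow\downarrow\rangle, \qquad
  |1_L\rangle = |\downarrow\uparrow\rangle,
\]
\[
  e^{-i\phi\,(S_z^{(1)} + S_z^{(2)})}\,
  \big( \alpha\,|0_L\rangle + \beta\,|1_L\rangle \big)
  = \alpha\,|0_L\rangle + \beta\,|1_L\rangle .
\]
```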
Cloud Computing in Support of Synchronized Disaster Response Operations
2010-09-01
scalable, Web application based on cloud computing technologies to facilitate communication between a broad range of public and private entities without requiring them to compromise security or competitive advantage. The proposed design applies the unique benefits of cloud computing architectures such as
Computational scalability of large size image dissemination
NASA Astrophysics Data System (ADS)
Kooper, Rob; Bajcsy, Peter
2011-01-01
We have investigated the computational scalability of image pyramid building needed for dissemination of very large image data. The sources of large images include high resolution microscopes and telescopes, remote sensing and airborne imaging, and high resolution scanners. The term 'large' is understood from a user perspective: an image is large if it exceeds either the display size or the memory/disk available to hold the image data. The application drivers for our work are digitization projects such as the Lincoln Papers project (each image scan is about 100-150 MB, or about 5000x8000 pixels, with the total number of scans around 200,000) and the UIUC library scanning project for historical maps from the 17th and 18th century (a smaller number of much larger images). The goal of our work is to understand the computational scalability of web-based dissemination using image pyramids for these large image scans, as well as the preservation aspects of the data. We report our computational benchmarks for (a) building image pyramids to be disseminated using the Microsoft Seadragon library, (b) a computation execution approach using hyper-threading to generate image pyramids and to utilize the underlying hardware, and (c) an image pyramid preservation approach using various hard drive configurations of Redundant Array of Independent Disks (RAID) drives for input/output operations. The benchmarks are obtained with a map (334.61 MB, JPEG format, 17591x15014 pixels). The discussion combines the speed and preservation objectives.
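At its core, the pyramid-building step benchmarked here is repeated downsampling (and, for a Seadragon-style viewer, tiling) of a very large scan. The Pillow sketch below shows only the downsampling loop and assumes a hypothetical input file scan.jpg exists; it is not the tiling or benchmarking code used in the study.

```python
from PIL import Image

# Build a simple image pyramid by repeated 2x downsampling. Assumes a large
# RGB JPEG scan named "scan.jpg" exists; cutting each level into fixed-size
# tiles (as a Seadragon-style viewer requires) is omitted for brevity.

Image.MAX_IMAGE_PIXELS = None            # allow very large scans
img = Image.open("scan.jpg")

level = 0
while min(img.size) >= 256:
    img.save(f"pyramid_level_{level}.jpg", quality=85)
    img = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
    level += 1

print(f"wrote {level} pyramid levels")
```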
NASA Astrophysics Data System (ADS)
Jamroz, Benjamin F.; Klöfkorn, Robert
2016-08-01
The scalability of computational applications on current and next-generation supercomputers is increasingly limited by the cost of inter-process communication. We implement non-blocking asynchronous communication in the High-Order Methods Modeling Environment for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous Galerkin methods. This allows the overlap of computation with communication, effectively hiding some of the costs of communication. A novel detail of our approach is that it allows some data movement to be performed during the asynchronous communication even in the absence of other computations. This method produces significant performance and scalability gains in large-scale simulations.
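The computation/communication overlap described here follows a familiar non-blocking pattern: post Isend/Irecv for the halo data, compute on the interior while messages are in flight, and wait only when the halo is needed. The mpi4py sketch below (to be launched under mpirun with mpi4py installed) shows that generic pattern with invented array names; it is not the HOMME implementation.

```python
from mpi4py import MPI
import numpy as np

# Generic overlap pattern: post non-blocking halo exchange, compute on the
# interior while messages are in flight, then wait and finish the boundary.
# Illustrative only; array names are invented and this is not HOMME code.

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.full(1000, float(rank))
send_right = np.ascontiguousarray(local[-1:])    # value sent to the right neighbor
recv_left = np.empty(1)                          # halo value from the left neighbor

requests = [comm.Isend(send_right, dest=right, tag=0),
            comm.Irecv(recv_left, source=left, tag=0)]

interior_sum = local[1:].sum()      # work that does not need the halo value

MPI.Request.Waitall(requests)       # halo now available
total = interior_sum + local[0] + recv_left[0]
print(f"rank {rank}: received halo {recv_left[0]:.0f}, local total {total:.0f}")
```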
Chelonia: A self-healing, replicated storage system
NASA Astrophysics Data System (ADS)
Kerr Nilsen, Jon; Toor, Salman; Nagy, Zsombor; Read, Alex
2011-12-01
Chelonia is a novel grid storage system designed to fill the requirements gap between those of large, sophisticated scientific collaborations which have adopted the grid paradigm for their distributed storage needs, and of corporate business communities gravitating towards the cloud paradigm. Chelonia is an integrated system of heterogeneous, geographically dispersed storage sites which is easily and dynamically expandable and optimized for high availability and scalability. The architecture and implementation in terms of web services running inside the Advanced Resource Connector Hosting Environment Daemon (ARC HED) are described, and results of tests in both local-area and wide-area networks that demonstrate the fault tolerance, stability and scalability of Chelonia are presented. In addition, example setups for production deployments for small and medium-sized VO's are described.
NASA Astrophysics Data System (ADS)
Altintas, I.; Block, J.; Braun, H.; de Callafon, R. A.; Gollner, M. J.; Smarr, L.; Trouve, A.
2013-12-01
Recent studies confirm that climate change will cause wildfires to increase in frequency and severity in the coming decades, especially for California and much of the North American West. The most critical sustainability issue in the midst of these ever-changing dynamics is how to achieve a new social-ecological equilibrium of this fire ecology. Wildfire wind speeds and directions change in an instant, and first responders can only be effective when they take action as quickly as the conditions change. To deliver information needed for sustainable policy and management in this dynamically changing fire regime, we must capture these details to understand the environmental processes. We are building an end-to-end cyberinfrastructure (CI), called WIFIRE, for real-time and data-driven simulation, prediction and visualization of wildfire behavior. The WIFIRE integrated CI system supports social-ecological resilience to the changing fire ecology regime in the face of urban dynamics and climate change. Networked observations, e.g., heterogeneous satellite data and real-time remote sensor data, are integrated with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire's Rate of Spread. Our collaborative WIFIRE team of scientists, engineers, technologists, government policy managers, private industry, and firefighters architects and implements CI pathways that enable joint innovation for wildfire management. Scientific workflows are used as an integrative distributed programming model and simplify the implementation of engineering modules for data-driven simulation, prediction and visualization while allowing integration with large-scale computing facilities. WIFIRE will be scalable to users with different skill levels via specialized web interfaces and user-specified alerts for environmental events broadcast to receivers before, during and after a wildfire. Scalability of the WIFIRE approach allows many sensors to be subjected to user-specified data processing algorithms to generate threshold alerts within seconds. Integration of this sensor data into both rapidly available fire image data and models will better enable situational awareness, responses and decision support at local, state, national, and international levels. The products of WIFIRE will initially be disseminated to our collaborators (SDG&E, CAL FIRE, USFS), covering academic, private, and government laboratories, while generating value for emergency officials and, consequently, the general public. WIFIRE may be used by government agencies in the future to save lives and property during wildfire events, to test the effectiveness of response and evacuation scenarios before they occur, and to assess the effectiveness of high-density sensor networks in improving fire and weather predictions. WIFIRE's high-density network, therefore, will serve as a testbed for future applications worldwide.
High Performance Computing Multicast
2012-02-01
responsiveness, first-tier applications often implement replicated in-memory key-value stores, using them to store state or to cache data from services... A... alternative that replicates data, combines agreement on update ordering with amnesia freedom, and supports both good scalability and fast response.
Scalable splitting algorithms for big-data interferometric imaging in the SKA era
NASA Astrophysics Data System (ADS)
Onose, Alexandru; Carrillo, Rafael E.; Repetti, Audrey; McEwen, Jason D.; Thiran, Jean-Philippe; Pesquet, Jean-Christophe; Wiaux, Yves
2016-11-01
In the context of next-generation radio telescopes, like the Square Kilometre Array (SKA), the efficient processing of large-scale data sets is extremely important. Convex optimization tasks under the compressive sensing framework have recently emerged and provide both enhanced image reconstruction quality and scalability to increasingly larger data sets. We focus herein mainly on scalability and propose two new convex optimization algorithmic structures able to solve the convex optimization tasks arising in radio-interferometric imaging. They rely on proximal splitting and forward-backward iterations and can be seen, by analogy with the CLEAN major-minor cycle, as running sophisticated CLEAN-like iterations in parallel in multiple data, prior, and image spaces. Both methods support any convex regularization function, in particular the well-studied ℓ1 priors promoting image sparsity in an adequate domain. Tailored for big data, they employ parallel and distributed computations to achieve scalability in terms of memory and computational requirements. One of them also exploits randomization, over data blocks at each iteration, offering further flexibility. We present simulation results showing the feasibility of the proposed methods as well as their advantages compared to state-of-the-art algorithmic solvers. Our MATLAB code is available online on GitHub.
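For context, the forward-backward iteration underlying these algorithmic structures has a compact canonical form for a smooth data-fidelity term plus an ℓ1 prior: a gradient step followed by a proximal (soft-thresholding) step. The block below records that standard form only; the cited algorithms build parallel and distributed proximal splitting over data, prior and image blocks on top of it, which is not reproduced here.

```latex
% Canonical forward-backward (ISTA-type) iteration for
%   min_x  f(x) + lambda ||Psi^dagger x||_1 ,  f smooth with L-Lipschitz gradient.
% Standard background only; the cited algorithms generalize this with proximal
% splitting over multiple data, prior and image blocks.
\[
  x^{(k+1)} \;=\; \mathrm{prox}_{\gamma \lambda \|\Psi^{\dagger}\cdot\|_1}
  \big( x^{(k)} - \gamma \nabla f(x^{(k)}) \big),
  \qquad 0 < \gamma < \tfrac{2}{L} .
\]
% When Psi is orthonormal, the proximal step reduces to componentwise
% soft-thresholding of the coefficients:
\[
  \big(\mathrm{prox}_{\tau \|\cdot\|_1}(z)\big)_i
  \;=\; \mathrm{sign}(z_i)\,\max\big(|z_i| - \tau,\, 0\big).
\]
```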
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
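The de Bruijn graph at the heart of the assembler is easy to state: nodes are (k-1)-mers and every k-mer in a read contributes an edge from its prefix to its suffix. The Python sketch below builds that graph serially for a few toy reads; GiGA's contribution, the distributed Hadoop/Giraph construction and traversal, is not reproduced here.

```python
from collections import defaultdict

# Build a toy de Bruijn graph: nodes are (k-1)-mers, and every k-mer in a read
# adds an edge from its (k-1)-prefix to its (k-1)-suffix. Serial illustration
# of the data structure only; GiGA's distributed Hadoop/Giraph version differs.

K = 4
reads = ["ACGTACGT", "CGTACGTT", "GTACGTTA"]

graph = defaultdict(list)
for read in reads:
    for i in range(len(read) - K + 1):
        kmer = read[i:i + K]
        graph[kmer[:-1]].append(kmer[1:])

for node in sorted(graph):
    print(f"{node} -> {sorted(graph[node])}")
```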
MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frank Mueller
2009-02-05
MOLAR is a multi-institution research effort that concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined by the FAST-OS (forum to address scalable technology for runtime and operating systems) and HECRTF (high-end computing revitalization task force) activities by providing a modular Linux and adaptable runtime support for high-end computing operating and runtime systems. The MOLAR research has the following goals to address these issues. (1) Create a modular and configurable Linux system that allows customized changes based on the requirements of the applications, runtime systems, and cluster management software. (2) Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, ease-of-use, and provide support to legacy and promising programming models. (3) Advance computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. (4) Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions. The overall goal of the research conducted at NCSU is to develop scalable algorithms for high-availability without single points of failure and without single points of control.
A Debugger for Computational Grid Applications
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation gives an overview of a debugger for computational grid applications. Details are given on NAS parallel tools groups (including parallelization support tools, evaluation of various parallelization strategies, and distributed and aggregated computing), debugger dependencies, scalability, initial implementation, the process grid, and information on Globus.
Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.
2009-01-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N
2009-06-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).
Coalescent: an open-source and scalable framework for exact calculations in coalescent theory
2012-01-01
Background Currently, there is no open-source, cross-platform and scalable framework for coalescent analysis in population genetics. There is no scalable GUI-based user application either. Such a framework and application would not only drive the creation of more complex and realistic models but also make them truly accessible. Results As a first attempt, we built a framework and user application for the domain of exact calculations in coalescent analysis. The framework provides an API with the concepts of model, data, statistic, phylogeny, gene tree and recursion. Infinite-alleles and infinite-sites models are considered. It defines pluggable computations such as counting and listing all the ancestral configurations and genealogies and computing the exact probability of data. It can visualize a gene tree, trace and visualize the internals of the recursion algorithm for further improvement and attach dynamically a number of output processors. The user application defines jobs in a plug-in like manner so that they can be activated, deactivated, installed or uninstalled on demand. Multiple jobs can be run and their inputs edited. Job inputs are persisted across restarts and running jobs can be cancelled where applicable. Conclusions Coalescent theory plays an increasingly important role in analysing molecular population genetic data. Models involved are mathematically difficult and computationally challenging. An open-source, scalable framework that lets users immediately take advantage of the progress made by others will enable exploration of yet more difficult and realistic models. As models become more complex and mathematically less tractable, the need for an integrated computational approach is obvious. Object-oriented designs, though they have upfront costs, are practical now and can provide such an integrated approach. PMID:23033878
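As context for the "exact probability of data" computations mentioned above, one classical closed-form result for the infinite-alleles model is Ewens' sampling formula, reproduced below as standard background; the framework's recursions generalize exact calculations of this kind to gene trees and other statistics.

```latex
% Ewens' sampling formula (infinite-alleles model): probability that a sample
% of n genes contains a_j allelic types represented exactly j times each
% (with sum_j j*a_j = n), scaled mutation rate theta, and rising factorial
% theta_(n) = theta (theta + 1) ... (theta + n - 1).
\[
  P(a_1, \dots, a_n)
  \;=\; \frac{n!}{\theta_{(n)}}
        \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!} .
\]
```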
The up-scaling of ecosystem functions in a heterogeneous world
NASA Astrophysics Data System (ADS)
Lohrer, Andrew M.; Thrush, Simon F.; Hewitt, Judi E.; Kraan, Casper
2015-05-01
Earth is in the midst of a biodiversity crisis that is impacting the functioning of ecosystems and the delivery of valued goods and services. However, the implications of large scale species losses are often inferred from small scale ecosystem functioning experiments with little knowledge of how the dominant drivers of functioning shift across scales. Here, by integrating observational and manipulative experimental field data, we reveal scale-dependent influences on primary productivity in shallow marine habitats, thus demonstrating the scalability of complex ecological relationships contributing to coastal marine ecosystem functioning. Positive effects of key consumers (burrowing urchins, Echinocardium cordatum) on seafloor net primary productivity (NPP) elucidated by short-term, single-site experiments persisted across multiple sites and years. Additional experimentation illustrated how these effects amplified over time, resulting in greater primary producer biomass (sediment chlorophyll a content, Chl a) in the longer term, depending on climatic context and habitat factors affecting the strengths of mutually reinforcing feedbacks. The remarkable coherence of results from small and large scales is evidence of real-world ecosystem function scalability and ecological self-organisation. This discovery provides greater insights into the range of responses to broad-scale anthropogenic stressors in naturally heterogeneous environmental settings.
The up-scaling of ecosystem functions in a heterogeneous world
Lohrer, Andrew M.; Thrush, Simon F.; Hewitt, Judi E.; Kraan, Casper
2015-01-01
Earth is in the midst of a biodiversity crisis that is impacting the functioning of ecosystems and the delivery of valued goods and services. However, the implications of large scale species losses are often inferred from small scale ecosystem functioning experiments with little knowledge of how the dominant drivers of functioning shift across scales. Here, by integrating observational and manipulative experimental field data, we reveal scale-dependent influences on primary productivity in shallow marine habitats, thus demonstrating the scalability of complex ecological relationships contributing to coastal marine ecosystem functioning. Positive effects of key consumers (burrowing urchins, Echinocardium cordatum) on seafloor net primary productivity (NPP) elucidated by short-term, single-site experiments persisted across multiple sites and years. Additional experimentation illustrated how these effects amplified over time, resulting in greater primary producer biomass (sediment chlorophyll a content, Chl a) in the longer term, depending on climatic context and habitat factors affecting the strengths of mutually reinforcing feedbacks. The remarkable coherence of results from small and large scales is evidence of real-world ecosystem function scalability and ecological self-organisation. This discovery provides greater insights into the range of responses to broad-scale anthropogenic stressors in naturally heterogeneous environmental settings. PMID:25993477
A scalable and flexible hybrid energy storage system design and implementation
NASA Astrophysics Data System (ADS)
Kim, Younghyun; Koh, Jason; Xie, Qing; Wang, Yanzhi; Chang, Naehyuck; Pedram, Massoud
2014-06-01
Energy storage systems (ESS) are becoming one of the most important components that noticeably change overall system performance in various applications, ranging from the power grid infrastructure to electric vehicles (EV) and portable electronics. However, a homogeneous ESS is subject to limited characteristics in terms of cost, efficiency, lifetime, etc., set by the energy storage technology of which the ESS is composed. On the other hand, hybrid ESS (HESS) are a viable solution for a practical ESS with currently available technologies, as they have the potential to overcome such limitations by exploiting only the advantages of heterogeneous energy storage technologies while hiding their drawbacks. However, the HESS concept basically mandates sophisticated design and control to actually make the benefits happen. The HESS architecture should be able to provide controllability of many parts, which are often fixed in a homogeneous ESS, and novel management policies should be able to utilize the control features. This paper introduces a complete design practice of a HESS prototype to demonstrate scalability, flexibility, and energy efficiency. It is composed of three heterogeneous energy storage elements: lead-acid batteries, lithium-ion batteries, and supercapacitors. We demonstrate a novel system control methodology and enhanced energy efficiency through this design practice.
Gronau, Greta; Jacobsen, Matthew M.; Huang, Wenwen; Rizzo, Daniel J.; Li, David; Staii, Cristian; Pugno, Nicola M.; Wong, Joyce Y.; Kaplan, David L.; Buehler, Markus J.
2016-01-01
Scalable computational modelling tools are required to guide the rational design of complex hierarchical materials with predictable functions. Here, we utilize mesoscopic modelling, integrated with genetic block copolymer synthesis and bioinspired spinning process, to demonstrate de novo materials design that incorporates chemistry, processing and material characterization. We find that intermediate hydrophobic/hydrophilic block ratios observed in natural spider silks and longer chain lengths lead to outstanding silk fibre formation. This design by nature is based on the optimal combination of protein solubility, self-assembled aggregate size and polymer network topology. The original homogeneous network structure becomes heterogeneous after spinning, enhancing the anisotropic network connectivity along the shear flow direction. Extending beyond the classical polymer theory, with insights from the percolation network model, we illustrate the direct proportionality between network conductance and fibre Young's modulus. This integrated approach provides a general path towards de novo functional network materials with enhanced mechanical properties and beyond (optical, electrical or thermal) as we have experimentally verified. PMID:26017575
Lin, Shangchao; Ryu, Seunghwa; Tokareva, Olena; Gronau, Greta; Jacobsen, Matthew M; Huang, Wenwen; Rizzo, Daniel J; Li, David; Staii, Cristian; Pugno, Nicola M; Wong, Joyce Y; Kaplan, David L; Buehler, Markus J
2015-05-28
Scalable computational modelling tools are required to guide the rational design of complex hierarchical materials with predictable functions. Here, we utilize mesoscopic modelling, integrated with genetic block copolymer synthesis and bioinspired spinning process, to demonstrate de novo materials design that incorporates chemistry, processing and material characterization. We find that intermediate hydrophobic/hydrophilic block ratios observed in natural spider silks and longer chain lengths lead to outstanding silk fibre formation. This design by nature is based on the optimal combination of protein solubility, self-assembled aggregate size and polymer network topology. The original homogeneous network structure becomes heterogeneous after spinning, enhancing the anisotropic network connectivity along the shear flow direction. Extending beyond the classical polymer theory, with insights from the percolation network model, we illustrate the direct proportionality between network conductance and fibre Young's modulus. This integrated approach provides a general path towards de novo functional network materials with enhanced mechanical properties and beyond (optical, electrical or thermal) as we have experimentally verified.
Atomistics of vapour–liquid–solid nanowire growth
Wang, Hailong; Zepeda-Ruiz, Luis A.; Gilmer, George H.; Upmanyu, Moneesh
2013-01-01
The vapour–liquid–solid route and its variants are routinely used for the scalable synthesis of semiconducting nanowires, yet the fundamental growth processes remain unknown. Here we employ atomic-scale computations based on model potentials to study the stability and growth of gold-catalysed silicon nanowires. Equilibrium studies uncover segregation at the solid-like surface of the catalyst particle, a liquid AuSi droplet, and a silicon-rich droplet–nanowire interface enveloped by heterogeneous truncating facets. Supersaturation of the droplets leads to rapid one-dimensional growth on the truncating facets and much slower nucleation-controlled two-dimensional growth on the main facet. Surface diffusion is suppressed and the excess Si flux occurs through the droplet bulk which, together with the Si-rich interface and contact line, lowers the nucleation barrier on the main facet. The ensuing step flow is modified by Au diffusion away from the step edges. Our study highlights key interfacial characteristics for morphological and compositional control of semiconducting nanowire arrays. PMID:23752586
Quantum interference in heterogeneous superconducting-photonic circuits on a silicon chip
Schuck, C.; Guo, X.; Fan, L.; Ma, X.; Poot, M.; Tang, H. X.
2016-01-01
Quantum information processing holds great promise for communicating and computing data efficiently. However, scaling current photonic implementation approaches to larger system size remains an outstanding challenge for realizing disruptive quantum technology. Two main ingredients of quantum information processors are quantum interference and single-photon detectors. Here we develop a hybrid superconducting-photonic circuit system to show how these elements can be combined in a scalable fashion on a silicon chip. We demonstrate the suitability of this approach for integrated quantum optics by interfering and detecting photon pairs directly on the chip with waveguide-coupled single-photon detectors. Using a directional coupler implemented with silicon nitride nanophotonic waveguides, we observe 97% interference visibility when measuring photon statistics with two monolithically integrated superconducting single-photon detectors. The photonic circuit and detector fabrication processes are compatible with standard semiconductor thin-film technology, making it possible to implement more complex and larger scale quantum photonic circuits on silicon chips. PMID:26792424
Parallel discrete-event simulation schemes with heterogeneous processing elements.
Kim, Yup; Kwon, Ikhyun; Chae, Huiseung; Yook, Soon-Hyung
2014-07-01
To understand the effects of nonidentical processing elements (PEs) on parallel discrete-event simulation (PDES) schemes, two stochastic growth models, the restricted solid-on-solid (RSOS) model and the Family model, are investigated by simulations. The RSOS model corresponds to the PDES scheme governed by the Kardar-Parisi-Zhang equation (KPZ scheme), and the Family model to the scheme governed by the Edwards-Wilkinson equation (EW scheme). Two kinds of distributions for nonidentical PEs are considered: in the first kind, the computing capacities of the PEs are not very different, whereas in the second kind the capacities are extremely widespread. The KPZ scheme on complex networks shows synchronizability and scalability regardless of the kind of PEs. The EW scheme never shows synchronizability for a random configuration of PEs of the first kind; however, by regularizing the arrangement of PEs of the first kind, the EW scheme can be made to show synchronizability. In contrast, the EW scheme never shows synchronizability for any configuration of PEs of the second kind.
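As a rough illustration of the virtual-time picture behind such schemes, the Python sketch below simulates a toy conservative PDES horizon on a ring with heterogeneous PEs; it is a simplified stand-in with hypothetical parameters and capacity distribution, not the authors' RSOS or Family implementations.

    import numpy as np

    def simulate_virtual_times(n_pe=1024, n_sweeps=2000, seed=0, capacity_spread=2.0):
        """Toy conservative PDES on a ring: PE i advances its local virtual time
        only if it does not run ahead of its neighbours.  Heterogeneous PEs are
        modelled by giving each PE its own mean time increment (1/capacity)."""
        rng = np.random.default_rng(seed)
        # Heterogeneous capacities: the spread controls how "nonidentical" the PEs are.
        capacity = np.exp(rng.uniform(0.0, np.log(capacity_spread), n_pe))
        tau = np.zeros(n_pe)                      # local virtual times (the growing surface)
        widths = []
        for _ in range(n_sweeps):
            left = np.roll(tau, 1)
            right = np.roll(tau, -1)
            can_advance = (tau <= left) & (tau <= right)   # conservative synchronisation rule
            # Exponential increments with PE-dependent mean 1/capacity.
            tau = np.where(can_advance, tau + rng.exponential(1.0 / capacity), tau)
            widths.append(tau.std())              # surface width = degree of desynchronisation
        return np.asarray(widths)

    if __name__ == "__main__":
        w = simulate_virtual_times()
        print("final virtual-time spread:", w[-1])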
High-resolution coupled physics solvers for analysing fine-scale nuclear reactor design problems
Mahadevan, Vijay S.; Merzari, Elia; Tautges, Timothy; ...
2014-06-30
An integrated multi-physics simulation capability for the design and analysis of current and future nuclear reactor models is being investigated, to tightly couple neutron transport and thermal-hydraulics physics under the SHARP framework. Over several years, high-fidelity, validated mono-physics solvers with proven scalability on petascale architectures have been developed independently. Based on a unified component-based architecture, these existing codes can be coupled with a mesh-data backplane and a flexible coupling-strategy-based driver suite to produce a viable tool for analysts. The goal of the SHARP framework is to perform fully resolved coupled physics analysis of a reactor on heterogeneous geometry, in order to reduce the overall numerical uncertainty while leveraging available computational resources. Finally, the coupling methodology and software interfaces of the framework are presented, along with verification studies on two representative fast sodium-cooled reactor demonstration problems to prove the usability of the SHARP framework.
2017-02-01
... interconnect technology (MRFI) to enable high scalability and reconfigurability for inter-CPU/Memory communications with an increased number of communication channels in frequency ... testing in the University of California, Los Angeles (UCLA) Center for High Frequency Electronics, and Dr. Afshin Momtaz at Broadcom Corporation for
Cloud-based mobility management in heterogeneous wireless networks
NASA Astrophysics Data System (ADS)
Kravchuk, Serhii; Minochkin, Dmytro; Omiotek, Zbigniew; Bainazarov, Ulan; Weryńska-Bieniasz, RóŻa; Iskakova, Aigul
2017-08-01
Mobility management is the key feature that supports the roaming of users between different systems. Handover is the essential aspect in the development of solutions supporting mobility scenarios. The handover process becomes more complex in a heterogeneous environment compared to a homogeneous one. Seamlessness and reduction of delay in servicing the handover calls, which can reduce the handover dropping probability, also require complex algorithms to provide a desired QoS for mobile users. A challenging problem of increasing the scalability and availability of handover decision mechanisms is discussed. The aim of the paper is to propose a cloud-based handover-as-a-service concept to cope with the challenges that arise.
A Strategic Approach to Network Defense: Framing the Cloud
2011-03-10
accepted network defensive principles, to reduce risks associated with emerging virtualization capabilities and scalability of cloud computing. This expanded ... defensive framework can assist enterprise networking and cloud computing architects to better design more secure systems.
A Scalable, Collaborative, Interactive Light-field Display System
2014-06-01
Keywords: light-field, holographic displays, 3D display, holographic video, integral photography, plenoptic, computed photography. Distribution A: Approved
The BioIntelligence Framework: a new computational platform for biomedical knowledge computing.
Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles; Mousses, Spyro
2013-01-01
Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information.
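Since the framework is described around a hypergraph-based data model, a minimal sketch of such a model may help fix ideas; this is an illustrative Python toy with hypothetical node and edge labels, not the BioIntelligence Framework's actual storage layer or query language.

    from collections import defaultdict

    class Hypergraph:
        """Minimal hypergraph: a hyperedge links any number of nodes and carries a label."""
        def __init__(self):
            self.edges = {}                      # edge_id -> (label, frozenset of nodes)
            self.node_index = defaultdict(set)   # node -> set of edge_ids (for fast lookup)
            self._next_id = 0

        def add_edge(self, label, nodes):
            eid = self._next_id
            self._next_id += 1
            self.edges[eid] = (label, frozenset(nodes))
            for n in nodes:
                self.node_index[n].add(eid)
            return eid

        def edges_of(self, node, label=None):
            """All hyperedges touching `node`, optionally filtered by label."""
            return [self.edges[e] for e in self.node_index[node]
                    if label is None or self.edges[e][0] == label]

    # Hypothetical usage: relate a patient, a variant and a therapy in one hyperedge.
    hg = Hypergraph()
    hg.add_edge("observed_response", {"patient:042", "variant:BRAF_V600E", "drug:vemurafenib"})
    print(hg.edges_of("variant:BRAF_V600E"))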
Scalable Faceted Ranking in Tagging Systems
NASA Astrophysics Data System (ADS)
Orlicki, José I.; Alvarez-Hamelin, J. Ignacio; Fierens, Pablo I.
Nowadays, web collaborative tagging systems, which allow users to upload, comment on, and recommend content, are growing. Such systems can be represented as graphs where nodes correspond to users and tagged links to recommendations. In this paper we analyze the problem of computing a ranking of users with respect to a facet described as a set of tags. A straightforward solution is to compute a PageRank-like algorithm on a facet-related graph, but it is not feasible for online computation. We propose an alternative: (i) a ranking for each tag is computed offline on the basis of tag-related subgraphs; (ii) a faceted order is generated online by merging the rankings corresponding to all the tags in the facet. Based on the graph analysis of YouTube and Flickr, we show that step (i) is scalable. We also present efficient algorithms for step (ii), which are evaluated by comparing their results with two gold standards.
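A hedged sketch of step (ii), merging precomputed per-tag scores into a faceted ranking, is shown below in Python; the simple additive merge and the example scores are illustrative assumptions, not the exact merging algorithms evaluated in the paper.

    def merge_facet_ranking(per_tag_scores, facet_tags, top_k=10):
        """Combine precomputed per-tag user scores into a single faceted ranking.
        per_tag_scores: dict tag -> dict user -> offline (PageRank-like) score."""
        combined = {}
        for tag in facet_tags:
            for user, score in per_tag_scores.get(tag, {}).items():
                combined[user] = combined.get(user, 0.0) + score   # simple additive merge
        return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # Hypothetical offline scores for two tags forming the facet {"guitar", "live"}.
    scores = {
        "guitar": {"alice": 0.42, "bob": 0.31, "carol": 0.10},
        "live":   {"bob": 0.25, "dave": 0.20},
    }
    print(merge_facet_ranking(scores, ["guitar", "live"]))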
CloudMan as a platform for tool, data, and analysis distribution.
Afgan, Enis; Chapman, Brad; Taylor, James
2012-11-27
Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion. However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B.; Perumalla, Kalyan S.
Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
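The core idea of logical cloning without full physical duplication can be sketched with a copy-on-write state tree; the Python toy below only illustrates that principle and is not the CloneX runtime or its GPU implementation.

    class SimClone:
        """Logical clone in a what-if tree: stores only the state it overrides,
        falling back to its parent for everything else (copy-on-write)."""
        def __init__(self, parent=None):
            self.parent = parent
            self.local = {}          # overridden state variables only

        def get(self, key):
            node = self
            while node is not None:
                if key in node.local:
                    return node.local[key]
                node = node.parent
            raise KeyError(key)

        def set(self, key, value):
            self.local[key] = value  # a write only touches this clone

        def spawn(self):
            return SimClone(parent=self)

    # Base simulation, then two what-if branches that each diverge in one variable.
    base = SimClone(); base.set("temperature", 300.0); base.set("wind", 5.0)
    hot = base.spawn();  hot.set("temperature", 340.0)
    calm = base.spawn(); calm.set("wind", 0.0)
    print(hot.get("temperature"), hot.get("wind"))    # 340.0 5.0 (wind inherited)
    print(calm.get("temperature"), calm.get("wind"))  # 300.0 0.0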
Continuous Variable Cluster State Generation over the Optical Spatial Mode Comb
Pooser, Raphael C.; Jing, Jietai
2014-10-20
One-way quantum computing uses single qubit projective measurements performed on a cluster state (a highly entangled state of multiple qubits) in order to enact quantum gates. The model is promising due to its potential scalability; the cluster state may be produced at the beginning of the computation and operated on over time. Continuous variables (CV) offer another potential benefit in the form of deterministic entanglement generation. This determinism can lead to robust cluster states and scalable quantum computation. Recent demonstrations of CV cluster states have made great strides on the path to scalability utilizing either time or frequency multiplexing in optical parametric oscillators (OPO) both above and below threshold. The techniques relied on a combination of entangling operators and beam splitter transformations. Here we show that an analogous transformation exists for amplifiers with Gaussian input states operating on multiple spatial modes. By judicious selection of local oscillators (LOs), the spatial mode distribution is analogous to the optical frequency comb consisting of axial modes in an OPO cavity. We outline an experimental system that generates cluster states across the spatial frequency comb which can also scale the amount of quantum noise reduction to potentially larger than in other systems.
Using Swarming Agents for Scalable Security in Large Network Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crouse, Michael; White, Jacob L.; Fulp, Errin W.
2011-09-23
The difficulty of securing computer infrastructures increases as they grow in size and complexity. Network-based security solutions such as IDS and firewalls cannot scale because of exponentially increasing computational costs inherent in detecting the rapidly growing number of threat signatures. Host-based solutions like virus scanners and IDS suffer similar issues, and these are compounded when enterprises try to monitor these in a centralized manner. Swarm-based autonomous agent systems like digital ants and artificial immune systems can provide a scalable security solution for large network environments. The digital ants approach offers a biologically inspired design where each ant in the virtual colony can detect atoms of evidence that may help identify a possible threat. By assembling the atomic evidence from different ant types the colony may detect the threat. This decentralized approach can require, on average, fewer computational resources than traditional centralized solutions; however there are limits to its scalability. This paper describes how dividing a large infrastructure into smaller managed enclaves allows the digital ant framework to effectively operate in larger environments. Experimental results will show that using smaller enclaves allows for more consistent distribution of agents and results in faster response times.
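As a hedged illustration of the enclave idea, the Python sketch below partitions hosts into fixed-size enclaves and scatters a bounded number of agents inside each, so the agent distribution stays even by construction; the host names and counts are hypothetical and this is not the digital-ants codebase.

    import random

    def assign_agents(hosts, enclave_size, agents_per_enclave, seed=0):
        """Split hosts into fixed-size enclaves and scatter a bounded number of
        sensor agents uniformly inside each enclave (toy agent placement)."""
        rng = random.Random(seed)
        enclaves = [hosts[i:i + enclave_size] for i in range(0, len(hosts), enclave_size)]
        placement = {}
        for idx, enclave in enumerate(enclaves):
            for agent in range(agents_per_enclave):
                placement[(idx, agent)] = rng.choice(enclave)  # agent's starting host
        return enclaves, placement

    hosts = [f"10.0.{i // 256}.{i % 256}" for i in range(1024)]
    enclaves, placement = assign_agents(hosts, enclave_size=128, agents_per_enclave=16)
    print(len(enclaves), "enclaves,", len(placement), "agents placed")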
Scalable Cloning on Large-Scale GPU Platforms with Application to Time-Stepped Simulations on Grids
Yoginath, Srikanth B.; Perumalla, Kalyan S.
2018-01-31
Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. However, cloned execution is highly challenging to realize on large, distributed memory computing platforms, due to the dynamic nature of the computational load across clones, and due to the complex dependencies spanning the clone tree. In this paper, we present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning. It efficiently and dynamically creates whole logical copies of a dynamic tree of simulations across a large parallel system without full physical duplication of computation and memory. The performance of a prototype implementation executed on up to 1,024 graphical processing units of a supercomputing system has been evaluated with three benchmarks—heat diffusion, forest fire, and disease propagation models—delivering a speed up of over two orders of magnitude compared to replicated runs. Finally, the results demonstrate a significantly faster and scalable way to execute many what-if scenario ensembles of large simulations via cloning using the CloneX interface.
2010-07-01
Cloud computing, an emerging form of computing in which users have access to scalable, on-demand capabilities that are provided through Internet ... cloud computing, (2) the information security implications of using cloud computing services in the Federal Government, and (3) federal guidance and ... efforts to address information security when using cloud computing. The complete report is titled Information Security: Federal Guidance Needed to
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jamroz, Benjamin F.; Klofkorn, Robert
The scalability of computational applications on current and next-generation supercomputers is increasingly limited by the cost of inter-process communication. We implement non-blocking asynchronous communication in the High-Order Methods Modeling Environment for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous Galerkin methods. This allows the overlap of computation with communication, effectively hiding some of the costs of communication. A novel detail about our approach is that it provides some data movement to be performed during the asynchronous communication even in the absence of other computations. This method produces significant performance and scalability gains in large-scale simulations.
Jamroz, Benjamin F.; Klofkorn, Robert
2016-08-26
The scalability of computational applications on current and next-generation supercomputers is increasingly limited by the cost of inter-process communication. We implement non-blocking asynchronous communication in the High-Order Methods Modeling Environment for the time integration of the hydrostatic fluid equations using both the spectral-element and discontinuous Galerkin methods. This allows the overlap of computation with communication, effectively hiding some of the costs of communication. A novel detail about our approach is that it provides some data movement to be performed during the asynchronous communication even in the absence of other computations. This method produces significant performance and scalability gains in large-scale simulations.
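A minimal sketch of overlapping computation with non-blocking communication, in the same spirit as but far simpler than the HOMME implementation, can be written with mpi4py; the 1-D halo exchange below is an assumed toy problem, not the spectral-element or discontinuous Galerkin code.

    # Run with e.g.: mpirun -n 4 python this_script.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    u = np.random.rand(1000)                 # local slab of a 1-D field
    halo_l, halo_r = np.empty(1), np.empty(1)

    # Post non-blocking sends/receives for the boundary values.
    reqs = [
        comm.Isend(u[0:1], dest=left,  tag=0),
        comm.Isend(u[-1:], dest=right, tag=1),
        comm.Irecv(halo_l, source=left,  tag=1),
        comm.Irecv(halo_r, source=right, tag=0),
    ]

    u_new = u.copy()
    u_new[1:-1] = 0.5 * (u[:-2] + u[2:])     # interior update overlaps the communication

    MPI.Request.Waitall(reqs)                # communication must finish before the edges
    u_new[0]  = 0.5 * (halo_l[0] + u[1])
    u_new[-1] = 0.5 * (u[-2] + halo_r[0])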
NASA Astrophysics Data System (ADS)
Chen, Chui-Zhen; Xie, Ying-Ming; Liu, Jie; Lee, Patrick A.; Law, K. T.
2018-03-01
Quantum anomalous Hall insulator/superconductor heterostructures have emerged as a competitive platform to realize topological superconductors with chiral Majorana edge states, as shown in recent experiments [He et al., Science 357, 294 (2017), 10.1126/science.aag2792]. However, chiral Majorana modes, being extended, cannot be used for topological quantum computation. In this work, we show that quasi-one-dimensional quantum anomalous Hall structures exhibit a large topological regime (much larger than in the two-dimensional case) which supports localized Majorana zero-energy modes. The non-Abelian properties of a cross-shaped quantum anomalous Hall junction are shown explicitly by time-dependent calculations. We believe that the proposed quasi-one-dimensional quantum anomalous Hall structures can be easily fabricated for scalable topological quantum computation.
NASA Astrophysics Data System (ADS)
Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.
2015-12-01
Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models at sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling work. Two-phase flow simulations in heterogeneous media usually require much longer computational time than those in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with many thousands of processors have become available in scientific and engineering communities. Such supercomputers may attract attention from geoscientists and reservoir engineers for solving large and non-linear models at higher resolutions within a reasonable time. However, to make them a useful tool, it is essential to tackle several practical obstacles to utilizing a large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones at reservoir scale. The performance measurement confirmed that both simulators exhibit excellent scalability, showing almost linear speedup with the number of processors up to over ten thousand cores. Generally, this allows us to perform coupled multi-physics (THC) simulations on high-resolution geologic models with multi-million grids in a practical time (e.g., less than a second per time step).
Quantitative Investigation of the Technologies That Support Cloud Computing
ERIC Educational Resources Information Center
Hu, Wenjin
2014-01-01
Cloud computing is dramatically shaping modern IT infrastructure. It virtualizes computing resources, provides elastic scalability, serves as a pay-as-you-use utility, simplifies the IT administrators' daily tasks, enhances the mobility and collaboration of data, and increases user productivity. We focus on providing generalized black-box…
2006-04-01
and Scalability, (2) Sensors and Platforms, (3) Distributed Computing and Processing, (4) Information Management, (5) Fusion and Resource Management ... use of the deployed system. 3.3 Distributed Computing and Processing Session The Distributed Computing and Processing Session consisted of three
NASA Technical Reports Server (NTRS)
Fineberg, Samuel A.; Kutler, Paul (Technical Monitor)
1997-01-01
The Whitney project is integrating commodity off-the-shelf PC hardware and software technology to build a parallel supercomputer with hundreds to thousands of nodes. To build such a system, one must have a scalable software model, and the installation and maintenance of the system software must be completely automated. We describe the design of an architecture for booting, installing, and configuring nodes in such a system with particular consideration given to scalability and ease of maintenance. This system has been implemented on a 40-node prototype of Whitney and is to be used on the 500 processor Whitney system to be built in 1998.
If It's in the Cloud, Get It on Paper: Cloud Computing Contract Issues
ERIC Educational Resources Information Center
Trappler, Thomas J.
2010-01-01
Much recent discussion has focused on the pros and cons of cloud computing. Some institutions are attracted to cloud computing benefits such as rapid deployment, flexible scalability, and low initial start-up cost, while others are concerned about cloud computing risks such as those related to data location, level of service, and security…
Dense, Efficient Chip-to-Chip Communication at the Extremes of Computing
ERIC Educational Resources Information Center
Loh, Matthew
2013-01-01
The scalability of CMOS technology has driven computation into a diverse range of applications across the power consumption, performance and size spectra. Communication is a necessary adjunct to computation, and whether this is to push data from node-to-node in a high-performance computing cluster or from the receiver of wireless link to a neural…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crossno, Patricia Joyce; Dunlavy, Daniel M.; Stanton, Eric T.
This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information theoretic methods in user analysis and interpretation in cross language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.
NASA Technical Reports Server (NTRS)
Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash
2003-01-01
Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: Classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on the density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and spacefilling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop
Schumacher, André; Pireddu, Luca; Niemenmaa, Matti; Kallio, Aleksi; Korpelainen, Eija; Zanetti, Gianluigi; Heljanko, Keijo
2014-01-01
Summary: Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts. Availability and Implementation: Available under the open source MIT license at http://sourceforge.net/projects/seqpig/ Contact: andre.schumacher@yahoo.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24149054
Algorithmic psychometrics and the scalable subject.
Stark, Luke
2018-04-01
Recent public controversies, ranging from the 2014 Facebook 'emotional contagion' study to psychographic data profiling by Cambridge Analytica in the 2016 American presidential election, Brexit referendum and elsewhere, signal watershed moments in which the intersecting trajectories of psychology and computer science have become matters of public concern. The entangled history of these two fields grounds the application of applied psychological techniques to digital technologies, and an investment in applying calculability to human subjectivity. Today, a quantifiable psychological subject position has been translated, via 'big data' sets and algorithmic analysis, into a model subject amenable to classification through digital media platforms. I term this position the 'scalable subject', arguing it has been shaped and made legible by algorithmic psychometrics - a broad set of affordances in digital platforms shaped by psychology and the behavioral sciences. In describing the contours of this 'scalable subject', this paper highlights the urgent need for renewed attention from STS scholars on the psy sciences, and on a computational politics attentive to psychology, emotional expression, and sociality via digital media.
Scalable real space pseudopotential density functional codes for materials in the exascale regime
NASA Astrophysics Data System (ADS)
Lena, Charles; Chelikowsky, James; Schofield, Grady; Biller, Ariel; Kronik, Leeor; Saad, Yousef; Deslippe, Jack
Real-space pseudopotential density functional theory has proven to be an efficient method for computing the properties of matter in many different states and geometries, including liquids, wires, slabs, and clusters with and without spin polarization. Fully self-consistent solutions using this approach have been routinely obtained for systems with thousands of atoms. Yet, there are many systems of notably larger size where quantum mechanical accuracy is desired, but scalability proves to be a hindrance. Such systems include large biological molecules, complex nanostructures, or mismatched interfaces. We will present an overview of our new massively parallel algorithms, which offer improved scalability in preparation for exascale supercomputing. We will illustrate these algorithms by considering the electronic structure of a Si nanocrystal exceeding 10^4 atoms. Support provided by the SciDAC program, Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. Grant Numbers DE-SC0008877 (Austin) and DE-FG02-12ER4 (Berkeley).
Scalable Parallel Computation for Extended MHD Modeling of Fusion Plasmas
NASA Astrophysics Data System (ADS)
Glasser, Alan H.
2008-11-01
Parallel solution of a linear system is scalable if simultaneously doubling the number of dependent variables and the number of processors results in little or no increase in the computation time to solution. Two approaches have this property for parabolic systems: multigrid and domain decomposition. Since extended MHD is primarily a hyperbolic rather than a parabolic system, additional steps must be taken to parabolize the linear system to be solved by such a method. Such physics-based preconditioning (PBP) methods have been pioneered by Chacón, using finite volumes for spatial discretization, multigrid for solution of the preconditioning equations, and matrix-free Newton-Krylov methods for the accurate solution of the full nonlinear preconditioned equations. The work described here is an extension of these methods using high-order spectral element methods and FETI-DP domain decomposition. Application of PBP to a flux-source representation of the physics equations is discussed. The resulting scalability will be demonstrated for simple wave and for ideal and Hall MHD waves.
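For readers unfamiliar with matrix-free Newton-Krylov solves, a small SciPy-based sketch is given below; the 1-D reaction-diffusion test problem is an assumption chosen only to show the mechanics (no physics-based preconditioner is applied), and it is unrelated to the extended MHD implementation described here.

    import numpy as np
    from scipy.optimize import newton_krylov

    n = 200
    h = 1.0 / (n + 1)

    def residual(u):
        """Discrete residual of -u'' + u**3 = 1 on (0,1) with u(0) = u(1) = 0."""
        d2u = np.zeros_like(u)
        d2u[1:-1] = (u[:-2] - 2 * u[1:-1] + u[2:]) / h**2
        d2u[0]    = (-2 * u[0] + u[1]) / h**2          # boundary values are zero
        d2u[-1]   = (u[-2] - 2 * u[-1]) / h**2
        return -d2u + u**3 - 1.0

    # Matrix-free Newton-Krylov: the Jacobian is never formed, only applied through
    # finite-difference directional derivatives inside the Krylov (LGMRES) solver.
    u0 = np.zeros(n)
    sol = newton_krylov(residual, u0, method="lgmres", f_tol=1e-8)
    print("max residual:", np.abs(residual(sol)).max())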
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tock, Yoav; Mandler, Benjamin; Moreira, Jose
2013-01-01
As HPC systems and applications get bigger and more complex, we are approaching an era in which resiliency and run-time elasticity concerns become paramount. We offer a building block for an alternative resiliency approach in which computations will be able to make progress while components fail, in addition to enabling a dynamic set of nodes throughout a computation lifetime. The core of our solution is a hierarchical scalable membership service providing eventual consistency semantics. An attribute replication service is used for hierarchy organization, and is exposed to external applications. Our solution is based on P2P technologies and provides resiliency and elastic runtime support at ultra large scales. Resulting middleware is general purpose while exploiting HPC platform unique features and architecture. We have implemented and tested this system on BlueGene/P with Linux, and using worst-case analysis, evaluated the service scalability as effective for up to 1M nodes.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
... Chapel Hill ... algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures. These
NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large-scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general-purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH, a framework for reducing the complexity of programming heterogeneous computer systems; 2) geophysical inversion routines which can be used to characterize physical systems; and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
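A run-time autotuner of the kind mentioned as the third component can be sketched as a simple timed search over a configuration grid; the Python toy below (the kernel and parameter names are hypothetical) only illustrates the idea and is not CUSH.

    import itertools
    import time

    def timed_run(kernel, cfg):
        start = time.perf_counter()
        kernel(**cfg)
        return time.perf_counter() - start

    def autotune(kernel, param_grid, repeats=3):
        """Exhaustively time a kernel over a small configuration grid and return the
        fastest configuration (a toy stand-in for a run-time autotuner)."""
        best_cfg, best_t = None, float("inf")
        keys = sorted(param_grid)
        for values in itertools.product(*(param_grid[k] for k in keys)):
            cfg = dict(zip(keys, values))
            t = min(timed_run(kernel, cfg) for _ in range(repeats))
            if t < best_t:
                best_cfg, best_t = cfg, t
        return best_cfg, best_t

    # Hypothetical kernel whose speed depends on a block-size parameter.
    def stencil_kernel(block_size, n=100_000):
        s = 0.0
        for i in range(0, n, block_size):
            s += sum(range(i, min(i + block_size, n)))
        return s

    print(autotune(stencil_kernel, {"block_size": [64, 256, 1024, 4096]}))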
Clinical results of HIS, RIS, PACS integration using data integration CASE tools
NASA Astrophysics Data System (ADS)
Taira, Ricky K.; Chan, Hing-Ming; Breant, Claudine M.; Huang, Lu J.; Valentino, Daniel J.
1995-05-01
Current infrastructure research in PACS is dominated by the development of communication networks (local area networks, teleradiology, ATM networks, etc.), multimedia display workstations, and hierarchical image storage architectures. However, limited work has been performed on developing flexible, expansible, and intelligent information processing architectures for the vast decentralized image and text data repositories prevalent in healthcare environments. Patient information is often distributed among multiple data management systems. Current large-scale efforts to integrate medical information and knowledge sources have been costly with limited retrieval functionality. Software integration strategies to unify distributed data and knowledge sources are still lacking commercially. Systems heterogeneity (i.e., differences in hardware platforms, communication protocols, database management software, nomenclature, etc.) is at the heart of the problem and is unlikely to be standardized in the near future. In this paper, we demonstrate the use of newly available CASE (computer-aided software engineering) tools to rapidly integrate HIS, RIS, and PACS information systems. The advantages of these tools include fast development time (low-level code is generated from graphical specifications) and easy system maintenance (excellent documentation, easy to perform changes, and centralized code repository in an object-oriented database). The CASE tools are used to develop and manage the 'middleware' in our client-mediator-server architecture for systems integration. Our architecture is scalable and can accommodate heterogeneous databases and communication protocols.
BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators.
Nogueira, David; Tomas, Pedro; Roma, Nuno
2016-01-01
The computational demand of exact-search procedures has pressed the exploitation of parallel processing accelerators to reduce the execution time of many applications. However, this often imposes strict restrictions in terms of the problem size and implementation efforts, mainly due to their possibly distinct architectures. To circumvent this limitation, a new exact-search alignment tool (BowMapCL) based on the Burrows-Wheeler Transform and FM-Index is presented. In contrast to other alternatives, BowMapCL is based on a unified implementation using OpenCL, allowing the exploitation of multiple and possibly different devices (e.g., NVIDIA, AMD/ATI, and Intel GPUs/APUs). Furthermore, to efficiently exploit such heterogeneous architectures, BowMapCL incorporates several techniques to promote its performance and scalability, including multiple buffering, work-queue task-distribution, and dynamic load-balancing, together with index partitioning, bit-encoding, and sampling. When compared with state-of-the-art tools, the attained results showed that BowMapCL (using a single GPU) is 2× to 7.5× faster than mainstream multi-threaded CPU BWT-based aligners, like Bowtie, BWA, and SOAP2; and up to 4× faster than the best performing state-of-the-art GPU implementations (namely, SOAP3 and HPG-BWT). When multiple and completely distinct devices are considered, BowMapCL efficiently scales the offered throughput, ensuring a convenient load-balance of the involved processing in the several distinct devices.
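The core primitive such tools accelerate is the FM-index backward search over the Burrows-Wheeler Transform; a small pure-Python sketch suitable only for toy inputs (not the OpenCL implementation in BowMapCL) is given below.

    def bwt_index(text):
        """Build a toy FM-index: suffix array, BWT, first-column offsets C, occurrence counts."""
        text += "$"
        sa = sorted(range(len(text)), key=lambda i: text[i:])   # O(n^2 log n), fine for a demo
        bwt = "".join(text[i - 1] for i in sa)
        alphabet = sorted(set(text))
        C, total = {}, 0
        for ch in alphabet:                                     # C[ch]: number of chars smaller than ch
            C[ch] = total
            total += text.count(ch)
        occ = {ch: [0] for ch in alphabet}                      # prefix counts of each char in the BWT
        for b in bwt:
            for ch in alphabet:
                occ[ch].append(occ[ch][-1] + (b == ch))
        return sa, bwt, C, occ

    def backward_search(pattern, C, occ):
        """Return the (start, end) suffix-array interval of exact matches of pattern."""
        lo, hi = 0, len(occ["$"]) - 1
        for ch in reversed(pattern):
            if ch not in C:
                return 0, 0
            lo = C[ch] + occ[ch][lo]
            hi = C[ch] + occ[ch][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    sa, bwt, C, occ = bwt_index("ACGTACGTACCA")
    lo, hi = backward_search("ACG", C, occ)
    print("matches at positions:", sorted(sa[i] for i in range(lo, hi)))   # [0, 4]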
Scalable Rapidly Deployable Convex Optimization for Data Analytics
SOCPs, SDPs, exponential cone programs, and power cone programs. CVXPY supports basic methods for distributed optimization, on ... multiple heterogeneous platforms. We have also done basic research in various application areas, using CVXPY, to demonstrate its usefulness. See attached report for publication information. ... Over the period of the contract we have developed the full stack for wide use of convex optimization, in machine learning and many other areas.
Performance and Scalability of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.
Fully implicit adaptive mesh refinement MHD algorithm
NASA Astrophysics Data System (ADS)
Philip, Bobby
2005-10-01
In the macroscopic simulation of plasmas, the numerical modeler is faced with the challenge of dealing with multiple time and length scales. The former results in stiffness due to the presence of very fast waves. The latter requires one to resolve the localized features that the system develops. Traditional approaches based on explicit time integration techniques and fixed meshes are not suitable for this challenge, as such approaches prevent the modeler from using realistic plasma parameters to keep the computation feasible. We propose here a novel approach, based on implicit methods and structured adaptive mesh refinement (SAMR). Our emphasis is on both accuracy and scalability with the number of degrees of freedom. To our knowledge, a scalable, fully implicit AMR algorithm has not been accomplished before for MHD. As a proof-of-principle, we focus on the reduced resistive MHD model as a basic MHD model paradigm, which is truly multiscale. The approach taken here is to adapt mature physics-based technology [L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002)] to AMR grids, and employ AMR-aware multilevel techniques (such as fast adaptive composite --FAC-- algorithms) for scalability. We will demonstrate that the concept is indeed feasible, featuring optimal scalability under grid refinement. Results of fully-implicit, dynamically-adaptive AMR simulations will be presented on a variety of problems.
Joint-layer encoder optimization for HEVC scalable extensions
NASA Astrophysics Data System (ADS)
Tsai, Chia-Ming; He, Yuwen; Dong, Jie; Ye, Yan; Xiu, Xiaoyu; He, Yong
2014-09-01
Scalable video coding provides an efficient solution to support video playback on heterogeneous devices with various channel conditions in heterogeneous networks. SHVC is the latest scalable video coding standard based on the HEVC standard. To improve enhancement layer coding efficiency, inter-layer prediction including texture and motion information generated from the base layer is used for enhancement layer coding. However, the overall performance of the SHVC reference encoder is not fully optimized because rate-distortion optimization (RDO) processes in the base and enhancement layers are independently considered. It is difficult to directly extend the existing joint-layer optimization methods to SHVC due to the complicated coding tree block splitting decisions and in-loop filtering process (e.g., deblocking and sample adaptive offset (SAO) filtering) in HEVC. To solve those problems, a joint-layer optimization method is proposed by adjusting the quantization parameter (QP) to optimally allocate the bit resource between layers. Furthermore, to make more proper resource allocation, the proposed method also considers the viewing probability of base and enhancement layers according to packet loss rate. Based on the viewing probability, a novel joint-layer RD cost function is proposed for joint-layer RDO encoding. The QP values of those coding tree units (CTUs) belonging to lower layers referenced by higher layers are decreased accordingly, and the QP values of those remaining CTUs are increased to keep total bits unchanged. Finally the QP values with minimal joint-layer RD cost are selected to match the viewing probability. The proposed method was applied to the third temporal level (TL-3) pictures in the Random Access configuration. Simulation results demonstrate that the proposed joint-layer optimization method can improve coding performance by 1.3% for these TL-3 pictures compared to the SHVC reference encoder without joint-layer optimization.
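An illustrative form of the viewing-probability-weighted joint-layer cost can be sketched as follows; the exact weighting and the encoder callback are assumptions for demonstration, not the SHVC equations or reference software.

    def joint_layer_rd_cost(d_bl, r_bl, d_el, r_el, lam_bl, lam_el, p_view_el):
        """Illustrative joint-layer RD cost: weight each layer's Lagrangian cost by the
        probability that it is the layer actually viewed (p_view_el depends on the
        packet loss rate affecting the enhancement layer)."""
        cost_bl = d_bl + lam_bl * r_bl
        cost_el = d_el + lam_el * r_el
        return (1.0 - p_view_el) * cost_bl + p_view_el * cost_el

    def pick_qp_offset(encode_fn, offsets, p_view_el):
        """Try QP offsets (lower BL QP / higher EL QP keeps total bits roughly fixed)
        and keep the one with the smallest joint-layer RD cost.  encode_fn is a
        hypothetical callback returning (D_BL, R_BL, D_EL, R_EL, lam_BL, lam_EL)."""
        best, best_cost = None, float("inf")
        for dq in offsets:
            stats = encode_fn(dq)
            cost = joint_layer_rd_cost(*stats, p_view_el)
            if cost < best_cost:
                best, best_cost = dq, cost
        return best, best_cost

    # Toy stand-in for an encoder pass: pretend distortion trades off against rate.
    fake = lambda dq: (100 + 4 * dq, 800 - 20 * dq, 60 - 2 * dq, 1200 + 30 * dq, 30.0, 30.0)
    print(pick_qp_offset(fake, offsets=range(-3, 4), p_view_el=0.8))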
Simplified Distributed Computing
NASA Astrophysics Data System (ADS)
Li, G. G.
2006-05-01
Distributed computing ranges from high-performance parallel computing and GRID computing to environments where idle CPU cycles and storage space of numerous networked systems are harnessed to work together through the Internet. In this work we focus on building an easy and affordable solution for computationally intensive problems in scientific applications based on existing technology and hardware resources. This system consists of a series of controllers. When a job request is detected by a monitor or initialized by an end user, the job manager launches the specific job handler for this job. The job handler pre-processes the job, partitions the job into relatively independent tasks, and distributes the tasks into the processing queue. The task handler picks up the related tasks, processes the tasks, and puts the results back into the processing queue. The job handler also monitors and examines the tasks and the results, and assembles the task results into the overall solution for the job request when all tasks are finished for each job. A resource manager configures and monitors all participating nodes. A distributed agent is deployed on all participating nodes to manage the software download and report the status. The processing queue is the key to the success of this distributed system. We use BEA's WebLogic JMS queue in our implementation. It guarantees message delivery and has message priority and retry features so that tasks never get lost. The entire system is built on J2EE technology and it can be deployed on heterogeneous platforms. It can handle algorithms and applications developed in any language on any platform. J2EE adaptors are provided to manage and connect the existing applications to the system so that applications and algorithms running on Unix, Linux and Windows can all work together. This system is easy and fast to develop based on the industry's well-adopted technology. It is highly scalable and heterogeneous. It is an open system, and any number and type of machines can join the system to provide computational power. This asynchronous message-based system can achieve response times on the order of a second. For efficiency, communications between distributed tasks are often done at the start and end of the tasks, but intermediate status of the tasks can also be provided.
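The job-handler/task-handler pattern described above can be sketched with a local worker pool standing in for the JMS processing queue; the Python example below is an illustrative simplification (the partial-sum job is hypothetical), not the J2EE/WebLogic implementation.

    from concurrent.futures import ProcessPoolExecutor

    def task_handler(chunk):
        """Process one independent task (here: a partial sum of squares of a chunk)."""
        return sum(x * x for x in chunk)

    def job_handler(data, n_tasks=8):
        """Partition a job into independent tasks, dispatch them to a worker pool
        (playing the role of the processing queue), and assemble the results."""
        size = (len(data) + n_tasks - 1) // n_tasks
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        with ProcessPoolExecutor() as pool:
            partials = list(pool.map(task_handler, chunks))
        return sum(partials)                      # assemble the overall solution

    if __name__ == "__main__":
        print(job_handler(list(range(1_000_000))))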
Costs of cloud computing for a biometry department. A case study.
Knaus, J; Hieke, S; Binder, H; Schwarzer, G
2013-01-01
"Cloud" computing providers, such as the Amazon Web Services (AWS), offer stable and scalable computational resources based on hardware virtualization, with short, usually hourly, billing periods. The idea of pay-as-you-use seems appealing for biometry research units which have only limited access to university or corporate data center resources or grids. This case study compares the costs of an existing heterogeneous on-site hardware pool in a Medical Biometry and Statistics department to a comparable AWS offer. The "total cost of ownership", including all direct costs, is determined for the on-site hardware, and hourly prices are derived, based on actual system utilization during the year 2011. Indirect costs, which are difficult to quantify are not included in this comparison, but nevertheless some rough guidance from our experience is given. To indicate the scale of costs for a methodological research project, a simulation study of a permutation-based statistical approach is performed using AWS and on-site hardware. In the presented case, with a system utilization of 25-30 percent and 3-5-year amortization, on-site hardware can result in smaller costs, compared to hourly rental in the cloud dependent on the instance chosen. Renting cloud instances with sufficient main memory is a deciding factor in this comparison. Costs for on-site hardware may vary, depending on the specific infrastructure at a research unit, but have only moderate impact on the overall comparison and subsequent decision for obtaining affordable scientific computing resources. Overall utilization has a much stronger impact as it determines the actual computing hours needed per year. Taking this into ac count, cloud computing might still be a viable option for projects with limited maturity, or as a supplement for short peaks in demand.
Parallel multiscale simulations of a brain aneurysm
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em
2012-01-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work. PMID:23734066
Parallel multiscale simulations of a brain aneurysm.
Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Parallel multiscale simulations of a brain aneurysm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em, E-mail: george_karniadakis@brown.edu
2013-07-01
Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
ERIC Educational Resources Information Center
Younge, Andrew J.
2016-01-01
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…
Current and Future Development of a Non-hydrostatic Unified Atmospheric Model (NUMA)
2010-09-09
following capabilities: 1. Highly scalable on current and future computer architectures (exascale computing and beyond, and GPUs) 2. Flexibility ... Exascale Computing • 10 of Top 500 are already in the Petascale range • Should also keep our eyes on GPUs (e.g., Mare Nostrum) 2. Numerical
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.
Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V
2016-03-08
The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
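For reference, the search space explored by the device for the instance {2, 5, 9} can be reproduced by brute force on a conventional computer; the short Python sketch below enumerates all subset sums and is only meant to show what the agents compute in parallel.

    from itertools import chain, combinations

    def subset_sums(values):
        """Enumerate every subset and its sum: the same search space the
        molecular-motor-propelled agents explore in parallel through the network."""
        subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
        return {s: sum(s) for s in subsets}

    # The small instance {2, 5, 9} solved in the proof-of-concept device.
    for subset, total in sorted(subset_sums((2, 5, 9)).items(), key=lambda kv: kv[1]):
        print(total, subset)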
Trainable hardware for dynamical computing using error backpropagation through physical media.
Hermans, Michiel; Burm, Michaël; Van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter
2015-03-24
Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation-a crucial step for tuning such systems towards a specific task-can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.
Trainable hardware for dynamical computing using error backpropagation through physical media
NASA Astrophysics Data System (ADS)
Hermans, Michiel; Burm, Michaël; van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter
2015-03-01
Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation—a crucial step for tuning such systems towards a specific task—can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.
Heidenreich, Elvio A; Ferrero, José M; Doblaré, Manuel; Rodríguez, José F
2010-07-01
Many problems in biology and engineering are governed by anisotropic reaction-diffusion equations with a very rapidly varying reaction term. This usually implies the use of very fine meshes and small time steps in order to accurately capture the propagating wave while avoiding the appearance of spurious oscillations in the wave front. This work develops a family of macro finite elements amenable to solving anisotropic reaction-diffusion equations with stiff reactive terms. The developed elements are incorporated into a semi-implicit algorithm based on operator splitting that includes adaptive time stepping for handling the stiff reactive term. A linear system is solved on each time step to update the transmembrane potential, whereas the remaining ordinary differential equations are solved uncoupled. The method allows solving the linear system on a coarser mesh thanks to the static condensation of the internal degrees of freedom (DOF) of the macroelements while maintaining the accuracy of the finer mesh. The method and algorithm have been implemented in parallel. The accuracy of the method has been tested on two- and three-dimensional examples demonstrating excellent behavior when compared to standard linear elements. The better performance and scalability of different macro finite elements against standard finite elements have been demonstrated in the simulation of a human heart and a heterogeneous two-dimensional problem with reentrant activity. Results have shown a reduction of up to four times in computational cost for the macro finite elements with respect to equivalent (same number of DOF) standard linear finite elements, as well as good scalability properties.
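The splitting structure (one linear solve for the diffusive part, then point-wise integration of the reaction ordinary differential equations) is illustrated below in a deliberately stripped-down 1-D form. The grid, time step, boundary treatment, and cubic kinetics are assumptions chosen for illustration; the macro elements, static condensation, and adaptive time stepping of the actual method are not represented.

```python
# Minimal 1-D sketch of the semi-implicit operator-splitting idea: implicit
# diffusion (a tridiagonal linear solve) followed by an explicit, point-wise
# reaction update. Parameters are illustrative, not the cardiac model's.
import numpy as np

n, L, D, dt, steps = 200, 1.0, 1e-3, 1e-3, 500
dx = L / (n - 1)
u = np.zeros(n)
u[:20] = 1.0                                  # initial stimulus at one end

# Backward-Euler diffusion matrix (tridiagonal, zero-flux ends)
r = D * dt / dx**2
A = np.eye(n) * (1 + 2 * r)
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = -r
A[0, 1] = A[-1, -2] = -2 * r                  # ghost-node reflection at the boundaries

def reaction(u):
    return u * (1 - u) * (u - 0.1)            # illustrative cubic (bistable) kinetics

for _ in range(steps):
    u = np.linalg.solve(A, u)                 # implicit linear solve: diffusion step
    u = u + dt * reaction(u)                  # uncoupled, point-wise reaction update
```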
NASA Astrophysics Data System (ADS)
Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong
As Grid computing continues to gain popularity in the industry and research community, it also attracts more attention from the customer level. The large number of users and high frequency of job requests in the consumer market make it challenging. Clearly, current Client/Server (C/S)-based architectures will become infeasible for supporting large-scale Grid applications due to their poor scalability and fault tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture to realize a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.
CloudMan as a platform for tool, data, and analysis distribution
2012-01-01
Background Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion. However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions. PMID:23181507
Architectural Considerations for Highly Scalable Computing to Support On-demand Video Analytics
2017-04-19
...research were used to implement a distributed on-demand video analytics system that was prototyped for the use of forensics investigators in law enforcement. The system was tested in the wild using video files as well as a commercial Video Management System supporting more than 100 surveillance cameras as video sources. The architectural considerations of this system are presented. Issues to be reckoned with in implementing a scalable...
Long-time dynamics through parallel trajectory splicing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perez, Danny; Cubuk, Ekin D.; Waterland, Amos
2015-11-24
Simulating the atomistic evolution of materials over long time scales is a longstanding challenge, especially for complex systems where the distribution of barrier heights is very heterogeneous. Such systems are difficult to investigate using conventional long-time scale techniques, and the fact that they tend to remain trapped in small regions of configuration space for extended periods of time strongly limits the physical insights gained from short simulations. We introduce a novel simulation technique, Parallel Trajectory Splicing (ParSplice), that aims at addressing this problem through the timewise parallelization of long trajectories. The computational efficiency of ParSplice stems from a speculation strategy whereby predictions of the future evolution of the system are leveraged to increase the amount of work that can be concurrently performed at any one time, hence improving the scalability of the method. ParSplice is also able to accurately account for, and potentially reuse, a substantial fraction of the computational work invested in the simulation. We validate the method on a simple Ag surface system and demonstrate substantial increases in efficiency compared to previous methods. As a result, we then demonstrate the power of ParSplice through the study of topology changes in Ag42Cu13 core–shell nanoparticles.
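The splicing idea can be conveyed with a toy sketch: short segments are generated from chosen start states and concatenated whenever the end state of the running trajectory matches the start state of an available segment. The discrete-state random walk below is a stand-in for the molecular dynamics engine, and the pre-built segment pool plays the role of the (in reality speculative and parallel) segment generation.

```python
# Conceptual sketch of trajectory splicing with a toy random walk over
# discrete states; no molecular dynamics is performed.
import random

random.seed(0)
STATES = ["A", "B", "C"]

def short_segment(start, length=5):
    """Generate one short trajectory segment starting in `start`."""
    traj = [start]
    for _ in range(length):
        traj.append(random.choice(STATES))
    return traj

# Pool of segments keyed by starting state (the speculation step would choose
# these start states from predicted future visits).
pool = {s: [short_segment(s) for _ in range(3)] for s in STATES}

# Splice: repeatedly append a segment whose start matches the current end state.
long_traj = pool["A"].pop()
while any(pool[long_traj[-1]]):
    seg = pool[long_traj[-1]].pop()
    long_traj.extend(seg[1:])          # drop the duplicated matching state

print(len(long_traj), long_traj[:12])
```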
Progress Towards a Rad-Hydro Code for Modern Computing Architectures LA-UR-10-02825
NASA Astrophysics Data System (ADS)
Wohlbier, J. G.; Lowrie, R. B.; Bergen, B.; Calef, M.
2010-11-01
We are entering an era of high performance computing where data movement is the overwhelming bottleneck to scalable performance, as opposed to the speed of floating-point operations per processor. All multi-core hardware paradigms, whether heterogeneous or homogeneous, be it the Cell processor, GPGPU, or multi-core x86, share this common trait. In multi-physics applications such as inertial confinement fusion or astrophysics, one may be solving multi-material hydrodynamics with tabular equation of state data lookups, radiation transport, nuclear reactions, and charged particle transport in a single time cycle. The algorithms are intensely data dependent (e.g., EOS, opacity, nuclear data), and multi-core hardware memory restrictions are forcing code developers to rethink code and algorithm design. For the past two years LANL has been funding a small effort referred to as Multi-Physics on Multi-Core to explore ideas for code design as pertaining to inertial confinement fusion and astrophysics applications. The near term goals of this project are to have a multi-material radiation hydrodynamics capability, with tabular equation of state lookups, on Cartesian and curvilinear block-structured meshes. In the longer term we plan to add fully implicit multi-group radiation diffusion and material heat conduction, and block-structured AMR. We will report on our progress to date.
The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences
USDA-ARS?s Scientific Manuscript database
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...
Uncertainty quantification and risk analyses of CO2 leakage in heterogeneous geological formations
NASA Astrophysics Data System (ADS)
Hou, Z.; Murray, C. J.; Rockhold, M. L.
2012-12-01
A stochastic sensitivity analysis framework is adopted to evaluate the impact of spatial heterogeneity in permeability on CO2 leakage risk. The leakage is defined as the total mass of CO2 moving into the overburden through the caprock-overburden interface, in both gaseous and liquid (dissolved) phases. The entropy-based framework has the ability to quantify the uncertainty associated with the input parameters in the form of prior pdfs (probability density functions). Effective sampling of the prior pdfs enables us to fully explore the parameter space and systematically evaluate the individual and combined effects of the parameters of interest on CO2 leakage risk. The parameters that are considered in the study include: mean, variance, and horizontal to vertical spatial anisotropy ratio for caprock permeability, and those same parameters for reservoir permeability. Given the sampled spatial variogram parameters, multiple realizations of permeability fields were generated using GSLIB subroutines. For each permeability field, a numerical simulator, STOMP (in the water-salt-CO2-energy operational mode), is used to simulate the CO2 migration within the reservoir and caprock up to 50 years after injection. Due to the intensive computational demand, we run both the scalable simulator eSTOMP and the serial STOMP code on various supercomputers. We then perform statistical analyses and summarize the relationships between the parameters of interest (mean/variance/anisotropy ratio of caprock and reservoir permeability) and the CO2 leakage ratio. We also present the effects of those parameters on CO2 plume radius and reservoir injectivity. The statistical analysis provides a reduced order model that can be used to estimate the impact of heterogeneity on caprock leakage.
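The overall sampling-and-simulation workflow reduces to the loop sketched below. The helper functions are hypothetical placeholders for the GSLIB field generation and the STOMP/eSTOMP multiphase runs, which are external codes; only the structure of the sensitivity study, not its physics, is represented.

```python
# Schematic of the Monte Carlo sensitivity loop: sample variogram parameters,
# generate a permeability realization, run the flow simulator, and record the
# leakage response. All numerical ranges and helpers are illustrative.
import numpy as np

rng = np.random.default_rng(42)

def sample_variogram_parameters():
    # mean, variance, and horizontal-to-vertical anisotropy ratio for the
    # caprock and reservoir permeability; ranges are illustrative only
    return {
        "caprock":   {"mean": rng.uniform(-19, -17), "var": rng.uniform(0.1, 1.0),
                      "aniso": rng.uniform(1, 100)},
        "reservoir": {"mean": rng.uniform(-14, -12), "var": rng.uniform(0.1, 1.0),
                      "aniso": rng.uniform(1, 100)},
    }

def generate_permeability_field(params):
    # placeholder for a geostatistical realization (GSLIB in the study)
    res = params["reservoir"]
    return rng.lognormal(res["mean"], res["var"], size=(10, 10))

def simulate_leakage(field):
    # placeholder for the 50-year STOMP/eSTOMP CO2 migration simulation
    return float(field.mean())

samples = []
for _ in range(100):
    p = sample_variogram_parameters()
    leak = simulate_leakage(generate_permeability_field(p))
    samples.append((p["caprock"]["mean"], p["caprock"]["var"], p["caprock"]["aniso"], leak))

# downstream step: fit a reduced-order model of leakage vs. the sampled parameters
```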
JESPP: Joint Experimentation on Scalable Parallel Processors Supercomputers
2010-03-01
were for the relatively small market of scientific and engineering applications. Contrast this with GPUs that are designed to improve the end-user...experience in mass-market arenas such as gaming. In order to get meaningful speed-up using the GPU, it was determined that the data transfer and...Included) Conference Year Effectively using a Large GPGPU-Enhanced Linux Cluster HPCMP UGC 2009 FLOPS per Watt: Heterogeneous-Computing's Approach
Fully implicit adaptive mesh refinement algorithm for reduced MHD
NASA Astrophysics Data System (ADS)
Philip, Bobby; Pernice, Michael; Chacon, Luis
2006-10-01
In the macroscopic simulation of plasmas, the numerical modeler is faced with the challenge of dealing with multiple time and length scales. Traditional approaches based on explicit time integration techniques and fixed meshes are not suitable for this challenge, as such approaches prevent the modeler from using realistic plasma parameters to keep the computation feasible. We propose here a novel approach, based on implicit methods and structured adaptive mesh refinement (SAMR). Our emphasis is on both accuracy and scalability with the number of degrees of freedom. As a proof-of-principle, we focus on the reduced resistive MHD model as a basic MHD model paradigm, which is truly multiscale. The approach taken here is to adapt mature physics-based technology to AMR grids, and employ AMR-aware multilevel techniques (such as fast adaptive composite grid, FAC, algorithms) for scalability. We demonstrate that the concept is indeed feasible, featuring near-optimal scalability under grid refinement. Results of fully-implicit, dynamically-adaptive AMR simulations in challenging dissipation regimes will be presented on a variety of problems that benefit from this capability, including tearing modes, the island coalescence instability, and the tilt mode instability. L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002); B. Philip, M. Pernice, and L. Chacón, Lecture Notes in Computational Science and Engineering, accepted (2006).
A survey of CPU-GPU heterogeneous computing techniques
Mittal, Sparsh; Vetter, Jeffrey S.
2015-07-04
As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler and application levels. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). Furthermore, we believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.
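One of the simplest workload-partitioning heuristics surveyed in this literature splits the iteration space in proportion to the measured throughput of each device, so that the CPU and GPU finish at roughly the same time. The sketch below uses made-up throughput numbers; a real system would calibrate them with a short profiling run.

```python
# Static CPU-GPU workload partitioning by relative throughput. The rates are
# illustrative placeholders, not measurements of any particular hardware.
def partition(n_items, cpu_rate, gpu_rate):
    """Return (cpu_share, gpu_share) so both devices finish at about the same time."""
    gpu_share = int(round(n_items * gpu_rate / (cpu_rate + gpu_rate)))
    return n_items - gpu_share, gpu_share

cpu_items, gpu_items = partition(1_000_000, cpu_rate=2.0e9, gpu_rate=14.0e9)
print(cpu_items, gpu_items)   # 125000 items on the CPU, 875000 on the GPU
```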
A survey of CPU-GPU heterogeneous computing techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh; Vetter, Jeffrey S.
As both CPU and GPU become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have their unique features and strengths and hence, CPU-GPU collaboration is inevitable to achieve high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design of CPU-GPU fused chips and petascale heterogeneous supercomputers. In this paper, we survey heterogeneous computing techniques (HCTs) such as workload-partitioning which enable utilizing both CPU and GPU to improve performance and/or energy efficiency. We review heterogeneous computing approaches at the runtime, algorithm, programming, compiler and application levels. Further, we review both discrete and fused CPU-GPU systems; and discuss benchmark suites designed for evaluating heterogeneous computing systems (HCSs). Furthermore, we believe that this paper will provide insights into working and scope of applications of HCTs to researchers and motivate them to further harness the computational powers of CPUs and GPUs to achieve the goal of exascale performance.
Architectures and Applications for Scalable Quantum Information Systems
2007-01-01
quantum computation models, such as adiabatic quantum computing, can be converted to quantum circuits. Therefore, in our design flow's first phase...vol. 26, no. 5, pp. 1484–1509, 1997. [19] A. Childs, E. Farhi, and J. Preskill, "Robustness of adiabatic quantum computation," Phys. Rev. A, vol. 65...magnetic resonance computer with three quantum bits that simulates an adiabatic quantum optimization algorithm. Adiabatic
ERIC Educational Resources Information Center
Islam, Muhammad Faysal
2013-01-01
Cloud computing offers the advantage of on-demand, reliable and cost efficient computing solutions without the capital investment and management resources to build and maintain in-house data centers and network infrastructures. Scalability of cloud solutions enables consumers to upgrade or downsize their services as needed. In a cloud environment,…
Integrated Visible Photonics for Trapped-Ion Quantum Computing
2017-06-10
necessarily reflect the views of the Department of Defense. Abstract- A scalable trapped-ion-based quantum - computing architecture requires the... Quantum Computing Dave Kharas, Cheryl Sorace-Agaskar, Suraj Bramhavar, William Loh, Jeremy M. Sage, Paul W. Juodawlkis, and John...coherence times, strong coulomb interactions, and optical addressability, hold great promise for implementation of practical quantum information
The Development of the Non-hydrostatic Unified Model of the Atmosphere (NUMA)
2011-09-19
capabilities: 1. Highly scalable on current and future computer architectures (exascale computing: this means CPUs and GPUs) 2. Flexibility to use a...From Terascale to Petascale/Exascale Computing • 10 of Top 500 are already in the Petascale range • 3 of top 10 are GPU-based machines 2
Security and Cloud Outsourcing Framework for Economic Dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm for solving these issues consists of in-house high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. The security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the cloud approach outperforms the in-house infrastructure in both performance and cost.
Security and Cloud Outsourcing Framework for Economic Dispatch
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi; ...
2017-04-24
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm for solving these issues consists of in-house high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. The security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the cloud approach outperforms the in-house infrastructure in both performance and cost.
Scalable quantum computer architecture with coupled donor-quantum dot qubits
Schenkel, Thomas; Lo, Cheuk Chi; Weis, Christoph; Lyon, Stephen; Tyryshkin, Alexei; Bokor, Jeffrey
2014-08-26
A quantum bit computing architecture includes a plurality of single spin memory donor atoms embedded in a semiconductor layer, a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, wherein a first voltage applied across at least one pair of the aligned quantum dot and donor atom controls a donor-quantum dot coupling. A method of performing quantum computing in a scalable architecture quantum computing apparatus includes arranging a pattern of single spin memory donor atoms in a semiconductor layer, forming a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, applying a first voltage across at least one aligned pair of a quantum dot and donor atom to control a donor-quantum dot coupling, and applying a second voltage between one or more quantum dots to control a Heisenberg exchange J coupling between quantum dots and to cause transport of a single spin polarized electron between quantum dots.
The BioIntelligence Framework: a new computational platform for biomedical knowledge computing
Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles
2013-01-01
Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information. PMID:22859646
Serving ocean model data on the cloud
Meisinger, Michael; Farcas, Claudiu; Farcas, Emilia; Alexander, Charles; Arrott, Matthew; de La Beaujardiere, Jeff; Hubbard, Paul; Mendelssohn, Roy; Signell, Richard P.
2010-01-01
The NOAA-led Integrated Ocean Observing System (IOOS) and the NSF-funded Ocean Observatories Initiative Cyberinfrastructure Project (OOI-CI) are collaborating on a prototype data delivery system for numerical model output and other gridded data using cloud computing. The strategy is to take an existing distributed system for delivering gridded data and redeploy on the cloud, making modifications to the system that allow it to harness the scalability of the cloud as well as adding functionality that the scalability affords.
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present a novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in a multi-language paradigm to make it as efficient, readable and maintainable as possible. Separation-of-concerns and single-responsibility concepts run through the implementation of the solver. As a forward modelling engine, a modern scalable solver extrEMe, based on the contracting integral equation approach, is used. An iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for the (regularized) inverse problem solution, and an adjoint source approach is used to calculate efficiently the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT response, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize the inverse domain, the so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for the (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out on different platforms ranging from modern laptops to the HPC Piz Daint (the 6th supercomputer in the world) demonstrate practically linear scalability of the code up to thousands of nodes.
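The inversion loop itself follows the familiar pattern of a quasi-Newton optimizer driven by a misfit value and its gradient. The sketch below reproduces only that structure, with a toy quadratic forward operator standing in for the integral-equation engine and for the adjoint-supplied gradient; none of the actual extrEMe or MT machinery is used.

```python
# Structural sketch of a quasi-Newton (L-BFGS) inversion driven by a misfit
# and its gradient; the forward model and gradient are toy stand-ins.
import numpy as np
from scipy.optimize import minimize

observed = np.array([1.0, 2.0, 3.0])

def forward(model):
    return model**2                        # placeholder forward operator

def misfit_and_gradient(model):
    residual = forward(model) - observed
    misfit = 0.5 * np.dot(residual, residual)
    # in the real solver this gradient comes from an adjoint-source computation
    gradient = 2.0 * model * residual
    return misfit, gradient

result = minimize(misfit_and_gradient, x0=np.ones(3), jac=True, method="L-BFGS-B")
print(result.x)                            # a model with forward(model) ~ observed
```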
The Efficiency and the Scalability of an Explicit Operator on an IBM POWER4 System
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present an evaluation of the efficiency and the scalability of an explicit CFD operator on an IBM POWER4 system. The POWER4 architecture exhibits a common trend in HPC architectures: boosting CPU processing power by increasing the number of functional units, while hiding the latency of memory access by increasing the depth of the memory hierarchy. The overall machine performance depends on the ability of the caches-buses-fabric-memory to feed the functional units with the data to be processed. In this study we evaluate the efficiency and scalability of one explicit CFD operator on an IBM POWER4. This operator performs computations at the points of a Cartesian grid and involves a few dozen floating point numbers and on the order of 100 floating point operations per grid point. The computations in all grid points are independent. Specifically, we estimate the efficiency of the RHS operator (SP of NPB) on a single processor as the observed/peak performance ratio. Then we estimate the scalability of the operator on a single chip (2 CPUs), a single MCM (8 CPUs), 16 CPUs, and the whole machine (32 CPUs). Then we perform the same measurements for a cache-optimized version of the RHS operator. For our measurements we use the HPM (Hardware Performance Monitor) counters available on the POWER4. These counters allow us to analyze the obtained performance results.
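The single-processor efficiency metric used in the study is simply the observed floating-point rate divided by the processor's theoretical peak. The snippet below states that ratio explicitly; the grid size, timing, and peak rate are illustrative placeholders rather than measured POWER4 values.

```python
# Efficiency as the observed/peak performance ratio; all numbers are
# illustrative placeholders, not measurements from the paper.
def efficiency(flops_per_point, points, seconds, peak_gflops):
    observed_gflops = flops_per_point * points / seconds / 1e9
    return observed_gflops / peak_gflops

# e.g. ~100 flops per grid point on a 128^3 grid, with hypothetical timing and peak
print(efficiency(flops_per_point=100, points=128**3, seconds=0.5, peak_gflops=5.2))
```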
Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H
2014-05-28
Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N^3) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well. Scalability beyond 10,000 CPU cores for problem sizes arising in the field of electronic structure theory is demonstrated for current high-performance computer architectures such as Cray or Intel/Infiniband. For a matrix of dimension 260,000, scalability up to 295,000 CPU cores has been shown on BlueGene/P.
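At single-node scale, the problem ELPA targets is the familiar dense symmetric eigenproblem that LAPACK solves; ELPA and ScaLAPACK distribute the same O(N^3) computation over a block-cyclic layout across many cores. The sketch below uses numpy's LAPACK-backed eigh as the small-scale analogue, with a residual check; it is not an example of the ELPA API itself.

```python
# Small-scale analogue of the dense symmetric eigenproblem: full spectrum of a
# random symmetric matrix via LAPACK (numpy.linalg.eigh).
import numpy as np

n = 500
a = np.random.rand(n, n)
h = 0.5 * (a + a.T)                        # symmetrize to get a valid test matrix

eigenvalues, eigenvectors = np.linalg.eigh(h)
# residual check: ||H v - lambda v|| should be near machine precision
residual = np.linalg.norm(h @ eigenvectors - eigenvectors * eigenvalues)
print(eigenvalues[:5], residual)
```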
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines
NASA Technical Reports Server (NTRS)
1999-01-01
Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues on the dynamic load balancing problem under NASA JOVE and JAG grants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64-processor Cray T3E distributed-memory multiprocessor while maintaining the scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take over a second or two while giving low scalability of a few fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
CA-LOD: Collision Avoidance Level of Detail for Scalable, Controllable Crowds
NASA Astrophysics Data System (ADS)
Paris, Sébastien; Gerdelan, Anton; O'Sullivan, Carol
The new wave of computer-driven entertainment technology throws audiences and game players into massive virtual worlds where entire cities are rendered in real time. Computer animated characters run through inner-city streets teeming with pedestrians, all fully rendered with 3D graphics, animations, particle effects and linked to 3D sound effects to produce more realistic and immersive computer-hosted entertainment experiences than ever before. Computing all of this detail at once is enormously computationally expensive, and game designers, as a rule, have sacrificed behavioural realism in favour of better graphics. In this paper we propose a new Collision Avoidance Level of Detail (CA-LOD) algorithm that allows games to support huge crowds in real time with the appearance of more intelligent behaviour. We propose two collision avoidance models used for two different CA-LODs: a fuzzy steering model focused on performance, and a geometric steering model to obtain the best realism. Mixing these approaches allows thousands of autonomous characters to be simulated in real time, resulting in a scalable but still controllable crowd.
Statistical exchange-coupling errors and the practicality of scalable silicon donor qubits
NASA Astrophysics Data System (ADS)
Song, Yang; Das Sarma, S.
2016-12-01
Recent experimental efforts have led to considerable interest in donor-based localized electron spins in Si as viable qubits for a scalable silicon quantum computer. With the use of isotopically purified 28Si and the realization of extremely long spin coherence time in single-donor electrons, the recent experimental focus is on two coupled donors with the eventual goal of a scaled-up quantum circuit. Motivated by this development, we simulate the statistical distribution of the exchange coupling J between a pair of donors under realistic donor placement straggles, and quantify the errors relative to the intended J value. With J values in a broad range of donor-pair separation (5 < |R| < 60 nm), we work out various cases systematically, for a target donor separation R0 along the [001], [110] and [111] Si crystallographic directions, with |R0| = 10, 20, or 30 nm and standard deviation σR = 1, 2, 5, or 10 nm. Our extensive theoretical results demonstrate the great challenge for a prescribed J gate even with just a donor pair, a first step for any scalable Si-donor-based quantum computer.
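The structure of such a statistical study can be sketched with a simple Monte Carlo: donor displacements are drawn with a Gaussian straggle around a target separation and the resulting spread in J is compared with the intended value. The exponentially decaying J(R) and the parameter values below are simplified assumptions made for illustration; the real coupling in silicon is oscillatory and valley-interference sensitive, which is precisely what makes the problem hard.

```python
# Monte Carlo sketch of exchange-coupling errors under donor placement straggle,
# with a simplified monotonic J(R); all parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
a_B = 1.8                                   # assumed effective Bohr radius in nm

def J(r):
    return np.exp(-2.0 * r / a_B)           # simplified exchange model

R0 = np.array([20.0, 0.0, 0.0])             # 20 nm target separation
sigma_R = 2.0                               # nm placement straggle
displacements = rng.normal(0.0, sigma_R, size=(100_000, 3))
separations = np.linalg.norm(R0 + displacements, axis=1)

j_target = J(np.linalg.norm(R0))
j_samples = J(separations)
print("median |log10(J / J_target)|:",
      np.median(np.abs(np.log10(j_samples / j_target))))
```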
An efficient and scalable deformable model for virtual reality-based medical applications.
Choi, Kup-Sze; Sun, Hanqiu; Heng, Pheng-Ann
2004-09-01
Modeling of tissue deformation is of great importance to virtual reality (VR)-based medical simulations. Considerable effort has been dedicated to the development of interactively deformable virtual tissues. In this paper, an efficient and scalable deformable model is presented for virtual-reality-based medical applications. It considers deformation as a localized force transmittal process which is governed by algorithms based on breadth-first search (BFS). The computational speed is scalable to facilitate real-time interaction by adjusting the penetration depth. Simulated annealing (SA) algorithms are developed to optimize the model parameters by using the reference data generated with the linear static finite element method (FEM). The mechanical behavior and timing performance of the model have been evaluated. The model has been applied to simulate the typical behavior of living tissues and anisotropic materials. Integration with a haptic device has also been achieved on a generic personal computer (PC) platform. The proposed technique provides a feasible solution for VR-based medical simulations and has the potential for multi-user collaborative work in virtual environment.
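The breadth-first force-transmittal idea, with the penetration depth capping how far the deformation spreads, can be illustrated on a toy mesh as below. The adjacency list and attenuation rule are invented for this sketch and do not reproduce the paper's mechanical model.

```python
# BFS-limited force propagation: a contact force spreads level by level and
# the penetration depth bounds how many levels are visited.
from collections import deque

# adjacency list of a small surface mesh (node -> neighbours); illustrative only
mesh = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}

def propagate(contact_node, force, depth_limit, attenuation=0.5):
    displacement = {contact_node: force}
    queue = deque([(contact_node, 0)])
    visited = {contact_node}
    while queue:
        node, depth = queue.popleft()
        if depth == depth_limit:
            continue                     # stop expanding beyond the penetration depth
        for nb in mesh[node]:
            if nb not in visited:
                visited.add(nb)
                displacement[nb] = displacement[node] * attenuation
                queue.append((nb, depth + 1))
    return displacement

print(propagate(contact_node=0, force=1.0, depth_limit=2))
# {0: 1.0, 1: 0.5, 2: 0.5, 3: 0.25} -- node 4 is never touched at this depth
```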
ExSTraCS 2.0: Description and Evaluation of a Scalable Learning Classifier System.
Urbanowicz, Ryan J; Moore, Jason H
2015-09-01
Algorithmic scalability is a major concern for any machine learning strategy in this age of 'big data'. A large number of potentially predictive attributes is emblematic of problems in bioinformatics, genetic epidemiology, and many other fields. Previously, ExSTraCS was introduced as an extended Michigan-style supervised learning classifier system that combined a set of powerful heuristics to successfully tackle the challenges of classification, prediction, and knowledge discovery in complex, noisy, and heterogeneous problem domains. While Michigan-style learning classifier systems are powerful and flexible learners, they are not considered to be particularly scalable. For the first time, this paper presents a complete description of the ExSTraCS algorithm and introduces an effective strategy to dramatically improve learning classifier system scalability. ExSTraCS 2.0 addresses scalability with (1) a rule specificity limit, (2) new approaches to expert knowledge guided covering and mutation mechanisms, and (3) the implementation and utilization of the TuRF algorithm for improving the quality of expert knowledge discovery in larger datasets. Performance over a complex spectrum of simulated genetic datasets demonstrated that these new mechanisms dramatically improve nearly every performance metric on datasets with 20 attributes and made it possible for ExSTraCS to reliably scale up to perform on related 200 and 2000-attribute datasets. ExSTraCS 2.0 was also able to reliably solve the 6, 11, 20, 37, 70, and 135 multiplexer problems, and did so in similar or fewer learning iterations than previously reported, with smaller finite training sets, and without using building blocks discovered from simpler multiplexer problems. Furthermore, ExSTraCS usability was made simpler through the elimination of previously critical run parameters.
Photo-z-SQL: Integrated, flexible photometric redshift computation in a database
NASA Astrophysics Data System (ADS)
Beck, R.; Dobos, L.; Budavári, T.; Szalay, A. S.; Csabai, I.
2017-04-01
We present a flexible template-based photometric redshift estimation framework, implemented in C#, that can be seamlessly integrated into a SQL database (or DB) server and executed on-demand in SQL. The DB integration eliminates the need to move large photometric datasets outside a database for redshift estimation, and utilizes the computational capabilities of DB hardware. The code is able to perform both maximum likelihood and Bayesian estimation, and can handle inputs of variable photometric filter sets and corresponding broad-band magnitudes. It is possible to take into account the full covariance matrix between filters, and filter zero points can be empirically calibrated using measurements with given redshifts. The list of spectral templates and the prior can be specified flexibly, and the expensive synthetic magnitude computations are done via lazy evaluation, coupled with a caching of results. Parallel execution is fully supported. For large upcoming photometric surveys such as the LSST, the ability to perform in-place photo-z calculation would be a significant advantage. Also, the efficient handling of variable filter sets is a necessity for heterogeneous databases, for example the Hubble Source Catalog, and for cross-match services such as SkyQuery. We illustrate the performance of our code on two reference photo-z estimation testing datasets, and provide an analysis of execution time and scalability with respect to different configurations. The code is available for download at https://github.com/beckrob/Photo-z-SQL.
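The core template-fitting loop amounts to a chi-square grid search over templates and redshifts, with the synthetic magnitudes cached so that repeated queries are cheap (the lazy evaluation mentioned above). The sketch below uses a toy template library and filter model written for this summary; it mirrors only the structure of the maximum-likelihood path, not the Bayesian machinery or the SQL integration.

```python
# Toy template-based photo-z estimate: chi-square grid search over (template, z)
# with cached synthetic magnitudes standing in for lazy evaluation.
import numpy as np
from functools import lru_cache

z_grid = np.arange(0.0, 2.0, 0.01)
templates = {0: 1.0, 1: 1.5}                 # placeholder spectral "shapes"

@lru_cache(maxsize=None)
def synthetic_magnitudes(template_id, z):
    slope = templates[template_id]
    # stand-in for convolving a redshifted template with the filter curves
    return tuple(20.0 + slope * np.array([0.1, 0.2, 0.3]) * (1.0 + z))

def best_photo_z(observed_mags, observed_errs):
    best = (np.inf, None, None)
    for t in templates:
        for z in z_grid:
            model = np.array(synthetic_magnitudes(t, round(float(z), 2)))
            chi2 = np.sum(((observed_mags - model) / observed_errs) ** 2)
            if chi2 < best[0]:
                best = (chi2, t, float(z))
    return best

obs = np.array(synthetic_magnitudes(1, 0.42)) + 0.01
print(best_photo_z(obs, observed_errs=np.full(3, 0.05)))   # finds template 1, z ~ 0.42
```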
Methodologies and systems for heterogeneous concurrent computing
NASA Technical Reports Server (NTRS)
Sunderam, V. S.
1994-01-01
Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.
Palomar, Esther; Chen, Xiaohong; Liu, Zhiming; Maharjan, Sabita; Bowen, Jonathan
2016-10-28
Smart city systems embrace major challenges associated with climate change, energy efficiency, mobility and future services by embedding the virtual space into a complex cyber-physical system. Those systems are constantly evolving and scaling up, involving a wide range of integration among users, devices, utilities, public services and also policies. Modelling such complex dynamic systems' architectures has always been essential for the development and application of techniques/tools to support design and deployment of integration of new components, as well as for the analysis, verification, simulation and testing to ensure trustworthiness. This article reports on the definition and implementation of a scalable component-based architecture that supports a cooperative energy demand response (DR) system coordinating energy usage between neighbouring households. The proposed architecture, called refinement of Cyber-Physical Component Systems (rCPCS), which extends the refinement calculus for component and object system (rCOS) modelling method, is implemented using Eclipse Extensible Coordination Tools (ECT), i.e., Reo coordination language. With the rCPCS implementation in Reo, we specify the communication, synchronisation and co-operation amongst the heterogeneous components of the system, assuring, by design, scalability, interoperability, and correctness of component cooperation.
Palomar, Esther; Chen, Xiaohong; Liu, Zhiming; Maharjan, Sabita; Bowen, Jonathan
2016-01-01
Smart city systems embrace major challenges associated with climate change, energy efficiency, mobility and future services by embedding the virtual space into a complex cyber-physical system. Those systems are constantly evolving and scaling up, involving a wide range of integration among users, devices, utilities, public services and also policies. Modelling such complex dynamic systems’ architectures has always been essential for the development and application of techniques/tools to support design and deployment of integration of new components, as well as for the analysis, verification, simulation and testing to ensure trustworthiness. This article reports on the definition and implementation of a scalable component-based architecture that supports a cooperative energy demand response (DR) system coordinating energy usage between neighbouring households. The proposed architecture, called refinement of Cyber-Physical Component Systems (rCPCS), which extends the refinement calculus for component and object system (rCOS) modelling method, is implemented using Eclipse Extensible Coordination Tools (ECT), i.e., Reo coordination language. With the rCPCS implementation in Reo, we specify the communication, synchronisation and co-operation amongst the heterogeneous components of the system, assuring, by design, scalability, interoperability, and correctness of component cooperation. PMID:27801829
Proxy-equation paradigm: A strategy for massively parallel asynchronous computations
NASA Astrophysics Data System (ADS)
Mittal, Ankita; Girimaji, Sharath
2017-09-01
Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing the synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order, and thus (iii) is expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
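The setting the proxy-equation concept addresses can be illustrated with an explicit 1-D advection-diffusion update in which the halo value from a neighbouring PE is allowed to be several time levels old. The sketch below only shows where that stale value enters the stencil; the proxy modification that offsets the resulting error is not implemented here, and all parameters are illustrative.

```python
# Explicit 1-D advection-diffusion with a deliberately stale halo value,
# mimicking the asynchrony between processing elements.
import numpy as np

n, c, nu = 64, 1.0, 0.01
dx, dt = 1.0 / n, 2e-4
u = np.sin(2 * np.pi * np.linspace(0.0, 1.0, n, endpoint=False))
stale_halo = u[0]                       # value "received" from the neighbouring PE

for step in range(200):
    un = u.copy()
    if step % 5 == 0:                   # the message only arrives every few steps
        stale_halo = un[0]
    right = np.roll(un, -1)
    right[-1] = stale_halo              # last interior point sees a possibly old halo value
    left = np.roll(un, 1)
    u = (un
         - c * dt / (2 * dx) * (right - left)
         + nu * dt / dx**2 * (right - 2 * un + left))
```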
Optimized scalable network switch
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2007-12-04
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
Optimized scalable network switch
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
2010-02-23
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
Scalable Entity-Based Modeling of Population-Based Systems, Final LDRD Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cleary, A J; Smith, S G; Vassilevska, T K
2005-01-27
The goal of this project has been to develop tools, capabilities and expertise in the modeling of complex population-based systems via scalable entity-based modeling (EBM). Our initial focal application domain has been the dynamics of large populations exposed to disease-causing agents, a topic of interest to the Department of Homeland Security in the context of bioterrorism. In the academic community, discrete simulation technology based on individual entities has shown initial success, but the technology has not been scaled to the problem sizes or computational resources of LLNL. Our developmental emphasis has been on the extension of this technology to parallel computers and maturation of the technology from an academic to a lab setting.
Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter
2015-01-20
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
A Study on Fast Gates for Large-Scale Quantum Simulation with Trapped Ions
Taylor, Richard L.; Bentley, Christopher D. B.; Pedernales, Julen S.; Lamata, Lucas; Solano, Enrique; Carvalho, André R. R.; Hope, Joseph J.
2017-01-01
Large-scale digital quantum simulations require thousands of fundamental entangling gates to construct the simulated dynamics. Despite success in a variety of small-scale simulations, quantum information processing platforms have hitherto failed to demonstrate the combination of precise control and scalability required to systematically outmatch classical simulators. We analyse how fast gates could enable trapped-ion quantum processors to achieve the requisite scalability to outperform classical computers without error correction. We analyze the performance of a large-scale digital simulator, and find that fidelity of around 70% is realizable for π-pulse infidelities below 10^-5 in traps subject to realistic rates of heating and dephasing. This scalability relies on fast gates: entangling gates faster than the trap period. PMID:28401945
A Study on Fast Gates for Large-Scale Quantum Simulation with Trapped Ions.
Taylor, Richard L; Bentley, Christopher D B; Pedernales, Julen S; Lamata, Lucas; Solano, Enrique; Carvalho, André R R; Hope, Joseph J
2017-04-12
Large-scale digital quantum simulations require thousands of fundamental entangling gates to construct the simulated dynamics. Despite success in a variety of small-scale simulations, quantum information processing platforms have hitherto failed to demonstrate the combination of precise control and scalability required to systematically outmatch classical simulators. We analyse how fast gates could enable trapped-ion quantum processors to achieve the requisite scalability to outperform classical computers without error correction. We analyze the performance of a large-scale digital simulator, and find that fidelity of around 70% is realizable for π-pulse infidelities below 10^-5 in traps subject to realistic rates of heating and dephasing. This scalability relies on fast gates: entangling gates faster than the trap period.
Scalable cluster administration - Chiba City I approach and lessons learned.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Navarro, J. P.; Evard, R.; Nurmi, D.
2002-07-01
Systems administrators of large clusters often need to perform the same administrative activity hundreds or thousands of times. Often such activities are time-consuming, especially the tasks of installing and maintaining software. By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with remote hardware control, cluster administrators can automate all administrative tasks. Scalable cluster administration addresses the following challenge: What systems design techniques can cluster builders use to automate cluster administration on very large clusters? We describe the approach used in the Mathematics and Computer Science Division of Argonne National Laboratory on Chiba City I, a 314-node Linux cluster, and we analyze the scalability, flexibility, and reliability benefits and limitations of that approach.
NASA Astrophysics Data System (ADS)
Weiss, Chester J.
2013-08-01
An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10^-2 to 10^10 Hz and length scales from 10^-3 to 10^5 m. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
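The matrix-free pattern, where the operator's action is computed on the fly inside a QMR iteration instead of being stored, can be sketched with SciPy's LinearOperator and qmr. The 1-D Laplacian below is a stand-in for the finite-volume Helmholtz system; only the solver structure, not the electromagnetic physics, is illustrated.

```python
# Matrix-free QMR: the stencil is applied on the fly, so no matrix is stored.
import numpy as np
from scipy.sparse.linalg import LinearOperator, qmr

n = 200

def apply_operator(x):
    # "on the fly" application of a 1-D Laplacian stencil
    y = 2.0 * x
    y[1:] -= x[:-1]
    y[:-1] -= x[1:]
    return y

A = LinearOperator((n, n), matvec=apply_operator, rmatvec=apply_operator,
                   dtype=np.float64)        # symmetric operator, so rmatvec == matvec
b = np.ones(n)
x, info = qmr(A, b)
print(info, np.linalg.norm(apply_operator(x) - b))
```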
ERIC Educational Resources Information Center
Repenning, Alexander; Webb, David C.; Koh, Kyu Han; Nickerson, Hilarie; Miller, Susan B.; Brand, Catharine; Her Many Horses, Ian; Basawapatna, Ashok; Gluck, Fred; Grover, Ryan; Gutierrez, Kris; Repenning, Nadia
2015-01-01
An educated citizenry that participates in and contributes to science technology engineering and mathematics innovation in the 21st century will require broad literacy and skills in computer science (CS). School systems will need to give increased attention to opportunities for students to engage in computational thinking and ways to promote a…
Software designs of image processing tasks with incremental refinement of computation.
Anastasia, Davide; Andreopoulos, Yiannis
2010-08-01
Software realizations of computationally-demanding image processing tasks (e.g., image transforms and convolution) do not currently provide graceful degradation when their clock-cycle budgets are reduced, e.g., when delay deadlines are imposed in a multitasking environment to meet throughput requirements. This is an important obstacle in the quest for full utilization of modern programmable platforms' capabilities since worst-case considerations must be in place for reasonable quality of results. In this paper, we propose (and make available online) platform-independent software designs performing bitplane-based computation combined with an incremental packing framework in order to realize block transforms, 2-D convolution and frame-by-frame block matching. The proposed framework realizes incremental computation: progressive processing of input-source increments improves the output quality monotonically. Comparisons with the equivalent nonincremental software realization of each algorithm reveal that, for the same precision of the result, the proposed approach can lead to comparable or faster execution, while it can be arbitrarily terminated and provide the result up to the computed precision.  Application examples with region-of-interest based incremental computation, task scheduling per frame, and energy-distortion scalability verify that our proposal provides significant performance scalability with graceful degradation.
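The bitplane idea can be demonstrated on a small filtering task: because the operation is linear, processing the input one bitplane at a time from most to least significant yields a monotonically improving approximation, and stopping early trades precision for cycles. The sketch below is a simplified illustration written for this summary, not the authors' released designs.

```python
# Incremental bitplane-based filtering: accumulating per-bitplane results from
# MSB to LSB refines the output monotonically; early termination gives a
# lower-precision result.
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64)).astype(np.int64)
kernel = np.ones((3, 3), dtype=np.int64)

def filter_full(img, k):
    # naive "valid" filtering used as the exact reference
    h, w = img.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.int64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

reference = filter_full(image, kernel)

approx = np.zeros_like(reference)
for bit in range(7, -1, -1):                       # MSB first
    plane = (image >> bit) & 1
    approx += (1 << bit) * filter_full(plane, kernel)
    err = np.abs(approx - reference).mean()
    print(f"after bitplane {bit}: mean abs error {err:.1f}")   # reaches 0 after all planes
```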
Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks
Kaltenbacher, Barbara; Hasenauer, Jan
2017-01-01
Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, the computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions are missing so far. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics. PMID:28114351
Parallel peak pruning for scalable SMP contour tree computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.
As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory (SMP) algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup in NVIDIA Thrust.
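For reference, the serial sweep-and-merge construction that such work parallelizes can be sketched as follows. This toy version builds the (vertex-augmented) join tree of a scalar field on an arbitrary graph with union-find; it is only meant to convey the sequential data dependencies that make parallelization non-trivial, and is not the peak-pruning algorithm of the paper.

```python
# Toy serial join-tree construction on a graph scalar field (illustrative sketch,
# not the parallel peak-pruning algorithm). Vertices are processed in decreasing
# value; union-find tracks superlevel-set components as they merge.
def join_tree(values, edges):
    n = len(values)
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    parent = list(range(n))          # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    lowest = {}                      # current lowest vertex of each component
    processed = set()
    tree_edges = []
    for v in sorted(range(n), key=lambda i: values[i], reverse=True):
        lowest[v] = v
        processed.add(v)
        for u in adjacency[v]:
            if u not in processed:
                continue
            ru, rv = find(u), find(v)
            if ru != rv:
                tree_edges.append((lowest[ru], v))   # components join at v
                parent[ru] = rv
                lowest[rv] = v
    return tree_edges

# 1-D example: two peaks along a path graph join at the saddle vertex 2.
vals = [0.0, 3.0, 1.0, 4.0, 0.5]
path = [(i, i + 1) for i in range(4)]
print(join_tree(vals, path))
```

The strictly ordered sweep and the shared union-find state are exactly the "serial metaphors" the abstract refers to, which is why a formally parallel reformulation is a research contribution rather than a routine port.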
Heterogeneous Distributed Computing for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy S.
1998-01-01
The research supported under this award focuses on heterogeneous distributed computing for high-performance applications, with particular emphasis on computational aerosciences. The overall goal of this project was to investigate issues in, and develop solutions to, the efficient execution of computational aeroscience codes in heterogeneous concurrent computing environments. In particular, we worked in the context of the PVM[1] system and, subsequent to detailed conversion efforts and performance benchmarking, devised novel techniques to increase the efficacy of heterogeneous networked environments for computational aerosciences. Our work has been based upon the NAS Parallel Benchmark suite, but has also recently expanded in scope to include the NAS I/O benchmarks as specified in the NHT-1 document. In this report we summarize our research accomplishments under the auspices of the grant.
Optically Driven Spin Based Quantum Dots for Quantum Computing - Research Area 6 Physics 6.3.2
2015-12-15
quantum dots (SAQD) in Schottky diodes. Based on spins in these dots, a scalable architecture has been proposed [Adv. in Physics, 59, 703 (2010)] by us...housed in two coupled quantum dots with tunneling between them, as described above, may not be scalable but can serve as a node in a quantum network. The...tunneling-coupled two-electron spin ground states in the vertically coupled quantum dots for "universal computation" two spin qubits within the universe of
Mitigating Spam Using Spatio-Temporal Reputation
2010-01-01
scalable; computation can occur in near real-time and over 500,000 emails can be scored an hour. 1 Introduction Roughly 90% of the total volume of email on...Sokolsky, and J. M. Smith. Dynamic trust management. IEEE Computer (Special Issue on Trust Management), 2009. [11] P. Boykins and B. Roychowdhury
OpenARC: Extensible OpenACC Compiler Framework for Directive-Based Accelerator Programming Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Seyong; Vetter, Jeffrey S
2014-01-01
Directive-based, accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research framework to address those issues in the directive-based accelerator programming. This paper explains important design strategies and key compiler transformation techniques needed to implement the reference OpenACC compiler. Moreover, this paper demonstrates the efficacy of OpenARC as a research framework for directive-based programming study, by proposing and implementing OpenACC extensions in the OpenARC framework to 1) support hybrid programming of the unified memory and separate memory and 2) exploit architecture-specific features in an abstract manner. Porting thirteen standard OpenACC programs and three extended OpenACC programs to CUDA GPUs shows that OpenARC performs similarly to a commercial OpenACC compiler, while it serves as a high-level research framework.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller–Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
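The loosely coupled, task-based outer level of parallelism described above can be illustrated on a single node with Python's standard library: each fragment calculation is an independent task whose result is accumulated as it completes. The fragment_energy function below is a placeholder stand-in, not the RI-MP2 kernel, and a production run would distribute these tasks over MPI ranks rather than local processes.

```python
# Single-node illustration of the loosely coupled task level of a DEC-style
# calculation: independent fragment tasks, results accumulated as they finish.
# fragment_energy is a stand-in for the real (MPI/OpenMP/OpenACC) kernel.
from concurrent.futures import ProcessPoolExecutor, as_completed

def fragment_energy(fragment_id, orbitals):
    # Placeholder for an expensive correlated-energy calculation on one fragment.
    return sum((i + 1) ** -3 for i in range(orbitals)) * (fragment_id + 1)

def dec_total_energy(fragments):
    total = 0.0
    with ProcessPoolExecutor() as pool:
        futures = {pool.submit(fragment_energy, fid, norb): fid
                   for fid, norb in fragments}
        for future in as_completed(futures):       # dynamic, load-balanced
            total += future.result()
    return total

if __name__ == "__main__":
    # 40 "monomer" fragments of varying size, loosely mimicking the wire example.
    frags = [(fid, 50 + 5 * (fid % 7)) for fid in range(40)]
    print("total correlation energy (toy units):", dec_total_energy(frags))
```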
2012-01-01
Computational approaches to generate hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, it still remains a challenge to automatically discover novel, cross-silo biomedical hypotheses from large-scale literature repositories. In order to address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and concept-author map on a cluster using the MapReduce framework. We extract a set of heterogeneous features such as random walk based features, neighborhood features and common author features. The potential number of links to consider for the possibility of link discovery is large in our concept network; to address this scalability problem, the features from the concept network are extracted on a cluster with the MapReduce framework. We further model link discovery as a classification problem carried out on a training data set automatically extracted from two network snapshots taken in two consecutive time durations. A set of heterogeneous features, which cover both topological and semantic features derived from the concept network, have been studied with respect to their impacts on the accuracy of the proposed supervised link discovery process. A case study of hypothesis generation based on the proposed method is presented in the paper. PMID:22759614
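The link-discovery-as-classification step can be sketched with networkx and scikit-learn (assumed available): topological features such as common-neighbor and Jaccard scores are computed for candidate concept pairs from an earlier network snapshot, and labels come from whether the link appears in the later snapshot. The toy graph, the added "future" edges, and the feature set are illustrative; the paper additionally uses random-walk and author features extracted at scale with MapReduce.

```python
# Sketch of link discovery as supervised classification on a concept network.
import networkx as nx
from sklearn.linear_model import LogisticRegression

def pair_features(G, u, v):
    nu, nv = set(G[u]), set(G[v])
    common = len(nu & nv)
    jaccard = common / len(nu | nv) if nu | nv else 0.0
    return [common, jaccard, len(nu), len(nv)]

def train_link_classifier(snapshot_t0, snapshot_t1, candidate_pairs):
    X = [pair_features(snapshot_t0, u, v) for u, v in candidate_pairs]
    y = [int(snapshot_t1.has_edge(u, v)) for u, v in candidate_pairs]
    return LogisticRegression(max_iter=1000).fit(X, y)

# Toy "concept networks" for two consecutive time windows.
G0 = nx.karate_club_graph()
G1 = G0.copy()
G1.add_edges_from([(9, 10), (14, 15), (18, 20)])   # links appearing later
pairs = [(u, v) for u in G0 for v in G0 if u < v and not G0.has_edge(u, v)]
model = train_link_classifier(G0, G1, pairs)
print("predicted link probability for (14, 15):",
      model.predict_proba([pair_features(G0, 14, 15)])[0, 1])
```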
The Quantum Socket: Wiring for Superconducting Qubits - Part 3
NASA Astrophysics Data System (ADS)
Mariantoni, M.; Bejianin, J. H.; McConkey, T. G.; Rinehart, J. R.; Bateman, J. D.; Earnest, C. T.; McRae, C. H.; Rohanizadegan, Y.; Shiri, D.; Penava, B.; Breul, P.; Royak, S.; Zapatka, M.; Fowler, A. G.
The implementation of a quantum computer requires quantum error correction codes, which allow errors occurring on physical quantum bits (qubits) to be corrected. Ensembles of physical qubits will be grouped to form a logical qubit with a lower error rate. Reaching low error rates will necessitate a large number of physical qubits; thus, a scalable qubit architecture must be developed. Superconducting qubits have been used to realize error correction; however, a truly scalable qubit architecture has yet to be demonstrated. A critical step towards scalability is the realization of a wiring method that allows qubits to be addressed densely and accurately. A quantum socket that serves this purpose has been designed and tested at microwave frequencies. In this talk, we show results where the socket is used at millikelvin temperatures to measure an on-chip superconducting resonator. The control electronics is another fundamental element for scalability. We will present a proposal based on the quantum socket to interconnect classical control hardware to superconducting qubit hardware, where both are operated at millikelvin temperatures.
A Systems Approach to Scalable Transportation Network Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2006-01-01
Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.
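The contrast with time-stepped models can be made concrete with a minimal discrete-event sketch (hypothetical link travel times and routes, not SCATTER itself): vehicles advance by scheduling link-exit events on a priority queue, so simulation time jumps directly from event to event instead of being updated every fixed time step.

```python
# Minimal discrete-event road-network sketch, far simpler than SCATTER:
# vehicles schedule link-exit events on a priority queue instead of being
# updated at fixed time steps.
import heapq
import random

def simulate(routes, travel_time, horizon=3600.0, seed=1):
    rng = random.Random(seed)
    events = []                       # (time, vehicle_id, leg_index)
    for vid, route in enumerate(routes):
        depart = rng.uniform(0.0, 300.0)
        heapq.heappush(events, (depart + travel_time(route[0]), vid, 0))
    arrivals = {}
    while events:
        t, vid, leg = heapq.heappop(events)
        if t > horizon:
            break
        route = routes[vid]
        if leg + 1 < len(route):      # enter the next link, schedule its exit
            heapq.heappush(events, (t + travel_time(route[leg + 1]), vid, leg + 1))
        else:
            arrivals[vid] = t         # vehicle reached its destination
    return arrivals

links = {"A": 60.0, "B": 90.0, "C": 45.0}          # free-flow travel times (s)
routes = [["A", "B", "C"], ["B", "C"], ["A", "C"]] # per-vehicle link sequences
done = simulate(routes, lambda link: links[link])
print({vid: round(t, 1) for vid, t in done.items()})
```

A parallel version must additionally synchronize events that cross processor boundaries, which is where parallel discrete-event simulation methods enter.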
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
NASA Astrophysics Data System (ADS)
Jiang, Xikai; Li, Jiyuan; Zhao, Xujun; Qin, Jian; Karpeev, Dmitry; Hernandez-Ortiz, Juan; de Pablo, Juan J.; Heinonen, Olle
2016-08-01
Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct computational evaluation requires O(N²) operations, where N is the number of unknowns. Such a scaling, which arises from the many-body nature of the relevant Green's function, has precluded widespread adoption of integral methods for solution of large-scale scientific and engineering problems. In this work, a parallel computational approach is presented that relies on using scalable open source libraries and utilizes a kernel-independent Fast Multipole Method (FMM) to evaluate the integrals in O(N) operations, with O(N) memory cost, thereby substantially improving the scalability and efficiency of computational integral methods. We demonstrate the accuracy, efficiency, and scalability of our approach in the context of two examples. In the first, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space. In the second, we solve an electrostatic problem involving polarizable dielectric bodies in an unbounded dielectric medium. The results from these test cases show that our proposed parallel approach, which is built on a kernel-independent FMM, can enable highly efficient and accurate simulations and allow for considerable flexibility in a broad range of applications.
Jiang, Xikai; Li, Jiyuan; Zhao, Xujun; ...
2016-08-10
Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct computational evaluation requires O(N²) operations, where N is the number of unknowns. Such a scaling, which arises from the many-body nature of the relevant Green's function, has precluded widespread adoption of integral methods for solution of large-scale scientific and engineering problems. In this work, a parallel computational approach is presented that relies on using scalable open source libraries and utilizes a kernel-independent Fast Multipole Method (FMM) to evaluate the integrals in O(N) operations, with O(N) memory cost, thereby substantially improving the scalability and efficiency of computational integral methods. We demonstrate the accuracy, efficiency, and scalability of our approach in the context of two examples. In the first, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space. In the second, we solve an electrostatic problem involving polarizable dielectric bodies in an unbounded dielectric medium. Lastly, the results from these test cases show that our proposed parallel approach, which is built on a kernel-independent FMM, can enable highly efficient and accurate simulations and allow for considerable flexibility in a broad range of applications.
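The O(N²) versus O(N) contrast in the two records above can be made concrete with the direct pairwise evaluation that the kernel-independent FMM replaces. The brute-force Coulomb-like sum below is purely illustrative; it is the baseline whose cost becomes intractable for large N, while an FMM reduces both work and memory to roughly linear in N at controlled accuracy.

```python
# Direct O(N^2) evaluation of pairwise Coulomb-like potentials: the baseline
# whose cost a kernel-independent FMM reduces to O(N). Illustrative only.
import numpy as np

def direct_potentials(positions, charges):
    """phi_i = sum_{j != i} q_j / |r_i - r_j|, evaluated by brute force."""
    n = len(charges)
    phi = np.zeros(n)
    for i in range(n):
        r = np.linalg.norm(positions - positions[i], axis=1)
        r[i] = np.inf                       # exclude self-interaction
        phi[i] = np.sum(charges / r)
    return phi

rng = np.random.default_rng(42)
pts = rng.random((2000, 3))
q = rng.choice([-1.0, 1.0], size=2000)
phi = direct_potentials(pts, q)             # ~N^2 = 4e6 pair interactions
print("potential at first particle:", phi[0])
```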
Cooling the Collective Motion of Trapped Ions to Initialize a Quantum Register
2016-09-13
computation [1] provides a general framework for fundamental investigations into subjects such as entanglement, quantum measurement, and quantum ...information theory. Since quantum computation relies on entanglement between qubits, any implementation of a quantum computer must offer isolation from the...for realizing a quantum computer, which is scalable to an arbitrary number of qubits. Their scheme is based on a collection of trapped atomic ions
Heterogeneity in Health Care Computing Environments
Sengupta, Soumitra
1989-01-01
This paper discusses issues of heterogeneity in computer systems, networks, databases, and presentation techniques, and the problems it creates in developing integrated medical information systems. The need for institutional, comprehensive goals is emphasized. Using the Columbia-Presbyterian Medical Center's computing environment as the case study, various steps to solve the heterogeneity problem are presented.
Entanglement in a Quantum Annealing Processor
2016-09-07
that QA is a viable technology for large-scale quantum computing. DOI: 10.1103/PhysRevX.4.021041 Subject Areas: Quantum Physics, Quantum Information...Superconductivity I. INTRODUCTION The past decade has been exciting for the field of quantum computation. A wide range of physical implementations...measurements used in studying prototype universal quantum computers [9–14]. These constraints make it challenging to experimentally determine whether a scalable
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.
O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D
2015-04-01
The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
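The "virtual partitioning" idea, as described, amounts to crossing database partitions with query partitions so that each map task aligns one bounded query chunk against one bounded database chunk. A hedged sketch of that task layout in plain Python (not HBlast itself; chunk sizes and record naming are illustrative):

```python
# Sketch of HBlast-style "virtual partitioning": the work is the cross product
# of database chunks and query chunks, so each map task fits in bounded memory.
from itertools import islice

def chunk(records, size):
    it = iter(records)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def virtual_partition(db_records, query_records, db_chunk=4, query_chunk=2):
    db_parts = list(chunk(db_records, db_chunk))
    q_parts = list(chunk(query_records, query_chunk))
    # One map task per (database partition, query partition) pair.
    return [(d_idx, q_idx, d_part, q_part)
            for d_idx, d_part in enumerate(db_parts)
            for q_idx, q_part in enumerate(q_parts)]

db = [f"db_seq_{i}" for i in range(10)]
queries = [f"query_{i}" for i in range(5)]
tasks = virtual_partition(db, queries)
print(f"{len(tasks)} map tasks; the first aligns {tasks[0][3]} against {tasks[0][2]}")
```

Because both axes are partitioned, neither a very large database nor a very large query batch forces any single task to exceed the memory of a cheap worker node, which is the scalability point the abstract makes.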
Phonon-based scalable platform for chip-scale quantum computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reinke, Charles M.; El-Kady, Ihab
Here, we present a scalable phonon-based quantum computer on a phononic crystal platform. Practical schemes involve selective placement of a single acceptor atom in the peak of the strain field in a high-Q phononic crystal cavity that enables coupling of the phonon modes to the energy levels of the atom. We show theoretical optimization of the cavity design and coupling waveguide, along with estimated performance figures of the coupled system. A qubit can be created by entangling a phonon at the resonance frequency of the cavity with the atom states. Qubits based on this half-sound, half-matter quasi-particle, called a phoniton, may outcompete other quantum architectures in terms of combined emission rate, coherence lifetime, and fabrication demands.
Wei, Hai-Rui; Deng, Fu-Guo
2013-07-29
We investigate the possibility of achieving scalable photonic quantum computing by the giant optical circular birefringence induced by a quantum-dot spin in a double-sided optical microcavity as a result of cavity quantum electrodynamics. We construct a deterministic controlled-not gate on two photonic qubits by two single-photon input-output processes and the readout on an electron-medium spin confined in an optical resonant microcavity. This idea could be applied to multi-qubit gates on photonic qubits and we give the quantum circuit for a three-photon Toffoli gate. High fidelities and high efficiencies could be achieved when the side leakage to the cavity loss rate is low. It is worth pointing out that our devices work in both the strong and the weak coupling regimes.
Phonon-based scalable platform for chip-scale quantum computing
Reinke, Charles M.; El-Kady, Ihab
2016-12-19
Here, we present a scalable phonon-based quantum computer on a phononic crystal platform. Practical schemes involve selective placement of a single acceptor atom in the peak of the strain field in a high-Q phononic crystal cavity that enables coupling of the phonon modes to the energy levels of the atom. We show theoretical optimization of the cavity design and coupling waveguide, along with estimated performance figures of the coupled system. A qubit can be created by entangling a phonon at the resonance frequency of the cavity with the atom states. Qubits based on this half-sound, half-matter quasi-particle, called a phoniton, may outcompete other quantum architectures in terms of combined emission rate, coherence lifetime, and fabrication demands.
Parallel scalability of Hartree-Fock calculations
NASA Astrophysics Data System (ADS)
Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.
2015-03-01
Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
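The density matrix purification mentioned above can be illustrated with one standard scheme, the McWeeny iteration P ← 3P² − 2P³, which drives a matrix with eigenvalues between 0 and 1 toward an idempotent projector using only matrix multiplications, and hence only parallel matrix products rather than an eigendecomposition. The scheme and the synthetic starting matrix below are illustrative and may differ in detail from the variant used in the paper.

```python
# McWeeny purification sketch: P <- 3P^2 - 2P^3 pushes eigenvalues in [0, 1]
# toward {0, 1}, yielding an idempotent density matrix without diagonalization.
# The starting matrix here is synthetic, not a real Fock-derived density.
import numpy as np

def mcweeny_purify(P, tol=1e-12, max_iter=100):
    for _ in range(max_iter):
        P2 = P @ P
        P_next = 3.0 * P2 - 2.0 * P2 @ P
        if np.linalg.norm(P_next - P) < tol:
            return P_next
        P = P_next
    return P

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
# Occupied-like eigenvalues near 1, virtual-like eigenvalues near 0.
eigs = np.concatenate([rng.uniform(0.85, 1.0, 20), rng.uniform(0.0, 0.15, 30)])
P0 = Q @ np.diag(eigs) @ Q.T
P = mcweeny_purify(P0)
print("idempotency error:", np.linalg.norm(P @ P - P))
print("electron count (trace):", round(np.trace(P), 6))
```

Because each iteration is just dense matrix multiplication, its parallel cost is dominated by network bandwidth for distributing matrix blocks, which matches the paper's observation that the density matrix step becomes network-bandwidth bound at large node counts.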
Scalable quantum memory in the ultrastrong coupling regime.
Kyaw, T H; Felicetti, S; Romero, G; Solano, E; Kwek, L-C
2015-03-02
Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate to implement the scalable quantum computing architecture because of the presence of good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime, as a quantum memory device and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory, within experimentally feasible schemes. We are also convinced that our proposal might pave a way to realize a scalable quantum random-access memory due to its fast storage and readout performances.
Scalable quantum memory in the ultrastrong coupling regime
Kyaw, T. H.; Felicetti, S.; Romero, G.; Solano, E.; Kwek, L.-C.
2015-01-01
Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate to implement the scalable quantum computing architecture because of the presence of good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime, as a quantum memory device and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory, within experimentally feasible schemes. We are also convinced that our proposal might pave a way to realize a scalable quantum random-access memory due to its fast storage and readout performances. PMID:25727251
NASA Astrophysics Data System (ADS)
Robbins, Hannah; Hu, Sijung; Liu, Changqing
2015-03-01
The demand for rapid screening technologies, to be used outside of a traditional healthcare setting, has been vastly expanding. This requires a new engineering platform for faster and cost-effective techniques that can be easily adopted through forward-thinking manufacturing procedures, i.e., advanced miniaturisation and heterogeneous integration of high-performance microfluidics-based point-of-care testing (POCT) systems. Although there has been a considerable amount of research into POCT systems, there exist tremendous challenges and bottlenecks in design and manufacturing in order to reach clinically acceptable sensitivity and selectivity, as well as smart microsystems for healthcare. The project aims to research how to enable scalable production of such complex systems through 1) advanced miniaturisation of the physical layout and opto-electronic component allocation through an optimal design; and 2) heterogeneous integration of multiplexed fluorescence detection (MFD) for in vitro POCT. Verification is being arranged through experimental testing with a series of dilutions of a commonly used fluorescence dye, i.e. Cy5. Iterative procedures will be engaged until the detection limit of the Cy5 dye, 1.209×10⁻¹⁰ M, is satisfied. The research creates a new avenue of rapid screening POCT manufacturing solutions with a particular view on high-performance and multifunctional detection systems not only in POCT, but also in life sciences and environmental applications.
A Framework for Collaborative and Convenient Learning on Cloud Computing Platforms
ERIC Educational Resources Information Center
Sharma, Deepika; Kumar, Vikas
2017-01-01
The depth of learning resides in collaborative work with more engagement and fun. Technology can enhance collaboration with a higher level of convenience and cloud computing can facilitate this in a cost effective and scalable manner. However, to deploy a successful online learning environment, elementary components of learning pedagogy must be…
ERIC Educational Resources Information Center
Blikstein, Paulo; Worsley, Marcelo; Piech, Chris; Sahami, Mehran; Cooper, Steven; Koller, Daphne
2014-01-01
New high-frequency, automated data collection and analysis algorithms could offer new insights into complex learning processes, especially for tasks in which students have opportunities to generate unique open-ended artifacts such as computer programs. These approaches should be particularly useful because the need for scalable project-based and…
A One-Year Introductory Robotics Curriculum for Computer Science Upperclassmen
ERIC Educational Resources Information Center
Correll, N.; Wing, R.; Coleman, D.
2013-01-01
This paper describes a one-year introductory robotics course sequence focusing on computational aspects of robotics for third- and fourth-year students. The key challenges this curriculum addresses are "scalability," i.e., how to teach a robotics class with a limited amount of hardware to a large audience, "student assessment,"…
Scalable Vector Media-processors for Embedded Systems
2002-05-01
Set Architecture for Multimedia "When you do the common things in life in an uncommon way, you will command the attention of the world." George ...Bibliography [ABHS89] M. August, G. Brost, C. Hsiung, and C. Schiffleger. Cray X-MP: The Birth of a Supercomputer. IEEE Computer, 22(1):45–52, January
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-12-01
Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldkamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10⁻⁷. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-01-01
Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldkamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10⁻⁷. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
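The map/reduce decomposition described in the two records above can be shown schematically with Python's built-in tools. The per-subset "backproject" below is a trivial stand-in for the filtered FDK backprojection of the paper, and multiprocessing plays the role of the Hadoop cluster; only the split-and-aggregate pattern is the point.

```python
# Schematic of the MapReduce decomposition: Map tasks backproject disjoint
# subsets of projections into partial volumes, Reduce sums the partial volumes.
# backproject_subset is a toy stand-in for filtered FDK backprojection.
from functools import reduce
from multiprocessing import Pool
import numpy as np

VOLUME_SHAPE = (64, 64, 64)

def backproject_subset(projection_subset):
    """Map: turn a list of (angle, projection) pairs into a partial volume."""
    partial = np.zeros(VOLUME_SHAPE)
    for angle, projection in projection_subset:
        # Stand-in accumulation; a real task filters and backprojects geometrically.
        partial += projection.mean() * np.ones(VOLUME_SHAPE) / 360.0
    return partial

def reconstruct(projections, n_tasks=8):
    subsets = [projections[i::n_tasks] for i in range(n_tasks)]
    with Pool(processes=n_tasks) as pool:
        partial_volumes = pool.map(backproject_subset, subsets)   # Map
    return reduce(np.add, partial_volumes)                        # Reduce

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    projs = [(a, rng.random((64, 64))) for a in range(360)]
    volume = reconstruct(projs)
    print("reconstructed volume mean:", volume.mean())
```

Because the Reduce step is an elementwise sum, losing a worker only requires re-running its subset, which is the fault-tolerance property the study exercised by terminating half the nodes mid-run.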
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
NASA Astrophysics Data System (ADS)
Kong, Fande; Cai, Xiao-Chuan
2017-07-01
Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
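As a point of reference for the solver layer, the inexact Newton-Krylov idea (without the paper's multilevel smoothed Schwarz preconditioner, FSI coupling, or parallel mesh infrastructure) can be exercised on a toy nonlinear system with SciPy's Jacobian-free implementation; the reaction-diffusion problem below is an illustrative assumption.

```python
# Toy illustration of the Newton-Krylov solver layer: SciPy's Jacobian-free
# newton_krylov solves a small nonlinear boundary value problem. The paper's
# method adds a multilevel smoothed Schwarz preconditioner and FSI coupling.
import numpy as np
from scipy.optimize import newton_krylov

N = 100
h = 1.0 / (N + 1)

def residual(u):
    # -u'' + u^3 = 1 on (0, 1) with u(0) = u(1) = 0, second-order differences.
    u_pad = np.concatenate(([0.0], u, [0.0]))
    return -(u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2]) / h**2 + u**3 - 1.0

u0 = np.zeros(N)
solution = newton_krylov(residual, u0, method="lgmres", f_tol=1e-10)
print("max residual:", np.abs(residual(solution)).max())
print("midpoint value:", solution[N // 2])
```

Each outer Newton step only needs residual evaluations (the Jacobian is applied through finite-difference directional derivatives inside the Krylov solver), which is what makes the approach attractive for very large coupled systems where forming the Jacobian explicitly is impractical.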
NASA Technical Reports Server (NTRS)
McGuire, Tim
1998-01-01
In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portability with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was to also have code which would run well on a variety of architectures.
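The 60%/2.5x figures are consistent with Amdahl's Law: with a vectorized fraction f and vector speedup s, the overall speedup is 1 / ((1 - f) + f / s), which for f = 0.6 approaches 1/0.4 = 2.5 as s grows large. A short numerical check (the values of s are illustrative):

```python
# Amdahl's Law check for the figures quoted above: with a vectorized fraction
# f = 0.6, the speedup 1 / ((1 - f) + f / s) approaches 1 / 0.4 = 2.5 as the
# vector speedup s grows large.
def amdahl(f, s):
    return 1.0 / ((1.0 - f) + f / s)

for s in (2, 4, 8, 16, 1e6):
    print(f"vector speedup {s:>9}: overall speedup {amdahl(0.6, s):.2f}")
```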
Kong, Fande; Cai, Xiao-Chuan
2017-03-24
Nonlinear fluid-structure interaction (FSI) problems on unstructured meshes in 3D appear in many applications in science and engineering, such as vibration analysis of aircrafts and patient-specific diagnosis of cardiovascular diseases. In this work, we develop a highly scalable, parallel algorithmic and software framework for FSI problems consisting of a nonlinear fluid system and a nonlinear solid system, that are coupled monolithically. The FSI system is discretized by a stabilized finite element method in space and a fully implicit backward difference scheme in time. To solve the large, sparse system of nonlinear algebraic equations at each time step, we propose an inexact Newton-Krylov method together with a multilevel, smoothed Schwarz preconditioner with isogeometric coarse meshes generated by a geometry preserving coarsening algorithm. Here "geometry" includes the boundary of the computational domain and the wet interface between the fluid and the solid. We show numerically that the proposed algorithm and implementation are highly scalable in terms of the number of linear and nonlinear iterations and the total compute time on a supercomputer with more than 10,000 processor cores for several problems with hundreds of millions of unknowns.
Orchestrating Distributed Resource Ensembles for Petascale Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baldin, Ilya; Mandal, Anirban; Ruth, Paul
2014-04-24
Distributed, data-intensive computational science applications of interest to DOE scientific communities move large amounts of data for experiment data management, distributed analysis steps, remote visualization, and accessing scientific instruments. These applications need to orchestrate ensembles of resources from multiple resource pools and interconnect them with high-capacity multi-layered networks across multiple domains. It is highly desirable that mechanisms are designed that provide this type of resource provisioning capability to a broad class of applications. It is also important to have coherent monitoring capabilities for such complex distributed environments. In this project, we addressed these problems by designing an abstract API, enabled by novel semantic resource descriptions, for provisioning complex and heterogeneous resources from multiple providers using their native provisioning mechanisms and control planes: computational, storage, and multi-layered high-speed network domains. We used an extensible resource representation based on semantic web technologies to afford maximum flexibility to applications in specifying their needs. We evaluated the effectiveness of provisioning using representative data-intensive applications. We also developed mechanisms for providing feedback about resource performance to the application, to enable closed-loop feedback control and dynamic adjustments to resource allocations (elasticity). This was enabled through development of a novel persistent query framework that consumes disparate sources of monitoring data, including perfSONAR, and provides scalable distribution of asynchronous notifications.
A Grid Metadata Service for Earth and Environmental Sciences
NASA Astrophysics Data System (ADS)
Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni
2010-05-01
Critical challenges for climate modeling researchers are strongly connected with increasingly complex simulation models and the huge quantities of produced datasets. Future trends in climate modeling will only increase computational and storage requirements. For this reason, the ability to transparently access both computational and data resources for large-scale complex climate simulations must be considered a key requirement for Earth Science and Environmental distributed systems. From the data management perspective, (i) the quantity of data will continuously increase, (ii) data will become more and more distributed and widespread, (iii) data sharing/federation will represent a key challenge among different sites distributed worldwide, and (iv) the potential community of users (large and heterogeneous) will be interested in discovering experimental results, searching metadata, browsing collections of files, comparing different results, displaying output, etc. A key element for carrying out data search and discovery, and for managing and accessing huge and distributed amounts of data, is the metadata handling framework. What we propose for the management of distributed datasets is the GRelC service (a data grid solution focusing on metadata management). Unlike classical approaches, the proposed data-grid solution is able to address scalability, transparency, security, efficiency and interoperability. The GRelC service we propose is able to provide access to metadata stored in different and widespread data sources (relational databases running on top of MySQL, Oracle, DB2, etc., leveraging SQL as the query language, as well as XML databases - XIndice, eXist, and libxml2-based documents, adopting either XPath or XQuery), providing a strong data virtualization layer in a grid environment. Such a technological solution for distributed metadata management (i) leverages well-known, widely adopted standards (W3C, OASIS, etc.); (ii) supports role-based management (based on VOMS), which increases flexibility and scalability; (iii) provides full support for the Grid Security Infrastructure (authorization, mutual authentication, data integrity, data confidentiality and delegation); (iv) is compatible with existing grid middleware such as gLite and Globus; and finally (v) is currently adopted at the Euro-Mediterranean Centre for Climate Change (CMCC - Italy) to manage the entire CMCC data production activity as well as in the international Climate-G testbed.
Towards scalable quantum communication and computation: Novel approaches and realizations
NASA Astrophysics Data System (ADS)
Jiang, Liang
Quantum information science involves exploration of fundamental laws of quantum mechanics for information processing tasks. This thesis presents several new approaches towards scalable quantum information processing. First, we consider a hybrid approach to scalable quantum computation, based on an optically connected network of few-qubit quantum registers. Specifically, we develop a novel scheme for scalable quantum computation that is robust against various imperfections. To justify that nitrogen-vacancy (NV) color centers in diamond can be a promising realization of the few-qubit quantum register, we show how to isolate a few proximal nuclear spins from the rest of the environment and use them for the quantum register. We also demonstrate experimentally that the nuclear spin coherence is only weakly perturbed under optical illumination, which allows us to implement quantum logical operations that use the nuclear spins to assist the repetitive-readout of the electronic spin. Using this technique, we demonstrate more than two-fold improvement in signal-to-noise ratio. Apart from direct application to enhance the sensitivity of the NV-based nano-magnetometer, this experiment represents an important step towards the realization of robust quantum information processors using electronic and nuclear spin qubits. We then study realizations of quantum repeaters for long distance quantum communication. Specifically, we develop an efficient scheme for quantum repeaters based on atomic ensembles. We use dynamic programming to optimize various quantum repeater protocols. In addition, we propose a new protocol of quantum repeater with encoding, which efficiently uses local resources (about 100 qubits) to identify and correct errors, to achieve fast one-way quantum communication over long distances. Finally, we explore quantum systems with topological order. Such systems can exhibit remarkable phenomena such as quasiparticles with anyonic statistics and have been proposed as candidates for naturally error-free quantum computation. We propose a scheme to unambiguously detect the anyonic statistics in spin lattice realizations using ultra-cold atoms in an optical lattice. We show how to reliably read and write topologically protected quantum memory using an atomic or photonic qubit.
Scalable and responsive event processing in the cloud
Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul
2013-01-01
Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164
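The queueing-theoretic core of such a prediction can be sketched with the Pollaczek-Khinchine mean-value formula for an M/G/1 queue: given an event arrival rate and the first two moments of per-event service time, one can estimate mean response time and pick the smallest number of nodes that meets a target. This single-class version, and the example numbers, are simplifications of the multi-class model described in the paper.

```python
# Sketch of response-time prediction with the M/G/1 Pollaczek-Khinchine formula
# and a simple provisioning loop. The paper models several query engines per
# node as a multi-class M/G/1 system; this single-class version is illustrative.
def mg1_response_time(arrival_rate, mean_service, second_moment_service):
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        return float("inf")                       # unstable: queue grows without bound
    waiting = arrival_rate * second_moment_service / (2.0 * (1.0 - rho))
    return waiting + mean_service                 # mean response time = Wq + E[S]

def nodes_needed(total_rate, mean_service, second_moment_service, target, max_nodes=64):
    for n in range(1, max_nodes + 1):             # events split evenly across nodes
        if mg1_response_time(total_rate / n, mean_service, second_moment_service) <= target:
            return n
    return None

# Example: 900 events/s in total, 2 ms mean service time, E[S^2] = 1e-5 s^2,
# 10 ms response-time target -> 3 nodes suffice.
print(nodes_needed(900.0, 0.002, 1e-5, target=0.010))
```

As event arrival rates fluctuate, re-evaluating this provisioning loop and acquiring or releasing cloud nodes accordingly is exactly the elasticity the abstract describes, without any low-level per-operator measurements at run time.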
NASA Technical Reports Server (NTRS)
Shen, Bo-Wen; Tao, Wei-Kuo; Chern, Jiun-Dar
2007-01-01
Improving our understanding of hurricane inter-annual variability and the impact of climate change (e.g., doubling CO2 and/or global warming) on hurricanes brings both scientific and computational challenges to researchers. As hurricane dynamics involves multiscale interactions among synoptic-scale flows, mesoscale vortices, and small-scale cloud motions, an ideal numerical model suitable for hurricane studies should demonstrate its capabilities in simulating these interactions. The newly-developed multiscale modeling framework (MMF, Tao et al., 2007) and the substantial computing power by the NASA Columbia supercomputer show promise in pursuing the related studies, as the MMF inherits the advantages of two NASA state-of-the-art modeling components: the GEOS4/fvGCM and 2D GCEs. This article focuses on the computational issues and proposes a revised methodology to improve the MMF's performance and scalability. It is shown that this prototype implementation enables 12-fold performance improvements with 364 CPUs, thereby making it more feasible to study hurricane climate.
Energy-efficient quantum computing
NASA Astrophysics Data System (ADS)
Ikonen, Joni; Salmilehto, Juha; Möttönen, Mikko
2017-04-01
In the near future, one of the major challenges in the realization of large-scale quantum computers operating at low temperatures is the management of harmful heat loads owing to thermal conduction of cabling and dissipation at cryogenic components. This naturally raises the question of what the fundamental limitations of energy consumption in scalable quantum computing are. In this work, we derive the greatest lower bound for the gate error induced by a single application of a bosonic drive mode of given energy. Previously, such an error type has been considered to be inversely proportional to the total driving power, but we show that this limitation can be circumvented by introducing a qubit driving scheme which reuses and corrects drive pulses. Specifically, our method serves to reduce the average energy consumption per gate operation without increasing the average gate error. Thus our work shows that precise, scalable control of quantum systems can, in principle, be implemented without the introduction of excessive heat or decoherence.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets.
Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O; Gelfand, Alan E
2016-01-01
Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online.
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets
Datta, Abhirup; Banerjee, Sudipto; Finley, Andrew O.; Gelfand, Alan E.
2018-01-01
Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations becomes large. This article develops a class of highly scalable nearest-neighbor Gaussian process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive U.S. Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods. Supplementary materials for this article are available online. PMID:29720777
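The nearest-neighbor conditioning that makes the NNGP scalable can be sketched directly: after fixing an ordering of the locations, each process value is regressed only on its m nearest previously ordered neighbors, which yields a sparse lower-triangular coefficient matrix and a diagonal of conditional variances. The exponential covariance, the random locations, and m = 10 below are assumptions of the sketch, not the paper's settings.

```python
# Sketch of the NNGP construction: each location conditions on at most m nearest
# previously-ordered neighbors, giving a sparse factorization of the GP prior.
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=3.0):
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def nngp_factors(locs, m=10):
    n = len(locs)
    B = np.zeros((n, n))            # coefficient matrix (dense here only for clarity)
    F = np.zeros(n)                 # conditional variances
    F[0] = exp_cov(locs[:1], locs[:1])[0, 0]
    for i in range(1, n):
        d = np.linalg.norm(locs[:i] - locs[i], axis=1)
        nbrs = np.argsort(d)[:m]    # m nearest among previously ordered locations
        C_nn = exp_cov(locs[nbrs], locs[nbrs])
        C_in = exp_cov(locs[i:i + 1], locs[nbrs])[0]
        b = np.linalg.solve(C_nn, C_in)
        B[i, nbrs] = b
        F[i] = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] - C_in @ b
    return B, F

rng = np.random.default_rng(1)
locations = rng.random((500, 2))    # row order defines the NNGP ordering
B, F = nngp_factors(locations, m=10)
# Nonzeros per row are bounded by m, so work and memory grow linearly in n.
print("average nonzeros per row of B:", (B != 0).sum() / len(locations))
print("all conditional variances positive:", bool(np.all(F > 0)))
```

Each row involves solving only an m-by-m system, which is the source of the per-iteration cost that is linear in the number of locations quoted in the abstract.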
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
From fuzzy recurrence plots to scalable recurrence networks of time series
NASA Astrophysics Data System (ADS)
Pham, Tuan D.
2017-04-01
Recurrence networks, which are derived from recurrence plots of nonlinear time series, enable the extraction of hidden features of complex dynamical systems. Because fuzzy recurrence plots are represented as grayscale images, this paper presents a variety of texture features that can be extracted from fuzzy recurrence plots. Based on the notion of fuzzy recurrence plots, defuzzified, undirected, and unweighted recurrence networks are introduced. Network measures can be computed for defuzzified recurrence networks that are scalable to meet the demand for the network-based analysis of big data.
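The crisp (non-fuzzy) variant of the construction is easy to sketch: embed the time series, threshold the pairwise distance matrix to obtain a recurrence plot, remove the diagonal, and treat the result as the adjacency matrix of an undirected, unweighted recurrence network. The embedding parameters and threshold below are illustrative, and the paper's fuzzy version replaces the hard threshold with fuzzy-clustering memberships, which is not shown here.

```python
# Sketch of the crisp recurrence-network construction: recurrence plot ->
# adjacency matrix -> network measures. The fuzzy variant of the paper replaces
# the hard epsilon threshold with fuzzy cluster memberships.
import numpy as np

def recurrence_network(x, dim=3, delay=2, epsilon=0.2):
    # Time-delay embedding of the scalar series x.
    n = len(x) - (dim - 1) * delay
    emb = np.column_stack([x[i * delay:i * delay + n] for i in range(dim)])
    # Recurrence plot: 1 where embedded states are closer than epsilon.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    R = (dist <= epsilon).astype(int)
    A = R - np.eye(n, dtype=int)       # drop self-loops -> adjacency matrix
    return A

t = np.linspace(0, 20 * np.pi, 1000)
x = np.sin(t) + 0.05 * np.random.default_rng(0).standard_normal(t.size)
A = recurrence_network(x)
degree = A.sum(axis=1)
print("nodes:", A.shape[0], "edges:", A.sum() // 2,
      "mean degree:", round(degree.mean(), 2))
```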
pcircle - A Suite of Scalable Parallel File System Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
WANG, FEIYI
2015-10-01
Most software related to file systems is written for conventional local file systems; it is serialized and cannot take advantage of the benefits of a large-scale parallel file system. The "pcircle" software builds on top of ubiquitous MPI in a cluster computing environment and a "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, as well as integrity checking.
Functionality limit of classical simulated annealing
NASA Astrophysics Data System (ADS)
Hasegawa, M.
2015-09-01
By analyzing the system dynamics in the landscape paradigm, the optimization behavior of classical simulated annealing is reviewed on random traveling salesman problems. The properly functioning region of the algorithm is experimentally determined in the size-time plane, and the influence of its boundary on the scalability test is examined in the standard framework of this method. From both results, an empirical choice of temperature length is plausibly explained as a minimum requirement for the algorithm to maintain its scalability within its functionality limit. The study exemplifies the applicability of computational physics analysis to optimization algorithm research.
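A baseline classical simulated annealing routine for random Euclidean TSP instances, of the kind analyzed above, can be sketched as follows. The geometric cooling schedule, temperature length, and instance size are illustrative parameter choices, not the paper's experimental settings.

```python
# Baseline classical simulated annealing on a random Euclidean TSP instance.
# Cooling rate and temperature length are illustrative, not the paper's values.
import math
import random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal(pts, t0=1.0, cooling=0.95, temp_length=200, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(pts)))
    rng.shuffle(tour)
    cost, T = tour_length(tour, pts), t0
    while T > t_min:
        for _ in range(temp_length):               # moves per temperature
            i, j = sorted(rng.sample(range(len(pts)), 2))
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]   # 2-opt reversal
            delta = tour_length(candidate, pts) - cost
            if delta < 0 or rng.random() < math.exp(-delta / T):
                tour, cost = candidate, cost + delta
        T *= cooling                               # geometric cooling
    return tour, cost

city_rng = random.Random(42)
cities = [(city_rng.random(), city_rng.random()) for _ in range(60)]
best_tour, best_cost = anneal(cities)
print("final tour length:", round(best_cost, 3))
```

The "temperature length" discussed above corresponds to the number of moves attempted at each temperature (temp_length here); the paper's point is how this parameter must grow with problem size for the algorithm to remain within its properly functioning region.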
2014-01-01
Background Integrating and analyzing heterogeneous genome-scale data is a huge algorithmic challenge for modern systems biology. Bipartite graphs can be useful for representing relationships across pairs of disparate data types, with the interpretation of these relationships accomplished through an enumeration of maximal bicliques. Most previously-known techniques are generally ill-suited to this foundational task, because they are relatively inefficient and without effective scaling. In this paper, a powerful new algorithm is described that produces all maximal bicliques in a bipartite graph. Unlike most previous approaches, the new method neither places undue restrictions on its input nor inflates the problem size. Efficiency is achieved through an innovative exploitation of bipartite graph structure, and through computational reductions that rapidly eliminate non-maximal candidates from the search space. An iterative selection of vertices for consideration based on non-decreasing common neighborhood sizes boosts efficiency and leads to more balanced recursion trees. Results The new technique is implemented and compared to previously published approaches from graph theory and data mining. Formal time and space bounds are derived. Experiments are performed on both random graphs and graphs constructed from functional genomics data. It is shown that the new method substantially outperforms the best previous alternatives. Conclusions The new method is streamlined, efficient, and particularly well-suited to the study of huge and diverse biological data. A robust implementation has been incorporated into GeneWeaver, an online tool for integrating and analyzing functional genomics experiments, available at http://geneweaver.org. The enormous increase in scalability it provides empowers users to study complex and previously unassailable gene-set associations between genes and their biological functions in a hierarchical fashion and on a genome-wide scale. This practical computational resource is adaptable to almost any applications environment in which bipartite graphs can be used to model relationships between pairs of heterogeneous entities. PMID:24731198
Perspective: The future of quantum dot photonic integrated circuits
NASA Astrophysics Data System (ADS)
Norman, Justin C.; Jung, Daehwan; Wan, Yating; Bowers, John E.
2018-03-01
Direct epitaxial integration of III-V materials on Si offers substantial manufacturing cost and scalability advantages over heterogeneous integration. The challenge is that epitaxial growth introduces high densities of crystalline defects that limit device performance and lifetime. Quantum dot lasers, amplifiers, modulators, and photodetectors epitaxially grown on Si are showing promise for achieving low-cost, scalable integration with silicon photonics. The unique electrical confinement properties of quantum dots provide reduced sensitivity to the crystalline defects that result from III-V/Si growth, while their unique gain dynamics show promise for improved performance and new functionalities relative to their quantum well counterparts in many devices. Clear advantages for using quantum dot active layers for lasers and amplifiers on and off Si have already been demonstrated, and results for quantum dot based photodetectors and modulators look promising. Laser performance on Si is improving rapidly with continuous-wave threshold currents below 1 mA, injection efficiencies of 87%, and output powers of 175 mW at 20 °C. 1500-h reliability tests at 35 °C showed an extrapolated mean-time-to-failure of more than ten million hours. This represents a significant stride toward efficient, scalable, and reliable III-V lasers on on-axis Si substrates for photonic integrated circuits that are fully compatible with complementary metal-oxide-semiconductor (CMOS) foundries.
Introducing a distributed unstructured mesh into gyrokinetic particle-in-cell code, XGC
NASA Astrophysics Data System (ADS)
Yoon, Eisung; Shephard, Mark; Seol, E. Seegyoung; Kalyanaraman, Kaushik
2017-10-01
XGC has shown good scalability on large leadership supercomputers. The current production version uses a copy of the entire unstructured finite element mesh on every MPI rank. Besides being an obvious scalability issue if mesh sizes are to be dramatically increased, the current approach is also not optimal with respect to data locality of particle and mesh information. To address these issues we have initiated the development of a distributed-mesh PIC method. This approach directly addresses the base scalability issue with respect to mesh size and, through the use of a mesh-entity-centric view of the particle-mesh relationship, provides opportunities to address the data locality needs of many-core and GPU-supported heterogeneous systems. The parallel mesh PIC capabilities are being built on the Parallel Unstructured Mesh Infrastructure (PUMI). The presentation will first overview the form of mesh distribution used and indicate the structures and functions used to support the mesh, the particles, and their interaction. Attention will then focus on the node-level optimizations being carried out to ensure performant operation of all PIC operations on the distributed mesh. This work was supported by the Partnership for Edge Physics Simulation (EPSI), Grant No. DE-SC0008449, and the Center for Extended Magnetohydrodynamic Modeling (CEMM), Grant No. DE-SC0006618.
NASA Astrophysics Data System (ADS)
Park, Soomyung; Joo, Seong-Soon; Yae, Byung-Ho; Lee, Jong-Hyun
2002-07-01
In this paper, we present the Optical Cross-Connect (OXC) management control system architecture, which is scalable, robustly maintainable, and provides a distributed management environment in the optical transport network. The OXC system we are developing, comprising the hardware and the internal and external software of the OXC system, is made up of the OXC subsystem with the Optical Transport Network (OTN) sub-layer hardware and the optical switch control system, the signaling control protocol subsystem performing User-to-Network Interface (UNI) and Network-to-Network Interface (NNI) signaling control, the Operation Administration Maintenance & Provisioning (OAM&P) subsystem, and the network management subsystem. The OXC management control system supports flexible expansion of the optical transport network, provides connectivity to heterogeneous external network elements, allows components to be added or deleted without interrupting OAM&P services, can be operated remotely, provides a global view and detailed information for network planners and operators, and has a Common Object Request Broker Architecture (CORBA)-based open system architecture that allows intelligent service networking functions to be added and deleted easily in the future. To meet these requirements, we adopt an object-oriented development method throughout system analysis, design, and implementation to build an OXC management control system with scalability, maintainability, and a distributed management environment. As a consequence, the componentization of the OXC operation management functions of each subsystem makes maintenance robust and increases code reusability. The component-based OXC management control system architecture is thus inherently flexible and scalable.
NASA Technical Reports Server (NTRS)
Aiken, Alexander
2001-01-01
The Scalable Analysis Toolkit (SAT) project aimed to demonstrate that it is feasible and useful to statically detect software bugs in very large systems. The technical focus of the project was on a relatively new class of constraint-based techniques for software analysis, where the desired facts about programs (e.g., the presence of a particular bug) are phrased as constraint problems to be solved. At the beginning of this project, the most successful forms of formal software analysis were limited forms of automatic theorem proving (as exemplified by the analyses used in language type systems and optimizing compilers), semi-automatic theorem proving for full verification, and model checking. With a few notable exceptions these approaches had not been demonstrated to scale to software systems of even 50,000 lines of code. Realistic approaches to large-scale software analysis cannot hope to make every conceivable formal method scale. Thus, the SAT approach is to mix different methods in one application by using coarse and fast but still adequate methods at the largest scales, and reserving the use of more precise but also more expensive methods at smaller scales for critical aspects (that is, aspects critical to the analysis problem under consideration) of a software system. The principled method proposed for combining a heterogeneous collection of formal systems with different scalability characteristics is mixed constraints. This idea had been used previously in small-scale applications with encouraging results: using mostly coarse methods and narrowly targeted precise methods, useful information (meaning the discovery of bugs in real programs) was obtained with excellent scalability.
NASA Astrophysics Data System (ADS)
Huang, T.; Alarcon, C.; Quach, N. T.
2014-12-01
Capture, curation, and analysis are the typical activities performed at any given Earth Science data center. Modern data management systems must be adaptable to heterogeneous science data formats, scalable to meet the mission's quality of service requirements, and able to manage the life-cycle of any given science data product. Designing a scalable data management system doesn't happen overnight. It takes countless hours of refining, refactoring, retesting, and re-architecting. The Horizon data management and workflow framework, developed at the Jet Propulsion Laboratory, is a portable, scalable, and reusable framework for developing high-performance data management and product generation workflow systems to automate data capturing, data curation, and data analysis activities. NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC)'s Data Management and Archive System (DMAS) is its core data infrastructure that handles capturing and distribution of hundreds of thousands of satellite observations each day around the clock. DMAS is an application of the Horizon framework. The NASA Global Imagery Browse Services (GIBS) is NASA's Earth Observing System Data and Information System (EOSDIS)'s solution for making high-resolution global imagery available to the science communities. The Imagery Exchange (TIE), an application of the Horizon framework, is a core subsystem for GIBS responsible for data capturing and imagery generation automation to support the EOSDIS' 12 distributed active archive centers and 17 Science Investigator-led Processing Systems (SIPS). This presentation discusses our ongoing effort in refining, refactoring, retesting, and re-architecting the Horizon framework to enable data-intensive science and its applications.
ERIC Educational Resources Information Center
Leonard, Jacqueline; Barnes-Johnson, Joy; Mitchell, Monica; Unertl, Adrienne; Stubbe, Christopher R.; Ingraham, Latanya
2017-01-01
This research report presents the final year results of a three-year research project on computational thinking (CT). The project, funded by the National Science Foundation, involved training teachers in grades four through six to implement Scalable Game Design and LEGO® EV3 robotics during afterschool clubs. Thirty teachers and 531 students took…
NVIDIA OptiX ray-tracing engine as a new tool for modelling medical imaging systems
NASA Astrophysics Data System (ADS)
Pietrzak, Jakub; Kacperski, Krzysztof; Cieślar, Marek
2015-03-01
The most accurate technique to model the X- and gamma radiation path through a numerically defined object is Monte Carlo simulation, which follows single photons according to their interaction probabilities. A simplified and much faster approach, which just integrates total interaction probabilities along selected paths, is known as ray tracing. Both techniques are used in medical imaging for simulating real imaging systems and as projectors required in iterative tomographic reconstruction algorithms. These approaches are ready for massively parallel implementation, e.g. on Graphics Processing Units (GPUs), which can greatly accelerate the computation time at a relatively low cost. In this paper we describe the application of the NVIDIA OptiX ray-tracing engine, popular in professional graphics and rendering applications, as a new powerful tool for X- and gamma ray-tracing in medical imaging. It allows the implementation of a variety of physical interactions of rays with pixel-, mesh- or nurbs-based objects, and recording any required quantities, like path integrals, interaction sites, deposited energies, and others. Using the OptiX engine we have implemented a code for rapid Monte Carlo simulations of Single Photon Emission Computed Tomography (SPECT) imaging, as well as the ray-tracing projector, which can be used in reconstruction algorithms. The engine generates efficient, scalable and optimized GPU code, ready to run on multi-GPU heterogeneous systems. We have compared the results of our simulations with the GATE package. With the OptiX engine the computation time of a Monte Carlo simulation can be reduced from days to minutes.
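To make the ray-tracing projector idea concrete, here is a deliberately simplified, CPU-only sketch of a path integral of attenuation coefficients through a voxel grid using fixed-step sampling. It is not the OptiX implementation (a production projector would use an exact voxel traversal such as Siddon's algorithm and run on the GPU), and the grid and ray endpoints below are made up.

```python
import numpy as np

def line_integral(mu, start, end, step=0.25):
    """Approximate the integral of attenuation coefficients `mu` (a 3D voxel
    grid with unit voxel spacing) along the segment start -> end by sampling
    at fixed intervals along the ray."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    length = np.linalg.norm(end - start)
    n = max(int(length / step), 1)
    ts = (np.arange(n) + 0.5) / n                 # midpoints of n sub-segments
    pts = start[None, :] + ts[:, None] * (end - start)[None, :]
    idx = np.floor(pts).astype(int)
    # keep only samples that fall inside the volume
    inside = np.all((idx >= 0) & (idx < np.array(mu.shape)), axis=1)
    values = mu[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    return values.sum() * (length / n)            # sum f(x_i) * dl

mu = np.zeros((32, 32, 32))
mu[12:20, 12:20, 12:20] = 0.2                     # attenuating cube
print(line_integral(mu, (0, 16, 16), (31.9, 16, 16)))  # about 8 voxels * 0.2 = 1.6
```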
Evaluation of 3D printed anatomically scalable transfemoral prosthetic knee.
Ramakrishnan, Tyagi; Schlafly, Millicent; Reed, Kyle B
2017-07-01
This case study compares a transfemoral amputee's gait while using the existing Ossur Total Knee 2000 and our novel 3D printed anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee is 3D printed out of a carbon-fiber and nylon composite that has a gear-mesh coupling with a hard-stop weight-actuated locking mechanism aided by a cross-linked four-bar spring mechanism. This design can be scaled using anatomical dimensions of a human femur and tibia to have a unique fit for each user. The transfemoral amputee who was tested is high functioning and walked on the Computer Assisted Rehabilitation Environment (CAREN) at a self-selected pace. The motion capture and force data that were collected showed distinct differences in the gait dynamics. The data were used to compute the Combined Gait Asymmetry Metric (CGAM), whose scores revealed that the gait on the Ossur Total Knee was overall more asymmetric than on the anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee had higher peak knee flexion that caused a large step time asymmetry. This made walking on the anatomically scalable transfemoral prosthetic knee more strenuous due to the compensatory movements in adapting to the different dynamics. This can be overcome by tuning the cross-linked spring mechanism to better emulate the dynamics of the subject. The subject stated that the knee would be good for daily use and has the potential to be adapted as a running knee.
Osborn, Sarah; Zulian, Patrick; Benson, Thomas; ...
2018-01-30
This work describes a domain embedding technique between two nonmatching meshes used for generating realizations of spatially correlated random fields with applications to large-scale sampling-based uncertainty quantification. The goal is to apply the multilevel Monte Carlo (MLMC) method for the quantification of output uncertainties of PDEs with random input coefficients on general and unstructured computational domains. We propose a highly scalable, hierarchical sampling method to generate realizations of a Gaussian random field on a given unstructured mesh by solving a reaction–diffusion PDE with a stochastic right-hand side. The stochastic PDE is discretized using the mixed finite element method on an embedded domain with a structured mesh, and then, the solution is projected onto the unstructured mesh. This work describes implementation details on how to efficiently transfer data from the structured and unstructured meshes at coarse levels, assuming that this can be done efficiently on the finest level. We investigate the efficiency and parallel scalability of the technique for the scalable generation of Gaussian random fields in three dimensions. An application of the MLMC method is presented for quantifying uncertainties of subsurface flow problems. Here, we demonstrate the scalability of the sampling method with nonmatching mesh embedding, coupled with a parallel forward model problem solver, for large-scale 3D MLMC simulations with up to 1.9·10^9 unknowns.
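As context for the sampling strategy, the snippet below is a generic multilevel Monte Carlo telescoping estimator, not the paper's PDE sampler: `sample_level` is a hypothetical user-supplied function (here a toy stand-in for a hierarchy of mesh resolutions) that returns a level-l sample or, when coupled, the pair (P_l, P_{l-1}) computed from the same random input.

```python
import numpy as np

def mlmc_estimate(sample_level, n_samples):
    """Generic MLMC telescoping estimator:
        E[P_L] ~= E[P_0] + sum_{l=1..L} E[P_l - P_{l-1}].
    `n_samples[l]` is the number of samples drawn on level l."""
    total = 0.0
    for l, n in enumerate(n_samples):
        if l == 0:
            ys = np.array([sample_level(0, coupled=False) for _ in range(n)])
        else:
            pairs = np.array([sample_level(l, coupled=True) for _ in range(n)])
            ys = pairs[:, 0] - pairs[:, 1]        # Y_l = P_l - P_{l-1}
        total += ys.mean()
    return total

# Toy model standing in for a PDE solve: P_l is a biased estimate of E[Z^2] = 1
# whose bias shrinks as the level (mesh resolution) increases.
rng = np.random.default_rng(0)
def sample_level(l, coupled):
    z = rng.standard_normal()
    p = lambda lev: z * z + 2.0 ** (-lev)         # same random input on both levels
    return (p(l), p(l - 1)) if coupled else p(l)

print(mlmc_estimate(sample_level, n_samples=[4000, 1000, 250]))  # ~1.25 = 1 + 2^-2
```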
Stochastic Multiscale Analysis and Design of Engine Disks
2010-07-28
...shown recently to fail when used with data-driven non-linear stochastic input models (KPCA, IsoMap, etc.), motivating the need for scalable exascale computing algorithms. Materials Process Design and Control Laboratory, Cornell University.
Scalable quantum information processing with atomic ensembles and flying photons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mei Feng; Yu Yafei; Feng Mang
2009-10-15
We present a scheme for scalable quantum information processing with atomic ensembles and flying photons. Using the Rydberg blockade, we encode the qubits in the collective atomic states, which could be manipulated fast and easily due to the enhanced interaction in comparison to the single-atom case. We demonstrate that our proposed gating could be applied to generation of two-dimensional cluster states for measurement-based quantum computation. Moreover, the atomic ensembles also function as quantum repeaters useful for long-distance quantum state transfer. We show the possibility of our scheme to work in bad cavity or in weak coupling regime, which could much relax the experimental requirement. The efficient coherent operations on the ensemble qubits enable our scheme to be switchable between quantum computation and quantum communication using atomic ensembles.
A scalable architecture for online anomaly detection of WLCG batch jobs
NASA Astrophysics Data System (ADS)
Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.
2016-10-01
For data centres it is increasingly important to monitor the network usage, and learn from network usage patterns. Especially configuration issues or misbehaving batch jobs preventing a smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate a tool, BPNetMon, for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information by itself is not sufficient to detect anomalies, for several reasons: e.g., the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with regard to network communication or computational cost. We therefore propose a scalable architecture based on concepts of a super-peer network.
Vortex Filaments in Grids for Scalable, Fine Smoke Simulation.
Meng, Zhang; Weixin, Si; Yinling, Qian; Hanqiu, Sun; Jing, Qin; Heng, Pheng-Ann
2015-01-01
Vortex modeling can produce attractive visual effects of dynamic fluids, which are widely applicable for dynamic media, computer games, special effects, and virtual reality systems. However, it is challenging to effectively simulate intensive and fine detailed fluids such as smoke with fast increasing vortex filaments and smoke particles. The authors propose a novel vortex filaments in grids scheme in which the uniform grids dynamically bridge the vortex filaments and smoke particles for scalable, fine smoke simulation with macroscopic vortex structures. Using the vortex model, their approach supports the trade-off between simulation speed and scale of details. After computing the whole velocity, external control can be easily exerted on the embedded grid to guide the vortex-based smoke motion. The experimental results demonstrate the efficiency of using the proposed scheme for a visually plausible smoke simulation with macroscopic vortex structures.
Many-core graph analytics using accelerated sparse linear algebra routines
NASA Astrophysics Data System (ADS)
Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric
2016-05-01
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly-scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis, as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without the requirement for the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
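The "graph algorithms as sparse linear algebra" idea can be illustrated with a small scipy.sparse example: a level-synchronous BFS written as repeated sparse matrix-vector products. This is a single-node sketch of the GraphBLAS-style formulation, not the authors' many-core implementation.

```python
import numpy as np
import scipy.sparse as sp

def bfs_levels(adj, source):
    """BFS levels via the linear-algebra formulation: the next frontier is
    A^T @ frontier, masked by the set of not-yet-visited vertices."""
    n = adj.shape[0]
    levels = np.full(n, -1, dtype=int)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    levels[source] = 0
    depth = 0
    while frontier.any():
        depth += 1
        reached = adj.T.dot(frontier.astype(float)) > 0   # one SpMV per level
        frontier = reached & (levels < 0)                 # mask out visited vertices
        levels[frontier] = depth
    return levels

# Directed path 0 -> 1 -> 2 plus an edge 0 -> 3
rows, cols = [0, 1, 0], [1, 2, 3]
A = sp.csr_matrix((np.ones(3), (rows, cols)), shape=(4, 4))
print(bfs_levels(A, 0))   # [0 1 2 1]
```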
2011-06-01
4. Conclusion: The Web-based AGeS system described in this paper is a computationally-efficient and scalable system for high-throughput genome... method for protecting web services involves making them more resilient to attack using autonomic computing techniques. This paper presents our initial... 20–23, 2011. 2011 DoD High Performance Computing Modernization Program Users Group Conference (HPCMP UGC 2011). The papers in this book comprise the...
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters
Bajaj, Chandrajit
2009-01-01
Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back-end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission and storage of isosurfaces. PMID:19756231
NASA Astrophysics Data System (ADS)
Tysowski, Piotr K.; Ling, Xinhua; Lütkenhaus, Norbert; Mosca, Michele
2018-04-01
Quantum key distribution (QKD) is a means of generating keys between a pair of computing hosts that is theoretically secure against cryptanalysis, even by a quantum computer. Although there is much active research into improving the QKD technology itself, there is still significant work to be done to apply engineering methodology and determine how it can be practically built to scale within an enterprise IT environment. Significant challenges exist in building a practical key management service (KMS) for use in a metropolitan network. QKD is generally a point-to-point technique only and is subject to steep performance constraints. The integration of QKD into enterprise-level computing has been researched, to enable quantum-safe communication. A novel method for constructing a KMS is presented that allows arbitrary computing hosts on one site to establish multiple secure communication sessions with the hosts of another site. A key exchange protocol is proposed where symmetric private keys are granted to hosts while satisfying the scalability needs of an enterprise population of users. The KMS operates within a layered architectural style that is able to interoperate with various underlying QKD implementations. Variable levels of security for the host population are enforced through a policy engine. A network layer provides key generation across a network of nodes connected by quantum links. Scheduling and routing functionality allows quantum key material to be relayed across trusted nodes. Optimizations are performed to match the real-time host demand for key material with the capacity afforded by the infrastructure. The result is a flexible and scalable architecture that is suitable for enterprise use and independent of any specific QKD technology.
Scalable algorithms for 3D extended MHD.
NASA Astrophysics Data System (ADS)
Chacon, Luis
2007-11-01
In the modeling of plasmas with extended MHD (XMHD), the challenge is to resolve long time scales while rendering the whole simulation manageable. In XMHD, this is particularly difficult because fast (dispersive) waves are supported, resulting in a very stiff set of PDEs. In explicit schemes, such stiffness results in stringent numerical stability time-step constraints, rendering them inefficient and algorithmically unscalable. In implicit schemes, it yields very ill-conditioned algebraic systems, which are difficult to invert. In this talk, we present recent theoretical and computational progress that demonstrates a scalable 3D XMHD solver (i.e., CPU time ∼ N, with N the number of degrees of freedom). The approach is based on Newton-Krylov methods, which are preconditioned for efficiency. The preconditioning stage admits suitable approximations without compromising the quality of the overall solution. In this work, we employ optimal (CPU time ∼ N) multilevel methods on a parabolized XMHD formulation, which renders the whole algorithm scalable. The (crucial) parabolization step is required to render XMHD multilevel-friendly. Algebraically, the parabolization step can be interpreted as a Schur factorization of the Jacobian matrix, thereby providing a solid foundation for the current (and future extensions of the) approach. We will build towards 3D extended MHD [L. Chacón, Comput. Phys. Comm. 163 (3), 143-171 (2004); L. Chacón et al., 33rd EPS Conf. Plasma Physics, Rome, Italy, 2006] by discussing earlier algorithmic breakthroughs in 2D reduced MHD [L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002)] and 2D Hall MHD [L. Chacón et al., J. Comput. Phys. 188 (2), 573-592 (2003)].
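As a generic illustration of the Jacobian-free Newton-Krylov machinery referred to above (not of the XMHD equations or the parabolized multilevel preconditioner), the sketch below solves a toy 1D reaction-diffusion balance with SciPy's `newton_krylov`, which approximates Jacobian-vector products by finite differences and uses GMRES for the inner linear solves.

```python
import numpy as np
from scipy.optimize import newton_krylov

# Toy nonlinear residual standing in for a discretized implicit system:
# a 1D reaction-diffusion balance on a periodic grid with unit spacing.
def residual(u):
    lap = np.roll(u, 1) - 2.0 * u + np.roll(u, -1)   # periodic Laplacian
    return lap - u - u**3 + 1.0

u0 = np.zeros(64)
sol = newton_krylov(residual, u0, method="gmres", f_tol=1e-8)
print(np.abs(residual(sol)).max())                   # converged residual norm
print(sol[:4])                                       # roughly 0.6823 everywhere
```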
Advanced technologies for scalable ATLAS conditions database access on the grid
NASA Astrophysics Data System (ADS)
Basset, R.; Canali, L.; Dimitrov, G.; Girone, M.; Hawkings, R.; Nevski, P.; Valassi, A.; Vaniachine, A.; Viegas, F.; Walker, R.; Wong, A.
2010-04-01
During massive data reprocessing operations an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating realistic work-flow, ATLAS database scalability tests provided feedback for Conditions Db software optimization and allowed precise determination of required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing characterized by peak loads, which can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent jobs rates. This has been achieved through coordinated database stress tests performed in series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that the server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions Db data access is limited by the disk I/O throughput. An unacceptable side-effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions Db data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based upon the proven idea of pilot job submission on the Grid: instead of the actual query, an ATLAS utility library sends to the database server a pilot query first.
2016-10-13
enielse@sandia.gov and a.morello@unsw.edu.au. Keywords: quantum computing, silicon, tomography. Supplementary material for this article is available online... Abstract: State-of-the-art qubit systems are reaching the gate fidelities required for scalable quantum computation architectures. Further improvements in... and addressed when the qubit is used within a fault-tolerant quantum computation scheme. 1. Introduction: One of the main challenges in the physical...
Noise thresholds for optical quantum computers.
Dawson, Christopher M; Haselgrove, Henry L; Nielsen, Michael A
2006-01-20
In this Letter we numerically investigate the fault-tolerant threshold for optical cluster-state quantum computing. We allow both photon loss noise and depolarizing noise (as a general proxy for all local noise), and obtain a threshold region of allowed pairs of values for the two types of noise. Roughly speaking, our results show that scalable optical quantum computing is possible for photon loss probabilities < 3 × 10^-3, and for depolarization probabilities < 10^-4.
TethysCluster: A comprehensive approach for harnessing cloud resources for hydrologic modeling
NASA Astrophysics Data System (ADS)
Nelson, J.; Jones, N.; Ames, D. P.
2015-12-01
Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
GASPRNG: GPU accelerated scalable parallel random number generator library
NASA Astrophysics Data System (ADS)
Gao, Shuang; Peterson, Gregory D.
2013-04-01
Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs) along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to be able to use GASPRNG the same way as SPRNG on traditional serial or parallel computers as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications. Catalogue identifier: AEOI_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: UTK license. No. of lines in distributed program, including test data, etc.: 167900 No. of bytes in distributed program, including test data, etc.: 1422058 Distribution format: tar.gz Programming language: C and CUDA. Computer: Any PC or workstation with NVIDIA GPU (Tested on Fermi GTX480, Tesla C1060, Tesla M2070). Operating system: Linux with CUDA version 4.0 or later. Should also run on MacOS, Windows, or UNIX. Has the code been vectorized or parallelized?: Yes. Parallelized using MPI directives. RAM: 512 MB to 732 MB (main memory on host CPU, depending on the data type of random numbers) / 512 MB (GPU global memory) Classification: 4.13, 6.5. Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations are able to consume limitless random numbers for the computation as long as resources for the computing are supported. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The GASPRNG library presented here accelerates the generators of independent streams of random numbers using graphical processing units (GPUs). Solution method: Multiple copies of random number generators in GPUs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. GASPRNG is a random number generators library to allow a computational science application to employ multiple copies of random number generators to boost performance. Users can interface GASPRNG with software code executing on microprocessors and/or GPUs.
Running time: The tests provided take a few minutes to run.
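The core requirement GASPRNG addresses, statistically independent streams for each parallel task, can be illustrated outside of the SPRNG/GASPRNG API (which is not reproduced here) with NumPy's seed-spawning mechanism; the entropy value and the per-stream Monte Carlo task below are arbitrary.

```python
import numpy as np

# Each parallel task (rank, GPU, or thread block) gets its own statistically
# independent stream, which is the guarantee SPRNG-style libraries provide.
root = np.random.SeedSequence(entropy=20240101)
children = root.spawn(4)                         # one child seed per task
streams = [np.random.default_rng(c) for c in children]

# Each stream runs an independent Monte Carlo estimate of pi.
estimates = []
for rng in streams:
    xy = rng.random((100_000, 2))
    estimates.append(4.0 * np.mean((xy ** 2).sum(axis=1) <= 1.0))
print(estimates, sum(estimates) / len(estimates))
```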
Argonne Simulation Framework for Intelligent Transportation Systems
DOT National Transportation Integrated Search
1996-01-01
A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...
Husain, Syed S; Kalinin, Alexandr; Truong, Anh; Dinov, Ivo D
Intuitive formulation of informative and computationally-efficient queries on big and complex datasets presents a number of challenges. As data collection is increasingly streamlined and ubiquitous, data exploration, discovery and analytics get considerably harder. Exploratory querying of heterogeneous and multi-source information is both difficult and necessary to advance our knowledge about the world around us. We developed a mechanism to integrate dispersed multi-source data and serve the mashed-up information via human and machine interfaces in a secure, scalable manner. This process facilitates the exploration of subtle associations between variables, population strata, or clusters of data elements, which may be opaque to standard independent inspection of the individual sources. This new platform includes a device-agnostic tool (Dashboard webapp, http://socr.umich.edu/HTML5/Dashboard/) for graphical querying, navigating and exploring the multivariate associations in complex heterogeneous datasets. The paper illustrates this core functionality and service-oriented infrastructure using healthcare data (e.g., US data from the 2010 Census, Demographic and Economic surveys, Bureau of Labor Statistics, and Center for Medicare Services) as well as Parkinson's Disease neuroimaging data. Both the back-end data archive and the front-end dashboard interfaces are continuously expanded to include additional data elements and new ways to customize the human and machine interactions. A client-side data import utility allows for easy and intuitive integration of user-supplied datasets. This completely open-science framework may be used for exploratory analytics, confirmatory analyses, meta-analyses, and education and training purposes in a wide variety of fields.
Efficient computation of kinship and identity coefficients on large pedigrees.
Cheng, En; Elliott, Brendan; Ozsoyoglu, Z Meral
2009-06-01
With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. An important computation on pedigree data is the calculation of identity coefficients, which provide a complete description of the degree of relatedness of a pair of individuals. The areas of application of identity coefficients are numerous and diverse, from genetic counseling to disease tracking, and thus, the computation of identity coefficients merits special attention. However, the computation of identity coefficients is not done directly, but rather as the final step after computing a set of generalized kinship coefficients. In this paper, we first propose a novel Path-Counting Formula for calculating generalized kinship coefficients, which is motivated by Wright's path-counting method for computing inbreeding coefficient. We then present an efficient and scalable scheme for calculating generalized kinship coefficients on large pedigrees using NodeCodes, a special encoding scheme for expediting the evaluation of queries on pedigree graph structures. Furthermore, we propose an improved scheme using Family NodeCodes for the computation of generalized kinship coefficients, which is motivated by the significant improvement of using Family NodeCodes for inbreeding coefficient over the use of NodeCodes. We also perform experiments for evaluating the efficiency of our method, and compare it with the performance of the traditional recursive algorithm for three individuals. Experimental results demonstrate that the resulting scheme is more scalable and efficient than the traditional recursive methods for computing generalized kinship coefficients.
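For contrast with the NodeCodes-based approach, here is a sketch of the traditional recursive kinship computation that the paper benchmarks against, with memoization added. The small pedigree dictionary is hypothetical, and identity coefficients, which require the generalized kinship recursion, are not covered here.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (father, mother); None marks a founder parent.
pedigree = {
    "A": (None, None), "B": (None, None),
    "C": ("A", "B"), "D": ("A", "B"),
    "E": ("C", "D"),                      # child of full sibs
}

def depth(i):
    """Generation depth: founders are 0, children are 1 + max(parent depths)."""
    if i is None:
        return -1
    f, m = pedigree[i]
    return 1 + max(depth(f), depth(m))

@lru_cache(maxsize=None)
def kinship(a, b):
    """Classical recursive kinship coefficient phi(a, b)."""
    if a is None or b is None:
        return 0.0
    if a == b:
        f, m = pedigree[a]
        return 0.5 * (1.0 + kinship(f, m))
    # Recurse on the deeper individual: it cannot be an ancestor of the other.
    if depth(a) < depth(b):
        a, b = b, a
    f, m = pedigree[a]
    return 0.5 * (kinship(f, b) + kinship(m, b))

print(kinship("C", "D"))   # full sibs: 0.25, which is also E's inbreeding coefficient
print(kinship("A", "E"))   # grandparent-grandchild through both parental lines: 0.25
```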
ERIC Educational Resources Information Center
Ngai, Grace; Chan, Stephen C. F.; Leong, Hong Va; Ng, Vincent T. Y.
2013-01-01
This article presents the design and development of i*CATch, a construction kit for physical and wearable computing that was designed to be scalable, plug-and-play, and to provide support for iterative and exploratory learning. It consists of a standardized construction interface that can be adapted for a wide range of soft textiles or electronic…
ERIC Educational Resources Information Center
Leonard, Jacqueline; Buss, Alan; Gamboa, Ruben; Mitchell, Monica; Fashola, Olatokunbo S.; Hubert, Tarcia; Almughyirah, Sultan
2016-01-01
This paper describes the findings of a pilot study that used robotics and game design to develop middle school students' computational thinking strategies. One hundred and twenty-four students engaged in LEGO® EV3 robotics and created games using Scalable Game Design software. The results of the study revealed students' pre-post self-efficacy…
Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures
NASA Astrophysics Data System (ADS)
Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.
2016-12-01
The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
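For orientation, the kernel being accelerated is standard k-means; the sketch below is a plain single-node NumPy implementation of Lloyd's algorithm on synthetic feature vectors, with none of the paper's SIMD, manycore, or distributed accelerations.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; the distance/assignment step is the
    data-parallel kernel that a scalable implementation distributes."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared Euclidean distances, shape (n_points, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Synthetic stand-in for gridded geospatiotemporal feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(200, 4)) for m in (0.0, 2.0, 4.0)])
print(kmeans(X, k=3)[0])
```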
Experimental realization of Shor's quantum factoring algorithm using nuclear magnetic resonance.
Vandersypen, L M; Steffen, M; Breyta, G; Yannoni, C S; Sherwood, M H; Chuang, I L
The number of steps any classical computer requires in order to find the prime factors of an l-digit integer N increases exponentially with l, at least using algorithms known at present. Factoring large integers is therefore conjectured to be intractable classically, an observation underlying the security of widely used cryptographic codes. Quantum computers, however, could factor integers in only polynomial time, using Shor's quantum factoring algorithm. Although important for the study of quantum computers, experimental demonstration of this algorithm has proved elusive. Here we report an implementation of the simplest instance of Shor's algorithm: factorization of N = 15 (whose prime factors are 3 and 5). We use seven spin-1/2 nuclei in a molecule as quantum bits, which can be manipulated with room temperature liquid-state nuclear magnetic resonance techniques. This method of using nuclei to store quantum information is in principle scalable to systems containing many quantum bits, but such scalability is not implied by the present work. The significance of our work lies in the demonstration of experimental and theoretical techniques for precise control and modelling of complex quantum computers. In particular, we present a simple, parameter-free but predictive model of decoherence effects in our system.
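The classical half of the reduction used in that demonstration can be shown in a few lines: once the order r of a modulo N is known (the part the quantum processor computes), the factors follow from two gcd computations. The snippet below finds the order of 7 modulo 15 by brute force, purely as a stand-in for the quantum subroutine.

```python
from math import gcd

N, a = 15, 7
r = 1
while pow(a, r, N) != 1:        # brute-force order finding (quantum step in Shor)
    r += 1
print("order of", a, "mod", N, "is", r)          # r = 4

# The reduction requires r even and a^(r/2) != -1 (mod N)
assert r % 2 == 0 and pow(a, r // 2, N) != N - 1
x = pow(a, r // 2, N)
print(gcd(x - 1, N), gcd(x + 1, N))              # 3 5
```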
Harnessing the power of emerging petascale platforms
NASA Astrophysics Data System (ADS)
Mellor-Crummey, John
2007-07-01
As part of the US Department of Energy's Scientific Discovery through Advanced Computing (SciDAC-2) program, science teams are tackling problems that require computational simulation and modeling at the petascale. A grand challenge for computer science is to develop software technology that makes it easier to harness the power of these systems to aid scientific discovery. As part of its activities, the SciDAC-2 Center for Scalable Application Development Software (CScADS) is building open source software tools to support efficient scientific computing on the emerging leadership-class platforms. In this paper, we describe two tools for performance analysis and tuning that are being developed as part of CScADS: a tool for analyzing scalability and performance, and a tool for optimizing loop nests for better node performance. We motivate these tools by showing how they apply to S3D, a turbulent combustion code under development at Sandia National Laboratory. For S3D, our node performance analysis tool helped uncover several performance bottlenecks. Using our loop nest optimization tool, we transformed S3D's most costly loop nest to reduce execution time by a factor of 2.94 for a processor working on a 50^3 domain.
Embedded DCT and wavelet methods for fine granular scalable video: analysis and comparison
NASA Astrophysics Data System (ADS)
van der Schaar-Mitrea, Mihaela; Chen, Yingwei; Radha, Hayder
2000-04-01
Video transmission over bandwidth-varying networks is becoming increasingly important due to emerging applications such as streaming of video over the Internet. The fundamental obstacle in designing such systems resides in the varying characteristics of the Internet (i.e. bandwidth variations and packet-loss patterns). In MPEG-4, a new SNR scalability scheme, called Fine-Granular-Scalability (FGS), is currently under standardization, which is able to adapt in real-time (i.e. at transmission time) to Internet bandwidth variations. The FGS framework consists of a non-scalable motion-predicted base-layer and an intra-coded fine-granular scalable enhancement layer. For example, the base layer can be coded using a DCT-based MPEG-4 compliant, highly efficient video compression scheme. Subsequently, the difference between the original and decoded base-layer is computed, and the resulting FGS-residual signal is intra-frame coded with an embedded scalable coder. In order to achieve high coding efficiency when compressing the FGS enhancement layer, it is crucial to analyze the nature and characteristics of residual signals common to the SNR scalability framework (including FGS). In this paper, we present a thorough analysis of SNR residual signals by evaluating their statistical properties, compaction efficiency and frequency characteristics. The signal analysis revealed that the energy compaction of the DCT and wavelet transforms is limited and the frequency characteristics of SNR residual signals decay rather slowly. Moreover, the blockiness artifacts of the low bit-rate coded base-layer result in artificial high frequencies in the residual signal. Subsequently, a variety of wavelet and embedded DCT coding techniques applicable to the FGS framework are evaluated and their results are interpreted based on the identified signal properties. As expected from the theoretical signal analysis, the rate-distortion performances of the embedded wavelet and DCT-based coders are very similar. However, improved results can be obtained for the wavelet coder by deblocking the base-layer prior to the FGS residual computation. Based on the theoretical analysis and our measurements, we can conclude that for an optimal complexity versus coding-efficiency trade-off, only limited wavelet decomposition (e.g. 2 stages) needs to be performed for the FGS-residual signal. Also, it was observed that the good rate-distortion performance of a coding technique for a certain image type (e.g. natural still-images) does not necessarily translate into similarly good performance for signals with different visual characteristics and statistical properties.
Scalable web services for the PSIPRED Protein Analysis Workbench.
Buchan, Daniel W A; Minneci, Federico; Nugent, Tim C O; Bryson, Kevin; Jones, David T
2013-07-01
Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.
Scalable implementation of boson sampling with trapped ions.
Shen, C; Zhang, Z; Duan, L-M
2014-02-07
Boson sampling solves a classically intractable problem by sampling from a probability distribution given by matrix permanents. We propose a scalable implementation of boson sampling using local transverse phonon modes of trapped ions to encode the bosons. The proposed scheme allows deterministic preparation and high-efficiency readout of the bosons in the Fock states and universal mode mixing. With the state-of-the-art trapped ion technology, it is feasible to realize boson sampling with tens of bosons by this scheme, which would outperform the most powerful classical computers and constitute an effective disproof of the famous extended Church-Turing thesis.
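Since the output probabilities in boson sampling are governed by matrix permanents, a tiny exact permanent routine makes the claimed classical hardness concrete. The sketch below implements Ryser's formula, whose exponential cost is the reason even modest photon numbers overwhelm classical simulation; it illustrates the underlying mathematics, not the trapped-ion scheme itself.

```python
from itertools import combinations

def permanent(A):
    """Ryser's formula: perm(A) = (-1)^n * sum over nonempty column subsets S of
    (-1)^|S| * prod_i sum_{j in S} A[i][j]. Cost is O(2^n * n^2)."""
    n = len(A)
    total = 0.0
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            prod = 1.0
            for i in range(n):
                prod *= sum(A[i][j] for j in S)
            total += (-1) ** size * prod
    return (-1) ** n * total

A = [[1.0, 2.0],
     [3.0, 4.0]]
print(permanent(A))   # 1*4 + 2*3 = 10
```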
Simplex-stochastic collocation method with improved scalability
NASA Astrophysics Data System (ADS)
Edeling, W. N.; Dwight, R. P.; Cinnella, P.
2016-04-01
The Simplex-Stochastic Collocation (SSC) method is a robust tool used to propagate uncertain input distributions through a computer code. However, it becomes prohibitively expensive for problems with dimensions higher than 5. The main purpose of this paper is to identify bottlenecks, and to improve upon this bad scalability. In order to do so, we propose an alternative interpolation stencil technique based upon the Set-Covering problem, and we integrate the SSC method in the High-Dimensional Model-Reduction framework. In addition, we address the issue of ill-conditioned sample matrices, and we present an analytical map to facilitate uniformly-distributed simplex sampling.
Two-dimensional photonic crystal slab nanocavities on bulk single-crystal diamond
NASA Astrophysics Data System (ADS)
Wan, Noel H.; Mouradian, Sara; Englund, Dirk
2018-04-01
Color centers in diamond are promising spin qubits for quantum computing and quantum networking. In photon-mediated entanglement distribution schemes, the efficiency of the optical interface ultimately determines the scalability of such systems. Nano-scale optical cavities coupled to emitters constitute a robust spin-photon interface that can increase spontaneous emission rates and photon extraction efficiencies. In this work, we introduce the fabrication of 2D photonic crystal slab nanocavities with high quality factors and cubic wavelength mode volumes—directly in bulk diamond. This planar platform offers scalability and considerably expands the toolkit for classical and quantum nanophotonics in diamond.
Segmentation of breast ultrasound images based on active contours using neutrosophic theory.
Lotfollahi, Mahsa; Gity, Masoumeh; Ye, Jing Yong; Mahlooji Far, A
2018-04-01
Ultrasound imaging is an effective approach for diagnosing breast cancer, but it is highly operator-dependent. Recent advances in computer-aided diagnosis have suggested that it can assist physicians in diagnosis. Definition of the region of interest before computer analysis is still needed. Since manual outlining of the tumor contour is tedious and time-consuming for a physician, developing an automatic segmentation method is important for clinical application. The present paper presents a novel method to segment breast ultrasound images. It utilizes a combination of region-based active contour and neutrosophic theory to overcome the natural properties of ultrasound images including speckle noise and tissue-related textures. First, due to inherent speckle noise and low contrast of these images, we have utilized a non-local means filter and fuzzy logic method for denoising and image enhancement, respectively. This paper presents an improved weighted region-scalable active contour to segment breast ultrasound images using a new feature derived from neutrosophic theory. This method has been applied to 36 breast ultrasound images. It achieves a true-positive rate, false-positive rate, and similarity of 95%, 6%, and 90%, respectively. The proposed method shows clear advantages over other conventional methods of active contour segmentation, i.e., region-scalable fitting energy and weighted region-scalable fitting energy.
Performance-scalable volumetric data classification for online industrial inspection
NASA Astrophysics Data System (ADS)
Abraham, Aby J.; Sadki, Mustapha; Lea, R. M.
2002-03-01
Non-intrusive inspection and non-destructive testing of manufactured objects with complex internal structures typically requires the enhancement, analysis and visualization of high-resolution volumetric data. Given the increasing availability of fast 3D scanning technology (e.g. cone-beam CT), enabling on-line detection and accurate discrimination of components or sub-structures, the inherent complexity of classification algorithms inevitably leads to throughput bottlenecks. Indeed, whereas typical inspection throughput requirements range from 1 to 1000 volumes per hour, depending on density and resolution, current computational capability is one to two orders-of-magnitude less. Accordingly, speeding up classification algorithms requires both reduction of algorithm complexity and acceleration of computer performance. A shape-based classification algorithm, offering algorithm complexity reduction, by using ellipses as generic descriptors of solids-of-revolution, and supporting performance-scalability, by exploiting the inherent parallelism of volumetric data, is presented. A two-stage variant of the classical Hough transform is used for ellipse detection and correlation of the detected ellipses facilitates position-, scale- and orientation-invariant component classification. Performance-scalability is achieved cost-effectively by accelerating a PC host with one or more COTS (Commercial-Off-The-Shelf) PCI multiprocessor cards. Experimental results are reported to demonstrate the feasibility and cost-effectiveness of the data-parallel classification algorithm for on-line industrial inspection applications.
Wiewiórka, Marek S; Messina, Antonio; Pacholewska, Alicja; Maffioletti, Sergio; Gawrysiak, Piotr; Okoniewski, Michał J
2014-09-15
Many time-consuming analyses of next-generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics because of their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker nodes. Available under open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
One-Cell Doubling Evaluation by Living Arrays of Yeast, ODELAY!
Herricks, Thurston; Dilworth, David J.; Mast, Fred D.; ...
2016-11-16
Cell growth is a complex phenotype widely used in systems biology to gauge the impact of genetic and environmental perturbations. Due to the magnitude of genome-wide studies, resolution is often sacrificed in favor of throughput, creating a demand for scalable, time-resolved, quantitative methods of growth assessment. We present ODELAY (One-cell Doubling Evaluation by Living Arrays of Yeast), an automated and scalable growth analysis platform. High measurement density and single-cell resolution provide a powerful tool for large-scale multiparameter growth analysis based on the modeling of microcolony expansion on solid media. Pioneered in yeast but applicable to other colony forming organisms, ODELAY extracts the three key growth parameters (lag time, doubling time, and carrying capacity) that define microcolony expansion from single cells, simultaneously permitting the assessment of population heterogeneity. The utility of ODELAY is illustrated using yeast mutants, revealing a spectrum of phenotypes arising from single and combinatorial growth parameter perturbations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Visweswara Sathanur, Arun; Choudhury, Sutanay; Joslyn, Cliff A.
Property graphs can be used to represent heterogeneous networks with attributed vertices and edges. Given one property graph, simulating another graph of the same or greater size with identical statistical properties with respect to the attributes and connectivity is critical for privacy preservation and benchmarking purposes. In this work we tackle the problem of capturing the statistical dependence of the edge connectivity on the vertex labels and using the same distribution to regenerate property graphs of the same or expanded size in a scalable manner. However, accurate simulation becomes a challenge when the attributes do not completely explain the network structure. We propose the Property Graph Model (PGM) approach that uses an attribute (or label) augmentation strategy to mitigate the problem and preserve the graph connectivity as measured via degree distribution, vertex label distributions and edge connectivity. Our proposed algorithm is scalable with a linear complexity in the number of edges in the target graph. We illustrate the efficacy of the PGM approach in regenerating and expanding the datasets by leveraging two distinct illustrations.
Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN
Hammond, G E; Lichtner, P C; Mills, R T
2014-01-01
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097
A Framework for Real-Time Collection, Analysis, and Classification of Ubiquitous Infrasound Data
NASA Astrophysics Data System (ADS)
Christe, A.; Garces, M. A.; Magana-Zook, S. A.; Schnurr, J. M.
2015-12-01
Traditional infrasound arrays are generally expensive to install and maintain. There are ~10^3 infrasound channels on Earth today. The amount of data currently provided by legacy architectures can be processed on a modest server. However, the growing availability of low-cost, ubiquitous, and dense infrasonic sensor networks presents a substantial increase in the volume, velocity, and variety of data flow. Initial data from a prototype ubiquitous global infrasound network is already pushing the boundaries of traditional research server and communication systems, in particular when serving data products over heterogeneous, international network topologies. We present a scalable, cloud-based approach for capturing and analyzing large amounts of dense infrasonic data (>10^6 channels). We utilize Akka actors with WebSockets to maintain data connections with infrasound sensors. Apache Spark provides streaming, batch, machine learning, and graph processing libraries which will permit signature classification, cross-correlation, and other analytics in near real time. This new framework and approach provide significant advantages in scalability and cost.
Scattering Properties of Heterogeneous Mineral Particles with Absorbing Inclusions
NASA Technical Reports Server (NTRS)
Dlugach, Janna M.; Mishchenko, Michael I.
2015-01-01
We analyze the results of numerically exact computer modeling of scattering and absorption properties of randomly oriented poly-disperse heterogeneous particles obtained by placing microscopic absorbing grains randomly on the surfaces of much larger spherical mineral hosts or by imbedding them randomly inside the hosts. These computations are paralleled by those for heterogeneous particles obtained by fully encapsulating fractal-like absorbing clusters in the mineral hosts. All computations are performed using the superposition T-matrix method. In the case of randomly distributed inclusions, the results are compared with the outcome of Lorenz-Mie computations for an external mixture of the mineral hosts and absorbing grains. We conclude that internal aggregation can affect strongly both the integral radiometric and differential scattering characteristics of the heterogeneous particle mixtures.
Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoshii, K.; Iskra, K.; Naik, H.
2011-05-01
We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory' - an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solely for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar
2014-01-01
In this paper, an architecture for building a Scalable And Mobile Environment For High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide a high-performance and rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks is among the key open-source and industry-standard technologies that have been adopted in this architecture.
Pseudo-orthogonalization of memory patterns for associative memory.
Oku, Makito; Makino, Takaki; Aihara, Kazuyuki
2013-11-01
A new method for improving the storage capacity of associative memory models on a neural network is proposed. The storage capacity of the network increases in proportion to the network size in the case of random patterns, but, in general, the capacity suffers from correlation among memory patterns. Numerous solutions to this problem have been proposed so far, but their high computational cost limits their scalability. In this paper, we propose a novel and simple solution that is locally computable without any iteration. Our method involves XNOR masking of the original memory patterns with random patterns, and the masked patterns and masks are concatenated. The resulting decorrelated patterns allow higher storage capacity at the cost of the pattern length. Furthermore, the increase in the pattern length can be reduced through blockwise masking, which results in a small amount of capacity loss. Movie replay and image recognition are presented as examples to demonstrate the scalability of the proposed method.
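A minimal numpy sketch of the masking idea described above, assuming bipolar (+1/-1) patterns so that XNOR reduces to an elementwise product; the blockwise variant mentioned in the abstract is omitted and the function names are illustrative.

```python
import numpy as np

def pseudo_orthogonalize(patterns, seed=0):
    """Decorrelate +/-1 memory patterns by XNOR masking with random patterns.

    Each pattern is XNOR-ed (elementwise product for bipolar coding) with an
    independent random pattern, and the mask is concatenated to the result,
    so the stored patterns become nearly uncorrelated at the cost of doubling
    the pattern length.
    """
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns)                    # shape (P, N), entries +/-1
    masks = rng.choice([-1, 1], size=patterns.shape)
    masked = patterns * masks                          # XNOR for bipolar coding
    return np.concatenate([masked, masks], axis=1)

def recover(stored):
    """Undo the masking: XNOR the stored first half with the appended mask."""
    n = stored.shape[1] // 2
    return stored[:, :n] * stored[:, n:]
```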
Hierarchical MFMO Circuit Modules for an Energy-Efficient SDR DBF
NASA Astrophysics Data System (ADS)
Mar, Jeich; Kuo, Chi-Cheng; Wu, Shin-Ru; Lin, You-Rong
The hierarchical multi-function matrix operation (MFMO) circuit modules are designed using the coordinate rotation digital computer (CORDIC) algorithm for realizing the intensive computation of matrix operations. The paper emphasizes that the designed hierarchical MFMO circuit modules can be used to develop a power-efficient software-defined radio (SDR) digital beamformer (DBF). The formulas of the processing time for the scalable MFMO circuit modules implemented in a field-programmable gate array (FPGA) are derived to allocate the proper logic resources for the hardware reconfiguration. The hierarchical MFMO circuit modules are scalable to the changing number of array branches employed for the SDR DBF to achieve the purpose of power saving. The efficient reuse of the common MFMO circuit modules in the SDR DBF can also lead to energy reduction. Finally, the power dissipation and reconfiguration function in the different modes of the SDR DBF are observed from the experimental results.
Emergent Adaptive Noise Reduction from Communal Cooperation of Sensor Grid
NASA Technical Reports Server (NTRS)
Jones, Kennie H.; Jones, Michael G.; Nark, Douglas M.; Lodding, Kenneth N.
2010-01-01
In the last decade, the realization of small, inexpensive, and powerful devices with sensors, computers, and wireless communication has promised the development of massive sized sensor networks with dense deployments over large areas capable of high fidelity situational assessments. However, most management models have been based on centralized control and research has concentrated on methods for passing data from sensor devices to the central controller. Most implementations have been small, but because this methodology is not scalable, it is insufficient for massive deployments. Here, a specific application of a large sensor network for adaptive noise reduction demonstrates a new paradigm where communities of sensor/computer devices assess local conditions and make local decisions from which emerges a global behaviour. This approach obviates many of the problems of centralized control as it is not prone to a single point of failure and is more scalable, efficient, robust, and fault tolerant.
Final report for “Extreme-scale Algorithms and Solver Resilience”
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2017-06-30
This is a joint project with principal investigators at Oak Ridge National Laboratory, Sandia National Laboratories, the University of California at Berkeley, and the University of Tennessee. Our part of the project involves developing performance models for highly scalable algorithms and the development of latency tolerant iterative methods. During this project, we extended our performance models for the Multigrid method for solving large systems of linear equations and conducted experiments with highly scalable variants of conjugate gradient methods that avoid blocking synchronization. In addition, we worked with the other members of the project on alternative techniques for resilience and reproducibility. We also presented an alternative approach for reproducible dot-products in parallel computations that performs almost as well as the conventional approach by separating the order of computation from the details of the decomposition of vectors across the processes.
DualTrust: A Distributed Trust Model for Swarm-Based Autonomic Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maiden, Wendy M.; Dionysiou, Ioanna; Frincke, Deborah A.
2011-02-01
For autonomic computing systems that utilize mobile agents and ant colony algorithms for their sensor layer, trust management is important for the acceptance of the mobile agent sensors and to protect the system from malicious behavior by insiders and entities that have penetrated network defenses. This paper examines the trust relationships, evidence, and decisions in a representative system and finds that by monitoring the trustworthiness of the autonomic managers rather than the swarming sensors, the trust management problem becomes much more scalable and still serves to protect the swarm. We then propose the DualTrust conceptual trust model. By addressing the autonomic manager’s bi-directional primary relationships in the ACS architecture, DualTrust is able to monitor the trustworthiness of the autonomic managers, protect the sensor swarm in a scalable manner, and provide global trust awareness for the orchestrating autonomic manager.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel
NASA Astrophysics Data System (ADS)
Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray
2003-07-01
Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
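For concreteness, here is a generic degree-k Chebyshev smoother for a symmetric positive definite system written against assumed eigenvalue bounds; it is a standard textbook formulation, not necessarily the exact smoother used in the paper. Each sweep consists only of matrix-vector products and vector updates, so it has no ordering dependence to break parallelism the way Gauss-Seidel does.

```python
import numpy as np

def chebyshev_smooth(A, b, x, lam_min, lam_max, degree=3):
    """Apply a degree-`degree` Chebyshev polynomial smoother to A x = b.

    Standard three-term Chebyshev iteration targeting eigenvalues of the SPD
    matrix A in [lam_min, lam_max] (in multigrid practice, the upper part of
    the spectrum).  Every step is A @ vector plus vector updates, so the
    sweep parallelizes without the sequential dependencies of Gauss-Seidel.
    """
    theta = 0.5 * (lam_max + lam_min)
    delta = 0.5 * (lam_max - lam_min)
    sigma = theta / delta
    rho = 1.0 / sigma
    r = b - A @ x
    d = r / theta
    for _ in range(degree):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x
```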
A scalable silicon photonic chip-scale optical switch for high performance computing systems.
Yu, Runxiang; Cheung, Stanley; Li, Yuliang; Okamoto, Katsunari; Proietti, Roberto; Yin, Yawei; Yoo, S J B
2013-12-30
This paper discusses the architecture and provides performance studies of a silicon photonic chip-scale optical switch for scalable interconnect network in high performance computing systems. The proposed switch exploits optical wavelength parallelism and wavelength routing characteristics of an Arrayed Waveguide Grating Router (AWGR) to allow contention resolution in the wavelength domain. Simulation results from a cycle-accurate network simulator indicate that, even with only two transmitter/receiver pairs per node, the switch exhibits lower end-to-end latency and higher throughput at high (>90%) input loads compared with electronic switches. On the device integration level, we propose to integrate all the components (ring modulators, photodetectors and AWGR) on a CMOS-compatible silicon photonic platform to ensure a compact, energy efficient and cost-effective device. We successfully demonstrate proof-of-concept routing functions on an 8 × 8 prototype fabricated using foundry services provided by OpSIS-IME.
CoMD Implementation Suite in Emerging Programming Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haque, Riyaz; Reeve, Sam; Juallmes, Luc
CoMD-Em is a software implementation suite of the CoMD [4] proxy app using different emerging programming models. It is intended to analyze the features and capabilities of novel programming models that could help ensure code and performance portability and scalability across heterogeneous platforms while improving programmer productivity. Another goal is to provide the authors and vendors with some meaningful feedback regarding the capabilities and limitations of their models. The actual application is a classical molecular dynamics (MD) simulation using either the Lennard-Jones method (LJ) or the embedded atom method (EAM) for primary particle interaction. The code can be extended to support alternate interaction models. The code is expected to run on a wide class of heterogeneous hardware configurations such as shared/distributed/hybrid memory, GPUs, and any other platform supported by the underlying programming model.
GenoMetric Query Language: a novel approach to large-scale genomic data management.
Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano
2015-06-15
Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Agata, Ryoichiro; Ichimura, Tsuyoshi; Hori, Takane; Hirahara, Kazuro; Hashimoto, Chihiro; Hori, Muneo
2018-04-01
The simultaneous estimation of the asthenosphere's viscosity and coseismic slip/afterslip is expected to greatly improve the consistency of the estimation results with crustal deformation observation data collected at widely distributed observation points, compared to estimating slips only. Such an estimation can be formulated as a non-linear inverse problem for the material property of viscosity and the input force equivalent to fault slips, based on large-scale finite-element (FE) modeling of crustal deformation, in which the number of degrees of freedom is on the order of 10^9. We formulated and developed a computationally efficient adjoint-based estimation method for this inverse problem, together with a fast and scalable FE solver for the associated forward and adjoint problems. In a numerical experiment that imitates the 2011 Tohoku-Oki earthquake, the advantage of the proposed method is confirmed by comparing the estimated results with those obtained using simplified estimation methods. The computational cost required for the optimization shows that the proposed method enabled the targeted estimation to be completed with a moderate amount of computational resources.
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
2017-11-01
For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
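A hedged sketch of one way such a random-basis surrogate could be built: random cosine features fitted by regularized least squares to previously evaluated log-densities, yielding a cheap closed-form gradient that could drive the leapfrog integrator of HMC. The basis choice, the regularization constant, and the function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fit_random_basis_surrogate(samples, log_probs, n_features=200, scale=1.0, seed=0):
    """Fit a surrogate of log p(theta) with random cosine bases.

    Features phi(theta) = cos(W theta + b) with random W, b; weights come
    from ridge-regularized least squares on (theta, log p(theta)) pairs
    collected during early sampling.  Returns the surrogate and its
    analytic gradient.
    """
    rng = np.random.default_rng(seed)
    d = samples.shape[1]
    W = rng.normal(scale=scale, size=(n_features, d))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    Phi = np.cos(samples @ W.T + b)                       # (n_samples, n_features)
    w, *_ = np.linalg.lstsq(Phi.T @ Phi + 1e-6 * np.eye(n_features),
                            Phi.T @ log_probs, rcond=None)

    def surrogate(theta):
        return np.cos(W @ theta + b) @ w

    def surrogate_grad(theta):
        return -(np.sin(W @ theta + b) * w) @ W

    return surrogate, surrogate_grad
```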
NASA Astrophysics Data System (ADS)
Martin, William G. K.; Hasekamp, Otto P.
2018-01-01
In previous work, we derived the adjoint method as a computationally efficient path to three-dimensional (3D) retrievals of clouds and aerosols. In this paper we will demonstrate the use of adjoint methods for retrieving two-dimensional (2D) fields of cloud extinction. The demonstration uses a new 2D radiative transfer solver (FSDOM). This radiation code was augmented with adjoint methods to allow efficient derivative calculations needed to retrieve cloud and surface properties from multi-angle reflectance measurements. The code was then used in three synthetic retrieval studies. Our retrieval algorithm adjusts the cloud extinction field and surface albedo to minimize the measurement misfit function with a gradient-based, quasi-Newton approach. At each step we compute the value of the misfit function and its gradient with two calls to the solver FSDOM. First we solve the forward radiative transfer equation to compute the residual misfit with measurements, and second we solve the adjoint radiative transfer equation to compute the gradient of the misfit function with respect to all unknowns. The synthetic retrieval studies verify that adjoint methods are scalable to retrieval problems with many measurements and unknowns. We can retrieve the vertically-integrated optical depth of moderately thick clouds as a function of the horizontal coordinate. It is also possible to retrieve the vertical profile of clouds that are separated by clear regions. The vertical profile retrievals improve for smaller cloud fractions. This leads to the conclusion that cloud edges actually increase the amount of information that is available for retrieving the vertical profile of clouds. However, to exploit this information one must retrieve the horizontally heterogeneous cloud properties with a 2D (or 3D) model. This prototype shows that adjoint methods can efficiently compute the gradient of the misfit function. This work paves the way for the application of similar methods to 3D remote sensing problems.
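The two-solve structure described above maps naturally onto a quasi-Newton optimizer. The sketch below shows that wiring, with placeholder `forward_solve` and `adjoint_solve` callables standing in for the forward and adjoint radiative transfer solutions of a solver such as FSDOM; both names, and the plain least-squares misfit, are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def retrieve_extinction(forward_solve, adjoint_solve, x0, measurements):
    """Gradient-based retrieval of unknowns x (e.g. extinction field, albedo).

    `forward_solve(x)` is assumed to return modeled radiances for x, and
    `adjoint_solve(x, residual)` the gradient of the misfit with respect to
    x.  Each evaluation therefore costs one forward and one adjoint solve,
    which a quasi-Newton (L-BFGS-B) loop uses to minimize the misfit.
    """
    def misfit_and_grad(x):
        resid = forward_solve(x) - measurements      # forward solve
        grad = adjoint_solve(x, resid)               # adjoint solve
        return 0.5 * float(np.dot(resid, resid)), grad

    result = minimize(misfit_and_grad, x0, jac=True, method="L-BFGS-B")
    return result.x
```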
Scheduling Operations for Massive Heterogeneous Clusters
NASA Technical Reports Server (NTRS)
Humphrey, John; Spagnoli, Kyle
2013-01-01
High-performance computing (HPC) programming has become increasingly difficult with the advent of hybrid supercomputers consisting of multicore CPUs and accelerator boards such as the GPU. Manual tuning of software to achieve high performance on this type of machine has been performed by programmers. This is needlessly difficult and prone to being invalidated by new hardware, new software, or changes in the underlying code. A system was developed for task-based representation of programs, which, when coupled with a scheduler and runtime system, allows for many benefits, including higher performance and utilization of computational resources, easier programming and porting, and adaptations of code during runtime. The system consists of a method of representing computer algorithms as a series of data-dependent tasks. The series forms a graph, which can be scheduled for execution on many nodes of a supercomputer efficiently by a computer algorithm. The schedule is executed by a dispatch component, which is tailored to understand all of the hardware types that may be available within the system. The scheduler is informed by a cluster mapping tool, which generates a topology of available resources and their strengths and communication costs. Software is decoupled from its hardware, which aids in porting to future architectures. A computer algorithm schedules all operations, which, for systems of high complexity (i.e., most NASA codes), cannot be performed optimally by a human. The system aids in reducing repetitive code, such as communication code, and aids in the reduction of redundant code across projects. It adds new features to code automatically, such as recovering from a lost node or the ability to modify the code while running. In this project, the innovators at the time of this reporting intend to develop two distinct technologies that build upon each other, both of which serve as building blocks for more efficient HPC usage. The first is the scheduling and dynamic execution framework, and the second is a set of scalable linear algebra libraries built directly on the former.
A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU
NASA Astrophysics Data System (ADS)
Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha
2018-03-01
Since the Graphics Processing Unit (GPU) offers strong floating-point computation and memory bandwidth for data parallelism, it has been widely used in areas such as molecular dynamics (MD) and computational fluid dynamics (CFD). The emergence of the Compute Unified Device Architecture (CUDA), which reduces programming complexity, brings great opportunities to CFD. There are three different modes for the parallel solution of the NS equations: a parallel solver based on the CPU, a parallel solver based on the GPU, and a heterogeneous parallel solver based on collaborating CPU and GPU. GPUs are relatively rich in compute capacity but limited in memory capacity, while CPUs are the opposite. To make full use of both GPUs and CPUs, a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver’s computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experimental results, which demonstrates that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow; it decreases for turbulent flow but can still reach more than 20. Moreover, the speedup increases as the grid size becomes larger.
Cloud Quantum Computing of an Atomic Nucleus
NASA Astrophysics Data System (ADS)
Dumitrescu, E. F.; McCaskey, A. J.; Hagen, G.; Jansen, G. R.; Morris, T. D.; Papenbrock, T.; Pooser, R. C.; Dean, D. J.; Lougovski, P.
2018-05-01
We report a quantum simulation of the deuteron binding energy on quantum processors accessed via cloud servers. We use a Hamiltonian from pionless effective field theory at leading order. We design a low-depth version of the unitary coupled-cluster ansatz, use the variational quantum eigensolver algorithm, and compute the binding energy to within a few percent. Our work is the first step towards scalable nuclear structure computations on a quantum processor via the cloud, and it sheds light on how to map scientific computing applications onto nascent quantum devices.
Design and implementation of scalable tape archiver
NASA Technical Reports Server (NTRS)
Nemoto, Toshihiro; Kitsuregawa, Masaru; Takagi, Mikio
1996-01-01
In order to reduce costs, computer manufacturers try to use commodity parts as much as possible. Mainframes using proprietary processors are being replaced by high performance RISC microprocessor-based workstations, which are further being replaced by the commodity microprocessor used in personal computers. Highly reliable disks for mainframes are also being replaced by disk arrays, which are complexes of disk drives. In this paper we try to clarify the feasibility of a large scale tertiary storage system composed of 8-mm tape archivers utilizing robotics. In the near future, the 8-mm tape archiver will be widely used and become a commodity part, since the recent rapid growth of multimedia applications requires much larger storage than disk drives can provide. We designed a scalable tape archiver which connects as many 8-mm tape archivers (element archivers) as possible. In the scalable archiver, robotics can exchange a cassette tape between two adjacent element archivers mechanically. Thus, we can build a large scalable archiver inexpensively. In addition, a sophisticated migration mechanism distributes frequently accessed tapes (hot tapes) evenly among all of the element archivers, which improves the throughput considerably. Even with the failures of some tape drives, the system dynamically redistributes hot tapes to the other element archivers which have live tape drives. Several kinds of specially tailored huge archivers are on the market; however, the 8-mm tape scalable archiver could replace them. To maintain high performance in spite of high access locality when a large number of archivers are attached to the scalable archiver, it is necessary to scatter frequently accessed cassettes among the element archivers and to use the tape drives efficiently. For this purpose, we introduce two cassette migration algorithms, foreground migration and background migration. Background migration transfers cassettes between element archivers to redistribute frequently accessed cassettes, thus balancing the load of each archiver. Background migration occurs when the robotics are idle. Both migration algorithms are based on the access frequency and space utilization of each element archiver. By normalizing these parameters according to the number of drives in each element archiver, it is possible to maintain high performance even if some tape drives fail. We found that the foreground migration is efficient at reducing access response time. Besides the foreground migration, the background migration makes it possible to track the transition of spatial access locality quickly.
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
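To show where the parallelism enters a population-based calibrator such as SCE-UA, the sketch below evaluates the objective function for a batch of candidate parameter vectors in parallel worker processes. The actual study used OpenMP threads and CUDA kernels, so this CPU-only multiprocessing version is only an illustrative stand-in.

```python
from multiprocessing import Pool
import numpy as np

def evaluate_population(objective, population, processes=8):
    """Evaluate a population of parameter sets in parallel.

    In a parallelized SCE-UA, the dominant cost is running the
    rainfall-runoff model once per candidate parameter set, so the
    evaluations within each shuffling step can be farmed out to workers.
    `objective` must be a picklable (module-level) function mapping a
    parameter vector to a scalar cost.
    """
    with Pool(processes) as pool:
        costs = pool.map(objective, [np.asarray(p) for p in population])
    return np.asarray(costs)
```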
High fidelity wireless network evaluation for heterogeneous cognitive radio networks
NASA Astrophysics Data System (ADS)
Ding, Lei; Sagduyu, Yalin; Yackoski, Justin; Azimi-Sadjadi, Babak; Li, Jason; Levy, Renato; Melodia, Tammaso
2012-06-01
We present a high fidelity cognitive radio (CR) network emulation platform for wireless system tests, measurements, and validation. This versatile platform provides the configurable functionalities to control and repeat realistic physical channel effects in integrated space, air, and ground networks. We combine the advantages of a scalable simulation environment with reliable hardware performance for high fidelity and repeatable evaluation of heterogeneous CR networks. This approach extends CR design from the device (software-defined radio) or lower-level protocol (dynamic spectrum access) level to end-to-end cognitive networking, and facilitates low-cost deployment, development, and experimentation of new wireless network protocols and applications on frequency-agile programmable radios. Going beyond the channel emulator paradigm for point-to-point communications, we can support simultaneous transmissions by network-level emulation that allows realistic physical-layer interactions between diverse user classes, including secondary users, primary users, and adversarial jammers in CR networks. In particular, we can replay field tests in a lab environment with real radios perceiving and learning the dynamic environment thereby adapting for end-to-end goals over distributed spectrum coordination channels that replace the common control channel as a single point of failure. CR networks offer several dimensions of tunable actions including channel, power, rate, and route selection. The proposed network evaluation platform is fully programmable and can reliably evaluate the necessary cross-layer design solutions with configurable optimization space by leveraging the hardware experiments to represent the realistic effects of physical channel, topology, mobility, and jamming on spectrum agility, situational awareness, and network resiliency. We also provide the flexibility to scale up the test environment by introducing virtual radios and establishing seamless signal-level interactions with real radios. This holistic wireless evaluation approach supports a large-scale, heterogeneous, and dynamic CR network architecture and allows developing cross-layer network protocols under high fidelity, repeatable, and scalable wireless test scenarios suitable for heterogeneous space, air, and ground networks.
HeNCE: A Heterogeneous Network Computing Environment
Beguelin, Adam; Dongarra, Jack J.; Geist, George Al; ...
1994-01-01
Network computing seeks to utilize the aggregate resources of many networked computers to solve a single problem. In so doing it is often possible to obtain supercomputer performance from an inexpensive local area network. The drawback is that network computing is complicated and error prone when done by hand, especially if the computers have different operating systems and data formats and are thus heterogeneous. The heterogeneous network computing environment (HeNCE) is an integrated graphical environment for creating and running parallel programs over a heterogeneous collection of computers. It is built on a lower level package called parallel virtual machine (PVM). The HeNCE philosophy of parallel programming is to have the programmer graphically specify the parallelism of a computation and to automate, as much as possible, the tasks of writing, compiling, executing, debugging, and tracing the network computation. Key to HeNCE is a graphical language based on directed graphs that describe the parallelism and data dependencies of an application. Nodes in the graphs represent conventional Fortran or C subroutines and the arcs represent data and control flow. This article describes the present state of HeNCE, its capabilities, limitations, and areas of future research.
Supermultiplicative Speedups of Probabilistic Model-Building Genetic Algorithms
2009-02-01
Communication Avoiding and Overlapping for Numerical Linear Algebra
2012-05-08
To scale numerical linear algebra problems to future exascale systems, communication cost must be avoided or overlapped, since it will continue to grow relative to the cost of computation. Communication-avoiding 2.5D algorithms improve scalability by reducing communication. With exascale computing as the long-term goal, the community needs to develop techniques for avoiding and overlapping communication.
Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiu, Dongbin
2017-03-03
The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations on extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately, in high dimensional spaces, resolve stochastic problems with limited smoothness, even containing discontinuities.
Heterogeneous concurrent computing with exportable services
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy
1995-01-01
Heterogeneous concurrent computing, based on the traditional process-oriented model, is approaching its functionality and performance limits. An alternative paradigm, based on the concept of services, supporting data driven computation, and built on a lightweight process infrastructure, is proposed to enhance the functional capabilities and the operational efficiency of heterogeneous network-based concurrent computing. TPVM is an experimental prototype system supporting exportable services, thread-based computation, and remote memory operations that is built as an extension of and an enhancement to the PVM concurrent computing system. TPVM offers a significantly different computing paradigm for network-based computing, while maintaining a close resemblance to the conventional PVM model in the interest of compatibility and ease of transition. Preliminary experiences have demonstrated that the TPVM framework presents a natural yet powerful concurrent programming interface, while being capable of delivering performance improvements of up to thirty percent.
Cloud-based Web Services for Near-Real-Time Web access to NPP Satellite Imagery and other Data
NASA Astrophysics Data System (ADS)
Evans, J. D.; Valente, E. G.
2010-12-01
We are building a scalable, cloud computing-based infrastructure for Web access to near-real-time data products synthesized from the U.S. National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP) and other geospatial and meteorological data. Given recent and ongoing changes in the NPP and NPOESS programs (now Joint Polar Satellite System), the need for timely delivery of NPP data is urgent. We propose an alternative to a traditional, centralized ground segment, using distributed Direct Broadcast facilities linked to industry-standard Web services by a streamlined processing chain running in a scalable cloud computing environment. Our processing chain, currently implemented on Amazon.com's Elastic Compute Cloud (EC2), retrieves raw data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) and synthesizes data products such as Sea-Surface Temperature, Vegetation Indices, etc. The cloud computing approach lets us grow and shrink computing resources to meet large and rapid fluctuations (twice daily) in both end-user demand and data availability from polar-orbiting sensors. Early prototypes have delivered various data products to end-users with latencies between 6 and 32 minutes. We have begun to replicate machine instances in the cloud, so as to reduce latency and maintain near-real time data access regardless of increased data input rates or user demand -- all at quite moderate monthly costs. Our service-based approach (in which users invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored and composite (e.g., false-color multiband) products on demand. To facilitate broad impact and adoption of our technology, we have emphasized open, industry-standard software interfaces and open source software. Through our work, we envision the widespread establishment of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sensors. A scalable architecture based on cloud computing ensures cost-effective, real-time processing and delivery of NPP and other data. Access via standard Web services maximizes its interoperability and usefulness.
Simple, efficient allocation of modelling runs on heterogeneous clusters with MPI
Donato, David I.
2017-01-01
In scientific modelling and computation, the choice of an appropriate method for allocating tasks for parallel processing depends on the computational setting and on the nature of the computation. The allocation of independent but similar computational tasks, such as modelling runs or Monte Carlo trials, among the nodes of a heterogeneous computational cluster is a special case that has not been specifically evaluated previously. A simulation study shows that a method of on-demand (that is, worker-initiated) pulling from a bag of tasks in this case leads to reliably short makespans for computational jobs despite heterogeneity both within and between cluster nodes. A simple reference implementation in the C programming language with the Message Passing Interface (MPI) is provided.
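A minimal mpi4py sketch of the worker-initiated ("pull") pattern evaluated above: the root hands out one modelling run at a time, and each worker asks for the next as soon as it finishes, so faster nodes naturally take on more work and the makespan stays short on heterogeneous clusters. The tags and function names are illustrative; this is not the author's reference C implementation.

```python
from mpi4py import MPI

TASK_TAG, STOP_TAG = 1, 2

def run_bag_of_tasks(tasks, do_work):
    """On-demand allocation of independent tasks from a central bag."""
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    if rank == 0:
        status = MPI.Status()
        next_task, active = 0, size - 1
        while active > 0:
            # wait for any worker to announce it is ready
            comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
            worker = status.Get_source()
            if next_task < len(tasks):
                comm.send(tasks[next_task], dest=worker, tag=TASK_TAG)
                next_task += 1
            else:
                comm.send(None, dest=worker, tag=STOP_TAG)
                active -= 1
    else:
        while True:
            comm.send(None, dest=0)                  # "I am ready"
            status = MPI.Status()
            task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
            if status.Get_tag() == STOP_TAG:
                break
            do_work(task)                            # e.g. one modelling run
```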
Ding, Yuxiao; Klyushin, Alexander; Huang, Xing; Jones, Travis; Teschner, Detre; Girgsdies, Frank; Rodenas, Tania; Schlögl, Robert; Heumann, Saskia
2018-03-19
By taking inspiration from the catalytic properties of single-site catalysts and the enhancement of performance through ionic liquids on metal catalysts, we exploited a scalable way to place single cobalt ions on a carbon-nanotube surface bridged by polymerized ionic liquid. Single dispersed cobalt ions coordinated by ionic liquid are used as heterogeneous catalysts for the oxygen evolution reaction (OER). Performance data reveals high activity and stable operation without chemical instability. © 2017 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large-scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta-programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large-scale aerospace problems on several supercomputers. The scalability and portability of the approach are demonstrated on several parallel computers.
NASA Technical Reports Server (NTRS)
Guruswamy, Guru
2004-01-01
A procedure to accurately generate AIC using the Navier-Stokes solver including grid deformation is presented. Preliminary results show good agreement between experimental and computed flutter boundaries for a rectangular wing. A full wing-body configuration of an orbital space plane is selected for demonstration on a large number of processors. In the final paper, the AIC of the full wing-body configuration will be computed. The scalability of the procedure on a supercomputer will be demonstrated.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.
The TeraShake Computational Platform for Large-Scale Earthquake Simulations
NASA Astrophysics Data System (ADS)
Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas
Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes at increasingly large scales. To meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, well-validated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform that provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulation phases, including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available with most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM) have been widely used to solve continuum and particle motions in computational geodynamics. These mesh-free methods are suitable for problems with complex geometry and boundaries. In addition, their Lagrangian nature allows non-diffusive advection, which is useful for tracking history-dependent properties (e.g. rheology) of the material. These potential advantages over mesh-based methods offer effective numerical applications to geophysical flow and tectonic processes, for example, tsunamis with free surfaces and floating bodies, magma intrusion with rock fracture, and shear-zone pattern generation in granular deformation. In order to investigate such geodynamical problems with particle-based methods, millions to billions of particles are required for a realistic simulation. Parallel computing is therefore important for handling such a huge computational cost. An efficient parallel implementation of SPH and DEM methods is, however, known to be difficult, especially on distributed-memory architectures. Lagrangian methods inherently suffer from a workload imbalance problem when parallelized over fixed spatial domains, because particles move around and workloads change during the simulation. Dynamic load balancing is therefore a key technique for performing large-scale SPH and DEM simulations. In this work, we present a parallel implementation technique for the SPH and DEM methods utilizing dynamic load balancing algorithms, aimed at high-resolution simulations over large domains on massively parallel supercomputer systems. Our method treats the imbalance in the execution time of each MPI process as the nonlinear term of the parallel domain decomposition and minimizes it with a Newton-like iteration. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for particles with different calculation costs (e.g. boundary particles) as well as for heterogeneous computer architectures. We analyze the parallel efficiency and scalability on the supercomputer systems (K-computer, Earth simulator 3, etc.).
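A simplified sketch of how measured-time-driven slice-grid decomposition could work along one axis: treat each slice's measured wall-clock time as a piecewise-constant cost density and place new cuts at equal fractions of the cumulative cost. The damping and the Newton-like iteration of the actual method are omitted, and the function name is an assumption.

```python
import numpy as np

def rebalance_slices(boundaries, times):
    """Recompute 1D slice-grid cut positions from measured per-slice times.

    `boundaries` holds the n+1 cut positions of n slices along one axis and
    `times` the measured wall-clock time of each slice in the previous step.
    Assuming a piecewise-constant cost density within each slice, the
    cumulative cost is piecewise linear in position, so placing the new cuts
    at equal fractions of the total cost equalizes expected per-slice times.
    """
    boundaries = np.asarray(boundaries, dtype=float)
    times = np.asarray(times, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(times)])     # cumulative cost at each cut
    targets = np.linspace(0.0, times.sum(), len(boundaries))
    return np.interp(targets, cum, boundaries)          # invert the cumulative cost
```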
The scalable implementation of quantum walks using classical light
NASA Astrophysics Data System (ADS)
Goyal, Sandeep K.; Roux, F. S.; Forbes, Andrew; Konrad, Thomas
2014-02-01
A quantum walk is the quantum analog of the classical random walk. Despite their simple structure, quantum walks form a universal platform for implementing any quantum computation algorithm. However, it is very hard to realize quantum walks with a sufficient number of iterations in quantum systems due to their sensitivity to environmental influences and the subsequent loss of coherence. Here we present a scalable implementation scheme for one-dimensional quantum walks with an arbitrary number of steps using the orbital angular momentum modes of classical light beams. Furthermore, we show that, using the same setup with a minor adjustment, we can also realize electric quantum walks.
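For readers unfamiliar with the object being implemented, the short numpy sketch below simulates a discrete-time coined quantum walk on a line. It illustrates only the abstract walk; nothing about the optical orbital-angular-momentum realization is modeled.

```python
import numpy as np

def quantum_walk(steps):
    """Discrete-time coined quantum walk on a line with a Hadamard coin.
    Amplitudes are stored as psi[position, coin]; coin 0 moves left,
    coin 1 moves right.  Purely illustrative of the abstract walk."""
    n = 2 * steps + 1                       # sites reachable in `steps` steps
    psi = np.zeros((n, 2), dtype=complex)
    psi[steps, 0] = 1 / np.sqrt(2)          # symmetric initial coin state
    psi[steps, 1] = 1j / np.sqrt(2)
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    for _ in range(steps):
        psi = psi @ H.T                     # apply the coin at every site
        shifted = np.zeros_like(psi)
        shifted[:-1, 0] = psi[1:, 0]        # coin-0 amplitude steps left
        shifted[1:, 1] = psi[:-1, 1]        # coin-1 amplitude steps right
        psi = shifted
    return (np.abs(psi) ** 2).sum(axis=1)   # position probability distribution

probs = quantum_walk(50)
print("total probability:", probs.sum())                  # ~1 (unitary evolution)
print("ballistic spread :", np.sqrt(probs @ (np.arange(-50, 51) ** 2)))
```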
Scalable nuclear density functional theory with Sky3D
NASA Astrophysics Data System (ADS)
Afibuzzaman, Md; Schuetrumpf, Bastian; Aktulga, Hasan Metin
2018-02-01
In nuclear astrophysics, quantum simulations of large inhomogeneous dense systems, as they appear in the crusts of neutron stars, present major challenges. The number of particles in a simulation with periodic boundary conditions is strongly limited by the immense computational cost of the quantum methods. In this paper, we describe techniques for an efficient and scalable parallel implementation of Sky3D, a nuclear density functional theory solver that operates on an equidistant grid. The presented techniques allow Sky3D to achieve good scaling and high performance on a large number of cores, as demonstrated through detailed performance analysis on a Cray XC40 supercomputer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Barton
2014-06-30
Peta-scale computing environments pose significant challenges for both system and application developers, and addressing them required more than simply scaling up existing tera-scale solutions. Performance analysis tools play an important role in gaining the needed understanding of application behavior, but previous monolithic tools with fixed feature sets have not sufficed. Instead, this project worked on the design, implementation, and evaluation of a general, flexible tool infrastructure supporting the construction of performance tools as “pipelines” of high-quality tool building blocks. These tool building blocks provide common performance tool functionality and are designed for scalability, lightweight data acquisition and analysis, and interoperability. For this project, we built on Open|SpeedShop, a modular and extensible open source performance analysis tool set. The design and implementation of such a general and reusable infrastructure targeted at petascale systems required us to address several challenging research issues. All components needed to be designed for scale, a task made more difficult by the need to provide general modules. The infrastructure needed to support online data aggregation to cope with the large amounts of performance and debugging data. We needed to be able to map any combination of tool components to each target architecture. And we needed to design interoperable tool APIs and workflows that were concrete enough to support the required functionality, yet provide the necessary flexibility to address a wide range of tools. A major result of this project is the ability to use this scalable infrastructure to quickly create tools that match a machine architecture and a performance problem that needs to be understood. Another benefit is the ability for application engineers to use the highly scalable, interoperable version of Open|SpeedShop, which is reassembled from the tool building blocks into a flexible set of tools with a multi-user interface. This set of tools is targeted at Office of Science Leadership Class computer systems and selected Office of Science application codes. We describe the contributions made by the team at the University of Wisconsin. The project built on the efforts in Open|SpeedShop funded by DOE/NNSA and the DOE/NNSA Tri-Lab community, extended Open|SpeedShop to the Office of Science Leadership Class Computing Facilities, and addressed new challenges found on these cutting-edge systems. Work done under this project at Wisconsin can be divided into two categories: new algorithms and techniques for debugging, and foundational infrastructure work on our Dyninst binary analysis and instrumentation toolkits and our MRNet scalability infrastructure.
Semantic Framework of Internet of Things for Smart Cities: Case Studies.
Zhang, Ningyu; Chen, Huajun; Chen, Xi; Chen, Jiaoyan
2016-09-14
In recent years, the advancement of sensor technology has led to the generation of heterogeneous Internet-of-Things (IoT) data by smart cities. Thus, the development and deployment of various aspects of IoT-based applications are necessary to mine the potential value of data to the benefit of people and their lives. However, the variety, volume, heterogeneity, and real-time nature of data obtained from smart cities pose considerable challenges. In this paper, we propose a semantic framework that integrates the IoT with machine learning for smart cities. The proposed framework retrieves and models urban data for certain kinds of IoT applications based on semantic and machine-learning technologies. Moreover, we propose two case studies: pollution detection from vehicles and traffic pattern detection. The experimental results show that our system is scalable and capable of accommodating a large number of urban regions with different types of IoT applications.
In vivo generation of DNA sequence diversity for cellular barcoding
Peikon, Ian D.; Gizatullina, Diana I.; Zador, Anthony M.
2014-01-01
Heterogeneity is a ubiquitous feature of biological systems. A complete understanding of such systems requires a method for uniquely identifying and tracking individual components and their interactions with each other. We have developed a novel method of uniquely tagging individual cells in vivo with a genetic ‘barcode’ that can be recovered by DNA sequencing. Our method is a two-component system comprised of a genetic barcode cassette whose fragments are shuffled by Rci, a site-specific DNA invertase. The system is highly scalable, with the potential to generate theoretical diversities in the billions. We demonstrate the feasibility of this technique in Escherichia coli. Currently, this method could be employed to track the dynamics of populations of microbes through various bottlenecks. Advances of this method should prove useful in tracking interactions of cells within a network, and/or heterogeneity within complex biological samples. PMID:25013177
Implementation of the NAS Parallel Benchmarks in Java
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2002-01-01
Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.
Computing Gröbner and Involutive Bases for Linear Systems of Difference Equations
NASA Astrophysics Data System (ADS)
Yanovich, Denis
2018-02-01
The computation of involutive and Gröbner bases for linear systems of difference equations is addressed, and its importance for physical and mathematical problems is discussed. The algorithm and issues concerning its implementation in C are presented, and calculation times are compared with those of competing programs. The paper concludes with a discussion of the parallel version of this implementation and its scalability.
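As a point of reference for readers unfamiliar with these objects, the snippet below computes a Gröbner basis for a small polynomial ideal with SymPy. This is only loosely analogous: the paper's C implementation targets linear systems of difference equations, which SymPy's groebner routine does not handle directly.

```python
from sympy import groebner, symbols

x, y = symbols("x y")
# Toy polynomial ideal; the paper's setting (linear systems of difference
# equations) is different, but the notion of a Groebner basis is the same.
G = groebner([x**2 + y**2 - 1, x*y - 2], x, y, order="lex")
print(G)
```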
Implementation of Virtualization Oriented Architecture: A Healthcare Industry Case Study
NASA Astrophysics Data System (ADS)
Rao, G. Subrahmanya Vrk; Parthasarathi, Jinka; Karthik, Sundararaman; Rao, Gvn Appa; Ganesan, Suresh
This paper presents a Virtualization Oriented Architecture (VOA) and an implementation of VOA for Hridaya, a telemedicine initiative. A Hadoop compute cloud was established at our labs, jobs that require massive computing capability, such as ECG signal analysis, were submitted to it, and the resulting study is presented in this paper. VOA takes advantage of inexpensive community PCs and provides added advantages such as fault tolerance, scalability, performance, and high availability.
Anytime Prediction: Efficient Ensemble Methods for Any Computational Budget
2014-01-21
difficult problem and is the focus of this work. 1.1 Motivation The number of machine learning applications which involve real-time and latency-sensitive pre...significantly increasing latency, and the computational costs associated with hosting a service are often critical to its viability. For such...balancing training costs, concerns such as scalability and tractability are often more important, as opposed to factors such as latency which are more
Scalable Automated Model Search
2014-05-20
machines. Categories and Subject Descriptors Big Data [Distributed Computing]: Large scale optimization 1. INTRODUCTION Modern scientific and...from Continuum Analytics[1], and Apache Spark 0.8.1. Additionally, we made use of Hadoop 1.0.4 configured on local disks as our data store for the large...Borkar et al. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE, 2011. [16] J. Canny and H. Zhao. Big data
geoKepler Workflow Module for Computationally Scalable and Reproducible Geoprocessing and Modeling
NASA Astrophysics Data System (ADS)
Cowart, C.; Block, J.; Crawl, D.; Graham, J.; Gupta, A.; Nguyen, M.; de Callafon, R.; Smarr, L.; Altintas, I.
2015-12-01
The NSF-funded WIFIRE project has developed an open-source, online geospatial workflow platform for unifying geoprocessing tools and models for fire and other geospatially dependent modeling applications. It is a product of WIFIRE's objective to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. geoKepler includes a set of reusable GIS components, or actors, for the Kepler Scientific Workflow System (https://kepler-project.org). Actors exist for reading and writing GIS data in formats such as Shapefile, GeoJSON, KML, and using OGC web services such as WFS. The actors also allow for calling geoprocessing tools in other packages such as GDAL and GRASS. Kepler integrates functions from multiple platforms and file formats into one framework, thus enabling optimal GIS interoperability, model coupling, and scalability. Products of the GIS actors can be fed directly to models such as FARSITE and WRF. Kepler's ability to schedule and scale processes using Hadoop and Spark also makes geoprocessing ultimately extensible and computationally scalable. The reusable workflows in geoKepler can be made to run automatically when alerted by real-time environmental conditions. Here, we show breakthroughs in the speed of creating complex data for hazard assessments with this platform. We also demonstrate geoKepler workflows that use Data Assimilation to ingest real-time weather data into wildfire simulations, and data mining techniques to gain insight into environmental conditions affecting fire behavior. Existing machine learning tools and libraries such as R and MLlib are being leveraged for this purpose in Kepler, as well as Kepler's Distributed Data Parallel (DDP) capability, to provide a framework for scalable processing. geoKepler workflows can be executed via an iPython notebook as part of a Jupyter hub at UC San Diego for sharing and reporting of the scientific analysis and results from various runs of geoKepler workflows. The communication between iPython and Kepler workflow executions is established through an iPython magic function for Kepler that we have implemented. In summary, geoKepler is an ecosystem that makes geospatial processing and analysis of any kind programmable, reusable, scalable and sharable.
Predicting protein function and other biomedical characteristics with heterogeneous ensembles
Whalen, Sean; Pandey, Om Prakash
2015-01-01
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink. PMID:26342255
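As a concrete, if simplified, instance of a heterogeneous ensemble, the scikit-learn sketch below stacks three diverse base classifiers under a logistic-regression meta-learner on synthetic data. It is meant only to illustrate the stacking idea; it is not DataSink, and the base learners, dataset and metric are arbitrary choices.

```python
# Minimal illustration of a heterogeneous stacked ensemble with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

# Three deliberately different base learners capture complementary structure.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("nb", GaussianNB()),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
print("stacked ensemble AUROC:",
      cross_val_score(stack, X, y, cv=3, scoring="roc_auc").mean())
```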
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10-fold performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
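The sketch below illustrates the general idea of a block-volume decomposition: a 3D volume is tiled into sub-volumes that can be processed independently by worker processes. The block size, the thresholding kernel and the use of Python's process pool are illustrative placeholders, not the platform's actual data structures.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def iter_blocks(shape, block):
    """Yield slice tuples that tile a 3-D volume into block-sized sub-volumes.
    Edge blocks come out smaller, loosely mimicking size-adaptive blocks."""
    for z in range(0, shape[0], block[0]):
        for y in range(0, shape[1], block[1]):
            for x in range(0, shape[2], block[2]):
                yield (slice(z, min(z + block[0], shape[0])),
                       slice(y, min(y + block[1], shape[1])),
                       slice(x, min(x + block[2], shape[2])))

def process_block(args):
    """Placeholder per-block kernel (here: counting voxels above a threshold)."""
    block_data, threshold = args
    return int((block_data > threshold).sum())

if __name__ == "__main__":
    volume = np.random.rand(256, 256, 128).astype(np.float32)   # toy volume
    tasks = [(volume[s], 0.5) for s in iter_blocks(volume.shape, (64, 64, 64))]
    with ProcessPoolExecutor() as pool:
        counts = list(pool.map(process_block, tasks))
    print("voxels above threshold:", sum(counts))
```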
A Cloud-based Infrastructure and Architecture for Environmental System Research
NASA Astrophysics Data System (ADS)
Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.
2016-12-01
The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high-performance computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics services and utilities to the environmental system research community, and it will provide unique capabilities that allow terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflows in a straightforward fashion. The infrastructure will minimize large disruptions to current project-based data submission workflows for better acceptance by existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate the scalability problems of current project silos by providing unified data services and infrastructure. The infrastructure consists of two key components: (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from the environmental system research community, and (2) scalable data management services and systems, originated and developed by ORNL data centers.
Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units
NASA Astrophysics Data System (ADS)
Corral, Roque; Gisbert, Fernando; Pueblas, Jesus
2017-02-01
The implementation of an edge-based three-dimensional Reynolds-averaged Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written using a mixture of C++ and OpenCL, to allow the execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow the parallel execution of the solver on multiple GPUs. A comparative study of the solver's parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded by the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high-bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with equivalent computational power.
paraGSEA: a scalable approach for large-scale gene expression profiling
Peng, Shaoliang; Yang, Shunyun
2017-01-01
More studies have been conducted using gene expression similarity to identify functional connections among genes, diseases and drugs. Gene Set Enrichment Analysis (GSEA) is a powerful analytical method for interpreting gene expression data. However, due to the enormous computational overhead of its significance estimation and multiple hypothesis testing steps, its scalability and efficiency are poor on large-scale datasets. We propose paraGSEA for efficient large-scale transcriptome data analysis. Through optimization, the overall time complexity of paraGSEA is reduced from O(mn) to O(m+n), where m is the length of the gene sets and n is the length of the gene expression profiles, which contributes a more than 100-fold increase in performance compared with other popular GSEA implementations such as GSEA-P, SAM-GS and GSEA2. By further parallelization, a near-linear speed-up is gained on both workstations and clusters, with high scalability and performance on large-scale datasets. The analysis time of the whole LINCS phase I dataset (GSE92742) was reduced to nearly half an hour on a 1000-node cluster on Tianhe-2, or to within 120 hours on a 96-core workstation. The source code of paraGSEA is licensed under the GPLv3 and available at http://github.com/ysycloud/paraGSEA. PMID:28973463
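To make concrete what is being accelerated, the snippet below computes the classic GSEA running-sum enrichment score for one gene set, following the standard Subramanian et al. (2005) formulation. None of paraGSEA's optimizations or its permutation-based significance estimation are reproduced here.

```python
import numpy as np

def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Classic GSEA running-sum enrichment score for a single gene set.

    ranked_genes : genes ordered by correlation with the phenotype
    correlations : matching correlation values (same order)
    gene_set     : genes whose enrichment is being tested
    """
    ranked_genes = np.asarray(ranked_genes)
    corr = np.abs(np.asarray(correlations, dtype=float)) ** p
    in_set = np.isin(ranked_genes, list(gene_set))
    hit_weights = np.where(in_set, corr, 0.0)
    p_hit = np.cumsum(hit_weights) / hit_weights.sum()
    p_miss = np.cumsum(~in_set) / (len(ranked_genes) - in_set.sum())
    running = p_hit - p_miss
    return running[np.argmax(np.abs(running))]

# Toy example: genes g0..g9 ranked by correlation; the set {g0, g1, g2} sits
# at the top of the list and therefore scores close to +1.
genes = [f"g{i}" for i in range(10)]
corrs = np.linspace(1.0, -1.0, 10)
print(enrichment_score(genes, corrs, {"g0", "g1", "g2"}))
```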
2010-01-01
Background An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. Methods This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. Results A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. Conclusion WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data. PMID:21210985
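The sketch below shows, in a single process, the two per-word tasks that the toolkit's workers perform in a distributed fashion: counting the k-mers of a sequence and scoring them against a Markov-chain background model. The first-order background model and the log-ratio score used here are illustrative simplifications, and the controller/worker machinery is not reproduced.

```python
from collections import Counter
from math import log

def markov1_background(sequence):
    """First-order Markov model: P(b | a) estimated from dinucleotide counts."""
    pairs = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    single = Counter(sequence[:-1])
    return {ab: pairs[ab] / single[ab[0]] for ab in pairs}

def expected_count(word, trans, start_prob, seq_len):
    """Expected occurrences of `word` under the Markov background."""
    p = start_prob[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans.get(a + b, 1e-9)
    return p * (seq_len - len(word) + 1)

def score_words(sequence, k):
    """Log-ratio of observed vs. expected count for every observed k-mer."""
    observed = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    trans = markov1_background(sequence)
    base = Counter(sequence)
    start = {b: base[b] / len(sequence) for b in base}
    n = len(sequence)
    return {w: log(c / max(expected_count(w, trans, start, n), 1e-9))
            for w, c in observed.items()}

seq = "ACGTACGTTTTACGTACGAAATTTACGT"
scores = score_words(seq, 4)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3])
```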
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Kumbhare, Alok; Wickramaarachchi, Charith
2014-08-25
Large scale graph processing is a major research area for Big Data exploration. Vertex-centric programming models like Pregel are gaining traction due to their simple abstraction that naturally allows for scalable execution on distributed systems. However, there are limitations to this approach which cause vertex-centric algorithms to under-perform due to a poor compute-to-communication overhead ratio and the slow convergence of iterative supersteps. In this paper we introduce GoFFish, a scalable sub-graph-centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters. We introduce a sub-graph-centric programming abstraction that combines the scalability of a vertex-centric approach with the flexibility of shared-memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real world graphs and demonstrate its significant performance improvement, orders of magnitude in some cases, compared to Apache Giraph, the leading open source vertex-centric implementation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre
Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge for any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and chi-squared independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference from moment-based statistics (which we discussed in [1]), where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
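As an aside on the "robust online update formulas" mentioned above, the snippet below shows the standard pairwise combination of per-chunk counts, means, and sums of squared deviations (the Chan et al. update), which is what makes moment-based statistics embarrassingly parallel. The contingency-table machinery that is the paper's actual focus is not shown.

```python
import numpy as np

def partial_moments(chunk):
    """Map step: count, mean and sum of squared deviations for one data chunk."""
    chunk = np.asarray(chunk, dtype=float)
    mean = chunk.mean()
    return len(chunk), mean, ((chunk - mean) ** 2).sum()

def combine(a, b):
    """Reduce step: merge two (count, mean, M2) triples without seeing raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

data = np.random.default_rng(0).normal(size=100_000)
chunks = np.array_split(data, 8)                  # stand-ins for 8 processors
n, mean, m2 = partial_moments(chunks[0])
for c in chunks[1:]:
    n, mean, m2 = combine((n, mean, m2), partial_moments(c))
print(mean, m2 / (n - 1))   # matches data.mean() and data.var(ddof=1)
```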
NASA Astrophysics Data System (ADS)
Clay, M. P.; Buaria, D.; Yeung, P. K.; Gotoh, T.
2018-07-01
This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout in departure from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192³ grid points for the scalar field computed with 8192 XK7 nodes.
Three-Dimensional Wiring for Extensible Quantum Computing: The Quantum Socket
NASA Astrophysics Data System (ADS)
Béjanin, J. H.; McConkey, T. G.; Rinehart, J. R.; Earnest, C. T.; McRae, C. R. H.; Shiri, D.; Bateman, J. D.; Rohanizadegan, Y.; Penava, B.; Breul, P.; Royak, S.; Zapatka, M.; Fowler, A. G.; Mariantoni, M.
2016-10-01
Quantum computing architectures are on the verge of scalability, a key requirement for the implementation of a universal quantum computer. The next stage in this quest is the realization of quantum error-correction codes, which will mitigate the impact of faulty quantum information on a quantum computer. Architectures with ten or more quantum bits (qubits) have been realized using trapped ions and superconducting circuits. While these implementations are potentially scalable, true scalability will require systems engineering to combine quantum and classical hardware. One technology demanding imminent efforts is the realization of a suitable wiring method for the control and the measurement of a large number of qubits. In this work, we introduce an interconnect solution for solid-state qubits: the quantum socket. The quantum socket fully exploits the third dimension to connect classical electronics to qubits with higher density and better performance than two-dimensional methods based on wire bonding. The quantum socket is based on spring-mounted microwires—the three-dimensional wires—that push directly on a microfabricated chip, making electrical contact. A small wire cross section (approximately 1 mm), nearly nonmagnetic components, and functionality at low temperatures make the quantum socket ideal for operating solid-state qubits. The wires have a coaxial geometry and operate over a frequency range from dc to 8 GHz, with a contact resistance of approximately 150 mΩ, an impedance mismatch of approximately 10 Ω, and minimal cross talk. As a proof of principle, we fabricate and use a quantum socket to measure high-quality superconducting resonators at a temperature of approximately 10 mK. Quantum error-correction codes such as the surface code will largely benefit from the quantum socket, which will make it possible to address qubits located on a two-dimensional lattice. The present implementation of the socket could be readily extended to accommodate a quantum processor with a (10 × 10)-qubit lattice, which would allow for the realization of a simple quantum memory.
Astronomy In The Cloud: Using Mapreduce For Image Coaddition
NASA Astrophysics Data System (ADS)
Wiley, Keith; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.
2011-01-01
In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection, classification, and moving object tracking. Since such studies require the highest quality data, methods such as image coaddition, i.e., registration, stacking, and mosaicing, will be critical to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources, e.g., asteroids, or transient objects, e.g., supernovae, these datastreams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, i.e., platforms where Hadoop is offered as a service. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance. This work is funded by the NSF and by NASA.
Astronomy in the Cloud: Using MapReduce for Image Co-Addition
NASA Astrophysics Data System (ADS)
Wiley, K.; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.
2011-03-01
In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computation challenges such as anomaly detection and classification and moving-object tracking. Since such studies benefit from the highest-quality data, methods such as image co-addition, i.e., astrometric registration followed by per-pixel summation, will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources such as potentially hazardous asteroids or transient objects such as supernovae, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this article we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data are partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources: i.e., platforms where Hadoop is offered as a service. We report on our experience of implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multiterabyte imaging data set provides a good testbed for algorithm development, since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image co-addition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
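A toy, pure-Python stand-in for the two phases described above is given below: the map phase emits per-pixel contributions keyed by sky position, and the reduce phase combines everything that lands on the same sky pixel. Hadoop itself, astrometric registration and weighting by image quality are all omitted.

```python
from collections import defaultdict

def map_phase(image_id, pixels):
    """Map: emit (sky_pixel_key, (value, weight)) for every pixel of one image.
    Real co-addition would first apply an astrometric solution; here the images
    are assumed to be already registered on a common pixel grid."""
    for (x, y), value in pixels.items():
        yield (x, y), (value, 1.0)

def reduce_phase(emitted):
    """Reduce: weighted mean of all contributions landing on each sky pixel."""
    acc = defaultdict(lambda: [0.0, 0.0])
    for key, (value, weight) in emitted:
        acc[key][0] += value * weight
        acc[key][1] += weight
    return {key: s / w for key, (s, w) in acc.items()}

# Two toy 2x2 "exposures" of the same patch of sky.
exposures = {
    "img1": {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 9.0, (1, 1): 11.0},
    "img2": {(0, 0): 14.0, (0, 1): 10.0, (1, 0): 11.0, (1, 1): 13.0},
}
emitted = [kv for name, px in exposures.items() for kv in map_phase(name, px)]
print(reduce_phase(emitted))   # per-pixel co-added (mean) values
```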
Scalable Conjunction Processing using Spatiotemporally Indexed Ephemeris Data
NASA Astrophysics Data System (ADS)
Budianto-Ho, I.; Johnson, S.; Sivilli, R.; Alberty, C.; Scarberry, R.
2014-09-01
The collision warnings produced by the Joint Space Operations Center (JSpOC) are of critical importance in protecting U.S. and allied spacecraft against destructive collisions and protecting the lives of astronauts during space flight. As the Space Surveillance Network (SSN) improves its sensor capabilities for tracking small and dim space objects, the number of tracked objects increases from thousands to hundreds of thousands of objects, while the number of potential conjunctions increases with the square of the number of tracked objects. Classical filtering techniques such as apogee and perigee filters have proven insufficient. Novel and orders of magnitude faster conjunction analysis algorithms are required to find conjunctions in a timely manner. Stellar Science has developed innovative filtering techniques for satellite conjunction processing using spatiotemporally indexed ephemeris data that efficiently and accurately reduces the number of objects requiring high-fidelity and computationally-intensive conjunction analysis. Two such algorithms, one based on the k-d Tree pioneered in robotics applications and the other based on Spatial Hash Tables used in computer gaming and animation, use, at worst, an initial O(N log N) preprocessing pass (where N is the number of tracked objects) to build large O(N) spatial data structures that substantially reduce the required number of O(N^2) computations, substituting linear memory usage for quadratic processing time. The filters have been implemented as Open Services Gateway initiative (OSGi) plug-ins for the Continuous Anomalous Orbital Situation Discriminator (CAOS-D) conjunction analysis architecture. We have demonstrated the effectiveness, efficiency, and scalability of the techniques using a catalog of 100,000 objects, an analysis window of one day, on a 64-core computer with 1TB shared memory. Each algorithm can process the full catalog in 6 minutes or less, almost a twenty-fold performance improvement over the baseline implementation running on the same machine. We will present an overview of the algorithms and results that demonstrate the scalability of our concepts.
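The snippet below sketches the spatial-hash idea in its simplest form: objects are bucketed by a coarse 3D grid cell, and only pairs falling in the same or adjacent cells survive as candidates for fine conjunction analysis. The cell size, coordinates and object count are toy values, and unlike the CAOS-D plug-ins described above, ephemerides are not indexed in time here.

```python
from collections import defaultdict
from itertools import product
import random

CELL = 50.0  # km; the cell size should exceed the screening distance

def cell_of(pos):
    return tuple(int(c // CELL) for c in pos)

def candidate_pairs(positions):
    """Spatial-hash screening: keep only pairs whose cells are identical or
    adjacent as candidates for high-fidelity conjunction analysis."""
    buckets = defaultdict(list)
    for obj_id, pos in positions.items():
        buckets[cell_of(pos)].append(obj_id)
    pairs = set()
    for cell, members in buckets.items():
        neighbours = [tuple(c + d for c, d in zip(cell, offset))
                      for offset in product((-1, 0, 1), repeat=3)]
        nearby = [o for n in neighbours for o in buckets.get(n, [])]
        for a in members:
            for b in nearby:
                if a < b:
                    pairs.add((a, b))
    return pairs

random.seed(1)
objects = {i: [random.uniform(0, 2000) for _ in range(3)] for i in range(5000)}
pairs = candidate_pairs(objects)
print(len(pairs), "candidate pairs instead of", 5000 * 4999 // 2)
```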
Adiabatic Quantum Computation: Coherent Control Back Action.
Goswami, Debabrata
2006-11-22
Though attractive from the standpoint of scalability, optical approaches to quantum computing are highly prone to decoherence and rapid population loss due to non-radiative processes such as vibrational redistribution. We show that such effects can be reduced by adiabatic coherent control, in which quantum interference between multiple excitation pathways is used to cancel coupling to the unwanted, non-radiative channels. We focus on experimentally demonstrated adiabatic controlled population transfer experiments, wherein the details of the coherence aspects are yet to be explored theoretically but are important for quantum computation. Such quantum computing schemes also form a back-action connection to coherent control developments.
Computing Evans functions numerically via boundary-value problems
NASA Astrophysics Data System (ADS)
Barker, Blake; Nguyen, Rose; Sandstede, Björn; Ventura, Nathaniel; Wahl, Colin
2018-03-01
The Evans function has been used extensively to study spectral stability of travelling-wave solutions in spatially extended partial differential equations. To compute Evans functions numerically, several shooting methods have been developed. In this paper, an alternative scheme for the numerical computation of Evans functions is presented that relies on an appropriate boundary-value problem formulation. Convergence of the algorithm is proved, and several examples, including the computation of eigenvalues for a multi-dimensional problem, are given. The main advantage of the scheme proposed here compared with earlier methods is that the scheme is linear and scalable to large problems.
A scalable method for computing quadruplet wave-wave interactions
NASA Astrophysics Data System (ADS)
Van Vledder, Gerbrant
2017-04-01
Non-linear four-wave interactions are a key physical process in the evolution of wind-generated ocean waves. The present generation of operational wave models uses the Discrete Interaction Approximation (DIA), but its accuracy is poor. It is now generally acknowledged that the DIA should be replaced with a more accurate method to improve predicted spectral shapes and derived parameters. The search for such a method is challenging, as one should find a balance between accuracy and computational requirements. Such a method is presented here in the form of a scalable and adaptive method that can mimic both the time-consuming exact Snl4 approach and the fast but inaccurate DIA, and everything in between. The method provides an elegant approach to improving the DIA, not by including more arbitrarily shaped wave number configurations, but by a mathematically consistent reduction of an exact method, viz. the WRT method. The adaptivity consists of adapting the abscissa of the locus integrand in relation to the magnitude of the known terms. This adaptivity is extended to the highest level of the WRT method to select interacting wavenumber configurations in a hierarchical way in relation to their importance. This adaptivity results in a speed-up of one to three orders of magnitude, depending on the measure of accuracy. This measure of accuracy should not be expressed in terms of the quality of the transfer integral for academic spectra but rather in terms of wave model performance in a dynamic run. This has consequences for the balance between the required accuracy and the computational workload for evaluating these interactions. The performance of the scalable method on different scales is illustrated with results ranging from academic spectra and simple growth curves to more complicated field cases using a 3G wave model.
A comparative analysis of dynamic grids vs. virtual grids using the A3pviGrid framework.
Shankaranarayanan, Avinas; Amaldas, Christine
2010-11-01
With the proliferation of quad/multi-core microprocessors in mainstream platforms such as desktops and workstations, a large number of unused CPU cycles can be utilized for running virtual machines (VMs) as dynamic nodes in distributed environments. Grid services and their service-oriented business broker, now termed cloud computing, could deploy image-based virtualization platforms enabling agent-based resource management and dynamic fault management. In this paper we present an efficient way of utilizing heterogeneous virtual machines on idle desktops as an environment for the consumption of high-performance grid services. Spurious and exponential increases in the size of datasets are constant concerns in the medical and pharmaceutical industries due to the constant discovery and publication of large sequence databases. Traditional algorithms are not modeled for handling large data sizes under sudden and dynamic changes in the execution environment, as previously discussed. This research was undertaken to compare our previous results with those obtained by running the same test dataset on a virtual Grid platform using virtual machines (virtualization). The implemented architecture, A3pviGrid, utilizes game-theoretic optimization and agent-based team formation (coalition) algorithms to improve scalability with respect to team formation. Due to the dynamic nature of distributed systems (as discussed in our previous work), all interactions were made local within a team transparently. This paper is a proof of concept of an experimental mini-Grid test-bed compared to running the platform on local virtual machines on a local test cluster. This was done to give every agent its own execution platform, enabling anonymity and better control of the dynamic environmental parameters. We also analyze the performance and scalability of BLAST in a multiple virtual node setup and present our findings. This paper is an extension of our previous research on improving the BLAST application framework using dynamic Grids on virtualization platforms such as VirtualBox.
Clustering evolving proteins into homologous families.
Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A
2013-04-08
Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
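For readers unfamiliar with MCL, the compact numpy sketch below runs the basic Markov clustering iteration (expansion by matrix multiplication, inflation by element-wise powering, column renormalization, and pruning). Graph construction from alignment similarities, which is what the comparison above actually varies, is not shown.

```python
import numpy as np

def mcl(adjacency, inflation=2.0, iterations=50, self_loops=1.0):
    """Basic Markov clustering (MCL): expansion, inflation, renormalization.
    The input graph here is an arbitrary toy example, not one built from
    local-alignment similarities as in the study."""
    A = np.asarray(adjacency, dtype=float) + self_loops * np.eye(len(adjacency))
    M = A / A.sum(axis=0)                   # column-stochastic transition matrix
    for _ in range(iterations):
        M = M @ M                           # expansion
        M = M ** inflation                  # inflation strengthens strong flows
        M /= M.sum(axis=0)
        M[M < 1e-9] = 0.0                   # prune negligible entries
    # Non-zero rows of the converged matrix act as cluster attractors.
    return {frozenset(np.flatnonzero(row)) for row in M if row.sum() > 1e-6}

# Toy graph: a triangle {0,1,2} and an edge {3,4} joined by one weak bridge.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(mcl(A))   # expected: two clusters, {0, 1, 2} and {3, 4}
```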
Collective Framework and Performance Optimizations to Open MPI for Cray XT Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ladd, Joshua S; Gorentla Venkata, Manjunath; Shamis, Pavel
2011-01-01
The performance and scalability of collective operations play a key role in the performance and scalability of many scientific applications. Within the Open MPI code base we have developed a general purpose hierarchical collective operations framework called Cheetah, and applied it at large scale on the Oak Ridge Leadership Computing Facility's (OLCF) Jaguar platform, obtaining better performance and scalability than the native MPI implementation. This paper discusses Cheetah's design and implementation, and optimizations to the framework for Cray XT5 platforms. Our results show that Cheetah's Broadcast and Barrier perform better than the native MPI implementation. For medium data, Cheetah's Broadcast outperforms the native MPI implementation by 93% at a problem size of 49,152 processes. For small and large data, it outperforms the native MPI implementation by 10% and 9%, respectively, at a problem size of 24,576 processes. Cheetah's Barrier performs 10% better than the native MPI implementation at a problem size of 12,288 processes.
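The mpi4py sketch below illustrates the two-level structure behind hierarchical collectives of this kind: split the world communicator into per-node communicators and a leaders communicator, broadcast among the node leaders, then within each node. This is only a sketch of the general idea; Cheetah's actual implementation lives inside Open MPI and is not reproduced here.

```python
# Hedged sketch of a two-level (hierarchical) broadcast using mpi4py.
# Run with, e.g.:  mpirun -n 8 python hier_bcast.py
from mpi4py import MPI

world = MPI.COMM_WORLD

# Communicator of the ranks that share a node (shared-memory domain).
node_comm = world.Split_type(MPI.COMM_TYPE_SHARED)
is_leader = node_comm.Get_rank() == 0

# Communicator containing one leader per node (non-leaders get a separate,
# unused communicator because Split is collective over the world).
leader_comm = world.Split(0 if is_leader else 1, world.Get_rank())

data = {"payload": list(range(4))} if world.Get_rank() == 0 else None

# Stage 1: broadcast among the node leaders only.
if is_leader:
    data = leader_comm.bcast(data, root=0)

# Stage 2: each leader broadcasts to the ranks on its own node.
data = node_comm.bcast(data, root=0)

print(f"rank {world.Get_rank()} received {data}")
```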