dV/dt - Accelerating the Rate of Progress towards Extreme Scale Collaborative Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Livny, Miron
This report introduces publications that report the results of a project that aimed to design a computational framework that enables computational experimentation at scale while supporting the model of “submit locally, compute globally”. The project focuses on estimating application resource needs, finding the appropriate computing resources, acquiring those resources,deploying the applications and data on the resources, managing applications and resources during run.
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on-demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just requires a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amount of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address thesemore » challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.« less
Experience in using commercial clouds in CMS
NASA Astrophysics Data System (ADS)
Bauerdick, L.; Bockelman, B.; Dykstra, D.; Fuess, S.; Garzoglio, G.; Girone, M.; Gutsche, O.; Holzman, B.; Hufnagel, D.; Kim, H.; Kennedy, R.; Mason, D.; Spentzouris, P.; Timm, S.; Tiradani, A.; Vaandering, E.; CMS Collaboration
2017-10-01
Historically high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resource provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and cost and operational efficiency comparison to our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.
Experience in using commercial clouds in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bauerdick, L.; Bockelman, B.; Dykstra, D.
Historically high energy physics computing has been performed on large purposebuilt computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is amore » growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resource provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and cost and operational efficiency comparison to our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.« less
Performance of distributed multiscale simulations
Borgdorff, J.; Ben Belgacem, M.; Bona-Casas, C.; Fazendeiro, L.; Groen, D.; Hoenen, O.; Mizeranschi, A.; Suter, J. L.; Coster, D.; Coveney, P. V.; Dubitzky, W.; Hoekstra, A. G.; Strand, P.; Chopard, B.
2014-01-01
Multiscale simulations model phenomena across natural scales using monolithic or component-based code, running on local or distributed resources. In this work, we investigate the performance of distributed multiscale computing of component-based models, guided by six multiscale applications with different characteristics and from several disciplines. Three modes of distributed multiscale computing are identified: supplementing local dependencies with large-scale resources, load distribution over multiple resources, and load balancing of small- and large-scale resources. We find that the first mode has the apparent benefit of increasing simulation speed, and the second mode can increase simulation speed if local resources are limited. Depending on resource reservation and model coupling topology, the third mode may result in a reduction of resource consumption. PMID:24982258
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian; ...
2017-09-29
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized bothmore » local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.« less
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized bothmore » local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.« less
The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications
NASA Technical Reports Server (NTRS)
Johnston, William E.
2002-01-01
With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale, computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of the integrated environments that can combine the resources needed to support the large- scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technologies infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources. such as databases, data catalogues, and archives, and; collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios. both large and small scale.
NASA's Information Power Grid: Large Scale Distributed Computing and Data Management
NASA Technical Reports Server (NTRS)
Johnston, William E.; Vaziri, Arsi; Hinke, Tom; Tanner, Leigh Ann; Feiereisen, William J.; Thigpen, William; Tang, Harry (Technical Monitor)
2001-01-01
Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. The overall motivation for Grids is to facilitate the routine interactions of these resources in order to support large-scale science and engineering. Multi-disciplinary simulations provide a good example of a class of applications that are very likely to require aggregation of widely distributed computing, data, and intellectual resources. Such simulations - e.g. whole system aircraft simulation and whole system living cell simulation - require integrating applications and data that are developed by different teams of researchers frequently in different locations. The research team's are the only ones that have the expertise to maintain and improve the simulation code and/or the body of experimental data that drives the simulations. This results in an inherently distributed computing and data management environment.
Cognitive Model Exploration and Optimization: A New Challenge for Computational Science
2010-03-01
the generation and analysis of computational cognitive models to explain various aspects of cognition. Typically the behavior of these models...computational scale of a workstation, so we have turned to high performance computing (HPC) clusters and volunteer computing for large-scale...computational resources. The majority of applications on the Department of Defense HPC clusters focus on solving partial differential equations (Post
Consolidation of cloud computing in ATLAS
NASA Astrophysics Data System (ADS)
Taylor, Ryan P.; Domingues Cordeiro, Cristovao Jose; Giordano, Domenico; Hover, John; Kouba, Tomas; Love, Peter; McNab, Andrew; Schovancova, Jaroslava; Sobie, Randall; ATLAS Collaboration
2017-10-01
Throughout the first half of LHC Run 2, ATLAS cloud computing has undergone a period of consolidation, characterized by building upon previously established systems, with the aim of reducing operational effort, improving robustness, and reaching higher scale. This paper describes the current state of ATLAS cloud computing. Cloud activities are converging on a common contextualization approach for virtual machines, and cloud resources are sharing monitoring and service discovery components. We describe the integration of Vacuum resources, streamlined usage of the Simulation at Point 1 cloud for offline processing, extreme scaling on Amazon compute resources, and procurement of commercial cloud capacity in Europe. Finally, building on the previously established monitoring infrastructure, we have deployed a real-time monitoring and alerting platform which coalesces data from multiple sources, provides flexible visualization via customizable dashboards, and issues alerts and carries out corrective actions in response to problems.
Dynamic VM Provisioning for TORQUE in a Cloud Environment
NASA Astrophysics Data System (ADS)
Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.
2014-06-01
Cloud computing, also known as an Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.
Tools and Techniques for Measuring and Improving Grid Performance
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Frumkin, M.; Smith, W.; VanderWijngaart, R.; Wong, P.; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation provides information on NASA's geographically dispersed computing resources, and the various methods by which the disparate technologies are integrated within a nationwide computational grid. Many large-scale science and engineering projects are accomplished through the interaction of people, heterogeneous computing resources, information systems and instruments at different locations. The overall goal is to facilitate the routine interactions of these resources to reduce the time spent in design cycles, particularly for NASA's mission critical projects. The IPG (Information Power Grid) seeks to implement NASA's diverse computing resources in a fashion similar to the way in which electric power is made available.
Scaling predictive modeling in drug development with cloud computing.
Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola
2015-01-26
Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
Self-guaranteed measurement-based quantum computation
NASA Astrophysics Data System (ADS)
Hayashi, Masahito; Hajdušek, Michal
2018-05-01
In order to guarantee the output of a quantum computation, we usually assume that the component devices are trusted. However, when the total computation process is large, it is not easy to guarantee the whole system when we have scaling effects, unexpected noise, or unaccounted for correlations between several subsystems. If we do not trust the measurement basis or the prepared entangled state, we do need to be worried about such uncertainties. To this end, we propose a self-guaranteed protocol for verification of quantum computation under the scheme of measurement-based quantum computation where no prior-trusted devices (measurement basis or entangled state) are needed. The approach we present enables the implementation of verifiable quantum computation using the measurement-based model in the context of a particular instance of delegated quantum computation where the server prepares the initial computational resource and sends it to the client, who drives the computation by single-qubit measurements. Applying self-testing procedures, we are able to verify the initial resource as well as the operation of the quantum devices and hence the computation itself. The overhead of our protocol scales with the size of the initial resource state to the power of 4 times the natural logarithm of the initial state's size.
Information Power Grid Posters
NASA Technical Reports Server (NTRS)
Vaziri, Arsi
2003-01-01
This document is a summary of the accomplishments of the Information Power Grid (IPG). Grids are an emerging technology that provide seamless and uniform access to the geographically dispersed, computational, data storage, networking, instruments, and software resources needed for solving large-scale scientific and engineering problems. The goal of the NASA IPG is to use NASA's remotely located computing and data system resources to build distributed systems that can address problems that are too large or complex for a single site. The accomplishments outlined in this poster presentation are: access to distributed data, IPG heterogeneous computing, integration of large-scale computing node into distributed environment, remote access to high data rate instruments,and exploratory grid environment.
NASA Astrophysics Data System (ADS)
Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Tovar, Benjamin; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas
2017-10-01
The University of Notre Dame (ND) CMS group operates a modest-sized Tier-3 site suitable for local, final-stage analysis of CMS data. However, through the ND Center for Research Computing (CRC), Notre Dame researchers have opportunistic access to roughly 25k CPU cores of computing and a 100 Gb/s WAN network link. To understand the limits of what might be possible in this scenario, we undertook to use these resources for a wide range of CMS computing tasks from user analysis through large-scale Monte Carlo production (including both detector simulation and data reconstruction.) We will discuss the challenges inherent in effectively utilizing CRC resources for these tasks and the solutions deployed to overcome them.
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.
2015-01-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363
Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L
2015-02-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Length and area equivalents for interpreting wildland resource maps
Elliot L. Amidon; Marilyn S. Whitfield
1969-01-01
Map users must refer to an appropriate scale in interpreting wildland resource maps. Length and area equivalents for nine map scales commonly used have been computed. For each scale a 1-page table consists of map-to-ground equivalents, buffer strip or road widths, and cell dimensions required for a specified acreage. The conversion factors are stored in a Fortran...
Deelman, E.; Callaghan, S.; Field, E.; Francoeur, H.; Graves, R.; Gupta, N.; Gupta, V.; Jordan, T.H.; Kesselman, C.; Maechling, P.; Mehringer, J.; Mehta, G.; Okaya, D.; Vahi, K.; Zhao, L.
2006-01-01
This paper discusses the process of building an environment where large-scale, complex, scientific analysis can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build to the system, describe their functionality and interactions. We show the results of running the CyberShake analysis that included over 250,000 jobs using resources available through SCEC and the TeraGrid. ?? 2006 IEEE.
A distributed computing approach to mission operations support. [for spacecraft
NASA Technical Reports Server (NTRS)
Larsen, R. L.
1975-01-01
Computing mission operation support includes orbit determination, attitude processing, maneuver computation, resource scheduling, etc. The large-scale third-generation distributed computer network discussed is capable of fulfilling these dynamic requirements. It is shown that distribution of resources and control leads to increased reliability, and exhibits potential for incremental growth. Through functional specialization, a distributed system may be tuned to very specific operational requirements. Fundamental to the approach is the notion of process-to-process communication, which is effected through a high-bandwidth communications network. Both resource-sharing and load-sharing may be realized in the system.
NASA Technical Reports Server (NTRS)
Hoffer, R. M.
1974-01-01
Forestry, geology, and water resource applications were the focus of this study, which involved the use of computer-implemented pattern-recognition techniques to analyze ERTS-1 data. The results have proven the value of computer-aided analysis techniques, even in areas of mountainous terrain. Several analysis capabilities have been developed during these ERTS-1 investigations. A procedure to rotate, deskew, and geometrically scale the MSS data results in 1:24,000 scale printouts that can be directly overlayed on 7 1/2 minutes U.S.G.S. topographic maps. Several scales of computer-enhanced "false color-infrared" composites of MSS data can be obtained from a digital display unit, and emphasize the tremendous detail present in the ERTS-1 data. A grid can also be superimposed on the displayed data to aid in specifying areas of interest.
A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations
NASA Astrophysics Data System (ADS)
Demir, I.; Agliamzanov, R.
2014-12-01
Distributed volunteer computing can enable researchers and scientist to form large parallel computing environments to utilize the computing power of the millions of computers on the Internet, and use them towards running large scale environmental simulations and models to serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to native application, and utilize the power of Graphics Processing Units (GPU). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models, and run them on volunteer computers. Users will easily enable their websites for visitors to volunteer sharing their computer resources to contribute running advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational sizes. A relational database system is utilized for managing data connections and queue management for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform to enable large scale hydrological simulations and model runs in an open and integrated environment.
Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing
NASA Astrophysics Data System (ADS)
Bazhirov, Timur; Mohammadi, Mohammad; Ding, Kevin; Barabash, Sergey
Recent advances in cloud computing made it possible to access large-scale computational resources completely on-demand in a rapid and efficient manner. When combined with high fidelity simulations, they serve as an alternative pathway to enable computational discovery and design of new materials through large-scale high-throughput screening. Here, we present a case study for a cloud platform implemented at Exabyte Inc. We perform calculations to screen lightweight ternary alloys for thermodynamic stability. Due to the lack of experimental data for most such systems, we rely on theoretical approaches based on first-principle pseudopotential density functional theory. We calculate the formation energies for a set of ternary compounds approximated by special quasirandom structures. During an example run we were able to scale to 10,656 CPUs within 7 minutes from the start, and obtain results for 296 compounds within 38 hours. The results indicate that the ultimate formation enthalpy of ternary systems can be negative for some of lightweight alloys, including Li and Mg compounds. We conclude that compared to traditional capital-intensive approach that requires in on-premises hardware resources, cloud computing is agile and cost-effective, yet scalable and delivers similar performance.
Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques
DOT National Transportation Integrated Search
1995-01-01
Large scale simulations of Intelligent Transportation Systems (ITS) can only be acheived by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual datamore » system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.« less
Using Amazon's Elastic Compute Cloud to dynamically scale CMS computational resources
NASA Astrophysics Data System (ADS)
Evans, D.; Fisk, I.; Holzman, B.; Melo, A.; Metson, S.; Pordes, R.; Sheldon, P.; Tiradani, A.
2011-12-01
Large international scientific collaborations such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider have traditionally addressed their data reduction and analysis needs by building and maintaining dedicated computational infrastructure. Emerging cloud computing services such as Amazon's Elastic Compute Cloud (EC2) offer short-term CPU and storage resources with costs based on usage. These services allow experiments to purchase computing resources as needed, without significant prior planning and without long term investments in facilities and their management. We have demonstrated that services such as EC2 can successfully be integrated into the production-computing model of CMS, and find that they work very well as worker nodes. The cost-structure and transient nature of EC2 services makes them inappropriate for some CMS production services and functions. We also found that the resources are not truely "on-demand" as limits and caps on usage are imposed. Our trial workflows allow us to make a cost comparison between EC2 resources and dedicated CMS resources at a University, and conclude that it is most cost effective to purchase dedicated resources for the "base-line" needs of experiments such as CMS. However, if the ability to use cloud computing resources is built into an experiment's software framework before demand requires their use, cloud computing resources make sense for bursting during times when spikes in usage are required.
O'Donnell, Michael
2015-01-01
State-and-transition simulation modeling relies on knowledge of vegetation composition and structure (states) that describe community conditions, mechanistic feedbacks such as fire that can affect vegetation establishment, and ecological processes that drive community conditions as well as the transitions between these states. However, as the need for modeling larger and more complex landscapes increase, a more advanced awareness of computing resources becomes essential. The objectives of this study include identifying challenges of executing state-and-transition simulation models, identifying common bottlenecks of computing resources, developing a workflow and software that enable parallel processing of Monte Carlo simulations, and identifying the advantages and disadvantages of different computing resources. To address these objectives, this study used the ApexRMS® SyncroSim software and embarrassingly parallel tasks of Monte Carlo simulations on a single multicore computer and on distributed computing systems. The results demonstrated that state-and-transition simulation models scale best in distributed computing environments, such as high-throughput and high-performance computing, because these environments disseminate the workloads across many compute nodes, thereby supporting analysis of larger landscapes, higher spatial resolution vegetation products, and more complex models. Using a case study and five different computing environments, the top result (high-throughput computing versus serial computations) indicated an approximate 96.6% decrease of computing time. With a single, multicore compute node (bottom result), the computing time indicated an 81.8% decrease relative to using serial computations. These results provide insight into the tradeoffs of using different computing resources when research necessitates advanced integration of ecoinformatics incorporating large and complicated data inputs and models. - See more at: http://aimspress.com/aimses/ch/reader/view_abstract.aspx?file_no=Environ2015030&flag=1#sthash.p1XKDtF8.dpuf
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. To treat such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, graphics processing units to multicore CPUs with a fast interconnect, along with effective parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
Flexible services for the support of research.
Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John
2013-01-28
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katz, Daniel S; Jha, Shantenu; Weissman, Jon
2017-01-31
This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large-scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable and interoperablemore » distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weissman, Jon; Katz, Dan; Jha, Shantenu
2017-01-31
This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable andmore » interoperable distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.« less
NASA Astrophysics Data System (ADS)
Anderson, Delia Marie Castro
Computer literacy and use have become commonplace in our colleges and universities. In an environment that demands the use of technology, educators should be knowledgeable of the components that make up the overall computer attitude of students and be willing to investigate the processes and techniques of effective teaching and learning that can take place with computer technology. The purpose of this study is two fold. First, it investigates the relationship between computer attitudes and gender, ethnicity, and computer experience. Second, it addresses the question of whether, and to what extent, students' attitudes toward computers change over a 16 week period in an undergraduate microbiology course that supplements the traditional lecture with computer-driven assignments. Multiple regression analyses, using data from the Computer Attitudes Scale (Loyd & Loyd, 1985), showed that, in the experimental group, no significant relationships were found between computer anxiety and gender or ethnicity or between computer confidence and gender or ethnicity. However, students who used computers the longest (p = .001) and who were self-taught (p = .046) had the lowest computer anxiety levels. Likewise students who used computers the longest (p = .001) and who were self-taught (p = .041) had the highest confidence levels. No significant relationships between computer liking, usefulness, or the use of Internet resources and gender, ethnicity, or computer experience were found. Dependent T-tests were performed to determine whether computer attitude scores (pretest and posttest) increased over a 16-week period for students who had been exposed to computer-driven assignments and other Internet resources. Results showed that students in the experimental group were less anxious about working with computers and considered computers to be more useful. In the control group, no significant changes in computer anxiety, confidence, liking, or usefulness were noted. Overall, students in the experimental group, who responded to the use of Internet Resources Survey, were positive (mean of 3.4 on the 4-point scale) toward their use of Internet resources which included the online courseware developed by the researcher. Findings from this study suggest that (1) the digital divide with respect to gender and ethnicity may be narrowing, and (2) students who are exposed to a course that augments computer-driven courseware with traditional teaching methods appear to have less anxiety, have a clearer perception of computer usefulness, and feel that online resources enhance their learning.
Real science at the petascale.
Saksena, Radhika S; Boghosian, Bruce; Fazendeiro, Luis; Kenway, Owain A; Manos, Steven; Mazzeo, Marco D; Sadiq, S Kashif; Suter, James L; Wright, David; Coveney, Peter V
2009-06-28
We describe computational science research that uses petascale resources to achieve scientific results at unprecedented scales and resolution. The applications span a wide range of domains, from investigation of fundamental problems in turbulence through computational materials science research to biomedical applications at the forefront of HIV/AIDS research and cerebrovascular haemodynamics. This work was mainly performed on the US TeraGrid 'petascale' resource, Ranger, at Texas Advanced Computing Center, in the first half of 2008 when it was the largest computing system in the world available for open scientific research. We have sought to use this petascale supercomputer optimally across application domains and scales, exploiting the excellent parallel scaling performance found on up to at least 32 768 cores for certain of our codes in the so-called 'capability computing' category as well as high-throughput intermediate-scale jobs for ensemble simulations in the 32-512 core range. Furthermore, this activity provides evidence that conventional parallel programming with MPI should be successful at the petascale in the short to medium term. We also report on the parallel performance of some of our codes on up to 65 636 cores on the IBM Blue Gene/P system at the Argonne Leadership Computing Facility, which has recently been named the fastest supercomputer in the world for open science.
Short-term Temperature Prediction Using Adaptive Computing on Dynamic Scales
NASA Astrophysics Data System (ADS)
Hu, W.; Cervone, G.; Jha, S.; Balasubramanian, V.; Turilli, M.
2017-12-01
When predicting temperature, there are specific places and times when high accuracy predictions are harder. For example, not all the sub-regions in the domain require the same amount of computing resources to generate an accurate prediction. Plateau areas might require less computing resources than mountainous areas because of the steeper gradient of temperature change in the latter. However, it is difficult to estimate beforehand the optimal allocation of computational resources because several parameters play a role in determining the accuracy of the forecasts, in addition to orography. The allocation of resources to perform simulations can become a bottleneck because it requires human intervention to stop jobs or start new ones. The goal of this project is to design and develop a dynamic approach to generate short-term temperature predictions that can automatically determines the required computing resources and the geographic scales of the predictions based on the spatial and temporal uncertainties. The predictions and the prediction quality metrics are computed using a numeric weather prediction model, Analog Ensemble (AnEn), and the parallelization on high performance computing systems is accomplished using Ensemble Toolkit, one component of the RADICAL-Cybertools family of tools. RADICAL-Cybertools decouple the science needs from the computational capabilities by building an intermediate layer to run general ensemble patterns, regardless of the science. In this research, we show how the ensemble toolkit allows generating high resolution temperature forecasts at different spatial and temporal resolution. The AnEn algorithm is run using NAM analysis and forecasts data for the continental United States for a period of 2 years. AnEn results show that temperature forecasts perform well according to different probabilistic and deterministic statistical tests.
Dinov, Ivo D; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H V; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D Stott; Toga, Arthur W
2008-05-28
The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.
AGIS: The ATLAS Grid Information System
NASA Astrophysics Data System (ADS)
Anisenkov, A.; Di Girolamo, A.; Klimentov, A.; Oleynik, D.; Petrosyan, A.; Atlas Collaboration
2014-06-01
ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produced petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization and computing resources able to meet ATLAS requirements of petabytes scale data operations. In this paper we describe the ATLAS Grid Information System (AGIS), designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.
Cloud computing for genomic data analysis and collaboration.
Langmead, Ben; Nellore, Abhinav
2018-04-01
Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.
Dinov, Ivo D.; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H. V.; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D. Stott; Toga, Arthur W.
2008-01-01
The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources–data, software tools and web-services. The iTools design, implementation and resource meta - data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu. PMID:18509477
High-Throughput Computing on High-Performance Platforms: A Case Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oleynik, D; Panitkin, S; Matteo, Turilli
The computing systems used by LHC experiments has historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resource. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan -- a DOE leadership facility in conjunction with traditional distributed high- throughput computing to reach sustained production scales of approximately 52M core-hours a years. The three main contributions of this paper are: (i)more » a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.« less
New design for interfacing computers to the Octopus network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sloan, L.J.
1977-03-14
The Lawrence Livermore Laboratory has several large-scale computers which are connected to the Octopus network. Several difficulties arise in providing adequate resources along with reliable performance. To alleviate some of these problems a new method of bringing large computers into the Octopus environment is proposed.
A Grid Infrastructure for Supporting Space-based Science Operations
NASA Technical Reports Server (NTRS)
Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)
2002-01-01
Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.
Integrating Cloud-Computing-Specific Model into Aircraft Design
NASA Astrophysics Data System (ADS)
Zhimin, Tian; Qi, Lin; Guangwen, Yang
Cloud Computing is becoming increasingly relevant, as it will enable companies involved in spreading this technology to open the door to Web 3.0. In the paper, the new categories of services introduced will slowly replace many types of computational resources currently used. In this perspective, grid computing, the basic element for the large scale supply of cloud services, will play a fundamental role in defining how those services will be provided. The paper tries to integrate cloud computing specific model into aircraft design. This work has acquired good results in sharing licenses of large scale and expensive software, such as CFD (Computational Fluid Dynamics), UG, CATIA, and so on.
NASA Technical Reports Server (NTRS)
Cushman, Paula P.
1993-01-01
Research will be undertaken in this contract in the area of Modeling Resource and Facilities Enhancement to include computer, technical and educational support to NASA investigators to facilitate model implementation, execution and analysis of output; to provide facilities linking USRA and the NASA/EADS Computer System as well as resident work stations in ESAD; and to provide a centralized location for documentation, archival and dissemination of modeling information pertaining to NASA's program. Additional research will be undertaken in the area of Numerical Model Scale Interaction/Convective Parameterization Studies to include implementation of the comparison of cloud and rain systems and convective-scale processes between the model simulations and what was observed; and to incorporate the findings of these and related research findings in at least two refereed journal articles.
Task Assignment Heuristics for Distributed CFD Applications
NASA Technical Reports Server (NTRS)
Lopez-Benitez, N.; Djomehri, M. J.; Biswas, R.; Biegel, Bryan (Technical Monitor)
2001-01-01
CFD applications require high-performance computational platforms: 1. Complex physics and domain configuration demand strongly coupled solutions; 2. Applications are CPU and memory intensive; and 3. Huge resource requirements can only be satisfied by teraflop-scale machines or distributed computing.
Marine Corps Private Cloud Computing Environment Strategy
2012-05-15
leveraging economies of scale through the MCEITS PCCE, the Marine Corps will measure consumed IT resources more effectively, increase or decrease...flexible broad network access, resource pooling, elastic provisioning and measured services. By leveraging economies of scale the Marine Corps will be able...IaaS SaaS / IaaS 1 1 LCE I ACE Dets I I I I ------------------~ GIG / CJ Internet Security Boundary MCEN I DISN r :------------------ MCEN
AGIS: The ATLAS Grid Information System
NASA Astrophysics Data System (ADS)
Anisenkov, Alexey; Belov, Sergey; Di Girolamo, Alessandro; Gayazov, Stavro; Klimentov, Alexei; Oleynik, Danila; Senchenko, Alexander
2012-12-01
ATLAS is a particle physics experiment at the Large Hadron Collider at CERN. The experiment produces petabytes of data annually through simulation production and tens petabytes of data per year from the detector itself. The ATLAS Computing model embraces the Grid paradigm and a high degree of decentralization and computing resources able to meet ATLAS requirements of petabytes scale data operations. In this paper we present ATLAS Grid Information System (AGIS) designed to integrate configuration and status information about resources, services and topology of whole ATLAS Grid needed by ATLAS Distributed Computing applications and services.
Infrastructures for Distributed Computing: the case of BESIII
NASA Astrophysics Data System (ADS)
Pellegrino, J.
2018-05-01
The BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed to investigate Tau-Charm physics. Now BESIII has been running for several years and gathered more than 1PB raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based up on DIRAC and it is in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites from BESIII Collaboration from all over the world joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantages of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover new computing tools and upcoming infrastructures will be addressed.
Large Scale Computing and Storage Requirements for High Energy Physics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerber, Richard A.; Wasserman, Harvey
2010-11-24
The National Energy Research Scientific Computing Center (NERSC) is the leading scientific computing facility for the Department of Energy's Office of Science, providing high-performance computing (HPC) resources to more than 3,000 researchers working on about 400 projects. NERSC provides large-scale computing resources and, crucially, the support and expertise needed for scientists to make effective use of them. In November 2009, NERSC, DOE's Office of Advanced Scientific Computing Research (ASCR), and DOE's Office of High Energy Physics (HEP) held a workshop to characterize the HPC resources needed at NERSC to support HEP research through the next three to five years. Themore » effort is part of NERSC's legacy of anticipating users needs and deploying resources to meet those demands. The workshop revealed several key points, in addition to achieving its goal of collecting and characterizing computing requirements. The chief findings: (1) Science teams need access to a significant increase in computational resources to meet their research goals; (2) Research teams need to be able to read, write, transfer, store online, archive, analyze, and share huge volumes of data; (3) Science teams need guidance and support to implement their codes on future architectures; and (4) Projects need predictable, rapid turnaround of their computational jobs to meet mission-critical time constraints. This report expands upon these key points and includes others. It also presents a number of case studies as representative of the research conducted within HEP. Workshop participants were asked to codify their requirements in this case study format, summarizing their science goals, methods of solution, current and three-to-five year computing requirements, and software and support needs. Participants were also asked to describe their strategy for computing in the highly parallel, multi-core environment that is expected to dominate HPC architectures over the next few years. The report includes a section that describes efforts already underway or planned at NERSC that address requirements collected at the workshop. NERSC has many initiatives in progress that address key workshop findings and are aligned with NERSC's strategic plans.« less
Distributed intrusion detection system based on grid security model
NASA Astrophysics Data System (ADS)
Su, Jie; Liu, Yahui
2008-03-01
Grid computing has developed rapidly with the development of network technology and it can solve the problem of large-scale complex computing by sharing large-scale computing resource. In grid environment, we can realize a distributed and load balance intrusion detection system. This paper first discusses the security mechanism in grid computing and the function of PKI/CA in the grid security system, then gives the application of grid computing character in the distributed intrusion detection system (IDS) based on Artificial Immune System. Finally, it gives a distributed intrusion detection system based on grid security system that can reduce the processing delay and assure the detection rates.
Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores
NASA Astrophysics Data System (ADS)
Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Yannakopoulos, Anna; Tovar, Benjamin; Donnelly, Patrick; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas
2017-10-01
We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We will discuss the various challenges that have been encountered while scaling up the simultaneous CPU core utilization and the software improvements required to overcome these challenges. Categories: Workflows can now be divided into categories based on their required system resources. This allows the batch queueing system to optimize assignment of tasks to nodes with the appropriate capabilities. Within each category, limits can be specified for the number of running jobs to regulate the utilization of communication bandwidth. System resource specifications for a task category can now be modified while a project is running, avoiding the need to restart the project if resource requirements differ from the initial estimates. Lobster now implements time limits on each task category to voluntarily terminate tasks. This allows partially completed work to be recovered. Workflow dependency specification: One workflow often requires data from other workflows as input. Rather than waiting for earlier workflows to be completed before beginning later ones, Lobster now allows dependent tasks to begin as soon as sufficient input data has accumulated. Resource monitoring: Lobster utilizes a new capability in Work Queue to monitor the system resources each task requires in order to identify bottlenecks and optimally assign tasks. The capability of the Lobster opportunistic workflow management system for HEP computation has been significantly increased. We have demonstrated efficient utilization of 25 000 non-dedicated cores and achieved a data input rate of 30 Gb/s and an output rate of 500GB/h. This has required new capabilities in task categorization, workflow dependency specification, and resource monitoring.
Enabling Large-Scale Biomedical Analysis in the Cloud
Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen
2013-01-01
Recent progress in high-throughput instrumentations has led to an astonishing growth in both volume and complexity of biomedical data collected from various sources. The planet-size data brings serious challenges to the storage and computing technologies. Cloud computing is an alternative to crack the nut because it gives concurrent consideration to enable storage and high-performance computing on large-scale data. This work briefly introduces the data intensive computing system and summarizes existing cloud-based resources in bioinformatics. These developments and applications would facilitate biomedical research to make the vast amount of diversification data meaningful and usable. PMID:24288665
Managing competing elastic Grid and Cloud scientific computing applications using OpenNebula
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Berzano, D.; Lusso, S.; Masera, M.; Vallero, S.
2015-12-01
Elastic cloud computing applications, i.e. applications that automatically scale according to computing needs, work on the ideal assumption of infinite resources. While large public cloud infrastructures may be a reasonable approximation of this condition, scientific computing centres like WLCG Grid sites usually work in a saturated regime, in which applications compete for scarce resources through queues, priorities and scheduling policies, and keeping a fraction of the computing cores idle to allow for headroom is usually not an option. In our particular environment one of the applications (a WLCG Tier-2 Grid site) is much larger than all the others and cannot autoscale easily. Nevertheless, other smaller applications can benefit of automatic elasticity; the implementation of this property in our infrastructure, based on the OpenNebula cloud stack, will be described and the very first operational experiences with a small number of strategies for timely allocation and release of resources will be discussed.
Performance Evaluation of Resource Management in Cloud Computing Environments.
Batista, Bruno Guazzelli; Estrella, Julio Cezar; Ferreira, Carlos Henrique Gomes; Filho, Dionisio Machado Leite; Nakamura, Luis Hideo Vasconcelos; Reiff-Marganiec, Stephan; Santana, Marcos José; Santana, Regina Helena Carlucci
2015-01-01
Cloud computing is a computational model in which resource providers can offer on-demand services to clients in a transparent way. However, to be able to guarantee quality of service without limiting the number of accepted requests, providers must be able to dynamically manage the available resources so that they can be optimized. This dynamic resource management is not a trivial task, since it involves meeting several challenges related to workload modeling, virtualization, performance modeling, deployment and monitoring of applications on virtualized resources. This paper carries out a performance evaluation of a module for resource management in a cloud environment that includes handling available resources during execution time and ensuring the quality of service defined in the service level agreement. An analysis was conducted of different resource configurations to define which dimension of resource scaling has a real influence on client requests. The results were used to model and implement a simulated cloud system, in which the allocated resource can be changed on-the-fly, with a corresponding change in price. In this way, the proposed module seeks to satisfy both the client by ensuring quality of service, and the provider by ensuring the best use of resources at a fair price.
Performance Evaluation of Resource Management in Cloud Computing Environments
Batista, Bruno Guazzelli; Estrella, Julio Cezar; Ferreira, Carlos Henrique Gomes; Filho, Dionisio Machado Leite; Nakamura, Luis Hideo Vasconcelos; Reiff-Marganiec, Stephan; Santana, Marcos José; Santana, Regina Helena Carlucci
2015-01-01
Cloud computing is a computational model in which resource providers can offer on-demand services to clients in a transparent way. However, to be able to guarantee quality of service without limiting the number of accepted requests, providers must be able to dynamically manage the available resources so that they can be optimized. This dynamic resource management is not a trivial task, since it involves meeting several challenges related to workload modeling, virtualization, performance modeling, deployment and monitoring of applications on virtualized resources. This paper carries out a performance evaluation of a module for resource management in a cloud environment that includes handling available resources during execution time and ensuring the quality of service defined in the service level agreement. An analysis was conducted of different resource configurations to define which dimension of resource scaling has a real influence on client requests. The results were used to model and implement a simulated cloud system, in which the allocated resource can be changed on-the-fly, with a corresponding change in price. In this way, the proposed module seeks to satisfy both the client by ensuring quality of service, and the provider by ensuring the best use of resources at a fair price. PMID:26555730
Cloudbus Toolkit for Market-Oriented Cloud Computing
NASA Astrophysics Data System (ADS)
Buyya, Rajkumar; Pandey, Suraj; Vecchiola, Christian
This keynote paper: (1) presents the 21st century vision of computing and identifies various IT paradigms promising to deliver computing as a utility; (2) defines the architecture for creating market-oriented Clouds and computing atmosphere by leveraging technologies such as virtual machines; (3) provides thoughts on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain SLA-oriented resource allocation; (4) presents the work carried out as part of our new Cloud Computing initiative, called Cloudbus: (i) Aneka, a Platform as a Service software system containing SDK (Software Development Kit) for construction of Cloud applications and deployment on private or public Clouds, in addition to supporting market-oriented resource management; (ii) internetworking of Clouds for dynamic creation of federated computing environments for scaling of elastic applications; (iii) creation of 3rd party Cloud brokering services for building content delivery networks and e-Science applications and their deployment on capabilities of IaaS providers such as Amazon along with Grid mashups; (iv) CloudSim supporting modelling and simulation of Clouds for performance studies; (v) Energy Efficient Resource Allocation Mechanisms and Techniques for creation and management of Green Clouds; and (vi) pathways for future research.
Cognitive Model Exploration and Optimization: A New Challenge for Computational Science
2010-01-01
Introduction Research in cognitive science often involves the generation and analysis of computational cognitive models to explain various...HPC) clusters and volunteer computing for large-scale computational resources. The majority of applications on the Department of Defense HPC... clusters focus on solving partial differential equations (Post, 2009). These tend to be lean, fast models with little noise. While we lack specific
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
Formation of Virtual Organizations in Grids: A Game-Theoretic Approach
NASA Astrophysics Data System (ADS)
Carroll, Thomas E.; Grosu, Daniel
The execution of large scale grid applications requires the use of several computational resources owned by various Grid Service Providers (GSPs). GSPs must form Virtual Organizations (VOs) to be able to provide the composite resource to these applications. We consider grids as self-organizing systems composed of autonomous, self-interested GSPs that will organize themselves into VOs with every GSP having the objective of maximizing its profit. We formulate the resource composition among GSPs as a coalition formation problem and propose a game-theoretic framework based on cooperation structures to model it. Using this framework, we design a resource management system that supports the VO formation among GSPs in a grid computing system.
A large-scale solar dynamics observatory image dataset for computer vision applications.
Kucuk, Ahmet; Banda, Juan M; Angryk, Rafal A
2017-01-01
The National Aeronautics Space Agency (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun's activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA's solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate a wider adoption and interest from the computer vision to the solar physics community.
Lattice Boltzmann for Airframe Noise Predictions
NASA Technical Reports Server (NTRS)
Barad, Michael; Kocheemoolayil, Joseph; Kiris, Cetin
2017-01-01
Increase predictive use of High-Fidelity Computational Aero- Acoustics (CAA) capabilities for NASA's next generation aviation concepts. CFD has been utilized substantially in analysis and design for steady-state problems (RANS). Computational resources are extremely challenged for high-fidelity unsteady problems (e.g. unsteady loads, buffet boundary, jet and installation noise, fan noise, active flow control, airframe noise, etc) ü Need novel techniques for reducing the computational resources consumed by current high-fidelity CAA Need routine acoustic analysis of aircraft components at full-scale Reynolds number from first principles Need an order of magnitude reduction in wall time to solution!
Using Computing and Data Grids for Large-Scale Science and Engineering
NASA Technical Reports Server (NTRS)
Johnston, William E.
2001-01-01
We use the term "Grid" to refer to a software system that provides uniform and location independent access to geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. These emerging data and computing Grids promise to provide a highly capable and scalable environment for addressing large-scale science problems. We describe the requirements for science Grids, the resulting services and architecture of NASA's Information Power Grid (IPG) and DOE's Science Grid, and some of the scaling issues that have come up in their implementation.
Scaling up ATLAS Event Service to production levels on opportunistic computing platforms
NASA Astrophysics Data System (ADS)
Benjamin, D.; Caballero, J.; Ernst, M.; Guan, W.; Hover, J.; Lesny, D.; Maeno, T.; Nilsson, P.; Tsulaia, V.; van Gemmeren, P.; Vaniachine, A.; Wang, F.; Wenaus, T.; ATLAS Collaboration
2016-10-01
Continued growth in public cloud and HPC resources is on track to exceed the dedicated resources available for ATLAS on the WLCG. Examples of such platforms are Amazon AWS EC2 Spot Instances, Edison Cray XC30 supercomputer, backfill at Tier 2 and Tier 3 sites, opportunistic resources at the Open Science Grid (OSG), and ATLAS High Level Trigger farm between the data taking periods. Because of specific aspects of opportunistic resources such as preemptive job scheduling and data I/O, their efficient usage requires workflow innovations provided by the ATLAS Event Service. Thanks to the finer granularity of the Event Service data processing workflow, the opportunistic resources are used more efficiently. We report on our progress in scaling opportunistic resource usage to double-digit levels in ATLAS production.
A Development of Lightweight Grid Interface
NASA Astrophysics Data System (ADS)
Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.
2011-12-01
In order to help a rapid development of Grid/Cloud aware applications, we have developed API to abstract the distributed computing infrastructures based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications to access distributed computing infrastructures, such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), which is a set of command line interfaces (CLI) and APIs, aims to offer simpler API to combine several SAGA interfaces with richer functionalities. These CLIs of the UGAPI offer typical functionalities required by end users for job management and file access to the different distributed computing infrastructures as well as local computing resources. We have also built a web interface for the particle therapy simulation and demonstrated the large scale calculation using the different infrastructures at the same time. In this paper, we would like to present how the web interface based on UGAPI and SAGA achieve more efficient utilization of computing resources over the different infrastructures with technical details and practical experiences.
Computing with scale-invariant neural representations
NASA Astrophysics Data System (ADS)
Howard, Marc; Shankar, Karthik
The Weber-Fechner law is perhaps the oldest quantitative relationship in psychology. Consider the problem of the brain representing a function f (x) . Different neurons have receptive fields that support different parts of the range, such that the ith neuron has a receptive field at xi. Weber-Fechner scaling refers to the finding that the width of the receptive field scales with xi as does the difference between the centers of adjacent receptive fields. Weber-Fechner scaling is exponentially resource-conserving. Neurophysiological evidence suggests that neural representations obey Weber-Fechner scaling in the visual system and perhaps other systems as well. We describe an optimality constraint that is solved by Weber-Fechner scaling, providing an information-theoretic rationale for this principle of neural coding. Weber-Fechner scaling can be generated within a mathematical framework using the Laplace transform. Within this framework, simple computations such as translation, correlation and cross-correlation can be accomplished. This framework can in principle be extended to provide a general computational language for brain-inspired cognitive computation on scale-invariant representations. Supported by NSF PHY 1444389 and the BU Initiative for the Physics and Mathematics of Neural Systems,.
NASA Astrophysics Data System (ADS)
Moore, R. T.; Hansen, M. C.
2011-12-01
Google Earth Engine is a new technology platform that enables monitoring and measurement of changes in the earth's environment, at planetary scale, on a large catalog of earth observation data. The platform offers intrinsically-parallel computational access to thousands of computers in Google's data centers. Initial efforts have focused primarily on global forest monitoring and measurement, in support of REDD+ activities in the developing world. The intent is to put this platform into the hands of scientists and developing world nations, in order to advance the broader operational deployment of existing scientific methods, and strengthen the ability for public institutions and civil society to better understand, manage and report on the state of their natural resources. Earth Engine currently hosts online nearly the complete historical Landsat archive of L5 and L7 data collected over more than twenty-five years. Newly-collected Landsat imagery is downloaded from USGS EROS Center into Earth Engine on a daily basis. Earth Engine also includes a set of historical and current MODIS data products. The platform supports generation, on-demand, of spatial and temporal mosaics, "best-pixel" composites (for example to remove clouds and gaps in satellite imagery), as well as a variety of spectral indices. Supervised learning methods are available over the Landsat data catalog. The platform also includes a new application programming framework, or "API", that allows scientists access to these computational and data resources, to scale their current algorithms or develop new ones. Under the covers of the Google Earth Engine API is an intrinsically-parallel image-processing system. Several forest monitoring applications powered by this API are currently in development and expected to be operational in 2011. Combining science with massive data and technology resources in a cloud-computing framework can offer advantages of computational speed, ease-of-use and collaboration, as well as transparency in data and methods. Methods developed for global processing of MODIS data to map land cover are being adopted for use with Landsat data. Specifically, the MODIS Vegetation Continuous Field product methodology has been applied for mapping forest extent and change at national scales using Landsat time-series data sets. Scaling this method to continental and global scales is enabled by Google Earth Engine computing capabilities. By combining the supervised learning VCF approach with the Landsat archive and cloud computing, unprecedented monitoring of land cover dynamics is enabled.
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next Generation Sequencing is highly resource intensive. NGS Tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform provides demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Cloud Computing and Your Library
ERIC Educational Resources Information Center
Mitchell, Erik T.
2010-01-01
One of the first big shifts in how libraries manage resources was the move from print-journal purchasing models to database-subscription and electronic-journal purchasing models. Libraries found that this transition helped them scale their resources and provide better service just by thinking a bit differently about their services. Likewise,…
NASA Technical Reports Server (NTRS)
Morrell, R. A.; Odoherty, R. J.; Ramsey, H. R.; Reynolds, C. C.; Willoughby, J. K.; Working, R. D.
1975-01-01
Data and analyses related to a variety of algorithms for solving typical large-scale scheduling and resource allocation problems are presented. The capabilities and deficiencies of various alternative problem solving strategies are discussed from the viewpoint of computer system design.
AdjScales: Visualizing Differences between Adjectives for Language Learners
NASA Astrophysics Data System (ADS)
Sheinman, Vera; Tokunaga, Takenobu
In this study we introduce AdjScales, a method for scaling similar adjectives by their strength. It combines existing Web-based computational linguistic techniques in order to automatically differentiate between similar adjectives that describe the same property by strength. Though this kind of information is rarely present in most of the lexical resources and dictionaries, it may be useful for language learners that try to distinguish between similar words. Additionally, learners might gain from a simple visualization of these differences using unidimensional scales. The method is evaluated by comparison with annotation on a subset of adjectives from WordNet by four native English speakers. It is also compared against two non-native speakers of English. The collected annotation is an interesting resource in its own right. This work is a first step toward automatic differentiation of meaning between similar words for language learners. AdjScales can be useful for lexical resource enhancement.
Panchabhai, T S; Dangayach, N S; Mehta, V S; Patankar, C V; Rege, N N
2011-01-01
Computer usage capabilities of medical students for introduction of computer-aided learning have not been adequately assessed. Cross-sectional study to evaluate computer literacy among medical students. Tertiary care teaching hospital in Mumbai, India. Participants were administered a 52-question questionnaire, designed to study their background, computer resources, computer usage, activities enhancing computer skills, and attitudes toward computer-aided learning (CAL). The data was classified on the basis of sex, native place, and year of medical school, and the computer resources were compared. The computer usage and attitudes toward computer-based learning were assessed on a five-point Likert scale, to calculate Computer usage score (CUS - maximum 55, minimum 11) and Attitude score (AS - maximum 60, minimum 12). The quartile distribution among the groups with respect to the CUS and AS was compared by chi-squared tests. The correlation between CUS and AS was then tested. Eight hundred and seventy-five students agreed to participate in the study and 832 completed the questionnaire. One hundred and twenty eight questionnaires were excluded and 704 were analyzed. Outstation students had significantly lesser computer resources as compared to local students (P<0.0001). The mean CUS for local students (27.0±9.2, Mean±SD) was significantly higher than outstation students (23.2±9.05). No such difference was observed for the AS. The means of CUS and AS did not differ between males and females. The CUS and AS had positive, but weak correlations for all subgroups. The weak correlation between AS and CUS for all students could be explained by the lack of computer resources or inadequate training to use computers for learning. Providing additional resources would benefit the subset of outstation students with lesser computer resources. This weak correlation between the attitudes and practices of all students needs to be investigated. We believe that this gap can be bridged with a structured computer learning program.
Performance of VPIC on Sequoia
NASA Astrophysics Data System (ADS)
Nystrom, William
2014-10-01
Sequoia is a major DOE computing resource which is characteristic of future resources in that it has many threads per compute node, 64, and the individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on single node performance of VPIC as well as multi-node scaling.
NASA Technical Reports Server (NTRS)
Morgan, R. P.; Singh, J. P.; Rothenberg, D.; Robinson, B. E.
1975-01-01
The needs to be served, the subsectors in which the system might be used, the technology employed, and the prospects for future utilization of an educational telecommunications delivery system are described and analyzed. Educational subsectors are analyzed with emphasis on the current status and trends within each subsector. Issues which affect future development, and prospects for future use of media, technology, and large-scale electronic delivery within each subsector are included. Information on technology utilization is presented. Educational telecommunications services are identified and grouped into categories: public television and radio, instructional television, computer aided instruction, computer resource sharing, and information resource sharing. Technology based services, their current utilization, and factors which affect future development are stressed. The role of communications satellites in providing these services is discussed. Efforts to analyze and estimate future utilization of large-scale educational telecommunications are summarized. Factors which affect future utilization are identified. Conclusions are presented.
Concepts and Plans towards fast large scale Monte Carlo production for the ATLAS Experiment
NASA Astrophysics Data System (ADS)
Ritsch, E.; Atlas Collaboration
2014-06-01
The huge success of the physics program of the ATLAS experiment at the Large Hadron Collider (LHC) during Run 1 relies upon a great number of simulated Monte Carlo events. This Monte Carlo production takes the biggest part of the computing resources being in use by ATLAS as of now. In this document we describe the plans to overcome the computing resource limitations for large scale Monte Carlo production in the ATLAS Experiment for Run 2, and beyond. A number of fast detector simulation, digitization and reconstruction techniques are being discussed, based upon a new flexible detector simulation framework. To optimally benefit from these developments, a redesigned ATLAS MC production chain is presented at the end of this document.
On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers
NASA Astrophysics Data System (ADS)
Erli, G.; Fischer, F.; Fleig, G.; Giffels, M.; Hauth, T.; Quast, G.; Schnepf, M.; Heese, J.; Leppert, K.; Arnaez de Pedro, J.; Sträter, R.
2017-10-01
This contribution reports on solutions, experiences and recent developments with the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. Local resources of a physics institute are extended by private and commercial cloud sites, ranging from the inclusion of desktop clusters over institute clusters to HPC centers. Rather than relying on dedicated HEP computing centers, it is nowadays more reasonable and flexible to utilize remote computing capacity via virtualization techniques or container concepts. We report on recent experience from incorporating a remote HPC center (NEMO Cluster, Freiburg University) and resources dynamically requested from the commercial provider 1&1 Internet SE into our intitute’s computing infrastructure. The Freiburg HPC resources are requested via the standard batch system, allowing HPC and HEP applications to be executed simultaneously, such that regular batch jobs run side by side to virtual machines managed via OpenStack [1]. For the inclusion of the 1&1 commercial resources, a Python API and SDK as well as the possibility to upload images were available. Large scale tests prove the capability to serve the scientific use case in the European 1&1 datacenters. The described environment at the Institute of Experimental Nuclear Physics (IEKP) at KIT serves the needs of researchers participating in the CMS and Belle II experiments. In total, resources exceeding half a million CPU hours have been provided by remote sites.
Optimizing Resource Utilization in Grid Batch Systems
NASA Astrophysics Data System (ADS)
Gellrich, Andreas
2012-12-01
On Grid sites, the requirements of the computing tasks (jobs) to computing, storage, and network resources differ widely. For instance Monte Carlo production jobs are almost purely CPU-bound, whereas physics analysis jobs demand high data rates. In order to optimize the utilization of the compute node resources, jobs must be distributed intelligently over the nodes. Although the job resource requirements cannot be deduced directly, jobs are mapped to POSIX UID/GID according to the VO, VOMS group and role information contained in the VOMS proxy. The UID/GID then allows to distinguish jobs, if users are using VOMS proxies as planned by the VO management, e.g. ‘role=production’ for Monte Carlo jobs. It is possible to setup and configure batch systems (queuing system and scheduler) at Grid sites based on these considerations although scaling limits were observed with the scheduler MAUI. In tests these limitations could be overcome with a home-made scheduler.
Requirements for fault-tolerant factoring on an atom-optics quantum computer.
Devitt, Simon J; Stephens, Ashley M; Munro, William J; Nemoto, Kae
2013-01-01
Quantum information processing and its associated technologies have reached a pivotal stage in their development, with many experiments having established the basic building blocks. Moving forward, the challenge is to scale up to larger machines capable of performing computational tasks not possible today. This raises questions that need to be urgently addressed, such as what resources these machines will consume and how large will they be. Here we estimate the resources required to execute Shor's factoring algorithm on an atom-optics quantum computer architecture. We determine the runtime and size of the computer as a function of the problem size and physical error rate. Our results suggest that once the physical error rate is low enough to allow quantum error correction, optimization to reduce resources and increase performance will come mostly from integrating algorithms and circuits within the error correction environment, rather than from improving the physical hardware.
Cloudbursting - Solving the 3-body problem
NASA Astrophysics Data System (ADS)
Chang, G.; Heistand, S.; Vakhnin, A.; Huang, T.; Zimdars, P.; Hua, H.; Hood, R.; Koenig, J.; Mehrotra, P.; Little, M. M.; Law, E.
2014-12-01
Many science projects in the future will be accomplished through collaboration among 2 or more NASA centers along with, potentially, external scientists. Science teams will be composed of more geographically dispersed individuals and groups. However, the current computing environment does not make this easy and seamless. By being able to share computing resources among members of a multi-center team working on a science/ engineering project, limited pre-competition funds could be more efficiently applied and technical work could be conducted more effectively with less time spent moving data or waiting for computing resources to free up. Based on the work from an NASA CIO IT Labs task, this presentation will highlight our prototype work in identifying the feasibility and identify the obstacles, both technical and management, to perform "Cloudbursting" among private clouds located at three different centers. We will demonstrate the use of private cloud computing infrastructure at the Jet Propulsion Laboratory, Langley Research Center, and Ames Research Center to provide elastic computation to each other to perform parallel Earth Science data imaging. We leverage elastic load balancing and auto-scaling features at each data center so that each location can independently define how many resources to allocate to a particular job that was "bursted" from another data center and demonstrate that compute capacity scales up and down with the job. We will also discuss future work in the area, which could include the use of cloud infrastructure from different cloud framework providers as well as other cloud service providers.
Connecting Performance Analysis and Visualization to Advance Extreme Scale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bremer, Peer-Timo; Mohr, Bernd; Schulz, Martin
2015-07-29
The characterization, modeling, analysis, and tuning of software performance has been a central topic in High Performance Computing (HPC) since its early beginnings. The overall goal is to make HPC software run faster on particular hardware, either through better scheduling, on-node resource utilization, or more efficient distributed communication.
Harb, Omar S; Roos, David S
2015-01-01
Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.
2011-01-01
Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105
Massive Cloud-Based Big Data Processing for Ocean Sensor Networks and Remote Sensing
NASA Astrophysics Data System (ADS)
Schwehr, K. D.
2017-12-01
Until recently, the work required to integrate and analyze data for global-scale environmental issues was prohibitive both in cost and availability. Traditional desktop processing systems are not able to effectively store and process all the data, and super computer solutions are financially out of the reach of most people. The availability of large-scale cloud computing has created tools that are usable by small groups and individuals regardless of financial resources or locally available computational resources. These systems give scientists and policymakers the ability to see how critical resources are being used across the globe with little or no barrier to entry. Google Earth Engine has the Moderate Resolution Imaging Spectroradiometer (MODIS) Terra, MODIS Aqua, and Global Land Data Assimilation Systems (GLDAS) data catalogs available live online. Here we demonstrate these data to calculate the correlation between lagged chlorophyll and rainfall to identify areas of eutrophication, matching these events to ocean currents from datasets like HYbrid Coordinate Ocean Model (HYCOM) to check if there are constraints from oceanographic configurations. The system can provide addition ground truth with observations from sensor networks like the International Comprehensive Ocean-Atmosphere Data Set / Voluntary Observing Ship (ICOADS/VOS) and Argo floats. This presentation is intended to introduce users to the datasets, programming idioms, and functionality of Earth Engine for large-scale, data-driven oceanography.
AGIS: Evolution of Distributed Computing information system for ATLAS
NASA Astrophysics Data System (ADS)
Anisenkov, A.; Di Girolamo, A.; Alandes, M.; Karavakis, E.
2015-12-01
ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization of computing resources in order to meet the ATLAS requirements of petabytes scale data operations. It has been evolved after the first period of LHC data taking (Run-1) in order to cope with new challenges of the upcoming Run- 2. In this paper we describe the evolution and recent developments of the ATLAS Grid Information System (AGIS), developed in order to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.
A characterization of workflow management systems for extreme-scale applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia
We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
A characterization of workflow management systems for extreme-scale applications
Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...
2017-02-16
We present that the automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compellingmore » case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.« less
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.
Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics
Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and to provide easy reproducibility by making the datasets and computational methods easily available.
Progress in Machine Learning Studies for the CMS Computing Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo
Here, computing systems for LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth to mention here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.
The Influence of Large-Scale Computing on Aircraft Structural Design.
1986-04-01
the customer in the most cost- effective manner. Computer facility organizations became computer resource power brokers. A good data processing...capabilities generated on other processors can be easily used. This approach is easily implementable and provides a good strategy for using existing...assistance to member nations for the purpose of increasing their scientific and technical potential; - Recommending effective ways for the member nations to
Progress in Machine Learning Studies for the CMS Computing Infrastructure
Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo; ...
2017-12-06
Here, computing systems for LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth to mention here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.
A Review Study on Cloud Computing Issues
NASA Astrophysics Data System (ADS)
Kanaan Kadhim, Qusay; Yusof, Robiah; Sadeq Mahdi, Hamid; Al-shami, Sayed Samer Ali; Rahayu Selamat, Siti
2018-05-01
Cloud computing is the most promising current implementation of utility computing in the business world, because it provides some key features over classic utility computing, such as elasticity to allow clients dynamically scale-up and scale-down the resources in execution time. Nevertheless, cloud computing is still in its premature stage and experiences lack of standardization. The security issues are the main challenges to cloud computing adoption. Thus, critical industries such as government organizations (ministries) are reluctant to trust cloud computing due to the fear of losing their sensitive data, as it resides on the cloud with no knowledge of data location and lack of transparency of Cloud Service Providers (CSPs) mechanisms used to secure their data and applications which have created a barrier against adopting this agile computing paradigm. This study aims to review and classify the issues that surround the implementation of cloud computing which a hot area that needs to be addressed by future research.
Job Superscheduler Architecture and Performance in Computational Grid Environments
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Oliker, Leonid; Biswas, Rupak
2003-01-01
Computational grids hold great promise in utilizing geographically separated heterogeneous resources to solve large-scale complex scientific problems. However, a number of major technical hurdles, including distributed resource management and effective job scheduling, stand in the way of realizing these gains. In this paper, we propose a novel grid superscheduler architecture and three distributed job migration algorithms. We also model the critical interaction between the superscheduler and autonomous local schedulers. Extensive performance comparisons with ideal, central, and local schemes using real workloads from leading computational centers are conducted in a simulation environment. Additionally, synthetic workloads are used to perform a detailed sensitivity analysis of our superscheduler. Several key metrics demonstrate that substantial performance gains can be achieved via smart superscheduling in distributed computational grids.
NASA Astrophysics Data System (ADS)
Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua
2017-12-01
Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we proposed a stream tilling approach to surface area estimation that first decomposed a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process was broken. Then, we realized a streaming framework towards the scheduling of the I/O processes and computing units. Herein, each computing unit encapsulated a same copy of the estimation algorithm, and multiple asynchronous computing units could work individually in parallel. Finally, the performed experiment demonstrated that our stream tilling estimation can efficiently alleviate the heavy pressures from the I/O-bound work, and the measured speedup after being optimized have greatly outperformed the directly parallel versions in shared memory systems with multi-core processors.
High Performance Geostatistical Modeling of Biospheric Resources
NASA Astrophysics Data System (ADS)
Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.
2004-12-01
We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostastics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Xia, Yong; Zhang, Henggui
2015-01-01
Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU.
Xia, Yong; Wang, Kuanquan; Zhang, Henggui
2015-01-01
Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.
Design Tools for Evaluating Multiprocessor Programs
1976-07-01
than large uniprocessing machines, and 2. economies of scale in manufacturing. Perhaps the most compelling reason (possibly a consequence of the...speed, redundancy, (inefficiency, resource utilization, and economies of the components. [Browne 73, Lehman 66] 6. How can the system be scheduled...mejsures are interesting about the computation? Somn may be: speed, redundancy, (inefficiency, resource utilization, and economies of the components
The Next Frontier in Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarrao, John
2016-11-16
Exascale computing refers to computing systems capable of at least one exaflop or a billion calculations per second (1018). That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.
Use of Emerging Grid Computing Technologies for the Analysis of LIGO Data
NASA Astrophysics Data System (ADS)
Koranda, Scott
2004-03-01
The LIGO Scientific Collaboration (LSC) today faces the challenge of enabling analysis of terabytes of LIGO data by hundreds of scientists from institutions all around the world. To meet this challenge the LSC is developing tools, infrastructure, applications, and expertise leveraging Grid Computing technologies available today, and making available to LSC scientists compute resources at sites across the United States and Europe. We use digital credentials for strong and secure authentication and authorization to compute resources and data. Building on top of products from the Globus project for high-speed data transfer and information discovery we have created the Lightweight Data Replicator (LDR) to securely and robustly replicate data to resource sites. We have deployed at our computing sites the Virtual Data Toolkit (VDT) Server and Client packages, developed in collaboration with our partners in the GriPhyN and iVDGL projects, providing uniform access to distributed resources for users and their applications. Taken together these Grid Computing technologies and infrastructure have formed the LSC DataGrid--a coherent and uniform environment across two continents for the analysis of gravitational-wave detector data. Much work, however, remains in order to scale current analyses and recent lessons learned need to be integrated into the next generation of Grid middleware.
The Application of Large-Scale Hypermedia Information Systems to Training.
ERIC Educational Resources Information Center
Crowder, Richard; And Others
1995-01-01
Discusses the use of hypermedia in electronic information systems that support maintenance operations in large-scale industrial plants. Findings show that after establishing an information system, the same resource base can be used to train personnel how to use the computer system and how to perform operational and maintenance tasks. (Author/JMV)
Computational Science at the Argonne Leadership Computing Facility
NASA Astrophysics Data System (ADS)
Romero, Nichols
2014-03-01
The goal of the Argonne Leadership Computing Facility (ALCF) is to extend the frontiers of science by solving problems that require innovative approaches and the largest-scale computing systems. ALCF's most powerful computer - Mira, an IBM Blue Gene/Q system - has nearly one million cores. How does one program such systems? What software tools are available? Which scientific and engineering applications are able to utilize such levels of parallelism? This talk will address these questions and describe a sampling of projects that are using ALCF systems in their research, including ones in nanoscience, materials science, and chemistry. Finally, the ways to gain access to ALCF resources will be presented. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.
Translational bioinformatics in the cloud: an affordable alternative
2010-01-01
With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073
The diverse use of clouds by CMS
Andronis, Anastasios; Bauer, Daniela; Chaze, Olivier; ...
2015-12-23
The resources CMS is using are increasingly being offered as clouds. In Run 2 of the LHC the majority of CMS CERN resources, both in Meyrin and at the Wigner Computing Centre, will be presented as cloud resources on which CMS will have to build its own infrastructure. This infrastructure will need to run all of the CMS workflows including: Tier 0, production and user analysis. In addition, the CMS High Level Trigger will provide a compute resource comparable in scale to the total offered by the CMS Tier 1 sites, when it is not running as part of themore » trigger system. During these periods a cloud infrastructure will be overlaid on this resource, making it accessible for general CMS use. Finally, CMS is starting to utilise cloud resources being offered by individual institutes and is gaining experience to facilitate the use of opportunistically available cloud resources. Lastly, we present a snap shot of this infrastructure and its operation at the time of the CHEP2015 conference.« less
Wong, William W L; Feng, Zeny Z; Thein, Hla-Hla
2016-11-01
Agent-based models (ABMs) are computer simulation models that define interactions among agents and simulate emergent behaviors that arise from the ensemble of local decisions. ABMs have been increasingly used to examine trends in infectious disease epidemiology. However, the main limitation of ABMs is the high computational cost for a large-scale simulation. To improve the computational efficiency for large-scale ABM simulations, we built a parallelizable sliding region algorithm (SRA) for ABM and compared it to a nonparallelizable ABM. We developed a complex agent network and performed two simulations to model hepatitis C epidemics based on the real demographic data from Saskatchewan, Canada. The first simulation used the SRA that processed on each postal code subregion subsequently. The second simulation processed the entire population simultaneously. It was concluded that the parallelizable SRA showed computational time saving with comparable results in a province-wide simulation. Using the same method, SRA can be generalized for performing a country-wide simulation. Thus, this parallel algorithm enables the possibility of using ABM for large-scale simulation with limited computational resources.
Challenges in scaling NLO generators to leadership computers
NASA Astrophysics Data System (ADS)
Benjamin, D.; Childers, JT; Hoeche, S.; LeCompte, T.; Uram, T.
2017-10-01
Exascale computing resources are roughly a decade away and will be capable of 100 times more computing than current supercomputers. In the last year, Energy Frontier experiments crossed a milestone of 100 million core-hours used at the Argonne Leadership Computing Facility, Oak Ridge Leadership Computing Facility, and NERSC. The Fortran-based leading-order parton generator called Alpgen was successfully scaled to millions of threads to achieve this level of usage on Mira. Sherpa and MadGraph are next-to-leading order generators used heavily by LHC experiments for simulation. Integration times for high-multiplicity or rare processes can take a week or more on standard Grid machines, even using all 16-cores. We will describe our ongoing work to scale the Sherpa generator to thousands of threads on leadership-class machines and reduce run-times to less than a day. This work allows the experiments to leverage large-scale parallel supercomputers for event generation today, freeing tens of millions of grid hours for other work, and paving the way for future applications (simulation, reconstruction) on these and future supercomputers.
Large-scale virtual screening on public cloud resources with Apache Spark.
Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
2017-01-01
Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive, however it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against [Formula: see text]2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).Graphical abstract.
Users matter : multi-agent systems model of high performance computing cluster users.
DOE Office of Scientific and Technical Information (OSTI.GOV)
North, M. J.; Hood, C. S.; Decision and Information Sciences
2005-01-01
High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite their well-specified components, the aggregate behavior of clusters is poorly understood. The difficulties arise from complicated interactions between cluster components during operation. These interactions have been studied by many researchers, some of whom have identified the need for holistic multi-scale modeling that simultaneously includes network level, operating system level, process level, and user level behaviors. Each of these levels presents its own modeling challenges, but the user level is the most complex duemore » to the adaptability of human beings. In this vein, there are several major user modeling goals, namely descriptive modeling, predictive modeling and automated weakness discovery. This study shows how multi-agent techniques were used to simulate a large-scale computing cluster at each of these levels.« less
Why build a virtual brain? Large-scale neural simulations as jump start for cognitive computing
NASA Astrophysics Data System (ADS)
Colombo, Matteo
2017-03-01
Despite the impressive amount of financial resources recently invested in carrying out large-scale brain simulations, it is controversial what the pay-offs are of pursuing this project. One idea is that from designing, building, and running a large-scale neural simulation, scientists acquire knowledge about the computational performance of the simulating system, rather than about the neurobiological system represented in the simulation. It has been claimed that this knowledge may usher in a new era of neuromorphic, cognitive computing systems. This study elucidates this claim and argues that the main challenge this era is facing is not the lack of biological realism. The challenge lies in identifying general neurocomputational principles for the design of artificial systems, which could display the robust flexibility characteristic of biological intelligence.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-09-25
The Megatux platform enables the emulation of large scale (multi-million node) distributed systems. In particular, it allows for the emulation of large-scale networks interconnecting a very large number of emulated computer systems. It does this by leveraging virtualization and associated technologies to allow hundreds of virtual computers to be hosted on a single moderately sized server or workstation. Virtualization technology provided by modern processors allows for multiple guest OSs to run at the same time, sharing the hardware resources. The Megatux platform can be deployed on a single PC, a small cluster of a few boxes or a large clustermore » of computers. With a modest cluster, the Megatux platform can emulate complex organizational networks. By using virtualization, we emulate the hardware, but run actual software enabling large scale without sacrificing fidelity.« less
Scientific Services on the Cloud
NASA Astrophysics Data System (ADS)
Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong
Scientific Computing was one of the first every applications for parallel and distributed computation. To this date, scientific applications remain some of the most compute intensive, and have inspired creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administrated by a third party. Software abstraction and virtualization provide reliability, and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part of the scientific computing initiative.
LSST Resources for the Community
NASA Astrophysics Data System (ADS)
Jones, R. Lynne
2011-01-01
LSST will generate 100 petabytes of images and 20 petabytes of catalogs, covering 18,000-20,000 square degrees of area sampled every few days, throughout a total of ten years of time -- all publicly available and exquisitely calibrated. The primary access to this data will be through Data Access Centers (DACs). DACs will provide access to catalogs of sources (single detections from individual images) and objects (associations of sources from multiple images). Simple user interfaces or direct SQL queries at the DAC can return user-specified portions of data from catalogs or images. More complex manipulations of the data, such as calculating multi-point correlation functions or creating alternative photo-z measurements on terabyte-scale data, can be completed with the DAC's own resources. Even more data-intensive computations requiring access to large numbers of image pixels on petabyte-scale could also be conducted at the DAC, using compute resources allocated in a similar manner to a TAC. DAC resources will be available to all individuals in member countries or institutes and LSST science collaborations. DACs will also assist investigators with requests for allocations at national facilities such as the Petascale Computing Facility, TeraGrid, and Open Science Grid. Using data on this scale requires new approaches to accessibility and analysis which are being developed through interactions with the LSST Science Collaborations. We are producing simulated images (as might be acquired by LSST) based on models of the universe and generating catalogs from these images (as well as from the base model) using the LSST data management framework in a series of data challenges. The resulting images and catalogs are being made available to the science collaborations to verify the algorithms and develop user interfaces. All LSST software is open source and available online, including preliminary catalog formats. We encourage feedback from the community.
Design for Run-Time Monitor on Cloud Computing
NASA Astrophysics Data System (ADS)
Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is the type of a parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation as well as underlying hardware through performance counter optimizing its computing configuration based on the analyzed data.
Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola
2015-01-01
New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
Job Management and Task Bundling
NASA Astrophysics Data System (ADS)
Berkowitz, Evan; Jansen, Gustav R.; McElvain, Kenneth; Walker-Loud, André
2018-03-01
High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, that can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
The Next Frontier in Computing
Sarrao, John
2018-06-13
Exascale computing refers to computing systems capable of at least one exaflop or a billion calculations per second (1018). That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of todayâs most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.
Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan
2017-01-01
This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325
Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan
2017-08-04
This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.
Cotes-Ruiz, Iván Tomás; Prado, Rocío P.; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás
2017-01-01
Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the consumed power of their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center and of special relevance is the adaptation of its performance to workload. Intensive computing applications in diverse areas of science generate complex workload called workflows, whose successful management in terms of energy saving is still at its beginning. WorkflowSim is currently one of the most advanced simulators for research on workflows processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new management strategies in energy saving considering computing, reconfiguration and networks costs as well as quality of service, and it incorporates the preeminent strategy for on host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resources utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as overlapped mechanism to the DVFS intra-host technique. PMID:28085932
Cotes-Ruiz, Iván Tomás; Prado, Rocío P; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás
2017-01-01
Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the consumed power of their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center and of special relevance is the adaptation of its performance to workload. Intensive computing applications in diverse areas of science generate complex workload called workflows, whose successful management in terms of energy saving is still at its beginning. WorkflowSim is currently one of the most advanced simulators for research on workflows processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new management strategies in energy saving considering computing, reconfiguration and networks costs as well as quality of service, and it incorporates the preeminent strategy for on host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resources utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as overlapped mechanism to the DVFS intra-host technique.
An Adaptive Priority Tuning System for Optimized Local CPU Scheduling using BOINC Clients
NASA Astrophysics Data System (ADS)
Mnaouer, Adel B.; Ragoonath, Colin
2010-11-01
Volunteer Computing (VC) is a Distributed Computing model which utilizes idle CPU cycles from computing resources donated by volunteers who are connected through the Internet to form a very large-scale, loosely coupled High Performance Computing environment. Distributed Volunteer Computing environments such as the BOINC framework is concerned mainly with the efficient scheduling of the available resources to the applications which require them. The BOINC framework thus contains a number of scheduling policies/algorithms both on the server-side and on the client which work together to maximize the available resources and to provide a degree of QoS in an environment which is highly volatile. This paper focuses on the BOINC client and introduces an adaptive priority tuning client side middleware application which improves the execution times of Work Units (WUs) while maintaining an acceptable Maximum Response Time (MRT) for the end user. We have conducted extensive experimentation of the proposed system and the results show clear speedup of BOINC applications using our optimized middleware as opposed to running using the original BOINC client.
Findings and Challenges in Fine-Resolution Large-Scale Hydrological Modeling
NASA Astrophysics Data System (ADS)
Her, Y. G.
2017-12-01
Fine-resolution large-scale (FL) modeling can provide the overall picture of the hydrological cycle and transport while taking into account unique local conditions in the simulation. It can also help develop water resources management plans consistent across spatial scales by describing the spatial consequences of decisions and hydrological events extensively. FL modeling is expected to be common in the near future as global-scale remotely sensed data are emerging, and computing resources have been advanced rapidly. There are several spatially distributed models available for hydrological analyses. Some of them rely on numerical methods such as finite difference/element methods (FDM/FEM), which require excessive computing resources (implicit scheme) to manipulate large matrices or small simulation time intervals (explicit scheme) to maintain the stability of the solution, to describe two-dimensional overland processes. Others make unrealistic assumptions such as constant overland flow velocity to reduce the computational loads of the simulation. Thus, simulation efficiency often comes at the expense of precision and reliability in FL modeling. Here, we introduce a new FL continuous hydrological model and its application to four watersheds in different landscapes and sizes from 3.5 km2 to 2,800 km2 at the spatial resolution of 30 m on an hourly basis. The model provided acceptable accuracy statistics in reproducing hydrological observations made in the watersheds. The modeling outputs including the maps of simulated travel time, runoff depth, soil water content, and groundwater recharge, were animated, visualizing the dynamics of hydrological processes occurring in the watersheds during and between storm events. Findings and challenges were discussed in the context of modeling efficiency, accuracy, and reproducibility, which we found can be improved by employing advanced computing techniques and hydrological understandings, by using remotely sensed hydrological observations such as soil moisture and radar rainfall depth and by sharing the model and its codes in public domain, respectively.
NASA Technical Reports Server (NTRS)
1977-01-01
Integrated set of manual procedures, computer programs, and graphic devices processes multispectral scanner data from orbiting Landsat into precisely registered and formatted maps of surface water and other resources at variety of scales, sheet formats, and tick intervals.
NASA Astrophysics Data System (ADS)
Burnett, W.
2016-12-01
The Department of Defense's (DoD) High Performance Computing Modernization Program (HPCMP) provides high performance computing to address the most significant challenges in computational resources, software application support and nationwide research and engineering networks. Today, the HPCMP has a critical role in ensuring the National Earth System Prediction Capability (N-ESPC) achieves initial operational status in 2019. A 2015 study commissioned by the HPCMP found that N-ESPC computational requirements will exceed interconnect bandwidth capacity due to the additional load from data assimilation and passing connecting data between ensemble codes. Memory bandwidth and I/O bandwidth will continue to be significant bottlenecks for the Navy's Hybrid Coordinate Ocean Model (HYCOM) scalability - by far the major driver of computing resource requirements in the N-ESPC. The study also found that few of the N-ESPC model developers have detailed plans to ensure their respective codes scale through 2024. Three HPCMP initiatives are designed to directly address and support these issues: Productivity Enhancement, Technology, Transfer and Training (PETTT), the HPCMP Applications Software Initiative (HASI), and Frontier Projects. PETTT supports code conversion by providing assistance, expertise and training in scalable and high-end computing architectures. HASI addresses the continuing need for modern application software that executes effectively and efficiently on next-generation high-performance computers. Frontier Projects enable research and development that could not be achieved using typical HPCMP resources by providing multi-disciplinary teams access to exceptional amounts of high performance computing resources. Finally, the Navy's DoD Supercomputing Resource Center (DSRC) currently operates a 6 Petabyte system, of which Naval Oceanography receives 15% of operational computational system use, or approximately 1 Petabyte of the processing capability. The DSRC will provide the DoD with future computing assets to initially operate the N-ESPC in 2019. This talk will further describe how DoD's HPCMP will ensure N-ESPC becomes operational, efficiently and effectively, using next-generation high performance computing.
Mars rover local navigation and hazard avoidance
NASA Technical Reports Server (NTRS)
Wilcox, B. H.; Gennery, D. B.; Mishkin, A. H.
1989-01-01
A Mars rover sample return mission has been proposed for the late 1990's. Due to the long speed-of-light delays between earth and Mars, some autonomy on the rover is highly desirable. JPL has been conducting research in two possible modes of rover operation, Computer-Aided Remote Driving and Semiautonomous Navigation. A recently-completed research program used a half-scale testbed vehicle to explore several of the concepts in semiautonomous navigation. A new, full-scale vehicle with all computational and power resources on-board will be used in the coming year to demonstrate relatively fast semiautonomous navigation. The computational and power requirements for Mars rover local navigation and hazard avoidance are discussed.
Mars Rover Local Navigation And Hazard Avoidance
NASA Astrophysics Data System (ADS)
Wilcox, B. H.; Gennery, D. B.; Mishkin, A. H.
1989-03-01
A Mars rover sample return mission has been proposed for the late 1990's. Due to the long speed-of-light delays between Earth and Mars, some autonomy on the rover is highly desirable. JPL has been conducting research in two possible modes of rover operation, Computer-Aided Remote Driving and Semiautonomous Navigation. A recently-completed research program used a half-scale testbed vehicle to explore several of the concepts in semiautonomous navigation. A new, full-scale vehicle with all computational and power resources on-board will be used in the coming year to demonstrate relatively fast semiautonomous navigation. The computational and power requirements for Mars rover local navigation and hazard avoidance are discussed.
Production experience with the ATLAS Event Service
NASA Astrophysics Data System (ADS)
Benjamin, D.; Calafiura, P.; Childers, T.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The ATLAS Event Service (AES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real time delivery of fine grained workloads to running payload applications which process dispatched events or event ranges and immediately stream the outputs to highly scalable Object Stores. Thanks to its agile and flexible architecture the AES is currently being used by grid sites for assigning low priority workloads to otherwise idle computing resources; similarly harvesting HPC resources in an efficient back-fill mode; and massively scaling out to the 50-100k concurrent core level on the Amazon spot market to efficiently utilize those transient resources for peak production needs. Platform ports in development include ATLAS@Home (BOINC) and the Google Compute Engine, and a growing number of HPC platforms. After briefly reviewing the concept and the architecture of the Event Service, we will report the status and experience gained in AES commissioning and production operations on supercomputers, and our plans for extending ES application beyond Geant4 simulation to other workflows, such as reconstruction and data analysis.
Techniques and resources for storm-scale numerical weather prediction
NASA Technical Reports Server (NTRS)
Droegemeier, Kelvin; Grell, Georg; Doyle, James; Soong, Su-Tzai; Skamarock, William; Bacon, David; Staniforth, Andrew; Crook, Andrew; Wilhelmson, Robert
1993-01-01
The topics discussed include the following: multiscale application of the 5th-generation PSU/NCAR mesoscale model, the coupling of nonhydrostatic atmospheric and hydrostatic ocean models for air-sea interaction studies; a numerical simulation of cloud formation over complex topography; adaptive grid simulations of convection; an unstructured grid, nonhydrostatic meso/cloud scale model; efficient mesoscale modeling for multiple scales using variable resolution; initialization of cloud-scale models with Doppler radar data; and making effective use of future computing architectures, networks, and visualization software.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.
Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P
2010-12-22
Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource-Roundup-using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
SHARPEN-systematic hierarchical algorithms for rotamers and proteins on an extended network.
Loksha, Ilya V; Maiolo, James R; Hong, Cheng W; Ng, Albert; Snow, Christopher D
2009-04-30
Algorithms for discrete optimization of proteins play a central role in recent advances in protein structure prediction and design. We wish to improve the resources available for computational biologists to rapidly prototype such algorithms and to easily scale these algorithms to many processors. To that end, we describe the implementation and use of two new open source resources, citing potential benefits over existing software. We discuss CHOMP, a new object-oriented library for macromolecular optimization, and SHARPEN, a framework for scaling CHOMP scripts to many computers. These tools allow users to develop new algorithms for a variety of applications including protein repacking, protein-protein docking, loop rebuilding, or homology model remediation. Particular care was taken to allow modular energy function design; protein conformations may currently be scored using either the OPLSaa molecular mechanical energy function or an all-atom semiempirical energy function employed by Rosetta. (c) 2009 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Brumby, S. P.; Warren, M. S.; Keisler, R.; Chartrand, R.; Skillman, S.; Franco, E.; Kontgis, C.; Moody, D.; Kelton, T.; Mathis, M.
2016-12-01
Cloud computing, combined with recent advances in machine learning for computer vision, is enabling understanding of the world at a scale and at a level of space and time granularity never before feasible. Multi-decadal Earth remote sensing datasets at the petabyte scale (8×10^15 bits) are now available in commercial cloud, and new satellite constellations will generate daily global coverage at a few meters per pixel. Public and commercial satellite observations now provide a wide range of sensor modalities, from traditional visible/infrared to dual-polarity synthetic aperture radar (SAR). This provides the opportunity to build a continuously updated map of the world supporting the academic community and decision-makers in government, finanace and industry. We report on work demonstrating country-scale agricultural forecasting, and global-scale land cover/land, use mapping using a range of public and commercial satellite imagery. We describe processing over a petabyte of compressed raw data from 2.8 quadrillion pixels (2.8 petapixels) acquired by the US Landsat and MODIS programs over the past 40 years. Using commodity cloud computing resources, we convert the imagery to a calibrated, georeferenced, multiresolution tiled format suited for machine-learning analysis. We believe ours is the first application to process, in less than a day, on generally available resources, over a petabyte of scientific image data. We report on work combining this imagery with time-series SAR collected by ESA Sentinel 1. We report on work using this reprocessed dataset for experiments demonstrating country-scale food production monitoring, an indicator for famine early warning. We apply remote sensing science and machine learning algorithms to detect and classify agricultural crops and then estimate crop yields and detect threats to food security (e.g., flooding, drought). The software platform and analysis methodology also support monitoring water resources, forests and other general indicators of environmental health, and can detect growth and changes in cities that are displacing historical agricultural zones.
Infrastructure Systems for Advanced Computing in E-science applications
NASA Astrophysics Data System (ADS)
Terzo, Olivier
2013-04-01
In the e-science field are growing needs for having computing infrastructure more dynamic and customizable with a model of use "on demand" that follow the exact request in term of resources and storage capacities. The integration of grid and cloud infrastructure solutions allows us to offer services that can adapt the availability in terms of up scaling and downscaling resources. The main challenges for e-sciences domains will on implement infrastructure solutions for scientific computing that allow to adapt dynamically the demands of computing resources with a strong emphasis on optimizing the use of computing resources for reducing costs of investments. Instrumentation, data volumes, algorithms, analysis contribute to increase the complexity for applications who require high processing power and storage for a limited time and often exceeds the computational resources that equip the majority of laboratories, research Unit in an organization. Very often it is necessary to adapt or even tweak rethink tools, algorithms, and consolidate existing applications through a phase of reverse engineering in order to adapt them to a deployment on Cloud infrastructure. For example, in areas such as rainfall monitoring, meteorological analysis, Hydrometeorology, Climatology Bioinformatics Next Generation Sequencing, Computational Electromagnetic, Radio occultation, the complexity of the analysis raises several issues such as the processing time, the scheduling of tasks of processing, storage of results, a multi users environment. For these reasons, it is necessary to rethink the writing model of E-Science applications in order to be already adapted to exploit the potentiality of cloud computing services through the uses of IaaS, PaaS and SaaS layer. An other important focus is on create/use hybrid infrastructure typically a federation between Private and public cloud, in fact in this way when all resources owned by the organization are all used it will be easy with a federate cloud infrastructure to add some additional resources form the Public cloud for following the needs in term of computational and storage resources and release them where process are finished. Following the hybrid model, the scheduling approach is important for managing both cloud models. Thanks to this model infrastructure every time resources are available for additional request in term of IT capacities that can used "on demand" for a limited time without having to proceed to purchase additional servers.
Matthew J. Gregory; Zhiqiang Yang; David M. Bell; Warren B. Cohen; Sean Healey; Janet L. Ohmann; Heather M. Roberts
2015-01-01
Mapping vegetation and landscape change at fine spatial scales is needed to inform natural resource and conservation planning, but such maps are expensive and time-consuming to produce. For Landsat-based methodologies, mapping efforts are hampered by the daunting task of manipulating multivariate data for millions to billions of pixels. The advent of cloud-based...
Cost Optimal Elastic Auto-Scaling in Cloud Infrastructure
NASA Astrophysics Data System (ADS)
Mukhopadhyay, S.; Sidhanta, S.; Ganguly, S.; Nemani, R. R.
2014-12-01
Today, elastic scaling is critical part of leveraging cloud. Elastic scaling refers to adding resources only when it is needed and deleting resources when not in use. Elastic scaling ensures compute/server resources are not over provisioned. Today, Amazon and Windows Azure are the only two platform provider that allow auto-scaling of cloud resources where servers are automatically added and deleted. However, these solution falls short of following key features: A) Requires explicit policy definition such server load and therefore lacks any predictive intelligence to make optimal decision; B) Does not decide on the right size of resource and thereby does not result in cost optimal resource pool. In a typical cloud deployment model, we consider two types of application scenario: A. Batch processing jobs → Hadoop/Big Data case B. Transactional applications → Any application that process continuous transactions (Requests/response) In reference of classical queuing model, we are trying to model a scenario where servers have a price and capacity (size) and system can add delete servers to maintain a certain queue length. Classical queueing models applies to scenario where number of servers are constant. So we cannot apply stationary system analysis in this case. We investigate the following questions 1. Can we define Job queue and use the metric to define such a queue to predict the resource requirement in a quasi-stationary way? Can we map that into an optimal sizing problem? 2. Do we need to get into a level of load (CPU/Data) on server level to characterize the size requirement? How do we learn that based on Job type?
Cockrell, Robert Chase; Christley, Scott; Chang, Eugene; An, Gary
2015-01-01
Perhaps the greatest challenge currently facing the biomedical research community is the ability to integrate highly detailed cellular and molecular mechanisms to represent clinical disease states as a pathway to engineer effective therapeutics. This is particularly evident in the representation of organ-level pathophysiology in terms of abnormal tissue structure, which, through histology, remains a mainstay in disease diagnosis and staging. As such, being able to generate anatomic scale simulations is a highly desirable goal. While computational limitations have previously constrained the size and scope of multi-scale computational models, advances in the capacity and availability of high-performance computing (HPC) resources have greatly expanded the ability of computational models of biological systems to achieve anatomic, clinically relevant scale. Diseases of the intestinal tract are exemplary examples of pathophysiological processes that manifest at multiple scales of spatial resolution, with structural abnormalities present at the microscopic, macroscopic and organ-levels. In this paper, we describe a novel, massively parallel computational model of the gut, the Spatially Explicitly General-purpose Model of Enteric Tissue_HPC (SEGMEnT_HPC), which extends an existing model of the gut epithelium, SEGMEnT, in order to create cell-for-cell anatomic scale simulations. We present an example implementation of SEGMEnT_HPC that simulates the pathogenesis of ileal pouchitis, and important clinical entity that affects patients following remedial surgery for ulcerative colitis.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811
Design and development of a run-time monitor for multi-core architectures in cloud computing.
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
ON states as resource units for universal quantum computation with photonic architectures
NASA Astrophysics Data System (ADS)
Sabapathy, Krishna Kumar; Weedbrook, Christian
2018-06-01
Universal quantum computation using photonic systems requires gates the Hamiltonians of which are of order greater than quadratic in the quadrature operators. We first review previous proposals to implement such gates, where specific non-Gaussian states are used as resources in conjunction with entangling gates such as the continuous-variable versions of controlled-phase and controlled-not gates. We then propose ON states which are superpositions of the vacuum and the N th Fock state, for use as non-Gaussian resource states. We show that ON states can be used to implement the cubic and higher-order quadrature phase gates to first order in gate strength. There are several advantages to this method such as reduced number of superpositions in the resource state preparation and greater control over the final gate. We also introduce useful figures of merit to characterize gate performance. Utilizing a supply of on-demand resource states one can potentially scale up implementation to greater accuracy, by repeated application of the basic circuit.
Mira: Argonne's 10-petaflops supercomputer
Papka, Michael; Coghlan, Susan; Isaacs, Eric; Peters, Mark; Messina, Paul
2018-02-13
Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Mira: Argonne's 10-petaflops supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Papka, Michael; Coghlan, Susan; Isaacs, Eric
2013-07-03
Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.
Gallicchio, Emilio; Deng, Nanjie; He, Peng; Wickstrom, Lauren; Perryman, Alexander L.; Santiago, Daniel N.; Forli, Stefano; Olson, Arthur J.; Levy, Ronald M.
2014-01-01
As part of the SAMPL4 blind challenge, filtered AutoDock Vina ligand docking predictions and large scale binding energy distribution analysis method binding free energy calculations have been applied to the virtual screening of a focused library of candidate binders to the LEDGF site of the HIV integrase protein. The computational protocol leveraged docking and high level atomistic models to improve enrichment. The enrichment factor of our blind predictions ranked best among all of the computational submissions, and second best overall. This work represents to our knowledge the first example of the application of an all-atom physics-based binding free energy model to large scale virtual screening. A total of 285 parallel Hamiltonian replica exchange molecular dynamics absolute protein-ligand binding free energy simulations were conducted starting from docked poses. The setup of the simulations was fully automated, calculations were distributed on multiple computing resources and were completed in a 6-weeks period. The accuracy of the docked poses and the inclusion of intramolecular strain and entropic losses in the binding free energy estimates were the major factors behind the success of the method. Lack of sufficient time and computing resources to investigate additional protonation states of the ligands was a major cause of mispredictions. The experiment demonstrated the applicability of binding free energy modeling to improve hit rates in challenging virtual screening of focused ligand libraries during lead optimization. PMID:24504704
Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.
Sign: large-scale gene network estimation environment for high performance computing.
Tamada, Yoshinori; Shimamura, Teppei; Yamaguchi, Rui; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru
2011-01-01
Our research group is currently developing software for estimating large-scale gene networks from gene expression data. The software, called SiGN, is specifically designed for the Japanese flagship supercomputer "K computer" which is planned to achieve 10 petaflops in 2012, and other high performance computing environments including Human Genome Center (HGC) supercomputer system. SiGN is a collection of gene network estimation software with three different sub-programs: SiGN-BN, SiGN-SSM and SiGN-L1. In these three programs, five different models are available: static and dynamic nonparametric Bayesian networks, state space models, graphical Gaussian models, and vector autoregressive models. All these models require a huge amount of computational resources for estimating large-scale gene networks and therefore are designed to be able to exploit the speed of 10 petaflops. The software will be available freely for "K computer" and HGC supercomputer system users. The estimated networks can be viewed and analyzed by Cell Illustrator Online and SBiP (Systems Biology integrative Pipeline). The software project web site is available at http://sign.hgc.jp/ .
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...
2015-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomicsmore » system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.« less
A case study for cloud based high throughput analysis of NGS data using the globus genomics system
Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha
2014-01-01
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-endNGS analysis requirements. The Globus Genomics system is built on Amazon 's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205
ATLAS and LHC computing on CRAY
NASA Astrophysics Data System (ADS)
Sciacca, F. G.; Haug, S.; ATLAS Collaboration
2017-10-01
Access and exploitation of large scale computing resources, such as those offered by general purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments in order to meet the challenge posed by the full exploitation of the future data within the constraints of flat budgets. We report on the effort of moving the Swiss WLCG T2 computing, serving ATLAS, CMS and LHCb, from a dedicated cluster to the large Cray systems at the Swiss National Supercomputing Centre CSCS. These systems do not only offer very efficient hardware, cooling and highly competent operators, but also have large backfill potentials due to size and multidisciplinary usage and potential gains due to economy at scale. Technical solutions, performance, expected return and future plans are discussed.
Developing science gateways for drug discovery in a grid environment.
Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra
2016-01-01
Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly we are faced with an increasing problem to provide the research community of life sciences with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating the throughput by several orders of magnitude.
NASA Technical Reports Server (NTRS)
Harwood, P. (Principal Investigator); Finley, R.; Mcculloch, S.; Malin, P. A.; Schell, J. A.
1977-01-01
The author has identified the following significant results. Image interpretation and computer-assisted techniques were developed to analyze LANDSAT scenes in support of resource inventory and monitoring requirements for the Texas coastal region. Land cover and land use maps, at a scale of 1:125,000 for the image interpretation product and 1:24,000 for the computer-assisted product, were generated covering four Texas coastal test sites. Classification schemes which parallel national systems were developed for each procedure, including 23 classes for image interpretation technique and 13 classes for the computer-assisted technique. Results indicate that LANDSAT-derived land cover and land use maps can be successfully applied to a variety of planning and management activities on the Texas coast. Computer-derived land/water maps can be used with tide gage data to assess shoreline boundaries for management purposes.
Job Scheduling in a Heterogeneous Grid Environment
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Smith, Warren; Oliker, Leonid; Biswas, Rupak
2004-01-01
Computational grids have the potential for solving large-scale scientific problems using heterogeneous and geographically distributed resources. However, a number of major technical hurdles must be overcome before this potential can be realized. One problem that is critical to effective utilization of computational grids is the efficient scheduling of jobs. This work addresses this problem by describing and evaluating a grid scheduling architecture and three job migration algorithms. The architecture is scalable and does not assume control of local site resources. The job migration policies use the availability and performance of computer systems, the network bandwidth available between systems, and the volume of input and output data associated with each job. An extensive performance comparison is presented using real workloads from leading computational centers. The results, based on several key metrics, demonstrate that the performance of our distributed migration algorithms is significantly greater than that of a local scheduling framework and comparable to a non-scalable global scheduling approach.
Stochastic Simulation Service: Bridging the Gap between the Computational Expert and the Biologist
Banerjee, Debjani; Bellesia, Giovanni; Daigle, Bernie J.; Douglas, Geoffrey; Gu, Mengyuan; Gupta, Anand; Hellander, Stefan; Horuk, Chris; Nath, Dibyendu; Takkar, Aviral; Lötstedt, Per; Petzold, Linda R.
2016-01-01
We present StochSS: Stochastic Simulation as a Service, an integrated development environment for modeling and simulation of both deterministic and discrete stochastic biochemical systems in up to three dimensions. An easy to use graphical user interface enables researchers to quickly develop and simulate a biological model on a desktop or laptop, which can then be expanded to incorporate increasing levels of complexity. StochSS features state-of-the-art simulation engines. As the demand for computational power increases, StochSS can seamlessly scale computing resources in the cloud. In addition, StochSS can be deployed as a multi-user software environment where collaborators share computational resources and exchange models via a public model repository. We demonstrate the capabilities and ease of use of StochSS with an example of model development and simulation at increasing levels of complexity. PMID:27930676
Stochastic Simulation Service: Bridging the Gap between the Computational Expert and the Biologist
Drawert, Brian; Hellander, Andreas; Bales, Ben; ...
2016-12-08
We present StochSS: Stochastic Simulation as a Service, an integrated development environment for modeling and simulation of both deterministic and discrete stochastic biochemical systems in up to three dimensions. An easy to use graphical user interface enables researchers to quickly develop and simulate a biological model on a desktop or laptop, which can then be expanded to incorporate increasing levels of complexity. StochSS features state-of-the-art simulation engines. As the demand for computational power increases, StochSS can seamlessly scale computing resources in the cloud. In addition, StochSS can be deployed as a multi-user software environment where collaborators share computational resources andmore » exchange models via a public model repository. We also demonstrate the capabilities and ease of use of StochSS with an example of model development and simulation at increasing levels of complexity.« less
Globus | Informatics Technology for Cancer Research (ITCR)
Globus software services provide secure cancer research data transfer, synchronization, and sharing in distributed environments at large scale. These services can be integrated into applications and research data gateways, leveraging Globus identity management, single sign-on, search, and authorization capabilities. Globus Genomics integrates Globus with the Galaxy genomics workflow engine and Amazon Web Services to enable cancer genomics analysis that can elastically scale compute resources with demand.
Northwest Trajectory Analysis Capability: A Platform for Enhancing Computational Biophysics Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, Elena S.; Stephan, Eric G.; Corrigan, Abigail L.
2008-07-30
As computational resources continue to increase, the ability of computational simulations to effectively complement, and in some cases replace, experimentation in scientific exploration also increases. Today, large-scale simulations are recognized as an effective tool for scientific exploration in many disciplines including chemistry and biology. A natural side effect of this trend has been the need for an increasingly complex analytical environment. In this paper, we describe Northwest Trajectory Analysis Capability (NTRAC), an analytical software suite developed to enhance the efficiency of computational biophysics analyses. Our strategy is to layer higher-level services and introduce improved tools within the user’s familiar environmentmore » without preventing researchers from using traditional tools and methods. Our desire is to share these experiences to serve as an example for effectively analyzing data intensive large scale simulation data.« less
Examining Extreme Events Using Dynamically Downscaled 12-km WRF Simulations
Continued improvements in the speed and availability of computational resources have allowed dynamical downscaling of global climate model (GCM) projections to be conducted at increasingly finer grid scales and over extended time periods. The implementation of dynamical downscal...
The HEPCloud Facility: elastic computing for High Energy Physics - The NOvA Use Case
NASA Astrophysics Data System (ADS)
Fuess, S.; Garzoglio, G.; Holzman, B.; Kennedy, R.; Norman, A.; Timm, S.; Tiradani, A.
2017-10-01
The need for computing in the HEP community follows cycles of peaks and valleys mainly driven by conference dates, accelerator shutdown, holiday schedules, and other factors. Because of this, the classical method of provisioning these resources at providing facilities has drawbacks such as potential overprovisioning. As the appetite for computing increases, however, so does the need to maximize cost efficiency by developing a model for dynamically provisioning resources only when needed. To address this issue, the HEPCloud project was launched by the Fermilab Scientific Computing Division in June 2015. Its goal is to develop a facility that provides a common interface to a variety of resources, including local clusters, grids, high performance computers, and community and commercial Clouds. Initially targeted experiments include CMS and NOvA, as well as other Fermilab stakeholders. In its first phase, the project has demonstrated the use of the “elastic” provisioning model offered by commercial clouds, such as Amazon Web Services. In this model, resources are rented and provisioned automatically over the Internet upon request. In January 2016, the project demonstrated the ability to increase the total amount of global CMS resources by 58,000 cores from 150,000 cores - a 38 percent increase - in preparation for the Recontres de Moriond. In March 2016, the NOvA experiment has also demonstrated resource burst capabilities with an additional 7,300 cores, achieving a scale almost four times as large as the local allocated resources and utilizing the local AWS s3 storage to optimize data handling operations and costs. NOvA was using the same familiar services used for local computations, such as data handling and job submission, in preparation for the Neutrino 2016 conference. In both cases, the cost was contained by the use of the Amazon Spot Instance Market and the Decision Engine, a HEPCloud component that aims at minimizing cost and job interruption. This paper describes the Fermilab HEPCloud Facility and the challenges overcome for the CMS and NOvA communities.
Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas
2016-04-01
Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station, however, due to multiple invocations for a large number of subtasks the full task requires a significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
Zagar, Robert John; Kovach, Joseph W; Basile, Benjamin; Hughes, John Russell; Grove, William M; Busch, Kenneth G; Zablocki, Michael; Osnowitz, William; Neuhengen, Jonas; Liu, Yutong; Zagar, Agata Karolina
2013-12-01
147 adults (107 men, 40 women) and 89 adolescents (61 boys, 28 girls), selected randomly from referrals and volunteers, were given the Ammons Quick Test (QT), the Beck Suicide Scale (BSS), the Minnesota Multiphasic Personality Inventory Second (MMPI-2) or Adolescent Versions (MMPI-A), the Raven's Advanced Progressive Matrices, and the Standard Predictor (SP) of Violence Potential Adult or Adolescent Versions. The goals were to: (a) demonstrate computer and paper-and-pencil tests correlated; (b) validate tests to identify at-risk for violence; (c) show that identifying at-risk saves lives and resources; and (d) find which industries benefited from testing at-risk. Paper-and-pencil vs. computer test correlations (.83-.99), sensitivity (.97-.98), and specificity (.50-.97) were computed. Testing at-risk saves lives and resources. Critical industries for testing at-risk individuals may include airlines, energy generating industries, insurance, military, nonprofit-religious, prisoners, trucking or port workers, and veterans.
Semantics-based distributed I/O with the ParaMEDIC framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balaji, P.; Feng, W.; Lin, H.
2008-01-01
Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called 'ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing' which uses application-specific semantic information to convert the generatedmore » data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.« less
Scheduling based on a dynamic resource connection
NASA Astrophysics Data System (ADS)
Nagiyev, A. E.; Botygin, I. A.; Shersntneva, A. I.; Konyaev, P. A.
2017-02-01
The practical using of distributed computing systems associated with many problems, including troubles with the organization of an effective interaction between the agents located at the nodes of the system, with the specific configuration of each node of the system to perform a certain task, with the effective distribution of the available information and computational resources of the system, with the control of multithreading which implements the logic of solving research problems and so on. The article describes the method of computing load balancing in distributed automatic systems, focused on the multi-agency and multi-threaded data processing. The scheme of the control of processing requests from the terminal devices, providing the effective dynamic scaling of computing power under peak load is offered. The results of the model experiments research of the developed load scheduling algorithm are set out. These results show the effectiveness of the algorithm even with a significant expansion in the number of connected nodes and zoom in the architecture distributed computing system.
A novel representation of groundwater dynamics in large-scale land surface modelling
NASA Astrophysics Data System (ADS)
Rahman, Mostaquimur; Rosolem, Rafael; Kollet, Stefan
2017-04-01
Land surface processes are connected to groundwater dynamics via shallow soil moisture. For example, groundwater affects evapotranspiration (by influencing the variability of soil moisture) and runoff generation mechanisms. However, contemporary Land Surface Models (LSM) generally consider isolated soil columns and free drainage lower boundary condition for simulating hydrology. This is mainly due to the fact that incorporating detailed groundwater dynamics in LSMs usually requires considerable computing resources, especially for large-scale applications (e.g., continental to global). Yet, these simplifications undermine the potential effect of groundwater dynamics on land surface mass and energy fluxes. In this study, we present a novel approach of representing high-resolution groundwater dynamics in LSMs that is computationally efficient for large-scale applications. This new parameterization is incorporated in the Joint UK Land Environment Simulator (JULES) and tested at the continental-scale.
Final Report for ALCC Allocation: Predictive Simulation of Complex Flow in Wind Farms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barone, Matthew F.; Ananthan, Shreyas; Churchfield, Matt
This report documents work performed using ALCC computing resources granted under a proposal submitted in February 2016, with the resource allocation period spanning the period July 2016 through June 2017. The award allocation was 10.7 million processor-hours at the National Energy Research Scientific Computing Center. The simulations performed were in support of two projects: the Atmosphere to Electrons (A2e) project, supported by the DOE EERE office; and the Exascale Computing Project (ECP), supported by the DOE Office of Science. The project team for both efforts consists of staff scientists and postdocs from Sandia National Laboratories and the National Renewable Energymore » Laboratory. At the heart of these projects is the open-source computational-fluid-dynamics (CFD) code, Nalu. Nalu solves the low-Mach-number Navier-Stokes equations using an unstructured- grid discretization. Nalu leverages the open-source Trilinos solver library and the Sierra Toolkit (STK) for parallelization and I/O. This report documents baseline computational performance of the Nalu code on problems of direct relevance to the wind plant physics application - namely, Large Eddy Simulation (LES) of an atmospheric boundary layer (ABL) flow and wall-modeled LES of a flow past a static wind turbine rotor blade. Parallel performance of Nalu and its constituent solver routines residing in the Trilinos library has been assessed previously under various campaigns. However, both Nalu and Trilinos have been, and remain, in active development and resources have not been available previously to rigorously track code performance over time. With the initiation of the ECP, it is important to establish and document baseline code performance on the problems of interest. This will allow the project team to identify and target any deficiencies in performance, as well as highlight any performance bottlenecks as we exercise the code on a greater variety of platforms and at larger scales. The current study is rather modest in scale, examining performance on problem sizes of O(100 million) elements and core counts up to 8k cores. This will be expanded as more computational resources become available to the projects.« less
PanDA for ATLAS distributed computing in the next decade
NASA Astrophysics Data System (ADS)
Barreiro Megino, F. H.; De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. Heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, while data processing requires more than a few billion hours of computing usage per year. PanDA performed very well over the last decade including the LHC Run 1 data taking period. However, it was decided to upgrade the whole system concurrently with the LHC’s first long shutdown in order to cope with rapidly changing computing infrastructure. After two years of reengineering efforts, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been re-factored around a plugin structure for easier development and deployment. Bookkeeping is handled with both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. An in-house security mechanism authenticates the pilot and data management services in off-grid environments such as volunteer computing and private local clusters. The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. There are as well many other challenges planned or recently implemented, and adoption by non-LHC experiments such as bioinformatics groups successfully running Paleomix (microbial genome and metagenomes) payload on supercomputers. In this paper we will focus on the new and planned features that are most important to the next decade of distributed computing workload management.
NASA Technical Reports Server (NTRS)
Johnston, William E.; Gannon, Dennis; Nitzberg, Bill; Feiereisen, William (Technical Monitor)
2000-01-01
The term "Grid" refers to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. The vision for NASN's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks that will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: The scientist / design engineer whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user if the tool designer: The computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. This paper describes the current state of IPG (the operational testbed), the set of capabilities being put into place for the operational prototype IPG, as well as some of the longer term R&D tasks.
Cyber-workstation for computational neuroscience.
Digiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C; Fortes, Jose; Sanchez, Justin C
2010-01-01
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.
Galaxy CloudMan: delivering cloud compute clusters.
Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James
2010-12-21
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
Cyber-Workstation for Computational Neuroscience
DiGiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C.; Fortes, Jose; Sanchez, Justin C.
2009-01-01
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface. PMID:20126436
Towards Cloud-based Asynchronous Elasticity for Iterative HPC Applications
NASA Astrophysics Data System (ADS)
da Rosa Righi, Rodrigo; Facco Rodrigues, Vinicius; André da Costa, Cristiano; Kreutz, Diego; Heiss, Hans-Ulrich
2015-10-01
Elasticity is one of the key features of cloud computing. It allows applications to dynamically scale computing and storage resources, avoiding over- and under-provisioning. In high performance computing (HPC), initiatives are normally modeled to handle bag-of-tasks or key-value applications through a load balancer and a loosely-coupled set of virtual machine (VM) instances. In the joint-field of Message Passing Interface (MPI) and tightly-coupled HPC applications, we observe the need of rewriting source codes, previous knowledge of the application and/or stop-reconfigure-and-go approaches to address cloud elasticity. Besides, there are problems related to how profit this new feature in the HPC scope, since in MPI 2.0 applications the programmers need to handle communicators by themselves, and a sudden consolidation of a VM, together with a process, can compromise the entire execution. To address these issues, we propose a PaaS-based elasticity model, named AutoElastic. It acts as a middleware that allows iterative HPC applications to take advantage of dynamic resource provisioning of cloud infrastructures without any major modification. AutoElastic provides a new concept denoted here as asynchronous elasticity, i.e., it provides a framework to allow applications to either increase or decrease their computing resources without blocking the current execution. The feasibility of AutoElastic is demonstrated through a prototype that runs a CPU-bound numerical integration application on top of the OpenNebula middleware. The results showed the saving of about 3 min at each scaling out operations, emphasizing the contribution of the new concept on contexts where seconds are precious.
AUTOMATED GEOSPATIAL WATERSHED ASSESSMENT: A GIS-BASED HYDROLOGIC MODELING TOOL
Planning and assessment in land and water resource management are evolving toward complex, spatially explicit regional assessments. These problems have to be addressed with distributed models that can compute runoff and erosion at different spatial and temporal scales. The extens...
Energy-Aware Computation Offloading of IoT Sensors in Cloudlet-Based Mobile Edge Computing.
Ma, Xiao; Lin, Chuang; Zhang, Han; Liu, Jianwei
2018-06-15
Mobile edge computing is proposed as a promising computing paradigm to relieve the excessive burden of data centers and mobile networks, which is induced by the rapid growth of Internet of Things (IoT). This work introduces the cloud-assisted multi-cloudlet framework to provision scalable services in cloudlet-based mobile edge computing. Due to the constrained computation resources of cloudlets and limited communication resources of wireless access points (APs), IoT sensors with identical computation offloading decisions interact with each other. To optimize the processing delay and energy consumption of computation tasks, theoretic analysis of the computation offloading decision problem of IoT sensors is presented in this paper. In more detail, the computation offloading decision problem of IoT sensors is formulated as a computation offloading game and the condition of Nash equilibrium is derived by introducing the tool of a potential game. By exploiting the finite improvement property of the game, the Computation Offloading Decision (COD) algorithm is designed to provide decentralized computation offloading strategies for IoT sensors. Simulation results demonstrate that the COD algorithm can significantly reduce the system cost compared with the random-selection algorithm and the cloud-first algorithm. Furthermore, the COD algorithm can scale well with increasing IoT sensors.
NASA Astrophysics Data System (ADS)
Hogenson, K.; Arko, S. A.; Buechler, B.; Hogenson, R.; Herrmann, J.; Geiger, A.
2016-12-01
A problem often faced by Earth science researchers is how to scale algorithms that were developed against few datasets and take them to regional or global scales. One significant hurdle can be the processing and storage resources available for such a task, not to mention the administration of those resources. As a processing environment, the cloud offers nearly unlimited potential for compute and storage, with limited administration required. The goal of the Hybrid Pluggable Processing Pipeline (HyP3) project was to demonstrate the utility of the Amazon cloud to process large amounts of data quickly and cost effectively, while remaining generic enough to incorporate new algorithms with limited administration time or expense. Principally built by three undergraduate students at the ASF DAAC, the HyP3 system relies on core Amazon services such as Lambda, the Simple Notification Service (SNS), Relational Database Service (RDS), Elastic Compute Cloud (EC2), Simple Storage Service (S3), and Elastic Beanstalk. The HyP3 user interface was written using elastic beanstalk, and the system uses SNS and Lamdba to handle creating, instantiating, executing, and terminating EC2 instances automatically. Data are sent to S3 for delivery to customers and removed using standard data lifecycle management rules. In HyP3 all data processing is ephemeral; there are no persistent processes taking compute and storage resources or generating added cost. When complete, HyP3 will leverage the automatic scaling up and down of EC2 compute power to respond to event-driven demand surges correlated with natural disaster or reprocessing efforts. Massive simultaneous processing within EC2 will be able match the demand spike in ways conventional physical computing power never could, and then tail off incurring no costs when not needed. This presentation will focus on the development techniques and technologies that were used in developing the HyP3 system. Data and process flow will be shown, highlighting the benefits of the cloud for each step. Finally, the steps for integrating a new processing algorithm will be demonstrated. This is the true power of HyP3; allowing people to upload their own algorithms and execute them at archive level scales.
Healthcare4VideoStorm: Making Smart Decisions Based on Storm Metrics.
Zhang, Weishan; Duan, Pengcheng; Chen, Xiufeng; Lu, Qinghua
2016-04-23
Storm-based stream processing is widely used for real-time large-scale distributed processing. Knowing the run-time status and ensuring performance is critical to providing expected dependability for some applications, e.g., continuous video processing for security surveillance. The existing scheduling strategies' granularity is too coarse to have good performance, and mainly considers network resources without computing resources while scheduling. In this paper, we propose Healthcare4Storm, a framework that finds Storm insights based on Storm metrics to gain knowledge from the health status of an application, finally ending up with smart scheduling decisions. It takes into account both network and computing resources and conducts scheduling at a fine-grained level using tuples instead of topologies. The comprehensive evaluation shows that the proposed framework has good performance and can improve the dependability of the Storm-based applications.
Design and Implement of Astronomical Cloud Computing Environment In China-VO
NASA Astrophysics Data System (ADS)
Li, Changhua; Cui, Chenzhou; Mi, Linying; He, Boliang; Fan, Dongwei; Li, Shanshan; Yang, Sisi; Xu, Yunfei; Han, Jun; Chen, Junyi; Zhang, Hailong; Yu, Ce; Xiao, Jian; Wang, Chuanjun; Cao, Zihuang; Fan, Yufeng; Liu, Liang; Chen, Xiao; Song, Wenming; Du, Kangyu
2017-06-01
Astronomy cloud computing environment is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on virtualization technology, astronomy cloud computing environment was designed and implemented by China-VO team. It consists of five distributed nodes across the mainland of China. Astronomer can get compuitng and storage resource in this cloud computing environment. Through this environments, astronomer can easily search and analyze astronomical data collected by different telescopes and data centers , and avoid the large scale dataset transportation.
Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem
NASA Astrophysics Data System (ADS)
Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.
2015-12-01
Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Subsequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.
Cockrell, Robert Chase; Christley, Scott; Chang, Eugene; An, Gary
2015-01-01
Perhaps the greatest challenge currently facing the biomedical research community is the ability to integrate highly detailed cellular and molecular mechanisms to represent clinical disease states as a pathway to engineer effective therapeutics. This is particularly evident in the representation of organ-level pathophysiology in terms of abnormal tissue structure, which, through histology, remains a mainstay in disease diagnosis and staging. As such, being able to generate anatomic scale simulations is a highly desirable goal. While computational limitations have previously constrained the size and scope of multi-scale computational models, advances in the capacity and availability of high-performance computing (HPC) resources have greatly expanded the ability of computational models of biological systems to achieve anatomic, clinically relevant scale. Diseases of the intestinal tract are exemplary examples of pathophysiological processes that manifest at multiple scales of spatial resolution, with structural abnormalities present at the microscopic, macroscopic and organ-levels. In this paper, we describe a novel, massively parallel computational model of the gut, the Spatially Explicitly General-purpose Model of Enteric Tissue_HPC (SEGMEnT_HPC), which extends an existing model of the gut epithelium, SEGMEnT, in order to create cell-for-cell anatomic scale simulations. We present an example implementation of SEGMEnT_HPC that simulates the pathogenesis of ileal pouchitis, and important clinical entity that affects patients following remedial surgery for ulcerative colitis. PMID:25806784
National Climate Change and Wildlife Science Center project accomplishments: highlights
Holl, Sally
2011-01-01
The National Climate Change and Wildlife Science Center (NCCWSC) has invested more than $20M since 2008 to put cutting-edge climate science research in the hands of resource managers across the Nation. With NCCWSC support, more than 25 cooperative research initiatives led by U.S. Geological Survey (USGS) researchers and technical staff are advancing our understanding of habitats and species to provide guidance to managers in the face of a changing climate. Projects focus on quantifying and predicting interactions between climate, habitats, species, and other natural resources such as water. Spatial scales of the projects range from the continent of North America, to a regional scale such as the Pacific Northwest United States, to a landscape scale such as the Florida Everglades. Time scales range from the outset of the 20th century to the end of the 21st century. Projects often lead to workshops, presentations, publications and the creation of new websites, computer models, and data visualization tools. Partnership-building is also a key focus of the NCCWSC-supported projects. New and on-going cooperative partnerships have been forged and strengthened with resource managers and scientists at Federal, tribal, state, local, academic, and non-governmental organizations. USGS scientists work closely with resource managers to produce timely and relevant results that can assist managers and policy makers in current resource management decisions. This fact sheet highlights accomplishments of five NCCWSC projects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hussain, Hameed; Malik, Saif Ur Rehman; Hameed, Abdul
An efficient resource allocation is a fundamental requirement in high performance computing (HPC) systems. Many projects are dedicated to large-scale distributed computing systems that have designed and developed resource allocation mechanisms with a variety of architectures and services. In our study, through analysis, a comprehensive survey for describing resource allocation in various HPCs is reported. The aim of the work is to aggregate under a joint framework, the existing solutions for HPC to provide a thorough analysis and characteristics of the resource management and allocation strategies. Resource allocation mechanisms and strategies play a vital role towards the performance improvement ofmore » all the HPCs classifications. Therefore, a comprehensive discussion of widely used resource allocation strategies deployed in HPC environment is required, which is one of the motivations of this survey. Moreover, we have classified the HPC systems into three broad categories, namely: (a) cluster, (b) grid, and (c) cloud systems and define the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify approaches followed by the implementation of existing resource allocation strategies that are widely presented in the literature.« less
Enabling Wide-Scale Computer Science Education through Improved Automated Assessment Tools
NASA Astrophysics Data System (ADS)
Boe, Bryce A.
There is a proliferating demand for newly trained computer scientists as the number of computer science related jobs continues to increase. University programs will only be able to train enough new computer scientists to meet this demand when two things happen: when there are more primary and secondary school students interested in computer science, and when university departments have the resources to handle the resulting increase in enrollment. To meet these goals, significant effort is being made to both incorporate computational thinking into existing primary school education, and to support larger university computer science class sizes. We contribute to this effort through the creation and use of improved automated assessment tools. To enable wide-scale computer science education we do two things. First, we create a framework called Hairball to support the static analysis of Scratch programs targeted for fourth, fifth, and sixth grade students. Scratch is a popular building-block language utilized to pique interest in and teach the basics of computer science. We observe that Hairball allows for rapid curriculum alterations and thus contributes to wide-scale deployment of computer science curriculum. Second, we create a real-time feedback and assessment system utilized in university computer science classes to provide better feedback to students while reducing assessment time. Insights from our analysis of student submission data show that modifications to the system configuration support the way students learn and progress through course material, making it possible for instructors to tailor assignments to optimize learning in growing computer science classes.
Computationally efficient statistical differential equation modeling using homogenization
Hooten, Mevin B.; Garlick, Martha J.; Powell, James A.
2013-01-01
Statistical models using partial differential equations (PDEs) to describe dynamically evolving natural systems are appearing in the scientific literature with some regularity in recent years. Often such studies seek to characterize the dynamics of temporal or spatio-temporal phenomena such as invasive species, consumer-resource interactions, community evolution, and resource selection. Specifically, in the spatial setting, data are often available at varying spatial and temporal scales. Additionally, the necessary numerical integration of a PDE may be computationally infeasible over the spatial support of interest. We present an approach to impose computationally advantageous changes of support in statistical implementations of PDE models and demonstrate its utility through simulation using a form of PDE known as “ecological diffusion.” We also apply a statistical ecological diffusion model to a data set involving the spread of mountain pine beetle (Dendroctonus ponderosae) in Idaho, USA.
Template Interfaces for Agile Parallel Data-Intensive Science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramakrishnan, Lavanya; Gunter, Daniel; Pastorello, Gilerto Z.
Tigres provides a programming library to compose and execute large-scale data-intensive scientific workflows from desktops to supercomputers. DOE User Facilities and large science collaborations are increasingly generating large enough data sets that it is no longer practical to download them to a desktop to operate on them. They are instead stored at centralized compute and storage resources such as high performance computing (HPC) centers. Analysis of this data requires an ability to run on these facilities, but with current technologies, scaling an analysis to an HPC center and to a large data set is difficult even for experts. Tigres ismore » addressing the challenge of enabling collaborative analysis of DOE Science data through a new concept of reusable "templates" that enable scientists to easily compose, run and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.« less
Verifiable fault tolerance in measurement-based quantum computation
NASA Astrophysics Data System (ADS)
Fujii, Keisuke; Hayashi, Masahito
2017-09-01
Quantum systems, in general, cannot be simulated efficiently by a classical computer, and hence are useful for solving certain mathematical problems and simulating quantum many-body systems. This also implies, unfortunately, that verification of the output of the quantum systems is not so trivial, since predicting the output is exponentially hard. As another problem, the quantum system is very delicate for noise and thus needs an error correction. Here, we propose a framework for verification of the output of fault-tolerant quantum computation in a measurement-based model. In contrast to existing analyses on fault tolerance, we do not assume any noise model on the resource state, but an arbitrary resource state is tested by using only single-qubit measurements to verify whether or not the output of measurement-based quantum computation on it is correct. Verifiability is equipped by a constant time repetition of the original measurement-based quantum computation in appropriate measurement bases. Since full characterization of quantum noise is exponentially hard for large-scale quantum computing systems, our framework provides an efficient way to practically verify the experimental quantum error correction.
GIS-BASED HYDROLOGIC MODELING: THE AUTOMATED GEOSPATIAL WATERSHED ASSESSMENT TOOL
Planning and assessment in land and water resource management are evolving from simple, local scale problems toward complex, spatially explicit regional ones. Such problems have to be
addressed with distributed models that can compute runoff and erosion at different spatial a...
Field Scale Monitoring and Modeling of Water and Chemical Transfer in the Vadose Zone
USDA-ARS?s Scientific Manuscript database
Natural resource systems involve highly complex interactions of soil-plant-atmosphere-management components that are extremely difficult to quantitatively describe. Computer simulations for prediction and management of watersheds, water supply areas, and agricultural fields and farms have become inc...
NASA Astrophysics Data System (ADS)
Morikawa, Y.; Murata, K. T.; Watari, S.; Kato, H.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Shimojo, S.
2010-12-01
Main methodologies of Solar-Terrestrial Physics (STP) so far are theoretical, experimental and observational, and computer simulation approaches. Recently "informatics" is expected as a new (fourth) approach to the STP studies. Informatics is a methodology to analyze large-scale data (observation data and computer simulation data) to obtain new findings using a variety of data processing techniques. At NICT (National Institute of Information and Communications Technology, Japan) we are now developing a new research environment named "OneSpaceNet". The OneSpaceNet is a cloud-computing environment specialized for science works, which connects many researchers with high-speed network (JGN: Japan Gigabit Network). The JGN is a wide-area back-born network operated by NICT; it provides 10G network and many access points (AP) over Japan. The OneSpaceNet also provides with rich computer resources for research studies, such as super-computers, large-scale data storage area, licensed applications, visualization devices (like tiled display wall: TDW), database/DBMS, cluster computers (4-8 nodes) for data processing and communication devices. What is amazing in use of the science cloud is that a user simply prepares a terminal (low-cost PC). Once connecting the PC to JGN2plus, the user can make full use of the rich resources of the science cloud. Using communication devices, such as video-conference system, streaming and reflector servers, and media-players, the users on the OneSpaceNet can make research communications as if they belong to a same (one) laboratory: they are members of a virtual laboratory. The specification of the computer resources on the OneSpaceNet is as follows: The size of data storage we have developed so far is almost 1PB. The number of the data files managed on the cloud storage is getting larger and now more than 40,000,000. What is notable is that the disks forming the large-scale storage are distributed to 5 data centers over Japan (but the storage system performs as one disk). There are three supercomputers allocated on the cloud, one from Tokyo, one from Osaka and the other from Nagoya. One's simulation job data on any supercomputers are saved on the cloud data storage (same directory); it is a kind of virtual computing environment. The tiled display wall has 36 panels acting as one display; the pixel (resolution) size of it is as large as 18000x4300. This size is enough to preview or analyze the large-scale computer simulation data. It also allows us to take a look of multiple (e.g., 100 pictures) on one screen together with many researchers. In our talk we also present a brief report of the initial results using the OneSpaceNet for Global MHD simulations as an example of successful use of our science cloud; (i) Ultra-high time resolution visualization of Global MHD simulations on the large-scale storage and parallel processing system on the cloud, (ii) Database of real-time Global MHD simulation and statistic analyses of the data, and (iii) 3D Web service of Global MHD simulations.
The comparative evaluation of ERTS-1 imagery for resource inventory in land use planning. [Oregon
NASA Technical Reports Server (NTRS)
Simonson, G. H. (Principal Investigator); Paine, D. P.; Lawrence, R. D.; Pyott, W. T.; Herzog, J. H.; Murray, R. J.; Norgren, J. A.; Cornwell, J. A.; Rogers, R. A.
1973-01-01
The author has identified the following significant results. Multidiscipline team interpretation and mapping of resources for Crook County is nearly complete on 1:250,000 scale enlargements of ERTS-1 imagery. Maps of geology, landforms, soils and vegetation-land use are being interpreted to show limitations, suitabilities and geologic hazards for land use planning. Mapping of lineaments and structures from ERTS-1 imagery has shown a number of features not previously mapped in Oregon. A timber inventory of Ochoco National Forest has been made. Inventory of forest clear-cutting practices has been successfully demonstrated with ERTS-1 color composites. Soil tonal differences in fallow fields shown on ERTS-1 correspond with major soil boundaries in loess-mantled terrain. A digital classification system used for discriminating natural vegetation and geologic materials classes has been successful in separation of most major classes around Newberry Cauldera, Mt. Washington and Big Summit Prairie. Computer routines are available for correction of scanner data variations; and for matching scales and coordinates between digital and photographic imagery. Methods of Diazo film color printing of computer classifications and elevation-slope perspective plots with computer are being developed.
Galaxy CloudMan: delivering cloud compute clusters
2010-01-01
Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983
Towards Large-area Field-scale Operational Evapotranspiration for Water Use Mapping
NASA Astrophysics Data System (ADS)
Senay, G. B.; Friedrichs, M.; Morton, C.; Huntington, J. L.; Verdin, J.
2017-12-01
Field-scale evapotranspiration (ET) estimates are needed for improving surface and groundwater use and water budget studies. Ideally, field-scale ET estimates would be at regional to national levels and cover long time periods. As a result of large data storage and computational requirements associated with processing field-scale satellite imagery such as Landsat, numerous challenges remain to develop operational ET estimates over large areas for detailed water use and availability studies. However, the combination of new science, data availability, and cloud computing technology is enabling unprecedented capabilities for ET mapping. To demonstrate this capability, we used Google's Earth Engine cloud computing platform to create nationwide annual ET estimates with 30-meter resolution Landsat ( 16,000 images) and gridded weather data using the Operational Simplified Surface Energy Balance (SSEBop) model in support of the National Water Census, a USGS research program designed to build decision support capacity for water management agencies and other natural resource managers. By leveraging Google's Earth Engine Application Programming Interface (API) and developing software in a collaborative, open-platform environment, we rapidly advance from research towards applications for large-area field-scale ET mapping. Cloud computing of the Landsat image archive combined with other satellite, climate, and weather data, is creating never imagined opportunities for assessing ET model behavior and uncertainty, and ultimately providing the ability for more robust operational monitoring and assessment of water use at field-scales.
Advancing Cyberinfrastructure to support high resolution water resources modeling
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Ogden, F. L.; Jones, N.; Horsburgh, J. S.
2012-12-01
Addressing the problem of how the availability and quality of water resources at large scales are sensitive to climate variability, watershed alterations and management activities requires computational resources that combine data from multiple sources and support integrated modeling. Related cyberinfrastructure challenges include: 1) how can we best structure data and computer models to address this scientific problem through the use of high-performance and data-intensive computing, and 2) how can we do this in a way that discipline scientists without extensive computational and algorithmic knowledge and experience can take advantage of advances in cyberinfrastructure? This presentation will describe a new system called CI-WATER that is being developed to address these challenges and advance high resolution water resources modeling in the Western U.S. We are building on existing tools that enable collaboration to develop model and data interfaces that link integrated system models running within an HPC environment to multiple data sources. Our goal is to enhance the use of computational simulation and data-intensive modeling to better understand water resources. Addressing water resource problems in the Western U.S. requires simulation of natural and engineered systems, as well as representation of legal (water rights) and institutional constraints alongside the representation of physical processes. We are establishing data services to represent the engineered infrastructure and legal and institutional systems in a way that they can be used with high resolution multi-physics watershed modeling at high spatial resolution. These services will enable incorporation of location-specific information on water management infrastructure and systems into the assessment of regional water availability in the face of growing demands, uncertain future meteorological forcings, and existing prior-appropriations water rights. This presentation will discuss the informatics challenges involved with data management and easy-to-use access to high performance computing being tackled in this project.
Achieving production-level use of HEP software at the Argonne Leadership Computing Facility
NASA Astrophysics Data System (ADS)
Uram, T. D.; Childers, J. T.; LeCompte, T. J.; Papka, M. E.; Benjamin, D.
2015-12-01
HEP's demand for computing resources has grown beyond the capacity of the Grid, and these demands will accelerate with the higher energy and luminosity planned for Run II. Mira, the ten petaFLOPs supercomputer at the Argonne Leadership Computing Facility, is a potentially significant compute resource for HEP research. Through an award of fifty million hours on Mira, we have delivered millions of events to LHC experiments by establishing the means of marshaling jobs through serial stages on local clusters, and parallel stages on Mira. We are running several HEP applications, including Alpgen, Pythia, Sherpa, and Geant4. Event generators, such as Sherpa, typically have a split workload: a small scale integration phase, and a second, more scalable, event-generation phase. To accommodate this workload on Mira we have developed two Python-based Django applications, Balsam and ARGO. Balsam is a generalized scheduler interface which uses a plugin system for interacting with scheduler software such as HTCondor, Cobalt, and TORQUE. ARGO is a workflow manager that submits jobs to instances of Balsam. Through these mechanisms, the serial and parallel tasks within jobs are executed on the appropriate resources. This approach and its integration with the PanDA production system will be discussed.
Blood Flow: Multi-scale Modeling and Visualization (July 2011)
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2011-01-01
Multi-scale modeling of arterial blood flow can shed light on the interaction between events happening at micro- and meso-scales (i.e., adhesion of red blood cells to the arterial wall, clot formation) and at macro-scales (i.e., change in flow patterns due to the clot). Coupled numerical simulations of such multi-scale flow require state-of-the-art computers and algorithms, along with techniques for multi-scale visualizations. This animation presents early results of two studies used in the development of a multi-scale visualization methodology. The fisrt illustrates a flow of healthy (red) and diseased (blue) blood cells with a Dissipative Particle Dynamics (DPD) method. Each bloodmore » cell is represented by a mesh, small spheres show a sub-set of particles representing the blood plasma, while instantaneous streamlines and slices represent the ensemble average velocity. In the second we investigate the process of thrombus (blood clot) formation, which may be responsible for the rupture of aneurysms, by concentrating on the platelet blood cells, observing as they aggregate on the wall of an aneruysm. Simulation was performed on Kraken at the National Institute for Computational Sciences. Visualization was produced using resources of the Argonne Leadership Computing Facility at Argonne National Laboratory.« less
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L
2014-01-01
Background As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Methods Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852
Energy Conservation Using Dynamic Voltage Frequency Scaling for Computational Cloud
Florence, A. Paulin; Shanthi, V.; Simon, C. B. Sunil
2016-01-01
Cloud computing is a new technology which supports resource sharing on a “Pay as you go” basis around the world. It provides various services such as SaaS, IaaS, and PaaS. Computation is a part of IaaS and the entire computational requests are to be served efficiently with optimal power utilization in the cloud. Recently, various algorithms are developed to reduce power consumption and even Dynamic Voltage and Frequency Scaling (DVFS) scheme is also used in this perspective. In this paper we have devised methodology which analyzes the behavior of the given cloud request and identifies the associated type of algorithm. Once the type of algorithm is identified, using their asymptotic notations, its time complexity is calculated. Using best fit strategy the appropriate host is identified and the incoming job is allocated to the victimized host. Using the measured time complexity the required clock frequency of the host is measured. According to that CPU frequency is scaled up or down using DVFS scheme, enabling energy to be saved up to 55% of total Watts consumption. PMID:27239551
Energy Conservation Using Dynamic Voltage Frequency Scaling for Computational Cloud.
Florence, A Paulin; Shanthi, V; Simon, C B Sunil
2016-01-01
Cloud computing is a new technology which supports resource sharing on a "Pay as you go" basis around the world. It provides various services such as SaaS, IaaS, and PaaS. Computation is a part of IaaS and the entire computational requests are to be served efficiently with optimal power utilization in the cloud. Recently, various algorithms are developed to reduce power consumption and even Dynamic Voltage and Frequency Scaling (DVFS) scheme is also used in this perspective. In this paper we have devised methodology which analyzes the behavior of the given cloud request and identifies the associated type of algorithm. Once the type of algorithm is identified, using their asymptotic notations, its time complexity is calculated. Using best fit strategy the appropriate host is identified and the incoming job is allocated to the victimized host. Using the measured time complexity the required clock frequency of the host is measured. According to that CPU frequency is scaled up or down using DVFS scheme, enabling energy to be saved up to 55% of total Watts consumption.
NASA Technical Reports Server (NTRS)
Johnston, William E.; Gannon, Dennis; Nitzberg, Bill
2000-01-01
We use the term "Grid" to refer to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. This infrastructure includes: (1) Tools for constructing collaborative, application oriented Problem Solving Environments / Frameworks (the primary user interfaces for Grids); (2) Programming environments, tools, and services providing various approaches for building applications that use aggregated computing and storage resources, and federated data sources; (3) Comprehensive and consistent set of location independent tools and services for accessing and managing dynamic collections of widely distributed resources: heterogeneous computing systems, storage systems, real-time data sources and instruments, human collaborators, and communications systems; (4) Operational infrastructure including management tools for distributed systems and distributed resources, user services, accounting and auditing, strong and location independent user authentication and authorization, and overall system security services The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks. Such Grids will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. Examples of these problems include: (1) Coupled, multidisciplinary simulations too large for single systems (e.g., multi-component NPSS turbomachine simulation); (2) Use of widely distributed, federated data archives (e.g., simultaneous access to metrological, topological, aircraft performance, and flight path scheduling databases supporting a National Air Space Simulation systems}; (3) Coupling large-scale computing and data systems to scientific and engineering instruments (e.g., realtime interaction with experiments through real-time data analysis and interpretation presented to the experimentalist in ways that allow direct interaction with the experiment (instead of just with instrument control); (5) Highly interactive, augmented reality and virtual reality remote collaborations (e.g., Ames / Boeing Remote Help Desk providing field maintenance use of coupled video and NDI to a remote, on-line airframe structures expert who uses this data to index into detailed design databases, and returns 3D internal aircraft geometry to the field); (5) Single computational problems too large for any single system (e.g. the rotocraft reference calculation). Grids also have the potential to provide pools of resources that could be called on in extraordinary / rapid response situations (such as disaster response) because they can provide common interfaces and access mechanisms, standardized management, and uniform user authentication and authorization, for large collections of distributed resources (whether or not they normally function in concert). IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: the scientist / design engineer whose primary interest is problem solving (e.g. determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. The results of the analysis of the needs of these two types of users provides a broad set of requirements that gives rise to a general set of required capabilities. The IPG project is intended to address all of these requirements. In some cases the required computing technology exists, and in some cases it must be researched and developed. The project is using available technology to provide a prototype set of capabilities in a persistent distributed computing testbed. Beyond this, there are required capabilities that are not immediately available, and whose development spans the range from near-term engineering development (one to two years) to much longer term R&D (three to six years). Additional information is contained in the original.
NASA Astrophysics Data System (ADS)
Lin, Y.; O'Malley, D.; Vesselinov, V. V.
2015-12-01
Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
bioNerDS: exploring bioinformatics’ database and software use through literature mining
2013-01-01
Background Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. Results We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrates some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. Abstract Conclusions We demonstrate the feasibility of automatically identifying resource names on a large-scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/. PMID:23768135
Prototyping an institutional IAIMS/UMLS information environment for an academic medical center.
Miller, P L; Paton, J A; Clyman, J I; Powsner, S M
1992-07-01
The paper describes a prototype information environment designed to link network-based information resources in an integrated fashion and thus enhance the information capabilities of an academic medical center. The prototype was implemented on a single Macintosh computer to permit exploration of the overall "information architecture" and to demonstrate the various desired capabilities prior to full-scale network-based implementation. At the heart of the prototype are two components: a diverse set of information resources available over an institutional computer network and an information sources map designed to assist users in finding and accessing information resources relevant to their needs. The paper describes these and other components of the prototype and presents a scenario illustrating its use. The prototype illustrates the link between the goals of two National Library of Medicine initiatives, the Integrated Academic Information Management System (IAIMS) and the Unified Medical Language System (UMLS).
2015-01-01
Economies are instances of complex socio-technical systems that are shaped by the interactions of large numbers of individuals. The individual behavior and decision-making of consumer agents is determined by complex psychological dynamics that include their own assessment of present and future economic conditions as well as those of others, potentially leading to feedback loops that affect the macroscopic state of the economic system. We propose that the large-scale interactions of a nation's citizens with its online resources can reveal the complex dynamics of their collective psychology, including their assessment of future system states. Here we introduce a behavioral index of Chinese Consumer Confidence (C3I) that computationally relates large-scale online search behavior recorded by Google Trends data to the macroscopic variable of consumer confidence. Our results indicate that such computational indices may reveal the components and complex dynamics of consumer psychology as a collective socio-economic phenomenon, potentially leading to improved and more refined economic forecasting. PMID:25826692
Dong, Xianlei; Bollen, Johan
2015-01-01
Economies are instances of complex socio-technical systems that are shaped by the interactions of large numbers of individuals. The individual behavior and decision-making of consumer agents is determined by complex psychological dynamics that include their own assessment of present and future economic conditions as well as those of others, potentially leading to feedback loops that affect the macroscopic state of the economic system. We propose that the large-scale interactions of a nation's citizens with its online resources can reveal the complex dynamics of their collective psychology, including their assessment of future system states. Here we introduce a behavioral index of Chinese Consumer Confidence (C3I) that computationally relates large-scale online search behavior recorded by Google Trends data to the macroscopic variable of consumer confidence. Our results indicate that such computational indices may reveal the components and complex dynamics of consumer psychology as a collective socio-economic phenomenon, potentially leading to improved and more refined economic forecasting.
How Much Higher Can HTCondor Fly?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fajardo, E. M.; Dost, J. M.; Holzman, B.
The HTCondor high throughput computing system is heavily used in the high energy physics (HEP) community as the batch system for several Worldwide LHC Computing Grid (WLCG) resources. Moreover, it is the backbone of GlidelnWMS, the pilot system used by the computing organization of the Compact Muon Solenoid (CMS) experiment. To prepare for LHC Run 2, we probed the scalability limits of new versions and configurations of HTCondor with a goal of reaching 200,000 simultaneous running jobs in a single internationally distributed dynamic pool.In this paper, we first describe how we created an opportunistic distributed testbed capable of exercising runsmore » with 200,000 simultaneous jobs without impacting production. This testbed methodology is appropriate not only for scale testing HTCondor, but potentially for many other services. In addition to the test conditions and the testbed topology, we include the suggested configuration options used to obtain the scaling results, and describe some of the changes to HTCondor inspired by our testing that enabled sustained operations at scales well beyond previous limits.« less
ASC FY17 Implementation Plan, Rev. 1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamilton, P. G.
The Stockpile Stewardship Program (SSP) is an integrated technical program for maintaining the safety, surety, and reliability of the U.S. nuclear stockpile. The SSP uses nuclear test data, computational modeling and simulation, and experimental facilities to advance understanding of nuclear weapons. It includes stockpile surveillance, experimental research, development and engineering programs, and an appropriately scaled production capability to support stockpile requirements. This integrated national program requires the continued use of experimental facilities and programs, and the computational capabilities to support these programs. The Advanced Simulation and Computing Program (ASC) is a cornerstone of the SSP, providing simulation capabilities and computationalmore » resources that support annual stockpile assessment and certification, study advanced nuclear weapons design and manufacturing processes, analyze accident scenarios and weapons aging, and provide the tools to enable stockpile Life Extension Programs (LEPs) and the resolution of Significant Finding Investigations (SFIs). This requires a balance of resources, including technical staff, hardware, simulation software, and computer science solutions.« less
Application of computational aero-acoustics to real world problems
NASA Technical Reports Server (NTRS)
Hardin, Jay C.
1996-01-01
The application of computational aeroacoustics (CAA) to real problems is discussed in relation to the analysis performed with the aim of assessing the application of the various techniques. It is considered that the applications are limited by the inability of the computational resources to resolve the large range of scales involved in high Reynolds number flows. Possible simplifications are discussed. It is considered that problems remain to be solved in relation to the efficient use of the power of parallel computers and in the development of turbulent modeling schemes. The goal of CAA is stated as being the implementation of acoustic design studies on a computer terminal with reasonable run times.
NASA Technical Reports Server (NTRS)
Nez, G. (Principal Investigator); Mutter, D.
1977-01-01
The author has identified the following significant results. The project mapped land use/cover classifications from LANDSAT computer compatible tape data and combined those results with other multisource data via computer mapping/compositing techniques to analyze various land use planning/natural resource management problems. Data were analyzed on 1:24,000 scale maps at 1.1 acre resolution. LANDSAT analysis software and linkages with other computer mapping software were developed. Significant results were also achieved in training, communication, and identification of needs for developing the LANDSAT/computer mapping technologies into operational tools for use by decision makers.
An efficient two-stage approach for image-based FSI analysis of atherosclerotic arteries
Rayz, Vitaliy L.; Mofrad, Mohammad R. K.; Saloner, David
2010-01-01
Patient-specific biomechanical modeling of atherosclerotic arteries has the potential to aid clinicians in characterizing lesions and determining optimal treatment plans. To attain high levels of accuracy, recent models use medical imaging data to determine plaque component boundaries in three dimensions, and fluid–structure interaction is used to capture mechanical loading of the diseased vessel. As the plaque components and vessel wall are often highly complex in shape, constructing a suitable structured computational mesh is very challenging and can require a great deal of time. Models based on unstructured computational meshes require relatively less time to construct and are capable of accurately representing plaque components in three dimensions. These models unfortunately require additional computational resources and computing time for accurate and meaningful results. A two-stage modeling strategy based on unstructured computational meshes is proposed to achieve a reasonable balance between meshing difficulty and computational resource and time demand. In this method, a coarsegrained simulation of the full arterial domain is used to guide and constrain a fine-scale simulation of a smaller region of interest within the full domain. Results for a patient-specific carotid bifurcation model demonstrate that the two-stage approach can afford a large savings in both time for mesh generation and time and resources needed for computation. The effects of solid and fluid domain truncation were explored, and were shown to minimally affect accuracy of the stress fields predicted with the two-stage approach. PMID:19756798
NASA Astrophysics Data System (ADS)
Moslehi, M.; de Barros, F.; Rajagopal, R.
2014-12-01
Hydrogeological models that represent flow and transport in subsurface domains are usually large-scale with excessive computational complexity and uncertain characteristics. Uncertainty quantification for predicting flow and transport in heterogeneous formations often entails utilizing a numerical Monte Carlo framework, which repeatedly simulates the model according to a random field representing hydrogeological characteristics of the field. The physical resolution (e.g. grid resolution associated with the physical space) for the simulation is customarily chosen based on recommendations in the literature, independent of the number of Monte Carlo realizations. This practice may lead to either excessive computational burden or inaccurate solutions. We propose an optimization-based methodology that considers the trade-off between the following conflicting objectives: time associated with computational costs, statistical convergence of the model predictions and physical errors corresponding to numerical grid resolution. In this research, we optimally allocate computational resources by developing a modeling framework for the overall error based on a joint statistical and numerical analysis and optimizing the error model subject to a given computational constraint. The derived expression for the overall error explicitly takes into account the joint dependence between the discretization error of the physical space and the statistical error associated with Monte Carlo realizations. The accuracy of the proposed framework is verified in this study by applying it to several computationally extensive examples. Having this framework at hand aims hydrogeologists to achieve the optimum physical and statistical resolutions to minimize the error with a given computational budget. Moreover, the influence of the available computational resources and the geometric properties of the contaminant source zone on the optimum resolutions are investigated. We conclude that the computational cost associated with optimal allocation can be substantially reduced compared with prevalent recommendations in the literature.
Simulation of LHC events on a millions threads
NASA Astrophysics Data System (ADS)
Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.
2015-12-01
Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap by targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest supercomputer in the world, can run roughly five times the number of parallel processes that the ATLAS experiment typically uses on the Grid. We ported Alpgen, a serial x86 code, to run as a parallel application under MPI on the Blue Gene/Q architecture. By analysis of the Alpgen code, we reduced the memory footprint to allow running 64 threads per node, utilizing the four hardware threads available per core on the PowerPC A2 processor. Event generation and unweighting, typically run as independent serial phases, are coupled together in a single job in this scenario, reducing intermediate writes to the filesystem. By these optimizations, we have successfully run LHC proton-proton physics event generation at the scale of a million threads, filling two-thirds of Mira.
Overview of the LINCS architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fletcher, J.G.; Watson, R.W.
1982-01-13
Computing at the Lawrence Livermore National Laboratory (LLNL) has evolved over the past 15 years with a computer network based resource sharing environment. The increasing use of low cost and high performance micro, mini and midi computers and commercially available local networking systems will accelerate this trend. Further, even the large scale computer systems, on which much of the LLNL scientific computing depends, are evolving into multiprocessor systems. It is our belief that the most cost effective use of this environment will depend on the development of application systems structured into cooperating concurrent program modules (processes) distributed appropriately over differentmore » nodes of the environment. A node is defined as one or more processors with a local (shared) high speed memory. Given the latter view, the environment can be characterized as consisting of: multiple nodes communicating over noisy channels with arbitrary delays and throughput, heterogenous base resources and information encodings, no single administration controlling all resources, distributed system state, and no uniform time base. The system design problem is - how to turn the heterogeneous base hardware/firmware/software resources of this environment into a coherent set of resources that facilitate development of cost effective, reliable, and human engineered applications. We believe the answer lies in developing a layered, communication oriented distributed system architecture; layered and modular to support ease of understanding, reconfiguration, extensibility, and hiding of implementation or nonessential local details; communication oriented because that is a central feature of the environment. The Livermore Interactive Network Communication System (LINCS) is a hierarchical architecture designed to meet the above needs. While having characteristics in common with other architectures, it differs in several respects.« less
Multi-discipline resource inventory of soils, vegetation and geology
NASA Technical Reports Server (NTRS)
Simonson, G. H. (Principal Investigator); Paine, D. P.; Lawrence, R. D.; Norgren, J. A.; Pyott, W. Y.; Herzog, J. H.; Murray, R. J.; Rogers, R.
1973-01-01
The author has identified the following significant results. Computer classification of natural vegetation, in the vicinity of Big Summit Prairie, Crook County, Oregon was carried out using MSS digital data. Impure training sets, representing eleven vegetation types plus water, were selected from within the area to be classified. Close correlations were visually observed between vegetation types mapped from the large scale photographs and the computer classification of the ERTS data (Frame 1021-18151, 13 August 1972).
The HEPCloud Facility: elastic computing for High Energy Physics – The NOvA Use Case
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fuess, S.; Garzoglio, G.; Holzman, B.
The need for computing in the HEP community follows cycles of peaks and valleys mainly driven by conference dates, accelerator shutdown, holiday schedules, and other factors. Because of this, the classical method of provisioning these resources at providing facilities has drawbacks such as potential overprovisioning. As the appetite for computing increases, however, so does the need to maximize cost efficiency by developing a model for dynamically provisioning resources only when needed. To address this issue, the HEPCloud project was launched by the Fermilab Scientific Computing Division in June 2015. Its goal is to develop a facility that provides a commonmore » interface to a variety of resources, including local clusters, grids, high performance computers, and community and commercial Clouds. Initially targeted experiments include CMS and NOvA, as well as other Fermilab stakeholders. In its first phase, the project has demonstrated the use of the “elastic” provisioning model offered by commercial clouds, such as Amazon Web Services. In this model, resources are rented and provisioned automatically over the Internet upon request. In January 2016, the project demonstrated the ability to increase the total amount of global CMS resources by 58,000 cores from 150,000 cores - a 25 percent increase - in preparation for the Recontres de Moriond. In March 2016, the NOvA experiment has also demonstrated resource burst capabilities with an additional 7,300 cores, achieving a scale almost four times as large as the local allocated resources and utilizing the local AWS s3 storage to optimize data handling operations and costs. NOvA was using the same familiar services used for local computations, such as data handling and job submission, in preparation for the Neutrino 2016 conference. In both cases, the cost was contained by the use of the Amazon Spot Instance Market and the Decision Engine, a HEPCloud component that aims at minimizing cost and job interruption. This paper describes the Fermilab HEPCloud Facility and the challenges overcome for the CMS and NOvA communities.« less
Program on application of communications satellites to educational development
NASA Technical Reports Server (NTRS)
Morgan, R. P.; Singh, J. P.
1971-01-01
Interdisciplinary research in needs analysis, communications technology studies, and systems synthesis is reported. Existing and planned educational telecommunications services are studied and library utilization of telecommunications is described. Preliminary estimates are presented of ranges of utilization of educational telecommunications services for 1975 and 1985; instructional and public television, computer-aided instruction, computing resources, and information resource sharing for various educational levels and purposes. Communications technology studies include transmission schemes for still-picture television, use of Gunn effect devices, and TV receiver front ends for direct satellite reception at 12 GHz. Two major studies in the systems synthesis project concern (1) organizational and administrative aspects of a large-scale instructional satellite system to be used with schools and (2) an analysis of future development of instructional television, with emphasis on the use of video tape recorders and cable television. A communications satellite system synthesis program developed for NASA is now operational on the university IBM 360-50 computer.
Implementing Parquet equations using HPX
NASA Astrophysics Data System (ADS)
Kellar, Samuel; Wagle, Bibek; Yang, Shuxiang; Tam, Ka-Ming; Kaiser, Hartmut; Moreno, Juana; Jarrell, Mark
A new C++ runtime system (HPX) enables simulations of complex systems to run more efficiently on parallel and heterogeneous systems. This increased efficiency allows for solutions to larger simulations of the parquet approximation for a system with impurities. The relevancy of the parquet equations depends upon the ability to solve systems which require long runs and large amounts of memory. These limitations, in addition to numerical complications arising from stability of the solutions, necessitate running on large distributed systems. As the computational resources trend towards the exascale and the limitations arising from computational resources vanish efficiency of large scale simulations becomes a focus. HPX facilitates efficient simulations through intelligent overlapping of computation and communication. Simulations such as the parquet equations which require the transfer of large amounts of data should benefit from HPX implementations. Supported by the the NSF EPSCoR Cooperative Agreement No. EPS-1003897 with additional support from the Louisiana Board of Regents.
Science Information System in Japan. NIER Occasional Paper 02/83.
ERIC Educational Resources Information Center
Matsumura, Tamiko
This paper describes the development of a proposed Japanese Science Information System (SIS), a nationwide network of research and academic libraries, large-scale computer centers, national research institutes, and other organizations, to be formed for the purpose of sharing information and resources in the natural sciences, technology, the…
A Rich Metadata Filesystem for Scientific Data
ERIC Educational Resources Information Center
Bui, Hoang
2012-01-01
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides…
Grossman, Robert L.; Heath, Allison; Murphy, Mark; Patterson, Maria; Wells, Walt
2017-01-01
Data commons collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. An architecture for data commons is described, as well as some lessons learned from operating several large-scale data commons. PMID:29033693
SCIMITAR: Scalable Stream-Processing for Sensor Information Brokering
2013-11-01
IaaS) cloud frameworks including Amazon Web Services and Eucalyptus . For load testing, we used The Grinder [9], a Java load testing framework that...internal Eucalyptus cluster which we could not scale as large as the Amazon environment due to a lack of computation resources. We recreated our
CoreTSAR: Core Task-Size Adapting Runtime
Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry; ...
2014-10-27
Heterogeneity continues to increase at all levels of computing, with the rise of accelerators such as GPUs, FPGAs, and other co-processors into everything from desktops to supercomputers. As a consequence, efficiently managing such disparate resources has become increasingly complex. CoreTSAR seeks to reduce this complexity by adaptively worksharing parallel-loop regions across compute resources without requiring any transformation of the code within the loop. Lastly, our results show performance improvements of up to three-fold over a current state-of-the-art heterogeneous task scheduler as well as linear performance scaling from a single GPU to four GPUs for many codes. In addition, CoreTSAR demonstratesmore » a robust ability to adapt to both a variety of workloads and underlying system configurations.« less
Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
Torres-Huitzil, Cesar
2013-01-01
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires of k 2 − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
Midekisa, Alemayehu; Holl, Felix; Savory, David J; Andrade-Pacheco, Ricardo; Gething, Peter W; Bennett, Adam; Sturrock, Hugh J W
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth's land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources.
Holl, Felix; Savory, David J.; Andrade-Pacheco, Ricardo; Gething, Peter W.; Bennett, Adam; Sturrock, Hugh J. W.
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth’s land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources. PMID:28953943
Jade: using on-demand cloud analysis to give scientists back their flow
NASA Astrophysics Data System (ADS)
Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.
2017-12-01
The UK's Met Office generates 400 TB weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap.Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: An under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts Hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores Clusters which autoscale up/down in response to calculation load using Kubernetes, and balances the cluster across providers based on the current price of compute Lazy data access from cloud storage via containerised OpenDAP This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger scales. The use of ephemeral compute resources also makes this implementation cost efficient.
Testing a computer-based ostomy care training resource for staff nurses.
Bales, Isabel
2010-05-01
Fragmented teaching and ostomy care provided by nonspecialized clinicians unfamiliar with state-of-the-art care and products have been identified as problems in teaching ostomy care to the new ostomate. After conducting a literature review of theories and concepts related to the impact of nurse behaviors and confidence on ostomy care, the author developed a computer-based learning resource and assessed its effect on staff nurse confidence. Of 189 staff nurses with a minimum of 1 year acute-care experience employed in the acute care, emergency, and rehabilitation departments of an acute care facility in the Midwestern US, 103 agreed to participate and returned completed pre- and post-tests, each comprising the same eight statements about providing ostomy care. F and P values were computed for differences between pre- and post test scores. Based on a scale where 1 = totally disagree and 5 = totally agree with the statement, baseline confidence and perceived mean knowledge scores averaged 3.8 and after viewing the resource program post-test mean scores averaged 4.51, a statistically significant improvement (P = 0.000). The largest difference between pre- and post test scores involved feeling confident in having the resources to learn ostomy skills independently. The availability of an electronic ostomy care resource was rated highly in both pre- and post testing. Studies to assess the effects of increased confidence and knowledge on the quality and provision of care are warranted.
COMPUTATIONAL RESOURCES FOR BIOFUEL FEEDSTOCK SPECIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buell, Carol Robin; Childs, Kevin L
2013-05-07
While current production of ethanol as a biofuel relies on starch and sugar inputs, it is anticipated that sustainable production of ethanol for biofuel use will utilize lignocellulosic feedstocks. Candidate plant species to be used for lignocellulosic ethanol production include a large number of species within the Grass, Pine and Birch plant families. For these biofuel feedstock species, there are variable amounts of genome sequence resources available, ranging from complete genome sequences (e.g. sorghum, poplar) to transcriptome data sets (e.g. switchgrass, pine). These data sets are not only dispersed in location but also disparate in content. It will be essentialmore » to leverage and improve these genomic data sets for the improvement of biofuel feedstock production. The objectives of this project were to provide computational tools and resources for data-mining genome sequence/annotation and large-scale functional genomic datasets available for biofuel feedstock species. We have created a Bioenergy Feedstock Genomics Resource that provides a web-based portal or clearing house for genomic data for plant species relevant to biofuel feedstock production. Sequence data from a total of 54 plant species are included in the Bioenergy Feedstock Genomics Resource including model plant species that permit leveraging of knowledge across taxa to biofuel feedstock species.We have generated additional computational analyses of these data, including uniform annotation, to facilitate genomic approaches to improved biofuel feedstock production. These data have been centralized in the publicly available Bioenergy Feedstock Genomics Resource (http://bfgr.plantbiology.msu.edu/).« less
Architecture Framework for Trapped-Ion Quantum Computer based on Performance Simulation Tool
NASA Astrophysics Data System (ADS)
Ahsan, Muhammad
The challenge of building scalable quantum computer lies in striking appropriate balance between designing a reliable system architecture from large number of faulty computational resources and improving the physical quality of system components. The detailed investigation of performance variation with physics of the components and the system architecture requires adequate performance simulation tool. In this thesis we demonstrate a software tool capable of (1) mapping and scheduling the quantum circuit on a realistic quantum hardware architecture with physical resource constraints, (2) evaluating the performance metrics such as the execution time and the success probability of the algorithm execution, and (3) analyzing the constituents of these metrics and visualizing resource utilization to identify system components which crucially define the overall performance. Using this versatile tool, we explore vast design space for modular quantum computer architecture based on trapped ions. We find that while success probability is uniformly determined by the fidelity of physical quantum operation, the execution time is a function of system resources invested at various layers of design hierarchy. At physical level, the number of lasers performing quantum gates, impact the latency of the fault-tolerant circuit blocks execution. When these blocks are used to construct meaningful arithmetic circuit such as quantum adders, the number of ancilla qubits for complicated non-clifford gates and entanglement resources to establish long-distance communication channels, become major performance limiting factors. Next, in order to factorize large integers, these adders are assembled into modular exponentiation circuit comprising bulk of Shor's algorithm. At this stage, the overall scaling of resource-constraint performance with the size of problem, describes the effectiveness of chosen design. By matching the resource investment with the pace of advancement in hardware technology, we find optimal designs for different types of quantum adders. Conclusively, we show that 2,048-bit Shor's algorithm can be reliably executed within the resource budget of 1.5 million qubits.
NASA Astrophysics Data System (ADS)
Huang, Qian
2014-09-01
Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up the performance by utilizing multiple computer resources to process a computational task simultaneously thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and super computers. Today, it has been seen that there is a tremendous growth in cloud computing. Infrastructure as a Service (IaaS), the on-demand and pay-as-you-go model, creates a flexible and cost-effective mean to access computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services in IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application of which is developed. In this paper, an overall description of this SaaS application is presented. This contribution can promote cloud application development in computational mineral physics, and cross-disciplinary studies.
Evaluation of Computer-Based Training for Health Workers in Echocardiography for RHD.
Engelman, Daniel; Okello, Emmy; Beaton, Andrea; Selnow, Gary; Remenyi, Bo; Watson, Caroline; Longenecker, Chris T; Sable, Craig; Steer, Andrew C
2017-03-01
The implementation of screening for rheumatic heart disease at a population-scale would require a considerable increase in human resources. Training nonexpert staff in echocardiography requires appropriate methods and materials. This pre/post study aims to measure the change in the knowledge and confidence of a group of health workers after a computer-assisted training intervention in basic echocardiography for rheumatic heart disease. A syllabus of self-guided, computer-based modules to train nonexpert health workers in basic echocardiography for rheumatic heart disease was developed. Thirty-eight health workers from Uganda participated in the training. Using a pre/post design, identical test instruments were administered before and after the training intervention, assessing the knowledge (using multiple-choice questions) and confidence (using Likert scale questions) in clinical science and echocardiography. The mean total score on knowledge tests rose from 44.8% to 85.4% (mean difference: 40.6%, 95% confidence interval [CI]: 35.4% to 45.8%), with strong evidence for an increase in scores across all knowledge theme areas (p < 0.001). Increased confidence with each key aspect was reported, and there was strong evidence for an increase in the mean score for confidence scales in clinical science (difference: 7.1, 95% CI: 6.2 to 8.0; p < 0.001) and echocardiography (difference: 18.3, 95% CI: 16.6 to 20.0; p < 0.001). The training program was effective at increasing knowledge and confidence for basic echocardiography in nonexpert health workers. Use of computer-assisted learning may reduce the human resource requirements for training staff in echocardiography. Copyright © 2016 World Heart Federation (Geneva). Published by Elsevier B.V. All rights reserved.
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.
Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L
2014-01-01
As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Squid - a simple bioinformatics grid.
Carvalho, Paulo C; Glória, Rafael V; de Miranda, Antonio B; Degrave, Wim M
2005-08-03
BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers. Most distributed computing / grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" instalation containing a pre-configured example.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Livny, Miron; Shank, James; Ernst, Michael
Under this SciDAC-2 grant the project’s goal w a s t o stimulate new discoveries by providing scientists with effective and dependable access to an unprecedented national distributed computational facility: the Open Science Grid (OSG). We proposed to achieve this through the work of the Open Science Grid Consortium: a unique hands-on multi-disciplinary collaboration of scientists, software developers and providers of computing resources. Together the stakeholders in this consortium sustain and use a shared distributed computing environment that transforms simulation and experimental science in the US. The OSG consortium is an open collaboration that actively engages new research communities. Wemore » operate an open facility that brings together a broad spectrum of compute, storage, and networking resources and interfaces to other cyberinfrastructures, including the US XSEDE (previously TeraGrid), the European Grids for ESciencE (EGEE), as well as campus and regional grids. We leverage middleware provided by computer science groups, facility IT support organizations, and computing programs of application communities for the benefit of consortium members and the US national CI.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Ann E; Barker, Ashley D; Bland, Arthur S Buddy
Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) continues to deliver the most powerful resources in the U.S. for open science. At 2.33 petaflops peak performance, the Cray XT Jaguar delivered more than 1.4 billion core hours in calendar year (CY) 2011 to researchers around the world for computational simulations relevant to national and energy security; advancing the frontiers of knowledge in physical sciences and areas of biological, medical, environmental, and computer sciences; and providing world-class research facilities for the nation's science enterprise. Users reported more than 670 publications this year arising from their use of OLCF resources. Of thesemore » we report the 300 in this review that are consistent with guidance provided. Scientific achievements by OLCF users cut across all range scales from atomic to molecular to large-scale structures. At the atomic scale, researchers discovered that the anomalously long half-life of Carbon-14 can be explained by calculating, for the first time, the very complex three-body interactions between all the neutrons and protons in the nucleus. At the molecular scale, researchers combined experimental results from LBL's light source and simulations on Jaguar to discover how DNA replication continues past a damaged site so a mutation can be repaired later. Other researchers combined experimental results from ORNL's Spallation Neutron Source and simulations on Jaguar to reveal the molecular structure of ligno-cellulosic material used in bioethanol production. This year, Jaguar has been used to do billion-cell CFD calculations to develop shock wave compression turbo machinery as a means to meet DOE goals for reducing carbon sequestration costs. General Electric used Jaguar to calculate the unsteady flow through turbo machinery to learn what efficiencies the traditional steady flow assumption is hiding from designers. Even a 1% improvement in turbine design can save the nation billions of gallons of fuel.« less
The Collaborative Seismic Earth Model Project
NASA Astrophysics Data System (ADS)
Fichtner, A.; van Herwaarden, D. P.; Afanasiev, M.
2017-12-01
We present the first generation of the Collaborative Seismic Earth Model (CSEM). This effort is intended to address grand challenges in tomography that currently inhibit imaging the Earth's interior across the seismically accessible scales: [1] For decades to come, computational resources will remain insufficient for the exploitation of the full observable seismic bandwidth. [2] With the man power of individual research groups, only small fractions of available waveform data can be incorporated into seismic tomographies. [3] The limited incorporation of prior knowledge on 3D structure leads to slow progress and inefficient use of resources. The CSEM is a multi-scale model of global 3D Earth structure that evolves continuously through successive regional refinements. Taking the current state of the CSEM as initial model, these refinements are contributed by external collaborators, and used to advance the CSEM to the next state. This mode of operation allows the CSEM to [1] harness the distributed man and computing power of the community, [2] to make consistent use of prior knowledge, and [3] to combine different tomographic techniques, needed to cover the seismic data bandwidth. Furthermore, the CSEM has the potential to serve as a unified and accessible representation of tomographic Earth models. Generation 1 comprises around 15 regional tomographic refinements, computed with full-waveform inversion. These include continental-scale mantle models of North America, Australasia, Europe and the South Atlantic, as well as detailed regional models of the crust beneath the Iberian Peninsula and western Turkey. A global-scale full-waveform inversion ensures that regional refinements are consistent with whole-Earth structure. This first generation will serve as the basis for further automation and methodological improvements concerning validation and uncertainty quantification.
Integration of Titan supercomputer at OLCF with ATLAS Production System
NASA Astrophysics Data System (ADS)
Barreiro Megino, F.; De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wells, J.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of Petabytes of data and the rate of data processing already exceeds Exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this paper we will describe a project aimed at integration of ATLAS Production System with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modified PanDA Pilot framework for job submission to Titan’s batch queues and local data management, with lightweight MPI wrappers to run single node workloads in parallel on Titan’s multi-core worker nodes. It provides for running of standard ATLAS production jobs on unused resources (backfill) on Titan. The system already allowed ATLAS to collect on Titan millions of core-hours per month, execute hundreds of thousands jobs, while simultaneously improving Titans utilization efficiency. We will discuss the details of the implementation, current experience with running the system, as well as future plans aimed at improvements in scalability and efficiency. Notice: This manuscript has been authored, by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
Prediction of resource volumes at untested locations using simple local prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2006-01-01
This paper shows how local spatial nonparametric prediction models can be applied to estimate volumes of recoverable gas resources at individual undrilled sites, at multiple sites on a regional scale, and to compute confidence bounds for regional volumes based on the distribution of those estimates. An approach that combines cross-validation, the jackknife, and bootstrap procedures is used to accomplish this task. Simulation experiments show that cross-validation can be applied beneficially to select an appropriate prediction model. The cross-validation procedure worked well for a wide range of different states of nature and levels of information. Jackknife procedures are used to compute individual prediction estimation errors at undrilled locations. The jackknife replicates also are used with a bootstrap resampling procedure to compute confidence bounds for the total volume. The method was applied to data (partitioned into a training set and target set) from the Devonian Antrim Shale continuous-type gas play in the Michigan Basin in Otsego County, Michigan. The analysis showed that the model estimate of total recoverable volumes at prediction sites is within 4 percent of the total observed volume. The model predictions also provide frequency distributions of the cell volumes at the production unit scale. Such distributions are the basis for subsequent economic analyses. ?? Springer Science+Business Media, LLC 2007.
Tsotsos, John K.
2017-01-01
Much has been written about how the biological brain might represent and process visual information, and how this might inspire and inform machine vision systems. Indeed, tremendous progress has been made, and especially during the last decade in the latter area. However, a key question seems too often, if not mostly, be ignored. This question is simply: do proposed solutions scale with the reality of the brain's resources? This scaling question applies equally to brain and to machine solutions. A number of papers have examined the inherent computational difficulty of visual information processing using theoretical and empirical methods. The main goal of this activity had three components: to understand the deep nature of the computational problem of visual information processing; to discover how well the computational difficulty of vision matches to the fixed resources of biological seeing systems; and, to abstract from the matching exercise the key principles that lead to the observed characteristics of biological visual performance. This set of components was termed complexity level analysis in Tsotsos (1987) and was proposed as an important complement to Marr's three levels of analysis. This paper revisits that work with the advantage that decades of hindsight can provide. PMID:28848458
Tsotsos, John K
2017-01-01
Much has been written about how the biological brain might represent and process visual information, and how this might inspire and inform machine vision systems. Indeed, tremendous progress has been made, and especially during the last decade in the latter area. However, a key question seems too often, if not mostly, be ignored. This question is simply: do proposed solutions scale with the reality of the brain's resources? This scaling question applies equally to brain and to machine solutions. A number of papers have examined the inherent computational difficulty of visual information processing using theoretical and empirical methods. The main goal of this activity had three components: to understand the deep nature of the computational problem of visual information processing; to discover how well the computational difficulty of vision matches to the fixed resources of biological seeing systems; and, to abstract from the matching exercise the key principles that lead to the observed characteristics of biological visual performance. This set of components was termed complexity level analysis in Tsotsos (1987) and was proposed as an important complement to Marr's three levels of analysis. This paper revisits that work with the advantage that decades of hindsight can provide.
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations start to adopt cloud computing for better utilizing computing resources by taking advantage of its scalability, cost reduction, and easy to access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting Geosciences. This paper provides a comprehensive study of three open-source cloud solutions, including OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) no significant performance differences in central processing unit (CPU), memory and I/O of virtual machines created and managed by different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) Cloudstack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable for supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
Cloud Computing for radiologists.
Kharat, Amit T; Safvi, Amjad; Thind, Ss; Singh, Amarjit
2012-07-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future.
Cloud Computing for radiologists
Kharat, Amit T; Safvi, Amjad; Thind, SS; Singh, Amarjit
2012-01-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future. PMID:23599560
Integrating computational methods to retrofit enzymes to synthetic pathways.
Brunk, Elizabeth; Neri, Marilisa; Tavernelli, Ivano; Hatzimanikatis, Vassily; Rothlisberger, Ursula
2012-02-01
Microbial production of desired compounds provides an efficient framework for the development of renewable energy resources. To be competitive to traditional chemistry, one requirement is to utilize the full capacity of the microorganism to produce target compounds with high yields and turnover rates. We use integrated computational methods to generate and quantify the performance of novel biosynthetic routes that contain highly optimized catalysts. Engineering a novel reaction pathway entails addressing feasibility on multiple levels, which involves handling the complexity of large-scale biochemical networks while respecting the critical chemical phenomena at the atomistic scale. To pursue this multi-layer challenge, our strategy merges knowledge-based metabolic engineering methods with computational chemistry methods. By bridging multiple disciplines, we provide an integral computational framework that could accelerate the discovery and implementation of novel biosynthetic production routes. Using this approach, we have identified and optimized a novel biosynthetic route for the production of 3HP from pyruvate. Copyright © 2011 Wiley Periodicals, Inc.
A test-bed modeling study for wave resource assessment
NASA Astrophysics Data System (ADS)
Yang, Z.; Neary, V. S.; Wang, T.; Gunawan, B.; Dallman, A.
2016-02-01
Hindcasts from phase-averaged wave models are commonly used to estimate standard statistics used in wave energy resource assessments. However, the research community and wave energy converter industry is lacking a well-documented and consistent modeling approach for conducting these resource assessments at different phases of WEC project development, and at different spatial scales, e.g., from small-scale pilot study to large-scale commercial deployment. Therefore, it is necessary to evaluate current wave model codes, as well as limitations and knowledge gaps for predicting sea states, in order to establish best wave modeling practices, and to identify future research needs to improve wave prediction for resource assessment. This paper presents the first phase of an on-going modeling study to address these concerns. The modeling study is being conducted at a test-bed site off the Central Oregon Coast using two of the most widely-used third-generation wave models - WaveWatchIII and SWAN. A nested-grid modeling approach, with domain dimension ranging from global to regional scales, was used to provide wave spectral boundary condition to a local scale model domain, which has a spatial dimension around 60km by 60km and a grid resolution of 250m - 300m. Model results simulated by WaveWatchIII and SWAN in a structured-grid framework are compared to NOAA wave buoy data for the six wave parameters, including omnidirectional wave power, significant wave height, energy period, spectral width, direction of maximum directionally resolved wave power, and directionality coefficient. Model performance and computational efficiency are evaluated, and the best practices for wave resource assessments are discussed, based on a set of standard error statistics and model run times.
Planning and assessment in land and water resource management are evolving from simple, local-scale problems toward complex, spatially explicit regional ones. Such problems have to be addressed with distributed models that can compute runoff and erosion at different spatial and t...
WLCG scale testing during CMS data challenges
NASA Astrophysics Data System (ADS)
Gutsche, O.; Hajdu, C.
2008-07-01
The CMS computing model to process and analyze LHC collision data follows a data-location driven approach and is using the WLCG infrastructure to provide access to GRID resources. As a preparation for data taking, CMS tests its computing model during dedicated data challenges. An important part of the challenges is the test of the user analysis which poses a special challenge for the infrastructure with its random distributed access patterns. The CMS Remote Analysis Builder (CRAB) handles all interactions with the WLCG infrastructure transparently for the user. During the 2006 challenge, CMS set its goal to test the infrastructure at a scale of 50,000 user jobs per day using CRAB. Both direct submissions by individual users and automated submissions by robots were used to achieve this goal. A report will be given about the outcome of the user analysis part of the challenge using both the EGEE and OSG parts of the WLCG. In particular, the difference in submission between both GRID middlewares (resource broker vs. direct submission) will be discussed. In the end, an outlook for the 2007 data challenge is given.
Classical boson sampling algorithms with superior performance to near-term experiments
NASA Astrophysics Data System (ADS)
Neville, Alex; Sparrow, Chris; Clifford, Raphaël; Johnston, Eric; Birchall, Patrick M.; Montanaro, Ashley; Laing, Anthony
2017-12-01
It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of linear optics, which has sparked interest as a rapid way to demonstrate such quantum supremacy. Photon statistics are governed by intractable matrix functions, which suggests that sampling from the distribution obtained by injecting photons into a linear optical network could be solved more quickly by a photonic experiment than by a classical computer. The apparently low resource requirements for large boson sampling experiments have raised expectations of a near-term demonstration of quantum supremacy by boson sampling. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. Our classical algorithm, based on Metropolised independence sampling, allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. Compared to current experiments, a demonstration of quantum supremacy over a successful implementation of these classical methods on a supercomputer would require the number of photons and experimental components to increase by orders of magnitude, while tackling exponentially scaling photon loss.
A new paradigm for atomically detailed simulations of kinetics in biophysical systems.
Elber, Ron
2017-01-01
The kinetics of biochemical and biophysical events determined the course of life processes and attracted considerable interest and research. For example, modeling of biological networks and cellular responses relies on the availability of information on rate coefficients. Atomically detailed simulations hold the promise of supplementing experimental data to obtain a more complete kinetic picture. However, simulations at biological time scales are challenging. Typical computer resources are insufficient to provide the ensemble of trajectories at the correct length that is required for straightforward calculations of time scales. In the last years, new technologies emerged that make atomically detailed simulations of rate coefficients possible. Instead of computing complete trajectories from reactants to products, these approaches launch a large number of short trajectories at different positions. Since the trajectories are short, they are computed trivially in parallel on modern computer architecture. The starting and termination positions of the short trajectories are chosen, following statistical mechanics theory, to enhance efficiency. These trajectories are analyzed. The analysis produces accurate estimates of time scales as long as hours. The theory of Milestoning that exploits the use of short trajectories is discussed, and several applications are described.
Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric
2014-01-29
Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.
Advanced Simulation and Computing Fiscal Year 14 Implementation Plan, Rev. 0.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meisner, Robert; McCoy, Michel; Archer, Bill
2013-09-11
The Stockpile Stewardship Program (SSP) is a single, highly integrated technical program for maintaining the surety and reliability of the U.S. nuclear stockpile. The SSP uses nuclear test data, computational modeling and simulation, and experimental facilities to advance understanding of nuclear weapons. It includes stockpile surveillance, experimental research, development and engineering programs, and an appropriately scaled production capability to support stockpile requirements. This integrated national program requires the continued use of experimental facilities and programs, and the computational enhancements to support these programs. The Advanced Simulation and Computing Program (ASC) is a cornerstone of the SSP, providing simulation capabilities andmore » computational resources that support annual stockpile assessment and certification, study advanced nuclear weapons design and manufacturing processes, analyze accident scenarios and weapons aging, and provide the tools to enable stockpile Life Extension Programs (LEPs) and the resolution of Significant Finding Investigations (SFIs). This requires a balanced resource, including technical staff, hardware, simulation software, and computer science solutions. In its first decade, the ASC strategy focused on demonstrating simulation capabilities of unprecedented scale in three spatial dimensions. In its second decade, ASC is now focused on increasing predictive capabilities in a three-dimensional (3D) simulation environment while maintaining support to the SSP. The program continues to improve its unique tools for solving progressively more difficult stockpile problems (sufficient resolution, dimensionality, and scientific details), quantify critical margins and uncertainties, and resolve increasingly difficult analyses needed for the SSP. Moreover, ASC’s business model is integrated and focused on requirements-driven products that address long-standing technical questions related to enhanced predictive capability in the simulation tools.« less
Parameter Sweep and Optimization of Loosely Coupled Simulations Using the DAKOTA Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elwasif, Wael R; Bernholdt, David E; Pannala, Sreekanth
2012-01-01
The increasing availability of large scale computing capabilities has accelerated the development of high-fidelity coupled simulations. Such simulations typically involve the integration of models that implement various aspects of the complex phenomena under investigation. Coupled simulations are playing an integral role in fields such as climate modeling, earth systems modeling, rocket simulations, computational chemistry, fusion research, and many other computational fields. Model coupling provides scientists with systematic ways to virtually explore the physical, mathematical, and computational aspects of the problem. Such exploration is rarely done using a single execution of a simulation, but rather by aggregating the results from manymore » simulation runs that, together, serve to bring to light novel knowledge about the system under investigation. Furthermore, it is often the case (particularly in engineering disciplines) that the study of the underlying system takes the form of an optimization regime, where the control parameter space is explored to optimize an objective functions that captures system realizability, cost, performance, or a combination thereof. Novel and flexible frameworks that facilitate the integration of the disparate models into a holistic simulation are used to perform this research, while making efficient use of the available computational resources. In this paper, we describe the integration of the DAKOTA optimization and parameter sweep toolkit with the Integrated Plasma Simulator (IPS), a component-based framework for loosely coupled simulations. The integration allows DAKOTA to exploit the internal task and resource management of the IPS to dynamically instantiate simulation instances within a single IPS instance, allowing for greater control over the trade-off between efficiency of resource utilization and time to completion. We present a case study showing the use of the combined DAKOTA-IPS system to aid in the design of a lithium ion battery (LIB) cell, by studying a coupled system involving the electrochemistry and ion transport at the lower length scales and thermal energy transport at the device scales. The DAKOTA-IPS system provides a flexible tool for use in optimization and parameter sweep studies involving loosely coupled simulations that is suitable for use in situations where changes to the constituent components in the coupled simulation are impractical due to intellectual property or code heritage issues.« less
Perspectives on Sharing Models and Related Resources in Computational Biomechanics Research.
Erdemir, Ahmet; Hunter, Peter J; Holzapfel, Gerhard A; Loew, Leslie M; Middleton, John; Jacobs, Christopher R; Nithiarasu, Perumal; Löhner, Rainlad; Wei, Guowei; Winkelstein, Beth A; Barocas, Victor H; Guilak, Farshid; Ku, Joy P; Hicks, Jennifer L; Delp, Scott L; Sacks, Michael; Weiss, Jeffrey A; Ateshian, Gerard A; Maas, Steve A; McCulloch, Andrew D; Peng, Grace C Y
2018-02-01
The role of computational modeling for biomechanics research and related clinical care will be increasingly prominent. The biomechanics community has been developing computational models routinely for exploration of the mechanics and mechanobiology of diverse biological structures. As a result, a large array of models, data, and discipline-specific simulation software has emerged to support endeavors in computational biomechanics. Sharing computational models and related data and simulation software has first become a utilitarian interest, and now, it is a necessity. Exchange of models, in support of knowledge exchange provided by scholarly publishing, has important implications. Specifically, model sharing can facilitate assessment of reproducibility in computational biomechanics and can provide an opportunity for repurposing and reuse, and a venue for medical training. The community's desire to investigate biological and biomechanical phenomena crossing multiple systems, scales, and physical domains, also motivates sharing of modeling resources as blending of models developed by domain experts will be a required step for comprehensive simulation studies as well as the enhancement of their rigor and reproducibility. The goal of this paper is to understand current perspectives in the biomechanics community for the sharing of computational models and related resources. Opinions on opportunities, challenges, and pathways to model sharing, particularly as part of the scholarly publishing workflow, were sought. A group of journal editors and a handful of investigators active in computational biomechanics were approached to collect short opinion pieces as a part of a larger effort of the IEEE EMBS Computational Biology and the Physiome Technical Committee to address model reproducibility through publications. A synthesis of these opinion pieces indicates that the community recognizes the necessity and usefulness of model sharing. There is a strong will to facilitate model sharing, and there are corresponding initiatives by the scientific journals. Outside the publishing enterprise, infrastructure to facilitate model sharing in biomechanics exists, and simulation software developers are interested in accommodating the community's needs for sharing of modeling resources. Encouragement for the use of standardized markups, concerns related to quality assurance, acknowledgement of increased burden, and importance of stewardship of resources are noted. In the short-term, it is advisable that the community builds upon recent strategies and experiments with new pathways for continued demonstration of model sharing, its promotion, and its utility. Nonetheless, the need for a long-term strategy to unify approaches in sharing computational models and related resources is acknowledged. Development of a sustainable platform supported by a culture of open model sharing will likely evolve through continued and inclusive discussions bringing all stakeholders at the table, e.g., by possibly establishing a consortium.
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.
Cloud4Psi: cloud computing for 3D protein structure similarity searching.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-10-01
Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-01-01
Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl PMID:24930141
Advanced Aerospace Materials by Design
NASA Technical Reports Server (NTRS)
Srivastava, Deepak; Djomehri, Jahed; Wei, Chen-Yu
2004-01-01
The advances in the emerging field of nanophase thermal and structural composite materials; materials with embedded sensors and actuators for morphing structures; light-weight composite materials for energy and power storage; and large surface area materials for in-situ resource generation and waste recycling, are expected to :revolutionize the capabilities of virtually every system comprising of future robotic and :human moon and mars exploration missions. A high-performance multiscale simulation platform, including the computational capabilities and resources of Columbia - the new supercomputer, is being developed to discover, validate, and prototype next generation (of such advanced materials. This exhibit will describe the porting and scaling of multiscale 'physics based core computer simulation codes for discovering and designing carbon nanotube-polymer composite materials for light-weight load bearing structural and 'thermal protection applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, D.L.; Kaufmann, H.E.
1978-01-01
During July 1977, fifty-one gravity stations were obtained in the Gerlach Extension Known Geothermal Resource Area and vicinity, northwestern Nevada. The gravity observations were made with a Worden gravimeter having a scale factor of about 0.5 milligal per division. No terrain corrections have been applied to these data. The earth tide correction was not used in drift reduction. The Geodetic Reference System 1967 formula (International Association of Geodesy, 1967) was used to compute theoretical gravity. Observed gravity is referenced to a base station in Gerlach, Nevada, having a value based on the Potsdam System of 1930. A density of 2.67more » g per cm/sup 3/ was used in computing the Bouguer anomaly.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, D.L.; Kaufmann, H.E.
1978-01-01
During July 1977, forty-four gravity stations were obtained in the Fly Ranch Extension Known Geothermal Resource Area and vicinity, northwestern Nevada. The gravity observations were made with a Worden gravimeter having a scale factor of about 0.5 milligal per division. No terrain corrections have been applied to these data. The earth tide correction was not used in drift reduction. The Geodetic Reference System 1967 formula (International Association of Geodesy, 1967) was used to compute theoretical gravity. Observed gravity is referenced to a base station in Gerlach, Nevada, having a value based on the Potsdam System of 1930 (fig. 1). Amore » density of 2.67 g per cm/sup 3/ was used in computing the Bouguer anomaly.« less
Simulating Biomass Fast Pyrolysis at the Single Particle Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ciesielski, Peter; Wiggins, Gavin; Daw, C Stuart
2017-07-01
Simulating fast pyrolysis at the scale of single particles allows for the investigation of the impacts of feedstock-specific parameters such as particle size, shape, and species of origin. For this reason particle-scale modeling has emerged as an important tool for understanding how variations in feedstock properties affect the outcomes of pyrolysis processes. The origins of feedstock properties are largely dictated by the composition and hierarchical structure of biomass, from the microstructural porosity to the external morphology of milled particles. These properties may be accounted for in simulations of fast pyrolysis by several different computational approaches depending on the level ofmore » structural and chemical complexity included in the model. The predictive utility of particle-scale simulations of fast pyrolysis can still be enhanced substantially by advancements in several areas. Most notably, considerable progress would be facilitated by the development of pyrolysis kinetic schemes that are decoupled from transport phenomena, predict product evolution from whole-biomass with increased chemical speciation, and are still tractable with present-day computational resources.« less
An Implicit Solver on A Parallel Block-Structured Adaptive Mesh Grid for FLASH
NASA Astrophysics Data System (ADS)
Lee, D.; Gopal, S.; Mohapatra, P.
2012-07-01
We introduce a fully implicit solver for FLASH based on a Jacobian-Free Newton-Krylov (JFNK) approach with an appropriate preconditioner. The main goal of developing this JFNK-type implicit solver is to provide efficient high-order numerical algorithms and methodology for simulating stiff systems of differential equations on large-scale parallel computer architectures. A large number of natural problems in nonlinear physics involve a wide range of spatial and time scales of interest. A system that encompasses such a wide magnitude of scales is described as "stiff." A stiff system can arise in many different fields of physics, including fluid dynamics/aerodynamics, laboratory/space plasma physics, low Mach number flows, reactive flows, radiation hydrodynamics, and geophysical flows. One of the big challenges in solving such a stiff system using current-day computational resources lies in resolving time and length scales varying by several orders of magnitude. We introduce FLASH's preliminary implementation of a time-accurate JFNK-based implicit solver in the framework of FLASH's unsplit hydro solver.
Wake characteristics of wind turbines in utility-scale wind farms
NASA Astrophysics Data System (ADS)
Yang, Xiaolei; Foti, Daniel; Sotiropoulos, Fotis
2017-11-01
The dynamics of turbine wakes is affected by turbine operating conditions, ambient atmospheric turbulent flows, and wakes from upwind turbines. Investigations of the wake from a single turbine have been extensively carried out in the literature. Studies on the wake dynamics in utility-scale wind farms are relatively limited. In this work, we employ large-eddy simulation with an actuator surface or actuator line model for turbine blades to investigate the wake dynamics in utility-scale wind farms. Simulations of three wind farms, i.e., the Horns Rev wind farm in Denmark, Pleasant Valley wind farm in Minnesota, and the Vantage wind farm in Washington are carried out. The computed power shows a good agreement with measurements. Analysis of the wake dynamics in the three wind farms is underway and will be presented in the conference. This work was support by Xcel Energy (RD4-13). The computational resources were provided by National Renewable Energy Laboratory.
Single Image Super-Resolution Based on Multi-Scale Competitive Convolutional Neural Network
Qu, Xiaobo; He, Yifan
2018-01-01
Deep convolutional neural networks (CNNs) are successful in single-image super-resolution. Traditional CNNs are limited to exploit multi-scale contextual information for image reconstruction due to the fixed convolutional kernel in their building modules. To restore various scales of image details, we enhance the multi-scale inference capability of CNNs by introducing competition among multi-scale convolutional filters, and build up a shallow network under limited computational resources. The proposed network has the following two advantages: (1) the multi-scale convolutional kernel provides the multi-context for image super-resolution, and (2) the maximum competitive strategy adaptively chooses the optimal scale of information for image reconstruction. Our experimental results on image super-resolution show that the performance of the proposed network outperforms the state-of-the-art methods. PMID:29509666
Single Image Super-Resolution Based on Multi-Scale Competitive Convolutional Neural Network.
Du, Xiaofeng; Qu, Xiaobo; He, Yifan; Guo, Di
2018-03-06
Deep convolutional neural networks (CNNs) are successful in single-image super-resolution. Traditional CNNs are limited to exploit multi-scale contextual information for image reconstruction due to the fixed convolutional kernel in their building modules. To restore various scales of image details, we enhance the multi-scale inference capability of CNNs by introducing competition among multi-scale convolutional filters, and build up a shallow network under limited computational resources. The proposed network has the following two advantages: (1) the multi-scale convolutional kernel provides the multi-context for image super-resolution, and (2) the maximum competitive strategy adaptively chooses the optimal scale of information for image reconstruction. Our experimental results on image super-resolution show that the performance of the proposed network outperforms the state-of-the-art methods.
Improving Design Efficiency for Large-Scale Heterogeneous Circuits
NASA Astrophysics Data System (ADS)
Gregerson, Anthony
Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particles accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of the developing large-scale, heterogeneous circuits needed to enable large-scale application in high-energy physics and other important areas.
SAD5 Stereo Correlation Line-Striping in an FPGA
NASA Technical Reports Server (NTRS)
Villalpando, Carlos Y.; Morfopoulos, Arin C.
2011-01-01
High precision SAD5 stereo computations can be performed in an FPGA (field-programmable gate array) at much higher speeds than possible in a conventional CPU (central processing unit), but this uses large amounts of FPGA resources that scale with image size. Of the two key resources in an FPGA, Slices and BRAM (block RAM), Slices scale linearly in the new algorithm with image size, and BRAM scales quadratically with image size. An approach was developed to trade latency for BRAM by sub-windowing the image vertically into overlapping strips and stitching the outputs together to create a single continuous disparity output. In stereo, the general rule of thumb is that the disparity search range must be 1/10 the image size. In the new algorithm, BRAM usage scales linearly with disparity search range and scales again linearly with line width. So a doubling of image size, say from 640 to 1,280, would in the previous design be an effective 4 of BRAM usage: 2 for line width, 2 again for disparity search range. The minimum strip size is twice the search range, and will produce an output strip width equal to the disparity search range. So assuming a disparity search range of 1/10 image width, 10 sequential runs of the minimum strip size would produce a full output image. This approach allowed the innovators to fit 1280 960 wide SAD5 stereo disparity in less than 80 BRAM, 52k Slices on a Virtex 5LX330T, 25% and 24% of resources, respectively. Using a 100-MHz clock, this build would perform stereo at 39 Hz. Of particular interest to JPL is that there is a flight qualified version of the Virtex 5: this could produce stereo results even for very large image sizes at 3 orders of magnitude faster than could be computed on the PowerPC 750 flight computer. The work covered in the report allows the stereo algorithm to run on much larger images than before, and using much less BRAM. This opens up choices for a smaller flight FPGA (which saves power and space), or for other algorithms in addition to SAD5 to be run on the same FPGA.
GapMap: Enabling Comprehensive Autism Resource Epidemiology
Albert, Nikhila; Schwartz, Jessey; Du, Michael
2017-01-01
Background For individuals with autism spectrum disorder (ASD), finding resources can be a lengthy and difficult process. The difficulty in obtaining global, fine-grained autism epidemiological data hinders researchers from quickly and efficiently studying large-scale correlations among ASD, environmental factors, and geographical and cultural factors. Objective The objective of this study was to define resource load and resource availability for families affected by autism and subsequently create a platform to enable a more accurate representation of prevalence rates and resource epidemiology. Methods We created a mobile application, GapMap, to collect locational, diagnostic, and resource use information from individuals with autism to compute accurate prevalence rates and better understand autism resource epidemiology. GapMap is hosted on AWS S3, running on a React and Redux front-end framework. The backend framework is comprised of an AWS API Gateway and Lambda Function setup, with secure and scalable end points for retrieving prevalence and resource data, and for submitting participant data. Measures of autism resource scarcity, including resource load, resource availability, and resource gaps were defined and preliminarily computed using simulated or scraped data. Results The average distance from an individual in the United States to the nearest diagnostic center is approximately 182 km (50 miles), with a standard deviation of 235 km (146 miles). The average distance from an individual with ASD to the nearest diagnostic center, however, is only 32 km (20 miles), suggesting that individuals who live closer to diagnostic services are more likely to be diagnosed. Conclusions This study confirmed that individuals closer to diagnostic services are more likely to be diagnosed and proposes GapMap, a means to measure and enable the alleviation of increasingly overburdened diagnostic centers and resource-poor areas where parents are unable to diagnose their children as quickly and easily as needed. GapMap will collect information that will provide more accurate data for computing resource loads and availability, uncovering the impact of resource epidemiology on age and likelihood of diagnosis, and gathering localized autism prevalence rates. PMID:28473303
GapMap: Enabling Comprehensive Autism Resource Epidemiology.
Albert, Nikhila; Daniels, Jena; Schwartz, Jessey; Du, Michael; Wall, Dennis P
2017-05-04
For individuals with autism spectrum disorder (ASD), finding resources can be a lengthy and difficult process. The difficulty in obtaining global, fine-grained autism epidemiological data hinders researchers from quickly and efficiently studying large-scale correlations among ASD, environmental factors, and geographical and cultural factors. The objective of this study was to define resource load and resource availability for families affected by autism and subsequently create a platform to enable a more accurate representation of prevalence rates and resource epidemiology. We created a mobile application, GapMap, to collect locational, diagnostic, and resource use information from individuals with autism to compute accurate prevalence rates and better understand autism resource epidemiology. GapMap is hosted on AWS S3, running on a React and Redux front-end framework. The backend framework is comprised of an AWS API Gateway and Lambda Function setup, with secure and scalable end points for retrieving prevalence and resource data, and for submitting participant data. Measures of autism resource scarcity, including resource load, resource availability, and resource gaps were defined and preliminarily computed using simulated or scraped data. The average distance from an individual in the United States to the nearest diagnostic center is approximately 182 km (50 miles), with a standard deviation of 235 km (146 miles). The average distance from an individual with ASD to the nearest diagnostic center, however, is only 32 km (20 miles), suggesting that individuals who live closer to diagnostic services are more likely to be diagnosed. This study confirmed that individuals closer to diagnostic services are more likely to be diagnosed and proposes GapMap, a means to measure and enable the alleviation of increasingly overburdened diagnostic centers and resource-poor areas where parents are unable to diagnose their children as quickly and easily as needed. GapMap will collect information that will provide more accurate data for computing resource loads and availability, uncovering the impact of resource epidemiology on age and likelihood of diagnosis, and gathering localized autism prevalence rates. ©Nikhila Albert, Jena Daniels, Jessey Schwartz, Michael Du, Dennis P Wall. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 04.05.2017.
Information Power Grid (IPG) Tutorial 2003
NASA Technical Reports Server (NTRS)
Meyers, George
2003-01-01
For NASA and the general community today Grid middleware: a) provides tools to access/use data sources (databases, instruments, ...); b) provides tools to access computing (unique and generic); c) Is an enabler of large scale collaboration. Dynamically responding to needs is a key selling point of a grid. Independent resources can be joined as appropriate to solve a problem. Provide tools to enable the building of a frameworks for application. Provide value added service to the NASA user base for utilizing resources on the grid in new and more efficient ways. Provides tools for development of Frameworks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, D.L.; Kaufmann, H.E.
1978-01-01
During July 1977, forty-nine gravity stations were obtained in the Double Hot Springs Known Geothermal Resource Area and vicinity, northwestern Nevada. The gravity observations were made with a Worden gravimeter having a scale factor of about 0.5 milligal per division. No terrain corrections have been applied to these data. The earth tide correction was not used in drift reduction. The Geodetic Reference System 1967 formula (International Association of Geodesy, 1967) was used to compute theoretical gravity.
Application-Level Interoperability Across Grids and Clouds
NASA Astrophysics Data System (ADS)
Jha, Shantenu; Luckow, Andre; Merzky, Andre; Erdely, Miklos; Sehgal, Saurabh
Application-level interoperability is defined as the ability of an application to utilize multiple distributed heterogeneous resources. Such interoperability is becoming increasingly important with increasing volumes of data, multiple sources of data as well as resource types. The primary aim of this chapter is to understand different ways in which application-level interoperability can be provided across distributed infrastructure. We achieve this by (i) using the canonical wordcount application, based on an enhanced version of MapReduce that scales-out across clusters, clouds, and HPC resources, (ii) establishing how SAGA enables the execution of wordcount application using MapReduce and other programming models such as Sphere concurrently, and (iii) demonstrating the scale-out of ensemble-based biomolecular simulations across multiple resources. We show user-level control of the relative placement of compute and data and also provide simple performance measures and analysis of SAGA-MapReduce when using multiple, different, heterogeneous infrastructures concurrently for the same problem instance. Finally, we discuss Azure and some of the system-level abstractions that it provides and show how it is used to support ensemble-based biomolecular simulations.
Eppig, Janan T
2017-07-01
The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. © The Author 2017. Published by Oxford University Press.
Eppig, Janan T.
2017-01-01
Abstract The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. PMID:28838066
Multipartite entanglement verification resistant against dishonest parties.
Pappa, Anna; Chailloux, André; Wehner, Stephanie; Diamanti, Eleni; Kerenidis, Iordanis
2012-06-29
Future quantum information networks will consist of quantum and classical agents, who have the ability to communicate in a variety of ways with trusted and untrusted parties and securely delegate computational tasks to untrusted large-scale quantum computing servers. Multipartite quantum entanglement is a fundamental resource for such a network and, hence, it is imperative to study the possibility of verifying a multipartite entanglement source in a way that is efficient and provides strong guarantees even in the presence of multiple dishonest parties. In this Letter, we show how an agent of a quantum network can perform a distributed verification of a source creating multipartite Greenberger-Horne-Zeilinger (GHZ) states with minimal resources, which is, nevertheless, resistant against any number of dishonest parties. Moreover, we provide a tight tradeoff between the level of security and the distance between the state produced by the source and the ideal GHZ state. Last, by adding the resource of a trusted common random source, we can further provide security guarantees for all honest parties in the quantum network simultaneously.
Are Cloud Environments Ready for Scientific Applications?
NASA Astrophysics Data System (ADS)
Mehrotra, P.; Shackleford, K.
2011-12-01
Cloud computing environments are becoming widely available both in the commercial and government sectors. They provide flexibility to rapidly provision resources in order to meet dynamic and changing computational needs without the customers incurring capital expenses and/or requiring technical expertise. Clouds also provide reliable access to resources even though the end-user may not have in-house expertise for acquiring or operating such resources. Consolidation and pooling in a cloud environment allow organizations to achieve economies of scale in provisioning or procuring computing resources and services. Because of these and other benefits, many businesses and organizations are migrating their business applications (e.g., websites, social media, and business processes) to cloud environments-evidenced by the commercial success of offerings such as the Amazon EC2. In this paper, we focus on the feasibility of utilizing cloud environments for scientific workloads and workflows particularly of interest to NASA scientists and engineers. There is a wide spectrum of such technical computations. These applications range from small workstation-level computations to mid-range computing requiring small clusters to high-performance simulations requiring supercomputing systems with high bandwidth/low latency interconnects. Data-centric applications manage and manipulate large data sets such as satellite observational data and/or data previously produced by high-fidelity modeling and simulation computations. Most of the applications are run in batch mode with static resource requirements. However, there do exist situations that have dynamic demands, particularly ones with public-facing interfaces providing information to the general public, collaborators and partners, as well as to internal NASA users. In the last few months we have been studying the suitability of cloud environments for NASA's technical and scientific workloads. We have ported several applications to multiple cloud environments including NASA's Nebula environment, Amazon's EC2, Magellan at NERSC, and SGI's Cyclone system. We critically examined the performance of the applications on these systems. We also collected information on the usability of these cloud environments. In this talk we will present the results of our study focusing on the efficacy of using clouds for NASA's scientific applications.
Advanced computations in plasma physics
NASA Astrophysics Data System (ADS)
Tang, W. M.
2002-05-01
Scientific simulation in tandem with theory and experiment is an essential tool for understanding complex plasma behavior. In this paper we review recent progress and future directions for advanced simulations in magnetically confined plasmas with illustrative examples chosen from magnetic confinement research areas such as microturbulence, magnetohydrodynamics, magnetic reconnection, and others. Significant recent progress has been made in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics, giving increasingly good agreement between experimental observations and computational modeling. This was made possible by innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales together with access to powerful new computational resources. In particular, the fusion energy science community has made excellent progress in developing advanced codes for which computer run-time and problem size scale well with the number of processors on massively parallel machines (MPP's). A good example is the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPP's to produce three-dimensional, general geometry, nonlinear particle simulations which have accelerated progress in understanding the nature of turbulence self-regulation by zonal flows. It should be emphasized that these calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In general, results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. The associated scientific excitement should serve to stimulate improved cross-cutting collaborations with other fields and also to help attract bright young talent to plasma science.
Prospects for improving the representation of coastal and shelf seas in global ocean models
NASA Astrophysics Data System (ADS)
Holt, Jason; Hyder, Patrick; Ashworth, Mike; Harle, James; Hewitt, Helene T.; Liu, Hedong; New, Adrian L.; Pickles, Stephen; Porter, Andrew; Popova, Ekaterina; Icarus Allen, J.; Siddorn, John; Wood, Richard
2017-02-01
Accurately representing coastal and shelf seas in global ocean models represents one of the grand challenges of Earth system science. They are regions of immense societal importance through the goods and services they provide, hazards they pose and their role in global-scale processes and cycles, e.g. carbon fluxes and dense water formation. However, they are poorly represented in the current generation of global ocean models. In this contribution, we aim to briefly characterise the problem, and then to identify the important physical processes, and their scales, needed to address this issue in the context of the options available to resolve these scales globally and the evolving computational landscape.We find barotropic and topographic scales are well resolved by the current state-of-the-art model resolutions, e.g. nominal 1/12°, and still reasonably well resolved at 1/4°; here, the focus is on process representation. We identify tides, vertical coordinates, river inflows and mixing schemes as four areas where modelling approaches can readily be transferred from regional to global modelling with substantial benefit. In terms of finer-scale processes, we find that a 1/12° global model resolves the first baroclinic Rossby radius for only ˜ 8 % of regions < 500 m deep, but this increases to ˜ 70 % for a 1/72° model, so resolving scales globally requires substantially finer resolution than the current state of the art.We quantify the benefit of improved resolution and process representation using 1/12° global- and basin-scale northern North Atlantic nucleus for a European model of the ocean (NEMO) simulations; the latter includes tides and a k-ɛ vertical mixing scheme. These are compared with global stratification observations and 19 models from CMIP5. In terms of correlation and basin-wide rms error, the high-resolution models outperform all these CMIP5 models. The model with tides shows improved seasonal cycles compared to the high-resolution model without tides. The benefits of resolution are particularly apparent in eastern boundary upwelling zones.To explore the balance between the size of a globally refined model and that of multiscale modelling options (e.g. finite element, finite volume or a two-way nesting approach), we consider a simple scale analysis and a conceptual grid refining approach. We put this analysis in the context of evolving computer systems, discussing model turnaround time, scalability and resource costs. Using a simple cost model compared to a reference configuration (taken to be a 1/4° global model in 2011) and the increasing performance of the UK Research Councils' computer facility, we estimate an unstructured mesh multiscale approach, resolving process scales down to 1.5 km, would use a comparable share of the computer resource by 2021, the two-way nested multiscale approach by 2022, and a 1/72° global model by 2026. However, we also note that a 1/12° global model would not have a comparable computational cost to a 1° global model in 2017 until 2027. Hence, we conclude that for computationally expensive models (e.g. for oceanographic research or operational oceanography), resolving scales to ˜ 1.5 km would be routinely practical in about a decade given substantial effort on numerical and computational development. For complex Earth system models, this extends to about 2 decades, suggesting the focus here needs to be on improved process parameterisation to meet these challenges.
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2016-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Chase Qishi; Zhu, Michelle Mengxia
The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models featuremore » diverse scientific workflows as simple as linear pipelines or as complex as a directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100Gpbs Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.« less
Recommendations for open data science.
Gymrek, Melissa; Farjoun, Yossi
2016-01-01
Life science research increasingly relies on large-scale computational analyses. However, the code and data used for these analyses are often lacking in publications. To maximize scientific impact, reproducibility, and reuse, it is crucial that these resources are made publicly available and are fully transparent. We provide recommendations for improving the openness of data-driven studies in life sciences.
Karanovic, Marinko; Muffels, Christopher T.; Tonkin, Matthew J.; Hunt, Randall J.
2012-01-01
Models of environmental systems have become increasingly complex, incorporating increasingly large numbers of parameters in an effort to represent physical processes on a scale approaching that at which they occur in nature. Consequently, the inverse problem of parameter estimation (specifically, model calibration) and subsequent uncertainty analysis have become increasingly computation-intensive endeavors. Fortunately, advances in computing have made computational power equivalent to that of dozens to hundreds of desktop computers accessible through a variety of alternate means: modelers have various possibilities, ranging from traditional Local Area Networks (LANs) to cloud computing. Commonly used parameter estimation software is well suited to take advantage of the availability of such increased computing power. Unfortunately, logistical issues become increasingly important as an increasing number and variety of computers are brought to bear on the inverse problem. To facilitate efficient access to disparate computer resources, the PESTCommander program documented herein has been developed to provide a Graphical User Interface (GUI) that facilitates the management of model files ("file management") and remote launching and termination of "slave" computers across a distributed network of computers ("run management"). In version 1.0 described here, PESTCommander can access and ascertain resources across traditional Windows LANs: however, the architecture of PESTCommander has been developed with the intent that future releases will be able to access computing resources (1) via trusted domains established in Wide Area Networks (WANs) in multiple remote locations and (2) via heterogeneous networks of Windows- and Unix-based operating systems. The design of PESTCommander also makes it suitable for extension to other computational resources, such as those that are available via cloud computing. Version 1.0 of PESTCommander was developed primarily to work with the parameter estimation software PEST; the discussion presented in this report focuses on the use of the PESTCommander together with Parallel PEST. However, PESTCommander can be used with a wide variety of programs and models that require management, distribution, and cleanup of files before or after model execution. In addition to its use with the Parallel PEST program suite, discussion is also included in this report regarding the use of PESTCommander with the Global Run Manager GENIE, which was developed simultaneously with PESTCommander.
Large Data at Small Universities: Astronomical processing using a computer classroom
NASA Astrophysics Data System (ADS)
Fuller, Nathaniel James; Clarkson, William I.; Fluharty, Bill; Belanger, Zach; Dage, Kristen
2016-06-01
The use of large computing clusters for astronomy research is becoming more commonplace as datasets expand, but access to these required resources is sometimes difficult for research groups working at smaller Universities. As an alternative to purchasing processing time on an off-site computing cluster, or purchasing dedicated hardware, we show how one can easily build a crude on-site cluster by utilizing idle cycles on instructional computers in computer-lab classrooms. Since these computers are maintained as part of the educational mission of the University, the resource impact on the investigator is generally low.By using open source Python routines, it is possible to have a large number of desktop computers working together via a local network to sort through large data sets. By running traditional analysis routines in an “embarrassingly parallel” manner, gains in speed are accomplished without requiring the investigator to learn how to write routines using highly specialized methodology. We demonstrate this concept here applied to 1. photometry of large-format images and 2. Statistical significance-tests for X-ray lightcurve analysis. In these scenarios, we see a speed-up factor which scales almost linearly with the number of cores in the cluster. Additionally, we show that the usage of the cluster does not severely limit performance for a local user, and indeed the processing can be performed while the computers are in use for classroom purposes.
The Virtual Watershed Observatory: Cyberinfrastructure for Model-Data Integration and Access
NASA Astrophysics Data System (ADS)
Duffy, C.; Leonard, L. N.; Giles, L.; Bhatt, G.; Yu, X.
2011-12-01
The Virtual Watershed Observatory (VWO) is a concept where scientists, water managers, educators and the general public can create a virtual observatory from integrated hydrologic model results, national databases and historical or real-time observations via web services. In this paper, we propose a prototype for automated and virtualized web services software using national data products for climate reanalysis, soils, geology, terrain and land cover. The VWO has the broad purpose of making accessible water resource simulations, real-time data assimilation, calibration and archival at the scale of HUC 12 watersheds (Hydrologic Unit Code) anywhere in the continental US. Our prototype for model-data integration focuses on creating tools for fast data storage from selected national databases, as well as the computational resources necessary for a dynamic, distributed watershed simulation. The paper will describe cyberinfrastructure tools and workflow that attempts to resolve the problem of model-data accessibility and scalability such that individuals, research teams, managers and educators can create a WVO in a desired context. Examples are given for the NSF-funded Shale Hills Critical Zone Observatory and the European Critical Zone Observatories within the SoilTrEC project. In the future implementation of WVO services will benefit from the development of a cloud cyber infrastructure as the prototype evolves to data and model intensive computation for continental scale water resource predictions.
Reactive transport modeling in the subsurface environment with OGS-IPhreeqc
NASA Astrophysics Data System (ADS)
He, Wenkui; Beyer, Christof; Fleckenstein, Jan; Jang, Eunseon; Kalbacher, Thomas; Naumov, Dimitri; Shao, Haibing; Wang, Wenqing; Kolditz, Olaf
2015-04-01
Worldwide, sustainable water resource management becomes an increasingly challenging task due to the growth of population and extensive applications of fertilizer in agriculture. Moreover, climate change causes further stresses to both water quantity and quality. Reactive transport modeling in the coupled soil-aquifer system is a viable approach to assess the impacts of different land use and groundwater exploitation scenarios on the water resources. However, the application of this approach is usually limited in spatial scale and to simplified geochemical systems due to the huge computational expense involved. Such computational expense is not only caused by solving the high non-linearity of the initial boundary value problems of water flow in the unsaturated zone numerically with rather fine spatial and temporal discretization for the correct mass balance and numerical stability, but also by the intensive computational task of quantifying geochemical reactions. In the present study, a flexible and efficient tool for large scale reactive transport modeling in variably saturated porous media and its applications are presented. The open source scientific software OpenGeoSys (OGS) is coupled with the IPhreeqc module of the geochemical solver PHREEQC. The new coupling approach makes full use of advantages from both codes: OGS provides a flexible choice of different numerical approaches for simulation of water flow in the vadose zone such as the pressure-based or mixed forms of Richards equation; whereas the IPhreeqc module leads to a simplification of data storage and its communication with OGS, which greatly facilitates the coupling and code updating. Moreover, a parallelization scheme with MPI (Message Passing Interface) is applied, in which the computational task of water flow and mass transport is partitioned through domain decomposition, whereas the efficient parallelization of geochemical reactions is achieved by smart allocation of computational workload over multiple compute nodes. The plausibility of the new coupling is verified by several benchmark tests. In addition, the efficiency of the new coupling approach is demonstrated by its application in a large scale scenario, in which the environmental fate of pesticides in a complex soil-aquifer system is studied.
Reactive transport modeling in variably saturated porous media with OGS-IPhreeqc
NASA Astrophysics Data System (ADS)
He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kalbacher, T.; Shao, H.; Wang, W.; Kolditz, O.
2014-12-01
Worldwide, sustainable water resource management becomes an increasingly challenging task due to the growth of population and extensive applications of fertilizer in agriculture. Moreover, climate change causes further stresses to both water quantity and quality. Reactive transport modeling in the coupled soil-aquifer system is a viable approach to assess the impacts of different land use and groundwater exploitation scenarios on the water resources. However, the application of this approach is usually limited in spatial scale and to simplified geochemical systems due to the huge computational expense involved. Such computational expense is not only caused by solving the high non-linearity of the initial boundary value problems of water flow in the unsaturated zone numerically with rather fine spatial and temporal discretization for the correct mass balance and numerical stability, but also by the intensive computational task of quantifying geochemical reactions. In the present study, a flexible and efficient tool for large scale reactive transport modeling in variably saturated porous media and its applications are presented. The open source scientific software OpenGeoSys (OGS) is coupled with the IPhreeqc module of the geochemical solver PHREEQC. The new coupling approach makes full use of advantages from both codes: OGS provides a flexible choice of different numerical approaches for simulation of water flow in the vadose zone such as the pressure-based or mixed forms of Richards equation; whereas the IPhreeqc module leads to a simplification of data storage and its communication with OGS, which greatly facilitates the coupling and code updating. Moreover, a parallelization scheme with MPI (Message Passing Interface) is applied, in which the computational task of water flow and mass transport is partitioned through domain decomposition, whereas the efficient parallelization of geochemical reactions is achieved by smart allocation of computational workload over multiple compute nodes. The plausibility of the new coupling is verified by several benchmark tests. In addition, the efficiency of the new coupling approach is demonstrated by its application in a large scale scenario, in which the environmental fate of pesticides in a complex soil-aquifer system is studied.
The future of PanDA in ATLAS distributed computing
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
Experiments at the Large Hadron Collider (LHC) face unprecedented computing challenges. Heterogeneous resources are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, while data processing requires more than a few billion hours of computing usage per year. The PanDA (Production and Distributed Analysis) system was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. In the process, the old batch job paradigm of locally managed computing in HEP was discarded in favour of a far more automated, flexible and scalable model. The success of PanDA in ATLAS is leading to widespread adoption and testing by other experiments. PanDA is the first exascale workload management system in HEP, already operating at more than a million computing jobs per day, and processing over an exabyte of data in 2013. There are many new challenges that PanDA will face in the near future, in addition to new challenges of scale, heterogeneity and increasing user base. PanDA will need to handle rapidly changing computing infrastructure, will require factorization of code for easier deployment, will need to incorporate additional information sources including network metrics in decision making, be able to control network circuits, handle dynamically sized workload processing, provide improved visualization, and face many other challenges. In this talk we will focus on the new features, planned or recently implemented, that are relevant to the next decade of distributed computing workload management using PanDA.
Stability and Scalability of the CMS Global Pool: Pushing HTCondor and GlideinWMS to New Limits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balcas, J.; Bockelman, B.; Hufnagel, D.
The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the needs of the Global Pool to higher scales each year. These resources are becoming more diverse in their accessibility and configuration over time. Furthermore, the challenge of stably running at higher and higher scales while introducing new modes of operation such asmore » multi-core pilots, as well as the chaotic nature of physics analysis workflows, places huge strains on the submission infrastructure. This paper details some of the most important challenges to scalability and stability that the CMS Global Pool has faced since the beginning of the LHC Run II and how they were overcome.« less
Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits
NASA Astrophysics Data System (ADS)
Balcas, J.; Bockelman, B.; Hufnagel, D.; Hurtado Anampa, K.; Aftab Khan, F.; Larson, K.; Letts, J.; Marra da Silva, J.; Mascheroni, M.; Mason, D.; Perez-Calero Yzquierdo, A.; Tiradani, A.
2017-10-01
The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the needs of the Global Pool to higher scales each year. These resources are becoming more diverse in their accessibility and configuration over time. Furthermore, the challenge of stably running at higher and higher scales while introducing new modes of operation such as multi-core pilots, as well as the chaotic nature of physics analysis workflows, places huge strains on the submission infrastructure. This paper details some of the most important challenges to scalability and stability that the CMS Global Pool has faced since the beginning of the LHC Run II and how they were overcome.
The economics of time shared computing: Congestion, user costs and capacity
NASA Technical Reports Server (NTRS)
Agnew, C. E.
1982-01-01
Time shared systems permit the fixed costs of computing resources to be spread over large numbers of users. However, bottleneck results in the theory of closed queueing networks can be used to show that this economy of scale will be offset by the increased congestion that results as more users are added to the system. If one considers the total costs, including the congestion cost, there is an optimal number of users for a system which equals the saturation value usually used to define system capacity.
The JASMIN Cloud: specialised and hybrid to meet the needs of the Environmental Sciences Community
NASA Astrophysics Data System (ADS)
Kershaw, Philip; Lawrence, Bryan; Churchill, Jonathan; Pritchard, Matt
2014-05-01
Cloud computing provides enormous opportunities for the research community. The large public cloud providers provide near-limitless scaling capability. However, adapting Cloud to scientific workloads is not without its problems. The commodity nature of the public cloud infrastructure can be at odds with the specialist requirements of the research community. Issues such as trust, ownership of data, WAN bandwidth and costing models make additional barriers to more widespread adoption. Alongside the application of public cloud for scientific applications, a number of private cloud initiatives are underway in the research community of which the JASMIN Cloud is one example. Here, cloud service models are being effectively super-imposed over more established services such as data centres, compute cluster facilities and Grids. These have the potential to deliver the specialist infrastructure needed for the science community coupled with the benefits of a Cloud service model. The JASMIN facility based at the Rutherford Appleton Laboratory was established in 2012 to support the data analysis requirements of the climate and Earth Observation community. In its first year of operation, the 5PB of available storage capacity was filled and the hosted compute capability used extensively. JASMIN has modelled the concept of a centralised large-volume data analysis facility. Key characteristics have enabled success: peta-scale fast disk connected via low latency networks to compute resources and the use of virtualisation for effective management of the resources for a range of users. A second phase is now underway funded through NERC's (Natural Environment Research Council) Big Data initiative. This will see significant expansion to the resources available with a doubling of disk-based storage to 12PB and an increase of compute capacity by a factor of ten to over 3000 processing cores. This expansion is accompanied by a broadening in the scope for JASMIN, as a service available to the entire UK environmental science community. Experience with the first phase demonstrated the range of user needs. A trade-off is needed between access privileges to resources, flexibility of use and security. This has influenced the form and types of service under development for the new phase. JASMIN will deploy a specialised private cloud organised into "Managed" and "Unmanaged" components. In the Managed Cloud, users have direct access to the storage and compute resources for optimal performance but for reasons of security, via a more restrictive PaaS (Platform-as-a-Service) interface. The Unmanaged Cloud is deployed in an isolated part of the network but co-located with the rest of the infrastructure. This enables greater liberty to tenants - full IaaS (Infrastructure-as-a-Service) capability to provision customised infrastructure - whilst at the same time protecting more sensitive parts of the system from direct access using these elevated privileges. The private cloud will be augmented with cloud-bursting capability so that it can exploit the resources available from public clouds, making it effectively a hybrid solution. A single interface will overlay the functionality of both the private cloud and external interfaces to public cloud providers giving users the flexibility to migrate resources between infrastructures as requirements dictate.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davidson, George S.; Brown, William Michael
2007-09-01
Techniques for high throughput determinations of interactomes, together with high resolution protein collocalizations maps within organelles and through membranes will soon create a vast resource. With these data, biological descriptions, akin to the high dimensional phase spaces familiar to physicists, will become possible. These descriptions will capture sufficient information to make possible realistic, system-level models of cells. The descriptions and the computational models they enable will require powerful computing techniques. This report is offered as a call to the computational biology community to begin thinking at this scale and as a challenge to develop the required algorithms and codes tomore » make use of the new data.3« less
Multicore job scheduling in the Worldwide LHC Computing Grid
NASA Astrophysics Data System (ADS)
Forti, A.; Pérez-Calero Yzquierdo, A.; Hartmann, T.; Alef, M.; Lahiff, A.; Templon, J.; Dal Pra, S.; Gila, M.; Skipsey, S.; Acosta-Silva, C.; Filipcic, A.; Walker, R.; Walker, C. J.; Traynor, D.; Gadrat, S.
2015-12-01
After the successful first run of the LHC, data taking is scheduled to restart in Summer 2015 with experimental conditions leading to increased data volumes and event complexity. In order to process the data generated in such scenario and exploit the multicore architectures of current CPUs, the LHC experiments have developed parallelized software for data reconstruction and simulation. However, a good fraction of their computing effort is still expected to be executed as single-core tasks. Therefore, jobs with diverse resources requirements will be distributed across the Worldwide LHC Computing Grid (WLCG), making workload scheduling a complex problem in itself. In response to this challenge, the WLCG Multicore Deployment Task Force has been created in order to coordinate the joint effort from experiments and WLCG sites. The main objective is to ensure the convergence of approaches from the different LHC Virtual Organizations (VOs) to make the best use of the shared resources in order to satisfy their new computing needs, minimizing any inefficiency originated from the scheduling mechanisms, and without imposing unnecessary complexities in the way sites manage their resources. This paper describes the activities and progress of the Task Force related to the aforementioned topics, including experiences from key sites on how to best use different batch system technologies, the evolution of workload submission tools by the experiments and the knowledge gained from scale tests of the different proposed job submission strategies.
NASA Astrophysics Data System (ADS)
Berzano, D.; Blomer, J.; Buncic, P.; Charalampidis, I.; Ganis, G.; Meusel, R.
2015-12-01
Cloud resources nowadays contribute an essential share of resources for computing in high-energy physics. Such resources can be either provided by private or public IaaS clouds (e.g. OpenStack, Amazon EC2, Google Compute Engine) or by volunteers computers (e.g. LHC@Home 2.0). In any case, experiments need to prepare a virtual machine image that provides the execution environment for the physics application at hand. The CernVM virtual machine since version 3 is a minimal and versatile virtual machine image capable of booting different operating systems. The virtual machine image is less than 20 megabyte in size. The actual operating system is delivered on demand by the CernVM File System. CernVM 3 has matured from a prototype to a production environment. It is used, for instance, to run LHC applications in the cloud, to tune event generators using a network of volunteer computers, and as a container for the historic Scientific Linux 5 and Scientific Linux 4 based software environments in the course of long-term data preservation efforts of the ALICE, CMS, and ALEPH experiments. We present experience and lessons learned from the use of CernVM at scale. We also provide an outlook on the upcoming developments. These developments include adding support for Scientific Linux 7, the use of container virtualization, such as provided by Docker, and the streamlining of virtual machine contextualization towards the cloud-init industry standard.
The Scalable Checkpoint/Restart Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.
The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to worite our and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes.more » It has been benchmarked as high as 720GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster thanthe parallel file system.« less
,
1992-01-01
US GeoData tapes are computer tapes which contain cartographic data in digital form. The 1:2,000,000-scale data are available in two forms. The graphic form can be used to generate computer-plotted maps. The content and scale of the maps can be varied to meet your needs. The topologically-structured form of US GeoData is suitable for input to geographic information systems for use in spatial analysis and geographic studies. Both forms must be used in conjunction with appropriate software. US GeoData tapes offer convenience, accuracy, flexibility, and cost effectiveness to many map users. Business, industry, and government users who are involved in network planning and analysis, transportation, demography, land use, or any activity where data can be related to, or plotted on a map will find US GeoData a valuable resource.
TOPICAL REVIEW: Advances and challenges in computational plasma science
NASA Astrophysics Data System (ADS)
Tang, W. M.; Chan, V. S.
2005-02-01
Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.
Advances and challenges in computational plasma science
NASA Astrophysics Data System (ADS)
Tang, W. M.
2005-02-01
Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.
NASA Astrophysics Data System (ADS)
Candela, S. G.; Howat, I.; Noh, M. J.; Porter, C. C.; Morin, P. J.
2016-12-01
In the last decade, high resolution satellite imagery has become an increasingly accessible tool for geoscientists to quantify changes in the Arctic land surface due to geophysical, ecological and anthropomorphic processes. However, the trade off between spatial coverage and spatial-temporal resolution has limited detailed, process-level change detection over large (i.e. continental) scales. The ArcticDEM project utilized over 300,000 Worldview image pairs to produce a nearly 100% coverage elevation model (above 60°N) offering the first polar, high spatial - high resolution (2-8m by region) dataset, often with multiple repeats in areas of particular interest to geo-scientists. A dataset of this size (nearly 250 TB) offers endless new avenues of scientific inquiry, but quickly becomes unmanageable computationally and logistically for the computing resources available to the average scientist. Here we present TopoDiff, a framework for a generalized. automated workflow that requires minimal input from the end user about a study site, and utilizes cloud computing resources to provide a temporally sorted and differenced dataset, ready for geostatistical analysis. This hands-off approach allows the end user to focus on the science, without having to manage thousands of files, or petabytes of data. At the same time, TopoDiff provides a consistent and accurate workflow for image sorting, selection, and co-registration enabling cross-comparisons between research projects.
Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A
2010-01-01
Especially in the life-science and the health-care sectors the huge IT requirements are imminent due to the large and complex systems to be analysed and simulated. Grid infrastructures play here a rapidly increasing role for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, increasingly parallel high-performance computing is required. Here, we show for the prime example of molecular dynamic simulations how the presence of large grid clusters including very fast network interconnects within grid infrastructures allows now parallel high-performance grid computing efficiently and thus combines the benefits of dedicated super-computing centres and grid infrastructures. The demands for this service class are the highest since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects, are all needed in different combinations and must be considered in a highly dedicated manner to reach highest performance efficiency. Beyond, advanced and dedicated i) interaction with users, ii) the management of jobs, iii) accounting, and iv) billing, not only combines classic with parallel high-performance grid usage, but more importantly is also able to increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity like e.g. the life-science and health-care sectors as well as grid infrastructures by reaching higher level of resource efficiency.
Machine-learned and codified synthesis parameters of oxide materials
NASA Astrophysics Data System (ADS)
Kim, Edward; Huang, Kevin; Tomala, Alex; Matthews, Sara; Strubell, Emma; Saunders, Adam; McCallum, Andrew; Olivetti, Elsa
2017-09-01
Predictive materials design has rapidly accelerated in recent years with the advent of large-scale resources, such as materials structure and property databases generated by ab initio computations. In the absence of analogous ab initio frameworks for materials synthesis, high-throughput and machine learning techniques have recently been harnessed to generate synthesis strategies for select materials of interest. Still, a community-accessible, autonomously-compiled synthesis planning resource which spans across materials systems has not yet been developed. In this work, we present a collection of aggregated synthesis parameters computed using the text contained within over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques. We provide a dataset of synthesis parameters, compiled autonomously across 30 different oxide systems, in a format optimized for planning novel syntheses of materials.
Price schedules coordination for electricity pool markets
NASA Astrophysics Data System (ADS)
Legbedji, Alexis Motto
2002-04-01
We consider the optimal coordination of a class of mathematical programs with equilibrium constraints, which is formally interpreted as a resource-allocation problem. Many decomposition techniques were proposed to circumvent the difficulty of solving large systems with limited computer resources. The considerable improvement in computer architecture has allowed the solution of large-scale problems with increasing speed. Consequently, interest in decomposition techniques has waned. Nonetheless, there is an important class of applications for which decomposition techniques will still be relevant, among others, distributed systems---the Internet, perhaps, being the most conspicuous example---and competitive economic systems. Conceptually, a competitive economic system is a collection of agents that have similar or different objectives while sharing the same system resources. In theory, constructing a large-scale mathematical program and solving it centrally, using currently available computing power can optimize such systems of agents. In practice, however, because agents are self-interested and not willing to reveal some sensitive corporate data, one cannot solve these kinds of coordination problems by simply maximizing the sum of agent's objective functions with respect to their constraints. An iterative price decomposition or Lagrangian dual method is considered best suited because it can operate with limited information. A price-directed strategy, however, can only work successfully when coordinating or equilibrium prices exist, which is not generally the case when a weak duality is unavoidable. Showing when such prices exist and how to compute them is the main subject of this thesis. Among our results, we show that, if the Lagrangian function of a primal program is additively separable, price schedules coordination may be attained. The prices are Lagrange multipliers, and are also the decision variables of a dual program. In addition, we propose a new form of augmented or nonlinear pricing, which is an example of the use of penalty functions in mathematical programming. Applications are drawn from mathematical programming problems of the form arising in electric power system scheduling under competition.
CE-ACCE: The Cloud Enabled Advanced sCience Compute Environment
NASA Astrophysics Data System (ADS)
Cinquini, L.; Freeborn, D. J.; Hardman, S. H.; Wong, C.
2017-12-01
Traditionally, Earth Science data from NASA remote sensing instruments has been processed by building custom data processing pipelines (often based on a common workflow engine or framework) which are typically deployed and run on an internal cluster of computing resources. This approach has some intrinsic limitations: it requires each mission to develop and deploy a custom software package on top of the adopted framework; it makes use of dedicated hardware, network and storage resources, which must be specifically purchased, maintained and re-purposed at mission completion; and computing services cannot be scaled on demand beyond the capability of the available servers.More recently, the rise of Cloud computing, coupled with other advances in containerization technology (most prominently, Docker) and micro-services architecture, has enabled a new paradigm, whereby space mission data can be processed through standard system architectures, which can be seamlessly deployed and scaled on demand on either on-premise clusters, or commercial Cloud providers. In this talk, we will present one such architecture named CE-ACCE ("Cloud Enabled Advanced sCience Compute Environment"), which we have been developing at the NASA Jet Propulsion Laboratory over the past year. CE-ACCE is based on the Apache OODT ("Object Oriented Data Technology") suite of services for full data lifecycle management, which are turned into a composable array of Docker images, and complemented by a plug-in model for mission-specific customization. We have applied this infrastructure to both flying and upcoming NASA missions, such as ECOSTRESS and SMAP, and demonstrated deployment on the Amazon Cloud, either using simple EC2 instances, or advanced AWS services such as Amazon Lambda and ECS (EC2 Container Services).
Coupled basin-scale water resource models for arid and semiarid regions
NASA Astrophysics Data System (ADS)
Winter, C.; Springer, E.; Costigan, K.; Fasel, P.; Mniewski, S.; Zyvoloski, G.
2003-04-01
Managers of semi-arid and arid water resources must allocate increasingly variable surface sources and limited groundwater resources to growing demands. This challenge is leading to a new generation of detailed computational models that link multiple interacting sources and demands. We will discuss a new computational model of arid region hydrology that we are parameterizing for the upper Rio Grande Basin of the United States. The model consists of linked components for the atmosphere (the Regional Atmospheric Modeling System, RAMS), surface hydrology (the Los Alamos Distributed Hydrologic System, LADHS), and groundwater (the Finite Element Heat and Mass code, FEHM), and the couplings between them. The model runs under the Parallel Application WorkSpace software developed at Los Alamos for applications running on large distributed memory computers. RAMS simulates regional meteorology coupled to global climate data on the one hand and land surface hydrology on the other. LADHS generates runoff by infiltration or saturation excess mechanisms, as well as interception, evapotranspiration, and snow accumulation and melt. FEHM simulates variably saturated flow and heat transport in three dimensions. A key issue is to increase the components’ spatial and temporal resolution to account for changes in topography and other rapidly changing variables that affect results such as soil moisture distribution or groundwater recharge. Thus, RAMS’ smallest grid is 5 km on a side, LADHS uses 100 m spacing, while FEHM concentrates processing on key volumes by means of an unstructured grid. Couplings within our model are based on new scaling methods that link groundwater-groundwater systems and streams to aquifers and we are developing evapotranspiration methods based on detailed calculations of latent heat and vegetative cover. Simulations of precipitation and soil moisture for the 1992-93 El Nino year will be used to demonstrate the approach and suggest further needs.
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
Cellular trade-offs and optimal resource allocation during cyanobacterial diurnal growth
Knoop, Henning; Bockmayr, Alexander; Steuer, Ralf
2017-01-01
Cyanobacteria are an integral part of Earth’s biogeochemical cycles and a promising resource for the synthesis of renewable bioproducts from atmospheric CO2. Growth and metabolism of cyanobacteria are inherently tied to the diurnal rhythm of light availability. As yet, however, insight into the stoichiometric and energetic constraints of cyanobacterial diurnal growth is limited. Here, we develop a computational framework to investigate the optimal allocation of cellular resources during diurnal phototrophic growth using a genome-scale metabolic reconstruction of the cyanobacterium Synechococcus elongatus PCC 7942. We formulate phototrophic growth as an autocatalytic process and solve the resulting time-dependent resource allocation problem using constraint-based analysis. Based on a narrow and well-defined set of parameters, our approach results in an ab initio prediction of growth properties over a full diurnal cycle. The computational model allows us to study the optimality of metabolite partitioning during diurnal growth. The cyclic pattern of glycogen accumulation, an emergent property of the model, has timing characteristics that are in qualitative agreement with experimental findings. The approach presented here provides insight into the time-dependent resource allocation problem of phototrophic diurnal growth and may serve as a general framework to assess the optimality of metabolic strategies that evolved in phototrophic organisms under diurnal conditions. PMID:28720699
Advanced Computation in Plasma Physics
NASA Astrophysics Data System (ADS)
Tang, William
2001-10-01
Scientific simulation in tandem with theory and experiment is an essential tool for understanding complex plasma behavior. This talk will review recent progress and future directions for advanced simulations in magnetically-confined plasmas with illustrative examples chosen from areas such as microturbulence, magnetohydrodynamics, magnetic reconnection, and others. Significant recent progress has been made in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics, giving increasingly good agreement between experimental observations and computational modeling. This was made possible by innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales together with access to powerful new computational resources. In particular, the fusion energy science community has made excellent progress in developing advanced codes for which computer run-time and problem size scale well with the number of processors on massively parallel machines (MPP's). A good example is the effective usage of the full power of multi-teraflop MPP's to produce 3-dimensional, general geometry, nonlinear particle simulations which have accelerated progress in understanding the nature of turbulence self-regulation by zonal flows. It should be emphasized that these calculations, which typically utilized billions of particles for tens of thousands time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In general, results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. The associated scientific excitement should serve to stimulate improved cross-cutting collaborations with other fields and also to help attract bright young talent to plasma science.
Controlling user access to electronic resources without password
Smith, Fred Hewitt
2015-06-16
Described herein are devices and techniques for remotely controlling user access to a restricted computer resource. The process includes pre-determining an association of the restricted computer resource and computer-resource-proximal environmental information. Indicia of user-proximal environmental information are received from a user requesting access to the restricted computer resource. Received indicia of user-proximal environmental information are compared to associated computer-resource-proximal environmental information. User access to the restricted computer resource is selectively granted responsive to a favorable comparison in which the user-proximal environmental information is sufficiently similar to the computer-resource proximal environmental information. In at least some embodiments, the process further includes comparing user-supplied biometric measure and comparing it with a predetermined association of at least one biometric measure of an authorized user. Access to the restricted computer resource is granted in response to a favorable comparison.
Laboratory Computing Resource Center
Systems Computing and Data Resources Purchasing Resources Future Plans For Users Getting Started Using LCRC Software Best Practices and Policies Getting Help Support Laboratory Computing Resource Center Laboratory Computing Resource Center Latest Announcements See All April 27, 2018, Announcements, John Low
GIS in the Classroom: A New Zealand Experience
ERIC Educational Resources Information Center
Brodie, Sally
2006-01-01
This article begins by describing the use of GIS at a local scale within a single school, and builds outwards to review the use of GIS in the contexts of national classrooms. The single school is Diocesan School for Girls in Auckland. It is an independent girls' school for Years 1-13, well resourced with IT staff, computer hardware and software.…
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
NCI HPC Scaling and Optimisation in Climate, Weather, Earth system science and the Geosciences
NASA Astrophysics Data System (ADS)
Evans, B. J. K.; Bermous, I.; Freeman, J.; Roberts, D. S.; Ward, M. L.; Yang, R.
2016-12-01
The Australian National Computational Infrastructure (NCI) has a national focus in the Earth system sciences including climate, weather, ocean, water management, environment and geophysics. NCI leads a Program across its partners from the Australian science agencies and research communities to identify priority computational models to scale-up. Typically, these cases place a large overall demand on the available computer time, need to scale to higher resolutions, use excessive scarce resources such as large memory or bandwidth that limits, or in some cases, need to meet requirements for transition to a separate operational forecasting system, with set time-windows. The model codes include the UK Met Office Unified Model atmospheric model (UM), GFDL's Modular Ocean Model (MOM), both the UK Met Office's GC3 and Australian ACCESS coupled-climate systems (including sea ice), 4D-Var data assimilation and satellite processing, the Regional Ocean Model (ROMS), and WaveWatch3 as well as geophysics codes including hazards, magentuellerics, seismic inversions, and geodesy. Many of these codes use significant compute resources both for research applications as well as within the operational systems. Some of these models are particularly complex, and their behaviour had not been critically analysed for effective use of the NCI supercomputer or how they could be improved. As part of the Program, we have established a common profiling methodology that uses a suite of open source tools for performing scaling analyses. The most challenging cases are profiling multi-model coupled systems where the component models have their own complex algorithms and performance issues. We have also found issues within the current suite of profiling tools, and no single tool fully exposes the nature of the code performance. As a result of this work, international collaborations are now in place to ensure that improvements are incorporated within the community models, and our effort can be targeted in a coordinated way. The coordinations have involved user stakeholders, the model developer community, and dependent software libraries. For example, we have spent significant time characterising I/O scalability, and improving the use of libraries such as NetCDF and HDF5.
Access control and privacy in large distributed systems
NASA Technical Reports Server (NTRS)
Leiner, B. M.; Bishop, M.
1986-01-01
Large scale distributed systems consists of workstations, mainframe computers, supercomputers and other types of servers, all connected by a computer network. These systems are being used in a variety of applications including the support of collaborative scientific research. In such an environment, issues of access control and privacy arise. Access control is required for several reasons, including the protection of sensitive resources and cost control. Privacy is also required for similar reasons, including the protection of a researcher's proprietary results. A possible architecture for integrating available computer and communications security technologies into a system that meet these requirements is described. This architecture is meant as a starting point for discussion, rather that the final answer.
NASA Astrophysics Data System (ADS)
Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.
2017-01-01
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.
NASA Astrophysics Data System (ADS)
Dutta, Dushmanta; Vaze, Jai; Kim, Shaun; Hughes, Justin; Yang, Ang; Teng, Jin; Lerat, Julien
2017-04-01
Existing global and continental scale river models, mainly designed for integrating with global climate models, are of very coarse spatial resolutions and lack many important hydrological processes, such as overbank flow, irrigation diversion, groundwater seepage/recharge, which operate at a much finer resolution. Thus, these models are not suitable for producing water accounts, which have become increasingly important for water resources planning and management at regional and national scales. A continental scale river system model called Australian Water Resource Assessment River System model (AWRA-R) has been developed and implemented for national water accounting in Australia using a node-link architecture. The model includes major hydrological processes, anthropogenic water utilisation and storage routing that influence the streamflow in both regulated and unregulated river systems. Two key components of the model are an irrigation model to compute water diversion for irrigation use and associated fluxes and stores and a storage-based floodplain inundation model to compute overbank flow from river to floodplain and associated floodplain fluxes and stores. The results in the Murray-Darling Basin shows highly satisfactory performance of the model with median daily Nash-Sutcliffe Efficiency (NSE) of 0.64 and median annual bias of less than 1% for the period of calibration (1970-1991) and median daily NSE of 0.69 and median annual bias of 12% for validation period (1992-2014). The results have demonstrated that the performance of the model is less satisfactory when the key processes such as overbank flow, groundwater seepage and irrigation diversion are switched off. The AWRA-R model, which has been operationalised by the Australian Bureau of Meteorology for continental scale water accounting, has contributed to improvements in the national water account by substantially reducing accounted different volume (gain/loss).
Research on elastic resource management for multi-queue under cloud computing environment
NASA Astrophysics Data System (ADS)
CHENG, Zhenjing; LI, Haibo; HUANG, Qiulan; Cheng, Yaodong; CHEN, Gang
2017-10-01
As a new approach to manage computing resource, virtualization technology is more and more widely applied in the high-energy physics field. A virtual computing cluster based on Openstack was built at IHEP, using HTCondor as the job queue management system. In a traditional static cluster, a fixed number of virtual machines are pre-allocated to the job queue of different experiments. However this method cannot be well adapted to the volatility of computing resource requirements. To solve this problem, an elastic computing resource management system under cloud computing environment has been designed. This system performs unified management of virtual computing nodes on the basis of job queue in HTCondor based on dual resource thresholds as well as the quota service. A two-stage pool is designed to improve the efficiency of resource pool expansion. This paper will present several use cases of the elastic resource management system in IHEPCloud. The practical run shows virtual computing resource dynamically expanded or shrunk while computing requirements change. Additionally, the CPU utilization ratio of computing resource was significantly increased when compared with traditional resource management. The system also has good performance when there are multiple condor schedulers and multiple job queues.
TRANSITION FROM KINETIC TO MHD BEHAVIOR IN A COLLISIONLESS PLASMA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parashar, Tulasi N.; Matthaeus, William H.; Shay, Michael A.
The study of kinetic effects in heliospheric plasmas requires representation of dynamics at sub-proton scales, but in most cases the system is driven by magnetohydrodynamic (MHD) activity at larger scales. The latter requirement challenges available computational resources, which raises the question of how large such a system must be to exhibit MHD traits at large scales while kinetic behavior is accurately represented at small scales. Here we study this implied transition from kinetic to MHD-like behavior using particle-in-cell (PIC) simulations, initialized using an Orszag–Tang Vortex. The PIC code treats protons, as well as electrons, kinetically, and we address the questionmore » of interest by examining several different indicators of MHD-like behavior.« less
The Numerical Propulsion System Simulation: An Overview
NASA Technical Reports Server (NTRS)
Lytle, John K.
2000-01-01
Advances in computational technology and in physics-based modeling are making large-scale, detailed simulations of complex systems possible within the design environment. For example, the integration of computing, communications, and aerodynamics has reduced the time required to analyze major propulsion system components from days and weeks to minutes and hours. This breakthrough has enabled the detailed simulation of major propulsion system components to become a routine part of designing systems, providing the designer with critical information about the components early in the design process. This paper describes the development of the numerical propulsion system simulation (NPSS), a modular and extensible framework for the integration of multicomponent and multidisciplinary analysis tools using geographically distributed resources such as computing platforms, data bases, and people. The analysis is currently focused on large-scale modeling of complete aircraft engines. This will provide the product developer with a "virtual wind tunnel" that will reduce the number of hardware builds and tests required during the development of advanced aerospace propulsion systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Yuqi; Wang, Jinan; Shao, Qiang, E-mail: qshao@mail.shcnc.ac.cn, E-mail: Jiye.Shi@ucb.com, E-mail: wlzhu@mail.shcnc.ac.cn
2015-03-28
The application of temperature replica exchange molecular dynamics (REMD) simulation on protein motion is limited by its huge requirement of computational resource, particularly when explicit solvent model is implemented. In the previous study, we developed a velocity-scaling optimized hybrid explicit/implicit solvent REMD method with the hope to reduce the temperature (replica) number on the premise of maintaining high sampling efficiency. In this study, we utilized this method to characterize and energetically identify the conformational transition pathway of a protein model, the N-terminal domain of calmodulin. In comparison to the standard explicit solvent REMD simulation, the hybrid REMD is much lessmore » computationally expensive but, meanwhile, gives accurate evaluation of the structural and thermodynamic properties of the conformational transition which are in well agreement with the standard REMD simulation. Therefore, the hybrid REMD could highly increase the computational efficiency and thus expand the application of REMD simulation to larger-size protein systems.« less
2014-01-01
Background Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. Results To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. Conclusions By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples. PMID:24475911
Adaptive subdomain modeling: A multi-analysis technique for ocean circulation models
NASA Astrophysics Data System (ADS)
Altuntas, Alper; Baugh, John
2017-07-01
Many coastal and ocean processes of interest operate over large temporal and geographical scales and require a substantial amount of computational resources, particularly when engineering design and failure scenarios are also considered. This study presents an adaptive multi-analysis technique that improves the efficiency of these computations when multiple alternatives are being simulated. The technique, called adaptive subdomain modeling, concurrently analyzes any number of child domains, with each instance corresponding to a unique design or failure scenario, in addition to a full-scale parent domain providing the boundary conditions for its children. To contain the altered hydrodynamics originating from the modifications, the spatial extent of each child domain is adaptively adjusted during runtime depending on the response of the model. The technique is incorporated in ADCIRC++, a re-implementation of the popular ADCIRC ocean circulation model with an updated software architecture designed to facilitate this adaptive behavior and to utilize concurrent executions of multiple domains. The results of our case studies confirm that the method substantially reduces computational effort while maintaining accuracy.
NASA Astrophysics Data System (ADS)
Ercan, Mehmet Bulent
Watershed-scale hydrologic models are used for a variety of applications from flood prediction, to drought analysis, to water quality assessments. A particular challenge in applying these models is calibration of the model parameters, many of which are difficult to measure at the watershed-scale. A primary goal of this dissertation is to contribute new computational methods and tools for calibration of watershed-scale hydrologic models and the Soil and Water Assessment Tool (SWAT) model, in particular. SWAT is a physically-based, watershed-scale hydrologic model developed to predict the impact of land management practices on water quality and quantity. The dissertation follows a manuscript format meaning it is comprised of three separate but interrelated research studies. The first two research studies focus on SWAT model calibration, and the third research study presents an application of the new calibration methods and tools to study climate change impacts on water resources in the Upper Neuse Watershed of North Carolina using SWAT. The objective of the first two studies is to overcome computational challenges associated with calibration of SWAT models. The first study evaluates a parallel SWAT calibration tool built using the Windows Azure cloud environment and a parallel version of the Dynamically Dimensioned Search (DDS) calibration method modified to run in Azure. The calibration tool was tested for six model scenarios constructed using three watersheds of increasing size (the Eno, Upper Neuse, and Neuse) for both a 2 year and 10 year simulation duration. Leveraging the cloud as an on demand computing resource allowed for a significantly reduced calibration time such that calibration of the Neuse watershed went from taking 207 hours on a personal computer to only 3.4 hours using 256 cores in the Azure cloud. The second study aims at increasing SWAT model calibration efficiency by creating an open source, multi-objective calibration tool using the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). This tool was demonstrated through an application for the Upper Neuse Watershed in North Carolina, USA. The objective functions used for the calibration were Nash-Sutcliffe (E) and Percent Bias (PB), and the objective sites were the Flat, Little, and Eno watershed outlets. The results show that the use of multi-objective calibration algorithms for SWAT calibration improved model performance especially in terms of minimizing PB compared to the single objective model calibration. The third study builds upon the first two studies by leveraging the new calibration methods and tools to study future climate impacts on the Upper Neuse watershed. Statistically downscaled outputs from eight Global Circulation Models (GCMs) were used for both low and high emission scenarios to drive a well calibrated SWAT model of the Upper Neuse watershed. The objective of the study was to understand the potential hydrologic response of the watershed, which serves as a public water supply for the growing Research Triangle Park region of North Carolina, under projected climate change scenarios. The future climate change scenarios, in general, indicate an increase in precipitation and temperature for the watershed in coming decades. The SWAT simulations using the future climate scenarios, in general, suggest an increase in soil water and water yield, and a decrease in evapotranspiration within the Upper Neuse watershed. In summary, this dissertation advances the field of watershed-scale hydrologic modeling by (i) providing some of the first work to apply cloud computing for the computationally-demanding task of model calibration; (ii) providing a new, open source library that can be used by SWAT modelers to perform multi-objective calibration of their models; and (iii) advancing understanding of climate change impacts on water resources for an important watershed in the Research Triangle Park region of North Carolina. The third study leveraged the methodological advances presented in the first two studies. Therefore, the dissertation contains three independent by interrelated studies that collectively advance the field of watershed-scale hydrologic modeling and analysis.
Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds
NASA Astrophysics Data System (ADS)
Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni
2012-09-01
Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request particularly requires the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of a day require an efficient computing resource management. We propose a hierarchical approach, where the data center is divided in clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools for evaluating a different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
Costs of cloud computing for a biometry department. A case study.
Knaus, J; Hieke, S; Binder, H; Schwarzer, G
2013-01-01
"Cloud" computing providers, such as the Amazon Web Services (AWS), offer stable and scalable computational resources based on hardware virtualization, with short, usually hourly, billing periods. The idea of pay-as-you-use seems appealing for biometry research units which have only limited access to university or corporate data center resources or grids. This case study compares the costs of an existing heterogeneous on-site hardware pool in a Medical Biometry and Statistics department to a comparable AWS offer. The "total cost of ownership", including all direct costs, is determined for the on-site hardware, and hourly prices are derived, based on actual system utilization during the year 2011. Indirect costs, which are difficult to quantify are not included in this comparison, but nevertheless some rough guidance from our experience is given. To indicate the scale of costs for a methodological research project, a simulation study of a permutation-based statistical approach is performed using AWS and on-site hardware. In the presented case, with a system utilization of 25-30 percent and 3-5-year amortization, on-site hardware can result in smaller costs, compared to hourly rental in the cloud dependent on the instance chosen. Renting cloud instances with sufficient main memory is a deciding factor in this comparison. Costs for on-site hardware may vary, depending on the specific infrastructure at a research unit, but have only moderate impact on the overall comparison and subsequent decision for obtaining affordable scientific computing resources. Overall utilization has a much stronger impact as it determines the actual computing hours needed per year. Taking this into ac count, cloud computing might still be a viable option for projects with limited maturity, or as a supplement for short peaks in demand.
NGScloud: RNA-seq analysis of non-model species using cloud computing.
Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai
2018-05-03
RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon that permit the access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so its costs and times can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources, and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. unai.lopezdeheredia@upm.es.
Satellite Imagery Analysis for Automated Global Food Security Forecasting
NASA Astrophysics Data System (ADS)
Moody, D.; Brumby, S. P.; Chartrand, R.; Keisler, R.; Mathis, M.; Beneke, C. M.; Nicholaeff, D.; Skillman, S.; Warren, M. S.; Poehnelt, J.
2017-12-01
The recent computing performance revolution has driven improvements in sensor, communication, and storage technology. Multi-decadal remote sensing datasets at the petabyte scale are now available in commercial clouds, with new satellite constellations generating petabytes/year of daily high-resolution global coverage imagery. Cloud computing and storage, combined with recent advances in machine learning, are enabling understanding of the world at a scale and at a level of detail never before feasible. We present results from an ongoing effort to develop satellite imagery analysis tools that aggregate temporal, spatial, and spectral information and that can scale with the high-rate and dimensionality of imagery being collected. We focus on the problem of monitoring food crop productivity across the Middle East and North Africa, and show how an analysis-ready, multi-sensor data platform enables quick prototyping of satellite imagery analysis algorithms, from land use/land cover classification and natural resource mapping, to yearly and monthly vegetative health change trends at the structural field level.
McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression
Islam, Tanzima Zerin; Mohror, Kathryn; Bagchi, Saurabh; ...
2013-01-01
High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, applications store their states in checkpoints on a parallel file system (PFS). As applications scale up, checkpoint-restart incurs high overheads due to contention for PFS resources. The high overheads force large-scale applications to reduce checkpoint frequency, which means more compute time is lost in the event of failure. We alleviate this problem through a scalable checkpoint-restart system, mcrEngine. McrEngine aggregates checkpoints from multiple application processes with knowledge of the data semantics available through widely-used I/O libraries, e.g., HDF5 and netCDF, and compresses them. Our novel scheme improves compressibility ofmore » checkpoints up to 115% over simple concatenation and compression. Our evaluation with large-scale application checkpoints show that mcrEngine reduces checkpointing overhead by up to 87% and restart overhead by up to 62% over a baseline with no aggregation or compression.« less
Nebert, D.D.
1989-01-01
In the process of developing a continuous hydrographic data layer for water resources applications in the Pacific Northwest, map-edge discontinuities in the U.S. Geological Survey 1:100 ,000-scale digital data that required application of computer-assisted edgematching procedures were identified. The spatial data sets required by the project must have line features that match closely enough across map boundaries to ensure full line topology when adjacent files are joined by the computer. Automated edgematching techniques are evaluated as to their effects on positional accuracy. Interactive methods such as selective node-matching and on-screen editing are also reviewed. Interactive procedures complement automated methods by allowing supervision of edgematching in a cartographic and hydrologic context. Common edge conditions encountered in the preparation of the Northwest Rivers data base are described, as are recommended processing solutions. Suggested edgematching procedures for 1:100,000-scale hydrography data are included in an appendix to encourage consistent processing of this theme on a national scale. (USGS)
Computer 3D site model generation based on aerial images
NASA Astrophysics Data System (ADS)
Zheltov, Sergey Y.; Blokhinov, Yuri B.; Stepanov, Alexander A.; Skryabin, Sergei V.; Sibiriakov, Alexandre V.
1997-07-01
The technology for 3D model design of real world scenes and its photorealistic rendering are current topics of investigation. Development of such technology is very attractive to implement in vast varieties of applications: military mission planning, crew training, civil engineering, architecture, virtual reality entertainments--just a few were mentioned. 3D photorealistic models of urban areas are often discussed now as upgrade from existing 2D geographic information systems. Possibility of site model generation with small details depends on two main factors: available source dataset and computer power resources. In this paper PC based technology is presented, so the scenes of middle resolution (scale of 1:1000) be constructed. Types of datasets are the gray level aerial stereo pairs of photographs (scale of 1:14000) and true color on ground photographs of buildings (scale ca.1:1000). True color terrestrial photographs are also necessary for photorealistic rendering, that in high extent improves human perception of the scene.
Dynamic computer model for the metallogenesis and tectonics of the Circum-North Pacific
Scotese, Christopher R.; Nokleberg, Warren J.; Monger, James W.H.; Norton, Ian O.; Parfenov, Leonid M.; Khanchuk, Alexander I.; Bundtzen, Thomas K.; Dawson, Kenneth M.; Eremin, Roman A.; Frolov, Yuri F.; Fujita, Kazuya; Goryachev, Nikolai A.; Pozdeev, Anany I.; Ratkin, Vladimir V.; Rodinov, Sergey M.; Rozenblum, Ilya S.; Scholl, David W.; Shpikerman, Vladimir I.; Sidorov, Anatoly A.; Stone, David B.
2001-01-01
The digital files on this report consist of a dynamic computer model of the metallogenesis and tectonics of the Circum-North Pacific, and background articles, figures, and maps. The tectonic part of the dynamic computer model is derived from a major analysis of the tectonic evolution of the Circum-North Pacific which is also contained in directory tectevol. The dynamic computer model and associated materials on this CD-ROM are part of a project on the major mineral deposits, metallogenesis, and tectonics of the Russian Far East, Alaska, and the Canadian Cordillera. The project provides critical information on bedrock geology and geophysics, tectonics, major metalliferous mineral resources, metallogenic patterns, and crustal origin and evolution of mineralizing systems for this region. The major scientific goals and benefits of the project are to: (1) provide a comprehensive international data base on the mineral resources of the region that is the first, extensive knowledge available in English; (2) provide major new interpretations of the origin and crustal evolution of mineralizing systems and their host rocks, thereby enabling enhanced, broad-scale tectonic reconstructions and interpretations; and (3) promote trade and scientific and technical exchanges between North America and Eastern Asia.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics become urgent because many research activities are constrained by the inability of software or tool that even could not complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may be not processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environment to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high-performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
A cloud-based workflow to quantify transcript-expression levels in public cancer compendia
Tatlow, PJ; Piccolo, Stephen R.
2016-01-01
Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantified computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9). PMID:27982081
A simple Lagrangian forecast system with aviation forecast potential
NASA Technical Reports Server (NTRS)
Petersen, R. A.; Homan, J. H.
1983-01-01
A trajectory forecast procedure is developed which uses geopotential tendency fields obtained from a simple, multiple layer, potential vorticity conservative isentropic model. This model can objectively account for short-term advective changes in the mass field when combined with fine-scale initial analyses. This procedure for producing short-term, upper-tropospheric trajectory forecasts employs a combination of a detailed objective analysis technique, an efficient mass advection model, and a diagnostically proven trajectory algorithm, none of which require extensive computer resources. Results of initial tests are presented, which indicate an exceptionally good agreement for trajectory paths entering the jet stream and passing through an intensifying trough. It is concluded that this technique not only has potential for aiding in route determination, fuel use estimation, and clear air turbulence detection, but also provides an example of the types of short range forecasting procedures which can be applied at local forecast centers using simple algorithms and a minimum of computer resources.
Variable Generation Power Forecasting as a Big Data Problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haupt, Sue Ellen; Kosovic, Branko
To blend growing amounts of power from renewable resources into utility operations requires accurate forecasts. For both day ahead planning and real-time operations, the power from the wind and solar resources must be predicted based on real-time observations and a series of models that span the temporal and spatial scales of the problem, using the physical and dynamical knowledge as well as computational intelligence. Accurate prediction is a Big Data problem that requires disparate data, multiple models that are each applicable for a specific time frame, and application of computational intelligence techniques to successfully blend all of the model andmore » observational information in real-time and deliver it to the decision makers at utilities and grid operators. This paper describes an example system that has been used for utility applications and how it has been configured to meet utility needs while addressing the Big Data issues.« less
FPGA-Based Multimodal Embedded Sensor System Integrating Low- and Mid-Level Vision
Botella, Guillermo; Martín H., José Antonio; Santos, Matilde; Meyer-Baese, Uwe
2011-01-01
Motion estimation is a low-level vision task that is especially relevant due to its wide range of applications in the real world. Many of the best motion estimation algorithms include some of the features that are found in mammalians, which would demand huge computational resources and therefore are not usually available in real-time. In this paper we present a novel bioinspired sensor based on the synergy between optical flow and orthogonal variant moments. The bioinspired sensor has been designed for Very Large Scale Integration (VLSI) using properties of the mammalian cortical motion pathway. This sensor combines low-level primitives (optical flow and image moments) in order to produce a mid-level vision abstraction layer. The results are described trough experiments showing the validity of the proposed system and an analysis of the computational resources and performance of the applied algorithms. PMID:22164069
Variable Generation Power Forecasting as a Big Data Problem
Haupt, Sue Ellen; Kosovic, Branko
2016-10-10
To blend growing amounts of power from renewable resources into utility operations requires accurate forecasts. For both day ahead planning and real-time operations, the power from the wind and solar resources must be predicted based on real-time observations and a series of models that span the temporal and spatial scales of the problem, using the physical and dynamical knowledge as well as computational intelligence. Accurate prediction is a Big Data problem that requires disparate data, multiple models that are each applicable for a specific time frame, and application of computational intelligence techniques to successfully blend all of the model andmore » observational information in real-time and deliver it to the decision makers at utilities and grid operators. This paper describes an example system that has been used for utility applications and how it has been configured to meet utility needs while addressing the Big Data issues.« less
FPGA-based multimodal embedded sensor system integrating low- and mid-level vision.
Botella, Guillermo; Martín H, José Antonio; Santos, Matilde; Meyer-Baese, Uwe
2011-01-01
Motion estimation is a low-level vision task that is especially relevant due to its wide range of applications in the real world. Many of the best motion estimation algorithms include some of the features that are found in mammalians, which would demand huge computational resources and therefore are not usually available in real-time. In this paper we present a novel bioinspired sensor based on the synergy between optical flow and orthogonal variant moments. The bioinspired sensor has been designed for Very Large Scale Integration (VLSI) using properties of the mammalian cortical motion pathway. This sensor combines low-level primitives (optical flow and image moments) in order to produce a mid-level vision abstraction layer. The results are described trough experiments showing the validity of the proposed system and an analysis of the computational resources and performance of the applied algorithms.
Impact on family and parental stress of prenatal vs postnatal repair of myelomeningocele.
Antiel, Ryan M; Adzick, N Scott; Thom, Elizabeth A; Burrows, Pamela K; Farmer, Diana L; Brock, John W; Howell, Lori J; Farrell, Jody A; Houtrow, Amy J
2016-10-01
The Management of Myelomeningocele Study was a multicenter, randomized controlled trial that compared prenatal repair with standard postnatal repair for fetal myelomeningocele. We sought to describe the long-term impact on the families of the women who participated and to evaluate how the timing of repair influenced the impact on families and parental stress. Randomized women completed the 24-item Impact on Family Scale and the 36-item Parenting Stress Index Short Form at 12 and 30 months after delivery. A revised 15-item Impact on Family Scale describing overall impact was also computed. Higher scores reflected more negative impacts or greater stress. In addition, we examined Family Support Scale and Family Resource Scale scores along with various neonatal outcomes. Repeated measures analysis was conducted for each scale and subscale. Of 183 women randomized, 171 women completed the Impact on Family Scale and 172 completed the Parenting Stress Index at both 12 and 30 months. The prenatal surgery group had significantly lower revised 15-item Impact on Family Scale scores as well as familial-social impact subscale scores compared to the postnatal surgery group (P = .02 and .004, respectively). There was no difference in total parental stress between the 2 groups (P = .89) or in any of the Parenting Stress Index Short Form subscales. In addition, walking independently at 30 months and family resources at 12 months were associated with both family impact and parental stress. The overall negative family impact of caring for a child with spina bifida, up to 30 months of age, was significantly lower in the prenatal surgery group compared to the postnatal surgery group. Ambulation status and family resources were predictive of impact on family and parental stress. Copyright © 2016 Elsevier Inc. All rights reserved.
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2017-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments. PMID:28190948
Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline
Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.
2016-09-28
A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting followup observations to young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using highperformance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer to dealing with data from large-scale time-domain facilities in the near future.
Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.
A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting followup observations to young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using highperformance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer to dealing with data from large-scale time-domain facilities in the near future.
Information technology challenges of biodiversity and ecosystems informatics
Schnase, J.L.; Cushing, J.; Frame, M.; Frondorf, A.; Landis, E.; Maier, D.; Silberschatz, A.
2003-01-01
Computer scientists, biologists, and natural resource managers recently met to examine the prospects for advancing computer science and information technology research by focusing on the complex and often-unique challenges found in the biodiversity and ecosystem domain. The workshop and its final report reveal that the biodiversity and ecosystem sciences are fundamentally information sciences and often address problems having distinctive attributes of scale and socio-technical complexity. The paper provides an overview of the emerging field of biodiversity and ecosystem informatics and demonstrates how the demands of biodiversity and ecosystem research can advance our understanding and use of information technologies.
A study of computer graphics technology in application of communication resource management
NASA Astrophysics Data System (ADS)
Li, Jing; Zhou, Liang; Yang, Fei
2017-08-01
With the development of computer technology, computer graphics technology has been widely used. Especially, the success of object-oriented technology and multimedia technology promotes the development of graphics technology in the computer software system. Therefore, the computer graphics theory and application technology have become an important topic in the field of computer, while the computer graphics technology becomes more and more extensive in various fields of application. In recent years, with the development of social economy, especially the rapid development of information technology, the traditional way of communication resource management cannot effectively meet the needs of resource management. In this case, the current communication resource management is still using the original management tools and management methods, resource management equipment management and maintenance, which brought a lot of problems. It is very difficult for non-professionals to understand the equipment and the situation in communication resource management. Resource utilization is relatively low, and managers cannot quickly and accurately understand the resource conditions. Aimed at the above problems, this paper proposes to introduce computer graphics technology into the communication resource management. The introduction of computer graphics not only makes communication resource management more vivid, but also reduces the cost of resource management and improves work efficiency.
ERIC Educational Resources Information Center
Vivian, Rebecca; Falkner, Katrina; Falkner, Nickolas
2014-01-01
England and Australia have introduced new learning areas, teaching computer science to children from the first year of school. This is a significant milestone that also raises a number of big challenges: the preparation of teachers and the development of resources" at a national scale." Curriculum change is not easy for teachers, in any…
ERIC Educational Resources Information Center
Marques, Bertil P.; Carvalho, Piedade; Escudeiro, Paula; Barata, Ana; Silva, Ana; Queiros, Sandra
2017-01-01
Promoted by the significant increase of large scale internet access, many audiences have turned to the web and to its resources for learning and inspiration, with diverse sets of skills and intents. In this context, Multimedia Online Open Courses (MOOC) consist in learning models supported on user-friendly web tools that allow anyone with minimum…
Ng, Chip-Jin; Hsu, Kuang-Hung; Kuan, Jen-Tze; Chiu, Te-Fa; Chen, Wei-Kong; Lin, Hung-Jung; Bullard, Michael J; Chen, Jih-Chang
2010-11-01
Since the implementation of National Health Insurance in Taiwan, Emergency Department (ED) volume has progressively increased, and the current triage system is insufficient and needs modification. This study compared the prioritization and resource utilization differences between the four-level Taiwan Triage System (TTS) and the standardized five-level Canadian Triage and Acuity Scale (CTAS) among ED patients. This was a prospective observational study. All adult ED patients who presented to three different medical centers during the study period were included. Patients were independently triaged by the duty triage nurse using TTS, and a single trained research nurse using CTAS with a computer support software system. Hospitalization, length of stay (LOS), and medical resource consumption were analyzed by comparing TTS and CTAS by acuity levels. There was significant disparity in patient prioritization between TTS and CTAS among the 1851 enrolled patients. With TTS, 7.8%, 46.1%, 45.9% and 0.2% were assigned to levels 1, 2, 3, and 4, respectively. With CTAS, 3.5%, 24.4%, 44.3%, 22.4% and 5.5% were assigned to levels 1, 2, 3, 4, and 5, respectively. The hospitalization rate, LOS, and medical resource consumption differed significantly between the two triage systems and correlated better with CTAS. CTAS provided better discrimination for ED patient triage, and also showed greater validity when predicting hospitalization, LOS, and medical resource consumption. An accurate five-level triage scale appeared superior in predicting patient acuity and resource utilization. Copyright © 2010 Formosan Medical Association & Elsevier. Published by Elsevier B.V. All rights reserved.
A resource management architecture based on complex network theory in cloud computing federation
NASA Astrophysics Data System (ADS)
Zhang, Zehua; Zhang, Xuejie
2011-10-01
Cloud Computing Federation is a main trend of Cloud Computing. Resource Management has significant effect on the design, realization, and efficiency of Cloud Computing Federation. Cloud Computing Federation has the typical characteristic of the Complex System, therefore, we propose a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC) in this paper, with the detailed design of the resource discovery and resource announcement mechanisms. Compare with the existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use the historical information and current state data get from other Task Managers for the evolution of the complex network which is composed of Task Managers, thus has the advantages in resource discovery speed, fault tolerance and adaptive ability. The result of the model experiment confirmed the advantage of RMABC in resource discovery performance.
NASA Technical Reports Server (NTRS)
Joyce, A. T.
1974-01-01
Significant progress has been made in the classification of surface conditions (land uses) with computer-implemented techniques based on the use of ERTS digital data and pattern recognition software. The supervised technique presently used at the NASA Earth Resources Laboratory is based on maximum likelihood ratioing with a digital table look-up approach to classification. After classification, colors are assigned to the various surface conditions (land uses) classified, and the color-coded classification is film recorded on either positive or negative 9 1/2 in. film at the scale desired. Prints of the film strips are then mosaicked and photographed to produce a land use map in the format desired. Computer extraction of statistical information is performed to show the extent of each surface condition (land use) within any given land unit that can be identified in the image. Evaluations of the product indicate that classification accuracy is well within the limits for use by land resource managers and administrators. Classifications performed with digital data acquired during different seasons indicate that the combination of two or more classifications offer even better accuracy.
Schruben, Paul G.
1996-01-01
This CD-ROM contains digital versions of the geology and resource assessment maps of Costa Rica originally published by the U.S. Geological Survey (USGS), the Direccion General de Geologia, Minas e Hidrocarburos, and the Universidad de Costa Rica in 1987 at a scale of 1:500,000 in USGS Folio I-1865. The following layers of the map are available on the CD-ROM: geology, favorable domains for selected deposit types, Bouguer gravity, isostatic gravity, mineral deposits, and rock geochemistry sample points. Some of the layers are provided in the following formats: ArcView 1 for Windows and UNIX, ARC/INFO 6.1.2 Export, Digital Line Graph (DLG) Optional, and Drawing Exchange File (DXF). This CD-ROM was produced in accordance with the ISO 9660 and Apple Computer's HFS standards.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
The Fermilab Grid and Cloud Computing Department and the KISTI Global Science experimental Data hub Center are working on a multi-year Collaborative Research and Development Agreement.With the knowledge developed in the first year on how to provision and manage a federation of virtual machines through Cloud management systems. In this second year, we expanded the work on provisioning and federation, increasing both scale and diversity of solutions, and we started to build on-demand services on the established fabric, introducing the paradigm of Platform as a Service to assist with the execution of scientific workflows. We have enabled scientific workflows ofmore » stakeholders to run on multiple cloud resources at the scale of 1,000 concurrent machines. The demonstrations have been in the areas of (a) Virtual Infrastructure Automation and Provisioning, (b) Interoperability and Federation of Cloud Resources, and (c) On-demand Services for ScientificWorkflows.« less
Pfeil, Thomas; Potjans, Tobias C; Schrader, Sven; Potjans, Wiebke; Schemmel, Johannes; Diesmann, Markus; Meier, Karlheinz
2012-01-01
Large-scale neuromorphic hardware systems typically bear the trade-off between detail level and required chip resources. Especially when implementing spike-timing dependent plasticity, reduction in resources leads to limitations as compared to floating point precision. By design, a natural modification that saves resources would be reducing synaptic weight resolution. In this study, we give an estimate for the impact of synaptic weight discretization on different levels, ranging from random walks of individual weights to computer simulations of spiking neural networks. The FACETS wafer-scale hardware system offers a 4-bit resolution of synaptic weights, which is shown to be sufficient within the scope of our network benchmark. Our findings indicate that increasing the resolution may not even be useful in light of further restrictions of customized mixed-signal synapses. In addition, variations due to production imperfections are investigated and shown to be uncritical in the context of the presented study. Our results represent a general framework for setting up and configuring hardware-constrained synapses. We suggest how weight discretization could be considered for other backends dedicated to large-scale simulations. Thus, our proposition of a good hardware verification practice may rise synergy effects between hardware developers and neuroscientists.
Pfeil, Thomas; Potjans, Tobias C.; Schrader, Sven; Potjans, Wiebke; Schemmel, Johannes; Diesmann, Markus; Meier, Karlheinz
2012-01-01
Large-scale neuromorphic hardware systems typically bear the trade-off between detail level and required chip resources. Especially when implementing spike-timing dependent plasticity, reduction in resources leads to limitations as compared to floating point precision. By design, a natural modification that saves resources would be reducing synaptic weight resolution. In this study, we give an estimate for the impact of synaptic weight discretization on different levels, ranging from random walks of individual weights to computer simulations of spiking neural networks. The FACETS wafer-scale hardware system offers a 4-bit resolution of synaptic weights, which is shown to be sufficient within the scope of our network benchmark. Our findings indicate that increasing the resolution may not even be useful in light of further restrictions of customized mixed-signal synapses. In addition, variations due to production imperfections are investigated and shown to be uncritical in the context of the presented study. Our results represent a general framework for setting up and configuring hardware-constrained synapses. We suggest how weight discretization could be considered for other backends dedicated to large-scale simulations. Thus, our proposition of a good hardware verification practice may rise synergy effects between hardware developers and neuroscientists. PMID:22822388
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
A resource-sharing model based on a repeated game in fog computing.
Sun, Yan; Zhang, Nan
2017-03-01
With the rapid development of cloud computing techniques, the number of users is undergoing exponential growth. It is difficult for traditional data centers to perform many tasks in real time because of the limited bandwidth of resources. The concept of fog computing is proposed to support traditional cloud computing and to provide cloud services. In fog computing, the resource pool is composed of sporadic distributed resources that are more flexible and movable than a traditional data center. In this paper, we propose a fog computing structure and present a crowd-funding algorithm to integrate spare resources in the network. Furthermore, to encourage more resource owners to share their resources with the resource pool and to supervise the resource supporters as they actively perform their tasks, we propose an incentive mechanism in our algorithm. Simulation results show that our proposed incentive mechanism can effectively reduce the SLA violation rate and accelerate the completion of tasks.
Memristive Mixed-Signal Neuromorphic Systems: Energy-Efficient Learning at the Circuit-Level
Chakma, Gangotree; Adnan, Md Musabbir; Wyer, Austin R.; ...
2017-11-23
Neuromorphic computing is non-von Neumann computer architecture for the post Moore’s law era of computing. Since a main focus of the post Moore’s law era is energy-efficient computing with fewer resources and less area, neuromorphic computing contributes effectively in this research. Here in this paper, we present a memristive neuromorphic system for improved power and area efficiency. Our particular mixed-signal approach implements neural networks with spiking events in a synchronous way. Moreover, the use of nano-scale memristive devices saves both area and power in the system. We also provide device-level considerations that make the system more energy-efficient. The proposed systemmore » additionally includes synchronous digital long term plasticity, an online learning methodology that helps the system train the neural networks during the operation phase and improves the efficiency in learning considering the power consumption and area overhead.« less
Memristive Mixed-Signal Neuromorphic Systems: Energy-Efficient Learning at the Circuit-Level
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chakma, Gangotree; Adnan, Md Musabbir; Wyer, Austin R.
Neuromorphic computing is non-von Neumann computer architecture for the post Moore’s law era of computing. Since a main focus of the post Moore’s law era is energy-efficient computing with fewer resources and less area, neuromorphic computing contributes effectively in this research. Here in this paper, we present a memristive neuromorphic system for improved power and area efficiency. Our particular mixed-signal approach implements neural networks with spiking events in a synchronous way. Moreover, the use of nano-scale memristive devices saves both area and power in the system. We also provide device-level considerations that make the system more energy-efficient. The proposed systemmore » additionally includes synchronous digital long term plasticity, an online learning methodology that helps the system train the neural networks during the operation phase and improves the efficiency in learning considering the power consumption and area overhead.« less
Computer graphics for management: An abstract of capabilities and applications of the EIS system
NASA Technical Reports Server (NTRS)
Solem, B. J.
1975-01-01
The Executive Information Services (EIS) system, developed as a computer-based, time-sharing tool for making and implementing management decisions, and including computer graphics capabilities, was described. The following resources are available through the EIS languages: centralized corporate/gov't data base, customized and working data bases, report writing, general computational capability, specialized routines, modeling/programming capability, and graphics. Nearly all EIS graphs can be created by a single, on-line instruction. A large number of options are available, such as selection of graphic form, line control, shading, placement on the page, multiple images on a page, control of scaling and labeling, plotting of cum data sets, optical grid lines, and stack charts. The following are examples of areas in which the EIS system may be used: research, estimating services, planning, budgeting, and performance measurement, national computer hook-up negotiations.
NASA Astrophysics Data System (ADS)
Falkner, Katrina; Vivian, Rebecca
2015-10-01
To support teachers to implement Computer Science curricula into classrooms from the very first year of school, teachers, schools and organisations seek quality curriculum resources to support implementation and teacher professional development. Until now, many Computer Science resources and outreach initiatives have targeted K-12 school-age children, with the intention to engage children and increase interest, rather than to formally teach concepts and skills. What is the educational quality of existing Computer Science resources and to what extent are they suitable for classroom learning and teaching? In this paper, an assessment framework is presented to evaluate the quality of online Computer Science resources. Further, a semi-systematic review of available online Computer Science resources was conducted to evaluate resources available for classroom learning and teaching and to identify gaps in resource availability, using the Australian curriculum as a case study analysis. The findings reveal a predominance of quality resources, however, a number of critical gaps were identified. This paper provides recommendations and guidance for the development of new and supplementary resources and future research.
Map visualization of groundwater withdrawals at the sub-basin scale
NASA Astrophysics Data System (ADS)
Goode, Daniel J.
2016-06-01
A simple method is proposed to visualize the magnitude of groundwater withdrawals from wells relative to user-defined water-resource metrics. The map is solely an illustration of the withdrawal magnitudes, spatially centered on wells—it is not capture zones or source areas contributing recharge to wells. Common practice is to scale the size (area) of withdrawal well symbols proportional to pumping rate. Symbols are drawn large enough to be visible, but not so large that they overlap excessively. In contrast to such graphics-based symbol sizes, the proposed method uses a depth-rate index (length per time) to visualize the well withdrawal rates by volumetrically consistent areas, called "footprints". The area of each individual well's footprint is the withdrawal rate divided by the depth-rate index. For example, the groundwater recharge rate could be used as a depth-rate index to show how large withdrawals are relative to that recharge. To account for the interference of nearby wells, composite footprints are computed by iterative nearest-neighbor distribution of excess withdrawals on a computational and display grid having uniform square cells. The map shows circular footprints at individual isolated wells and merged footprint areas where wells' individual footprints overlap. Examples are presented for depth-rate indexes corresponding to recharge, to spatially variable stream baseflow (normalized by basin area), and to the average rate of water-table decline (scaled by specific yield). These depth-rate indexes are water-resource metrics, and the footprints visualize the magnitude of withdrawals relative to these metrics.
Map visualization of groundwater withdrawals at the sub-basin scale
Goode, Daniel J.
2016-01-01
A simple method is proposed to visualize the magnitude of groundwater withdrawals from wells relative to user-defined water-resource metrics. The map is solely an illustration of the withdrawal magnitudes, spatially centered on wells—it is not capture zones or source areas contributing recharge to wells. Common practice is to scale the size (area) of withdrawal well symbols proportional to pumping rate. Symbols are drawn large enough to be visible, but not so large that they overlap excessively. In contrast to such graphics-based symbol sizes, the proposed method uses a depth-rate index (length per time) to visualize the well withdrawal rates by volumetrically consistent areas, called “footprints”. The area of each individual well’s footprint is the withdrawal rate divided by the depth-rate index. For example, the groundwater recharge rate could be used as a depth-rate index to show how large withdrawals are relative to that recharge. To account for the interference of nearby wells, composite footprints are computed by iterative nearest-neighbor distribution of excess withdrawals on a computational and display grid having uniform square cells. The map shows circular footprints at individual isolated wells and merged footprint areas where wells’ individual footprints overlap. Examples are presented for depth-rate indexes corresponding to recharge, to spatially variable stream baseflow (normalized by basin area), and to the average rate of water-table decline (scaled by specific yield). These depth-rate indexes are water-resource metrics, and the footprints visualize the magnitude of withdrawals relative to these metrics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Childers, J. T.; Uram, T. D.; LeCompte, T. J.
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the World- wide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application andmore » the performance that was achieved.« less
Childers, J. T.; Uram, T. D.; LeCompte, T. J.; ...
2016-09-29
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application andmore » the performance that was achieved.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Childers, J. T.; Uram, T. D.; LeCompte, T. J.
As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application andmore » the performance that was achieved.« less
Computing Properties of Hadrons, Nuclei and Nuclear Matter from Quantum Chromodynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Savage, Martin J.
This project was part of a coordinated software development effort which the nuclear physics lattice QCD community pursues in order to ensure that lattice calculations can make optimal use of present, and forthcoming leadership-class and dedicated hardware, including those of the national laboratories, and prepares for the exploitation of future computational resources in the exascale era. The UW team improved and extended software libraries used in lattice QCD calculations related to multi-nucleon systems, enhanced production running codes related to load balancing multi-nucleon production on large-scale computing platforms, and developed SQLite (addressable database) interfaces to efficiently archive and analyze multi-nucleon datamore » and developed a Mathematica interface for the SQLite databases.« less
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects are far beyond the capacity of typical computing resources available with most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
Jung, Yousung; Shao, Yihan; Head-Gordon, Martin
2007-09-01
The scaled opposite spin Møller-Plesset method (SOS-MP2) is an economical way of obtaining correlation energies that are computationally cheaper, and yet, in a statistical sense, of higher quality than standard MP2 theory, by introducing one empirical parameter. But SOS-MP2 still has a fourth-order scaling step that makes the method inapplicable to very large molecular systems. We reduce the scaling of SOS-MP2 by exploiting the sparsity of expansion coefficients and local integral matrices, by performing local auxiliary basis expansions for the occupied-virtual product distributions. To exploit sparsity of 3-index local quantities, we use a blocking scheme in which entire zero-rows and columns, for a given third global index, are deleted by comparison against a numerical threshold. This approach minimizes sparse matrix book-keeping overhead, and also provides sufficiently large submatrices after blocking, to allow efficient matrix-matrix multiplies. The resulting algorithm is formally cubic scaling, and requires only moderate computational resources (quadratic memory and disk space) and, in favorable cases, is shown to yield effective quadratic scaling behavior in the size regime we can apply it to. Errors associated with local fitting using the attenuated Coulomb metric and numerical thresholds in the blocking procedure are found to be insignificant in terms of the predicted relative energies. A diverse set of test calculations shows that the size of system where significant computational savings can be achieved depends strongly on the dimensionality of the system, and the extent of localizability of the molecular orbitals. Copyright 2007 Wiley Periodicals, Inc.
Solving global shallow water equations on heterogeneous supercomputers
Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen
2017-01-01
The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for the model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphic Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Arrays) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using a customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement on performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, and sustained double-precision performances of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation on the programming paradigm of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, to form a picture about both the potential performance benefits and the programming efforts involved. PMID:28282428
Dawn Usage, Scheduling, and Governance Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Louis, S
2009-11-02
This document describes Dawn use, scheduling, and governance concerns. Users started running full-machine science runs in early April 2009 during the initial open shakedown period. Scheduling Dawn while in the Open Computing Facility (OCF) was controlled and coordinated via phone calls, emails, and a small number of controlled banks. With Dawn moving to the Secure Computing Facility (SCF) in fall of 2009, a more detailed scheduling and governance model is required. The three major objectives are: (1) Ensure Dawn resources are allocated on a program priority-driven basis; (2) Utilize Dawn resources on the job mixes for which they were intended;more » and (3) Minimize idle cycles through use of partitions, banks and proper job mix. The SCF workload for Dawn will be inherently different than Purple or BG/L, and therefore needs a different approach. Dawn's primary function is to permit adequate access for tri-lab code development in preparation for Sequoia, and in particular for weapons multi-physics codes in support of UQ. A second purpose is to provide time allocations for large-scale science runs and for UQ suite calculations to advance SSP program priorities. This proposed governance model will be the basis for initial time allocation of Dawn computing resources for the science and UQ workloads that merit priority on this class of resource, either because they cannot be reasonably attempted on any other resources due to size of problem, or because of the unavailability of sizable allocations on other ASC capability or capacity platforms. This proposed model intends to make the most effective use of Dawn as possible, but without being overly constrained by more formal proposal processes such as those now used for Purple CCCs.« less
What if we took a global look?
NASA Astrophysics Data System (ADS)
Ouellet Dallaire, C.; Lehner, B.
2014-12-01
Freshwater resources are facing unprecedented pressures. In hope to cope with this, Environmental Hydrology, Freshwater Biology, and Fluvial Geomorphology have defined conceptual approaches such as "environmental flow requirements", "instream flow requirements" or "normative flow regime" to define appropriate flow regime to maintain a given ecological status. These advances in the fields of freshwater resources management are asking scientists to create bridges across disciplines. Holistic and multi-scales approaches are becoming more and more common in water sciences research. The intrinsic nature of river systems demands these approaches to account for the upstream-downstream link of watersheds. Before recent technological developments, large scale analyses were cumbersome and, often, the necessary data was unavailable. However, new technologies, both for information collection and computing capacity, enable a high resolution look at the global scale. For rivers around the world, this new outlook is facilitated by the hydrologically relevant geo-spatial database HydroSHEDS. This database now offers more than 24 millions of kilometers of rivers, some never mapped before, at the click of a fingertip. Large and, even, global scale assessments can now be used to compare rivers around the world. A river classification framework was developed using HydroSHEDS called GloRiC (Global River Classification). This framework advocates for holistic approach to river systems by using sub-classifications drawn from six disciplines related to river sciences: Hydrology, Physiography and climate, Geomorphology, Chemistry, Biology and Human impact. Each of these disciplines brings complementary information on the rivers that is relevant at different scales. A first version of a global river reach classification was produced at the 500m resolution. Variables used in the classification have influence on processes involved at different scales (ex. topography index vs. pH). However, all variables are computed at the same high spatial resolution. This way, we can have a global look at local phenomenon.
NASA Astrophysics Data System (ADS)
Martin, R.; Orgogozo, L.; Noiriel, C. N.; Guibert, R.; Golfier, F.; Debenest, G.; Quintard, M.
2013-05-01
In the context of biofilm growth in porous media, we developed high performance computing tools to study the impact of biofilms on the fluid transport through pores of a solid matrix. Indeed, biofilms are consortia of micro-organisms that are developing in polymeric extracellular substances that are generally located at a fluid-solid interfaces like pore interfaces in a water-saturated porous medium. Several applications of biofilms in porous media are encountered for instance in bio-remediation methods by allowing the dissolution of organic pollutants. Many theoretical studies have been done on the resulting effective properties of these modified media ([1],[2], [3]) but the bio-colonized porous media under consideration are mainly described following simplified theoretical media (stratified media, cubic networks of spheres ...). Therefore, recent experimental advances have provided tomography images of bio-colonized porous media which allow us to observe realistic biofilm micro-structures inside the porous media [4]. To solve closure system of equations related to upscaling procedures in realistic porous media, we solve the velocity field of fluids through pores on complex geometries that are described with a huge number of cells (up to billions). Calculations are made on a realistic 3D sample geometry obtained by X micro-tomography. Cell volumes are coming from a percolation experiment performed to estimate the impact of precipitation processes on the properties of a fluid transport phenomena in porous media [5]. Average permeabilities of the sample are obtained from velocities by using MPI-based high performance computing on up to 1000 processors. Steady state Stokes equations are solved using finite volume approach. Relaxation pre-conditioning is introduced to accelerate the code further. Good weak or strong scaling are reached with results obtained in hours instead of weeks. Factors of accelerations of 20 up to 40 can be reached. Tens of geometries can now be computed by sending batteries of codes in a mass production procedure. Some constraints can now be provided for poro-elastic imaging at the scale of reservoirs, for CO2 storage monitoring or geophysical exploration. 1. Golfier F. et al., Biofilms in porous media: Development of macroscopic transport equations va volume averaging with closure for local mass equilibrium conditions, Advances in Water Resources, 32, 463-485 (2009). 2. Orgogozo L. et al., Upscaling of transport processes in porous media with biofilms in non-equilibrium conditions, Advances in Water Resources, 33(5), 585-600 (2010). 3. Davit Y. et al., Modeling non-equilibrium mass transport in biologically reactive porous media, Advances in Water Resources, 33, 1075-1093, (2010). 4. Davit Y. et al., Imaging biofilm in porous media using X-ray computed micro-tomography, Journal of Microscopy, 242(1), 15-25 (2010). 5. Noiriel C. et al., Upscaling calcium carbonate precipitation rates from pore to continuum scale, Chemical Geology, 318-319, 60-74 (2012).
Leveraging the national cyberinfrastructure for biomedical research.
LeDuc, Richard; Vaughn, Matthew; Fonner, John M; Sullivan, Michael; Williams, James G; Blood, Philip D; Taylor, James; Barnett, William
2014-01-01
In the USA, the national cyberinfrastructure refers to a system of research supercomputer and other IT facilities and the high speed networks that connect them. These resources have been heavily leveraged by scientists in disciplines such as high energy physics, astronomy, and climatology, but until recently they have been little used by biomedical researchers. We suggest that many of the 'Big Data' challenges facing the medical informatics community can be efficiently handled using national-scale cyberinfrastructure. Resources such as the Extreme Science and Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for Big Data challenges, but these resources can be difficult to approach. Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics. We provide examples of how this has been done in basic biology as an illustration for the biomedical informatics community.
Leveraging the national cyberinfrastructure for biomedical research
LeDuc, Richard; Vaughn, Matthew; Fonner, John M; Sullivan, Michael; Williams, James G; Blood, Philip D; Taylor, James; Barnett, William
2014-01-01
In the USA, the national cyberinfrastructure refers to a system of research supercomputer and other IT facilities and the high speed networks that connect them. These resources have been heavily leveraged by scientists in disciplines such as high energy physics, astronomy, and climatology, but until recently they have been little used by biomedical researchers. We suggest that many of the ‘Big Data’ challenges facing the medical informatics community can be efficiently handled using national-scale cyberinfrastructure. Resources such as the Extreme Science and Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for Big Data challenges, but these resources can be difficult to approach. Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics. We provide examples of how this has been done in basic biology as an illustration for the biomedical informatics community. PMID:23964072
NASA Astrophysics Data System (ADS)
Darema, F.
2016-12-01
InfoSymbiotics/DDDAS embodies the power of Dynamic Data Driven Applications Systems (DDDAS), a concept whereby an executing application model is dynamically integrated, in a feed-back loop, with the real-time data-acquisition and control components, as well as other data sources of the application system. Advanced capabilities can be created through such new computational approaches in modeling and simulations, and in instrumentation methods, and include: enhancing the accuracy of the application model; speeding-up the computation to allow faster and more comprehensive models of a system, and create decision support systems with the accuracy of full-scale simulations; in addition, the notion of controlling instrumentation processes by the executing application results in more efficient management of application-data and addresses challenges of how to architect and dynamically manage large sets of heterogeneous sensors and controllers, an advance over the static and ad-hoc ways of today - with DDDAS these sets of resources can be managed adaptively and in optimized ways. Large-Scale-Dynamic-Data encompasses the next wave of Big Data, and namely dynamic data arising from ubiquitous sensing and control in engineered, natural, and societal systems, through multitudes of heterogeneous sensors and controllers instrumenting these systems, and where opportunities and challenges at these "large-scales" relate not only to data size but the heterogeneity in data, data collection modalities, fidelities, and timescales, ranging from real-time data to archival data. In tandem with this important dimension of dynamic data, there is an extended view of Big Computing, which includes the collective computing by networked assemblies of multitudes of sensors and controllers, this range from the high-end to the real-time seamlessly integrated and unified, and comprising the Large-Scale-Big-Computing. InfoSymbiotics/DDDAS engenders transformative impact in many application domains, ranging from the nano-scale to the terra-scale and to the extra-terra-scale. The talk will address opportunities for new capabilities together with corresponding research challenges, with illustrative examples from several application areas including environmental sciences, geosciences, and space sciences.
A national-scale authentication infrastructure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Butler, R.; Engert, D.; Foster, I.
2000-12-01
Today, individuals and institutions in science and industry are increasingly forming virtual organizations to pool resources and tackle a common goal. Participants in virtual organizations commonly need to share resources such as data archives, computer cycles, and networks - resources usually available only with restrictions based on the requested resource's nature and the user's identity. Thus, any sharing mechanism must have the ability to authenticate the user's identity and determine if the user is authorized to request the resource. Virtual organizations tend to be fluid, however, so authentication mechanisms must be flexible and lightweight, allowing administrators to quickly establish andmore » change resource-sharing arrangements. However, because virtual organizations complement rather than replace existing institutions, sharing mechanisms cannot change local policies and must allow individual institutions to maintain control over their own resources. Our group has created and deployed an authentication and authorization infrastructure that meets these requirements: the Grid Security Infrastructure. GSI offers secure single sign-ons and preserves site control over access policies and local security. It provides its own versions of common applications, such as FTP and remote login, and a programming interface for creating secure applications.« less
DynaMed Plus®: An Evidence-Based Clinical Reference Resource.
Charbonneau, Deborah H; James, LaTeesa N
2018-01-01
DynaMed Plus ® from EBSCO Health is an evidence-based tool that health professionals can use to inform clinical care. DynaMed Plus content undergoes a review process, and the evidence is synthesized in detailed topic overviews. A unique three-level rating scale is used to assess the quality of available evidence. Topic overviews summarize current evidence and provide recommendations to support health providers at the point-of-care. Additionally, DynaMed Plus content can be accessed via a desktop computer or mobile platforms. Given this, DynaMed Plus can be a time-saving resource for health providers. Overall, DynaMed Plus provides evidence summaries using an easy-to-read bullet format, and the resource incorporates images, clinical calculators, patient handouts, and practice guidelines in one place.
Resource Balancing Control Allocation
NASA Technical Reports Server (NTRS)
Frost, Susan A.; Bodson, Marc
2010-01-01
Next generation aircraft with a large number of actuators will require advanced control allocation methods to compute the actuator commands needed to follow desired trajectories while respecting system constraints. Previously, algorithms were proposed to minimize the l1 or l2 norms of the tracking error and of the control effort. The paper discusses the alternative choice of using the l1 norm for minimization of the tracking error and a normalized l(infinity) norm, or sup norm, for minimization of the control effort. The algorithm computes the norm of the actuator deflections scaled by the actuator limits. Minimization of the control effort then translates into the minimization of the maximum actuator deflection as a percentage of its range of motion. The paper shows how the problem can be solved effectively by converting it into a linear program and solving it using a simplex algorithm. Properties of the algorithm are investigated through examples. In particular, the min-max criterion results in a type of resource balancing, where the resources are the control surfaces and the algorithm balances these resources to achieve the desired command. A study of the sensitivity of the algorithms to the data is presented, which shows that the normalized l(infinity) algorithm has the lowest sensitivity, although high sensitivities are observed whenever the limits of performance are reached.
LEMON - LHC Era Monitoring for Large-Scale Infrastructures
NASA Astrophysics Data System (ADS)
Marian, Babik; Ivan, Fedorko; Nicholas, Hook; Hector, Lansdale Thomas; Daniel, Lenkes; Miroslav, Siket; Denis, Waldron
2011-12-01
At the present time computer centres are facing a massive rise in virtualization and cloud computing as these solutions bring advantages to service providers and consolidate the computer centre resources. However, as a result the monitoring complexity is increasing. Computer centre management requires not only to monitor servers, network equipment and associated software but also to collect additional environment and facilities data (e.g. temperature, power consumption, cooling efficiency, etc.) to have also a good overview of the infrastructure performance. The LHC Era Monitoring (Lemon) system is addressing these requirements for a very large scale infrastructure. The Lemon agent that collects data on every client and forwards the samples to the central measurement repository provides a flexible interface that allows rapid development of new sensors. The system allows also to report on behalf of remote devices such as switches and power supplies. Online and historical data can be visualized via a web-based interface or retrieved via command-line tools. The Lemon Alarm System component can be used for notifying the operator about error situations. In this article, an overview of the Lemon monitoring is provided together with a description of the CERN LEMON production instance. No direct comparison is made with other monitoring tool.
Recent Performance Results of VPIC on Trinity
NASA Astrophysics Data System (ADS)
Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Le, A.; Li, H.; Nam, H.; Pang, X.; Stark, D. J.; Rust, W. N., III; Yin, L.; Albright, B. J.
2017-10-01
Trinity is a new DOE compute resource now in production at Los Alamos National Laboratory. Trinity has several new and unique features including two compute partitions, one with dual socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes, use of on package high bandwidth memory (HBM) for KNL nodes, ability to configure KNL nodes with respect to HBM model and on die network topology in a variety of operational modes at run time, and use of solid state storage via burst buffer technology to reduce time required to perform I/O. An effort is in progress to optimize VPIC on Trinity by taking advantage of these new architectural features. Results of work will be presented on performance of VPIC on Haswell and KNL partitions for single node runs and runs at scale. Results include use of burst buffers at scale to optimize I/O, comparison of strategies for using MPI and threads, performance benefits using HBM and effectiveness of using intrinsics for vectorization. Work performed under auspices of U.S. Dept. of Energy by Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by LANL LDRD program.
From micro-scale 3D simulations to macro-scale model of periodic porous media
NASA Astrophysics Data System (ADS)
Crevacore, Eleonora; Tosco, Tiziana; Marchisio, Daniele; Sethi, Rajandrea; Messina, Francesca
2015-04-01
In environmental engineering, the transport of colloidal suspensions in porous media is studied to understand the fate of potentially harmful nano-particles and to design new remediation technologies. In this perspective, averaging techniques applied to micro-scale numerical simulations are a powerful tool to extrapolate accurate macro-scale models. Choosing two simplified packing configurations of soil grains and starting from a single elementary cell (module), it is possible to take advantage of the periodicity of the structures to reduce the computation costs of full 3D simulations. Steady-state flow simulations for incompressible fluid in laminar regime are implemented. Transport simulations are based on the pore-scale advection-diffusion equation, that can be enriched introducing also the Stokes velocity (to consider the gravity effect) and the interception mechanism. Simulations are carried on a domain composed of several elementary modules, that serve as control volumes in a finite volume method for the macro-scale method. The periodicity of the medium involves the periodicity of the flow field and this will be of great importance during the up-scaling procedure, allowing relevant simplifications. Micro-scale numerical data are treated in order to compute the mean concentration (volume and area averages) and fluxes on each module. The simulation results are used to compare the micro-scale averaged equation to the integral form of the macroscopic one, making a distinction between those terms that could be computed exactly and those for which a closure in needed. Of particular interest it is the investigation of the origin of macro-scale terms such as the dispersion and tortuosity, trying to describe them with micro-scale known quantities. Traditionally, to study the colloidal transport many simplifications are introduced, such those concerning ultra-simplified geometry that usually account for a single collector. Gradual removal of such hypothesis leads to a detailed description of colloidal transport mechanisms. Starting from nearly realistic 3D geometries, the ultimate purpose of this work is that of develop an improved understanding of the fate of colloidal particles through, for example, an accurate description of the deposition efficiency, in order design efficient remediation techniques. G. Boccardo, D.L. Marchisio, R.Sethi, Journal of colloid and interface science, Vol 417C, pp 227-237, 2014 M. Icardi, G. Boccardo, D.L. Marchisio, T. Tosco, R.Sethi, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 2014 S. Torkzaban, S.S. Tazehkand, S.L. Walker, S.A. Bradford, Water resources research, Vol 44, 2008 S.M. Hassanizadeh, Adv in Water Resources, Vol. 2, pp 131-144, 1979 S. Whitaker, AIChE Journal, Vol. 13 No. 3, pp 420-428, May 1967
Provider-Independent Use of the Cloud
NASA Astrophysics Data System (ADS)
Harmer, Terence; Wright, Peter; Cunningham, Christina; Perrott, Ron
Utility computing offers researchers and businesses the potential of significant cost-savings, making it possible for them to match the cost of their computing and storage to their demand for such resources. A utility compute provider enables the purchase of compute infrastructures on-demand; when a user requires computing resources a provider will provision a resource for them and charge them only for their period of use of that resource. There has been a significant growth in the number of cloud computing resource providers and each has a different resource usage model, application process and application programming interface (API)-developing generic multi-resource provider applications is thus difficult and time consuming. We have developed an abstraction layer that provides a single resource usage model, user authentication model and API for compute providers that enables cloud-provider neutral applications to be developed. In this paper we outline the issues in using external resource providers, give examples of using a number of the most popular cloud providers and provide examples of developing provider neutral applications. In addition, we discuss the development of the API to create a generic provisioning model based on a common architecture for cloud computing providers.
R&D100: Lightweight Distributed Metric Service
Gentile, Ann; Brandt, Jim; Tucker, Tom; Showerman, Mike
2018-06-12
On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.
R&D100: Lightweight Distributed Metric Service
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gentile, Ann; Brandt, Jim; Tucker, Tom
2015-11-19
On today's High Performance Computing platforms, the complexity of applications and configurations makes efficient use of resources difficult. The Lightweight Distributed Metric Service (LDMS) is monitoring software developed by Sandia National Laboratories to provide detailed metrics of system performance. LDMS provides collection, transport, and storage of data from extreme-scale systems at fidelities and timescales to provide understanding of application and system performance with no statistically significant impact on application performance.
CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.
2013-12-01
As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PHSA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over 1100 hazard curves. We will report on the performance of this CyberShake study, four times larger than previous studies. Additionally, we will examine the challenges we face applying these workflow techniques to additional open-science HPC systems and discuss whether our workflow solutions continue to provide value to our large-scale PSHA calculations.
An interactive web-based system using cloud for large-scale visual analytics
NASA Astrophysics Data System (ADS)
Kaseb, Ahmed S.; Berry, Everett; Rozolis, Erik; McNulty, Kyle; Bontrager, Seth; Koh, Youngsol; Lu, Yung-Hsiang; Delp, Edward J.
2015-03-01
Network cameras have been growing rapidly in recent years. Thousands of public network cameras provide tremendous amount of visual information about the environment. There is a need to analyze this valuable information for a better understanding of the world around us. This paper presents an interactive web-based system that enables users to execute image analysis and computer vision techniques on a large scale to analyze the data from more than 65,000 worldwide cameras. This paper focuses on how to use both the system's website and Application Programming Interface (API). Given a computer program that analyzes a single frame, the user needs to make only slight changes to the existing program and choose the cameras to analyze. The system handles the heterogeneity of the geographically distributed cameras, e.g. different brands, resolutions. The system allocates and manages Amazon EC2 and Windows Azure cloud resources to meet the analysis requirements.
TethysCluster: A comprehensive approach for harnessing cloud resources for hydrologic modeling
NASA Astrophysics Data System (ADS)
Nelson, J.; Jones, N.; Ames, D. P.
2015-12-01
Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
NASA Astrophysics Data System (ADS)
Duffy, D.; Maxwell, T. P.; Doutriaux, C.; Williams, D. N.; Chaudhary, A.; Ames, S.
2015-12-01
As the size of remote sensing observations and model output data grows, the volume of the data has become overwhelming, even to many scientific experts. As societies are forced to better understand, mitigate, and adapt to climate changes, the combination of Earth observation data and global climate model projects is crucial to not only scientists but to policy makers, downstream applications, and even the public. Scientific progress on understanding climate is critically dependent on the availability of a reliable infrastructure that promotes data access, management, and provenance. The Earth System Grid Federation (ESGF) has created such an environment for the Intergovernmental Panel on Climate Change (IPCC). ESGF provides a federated global cyber infrastructure for data access and management of model outputs generated for the IPCC Assessment Reports (AR). The current generation of the ESGF federated grid allows consumers of the data to find and download data with limited capabilities for server-side processing. Since the amount of data for future AR is expected to grow dramatically, ESGF is working on integrating server-side analytics throughout the federation. The ESGF Compute Working Team (CWT) has created a Web Processing Service (WPS) Application Programming Interface (API) to enable access scalable computational resources. The API is the exposure point to high performance computing resources across the federation. Specifically, the API allows users to execute simple operations, such as maximum, minimum, average, and anomalies, on ESGF data without having to download the data. These operations are executed at the ESGF data node site with access to large amounts of parallel computing capabilities. This presentation will highlight the WPS API, its capabilities, provide implementation details, and discuss future developments.
Programming mRNA decay to modulate synthetic circuit resource allocation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venturelli, Ophelia S.; Tei, Mika; Bauer, Stefan
Synthetic circuits embedded in host cells compete with cellular processes for limited intracellular resources. Here we show how funnelling of cellular resources, after global transcriptome degradation by the sequence-dependent endoribonuclease MazF, to a synthetic circuit can increase production. Target genes are protected from MazF activity by recoding the gene sequence to eliminate recognition sites, while preserving the amino acid sequence. The expression of a protected fluorescent reporter and flux of a high-value metabolite are significantly enhanced using this genome-scale control strategy. Proteomics measurements discover a host factor in need of protection to improve resource redistribution activity. A computational model demonstratesmore » that the MazF mRNA-decay feedback loop enables proportional control of MazF in an optimal operating regime. Transcriptional profiling of MazF-induced cells elucidates the dynamic shifts in transcript abundance and discovers regulatory design elements. Altogether, our results suggest that manipulation of cellular resource allocation is a key control parameter for synthetic circuit design.« less
Programming mRNA decay to modulate synthetic circuit resource allocation
Venturelli, Ophelia S.; Tei, Mika; Bauer, Stefan; ...
2017-04-26
Synthetic circuits embedded in host cells compete with cellular processes for limited intracellular resources. Here we show how funnelling of cellular resources, after global transcriptome degradation by the sequence-dependent endoribonuclease MazF, to a synthetic circuit can increase production. Target genes are protected from MazF activity by recoding the gene sequence to eliminate recognition sites, while preserving the amino acid sequence. The expression of a protected fluorescent reporter and flux of a high-value metabolite are significantly enhanced using this genome-scale control strategy. Proteomics measurements discover a host factor in need of protection to improve resource redistribution activity. A computational model demonstratesmore » that the MazF mRNA-decay feedback loop enables proportional control of MazF in an optimal operating regime. Transcriptional profiling of MazF-induced cells elucidates the dynamic shifts in transcript abundance and discovers regulatory design elements. Altogether, our results suggest that manipulation of cellular resource allocation is a key control parameter for synthetic circuit design.« less
Grid accounting service: state and future development
NASA Astrophysics Data System (ADS)
Levshina, T.; Sehgal, C.; Bockelman, B.; Weitzel, D.; Guru, A.
2014-06-01
During the last decade, large-scale federated distributed infrastructures have been continually developed and expanded. One of the crucial components of a cyber-infrastructure is an accounting service that collects data related to resource utilization and identity of users using resources. The accounting service is important for verifying pledged resource allocation per particular groups and users, providing reports for funding agencies and resource providers, and understanding hardware provisioning requirements. It can also be used for end-to-end troubleshooting as well as billing purposes. In this work we describe Gratia, a federated accounting service jointly developed at Fermilab and Holland Computing Center at University of Nebraska-Lincoln. The Open Science Grid, Fermilab, HCC, and several other institutions have used Gratia in production for several years. The current development activities include expanding Virtual Machines provisioning information, XSEDE allocation usage accounting, and Campus Grids resource utilization. We also identify the direction of future work: improvement and expansion of Cloud accounting, persistent and elastic storage space allocation, and the incorporation of WAN and LAN network metrics.
The performance of low-cost commercial cloud computing as an alternative in computational chemistry.
Thackston, Russell; Fortenberry, Ryan C
2015-05-05
The growth of commercial cloud computing (CCC) as a viable means of computational infrastructure is largely unexplored for the purposes of quantum chemistry. In this work, the PSI4 suite of computational chemistry programs is installed on five different types of Amazon World Services CCC platforms. The performance for a set of electronically excited state single-point energies is compared between these CCC platforms and typical, "in-house" physical machines. Further considerations are made for the number of cores or virtual CPUs (vCPUs, for the CCC platforms), but no considerations are made for full parallelization of the program (even though parallelization of the BLAS library is implemented), complete high-performance computing cluster utilization, or steal time. Even with this most pessimistic view of the computations, CCC resources are shown to be more cost effective for significant numbers of typical quantum chemistry computations. Large numbers of large computations are still best utilized by more traditional means, but smaller-scale research may be more effectively undertaken through CCC services. © 2015 Wiley Periodicals, Inc.
Oak Ridge Institutional Cluster Autotune Test Drive Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jibonananda, Sanyal; New, Joshua Ryan
2014-02-01
The Oak Ridge Institutional Cluster (OIC) provides general purpose computational resources for the ORNL staff to run computation heavy jobs that are larger than desktop applications but do not quite require the scale and power of the Oak Ridge Leadership Computing Facility (OLCF). This report details the efforts made and conclusions derived in performing a short test drive of the cluster resources on Phase 5 of the OIC. EnergyPlus was used in the analysis as a candidate user program and the overall software environment was evaluated against anticipated challenges experienced with resources such as the shared memory-Nautilus (JICS) and Titanmore » (OLCF). The OIC performed within reason and was found to be acceptable in the context of running EnergyPlus simulations. The number of cores per node and the availability of scratch space per node allow non-traditional desktop focused applications to leverage parallel ensemble execution. Although only individual runs of EnergyPlus were executed, the software environment on the OIC appeared suitable to run ensemble simulations with some modifications to the Autotune workflow. From a standpoint of general usability, the system supports common Linux libraries, compilers, standard job scheduling software (Torque/Moab), and the OpenMPI library (the only MPI library) for MPI communications. The file system is a Panasas file system which literature indicates to be an efficient file system.« less
Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.
2014-12-01
The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources.CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations.Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage.We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.
NASA Astrophysics Data System (ADS)
Jang, W.; Engda, T. A.; Neff, J. C.; Herrick, J.
2017-12-01
Many crop models are increasingly used to evaluate crop yields at regional and global scales. However, implementation of these models across large areas using fine-scale grids is limited by computational time requirements. In order to facilitate global gridded crop modeling with various scenarios (i.e., different crop, management schedule, fertilizer, and irrigation) using the Environmental Policy Integrated Climate (EPIC) model, we developed a distributed parallel computing framework in Python. Our local desktop with 14 cores (28 threads) was used to test the distributed parallel computing framework in Iringa, Tanzania which has 406,839 grid cells. High-resolution soil data, SoilGrids (250 x 250 m), and climate data, AgMERRA (0.25 x 0.25 deg) were also used as input data for the gridded EPIC model. The framework includes a master file for parallel computing, input database, input data formatters, EPIC model execution, and output analyzers. Through the master file for parallel computing, the user-defined number of threads of CPU divides the EPIC simulation into jobs. Then, Using EPIC input data formatters, the raw database is formatted for EPIC input data and the formatted data moves into EPIC simulation jobs. Then, 28 EPIC jobs run simultaneously and only interesting results files are parsed and moved into output analyzers. We applied various scenarios with seven different slopes and twenty-four fertilizer ranges. Parallelized input generators create different scenarios as a list for distributed parallel computing. After all simulations are completed, parallelized output analyzers are used to analyze all outputs according to the different scenarios. This saves significant computing time and resources, making it possible to conduct gridded modeling at regional to global scales with high-resolution data. For example, serial processing for the Iringa test case would require 113 hours, while using the framework developed in this study requires only approximately 6 hours, a nearly 95% reduction in computing time.
The Use of Scale-Dependent Precision to Increase Forecast Accuracy in Earth System Modelling
NASA Astrophysics Data System (ADS)
Thornes, Tobias; Duben, Peter; Palmer, Tim
2016-04-01
At the current pace of development, it may be decades before the 'exa-scale' computers needed to resolve individual convective clouds in weather and climate models become available to forecasters, and such machines will incur very high power demands. But the resolution could be improved today by switching to more efficient, 'inexact' hardware with which variables can be represented in 'reduced precision'. Currently, all numbers in our models are represented as double-precision floating points - each requiring 64 bits of memory - to minimise rounding errors, regardless of spatial scale. Yet observational and modelling constraints mean that values of atmospheric variables are inevitably known less precisely on smaller scales, suggesting that this may be a waste of computer resources. More accurate forecasts might therefore be obtained by taking a scale-selective approach whereby the precision of variables is gradually decreased at smaller spatial scales to optimise the overall efficiency of the model. To study the effect of reducing precision to different levels on multiple spatial scales, we here introduce a new model atmosphere developed by extending the Lorenz '96 idealised system to encompass three tiers of variables - which represent large-, medium- and small-scale features - for the first time. In this chaotic but computationally tractable system, the 'true' state can be defined by explicitly resolving all three tiers. The abilities of low resolution (single-tier) double-precision models and similar-cost high resolution (two-tier) models in mixed-precision to produce accurate forecasts of this 'truth' are compared. The high resolution models outperform the low resolution ones even when small-scale variables are resolved in half-precision (16 bits). This suggests that using scale-dependent levels of precision in more complicated real-world Earth System models could allow forecasts to be made at higher resolution and with improved accuracy. If adopted, this new paradigm would represent a revolution in numerical modelling that could be of great benefit to the world.
Instructional computing in space physics moves ahead
NASA Astrophysics Data System (ADS)
Russell, C. T.; Omidi, N.
As the number of spacecraft stationed in the Earth's magnetosphere exponentiates and society becomes more technologically sophisticated and dependent on these spacebased resources, both the importance of space physics and the need to train people in this field will increase.Space physics is a very difficult subject for students to master. Both mechanical and electromagnetic forces are important. The treatment of problems can be very mathematical, and the scale sizes of phenomena are usually such that laboratory studies become impossible, and experimentation, when possible at all, must be carried out in deep space. Fortunately, computers have evolved to the point that they are able to greatly facilitate instruction in space physics.
CloudMan as a platform for tool, data, and analysis distribution.
Afgan, Enis; Chapman, Brad; Taylor, James
2012-11-27
Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
Lightweight Data Systems in the Cloud: Costs, Benefits and Best Practices
NASA Astrophysics Data System (ADS)
Fatland, R.; Arendt, A. A.; Howe, B.; Hess, N. J.; Futrelle, J.
2015-12-01
We present here a simple analysis of both the cost and the benefit of using the cloud in environmental science circa 2016. We present this set of ideas to enable the potential 'cloud adopter' research scientist to explore and understand the tradeoffs in moving some aspect of their compute work to the cloud. We present examples, design patterns and best practices as an evolving body of knowledge that help optimize benefit to the research team. Thematically this generally means not starting from a blank page but rather learning how to find 90% of the solution to a problem pre-built. We will touch on four topics of interest. (1) Existing cloud data resources (NASA, WHOI BCO DMO, etc) and how they can be discovered, used and improved. (2) How to explore, compare and evaluate cost and compute power from many cloud options, particularly in relation to data scale (size/complexity). (3) What are simple / fast 'Lightweight Data System' procedures that take from 20 minutes to one day to implement and that have a clear immediate payoff in environmental data-driven research. Examples include publishing a SQL Share URL at (EarthCube's) CINERGI as a registered data resource and creating executable papers on a cloud-hosted Jupyter instance, particularly iPython notebooks. (4) Translating the computational terminology landscape ('cloud', 'HPC cluster', 'hadoop', 'spark', 'machine learning') into examples from the community of practice to help the geoscientist build or expand their mental map. In the course of this discussion -- which is about resource discovery, adoption and mastery -- we provide direction to online resources in support of these themes.
Temporal variation and scale in movement-based resource selection functions
Hooten, M.B.; Hanks, E.M.; Johnson, D.S.; Alldredge, M.W.
2013-01-01
A common population characteristic of interest in animal ecology studies pertains to the selection of resources. That is, given the resources available to animals, what do they ultimately choose to use? A variety of statistical approaches have been employed to examine this question and each has advantages and disadvantages with respect to the form of available data and the properties of estimators given model assumptions. A wealth of high resolution telemetry data are now being collected to study animal population movement and space use and these data present both challenges and opportunities for statistical inference. We summarize traditional methods for resource selection and then describe several extensions to deal with measurement uncertainty and an explicit movement process that exists in studies involving high-resolution telemetry data. Our approach uses a correlated random walk movement model to obtain temporally varying use and availability distributions that are employed in a weighted distribution context to estimate selection coefficients. The temporally varying coefficients are then weighted by their contribution to selection and combined to provide inference at the population level. The result is an intuitive and accessible statistical procedure that uses readily available software and is computationally feasible for large datasets. These methods are demonstrated using data collected as part of a large-scale mountain lion monitoring study in Colorado, USA.
Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.
2009-01-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Integration of a neuroimaging processing pipeline into a pan-canadian computing grid
NASA Astrophysics Data System (ADS)
Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.
2012-02-01
The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.
Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N
2009-06-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).
Accelerating large-scale protein structure alignments with graphics processing units
2012-01-01
Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132
Economic decision making and the application of nonparametric prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2008-01-01
Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright ?? 2008 Society of Petroleum Engineers.
NASA Astrophysics Data System (ADS)
Gregory, A. E.; Benedict, K. K.; Zhang, S.; Savickas, J.
2017-12-01
Large scale, high severity wildfires in forests have become increasingly prevalent in the western United States due to fire exclusion. Although past work has focused on the immediate consequences of wildfire (ie. runoff magnitude and debris flow), little has been done to understand the post wildfire hydrologic consequences of vegetation regrowth. Furthermore, vegetation is often characterized by static parameterizations within hydrological models. In order to understand the temporal relationship between hydrologic processes and revegetation, we modularized and partially automated the hydrologic modeling process to increase connectivity between remotely sensed data, the Virtual Watershed Platform (a data management resource, called the VWP), input meteorological data, and the Precipitation-Runoff Modeling System (PRMS). This process was used to run simulations in the Valles Caldera of NM, an area impacted by the 2011 Las Conchas Fire, in PRMS before and after the Las Conchas to evaluate hydrologic process changes. The modeling environment addressed some of the existing challenges faced by hydrological modelers. At present, modelers are somewhat limited in their ability to push the boundaries of hydrologic understanding. Specific issues faced by modelers include limited computational resources to model processes at large spatial and temporal scales, data storage capacity and accessibility from the modeling platform, computational and time contraints for experimental modeling, and the skills to integrate modeling software in ways that have not been explored. By taking an interdisciplinary approach, we were able to address some of these challenges by leveraging the skills of hydrologic, data, and computer scientists; and the technical capabilities provided by a combination of on-demand/high-performance computing, distributed data, and cloud services. The hydrologic modeling process was modularized to include options for distributing meteorological data, parameter space experimentation, data format transformation, looping, validation of models and containerization for enabling new analytic scenarios. The user interacts with the modules through Jupyter Notebooks which can be connected to an on-demand computing and HPC environment, and data services built as part of the VWP.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Santanu; Dang, Liem X.
In this paper, we present the first computer simulation of methanol exchange dynamics between the first and second solvation shells around different cations and anions. After water, methanol is the most frequently used solvent for ions. Methanol has different structural and dynamical properties than water, so its ion solvation process is different. To this end, we performed molecular dynamics simulations using polarizable potential models to describe methanol-methanol and ion-methanol interactions. In particular, we computed methanol exchange rates by employing the transition state theory, the Impey-Madden-McDonald method, the reactive flux approach, and the Grote-Hynes theory. We observed that methanol exchange occursmore » at a nanosecond time scale for Na+ and at a picosecond time scale for other ions. We also observed a trend in which, for like charges, the exchange rate is slower for smaller ions because they are more strongly bound to methanol. This work was supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences, and Biosciences. The calculations were carried out using computer resources provided by the Office of Basic Energy Sciences.« less
Daubechies wavelets for linear scaling density functional theory.
Mohr, Stephan; Ratcliff, Laura E; Boulanger, Paul; Genovese, Luigi; Caliste, Damien; Deutsch, Thierry; Goedecker, Stefan
2014-05-28
We demonstrate that Daubechies wavelets can be used to construct a minimal set of optimized localized adaptively contracted basis functions in which the Kohn-Sham orbitals can be represented with an arbitrarily high, controllable precision. Ground state energies and the forces acting on the ions can be calculated in this basis with the same accuracy as if they were calculated directly in a Daubechies wavelets basis, provided that the amplitude of these adaptively contracted basis functions is sufficiently small on the surface of the localization region, which is guaranteed by the optimization procedure described in this work. This approach reduces the computational costs of density functional theory calculations, and can be combined with sparse matrix algebra to obtain linear scaling with respect to the number of electrons in the system. Calculations on systems of 10,000 atoms or more thus become feasible in a systematic basis set with moderate computational resources. Further computational savings can be achieved by exploiting the similarity of the adaptively contracted basis functions for closely related environments, e.g., in geometry optimizations or combined calculations of neutral and charged systems.
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.
Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi
2018-01-01
Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Arrays (FPGAs) devices to provide high performance execution and flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using a special-purpose Processing Elements (PEs) for computing SNNs, and analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor SNN's activity. Our contribution intends to provide a tool that allows to prototype SNNs faster than on CPU/GPU architectures but significantly cheaper than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
Large-Scale NASA Science Applications on the Columbia Supercluster
NASA Technical Reports Server (NTRS)
Brooks, Walter
2005-01-01
Columbia, NASA's newest 61 teraflops supercomputer that became operational late last year, is a highly integrated Altix cluster of 10,240 processors, and was named to honor the crew of the Space Shuttle lost in early 2003. Constructed in just four months, Columbia increased NASA's computing capability ten-fold, and revitalized the Agency's high-end computing efforts. Significant cutting-edge science and engineering simulations in the areas of space and Earth sciences, as well as aeronautics and space operations, are already occurring on this largest operational Linux supercomputer, demonstrating its capacity and capability to accelerate NASA's space exploration vision. The presentation will describe how an integrated environment consisting not only of next-generation systems, but also modeling and simulation, high-speed networking, parallel performance optimization, and advanced data analysis and visualization, is being used to reduce design cycle time, accelerate scientific discovery, conduct parametric analysis of multiple scenarios, and enhance safety during the life cycle of NASA missions. The talk will conclude by discussing how NAS partnered with various NASA centers, other government agencies, computer industry, and academia, to create a national resource in large-scale modeling and simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayo, Jackson R.; Chen, Frank Xiaoxiao; Pebay, Philippe Pierre
2010-06-01
Effective failure prediction and mitigation strategies in high-performance computing systems could provide huge gains in resilience of tightly coupled large-scale scientific codes. These gains would come from prediction-directed process migration and resource servicing, intelligent resource allocation, and checkpointing driven by failure predictors rather than at regular intervals based on nominal mean time to failure. Given probabilistic associations of outlier behavior in hardware-related metrics with eventual failure in hardware, system software, and/or applications, this paper explores approaches for quantifying the effects of prediction and mitigation strategies and demonstrates these using actual production system data. We describe context-relevant methodologies for determining themore » accuracy and cost-benefit of predictors. While many research studies have quantified the expected impact of growing system size, and the associated shortened mean time to failure (MTTF), on application performance in large-scale high-performance computing (HPC) platforms, there has been little if any work to quantify the possible gains from predicting system resource failures with significant but imperfect accuracy. This possibly stems from HPC system complexity and the fact that, to date, no one has established any good predictors of failure in these systems. Our work in the OVIS project aims to discover these predictors via a variety of data collection techniques and statistical analysis methods that yield probabilistic predictions. The question then is, 'How good or useful are these predictions?' We investigate methods for answering this question in a general setting, and illustrate them using a specific failure predictor discovered on a production system at Sandia.« less
A Toolkit for ARB to Integrate Custom Databases and Externally Built Phylogenies
Essinger, Steven D.; Reichenberger, Erin; Morrison, Calvin; ...
2015-01-21
Researchers are perpetually amassing biological sequence data. The computational approaches employed by ecologists for organizing this data (e.g. alignment, phylogeny, etc.) typically scale nonlinearly in execution time with the size of the dataset. This often serves as a bottleneck for processing experimental data since many molecular studies are characterized by massive datasets. To keep up with experimental data demands, ecologists are forced to choose between continually upgrading expensive in-house computer hardware or outsourcing the most demanding computations to the cloud. Outsourcing is attractive since it is the least expensive option, but does not necessarily allow direct user interaction with themore » data for exploratory analysis. Desktop analytical tools such as ARB are indispensable for this purpose, but they do not necessarily offer a convenient solution for the coordination and integration of datasets between local and outsourced destinations. Therefore, researchers are currently left with an undesirable tradeoff between computational throughput and analytical capability. To mitigate this tradeoff we introduce a software package to leverage the utility of the interactive exploratory tools offered by ARB with the computational throughput of cloud-based resources. Our pipeline serves as middleware between the desktop and the cloud allowing researchers to form local custom databases containing sequences and metadata from multiple resources and a method for linking data outsourced for computation back to the local database. Furthermore, a tutorial implementation of the toolkit is provided in the supporting information, S1 Tutorial.« less
A Toolkit for ARB to Integrate Custom Databases and Externally Built Phylogenies
Essinger, Steven D.; Reichenberger, Erin; Morrison, Calvin; Blackwood, Christopher B.; Rosen, Gail L.
2015-01-01
Researchers are perpetually amassing biological sequence data. The computational approaches employed by ecologists for organizing this data (e.g. alignment, phylogeny, etc.) typically scale nonlinearly in execution time with the size of the dataset. This often serves as a bottleneck for processing experimental data since many molecular studies are characterized by massive datasets. To keep up with experimental data demands, ecologists are forced to choose between continually upgrading expensive in-house computer hardware or outsourcing the most demanding computations to the cloud. Outsourcing is attractive since it is the least expensive option, but does not necessarily allow direct user interaction with the data for exploratory analysis. Desktop analytical tools such as ARB are indispensable for this purpose, but they do not necessarily offer a convenient solution for the coordination and integration of datasets between local and outsourced destinations. Therefore, researchers are currently left with an undesirable tradeoff between computational throughput and analytical capability. To mitigate this tradeoff we introduce a software package to leverage the utility of the interactive exploratory tools offered by ARB with the computational throughput of cloud-based resources. Our pipeline serves as middleware between the desktop and the cloud allowing researchers to form local custom databases containing sequences and metadata from multiple resources and a method for linking data outsourced for computation back to the local database. A tutorial implementation of the toolkit is provided in the supporting information, S1 Tutorial. Availability: http://www.ece.drexel.edu/gailr/EESI/tutorial.php. PMID:25607539
Statistics Online Computational Resource for Education
ERIC Educational Resources Information Center
Dinov, Ivo D.; Christou, Nicolas
2009-01-01
The Statistics Online Computational Resource (http://www.SOCR.ucla.edu) provides one of the largest collections of free Internet-based resources for probability and statistics education. SOCR develops, validates and disseminates two core types of materials--instructional resources and computational libraries. (Contains 2 figures.)
An Architecture for Cross-Cloud System Management
NASA Astrophysics Data System (ADS)
Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad
The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in an homogenous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
NASA Astrophysics Data System (ADS)
Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.
2014-12-01
The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
Optimal estimation and scheduling in aquifer management using the rapid feedback control method
NASA Astrophysics Data System (ADS)
Ghorbanidehno, Hojat; Kokkinaki, Amalia; Kitanidis, Peter K.; Darve, Eric
2017-12-01
Management of water resources systems often involves a large number of parameters, as in the case of large, spatially heterogeneous aquifers, and a large number of "noisy" observations, as in the case of pressure observation in wells. Optimizing the operation of such systems requires both searching among many possible solutions and utilizing new information as it becomes available. However, the computational cost of this task increases rapidly with the size of the problem to the extent that textbook optimization methods are practically impossible to apply. In this paper, we present a new computationally efficient technique as a practical alternative for optimally operating large-scale dynamical systems. The proposed method, which we term Rapid Feedback Controller (RFC), provides a practical approach for combined monitoring, parameter estimation, uncertainty quantification, and optimal control for linear and nonlinear systems with a quadratic cost function. For illustration, we consider the case of a weakly nonlinear uncertain dynamical system with a quadratic objective function, specifically a two-dimensional heterogeneous aquifer management problem. To validate our method, we compare our results with the linear quadratic Gaussian (LQG) method, which is the basic approach for feedback control. We show that the computational cost of the RFC scales only linearly with the number of unknowns, a great improvement compared to the basic LQG control with a computational cost that scales quadratically. We demonstrate that the RFC method can obtain the optimal control values at a greatly reduced computational cost compared to the conventional LQG algorithm with small and controllable losses in the accuracy of the state and parameter estimation.
Bao, Shunxing; Weitendorf, Frederick D; Plassard, Andrew J; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A
2017-02-11
The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and non-relevant for medical imaging.
NASA Astrophysics Data System (ADS)
Bao, Shunxing; Weitendorf, Frederick D.; Plassard, Andrew J.; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A.
2017-03-01
The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and nonrelevant for medical imaging.
Quantum Computing: Selected Internet Resources for Librarians, Researchers, and the Casually Curious
ERIC Educational Resources Information Center
Cirasella, Jill
2009-01-01
This article presents an annotated selection of the most important and informative Internet resources for learning about quantum computing, finding quantum computing literature, and tracking quantum computing news. All of the quantum computing resources described in this article are freely available, English-language web sites that fall into one…
NASA Astrophysics Data System (ADS)
O'Malley, D.; Vesselinov, V. V.
2017-12-01
Classical microprocessors have had a dramatic impact on hydrology for decades, due largely to the exponential growth in computing power predicted by Moore's law. However, this growth is not expected to continue indefinitely and has already begun to slow. Quantum computing is an emerging alternative to classical microprocessors. Here, we demonstrated cutting edge inverse model analyses utilizing some of the best available resources in both worlds: high-performance classical computing and a D-Wave quantum annealer. The classical high-performance computing resources are utilized to build an advanced numerical model that assimilates data from O(10^5) observations, including water levels, drawdowns, and contaminant concentrations. The developed model accurately reproduces the hydrologic conditions at a Los Alamos National Laboratory contamination site, and can be leveraged to inform decision-making about site remediation. We demonstrate the use of a D-Wave 2X quantum annealer to solve hydrologic inverse problems. This work can be seen as an early step in quantum-computational hydrology. We compare and contrast our results with an early inverse approach in classical-computational hydrology that is comparable to the approach we use with quantum annealing. Our results show that quantum annealing can be useful for identifying regions of high and low permeability within an aquifer. While the problems we consider are small-scale compared to the problems that can be solved with modern classical computers, they are large compared to the problems that could be solved with early classical CPUs. Further, the binary nature of the high/low permeability problem makes it well-suited to quantum annealing, but challenging for classical computers.
NASA Astrophysics Data System (ADS)
Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock
2017-01-01
The suites of numerical models used for simulating climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations on commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Service (AWS) EC2, the cloud computing environment by Amazon.com, Inc. StarCluster is used to create virtual computing cluster on the AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent for the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation can be efficiently scaled with the number of CPU cores on the AWS EC2 virtual cluster environment up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup becomes nearly unchanged.
NASA Astrophysics Data System (ADS)
Mitchell, T. M.; Backeberg, N. R.; Iacoviello, F.; Rittner, M.; Jones, A. P.; Wheeler, J.; Day, R.; Vermeesch, P.; Shearing, P. R.; Striolo, A.
2017-12-01
The permeability of shales is important, because it controls where oil and gas resources can migrate to and where in the Earth hydrocarbons are ultimately stored. Shales have a well-known anisotropic directional permeability that is inherited from the depositional layering of sedimentary laminations, where the highest permeability is measured parallel to laminations and the lowest permeability is perpendicular to laminations. We combine state of the art laboratory permeability experiments with high-resolution X-ray computed tomography and for the first time can quantify the three-dimensional interconnected pathways through a rock that define the anisotropic behaviour of shales. Experiments record a physical anisotropy in permeability of one to two orders of magnitude. Two- and three-dimensional analyses of micro- and nano-scale X-ray computed tomography illuminate that the directional anisotropy is fundamentally controlled by the bulk rock mineral geometry, which determines the finite length (or tortuosity) of the interconnected pathways through the porous/permeable phases in shales. Understanding the mineral-scale control on permeability will allow for better estimations of the extent of recoverable reserves in shale gas plays globally.
Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min
2015-06-01
The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Ground-water models as a management tool in Florida
Hutchinson, C.B.
1984-01-01
Highly sophisticated computer models provide powerful tools for analyzing historic data and for simulating future water levels, water movement, and water chemistry under stressed conditions throughout the ground-water system in Florida. Models that simulate the movement of heat and subsidence of land in response to aquifer pumping also have potential for application to hydrologic problems in the State. Florida, with 20 ground-water modeling studies reported since 1972, has applied computer modeling techniques to a variety of water-resources problems. Models in Florida generally have been used to provide insight to problems of water supply, contamination, and impact on the environment. The model applications range from site-specific studies, such as estimating contamination by wastewater injection at St. Petersburg, to a regional model of the entire State that may be used to assess broad-scale environmental impact of water-resources development. Recently, groundwater models have been used as management tools by the State regulatory authority to permit or deny development of water resources. As modeling precision, knowledge, and confidence increase, the use of ground-water models will shift more and more toward regulation of development and enforcement of environmental laws. (USGS)
Combining multi-layered bitmap files using network specific hardware
DuBois, David H [Los Alamos, NM; DuBois, Andrew J [Santa Fe, NM; Davenport, Carolyn Connor [Los Alamos, NM
2012-02-28
Images and video can be produced by compositing or alpha blending a group of image layers or video layers. Increasing resolution or the number of layers results in increased computational demands. As such, the available computational resources limit the images and videos that can be produced. A computational architecture in which the image layers are packetized and streamed through processors can be easily scaled so to handle many image layers and high resolutions. The image layers are packetized to produce packet streams. The packets in the streams are received, placed in queues, and processed. For alpha blending, ingress queues receive the packetized image layers which are then z sorted and sent to egress queues. The egress queue packets are alpha blended to produce an output image or video.
Mendel-GPU: haplotyping and genotype imputation on graphics processing units
Chen, Gary K.; Wang, Kai; Stram, Alex H.; Sobel, Eric M.; Lange, Kenneth
2012-01-01
Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22954633
2015-05-01
decisions on the fly in an online retail environment. Tech. rep., Working Paper, Massachusetts Institute of Technology, Boston, MA. Arneson, Broderick , Ryan...Hayward, Philip Henderson. 2009. MoHex wins Hex tournament. International Computer Games Association Journal 32 114–116. Arneson, Broderick , Ryan B...Combina- torial Search. Enzenberger, Markus, Martin Muller, Broderick Arneson, Richard Segal. 2010. Fuego—an open-source framework for board games and
1982-08-01
though the two groups were different in terms of SC!I scientific interests and academic orientation scores (the aviation supply sample scored higher on...51 Chemists/Physicists 50 MARINE OFFICERS- COMUNICATION 49 MARINE OFFICERS-DATA SYSTEMS 48 Engineers 47 Biologists 46 Systems Analysts/Computer...Base ( Scientific and Technical Information Office) Commander, Air Force Human Resources Laboratory, Lowry Air Force Base (Technical Training Branch
Changing paradigms in the management of 2184 patients with traumatic brain injury.
Joseph, Bellal; Haider, Ansab A; Pandit, Viraj; Tang, Andrew; Kulvatunyou, Narong; OʼKeeffe, Terence; Rhee, Peter
2015-09-01
The aim of this study was to assess the change in trends in the management of traumatic brain injury (TBI) at a level I trauma center and the utilization of resources as a result of this change in management. The management of TBI has been evolving with trends toward management of minimally injured patients with intracranial hemorrhage exclusively by trauma surgeons. A 5-year (2009-2014) prospective database on all patients with TBI (skull fracture/intracranial hemorrhage on head computed tomography) presenting to a level I trauma center was analyzed for patient demographics, injuries, admission physiology, computed tomographic scan results, and hospital outcomes. These records were matched to the institutional registry and hospital financial database. A total of 2184 patients were included with median (interquartile range) Glasgow Coma Scale score of 15 (12-15), and median (interquartile range) head-abbreviated injury scale score of 3 (2-4). The distribution of types and size of intracranial bleeds remained unchanged throughout the study period. The proportion of TBI managed exclusively by trauma surgeons increased significantly over the years from 6.8% to 40.1% (P < 0.001). Proportion of patients who received neurosurgical consultations (P < 0.001) and repeat head computed tomographic scans (P < 0.001), hospital length of stay (P = 0.028), and costs (P < 0.001) decreased significantly over time. The overall mortality rate (18.5%) and rate of intervention (14.1%) remained unchanged. TBI patients can be selectively managed without initially involving neurosurgeons safely in a cost-effective manner, resulting in more effective use of precious resources.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Li; He, Ya-Ling; Kang, Qinjun
2013-12-15
A coupled (hybrid) simulation strategy spatially combining the finite volume method (FVM) and the lattice Boltzmann method (LBM), called CFVLBM, is developed to simulate coupled multi-scale multi-physicochemical processes. In the CFVLBM, computational domain of multi-scale problems is divided into two sub-domains, i.e., an open, free fluid region and a region filled with porous materials. The FVM and LBM are used for these two regions, respectively, with information exchanged at the interface between the two sub-domains. A general reconstruction operator (RO) is proposed to derive the distribution functions in the LBM from the corresponding macro scalar, the governing equation of whichmore » obeys the convection–diffusion equation. The CFVLBM and the RO are validated in several typical physicochemical problems and then are applied to simulate complex multi-scale coupled fluid flow, heat transfer, mass transport, and chemical reaction in a wall-coated micro reactor. The maximum ratio of the grid size between the FVM and LBM regions is explored and discussed. -- Highlights: •A coupled simulation strategy for simulating multi-scale phenomena is developed. •Finite volume method and lattice Boltzmann method are coupled. •A reconstruction operator is derived to transfer information at the sub-domains interface. •Coupled multi-scale multiple physicochemical processes in micro reactor are simulated. •Techniques to save computational resources and improve the efficiency are discussed.« less
Advanced Simulation & Computing FY15 Implementation Plan Volume 2, Rev. 0.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCoy, Michel; Archer, Bill; Matzen, M. Keith
2014-09-16
The Stockpile Stewardship Program (SSP) is a single, highly integrated technical program for maintaining the surety and reliability of the U.S. nuclear stockpile. The SSP uses nuclear test data, computational modeling and simulation, and experimental facilities to advance understanding of nuclear weapons. It includes stockpile surveillance, experimental research, development and engineering programs, and an appropriately scaled production capability to support stockpile requirements. This integrated national program requires the continued use of experimental facilities and programs, and the computational enhancements to support these programs. The Advanced Simulation and Computing Program (ASC) is a cornerstone of the SSP, providing simulation capabilities andmore » computational resources that support annual stockpile assessment and certification, study advanced nuclear weapons design and manufacturing processes, analyze accident scenarios and weapons aging, and provide the tools to enable stockpile Life Extension Programs (LEPs) and the resolution of Significant Finding Investigations (SFIs). This requires a balance of resource, including technical staff, hardware, simulation software, and computer science solutions. As the program approaches the end of its second decade, ASC is intently focused on increasing predictive capabilities in a three-dimensional (3D) simulation environment while maintaining support to the SSP. The program continues to improve its unique tools for solving progressively more difficult stockpile problems (sufficient resolution, dimensionality, and scientific details), quantify critical margins and uncertainties, and resolve increasingly difficult analyses needed for the SSP. Where possible, the program also enables the use of high-performance simulation and computing tools to address broader national security needs, such as foreign nuclear weapon assessments and counternuclear terrorism.« less
Large-scale ground motion simulation using GPGPU
NASA Astrophysics Data System (ADS)
Aoi, S.; Maeda, T.; Nishizawa, N.; Aoki, T.
2012-12-01
Huge computation resources are required to perform large-scale ground motion simulations using 3-D finite difference method (FDM) for realistic and complex models with high accuracy. Furthermore, thousands of various simulations are necessary to evaluate the variability of the assessment caused by uncertainty of the assumptions of the source models for future earthquakes. To conquer the problem of restricted computational resources, we introduced the use of GPGPU (General purpose computing on graphics processing units) which is the technique of using a GPU as an accelerator of the computation which has been traditionally conducted by the CPU. We employed the CPU version of GMS (Ground motion Simulator; Aoi et al., 2004) as the original code and implemented the function for GPU calculation using CUDA (Compute Unified Device Architecture). GMS is a total system for seismic wave propagation simulation based on 3-D FDM scheme using discontinuous grids (Aoi&Fujiwara, 1999), which includes the solver as well as the preprocessor tools (parameter generation tool) and postprocessor tools (filter tool, visualization tool, and so on). The computational model is decomposed in two horizontal directions and each decomposed model is allocated to a different GPU. We evaluated the performance of our newly developed GPU version of GMS on the TSUBAME2.0 which is one of the Japanese fastest supercomputer operated by the Tokyo Institute of Technology. First we have performed a strong scaling test using the model with about 22 million grids and achieved 3.2 and 7.3 times of the speed-up by using 4 and 16 GPUs. Next, we have examined a weak scaling test where the model sizes (number of grids) are increased in proportion to the degree of parallelism (number of GPUs). The result showed almost perfect linearity up to the simulation with 22 billion grids using 1024 GPUs where the calculation speed reached to 79.7 TFlops and about 34 times faster than the CPU calculation using the same number of cores. Finally, we applied GPU calculation to the simulation of the 2011 Tohoku-oki earthquake. The model was constructed using a slip model from inversion of strong motion data (Suzuki et al., 2012), and a geological- and geophysical-based velocity structure model comprising all the Tohoku and Kanto regions as well as the large source area, which consists of about 1.9 billion grids. The overall characteristics of observed velocity seismograms for a longer period than range of 8 s were successfully reproduced (Maeda et al., 2012 AGU meeting). The turn around time for 50 thousand-step calculation (which correspond to 416 s in seismograph) using 100 GPUs was 52 minutes which is fairly short, especially considering this is the performance for the realistic and complex model.
Bethel, EW; Bauer, A; Abbasi, H; ...
2016-06-10
The considerable interest in the high performance computing (HPC) community regarding analyzing and visualization data without first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed /visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Third is the ability to use all available resources, CPU’s and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitionersmore » using in situ methods in extreme-scale HPC with the goal to present existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.« less
An immersed boundary method for modeling a dirty geometry data
NASA Astrophysics Data System (ADS)
Onishi, Keiji; Tsubokura, Makoto
2017-11-01
We present a robust, fast, and low preparation cost immersed boundary method (IBM) for simulating an incompressible high Re flow around highly complex geometries. The method is achieved by the dispersion of the momentum by the axial linear projection and the approximate domain assumption satisfying the mass conservation around the wall including cells. This methodology has been verified against an analytical theory and wind tunnel experiment data. Next, we simulate the problem of flow around a rotating object and demonstrate the ability of this methodology to the moving geometry problem. This methodology provides the possibility as a method for obtaining a quick solution at a next large scale supercomputer. This research was supported by MEXT as ``Priority Issue on Post-K computer'' (Development of innovative design and production processes) and used computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science.
Harispe, Sébastien; Ranwez, Sylvie; Janaqi, Stefan; Montmain, Jacky
2014-03-01
The semantic measures library and toolkit are robust open-source and easy to use software solutions dedicated to semantic measures. They can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. Open Biomedical Ontology, Resource Description Framework). The project targets both designers and practitioners of semantic measures providing a JAVA library, as well as a command-line tool that can be used on personal computers or computer clusters. Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org.
Enabling Extreme Scale Earth Science Applications at the Oak Ridge Leadership Computing Facility
NASA Astrophysics Data System (ADS)
Anantharaj, V. G.; Mozdzynski, G.; Hamrud, M.; Deconinck, W.; Smith, L.; Hack, J.
2014-12-01
The Oak Ridge Leadership Facility (OLCF), established at the Oak Ridge National Laboratory (ORNL) under the auspices of the U.S. Department of Energy (DOE), welcomes investigators from universities, government agencies, national laboratories and industry who are prepared to perform breakthrough research across a broad domain of scientific disciplines, including earth and space sciences. Titan, the OLCF flagship system, is currently listed as #2 in the Top500 list of supercomputers in the world, and the largest available for open science. The computational resources are allocated primarily via the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, sponsored by the U.S. DOE Office of Science. In 2014, over 2.25 billion core hours on Titan were awarded via INCITE projects., including 14% of the allocation toward earth sciences. The INCITE competition is also open to research scientists based outside the USA. In fact, international research projects account for 12% of the INCITE awards in 2014. The INCITE scientific review panel also includes 20% participation from international experts. Recent accomplishments in earth sciences at OLCF include the world's first continuous simulation of 21,000 years of earth's climate history (2009); and an unprecedented simulation of a magnitude 8 earthquake over 125 sq. miles. One of the ongoing international projects involves scaling the ECMWF Integrated Forecasting System (IFS) model to over 200K cores of Titan. ECMWF is a partner in the EU funded Collaborative Research into Exascale Systemware, Tools and Applications (CRESTA) project. The significance of the research carried out within this project is the demonstration of techniques required to scale current generation Petascale capable simulation codes towards the performance levels required for running on future Exascale systems. One of the techniques pursued by ECMWF is to use Fortran2008 coarrays to overlap computations and communications and to reduce the total volume of data communicated. Use of Titan has enabled ECMWF to plan future scalability developments and resource requirements. We will also discuss the best practices developed over the years in navigating logistical, legal and regulatory hurdles involved in supporting the facility's diverse user community.
NASA Astrophysics Data System (ADS)
Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.
2015-12-01
The CyberShake computational platform, developed by the Southern California Earthquake Center (SCEC), is an integrated collection of scientific software and middleware that performs 3D physics-based probabilistic seismic hazard analysis (PSHA) for Southern California. CyberShake integrates large-scale and high-throughput research codes to produce probabilistic seismic hazard curves for individual locations of interest and hazard maps for an entire region. A recent CyberShake calculation produced about 500,000 two-component seismograms for each of 336 locations, resulting in over 300 million synthetic seismograms in a Los Angeles-area probabilistic seismic hazard model. CyberShake calculations require a series of scientific software programs. Early computational stages produce data used as inputs by later stages, so we describe CyberShake calculations using a workflow definition language. Scientific workflow tools automate and manage the input and output data and enable remote job execution on large-scale HPC systems. To satisfy the requests of broad impact users of CyberShake data, such as seismologists, utility companies, and building code engineers, we successfully completed CyberShake Study 15.4 in April and May 2015, calculating a 1 Hz urban seismic hazard map for Los Angeles. We distributed the calculation between the NSF Track 1 system NCSA Blue Waters, the DOE Leadership-class system OLCF Titan, and USC's Center for High Performance Computing. This study ran for over 5 weeks, burning about 1.1 million node-hours and producing over half a petabyte of data. The CyberShake Study 15.4 results doubled the maximum simulated seismic frequency from 0.5 Hz to 1.0 Hz as compared to previous studies, representing a factor of 16 increase in computational complexity. We will describe how our workflow tools supported splitting the calculation across multiple systems. We will explain how we modified CyberShake software components, including GPU implementations and migrating from file-based communication to MPI messaging, to greatly reduce the I/O demands and node-hour requirements of CyberShake. We will also present performance metrics from CyberShake Study 15.4, and discuss challenges that producers of Big Data on open-science HPC resources face moving forward.
Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.
Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell
2011-07-26
Biological databases and computational biology tools are provided by research groups around the world, and made accessible on the Web. Combining these resources is a common practice in bioinformatics, but integration of heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has been commonly addressed in a pragmatic way, by tedious and error-prone scripting. Recently however a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. In the last decade the Web Services have spread wide in bioinformatics, and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services, that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data-partitioning lowers resource demands of services and increases their throughput, which in consequence allows to execute in-silico experiments on genome-scale, using standard SOAP Web Services and workflows. As a proof-of-principle we annotated an RNA-seq dataset using a plain BPEL workflow engine.
An Approach to Experimental Design for the Computer Analysis of Complex Phenomenon
NASA Technical Reports Server (NTRS)
Rutherford, Brian
2000-01-01
The ability to make credible system assessments, predictions and design decisions related to engineered systems and other complex phenomenon is key to a successful program for many large-scale investigations in government and industry. Recently, many of these large-scale analyses have turned to computational simulation to provide much of the required information. Addressing specific goals in the computer analysis of these complex phenomenon is often accomplished through the use of performance measures that are based on system response models. The response models are constructed using computer-generated responses together with physical test results where possible. They are often based on probabilistically defined inputs and generally require estimation of a set of response modeling parameters. As a consequence, the performance measures are themselves distributed quantities reflecting these variabilities and uncertainties. Uncertainty in the values of the performance measures leads to uncertainties in predicted performance and can cloud the decisions required of the analysis. A specific goal of this research has been to develop methodology that will reduce this uncertainty in an analysis environment where limited resources and system complexity together restrict the number of simulations that can be performed. An approach has been developed that is based on evaluation of the potential information provided for each "intelligently selected" candidate set of computer runs. Each candidate is evaluated by partitioning the performance measure uncertainty into two components - one component that could be explained through the additional computational simulation runs and a second that would remain uncertain. The portion explained is estimated using a probabilistic evaluation of likely results for the additional computational analyses based on what is currently known about the system. The set of runs indicating the largest potential reduction in uncertainty is then selected and the computational simulations are performed. Examples are provided to demonstrate this approach on small scale problems. These examples give encouraging results. Directions for further research are indicated.
The national coal-resources data system of the U.S. geological survey
Carter, M.D.
1976-01-01
The National Coal Resources Data System (NCRDS) was designed by the U.S. Geological Survey (USGS) to meet the increasing demands for rapid retrieval of information on coal location, quantity, quality, and accessibility. An interactive conversational query system devised by the USGS retrieves information from the data bank through a standard computer terminal. The system is being developed in two phases. Phase I, which currently is available on a limited basis, contains published areal resource and chemical data. The primary objective of this phase is to retrieve, calculate, and tabulate coal-resource data by area on a local, regional, or national scale. Factors available for retrieval include: state, county, quadrangle, township, coal field, coal bed, formation, geologic age, source and reliability of data, and coal-bed rank, thickness, overburden, and tonnage, or any combinations of variables. In addition, the chemical data items include individual values for proximate and ultimate analyses, BTU value, and several other physical and chemical tests. Information will be validated and deleted or updated as needed. Phase II is being developed to store, retrieve, and manipulate basic point source coal data (e.g., field observations, drill-hole logs), including geodetic location; bed thickness; depth of burial; moisture; ash; sulfur; major-, minor-, and trace-element content; heat value; and characteristics of overburden, roof rocks, and floor rocks. The computer system may be used to generate interactively structure-contour or isoline maps of the physical and chemical characteristics of a coal bed or to calculate coal resources. ?? 1976.
NASA Astrophysics Data System (ADS)
Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.
2017-12-01
Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role on SAR data availability and dissemination all over the World. Indeed, the free and open access data policy adopted by the European Copernicus program together with the global coverage acquisition strategy, make the Sentinel constellation as a game changer in the Earth Observation scenario. Being the SAR data become ubiquitous, the technological and scientific challenge is focused on maximizing the exploitation of such huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as the Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in efficient, automatic and systematic way. Such a DInSAR chain ingests Sentinel 1 SLC images and carries out several processing steps, to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform and a thorough analysis of the attained parallel performances has been performed to identify and overcome the major bottlenecks to the scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits. Such an experiment confirms the big advantage of exploiting large computational and storage resources of Cloud Computing platforms for large scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a precious tool in the perspective of developing operational services disposable for the EO scientific community related to hazard monitoring and risk prevention and mitigation.
Adaptive Resource Utilization Prediction System for Infrastructure as a Service Cloud.
Zia Ullah, Qazi; Hassan, Shahzad; Khan, Gul Muhammad
2017-01-01
Infrastructure as a Service (IaaS) cloud provides resources as a service from a pool of compute, network, and storage resources. Cloud providers can manage their resource usage by knowing future usage demand from the current and past usage patterns of resources. Resource usage prediction is of great importance for dynamic scaling of cloud resources to achieve efficiency in terms of cost and energy consumption while keeping quality of service. The purpose of this paper is to present a real-time resource usage prediction system. The system takes real-time utilization of resources and feeds utilization values into several buffers based on the type of resources and time span size. Buffers are read by R language based statistical system. These buffers' data are checked to determine whether their data follows Gaussian distribution or not. In case of following Gaussian distribution, Autoregressive Integrated Moving Average (ARIMA) is applied; otherwise Autoregressive Neural Network (AR-NN) is applied. In ARIMA process, a model is selected based on minimum Akaike Information Criterion (AIC) values. Similarly, in AR-NN process, a network with the lowest Network Information Criterion (NIC) value is selected. We have evaluated our system with real traces of CPU utilization of an IaaS cloud of one hundred and twenty servers.
Adaptive Resource Utilization Prediction System for Infrastructure as a Service Cloud
Hassan, Shahzad; Khan, Gul Muhammad
2017-01-01
Infrastructure as a Service (IaaS) cloud provides resources as a service from a pool of compute, network, and storage resources. Cloud providers can manage their resource usage by knowing future usage demand from the current and past usage patterns of resources. Resource usage prediction is of great importance for dynamic scaling of cloud resources to achieve efficiency in terms of cost and energy consumption while keeping quality of service. The purpose of this paper is to present a real-time resource usage prediction system. The system takes real-time utilization of resources and feeds utilization values into several buffers based on the type of resources and time span size. Buffers are read by R language based statistical system. These buffers' data are checked to determine whether their data follows Gaussian distribution or not. In case of following Gaussian distribution, Autoregressive Integrated Moving Average (ARIMA) is applied; otherwise Autoregressive Neural Network (AR-NN) is applied. In ARIMA process, a model is selected based on minimum Akaike Information Criterion (AIC) values. Similarly, in AR-NN process, a network with the lowest Network Information Criterion (NIC) value is selected. We have evaluated our system with real traces of CPU utilization of an IaaS cloud of one hundred and twenty servers. PMID:28811819
Duarte, Afonso M. S.; Psomopoulos, Fotis E.; Blanchet, Christophe; Bonvin, Alexandre M. J. J.; Corpas, Manuel; Franc, Alain; Jimenez, Rafael C.; de Lucas, Jesus M.; Nyrönen, Tommi; Sipos, Gergely; Suhr, Stephanie B.
2015-01-01
With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of key the enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community. PMID:26157454
Duarte, Afonso M S; Psomopoulos, Fotis E; Blanchet, Christophe; Bonvin, Alexandre M J J; Corpas, Manuel; Franc, Alain; Jimenez, Rafael C; de Lucas, Jesus M; Nyrönen, Tommi; Sipos, Gergely; Suhr, Stephanie B
2015-01-01
With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of key the enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.
NASA Astrophysics Data System (ADS)
Rossi, V.; Dubois, M.; Ser-Giacomi, E.; Monroy, P.; Lopez, C.; Hernandez-Garcia, E.
2016-02-01
Assessing the spatial structure and dynamics of marine populations is still a major challenge for ecologists. The necessity to manage marine resources from a large-scale perspective and considering the whole ecosystem is now recognized but the absence of appropriate tools to address these objectives limits the implementation of globally pertinent conservation planning. Inspired from Network Theory, we present a new methodological framework called Lagrangian Flow Network which allows a systematic characterization of multi-scale dispersal and connectivity of early life history stages of marine organisms. The network is constructed by subdividing the basin into an ensemble of equal-area subregions which are interconnected through the transport of propagules by ocean currents. The present version allows the identification of hydrodynamical provinces and the computation of various connectivity proxies measuring retention and exchange of larvae. Due to our spatial discretization and subsequent network representation, as well as our Lagrangian approach, further methodological improvements are handily accessible. These future developments include a parametrization of habitat patchiness, the implementation of realistic larval traits and the consideration of abiotic variables (e.g. temperature, salinity, planktonic resources...) and their effects on larval production and survival. While the model is potentially tunable to any species whose biological traits and ecological preferences are precisely known, it can also be used in a more generic configuration by efficient computing and analysis of a large number of experiments with relevant ecological parameters. It permits a better characterization of population connectivity at multiple scales and it informs its ecological and managerial interpretations.
High-resolution LES of the rotating stall in a reduced scale model pump-turbine
NASA Astrophysics Data System (ADS)
Pacot, Olivier; Kato, Chisachi; Avellan, François
2014-03-01
Extending the operating range of modern pump-turbines becomes increasingly important in the course of the integration of renewable energy sources in the existing power grid. However, at partial load condition in pumping mode, the occurrence of rotating stall is critical to the operational safety of the machine and on the grid stability. The understanding of the mechanisms behind this flow phenomenon yet remains vague and incomplete. Past numerical simulations using a RANS approach often led to inconclusive results concerning the physical background. For the first time, the rotating stall is investigated by performing a large scale LES calculation on the HYDRODYNA pump-turbine scale model featuring approximately 100 million elements. The computations were performed on the PRIMEHPC FX10 of the University of Tokyo using the overset Finite Element open source code FrontFlow/blue with the dynamic Smagorinsky turbulence model and the no-slip wall condition. The internal flow computed is the one when operating the pump-turbine at 76% of the best efficiency point in pumping mode, as previous experimental research showed the presence of four rotating cells. The rotating stall phenomenon is accurately reproduced for a reduced Reynolds number using the LES approach with acceptable computing resources. The results show an excellent agreement with available experimental data from the reduced scale model testing at the EPFL Laboratory for Hydraulic Machines. The number of stall cells as well as the propagation speed corroborates the experiment.
Understanding the Performance and Potential of Cloud Computing for Scientific Applications
Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin; ...
2015-02-19
In this paper, commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources, however not all scientists have access to sufficient high-end computing systems, may of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in context tomore » price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evlauation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.« less
Understanding the Performance and Potential of Cloud Computing for Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin
In this paper, commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources, however not all scientists have access to sufficient high-end computing systems, may of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in context tomore » price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evlauation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.« less
Optimizing R with SparkR on a commodity cluster for biomedical research.
Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan
2016-12-01
Medical researchers are challenged today by the enormous amount of data collected in healthcare. Analysis methods such as genome-wide association studies (GWAS) are often computationally intensive and thus require enormous resources to be performed in a reasonable amount of time. While dedicated clusters and public clouds may deliver the desired performance, their use requires upfront financial efforts or anonymous data, which is often not possible for preliminary or occasional tasks. We explored the possibilities to build a private, flexible cluster for processing scripts in R based on commodity, non-dedicated hardware of our department. For this, a GWAS-calculation in R on a single desktop computer, a Message Passing Interface (MPI)-cluster, and a SparkR-cluster were compared with regards to the performance, scalability, quality, and simplicity. The original script had a projected runtime of three years on a single desktop computer. Optimizing the script in R already yielded a significant reduction in computing time (2 weeks). By using R-MPI and SparkR, we were able to parallelize the computation and reduce the time to less than three hours (2.6 h) on already available, standard office computers. While MPI is a proven approach in high-performance clusters, it requires rather static, dedicated nodes. SparkR and its Hadoop siblings allow for a dynamic, elastic environment with automated failure handling. SparkR also scales better with the number of nodes in the cluster than MPI due to optimized data communication. R is a popular environment for clinical data analysis. The new SparkR solution offers elastic resources and allows supporting big data analysis using R even on non-dedicated resources with minimal change to the original code. To unleash the full potential, additional efforts should be invested to customize and improve the algorithms, especially with regards to data distribution. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Overcoming complexities for consistent, continental-scale flood mapping
NASA Astrophysics Data System (ADS)
Smith, Helen; Zaidman, Maxine; Davison, Charlotte
2013-04-01
The EU Floods Directive requires all member states to produce flood hazard maps by 2013. Although flood mapping practices are well developed in Europe, there are huge variations in the scale and resolution of the maps between individual countries. Since extreme flood events are rarely confined to a single country, this is problematic, particularly for the re/insurance industry whose exposures often extend beyond country boundaries. Here, we discuss the challenges of large-scale hydrological and hydraulic modelling, using our experience of developing a 12-country model and set of maps, to illustrate how consistent, high-resolution river flood maps across Europe can be produced. The main challenges addressed include: data acquisition; manipulating the vast quantities of high-resolution data; and computational resources. Our starting point was to develop robust flood-frequency models that are suitable for estimating peak flows for a range of design flood return periods. We used the index flood approach, based on a statistical analysis of historic river flow data pooled on the basis of catchment characteristics. Historical flow data were therefore sourced for each country and collated into a large pan-European database. After a lengthy validation these data were collated into 21 separate analysis zones or regions, grouping smaller river basins according to their physical and climatic characteristics. The very large continental scale basins were each modelled separately on account of their size (e.g. Danube, Elbe, Drava and Rhine). Our methodology allows the design flood hydrograph to be predicted at any point on the river network for a range of return periods. Using JFlow+, JBA's proprietary 2D hydraulic hydrodynamic model, the calculated out-of-bank flows for all watercourses with an upstream drainage area exceeding 50km2 were routed across two different Digital Terrain Models in order to map the extent and depth of floodplain inundation. This generated modelling for a total river length of approximately 250,000km. Such a large-scale, high-resolution modelling exercise is extremely demanding on computational resources and would have been unfeasible without the use of Graphics Processing Units on a network of standard specification gaming computers. Our GPU grid is the world's largest flood-dedicated computer grid. The European river basins were split out into approximately 100 separate hydraulic models and managed individually, although care was taken to ensure flow continuity was maintained between models. The flood hazard maps from the modelling were pieced together using GIS techniques, to provide flood depth and extent information across Europe to a consistent scale and standard. After discussing the methodological challenges, we shall present our flood hazard maps and, from extensive validation work, compare these against historical flow records and observed flood extents.
Large-Scale Simulations of Plastic Neural Networks on Neuromorphic Hardware
Knight, James C.; Tully, Philip J.; Kaplan, Bernhard A.; Lansner, Anders; Furber, Steve B.
2016-01-01
SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning since it requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations, and detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network which we simulate at scales of up to 2.0 × 104 neurons and 5.1 × 107 plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer system and find that, if it is to match the run-time of our SpiNNaker simulation, the super computer system uses approximately 45× more power. This suggests that cheaper, more power efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models. PMID:27092061
Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton
2018-03-13
The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential-a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10 9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
Quantum chemistry simulation on quantum computers: theories and experiments.
Lu, Dawei; Xu, Boruo; Xu, Nanyang; Li, Zhaokai; Chen, Hongwei; Peng, Xinhua; Xu, Ruixue; Du, Jiangfeng
2012-07-14
It has been claimed that quantum computers can mimic quantum systems efficiently in the polynomial scale. Traditionally, those simulations are carried out numerically on classical computers, which are inevitably confronted with the exponential growth of required resources, with the increasing size of quantum systems. Quantum computers avoid this problem, and thus provide a possible solution for large quantum systems. In this paper, we first discuss the ideas of quantum simulation, the background of quantum simulators, their categories, and the development in both theories and experiments. We then present a brief introduction to quantum chemistry evaluated via classical computers followed by typical procedures of quantum simulation towards quantum chemistry. Reviewed are not only theoretical proposals but also proof-of-principle experimental implementations, via a small quantum computer, which include the evaluation of the static molecular eigenenergy and the simulation of chemical reaction dynamics. Although the experimental development is still behind the theory, we give prospects and suggestions for future experiments. We anticipate that in the near future quantum simulation will become a powerful tool for quantum chemistry over classical computations.
Rich client data exploration and research prototyping for NOAA
NASA Astrophysics Data System (ADS)
Grossberg, Michael; Gladkova, Irina; Guch, Ingrid; Alabi, Paul; Shahriar, Fazlul; Bonev, George; Aizenman, Hannah
2009-08-01
Data from satellites and model simulations is increasing exponentially as observations and model computing power improve rapidly. Not only is technology producing more data, but it often comes from sources all over the world. Researchers and scientists who must collaborate are also located globally. This work presents a software design and technologies which will make it possible for groups of researchers to explore large data sets visually together without the need to download these data sets locally. The design will also make it possible to exploit high performance computing remotely and transparently to analyze and explore large data sets. Computer power, high quality sensing, and data storage capacity have improved at a rate that outstrips our ability to develop software applications that exploit these resources. It is impractical for NOAA scientists to download all of the satellite and model data that may be relevant to a given problem and the computing environments available to a given researcher range from supercomputers to only a web browser. The size and volume of satellite and model data are increasing exponentially. There are at least 50 multisensor satellite platforms collecting Earth science data. On the ground and in the sea there are sensor networks, as well as networks of ground based radar stations, producing a rich real-time stream of data. This new wealth of data would have limited use were it not for the arrival of large-scale high-performance computation provided by parallel computers, clusters, grids, and clouds. With these computational resources and vast archives available, it is now possible to analyze subtle relationships which are global, multi-modal and cut across many data sources. Researchers, educators, and even the general public, need tools to access, discover, and use vast data center archives and high performance computing through a simple yet flexible interface.
Multi-time Scale Coordination of Distributed Energy Resources in Isolated Power Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayhorn, Ebony; Xie, Le; Butler-Purry, Karen
2016-03-31
In isolated power systems, including microgrids, distributed assets, such as renewable energy resources (e.g. wind, solar) and energy storage, can be actively coordinated to reduce dependency on fossil fuel generation. The key challenge of such coordination arises from significant uncertainty and variability occurring at small time scales associated with increased penetration of renewables. Specifically, the problem is with ensuring economic and efficient utilization of DERs, while also meeting operational objectives such as adequate frequency performance. One possible solution is to reduce the time step at which tertiary controls are implemented and to ensure feedback and look-ahead capability are incorporated tomore » handle variability and uncertainty. However, reducing the time step of tertiary controls necessitates investigating time-scale coupling with primary controls so as not to exacerbate system stability issues. In this paper, an optimal coordination (OC) strategy, which considers multiple time-scales, is proposed for isolated microgrid systems with a mix of DERs. This coordination strategy is based on an online moving horizon optimization approach. The effectiveness of the strategy was evaluated in terms of economics, technical performance, and computation time by varying key parameters that significantly impact performance. The illustrative example with realistic scenarios on a simulated isolated microgrid test system suggests that the proposed approach is generalizable towards designing multi-time scale optimal coordination strategies for isolated power systems.« less
Moving contact lines on vibrating surfaces
NASA Astrophysics Data System (ADS)
Solomenko, Zlatko; Spelt, Peter; Scott, Julian
2017-11-01
Large-scale simulations of flows with moving contact lines for realistic conditions generally requires a subgrid scale model (analyses based on matched asymptotics) to account for the unresolved part of the flow, given the large range of length scales involved near contact lines. Existing models for the interface shape in the contact-line region are primarily for steady flows on homogeneous substrates, with encouraging results in 3D simulations. Introduction of complexities would require further investigation of the contact-line region, however. Here we study flows with moving contact lines on planar substrates subject to vibrations, with applications in controlling wetting/dewetting. The challenge here is to determine the change in interface shape near contact lines due to vibrations. To develop further insight, 2D direct numerical simulations (wherein the flow is resolved down to an imposed slip length) have been performed to enable comparison with asymptotic theory, which is also developed further. Perspectives will also be presented on the final objective of the work, which is to develop a subgrid scale model that can be utilized in large-scale simulations. The authors gratefully acknowledge the ANR for financial support (ANR-15-CE08-0031) and the meso-centre FLMSN for use of computational resources. This work was Granted access to the HPC resources of CINES under the allocation A0012B06893 made by GENCI.
NASA Astrophysics Data System (ADS)
Cowles, G. W.; Hakim, A.; Churchill, J. H.
2016-02-01
Tidal in-stream energy conversion (TISEC) facilities provide a highly predictable and dependable source of energy. Given the economic and social incentives to migrate towards renewable energy sources there has been tremendous interest in the technology. Key challenges to the design process stem from the wide range of problem scales extending from device to array. In the present approach we apply a multi-model approach to bridge the scales of interest and select optimal device geometries to estimate the technical resource for several realistic sites in the coastal waters of Massachusetts, USA. The approach links two computational models. To establish flow conditions at site scales ( 10m), a barotropic setup of the unstructured grid ocean model FVCOM is employed. The model is validated using shipboard and fixed ADCP as well as pressure data. For device scale, the structured multiblock flow solver SUmb is selected. A large ensemble of simulations of 2D cross-flow tidal turbines is used to construct a surrogate design model. The surrogate model is then queried using velocity profiles extracted from the tidal model to determine the optimal geometry for the conditions at each site. After device selection, the annual technical yield of the array is evaluated with FVCOM using a linear momentum actuator disk approach to model the turbines. Results for several key Massachusetts sites including comparison with theoretical approaches will be presented.
Schmidt, Susanne I; Cuthbert, Mark O; Schwientek, Marc
2017-08-15
Micro scale processes are expected to have a fundamental role in shaping groundwater ecosystems and yet they remain poorly understood and under-researched. In part, this is due to the fact that sampling is rarely carried out at the scale at which microorganisms, and their grazers and predators, function and thus we lack essential information. While set within a larger scale framework in terms of geochemical features, supply with energy and nutrients, and exchange intensity and dynamics, the micro scale adds variability, by providing heterogeneous zones at the micro scale which enable a wider range of redox reactions. Here we outline how understanding micro scale processes better may lead to improved appreciation of the range of ecosystems functions taking place at all scales. Such processes are relied upon in bioremediation and we demonstrate that ecosystem modelling as well as engineering measures have to take into account, and use, understanding at the micro scale. We discuss the importance of integrating faunal processes and computational appraisals in research, in order to continue to secure sustainable water resources from groundwater. Copyright © 2017 Elsevier B.V. All rights reserved.
Resource Aware Intelligent Network Services (RAINS) Final Technical Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lehman, Tom; Yang, Xi
The Resource Aware Intelligent Network Services (RAINS) project conducted research and developed technologies in the area of cyber infrastructure resource modeling and computation. The goal of this work was to provide a foundation to enable intelligent, software defined services which spanned the network AND the resources which connect to the network. A Multi-Resource Service Plane (MRSP) was defined, which allows resource owners/managers to locate and place themselves from a topology and service availability perspective within the dynamic networked cyberinfrastructure ecosystem. The MRSP enables the presentation of integrated topology views and computation results which can include resources across the spectrum ofmore » compute, storage, and networks. The RAINS project developed MSRP includes the following key components: i) Multi-Resource Service (MRS) Ontology/Multi-Resource Markup Language (MRML), ii) Resource Computation Engine (RCE), iii) Modular Driver Framework (to allow integration of a variety of external resources). The MRS/MRML is a general and extensible modeling framework that allows for resource owners to model, or describe, a wide variety of resource types. All resources are described using three categories of elements: Resources, Services, and Relationships between the elements. This modeling framework defines a common method for the transformation of cyber infrastructure resources into data in the form of MRML models. In order to realize this infrastructure datification, the RAINS project developed a model based computation system, i.e. “RAINS Computation Engine (RCE)”. The RCE has the ability to ingest, process, integrate, and compute based on automatically generated MRML models. The RCE interacts with the resources thru system drivers which are specific to the type of external network or resource controller. The RAINS project developed a modular and pluggable driver system which facilities a variety of resource controllers to automatically generate, maintain, and distribute MRML based resource descriptions. Once all of the resource topologies are absorbed by the RCE, a connected graph of the full distributed system topology is constructed, which forms the basis for computation and workflow processing. The RCE includes a Modular Computation Element (MCE) framework which allows for tailoring of the computation process to the specific set of resources under control, and the services desired. The input and output of an MCE are both model data based on MRS/MRML ontology and schema. Some of the RAINS project accomplishments include: Development of general and extensible multi-resource modeling framework; Design of a Resource Computation Engine (RCE) system which includes the following key capabilities; Absorb a variety of multi-resource model types and build integrated models; Novel architecture which uses model based communications across the full stack for all Flexible provision of abstract or intent based user facing interfaces; Workflow processing based on model descriptions; Release of the RCE as an open source software; Deployment of RCE in the University of Maryland/Mid-Atlantic Crossroad ScienceDMZ in prototype mode with a plan under way to transition to production; Deployment at the Argonne National Laboratory DTN Facility in prototype mode; Selection of RCE by the DOE SENSE (SDN for End-to-end Networked Science at the Exascale) project as the basis for their orchestration service.« less
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Optimization of tomographic reconstruction workflows on geographically distributed resources
Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.
2016-01-01
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149
Optimization of tomographic reconstruction workflows on geographically distributed resources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar
New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Karthikeyan, M; Krishnan, S; Pandey, Anil Kumar; Bender, Andreas; Tropsha, Alexander
2008-04-01
We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adopted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit as well as the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its "plug-in architecture". The complete source codes as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
NASA Astrophysics Data System (ADS)
Anantharaj, Valentine; Norman, Matthew; Evans, Katherine; Taylor, Mark; Worley, Patrick; Hack, James; Mayer, Benjamin
2014-05-01
During 2013, high-resolution climate model simulations accounted for over 100 million "core hours" using Titan at the Oak Ridge Leadership Computing Facility (OLCF). The suite of climate modeling experiments, primarily using the Community Earth System Model (CESM) at nearly 0.25 degree horizontal resolution, generated over a petabyte of data and nearly 100,000 files, ranging in sizes from 20 MB to over 100 GB. Effective utilization of leadership class resources requires careful planning and preparation. The application software, such as CESM, need to be ported, optimized and benchmarked for the target platform in order to meet the computational readiness requirements. The model configuration needs to be "tuned and balanced" for the experiments. This can be a complicated and resource intensive process, especially for high-resolution configurations using complex physics. The volume of I/O also increases with resolution; and new strategies may be required to manage I/O especially for large checkpoint and restart files that may require more frequent output for resiliency. It is also essential to monitor the application performance during the course of the simulation exercises. Finally, the large volume of data needs to be analyzed to derive the scientific results; and appropriate data and information delivered to the stakeholders. Titan is currently the largest supercomputer available for open science. The computational resources, in terms of "titan core hours" are allocated primarily via the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) and ASCR Leadership Computing Challenge (ALCC) programs, both sponsored by the U.S. Department of Energy (DOE) Office of Science. Titan is a Cray XK7 system, capable of a theoretical peak performance of over 27 PFlop/s, consists of 18,688 compute nodes, with a NVIDIA Kepler K20 GPU and a 16-core AMD Opteron CPU in every node, for a total of 299,008 Opteron cores and 18,688 GPUs offering a cumulative 560,640 equivalent cores. Scientific applications, such as CESM, are also required to demonstrate a "computational readiness capability" to efficiently scale across and utilize 20% of the entire system. The 0,25 deg configuration of the spectral element dynamical core of the Community Atmosphere Model (CAM-SE), the atmospheric component of CESM, has been demonstrated to scale efficiently across more than 5,000 nodes (80,000 CPU cores) on Titan. The tracer transport routines of CAM-SE have also been ported to take advantage of the hybrid many-core architecture of Titan using GPUs [see EGU2014-4233], yielding over 2X speedup when transporting over 100 tracers. The high throughput I/O in CESM, based on the Parallel IO Library (PIO), is being further augmented to support even higher resolutions and enhance resiliency. The application performance of the individual runs are archived in a database and routinely analyzed to identify and rectify performance degradation during the course of the experiments. The various resources available at the OLCF now support a scientific workflow to facilitate high-resolution climate modelling. A high-speed center-wide parallel file system, called ATLAS, capable of 1 TB/s, is available on Titan as well as on the clusters used for analysis (Rhea) and visualization (Lens/EVEREST). Long-term archive is facilitated by the HPSS storage system. The Earth System Grid (ESG), featuring search & discovery, is also used to deliver data. The end-to-end workflow allows OLCF users to efficiently share data and publish results in a timely manner.
Enabling opportunistic resources for CMS Computing Operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hufnagel, Dirk
With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize opportunistic resources resources not owned by, or a priori configured for CMS to meet peak demands. In addition to our dedicated resources we look to add computing resources from non CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are used to enablemore » access and bring the CMS environment into these non CMS resources. Finally, we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.« less
Enabling opportunistic resources for CMS Computing Operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hufnagel, Dick
With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize “opportunistic” resources — resources not owned by, or a priori configured for CMS — to meet peak demands. In addition to our dedicated resources we look to add computing resources from non CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are usedmore » to enable access and bring the CMS environment into these non CMS resources. Here we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.« less
Enabling opportunistic resources for CMS Computing Operations
Hufnagel, Dirk
2015-12-23
With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize opportunistic resources resources not owned by, or a priori configured for CMS to meet peak demands. In addition to our dedicated resources we look to add computing resources from non CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are used to enablemore » access and bring the CMS environment into these non CMS resources. Finally, we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.« less
Performance Analysis, Modeling and Scaling of HPC Applications and Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav
2016-01-13
E cient use of supercomputers at DOE centers is vital for maximizing system throughput, mini- mizing energy costs and enabling science breakthroughs faster. This requires complementary e orts along several directions to optimize the performance of scienti c simulation codes and the under- lying runtimes and software stacks. This in turn requires providing scalable performance analysis tools and modeling techniques that can provide feedback to physicists and computer scientists developing the simulation codes and runtimes respectively. The PAMS project is using time allocations on supercomputers at ALCF, NERSC and OLCF to further the goals described above by performing research alongmore » the following fronts: 1. Scaling Study of HPC applications; 2. Evaluation of Programming Models; 3. Hardening of Performance Tools; 4. Performance Modeling of Irregular Codes; and 5. Statistical Analysis of Historical Performance Data. We are a team of computer and computational scientists funded by both DOE/NNSA and DOE/ ASCR programs such as ECRP, XStack (Traleika Glacier, PIPER), ExaOSR (ARGO), SDMAV II (MONA) and PSAAP II (XPACC). This allocation will enable us to study big data issues when analyzing performance on leadership computing class systems and to assist the HPC community in making the most e ective use of these resources.« less
Using Mosix for Wide-Area Compuational Resources
Maddox, Brian G.
2004-01-01
One of the problems with using traditional Beowulf-type distributed processing clusters is that they require an investment in dedicated computer resources. These resources are usually needed in addition to pre-existing ones such as desktop computers and file servers. Mosix is a series of modifications to the Linux kernel that creates a virtual computer, featuring automatic load balancing by migrating processes from heavily loaded nodes to less used ones. An extension of the Beowulf concept is to run a Mosixenabled Linux kernel on a large number of computer resources in an organization. This configuration would provide a very large amount of computational resources based on pre-existing equipment. The advantage of this method is that it provides much more processing power than a traditional Beowulf cluster without the added costs of dedicating resources.
Economic decision making and the application of nonparametric prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2007-01-01
Sustained increases in energy prices have focused attention on gas resources in low permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are large. Planning and development decisions for extraction of such resources must be area-wide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm the decision to enter such plays depends on reconnaissance level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional scale cost functions. The context of the worked example is the Devonian Antrim shale gas play, Michigan Basin. One finding relates to selection of the resource prediction model to be used with economic models. Models which can best predict aggregate volume over larger areas (many hundreds of sites) may lose granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined by extraneous factors. The paper also shows that when these simple prediction models are used to strategically order drilling prospects, the gain in gas volume over volumes associated with simple random site selection amounts to 15 to 20 percent. It also discusses why the observed benefit of updating predictions from results of new drilling, as opposed to following static predictions, is somewhat smaller. Copyright 2007, Society of Petroleum Engineers.
NASA Astrophysics Data System (ADS)
Strayer, Michael
2007-09-01
Good morning. Welcome to Boston, the home of the Red Sox, Celtics and Bruins, baked beans, tea parties, Robert Parker, and SciDAC 2007. A year ago I stood before you to share the legacy of the first SciDAC program and identify the challenges that we must address on the road to petascale computing—a road E E Cummins described as `. . . never traveled, gladly beyond any experience.' Today, I want to explore the preparations for the rapidly approaching extreme scale (X-scale) generation. These preparations are the first step propelling us along the road of burgeoning scientific discovery enabled by the application of X- scale computing. We look to petascale computing and beyond to open up a world of discovery that cuts across scientific fields and leads us to a greater understanding of not only our world, but our universe. As part of the President's America Competitiveness Initiative, the ASCR Office has been preparing a ten year vision for computing. As part of this planning the LBNL together with ORNL and ANL hosted three town hall meetings on Simulation and Modeling at the Exascale for Energy, Ecological Sustainability and Global Security (E3). The proposed E3 initiative is organized around four programmatic themes: Engaging our top scientists, engineers, computer scientists and applied mathematicians; investing in pioneering large-scale science; developing scalable analysis algorithms, and storage architectures to accelerate discovery; and accelerating the build-out and future development of the DOE open computing facilities. It is clear that we have only just started down the path to extreme scale computing. Plan to attend Thursday's session on the out-briefing and discussion of these meetings. The road to the petascale has been at best rocky. In FY07, the continuing resolution provided 12% less money for Advanced Scientific Computing than either the President, the Senate, or the House. As a consequence, many of you had to absorb a no cost extension for your SciDAC work. I am pleased that the President's FY08 budget restores the funding for SciDAC. Quoting from Advanced Scientific Computing Research description in the House Energy and Water Development Appropriations Bill for FY08, "Perhaps no other area of research at the Department is so critical to sustaining U.S. leadership in science and technology, revolutionizing the way science is done and improving research productivity." As a society we need to revolutionize our approaches to energy, environmental and global security challenges. As we go forward along the road to the X-scale generation, the use of computation will continue to be a critical tool along with theory and experiment in understanding the behavior of the fundamental components of nature as well as for fundamental discovery and exploration of the behavior of complex systems. The foundation to overcome these societal challenges will build from the experiences and knowledge gained as you, members of our SciDAC research teams, work together to attack problems at the tera- and peta- scale. If SciDAC is viewed as an experiment for revolutionizing scientific methodology, then a strategic goal of ASCR program must be to broaden the intellectual base prepared to address the challenges of the new X-scale generation of computing. We must focus our computational science experiences gained over the past five years on the opportunities introduced with extreme scale computing. Our facilities are on a path to provide the resources needed to undertake the first part of our journey. Using the newly upgraded 119 teraflop Cray XT system at the Leadership Computing Facility, SciDAC research teams have in three days performed a 100-year study of the time evolution of the atmospheric CO2 concentration originating from the land surface. The simulation of the El Nino/Southern Oscillation which was part of this study has been characterized as `the most impressive new result in ten years' gained new insight into the behavior of superheated ionic gas in the ITER reactor as a result of an AORSA run on 22,500 processors that achieved over 87 trillion calculations per second (87 teraflops) which is 74% of the system's theoretical peak. Tomorrow, Argonne and IBM will announce that the first IBM Blue Gene/P, a 100 teraflop system, will be shipped to the Argonne Leadership Computing Facility later this fiscal year. By the end of FY2007 ASCR high performance and leadership computing resources will include the 114 teraflop IBM Blue Gene/P; a 102 teraflop Cray XT4 at NERSC and a 119 teraflop Cray XT system at Oak Ridge. Before ringing in the New Year, Oak Ridge will upgrade to 250 teraflops with the replacement of the dual core processors with quad core processors and Argonne will upgrade to between 250-500 teraflops, and next year, a petascale Cray Baker system is scheduled for delivery at Oak Ridge. The multidisciplinary teams in our SciDAC Centers for Enabling Technologies and our SciDAC Institutes must continue to work with our Scientific Application teams to overcome the barriers that prevent effective use of these new systems. These challenges include: the need for new algorithms as well as operating system and runtime software and tools which scale to parallel systems composed of hundreds of thousands processors; program development environments and tools which scale effectively and provide ease of use for developers and scientific end users; and visualization and data management systems that support moving, storing, analyzing, manipulating and visualizing multi-petabytes of scientific data and objects. The SciDAC Centers, located primarily at our DOE national laboratories will take the lead in ensuring that critical computer science and applied mathematics issues are addressed in a timely and comprehensive fashion and to address issues associated with research software lifecycle. In contrast, the SciDAC Institutes, which are university-led centers of excellence, will have more flexibility to pursue new research topics through a range of research collaborations. The Institutes will also work to broaden the intellectual and researcher base—conducting short courses and summer schools to take advantage of new high performance computing capabilities. The SciDAC Outreach Center at Lawrence Berkeley National Laboratory complements the outreach efforts of the SciDAC Institutes. The Outreach Center is our clearinghouse for SciDAC activities and resources and will communicate with the high performance computing community in part to understand their needs for workshops, summer schools and institutes. SciDAC is not ASCR's only effort to broaden the computational science community needed to meet the challenges of the new X-scale generation. I hope that you were able to attend the Computational Science Graduate Fellowship poster session last night. ASCR developed the fellowship in 1991 to meet the nation's growing need for scientists and technology professionals with advanced computer skills. CSGF, now jointly funded between ASCR and NNSA, is more than a traditional academic fellowship. It has provided more than 200 of the best and brightest graduate students with guidance, support and community in preparing them as computational scientists. Today CSGF alumni are bringing their diverse top-level skills and knowledge to research teams at DOE laboratories and in industries such as Proctor and Gamble, Lockheed Martin and Intel. At universities they are working to train the next generation of computational scientists. To build on this success, we intend to develop a wholly new Early Career Principal Investigator's (ECPI) program. Our objective is to stimulate academic research in scientific areas within ASCR's purview especially among faculty in early stages of their academic careers. Last February, we lost Ken Kennedy, one of the leading lights of our community. As we move forward into the extreme computing generation, his vision and insight will be greatly missed. In memorial to Ken Kennedy, we shall designate the ECPI grants to beginning faculty in Computer Science as the Ken Kennedy Fellowship. Watch the ASCR website for more information about ECPI and other early career programs in the computational sciences. We look to you, our scientists, researchers, and visionaries to take X-scale computing and use it to explode scientific discovery in your fields. We at SciDAC will work to ensure that this tool is the sharpest and most precise and efficient instrument to carve away the unknown and reveal the most exciting secrets and stimulating scientific discoveries of our time. The partnership between research and computing is the marriage that will spur greater discovery, and as Spencer said to Susan in Robert Parker's novel, `Sudden Mischief', `We stick together long enough, and we may get as smart as hell'. Michael Strayer
NASA Technical Reports Server (NTRS)
Brooks, Rodney Allen; Stein, Lynn Andrea
1994-01-01
We describe a project to capitalize on newly available levels of computational resources in order to understand human cognition. We will build an integrated physical system including vision, sound input and output, and dextrous manipulation, all controlled by a continuously operating large scale parallel MIMD computer. The resulting system will learn to 'think' by building on its bodily experiences to accomplish progressively more abstract tasks. Past experience suggests that in attempting to build such an integrated system we will have to fundamentally change the way artificial intelligence, cognitive science, linguistics, and philosophy think about the organization of intelligence. We expect to be able to better reconcile the theories that will be developed with current work in neuroscience.
CloudMan as a platform for tool, data, and analysis distribution
2012-01-01
Background Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions. PMID:23181507
Optimization of a Monte Carlo Model of the Transient Reactor Test Facility
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Kristin; DeHart, Mark; Goluoglu, Sedat
2017-03-01
The ultimate goal of modeling and simulation is to obtain reasonable answers to problems that don’t have representations which can be easily evaluated while minimizing the amount of computational resources. With the advances during the last twenty years of large scale computing centers, researchers have had the ability to create a multitude of tools to minimize the number of approximations necessary when modeling a system. The tremendous power of these centers requires the user to possess an immense amount of knowledge to optimize the models for accuracy and efficiency.This paper seeks to evaluate the KENO model of TREAT to optimizemore » calculational efforts.« less
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, use of a scalable, high performance, distributed-parallel data storage system developed in the ARPA funded MAGIC gigabit testbed. A collection of wide area distributed disk servers operate in parallel to provide logical block level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Contextuality as a Resource for Models of Quantum Computation with Qubits
NASA Astrophysics Data System (ADS)
Bermejo-Vega, Juan; Delfosse, Nicolas; Browne, Dan E.; Okay, Cihan; Raussendorf, Robert
2017-09-01
A central question in quantum computation is to identify the resources that are responsible for quantum speed-up. Quantum contextuality has been recently shown to be a resource for quantum computation with magic states for odd-prime dimensional qudits and two-dimensional systems with real wave functions. The phenomenon of state-independent contextuality poses a priori an obstruction to characterizing the case of regular qubits, the fundamental building block of quantum computation. Here, we establish contextuality of magic states as a necessary resource for a large class of quantum computation schemes on qubits. We illustrate our result with a concrete scheme related to measurement-based quantum computation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frew, Bethany A; Cole, Wesley J; Sun, Yinong
Capacity expansion models (CEMs) are widely used to evaluate the least-cost portfolio of electricity generators, transmission, and storage needed to reliably serve demand over the evolution of many years or decades. Various CEM formulations are used to evaluate systems ranging in scale from states or utility service territories to national or multi-national systems. CEMs can be computationally complex, and to achieve acceptable solve times, key parameters are often estimated using simplified methods. In this paper, we focus on two of these key parameters associated with the integration of variable generation (VG) resources: capacity value and curtailment. We first discuss commonmore » modeling simplifications used in CEMs to estimate capacity value and curtailment, many of which are based on a representative subset of hours that can miss important tail events or which require assumptions about the load and resource distributions that may not match actual distributions. We then present an alternate approach that captures key elements of chronological operation over all hours of the year without the computationally intensive economic dispatch optimization typically employed within more detailed operational models. The updated methodology characterizes the (1) contribution of VG to system capacity during high load and net load hours, (2) the curtailment level of VG, and (3) the potential reductions in curtailments enabled through deployment of storage and more flexible operation of select thermal generators. We apply this alternate methodology to an existing CEM, the Regional Energy Deployment System (ReEDS). Results demonstrate that this alternate approach provides more accurate estimates of capacity value and curtailments by explicitly capturing system interactions across all hours of the year. This approach could be applied more broadly to CEMs at many different scales where hourly resource and load data is available, greatly improving the representation of challenges associate with integration of variable generation resources.« less
Scientific Discovery through Advanced Computing in Plasma Science
NASA Astrophysics Data System (ADS)
Tang, William
2005-03-01
Advanced computing is generally recognized to be an increasingly vital tool for accelerating progress in scientific research during the 21st Century. For example, the Department of Energy's ``Scientific Discovery through Advanced Computing'' (SciDAC) Program was motivated in large measure by the fact that formidable scientific challenges in its research portfolio could best be addressed by utilizing the combination of the rapid advances in super-computing technology together with the emergence of effective new algorithms and computational methodologies. The imperative is to translate such progress into corresponding increases in the performance of the scientific codes used to model complex physical systems such as those encountered in high temperature plasma research. If properly validated against experimental measurements and analytic benchmarks, these codes can provide reliable predictive capability for the behavior of a broad range of complex natural and engineered systems. This talk reviews recent progress and future directions for advanced simulations with some illustrative examples taken from the plasma science applications area. Significant recent progress has been made in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics, giving increasingly good agreement between experimental observations and computational modeling. This was made possible by the combination of access to powerful new computational resources together with innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning a huge range in time and space scales. In particular, the plasma science community has made excellent progress in developing advanced codes for which computer run-time and problem size scale well with the number of processors on massively parallel machines (MPP's). A good example is the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPP's to produce three-dimensional, general geometry, nonlinear particle simulations which have accelerated progress in understanding the nature of plasma turbulence in magnetically-confined high temperature plasmas. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In general, results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. The associated scientific excitement should serve to stimulate improved cross-cutting collaborations with other fields and also to help attract bright young talent to the computational science area.
Computing arrival times of firefighting resources for initial attack
Romain M. Mees
1978-01-01
Dispatching of firefighting resources requires instantaneous or precalculated decisions. A FORTRAN computer program has been developed that can provide a list of resources in order of computed arrival time for initial attack on a fire. The program requires an accurate description of the existing road system and a list of all resources available on a planning unit....
Statistical mechanics of neocortical interactions: Path-integral evolution of short-term memory
NASA Astrophysics Data System (ADS)
Ingber, Lester
1994-05-01
Previous papers in this series of statistical mechanics of neocortical interactions (SMNI) have detailed a development from the relatively microscopic scales of neurons up to the macroscopic scales as recorded by electroencephalography (EEG), requiring an intermediate mesocolumnar scale to be developed at the scale of minicolumns (~=102 neurons) and macrocolumns (~=105 neurons). Opportunity was taken to view SMNI as sets of statistical constraints, not necessarily describing specific synaptic or neuronal mechanisms, on neuronal interactions, on some aspects of short-term memory (STM), e.g., its capacity, stability, and duration. A recently developed c-language code, pathint, provides a non-Monte Carlo technique for calculating the dynamic evolution of arbitrary-dimension (subject to computer resources) nonlinear Lagrangians, such as derived for the two-variable SMNI problem. Here, pathint is used to explicitly detail the evolution of the SMNI constraints on STM.
Portability and Cross-Platform Performance of an MPI-Based Parallel Polygon Renderer
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1999-01-01
Visualizing the results of computations performed on large-scale parallel computers is a challenging problem, due to the size of the datasets involved. One approach is to perform the visualization and graphics operations in place, exploiting the available parallelism to obtain the necessary rendering performance. Over the past several years, we have been developing algorithms and software to support visualization applications on NASA's parallel supercomputers. Our results have been incorporated into a parallel polygon rendering system called PGL. PGL was initially developed on tightly-coupled distributed-memory message-passing systems, including Intel's iPSC/860 and Paragon, and IBM's SP2. Over the past year, we have ported it to a variety of additional platforms, including the HP Exemplar, SGI Origin2OOO, Cray T3E, and clusters of Sun workstations. In implementing PGL, we have had two primary goals: cross-platform portability and high performance. Portability is important because (1) our manpower resources are limited, making it difficult to develop and maintain multiple versions of the code, and (2) NASA's complement of parallel computing platforms is diverse and subject to frequent change. Performance is important in delivering adequate rendering rates for complex scenes and ensuring that parallel computing resources are used effectively. Unfortunately, these two goals are often at odds. In this paper we report on our experiences with portability and performance of the PGL polygon renderer across a range of parallel computing platforms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Roser, Robert; Gerber, Richard
The U.S. Department of Energy (DOE) Office of Science (SC) Offices of High Energy Physics (HEP) and Advanced Scientific Computing Research (ASCR) convened a programmatic Exascale Requirements Review on June 10–12, 2015, in Bethesda, Maryland. This report summarizes the findings, results, and recommendations derived from that meeting. The high-level findings and observations are as follows. Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude — and in some cases greatermore » — than that available currently. The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. Data rates and volumes from experimental facilities are also straining the current HEP infrastructure in its ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. A close integration of high-performance computing (HPC) simulation and data analysis will greatly aid in interpreting the results of HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. Long-range planning between HEP and ASCR will be required to meet HEP’s research needs. To best use ASCR HPC resources, the experimental HEP program needs (1) an established, long-term plan for access to ASCR computational and data resources, (2) the ability to map workflows to HPC resources, (3) the ability for ASCR facilities to accommodate workflows run by collaborations potentially comprising thousands of individual members, (4) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, (5) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.« less
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
ERIC Educational Resources Information Center
Falkner, Katrina; Vivian, Rebecca
2015-01-01
To support teachers to implement Computer Science curricula into classrooms from the very first year of school, teachers, schools and organisations seek quality curriculum resources to support implementation and teacher professional development. Until now, many Computer Science resources and outreach initiatives have targeted K-12 school-age…
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Stocker, John C.; Golomb, Andrew M.
2011-01-01
Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand using a probability distribution for each type of service request as well as enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
A Computational framework for telemedicine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.
1998-07-01
Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. Highspeed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure formore » telemedicine. The Globus metacomputing toolkit can provide the necessary low level mechanisms to enable a large scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.« less
Water resources planning and management : A stochastic dual dynamic programming approach
NASA Astrophysics Data System (ADS)
Goor, Q.; Pinte, D.; Tilmant, A.
2008-12-01
Allocating water between different users and uses, including the environment, is one of the most challenging task facing water resources managers and has always been at the heart of Integrated Water Resources Management (IWRM). As water scarcity is expected to increase over time, allocation decisions among the different uses will have to be found taking into account the complex interactions between water and the economy. Hydro-economic optimization models can capture those interactions while prescribing efficient allocation policies. Many hydro-economic models found in the literature are formulated as large-scale non linear optimization problems (NLP), seeking to maximize net benefits from the system operation while meeting operational and/or institutional constraints, and describing the main hydrological processes. However, those models rarely incorporate the uncertainty inherent to the availability of water, essentially because of the computational difficulties associated stochastic formulations. The purpose of this presentation is to present a stochastic programming model that can identify economically efficient allocation policies in large-scale multipurpose multireservoir systems. The model is based on stochastic dual dynamic programming (SDDP), an extension of traditional SDP that is not affected by the curse of dimensionality. SDDP identify efficient allocation policies while considering the hydrologic uncertainty. The objective function includes the net benefits from the hydropower and irrigation sectors, as well as penalties for not meeting operational and/or institutional constraints. To be able to implement the efficient decomposition scheme that remove the computational burden, the one-stage SDDP problem has to be a linear program. Recent developments improve the representation of the non-linear and mildly non- convex hydropower function through a convex hull approximation of the true hydropower function. This model is illustrated on a cascade of 14 reservoirs on the Nile river basin.
Integration of PanDA workload management system with Titan supercomputer at OLCF
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, the future LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multicore worker nodes. It also gives PanDA new capability to collect, in real time, information about unused worker nodes on Titan, which allows precise definition of the size and duration of jobs submitted to Titan according to available free resources. This capability significantly reduces PanDA job wait time while improving Titan's utilization efficiency. This implementation was tested with a variety of Monte-Carlo workloads on Titan and is being tested on several other supercomputing platforms. Notice: This manuscript has been authored, by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
Experience on HTCondor batch system for HEP and other research fields at KISTI-GSDC
NASA Astrophysics Data System (ADS)
Ahn, S. U.; Jaikar, A.; Kong, B.; Yeo, I.; Bae, S.; Kim, J.
2017-10-01
Global Science experimental Data hub Center (GSDC) at Korea Institute of Science and Technology Information (KISTI) located at Daejeon in South Korea is the unique datacenter in the country which helps with its computing resources fundamental research fields dealing with the large-scale of data. For historical reason, it has run Torque batch system while recently it starts running HTCondor for new systems. Having different kinds of batch systems implies inefficiency in terms of resource management and utilization. We conducted a research on resource management with HTCondor for several user scenarios corresponding to the user environments that currently GSDC supports. A recent research on the resource usage patterns at GSDC is considered in this research to build the possible user scenarios. Checkpointing and Super-Collector model of HTCondor give us more efficient and flexible way to manage resources and Grid Gate provided by HTCondor helps to interface with the Grid environment. In this paper, the overview on the essential features of HTCondor exploited in this work is described and the practical examples for HTCondor cluster configuration in our cases are presented.
Grid accounting service: state and future development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levshina, T.; Sehgal, C.; Bockelman, B.
2014-01-01
During the last decade, large-scale federated distributed infrastructures have been continually developed and expanded. One of the crucial components of a cyber-infrastructure is an accounting service that collects data related to resource utilization and identity of users using resources. The accounting service is important for verifying pledged resource allocation per particular groups and users, providing reports for funding agencies and resource providers, and understanding hardware provisioning requirements. It can also be used for end-to-end troubleshooting as well as billing purposes. In this work we describe Gratia, a federated accounting service jointly developed at Fermilab and Holland Computing Center at Universitymore » of Nebraska-Lincoln. The Open Science Grid, Fermilab, HCC, and several other institutions have used Gratia in production for several years. The current development activities include expanding Virtual Machines provisioning information, XSEDE allocation usage accounting, and Campus Grids resource utilization. We also identify the direction of future work: improvement and expansion of Cloud accounting, persistent and elastic storage space allocation, and the incorporation of WAN and LAN network metrics.« less
NASA Astrophysics Data System (ADS)
Okamoto, Taro; Takenaka, Hiroshi; Nakamura, Takeshi; Aoki, Takayuki
2010-12-01
We adopted the GPU (graphics processing unit) to accelerate the large-scale finite-difference simulation of seismic wave propagation. The simulation can benefit from the high-memory bandwidth of GPU because it is a "memory intensive" problem. In a single-GPU case we achieved a performance of about 56 GFlops, which was about 45-fold faster than that achieved by a single core of the host central processing unit (CPU). We confirmed that the optimized use of fast shared memory and registers were essential for performance. In the multi-GPU case with three-dimensional domain decomposition, the non-contiguous memory alignment in the ghost zones was found to impose quite long time in data transfer between GPU and the host node. This problem was solved by using contiguous memory buffers for ghost zones. We achieved a performance of about 2.2 TFlops by using 120 GPUs and 330 GB of total memory: nearly (or more than) 2200 cores of host CPUs would be required to achieve the same performance. The weak scaling was nearly proportional to the number of GPUs. We therefore conclude that GPU computing for large-scale simulation of seismic wave propagation is a promising approach as a faster simulation is possible with reduced computational resources compared to CPUs.
Operating Dedicated Data Centers - Is It Cost-Effective?
NASA Astrophysics Data System (ADS)
Ernst, M.; Hogue, R.; Hollowell, C.; Strecker-Kellog, W.; Wong, A.; Zaytsev, A.
2014-06-01
The advent of cloud computing centres such as Amazon's EC2 and Google's Computing Engine has elicited comparisons with dedicated computing clusters. Discussions on appropriate usage of cloud resources (both academic and commercial) and costs have ensued. This presentation discusses a detailed analysis of the costs of operating and maintaining the RACF (RHIC and ATLAS Computing Facility) compute cluster at Brookhaven National Lab and compares them with the cost of cloud computing resources under various usage scenarios. An extrapolation of likely future cost effectiveness of dedicated computing resources is also presented.
Ehsan, Shoaib; Clark, Adrian F.; ur Rehman, Naveed; McDonald-Maier, Klaus D.
2015-01-01
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems. PMID:26184211
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
Ehsan, Shoaib; Clark, Adrian F; Naveed ur Rehman; McDonald-Maier, Klaus D
2015-07-10
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.
Computing the Envelope for Stepwise-Constant Resource Allocations
NASA Technical Reports Server (NTRS)
Muscettola, Nicola; Clancy, Daniel (Technical Monitor)
2002-01-01
Computing tight resource-level bounds is a fundamental problem in the construction of flexible plans with resource utilization. In this paper we describe an efficient algorithm that builds a resource envelope, the tightest possible such bound. The algorithm is based on transforming the temporal network of resource consuming and producing events into a flow network with nodes equal to the events and edges equal to the necessary predecessor links between events. A staged maximum flow problem on the network is then used to compute the time of occurrence and the height of each step of the resource envelope profile. Each stage has the same computational complexity of solving a maximum flow problem on the entire flow network. This makes this method computationally feasible and promising for use in the inner loop of flexible-time scheduling algorithms.
The CMS Tier0 goes cloud and grid for LHC Run 2
Hufnagel, Dirk
2015-12-23
In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threadedmore » framework to deal with the increased event complexity and to ensure efficient use of the resources. Furthermore, this contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.« less
The CMS TierO goes Cloud and Grid for LHC Run 2
NASA Astrophysics Data System (ADS)
Hufnagel, Dirk
2015-12-01
In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threaded framework to deal with the increased event complexity and to ensure efficient use of the resources. This contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Newman, G.A.; Commer, M.
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/Lmore » supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.« less
Large-Scale Compute-Intensive Analysis via a Combined In-situ and Co-scheduling Workflow Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messer, Bronson; Sewell, Christopher; Heitmann, Katrin
2015-01-01
Large-scale simulations can produce tens of terabytes of data per analysis cycle, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Trending techniques consist of performing the analysis in situ, utilizing the same resources as the simulation, and/or off-loading subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in situ and co-scheduling approaches for handling Petabyte-size outputs. An initial inmore » situ step is used to reduce the amount of data to be analyzed, and to separate out the data-intensive tasks handled off-line. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm that simultaneously targets a variety of GPU, multi-core, and many-core architectures.« less
TomoMiner and TomoMinerCloud: A software platform for large-scale subtomogram structural analysis
Frazier, Zachary; Xu, Min; Alber, Frank
2017-01-01
SUMMARY Cryo-electron tomography (cryoET) captures the 3D electron density distribution of macromolecular complexes in close to native state. With the rapid advance of cryoET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample due to variations in structure and composition of a complex in situ form or because particles are a mixture of different complexes. In this case subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. Additionally, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiners high-performance features. PMID:28552576
Attention Modulates Spatial Precision in Multiple-Object Tracking.
Srivastava, Nisheeth; Vul, Ed
2016-01-01
We present a computational model of multiple-object tracking that makes trial-level predictions about the allocation of visual attention and the effect of this allocation on observers' ability to track multiple objects simultaneously. This model follows the intuition that increased attention to a location increases the spatial resolution of its internal representation. Using a combination of empirical and computational experiments, we demonstrate the existence of a tight coupling between cognitive and perceptual resources in this task: Low-level tracking of objects generates bottom-up predictions of error likelihood, and high-level attention allocation selectively reduces error probabilities in attended locations while increasing it at non-attended locations. Whereas earlier models of multiple-object tracking have predicted the big picture relationship between stimulus complexity and response accuracy, our approach makes accurate predictions of both the macro-scale effect of target number and velocity on tracking difficulty and micro-scale variations in difficulty across individual trials and targets arising from the idiosyncratic within-trial interactions of targets and distractors. Copyright © 2016 Cognitive Science Society, Inc.
Large Scale Analysis of Geospatial Data with Dask and XArray
NASA Astrophysics Data System (ADS)
Zender, C. S.; Hamman, J.; Abernathey, R.; Evans, K. J.; Rocklin, M.; Zender, C. S.; Rocklin, M.
2017-12-01
The analysis of geospatial data with high level languages has acceleratedinnovation and the impact of existing data resources. However, as datasetsgrow beyond single-machine memory, data structures within these high levellanguages can become a bottleneck. New libraries like Dask and XArray resolve some of these scalability issues,providing interactive workflows that are both familiar tohigh-level-language researchers while also scaling out to much largerdatasets. This broadens the access of researchers to larger datasets on highperformance computers and, through interactive development, reducestime-to-insight when compared to traditional parallel programming techniques(MPI). This talk describes Dask, a distributed dynamic task scheduler, Dask.array, amulti-dimensional array that copies the popular NumPy interface, and XArray,a library that wraps NumPy/Dask.array with labeled and indexes axes,implementing the CF conventions. We discuss both the basic design of theselibraries and how they change interactive analysis of geospatial data, and alsorecent benefits and challenges of distributed computing on clusters ofmachines.
Ohue, Masahito; Shimoda, Takehiro; Suzuki, Shuji; Matsuzaki, Yuri; Ishida, Takashi; Akiyama, Yutaka
2014-11-15
The application of protein-protein docking in large-scale interactome analysis is a major challenge in structural bioinformatics and requires huge computing resources. In this work, we present MEGADOCK 4.0, an FFT-based docking software that makes extensive use of recent heterogeneous supercomputers and shows powerful, scalable performance of >97% strong scaling. MEGADOCK 4.0 is written in C++ with OpenMPI and NVIDIA CUDA 5.0 (or later) and is freely available to all academic and non-profit users at: http://www.bi.cs.titech.ac.jp/megadock. akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.