High-performance scientific computing in the cloud
NASA Astrophysics Data System (ADS)
Jorissen, Kevin; Vila, Fernando; Rehr, John
2011-03-01
Cloud computing has the potential to open up high-performance computational science to a much broader class of researchers, owing to its ability to provide on-demand, virtualized computational resources. However, before such approaches can become commonplace, user-friendly tools must be developed that hide the unfamiliar cloud environment and streamline the management of cloud resources for many scientific applications. We have recently shown that high-performance cloud computing is feasible for parallelized x-ray spectroscopy calculations. We now present benchmark results for a wider selection of scientific applications focusing on electronic structure and spectroscopic simulation software in condensed matter physics. These applications are driven by an improved portable interface that can manage virtual clusters and run various applications in the cloud. We also describe a next generation of cluster tools, aimed at improved performance and a more robust cluster deployment. Supported by NSF grant OCI-1048052.
Evaluating the Efficacy of the Cloud for Cluster Computation
NASA Technical Reports Server (NTRS)
Knight, David; Shams, Khawaja; Chang, George; Soderstrom, Tom
2012-01-01
Computing requirements vary by industry, and it follows that NASA and other research organizations have computing demands that fall outside the mainstream. While cloud computing made rapid inroads for tasks such as powering web applications, performance issues on highly distributed tasks hindered early adoption for scientific computation. One venture to address this problem is Nebula, NASA's homegrown cloud project tasked with delivering science-quality cloud computing resources. Another industry development is Amazon's high-performance computing (HPC) instances on Elastic Compute Cloud (EC2), which promise improved performance for cluster computation. This paper presents results from a series of benchmarks run on Amazon EC2 and discusses the efficacy of current commercial cloud technology for running scientific applications across a cluster. In particular, a 240-core cluster of cloud instances achieved 2 TFLOPS on High-Performance Linpack (HPL) at 70% of theoretical computational performance. The cluster's local network also demonstrated sub-100 µs inter-process latency with sustained inter-node throughput in excess of 8 Gbps. Beyond HPL, a real-world Hadoop image processing task from NASA's Lunar Mapping and Modeling Project (LMMP) was run on a 29-instance cluster to process lunar and Martian surface images with sizes on the order of tens of gigapixels. These results demonstrate that while not a rival of dedicated supercomputing clusters, commercial cloud technology is now a feasible option for moderately demanding scientific workloads.
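The HPL figures quoted above imply a theoretical peak that the abstract does not state. The short Python sketch below works it out from the three quoted numbers only. Both "2 TFLOPS" and "70%" are rounded in the abstract, so the derived values are approximate, and the variable names are illustrative.

```python
# Figures quoted in the abstract (both rounded in the source).
cores = 240
achieved_tflops = 2.0
hpl_efficiency = 0.70  # achieved / theoretical peak

# Theoretical peak implied by the achieved rate and the efficiency.
peak_tflops = achieved_tflops / hpl_efficiency

# Per-core rates in GFLOPS (1 TFLOPS = 1000 GFLOPS).
peak_per_core_gflops = peak_tflops * 1000 / cores
achieved_per_core_gflops = achieved_tflops * 1000 / cores

print(f"implied cluster peak: {peak_tflops:.2f} TFLOPS")
print(f"implied peak per core: {peak_per_core_gflops:.1f} GFLOPS")
print(f"achieved per core: {achieved_per_core_gflops:.1f} GFLOPS")
```

Under these assumptions the cluster peak comes out near 2.9 TFLOPS, or roughly 12 GFLOPS per core.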
A high performance scientific cloud computing environment for materials simulations
NASA Astrophysics Data System (ADS)
Jorissen, K.; Vila, F. D.; Rehr, J. J.
2012-09-01
We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offer functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a Java-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.
Scaling predictive modeling in drug development with cloud computing.
Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola
2015-01-26
Growing data sets with increased time for analysis are hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compared them with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
HPC on Competitive Cloud Resources
NASA Astrophysics Data System (ADS)
Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff
Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on demand in real time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and similarly affects computation time. For some problems, not only is it more efficient to underutilize resources, but the solution can be reached sooner in real time (wall time). We also show that the smallest, cheapest (64-bit) instance on the studied environment is the best for price-to-performance ratio. In light of the high contention we witness, we believe that alternative definitions of efficiency for commercial cloud environments should be introduced where strong performance guarantees do not exist. Concepts like average, expected performance and execution time, expected cost to completion, and variance measures (traditionally ignored in the high-performance computing context) now should complement or even substitute the standard definitions of efficiency.
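The closing proposal above (average execution time, expected cost to completion, and variance measures) can be illustrated with a small Python sketch. It is only one plausible reading of those concepts, not the chapter's own definitions. The function name `summarize_runs`, the run times, and the hourly price are all hypothetical, and billing granularity (for example rounding up to whole hours) is ignored.

```python
from statistics import mean, pstdev


def summarize_runs(wall_times_s, price_per_hour, instances=1):
    """Summarize repeated runs of the same job on a shared cloud.

    wall_times_s   : measured wall-clock times of each run, in seconds
    price_per_hour : price of one instance per hour (hypothetical)
    instances      : number of instances used by each run
    """
    avg = mean(wall_times_s)
    spread = pstdev(wall_times_s)  # population standard deviation
    return {
        "mean_s": avg,
        "stdev_s": spread,
        # Coefficient of variation: run-to-run variability relative to the mean.
        "cv": spread / avg,
        # Expected cost to completion, assuming per-second proportional billing.
        "expected_cost": avg / 3600 * price_per_hour * instances,
    }


# Hypothetical measurements, for illustration only.
summary = summarize_runs([600, 900, 1200], price_per_hour=0.40, instances=2)
print(summary)
```

Comparing `cv` and `expected_cost` across instance types is one way to rank them when no performance guarantee exists.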
RAPPORT: running scientific high-performance computing applications on the cloud.
Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt
2013-01-28
Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.
Automating NEURON Simulation Deployment in Cloud Resources.
Stockton, David B; Santamaria, Fidel
2017-01-01
Simulations in neuroscience are performed on local servers or High Performance Computing (HPC) facilities. Recently, cloud computing has emerged as a potential computational platform for neuroscience simulation. In this paper we compare and contrast HPC and cloud resources for scientific computation, then report how we deployed NEURON, a widely used simulator of neuronal activity, in three clouds: Chameleon Cloud, a hybrid private academic cloud for cloud technology research based on the OpenStack software; Rackspace, a public commercial cloud, also based on OpenStack; and Amazon Elastic Cloud Computing, based on Amazon's proprietary software. We describe the manual procedures and how to automate cloud operations. We describe extending our simulation automation software called NeuroManager (Stockton and Santamaria, Frontiers in Neuroinformatics, 2015), so that the user is capable of recruiting private cloud, public cloud, HPC, and local servers simultaneously with a simple common interface. We conclude by performing several studies in which we examine speedup, efficiency, total session time, and cost for sets of simulations of a published NEURON model.
Automating NEURON Simulation Deployment in Cloud Resources
Santamaria, Fidel
2016-01-01
Simulations in neuroscience are performed on local servers or High Performance Computing (HPC) facilities. Recently, cloud computing has emerged as a potential computational platform for neuroscience simulation. In this paper we compare and contrast HPC and cloud resources for scientific computation, then report how we deployed NEURON, a widely used simulator of neuronal activity, in three clouds: Chameleon Cloud, a hybrid private academic cloud for cloud technology research based on the OpenStack software; Rackspace, a public commercial cloud, also based on OpenStack; and Amazon Elastic Cloud Computing, based on Amazon’s proprietary software. We describe the manual procedures and how to automate cloud operations. We describe extending our simulation automation software called NeuroManager (Stockton and Santamaria, Frontiers in Neuroinformatics, 2015), so that the user is capable of recruiting private cloud, public cloud, HPC, and local servers simultaneously with a simple common interface. We conclude by performing several studies in which we examine speedup, efficiency, total session time, and cost for sets of simulations of a published NEURON model. PMID:27655341
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian; ...
2017-09-29
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.
HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian
Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.
Translational bioinformatics in the cloud: an affordable alternative
2010-01-01
With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
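The quoted speed-up can be checked directly from the two timings in the abstract. The Python sketch below does that arithmetic and also reports the speed-up per node. The per-node figure is only indicative, because the abstract does not say whether the local computer and a single cloud node are comparable hardware; the variable names are illustrative.

```python
# Timings quoted in the abstract for 1 million electrons.
local_time_h = 2.58    # single local computer, in hours
cloud_time_min = 3.3   # 100-node cloud run, in minutes
nodes = 100

# Convert to a common unit before comparing.
local_time_min = local_time_h * 60

speedup = local_time_min / cloud_time_min

# Speed-up per node; equals parallel efficiency only if one cloud node
# performs like the local computer, which the abstract does not state.
speedup_per_node = speedup / nodes

print(f"speed-up: {speedup:.1f}x")
print(f"speed-up per node: {speedup_per_node:.2f}")
```

The result, about 46.9, matches the 47× reported in the abstract.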
Integration of High-Performance Computing into Cloud Computing Services
NASA Astrophysics Data System (ADS)
Vouk, Mladen A.; Sills, Eric; Dreher, Patrick
High-Performance Computing (HPC) projects span a spectrum of computer hardware implementations ranging from peta-flop supercomputers, high-end tera-flop facilities running a variety of operating systems and applications, to mid-range and smaller computational clusters used for HPC application development, pilot runs and prototype staging clusters. What they all have in common is that they operate as a stand-alone system rather than a scalable and shared user re-configurable resource. The advent of cloud computing has changed the traditional HPC implementation. In this article, we will discuss a very successful production-level architecture and policy framework for supporting HPC services within a more general cloud computing infrastructure. This integrated environment, called Virtual Computing Lab (VCL), has been operating at NC State since fall 2004. Nearly 8,500,000 HPC CPU-Hrs were delivered by this environment to NC State faculty and students during 2009. In addition, we present and discuss operational data that show that integration of HPC and non-HPC (or general VCL) services in a cloud can substantially reduce the cost of delivering cloud services (down to cents per CPU hour).
Making Cloud Computing Available For Researchers and Innovators (Invited)
NASA Astrophysics Data System (ADS)
Winsor, R.
2010-12-01
High Performance Computing (HPC) facilities exist in most academic institutions but are almost invariably over-subscribed. Access is allocated based on academic merit, the only practical method of assigning valuable finite compute resources. Cloud computing, on the other hand, and particularly commercial clouds, draws flexibly on an almost limitless resource as long as the user has sufficient funds to pay the bill. How can the commercial cloud model be applied to scientific computing? Is there a case to be made for a publicly available research cloud and how would it be structured? This talk will explore these themes and describe how Cybera, a not-for-profit non-governmental organization in Alberta, Canada, aims to leverage its high speed research and education network to provide cloud computing facilities for a much wider user base.
Understanding the Performance and Potential of Cloud Computing for Scientific Applications
Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin; ...
2015-02-19
Commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources; however, not all scientists have access to sufficient high-end computing systems, many of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in the context of price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evaluation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.
Understanding the Performance and Potential of Cloud Computing for Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin
Commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources; however, not all scientists have access to sufficient high-end computing systems, many of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in the context of price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evaluation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.
Security and Cloud Outsourcing Framework for Economic Dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consists of in-house high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperform the in-house infrastructure.
Security and Cloud Outsourcing Framework for Economic Dispatch
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi; ...
2017-04-24
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consists of in-house high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperform the in-house infrastructure.
The application of cloud computing to scientific workflows: a study of cost and performance.
Berriman, G Bruce; Deelman, Ewa; Juve, Gideon; Rynge, Mats; Vöckler, Jens-S
2013-01-28
The current model of transferring data from data centres to desktops for analysis will soon be rendered impractical by the accelerating growth in the volume of science datasets. Processing will instead often take place on high-performance servers co-located with data. Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed. Cloud computing is a new way of purchasing computing and storage resources on demand through virtualization technologies. We report here the results of investigations of the applicability of commercial cloud computing to scientific computing, with an emphasis on astronomy, including investigations of what types of applications can be run cheaply and efficiently on the cloud, and an example of an application well suited to the cloud: processing a large dataset to create a new science product.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure
NASA Astrophysics Data System (ADS)
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-01
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
A Weibull distribution accrual failure detector for cloud computing.
Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Dong, Jian; Zhao, Yao; Wen, Dongxin
2017-01-01
Failure detectors are a fundamental component for building high-availability distributed systems. To meet the requirements of complicated large-scale distributed systems, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve this problem, a new accrual failure detector based on the Weibull distribution, called the Weibull Distribution Failure Detector, has been proposed specifically for cloud computing. It can adapt to the dynamic and unexpected network conditions in cloud computing. The performance of the Weibull Distribution Failure Detector is evaluated and compared based on public classical experiment data and cloud computing experiment data. The results show that the Weibull Distribution Failure Detector has better performance in terms of speed and accuracy in unstable scenarios, especially in cloud computing.
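To make the idea of an accrual detector concrete, here is a minimal Python sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes the suspicion level used by phi-style accrual detectors, phi(t) = -log10(P(next heartbeat arrives later than t)), and it assumes heartbeat inter-arrival times follow a Weibull distribution whose `shape` and `scale` have already been estimated. The function names and the parameter values in the usage lines are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """P(inter-arrival time > t) for a Weibull(shape, scale) distribution."""
    if t <= 0:
        return 1.0
    return math.exp(-((t / scale) ** shape))


def suspicion_level(elapsed, shape, scale):
    """Accrual suspicion: -log10 of the Weibull survival probability.

    Since -log10(exp(-x)) = x / ln(10), the value is computed in closed
    form, which avoids underflow when the survival probability is tiny.
    """
    if elapsed <= 0:
        return 0.0
    return ((elapsed / scale) ** shape) / math.log(10)


# Hypothetical parameters: heartbeats roughly every 1.0 s, shape 2.0.
for elapsed in (0.5, 1.0, 2.0, 4.0):
    print(elapsed, round(suspicion_level(elapsed, shape=2.0, scale=1.0), 3))
```

An application would compare the returned value against its own threshold, so that different applications can trade detection speed against accuracy using the same detector output.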
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
Challenges and opportunities of cloud computing for atmospheric sciences
NASA Astrophysics Data System (ADS)
Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.
2016-04-01
Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand, it has begun to make its way into scientific research. One of the greatest advantages of cloud computing for scientific research is that funding or performing a research project no longer depends on having access to a large cyberinfrastructure. Cloud computing can avoid maintenance expenses for large supercomputers and has the potential to 'democratize' the access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we present results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The associated monetary cost is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.
An Application-Based Performance Evaluation of NASA's Nebula Cloud Computing Platform
NASA Technical Reports Server (NTRS)
Saini, Subhash; Heistand, Steve; Jin, Haoqiang; Chang, Johnny; Hood, Robert T.; Mehrotra, Piyush; Biswas, Rupak
2012-01-01
The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing as it promises high potential. In this paper, we examine the feasibility, performance, and scalability of production quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents the comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA's Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
Arctic Boreal Vulnerability Experiment (ABoVE) Science Cloud
NASA Astrophysics Data System (ADS)
Duffy, D.; Schnase, J. L.; McInerney, M.; Webster, W. P.; Sinno, S.; Thompson, J. H.; Griffith, P. C.; Hoy, E.; Carroll, M.
2014-12-01
The effects of climate change are being revealed at alarming rates in the Arctic and Boreal regions of the planet. NASA's Terrestrial Ecology Program has launched a major field campaign to study these effects over the next 5 to 8 years. The Arctic Boreal Vulnerability Experiment (ABoVE) will challenge scientists to take measurements in the field, study remote observations, and even run models to better understand the impacts of a rapidly changing climate for areas of Alaska and western Canada. The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center (GSFC) has partnered with the Terrestrial Ecology Program to create a science cloud designed for this field campaign - the ABoVE Science Cloud. The cloud combines traditional high performance computing with emerging technologies to create an environment specifically designed for large-scale climate analytics. The ABoVE Science Cloud utilizes (1) virtualized high-speed InfiniBand networks, (2) a combination of high-performance file systems and object storage, and (3) virtual system environments tailored for data intensive, science applications. At the center of the architecture is a large object storage environment, much like a traditional high-performance file system, that supports data proximal processing using technologies like MapReduce on a Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high performance compute resources with many processing cores and large memory coupled to the storage through an InfiniBand network. Virtual systems can be tailored to a specific scientist and provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems. In this talk, we will present the architectural components of the science cloud and examples of how it is being used to meet the needs of the ABoVE campaign. 
In our experience, the science cloud approach significantly lowers the barriers and risks to organizations that require high performance computing solutions and provides the NCCS with the agility required to meet our customers' rapidly increasing and evolving requirements.
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often require just a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-01-01
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions. PMID:28208684
Templet Web: the use of volunteer computing approach in PaaS-style cloud
NASA Astrophysics Data System (ADS)
Vostokin, Sergei; Artamonov, Yuriy; Tsarev, Daniil
2018-03-01
This article presents the Templet Web cloud service. The service is designed for high-performance scientific computing automation. The use of high-performance technology is specifically required by new fields of computational science such as data mining, artificial intelligence, machine learning, and others. Cloud technologies provide a significant cost reduction for high-performance scientific applications. The main objectives to achieve this cost reduction in the Templet Web service design are: (a) the implementation of "on-demand" access; (b) source code deployment management; (c) high-performance computing programs development automation. The distinctive feature of the service is the approach mainly used in the field of volunteer computing, when a person who has access to a computer system delegates his access rights to the requesting user. We developed an access procedure, algorithms, and software for utilization of free computational resources of the academic cluster system in line with the methods of volunteer computing. The Templet Web service has been in operation for five years. It has been successfully used for conducting laboratory workshops and solving research problems, some of which are considered in this article. The article also provides an overview of research directions related to service development.
A Weibull distribution accrual failure detector for cloud computing
Wu, Zhibo; Wu, Jin; Zhao, Yao; Wen, Dongxin
2017-01-01
Failure detectors are used to build high availability distributed systems as the fundamental component. To meet the requirement of a complicated large-scale distributed system, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve this problem, a new accrual failure detector based on Weibull Distribution, called the Weibull Distribution Failure Detector, has been proposed specifically for cloud computing. It can adapt to the dynamic and unexpected network conditions in cloud computing. The performance of the Weibull Distribution Failure Detector is evaluated and compared based on public classical experiment data and cloud computing experiment data. The results show that the Weibull Distribution Failure Detector has better performance in terms of speed and accuracy in unstable scenarios, especially in cloud computing. PMID:28278229
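The abstract above does not give the detector's formulas, so the following is only a minimal sketch of how an accrual-style suspicion level could be computed from a Weibull model of heartbeat inter-arrival times. The function names, the fixed shape parameter, and the choice of the negative base-10 logarithm of the survival probability as the suspicion scale are all assumptions made for illustration, not the authors' implementation.

```python
import math


def fit_scale(intervals, shape):
    """Maximum-likelihood Weibull scale for a fixed, assumed shape parameter."""
    n = len(intervals)
    return (sum(x ** shape for x in intervals) / n) ** (1.0 / shape)


def weibull_phi(elapsed, shape, scale):
    """Suspicion level: -log10 of P(next heartbeat arrives later than `elapsed`).

    The Weibull survival function is exp(-(t / scale) ** shape), so its
    negative base-10 logarithm reduces to (t / scale) ** shape / ln(10).
    """
    if elapsed <= 0:
        return 0.0
    return (elapsed / scale) ** shape / math.log(10)


if __name__ == "__main__":
    # Hypothetical heartbeat inter-arrival times, in seconds.
    observed = [1.0, 1.1, 0.9, 1.0, 1.2]
    shape = 2.0  # assumed; a real detector would estimate or tune this
    scale = fit_scale(observed, shape)
    for silence in (0.5, 1.0, 2.0, 4.0):
        print(silence, weibull_phi(silence, shape, scale))
```

An application would compare the returned value against its own threshold, which is how accrual detectors let different applications trade detection speed against accuracy.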
NASA Astrophysics Data System (ADS)
Huang, Qian
2014-09-01
Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up the performance by utilizing multiple computer resources to process a computational task simultaneously, thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and supercomputers. Today, there is tremendous growth in cloud computing. Infrastructure as a Service (IaaS), the on-demand and pay-as-you-go model, creates a flexible and cost-effective means to access computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services in the IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application of this kind is developed. In this paper, an overall description of this SaaS application is presented. This contribution can promote cloud application development in computational mineral physics, and cross-disciplinary studies.
Yokohama, Noriya
2013-07-01
This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
Exploiting GPUs in Virtual Machine for BioCloud
Jo, Heeseung; Jeong, Jinkyu; Lee, Myoungho; Choi, Dong Hoon
2013-01-01
Recently, biological applications have started to be reimplemented to exploit the many cores of GPUs for better computation performance. Therefore, by providing virtualized GPUs to VMs in a cloud computing environment, many biological applications could move into the cloud to enhance their computation performance and use practically unlimited cloud computing resources while reducing computation expenses. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on the sharing mechanism of GPUs among VMs, it cannot achieve enough performance for biological applications, for which computation throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI express (PCI-E) channel. By enabling each VM to access the underlying GPUs directly, applications can show almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs in each VM in an on-demand manner, VMs in the same physical host can time-share their GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment. PMID:23710465
Model-as-a-service (MaaS) using the cloud service innovation platform (CSIP)
USDA-ARS's Scientific Manuscript database
Cloud infrastructures for modelling activities such as data processing, performing environmental simulations, or conducting model calibrations/optimizations provide a cost effective alternative to traditional high performance computing approaches. Cloud-based modelling examples emerged into the more...
Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176
Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud.
Cianfrocco, Michael A; Leschziner, Andres E
2015-05-08
The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available 'off-the-shelf' computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16-480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM.
Cloud Computing with iPlant Atmosphere.
McKay, Sheldon J; Skidmore, Edwin J; LaRose, Christopher J; Mercer, Andre W; Noutsos, Christos
2013-10-15
Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Copyright © 2013 John Wiley & Sons, Inc.
Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing
Shatil, Anwar S.; Younas, Sohail; Pourreza, Hossein; Figley, Chase R.
2015-01-01
With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications. PMID:27279746
Bent, John M.; Faibish, Sorin; Grider, Gary
2016-04-19
Cloud object storage is enabled for checkpoints of high performance computing applications using a middleware process. A plurality of files, such as checkpoint files, generated by a plurality of processes in a parallel computing system are stored by obtaining said plurality of files from said parallel computing system; converting said plurality of files to objects using a log structured file system middleware process; and providing said objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
Midekisa, Alemayehu; Holl, Felix; Savory, David J; Andrade-Pacheco, Ricardo; Gething, Peter W; Bennett, Adam; Sturrock, Hugh J W
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth’s land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources. PMID:28953943
Cloud Computing and Its Applications in GIS
NASA Astrophysics Data System (ADS)
Kang, Cao
2011-12-01
Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud-based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm.
Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes them incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories.
As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
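The subdivide and layer-wrap strategy described in this abstract can be illustrated with a small sketch. It only shows how a raster might be cut into rectangular sub-images and how many extra pixels a one-pixel-deep edge layer adds per sub-image; it does not compute Euclidean distances. The function names are hypothetical, and reading the "near-square is optimal" result as a perimeter-to-area effect is an assumption, not something the abstract states.

```python
def split_into_tiles(height, width, tile_h, tile_w):
    """Yield (row0, row1, col0, col1) bounds of sub-images covering the raster."""
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            yield r, min(r + tile_h, height), c, min(c + tile_w, width)


def edge_layer_pixels(tile_h, tile_w):
    """Pixels in a one-pixel-deep ring wrapped around a tile_h x tile_w tile."""
    return (tile_h + 2) * (tile_w + 2) - tile_h * tile_w


def edge_overhead(tile_h, tile_w):
    """Edge-layer pixels per interior pixel: extra work carried by each tile."""
    return edge_layer_pixels(tile_h, tile_w) / (tile_h * tile_w)


if __name__ == "__main__":
    # Two tile shapes with the same area (4096 pixels) but different aspect ratios.
    for shape in ((64, 64), (16, 256)):
        print(shape, edge_layer_pixels(*shape), round(edge_overhead(*shape), 4))
    # Bounds that a scheduler could hand to separate worker nodes.
    print(list(split_into_tiles(4, 6, 2, 3)))
```

For a fixed tile area, the ring is smallest when the tile is square, which is one plausible reason square-like sub-images would carry the least per-node overhead.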
AceCloud: Molecular Dynamics Simulations in the Cloud.
Harvey, M J; De Fabritiis, G
2015-05-26
We present AceCloud, an on-demand service for molecular dynamics simulations. AceCloud is designed to facilitate the secure execution of large ensembles of simulations on an external cloud computing service (currently Amazon Web Services). The AceCloud client, integrated into the ACEMD molecular dynamics package, provides an easy-to-use interface that abstracts all aspects of interaction with the cloud services. This gives the user the experience that all simulations are running on their local machine, minimizing the learning curve typically associated with the transition to using high performance computing services.
Enabling Large-Scale Biomedical Analysis in the Cloud
Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen
2013-01-01
Recent progress in high-throughput instrumentation has led to an astonishing growth in both the volume and complexity of biomedical data collected from various sources. Data at this planetary scale poses serious challenges to storage and computing technologies. Cloud computing offers a way to meet these challenges because it provides both storage and high-performance computing for large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical researchers make the vast amount of diverse data meaningful and usable. PMID:24288665
Bent, John M.; Faibish, Sorin; Grider, Gary
2015-06-30
Cloud object storage is enabled for archived data, such as checkpoints and results, of high performance computing applications using a middleware process. A plurality of archived files, such as checkpoint files and results, generated by a plurality of processes in a parallel computing system are stored by obtaining the plurality of archived files from the parallel computing system; converting the plurality of archived files to objects using a log structured file system middleware process; and providing the objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
Spontaneous Ad Hoc Mobile Cloud Computing Network
Lacuesta, Raquel; Lloret, Jaime; Sendra, Sandra; Peñalver, Lourdes
2014-01-01
Cloud computing helps users and companies share computing resources instead of relying on local servers or personal devices to handle applications. Smart devices are becoming one of the main information processing devices, and their computing capabilities are reaching levels that let them form a mobile cloud computing network. Sometimes, however, they cannot form such a network and collaborate actively in the cloud because it is difficult for them to build a spontaneous network easily and configure its parameters. For this reason, this paper presents the design and deployment of a spontaneous ad hoc mobile cloud computing network. To this end, we have developed a trusted algorithm that manages the activity of the nodes as they join and leave the network. The paper describes the network procedures and classes that have been designed. Our simulation results using Castalia show that the proposal achieves good efficiency and network performance even with a high number of nodes. PMID:25202715
Cloud Computing for Protein-Ligand Binding Site Comparison
Hung, Che-Lun; Hua, Guan-Jie
2013-01-01
The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery. PMID:23762824
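The MapReduce pattern mentioned above can be illustrated with a small single-process sketch. It is not Cloud-PLBS or SMAP code: the site names, the residue sets, and the Jaccard score used as a stand-in for SMAP's comparison are all hypothetical, and a real deployment would hand the map and reduce steps to Hadoop rather than run them in one Python process.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each binding site is reduced to a set of residue labels.
SITES = {
    "siteA": {"HIS57", "ASP102", "SER195"},
    "siteB": {"HIS57", "SER195", "GLY193"},
    "siteC": {"LYS33", "GLU51"},
}


def similarity(a, b):
    """Stand-in score (Jaccard index); SMAP's actual scoring is not reproduced here."""
    return len(a & b) / len(a | b)


def map_phase(pair):
    """Map: score one pair of sites and emit a record keyed by each site in the pair."""
    x, y = pair
    score = similarity(SITES[x], SITES[y])
    return [(x, (y, score)), (y, (x, score))]


def reduce_phase(key, values):
    """Reduce: keep the best-scoring partner found for one site."""
    return key, max(values, key=lambda value: value[1])


def run():
    grouped = defaultdict(list)  # shuffle step: group mapped records by key
    for pair in combinations(sorted(SITES), 2):
        for key, value in map_phase(pair):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(run())
```

Because each pair is scored independently in the map step, the pairs can be distributed across workers without coordination, which is the property that makes pairwise comparison a natural fit for this paradigm.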
Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud
Cianfrocco, Michael A; Leschziner, Andres E
2015-01-01
The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available ‘off-the-shelf’ computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16–480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM. DOI: http://dx.doi.org/10.7554/eLife.06664.001 PMID:25955969
Galaxy CloudMan: delivering cloud compute clusters.
Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James
2010-12-21
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing
NASA Astrophysics Data System (ADS)
Bazhirov, Timur; Mohammadi, Mohammad; Ding, Kevin; Barabash, Sergey
Recent advances in cloud computing have made it possible to access large-scale computational resources completely on demand in a rapid and efficient manner. When combined with high-fidelity simulations, they serve as an alternative pathway to enable computational discovery and design of new materials through large-scale high-throughput screening. Here, we present a case study for a cloud platform implemented at Exabyte Inc. We perform calculations to screen lightweight ternary alloys for thermodynamic stability. Due to the lack of experimental data for most such systems, we rely on theoretical approaches based on first-principles pseudopotential density functional theory. We calculate the formation energies for a set of ternary compounds approximated by special quasirandom structures. During an example run we were able to scale to 10,656 CPUs within 7 minutes from the start, and obtain results for 296 compounds within 38 hours. The results indicate that the ultimate formation enthalpy of ternary systems can be negative for some lightweight alloys, including Li and Mg compounds. We conclude that, compared to the traditional capital-intensive approach that requires on-premises hardware resources, cloud computing is agile and cost-effective, yet scalable, and delivers similar performance.
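The figures quoted in this abstract allow a rough back-of-the-envelope reading. The sketch below assumes, purely for illustration, that the peak CPU count was shared evenly among the compounds and held for the full run; the abstract states neither, so the core-hour figure is only an upper bound under that assumption.

```python
def cpus_per_compound(total_cpus: int, compounds: int) -> float:
    """Average CPUs per compound if the peak allocation were shared evenly."""
    return total_cpus / compounds


def max_core_hours(total_cpus: int, wall_hours: float) -> float:
    """Upper bound on core-hours if the peak CPU count were held for the whole run."""
    return total_cpus * wall_hours


print(cpus_per_compound(10_656, 296))  # 36.0
print(max_core_hours(10_656, 38))      # 404928
```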
Signal and image processing algorithm performance in a virtual and elastic computing environment
NASA Astrophysics Data System (ADS)
Bennett, Kelly W.; Robertson, James
2013-05-01
The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large volume of data, and the associated high-performance computing needs, increasingly challenge existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. The paper also discusses using cloud computing with government data, including the security best practices available within cloud services such as AWS.
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier to entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.
Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.
Trudgian, David C; Mirzaei, Hamid
2012-12-07
We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.
Galaxy CloudMan: delivering cloud compute clusters
2010-01-01
Background: Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results: We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions: The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The prediction of druggability for a particular disease by integrating biological and computer science technologies has seen success in recent years. Although computer science technologies can reduce the costs of pharmaceutical research, the computation time of structure-based protein-ligand docking prediction remains unsatisfactory. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate docking prediction. The proposed algorithm works by leveraging two high-performance operators: (1) a novel migration (information exchange) operator designed specifically for cloud-based environments to reduce the computation time; (2) an efficient operator aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.
Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P
2010-01-15
A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to only pay a cloud computing provider for what is used. Moreover, in addition to its financial efficiency, cloud computing is an ecologically friendly technology that enables efficient data sharing, and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
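Locating runs of guanine within a probe sequence, as described above, is a simple pattern search. The sketch below is illustrative only: the minimum run length of four and the example sequences are assumptions, not values taken from the abstract, and the function name is hypothetical.

```python
import re


def find_g_runs(probe: str, min_len: int = 4):
    """Return (start, length) for every run of at least `min_len` consecutive G's."""
    pattern = re.compile(r"G{%d,}" % min_len)
    return [(match.start(), len(match.group())) for match in pattern.finditer(probe.upper())]


print(find_g_runs("ATGGGGCA"))  # one run starting at index 2
print(find_g_runs("ACGTGGG"))   # no run of four or more
```

Reporting the start position along with the length makes it possible to study the effect of where the run sits within the probe, which is the question the abstract raises.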
Cloud4Psi: cloud computing for 3D protein structure similarity searching
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-10-01
Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT), are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed a cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl © The Author 2014. Published by Oxford University Press. PMID:24930141
Cloud Computing Boosts Business Intelligence of Telecommunication Industry
NASA Astrophysics Data System (ADS)
Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling
Business Intelligence has become an attractive topic in today's data-intensive applications, especially in the telecommunication industry. Meanwhile, Cloud Computing, which provides IT infrastructure with excellent scalability, large-scale storage, and high performance, has become an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce-based parallel data mining platform developed by CMRI (China Mobile Research Institute) to meet the urgent requirements of business intelligence in the telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with an experimental evaluation and case studies of its applications. The evaluation results demonstrate both the usability and the cost-effectiveness of a Cloud Computing based Business Intelligence system in telecommunication applications.
2011-08-01
Figure 4: Architectural diagram of running Blender on Amazon EC2 through Nimbis ... classification of streaming data. Example input images (top left). All digit prototypes (cluster centers) found, with size proportional to frequency (top...
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Stocker, John C.; Golomb, Andrew M.
2011-01-01
Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, compared with traditional hosting, the flexibility of cloud computing comes at the cost of less predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two-part model framework characterizes both the demand, using a probability distribution for each type of service request, and the enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
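The general idea of a discrete event simulation with probabilistic demand and a fixed resource pool can be shown in a few lines. This is a minimal sketch, not the authors' model: the exponential arrival and service distributions, the rates, the single request type, and the function name are all assumptions chosen for illustration.

```python
import heapq
import random
from collections import deque


def simulate(num_servers, num_requests=1000, arrival_rate=1.0, service_rate=0.5, seed=0):
    """Return the mean queueing delay of a first-come, first-served pool of servers."""
    rng = random.Random(seed)
    events = []  # min-heap of (time, sequence number, kind)
    clock = 0.0
    for i in range(num_requests):
        clock += rng.expovariate(arrival_rate)  # demand drawn from a probability distribution
        heapq.heappush(events, (clock, i, "arrival"))
    sequence = num_requests
    waiting = deque()  # arrival times of requests not yet started
    busy = 0           # servers currently in use (the resource constraint)
    total_wait = 0.0
    while events:
        now, _, kind = heapq.heappop(events)
        if kind == "arrival":
            waiting.append(now)
        else:
            busy -= 1  # a departure frees one server
        while waiting and busy < num_servers:
            total_wait += now - waiting.popleft()
            busy += 1
            heapq.heappush(events, (now + rng.expovariate(service_rate), sequence, "departure"))
            sequence += 1
    return total_wait / num_requests


for servers in (1, 2, 4):
    print(servers, simulate(servers))
```

Varying `num_servers` while holding the demand parameters fixed is one simple way to explore how a resource constraint affects waiting time, in the spirit of the provisioning analysis the abstract describes.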
2011-01-01
Background: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion: The CloVR VM and associated architecture lower the barrier to entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105
Are Cloud Environments Ready for Scientific Applications?
NASA Astrophysics Data System (ADS)
Mehrotra, P.; Shackleford, K.
2011-12-01
Cloud computing environments are becoming widely available both in the commercial and government sectors. They provide flexibility to rapidly provision resources in order to meet dynamic and changing computational needs without the customers incurring capital expenses and/or requiring technical expertise. Clouds also provide reliable access to resources even though the end-user may not have in-house expertise for acquiring or operating such resources. Consolidation and pooling in a cloud environment allow organizations to achieve economies of scale in provisioning or procuring computing resources and services. Because of these and other benefits, many businesses and organizations are migrating their business applications (e.g., websites, social media, and business processes) to cloud environments, as evidenced by the commercial success of offerings such as Amazon EC2. In this paper, we focus on the feasibility of utilizing cloud environments for scientific workloads and workflows particularly of interest to NASA scientists and engineers. There is a wide spectrum of such technical computations. These applications range from small workstation-level computations to mid-range computing requiring small clusters to high-performance simulations requiring supercomputing systems with high bandwidth/low latency interconnects. Data-centric applications manage and manipulate large data sets such as satellite observational data and/or data previously produced by high-fidelity modeling and simulation computations. Most of the applications are run in batch mode with static resource requirements. However, there do exist situations that have dynamic demands, particularly ones with public-facing interfaces providing information to the general public, collaborators and partners, as well as to internal NASA users. In the last few months we have been studying the suitability of cloud environments for NASA's technical and scientific workloads.
We have ported several applications to multiple cloud environments including NASA's Nebula environment, Amazon's EC2, Magellan at NERSC, and SGI's Cyclone system. We critically examined the performance of the applications on these systems. We also collected information on the usability of these cloud environments. In this talk we will present the results of our study focusing on the efficacy of using clouds for NASA's scientific applications.
ERIC Educational Resources Information Center
Fredette, Michelle
2012-01-01
"Rent or buy?" is a question people ask about everything from housing to textbooks. It is also a question universities must consider when it comes to high-performance computing (HPC). With the advent of Amazon's Elastic Compute Cloud (EC2), Microsoft Windows HPC Server, Rackspace's OpenStack, and other cloud-based services, researchers now have…
CloudMC: a cloud computing application for Monte Carlo simulation.
Miras, H; Jiménez, R; Miras, C; Gomà, C
2013-04-21
This work presents CloudMC, a cloud computing application (developed in Windows Azure®, the platform of the Microsoft® cloud) for the parallelization of Monte Carlo simulations in a dynamic virtual cluster. CloudMC is a web application designed to be independent of the Monte Carlo code on which the simulations are based; the simulations just need to be of the form: input files → executable → output files. To study the performance of CloudMC in Windows Azure®, Monte Carlo simulations with penelope were performed on different instance (virtual machine) sizes, and for different numbers of instances. The instance size was found to have no effect on the simulation runtime. It was also found that the decrease in time with the number of instances followed Amdahl's law, with a slight deviation due to the increase in the fraction of non-parallelizable time with increasing number of instances. A simulation that would have required 30 h of CPU on a single instance was completed in 48.6 min when executed on 64 instances in parallel (speedup of 37 ×). Furthermore, the use of cloud computing for parallel computing offers some advantages over conventional clusters: high accessibility, scalability and pay per usage. Therefore, it is strongly believed that cloud computing will play an important role in making Monte Carlo dose calculation a reality in future clinical practice.
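The reported speedup can be checked against Amdahl's law, S = 1 / ((1 - p) + p / N), where p is the parallelizable fraction and N the number of instances. The sketch below uses only the figures quoted in the abstract (30 h, 48.6 min, 64 instances); treating p as a constant is a simplification, since the abstract notes that the non-parallelizable fraction grows with the number of instances, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, instances: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallelizable fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / instances)


def implied_parallel_fraction(speedup: float, instances: int) -> float:
    """Invert Amdahl's law: the parallelizable fraction that yields the observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / instances)


serial_minutes = 30 * 60   # 30 h of CPU on a single instance
parallel_minutes = 48.6    # runtime on 64 instances
observed = serial_minutes / parallel_minutes
p = implied_parallel_fraction(observed, 64)

print(round(observed, 1))  # 37.0
print(round(p, 4))         # 0.9884
```

Under this constant-fraction reading, roughly 1.2% of the work behaves as serial, which is enough to hold the speedup on 64 instances to about 37 rather than 64.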
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record.
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to review systematically the studies conducted in the field of cloud computing. The present study was a systematic review conducted in 2017. Searches were performed in the Scopus, Web of Science, IEEE, PubMed and Google Scholar databases using keyword combinations. Of the 431 articles initially identified, 27 were selected for review after applying the inclusion and exclusion criteria. Data were gathered with a self-made checklist and analyzed by the content analysis method. The findings of this study showed that cloud computing is a very widespread technology. It includes domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of cloud computing, ability to search and explore, reducing errors and improving quality, structure, flexibility and sharing ability. It will be effective for electronic health records. According to the findings of the present study, higher capabilities of cloud computing are useful in implementing EHR in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of EHR, it is recommended to use this technology.
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
Background: With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to review systematically the studies conducted in the field of cloud computing. Methods: The present study was a systematic review conducted in 2017. Searches were performed in the Scopus, Web of Science, IEEE, PubMed and Google Scholar databases using keyword combinations. Of the 431 articles initially identified, 27 were selected for review after applying the inclusion and exclusion criteria. Data were gathered with a self-made checklist and analyzed by the content analysis method. Results: The findings of this study showed that cloud computing is a very widespread technology. It includes domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of cloud computing, ability to search and explore, reducing errors and improving quality, structure, flexibility and sharing ability. It will be effective for electronic health records. Conclusion: According to the findings of the present study, higher capabilities of cloud computing are useful in implementing EHR in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of EHR, it is recommended to use this technology. PMID:29719309
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model, and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
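For readers unfamiliar with the data structure named above, a de Bruijn graph can be built from sequencing reads in a few lines. This is a single-machine sketch for illustration only; it is not GiGA's Hadoop/Giraph implementation, and the function name, the toy reads, and the choice of k = 3 are assumptions made here.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix -> suffix,
    where prefix and suffix are the (k-1)-mers at either end of the k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two overlapping toy reads; real inputs are terabyte-scale read sets.
graph = de_bruijn_graph(["ACGTAC", "GTACG"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Overlapping reads contribute the same edges, so the graph size depends on the number of distinct k-mers rather than on the raw read volume. That property is what makes the structure attractive for distributed graph processing.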
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and to provide easy reproducibility by making the datasets and computational methods easily available.
Cloud Computing for Complex Performance Codes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Appel, Gordon John; Hadgu, Teklu; Klein, Brandon Thorin
This report describes the use of cloud computing services for running complex public domain performance assessment problems. The work consisted of two phases: Phase 1 was to demonstrate complex codes, on several differently configured servers, could run and compute trivial small scale problems in a commercial cloud infrastructure. Phase 2 focused on proving non-trivial large scale problems could be computed in the commercial cloud environment. The cloud computing effort was successfully applied using codes of interest to the geohydrology and nuclear waste disposal modeling community.
Sector and Sphere: the design and implementation of a high-performance data cloud
Gu, Yunhong; Grossman, Robert L.
2009-01-01
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. PMID:19451100
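The idea of expressing MapReduce as a Map UDF followed by a Reduce UDF can be sketched generically. The code below is a minimal in-memory illustration in Python; it does not use the Sector/Sphere API, and `map_udf`, `reduce_udf`, and `run_map_reduce` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_udf(record):
    """Hypothetical Map UDF: emit (word, 1) for each word in a text record."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_udf(key, values):
    """Hypothetical Reduce UDF: sum all values emitted for one key."""
    return key, sum(values)


def run_map_reduce(records, map_fn, reduce_fn):
    """Apply the Map UDF to every record, group by key, then apply the Reduce UDF."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


counts = run_map_reduce(
    ["sector stores data", "sphere processes data"], map_udf, reduce_udf
)
print(counts)
```

In a distributed setting the grouping step is a shuffle across nodes; here it is a dictionary, which keeps the two-UDF structure visible without any cluster machinery.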
Design and deployment of an elastic network test-bed in IHEP data center based on SDN
NASA Astrophysics Data System (ADS)
Zeng, Shan; Qi, Fazhi; Chen, Gang
2017-10-01
High energy physics experiments produce huge amounts of raw data, but because network resources are shared, the bandwidth available to each experiment is not guaranteed, which may cause link congestion. At the same time, with the development of cloud computing technologies, IHEP has established a cloud platform based on OpenStack that provides flexible computing and storage resources, and more and more computing applications have been deployed on virtual machines created by OpenStack. However, under the traditional network architecture, network capacity cannot be provisioned elastically, which becomes the bottleneck restricting the flexible application of cloud computing. To solve these problems, we propose an elastic cloud data center network architecture based on SDN, and we also design a high performance controller cluster based on OpenDaylight. Finally, we present our current test results.
Study on the application of mobile internet cloud computing platform
NASA Astrophysics Data System (ADS)
Gong, Songchun; Fu, Songyin; Chen, Zheng
2012-04-01
The innovative development of computer technology promotes the application of cloud computing platforms, which in essence substitute and exchange resource service models and, after adjustments in multiple respects, meet users' needs for different resources. Cloud computing has advantages in many respects: it not only reduces the difficulty of using the operating system but also makes it easy for users to search for, acquire and process resources. Accordingly, this paper takes the management of digital libraries as its research focus and analyzes the key technologies of the mobile internet cloud computing platform in operation. The spread of computer technology has driven the creation of digital library models, whose core idea is to strengthen the management of library resource information through computers and to build a high-performance inquiry and search platform that allows users to access the information resources they need at any time. Cloud computing, in turn, distributes computation across a large number of distributed computers and thereby connects the services of multiple computers. Digital libraries, as a typical application of cloud computing, can therefore be used to analyze the key technologies of cloud computing.
The role of dedicated data computing centers in the age of cloud computing
NASA Astrophysics Data System (ADS)
Caramarcu, Costin; Hollowell, Christopher; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr
2017-10-01
Brookhaven National Laboratory (BNL) anticipates significant growth in scientific programs with large computing and data storage needs in the near future and has recently reorganized support for scientific computing to meet these needs. A key component is the enhanced role of the RHIC-ATLAS Computing Facility (RACF) in support of high-throughput and high-performance computing (HTC and HPC) at BNL. This presentation discusses the evolving role of the RACF at BNL, in light of its growing portfolio of responsibilities and its increasing integration with cloud (academic and for-profit) computing activities. We also discuss BNL’s plan to build a new computing center to support the new responsibilities of the RACF and present a summary of the cost benefit analysis done, including the types of computing activities that benefit most from a local data center vs. cloud computing. This analysis is partly based on an updated cost comparison of Amazon EC2 computing services and the RACF, which was originally conducted in 2012.
Scientific Services on the Cloud
NASA Astrophysics Data System (ADS)
Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong
Scientific computing was one of the first ever applications for parallel and distributed computation. To this date, scientific applications remain some of the most compute intensive, and have inspired creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administered by a third party. Software abstraction and virtualization provide reliability and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part in the scientific computing initiative.
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations are starting to adopt cloud computing to better utilize computing resources by taking advantage of its scalability, cost reduction, and ease of access. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting geosciences. This paper provides a comprehensive study of three open-source cloud solutions: OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) there are no significant performance differences in central processing unit (CPU), memory and I/O of virtual machines created and managed by different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) CloudStack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable of supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
Bao, Shunxing; Damon, Stephen M; Landman, Bennett A; Gokhale, Aniruddha
2016-02-27
Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical-Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for-use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline.
NASA Astrophysics Data System (ADS)
Bao, Shunxing; Damon, Stephen M.; Landman, Bennett A.; Gokhale, Aniruddha
2016-03-01
Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical-Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for-use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline.
Bao, Shunxing; Damon, Stephen M.; Landman, Bennett A.; Gokhale, Aniruddha
2016-01-01
Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical-Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for-use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline. PMID:27127335
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele
2014-11-11
It has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization and thereby minimize the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as with an MPI-intensive application (the HPL Linpack benchmark).
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. 
Applicability of Cloud Computing: Commercial cloud providers generally charge for all operations, including processing, transfer of input and output data, and storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory-bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel, distributed across cyberinfrastructure environments having different architectures. We have used the Pegasus Workflow Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment, where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler.
Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar
2014-01-01
In this paper, an architecture for building a Scalable And Mobile Environment For High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high performance and a rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry standard practices that have been adopted in this architecture.
Using a cloud to replenish parched groundwater modeling efforts.
Hunt, Randall J; Luchette, Joseph; Schreuder, Willem A; Rumbaugh, James O; Doherty, John; Tonkin, Matthew J; Rumbaugh, Douglas B
2010-01-01
Groundwater models can be improved by introduction of additional parameter flexibility and simultaneous use of soft-knowledge. However, these sophisticated approaches have high computational requirements. Cloud computing provides unprecedented access to computing power via the Internet to facilitate the use of these techniques. A modeler can create, launch, and terminate "virtual" computers as needed, paying by the hour, and save machine images for future use. Such cost-effective and flexible computing power empowers groundwater modelers to routinely perform model calibration and uncertainty analysis in ways not previously possible.
Using a cloud to replenish parched groundwater modeling efforts
Hunt, Randall J.; Luchette, Joseph; Schreuder, Willem A.; Rumbaugh, James O.; Doherty, John; Tonkin, Matthew J.; Rumbaugh, Douglas B.
2010-01-01
Groundwater models can be improved by introduction of additional parameter flexibility and simultaneous use of soft-knowledge. However, these sophisticated approaches have high computational requirements. Cloud computing provides unprecedented access to computing power via the Internet to facilitate the use of these techniques. A modeler can create, launch, and terminate “virtual” computers as needed, paying by the hour, and save machine images for future use. Such cost-effective and flexible computing power empowers groundwater modelers to routinely perform model calibration and uncertainty analysis in ways not previously possible.
GATE Monte Carlo simulation in a cloud computing environment
NASA Astrophysics Data System (ADS)
Rowedder, Blake Austin
The GEANT4-based GATE is a unique and powerful Monte Carlo (MC) platform, which provides a single code library allowing the simulation of specific medical physics applications, e.g. PET, SPECT, CT, radiotherapy, and hadron therapy. However, this rigorous yet flexible platform is used only sparingly in the clinic due to its lengthy calculation time. By accessing the powerful computational resources of a cloud computing environment, GATE's runtime can be significantly reduced to clinically feasible levels without the sizable investment of a local high performance cluster. This study investigated reliable and efficient execution of GATE MC simulations using a commercial cloud computing service. Amazon's Elastic Compute Cloud was used to launch several nodes equipped with GATE. Job data was initially broken up on the local computer, then uploaded to the worker nodes on the cloud. The results were automatically downloaded and aggregated on the local computer for display and analysis. Five simulations were repeated for every cluster size between 1 and 20 nodes. Ultimately, increasing cluster size resulted in a decrease in calculation time that could be expressed with an inverse power model. Comparing the benchmark results to the published values and error margins indicated that the simulation results were not affected by the cluster size and thus that the integrity of a calculation is preserved in a cloud computing environment. The runtime of a 53 minute long simulation was decreased to 3.11 minutes when run on a 20-node cluster. The ability to improve the speed of simulation suggests that fast MC simulations are viable for imaging and radiotherapy applications. With high power computing continuing to fall in price and become more accessible, implementing Monte Carlo techniques with cloud computing for clinical applications will continue to become more attractive.
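The inverse power model mentioned above has the form t(n) = a · n^(-b). As a rough illustration, the two runtimes quoted in the abstract are enough to solve for a and b. This sketch assumes that the 53 minute figure is the single-node runtime, which the abstract does not state explicitly, and it is a two-point solve rather than the study's fit over all cluster sizes; the function names are illustrative.

```python
import math


def solve_power_law(n1, t1, n2, t2):
    """Solve t(n) = a * n**(-b) exactly through two (nodes, minutes) points."""
    b = math.log(t1 / t2) / math.log(n2 / n1)
    a = t1 * n1 ** b
    return a, b


def predict_runtime(a, b, n):
    """Runtime in minutes predicted by the inverse power model for n nodes."""
    return a * n ** (-b)


# Assumption: 53 min on 1 node; 3.11 min on 20 nodes is quoted in the abstract.
a, b = solve_power_law(1, 53.0, 20, 3.11)
speedup = 53.0 / 3.11
efficiency = speedup / 20

print(f"exponent b = {b:.3f} (b = 1 would be ideal linear scaling)")
print(f"speedup on 20 nodes = {speedup:.1f}x, parallel efficiency = {efficiency:.0%}")
print(f"model runtime on 10 nodes = {predict_runtime(a, b, 10):.2f} min")
```

An exponent slightly below 1 corresponds to the near-linear but not perfect scaling implied by the quoted runtimes.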
NASA Astrophysics Data System (ADS)
Nakatsuji, Noriaki; Matsushima, Kyoji
2017-03-01
Full-parallax high-definition CGHs composed of more than a billion pixels have so far been created only by the polygon-based method because of its high performance. However, GPUs now allow us to generate CGHs much faster using the point cloud method. In this paper, we measure the computation time of object fields for full-parallax high-definition CGHs, which are composed of 4 billion pixels and reconstruct the same scene, using the point cloud method with a GPU and the polygon-based method with a CPU. In addition, we compare the optical and simulated reconstructions of CGHs created by these techniques to verify the image quality.
NASA Astrophysics Data System (ADS)
Khan, Kashif A.; Wang, Qi; Luo, Chunbo; Wang, Xinheng; Grecos, Christos
2014-05-01
Mobile cloud computing is gaining worldwide momentum for ubiquitous on-demand cloud services for mobile users provided by Amazon, Google etc. with low capital cost. However, Internet-centric clouds introduce wide area network (WAN) delays that are often intolerable for real-time applications such as video streaming. One promising approach to addressing this challenge is to deploy decentralized mini-cloud facilities known as cloudlets to enable localized cloud services. When supported by local wireless connectivity, a wireless cloudlet is expected to offer low cost and high performance cloud services for the users. In this work, we implement a realistic framework that comprises both a popular Internet cloud (Amazon Cloud) and a real-world cloudlet (based on Ubuntu Enterprise Cloud (UEC)) for mobile cloud users in a wireless mesh network. We focus on real-time video streaming over the HTTP standard and implement a typical application. We further perform a comprehensive comparative analysis and empirical evaluation of the application's performance when it is delivered over the Internet cloud and the cloudlet respectively. The study quantifies the influence of the two different cloud networking architectures on supporting real-time video streaming. We also enable movement of the users in the wireless mesh network and investigate the effect of user mobility on mobile cloud computing over the cloudlet and the Amazon cloud respectively. Our experimental results demonstrate the advantages of the cloudlet paradigm over its Internet cloud counterpart in supporting the quality of service of real-time applications.
On the Large-Scaling Issues of Cloud-based Applications for Earth Science Dat
NASA Astrophysics Data System (ADS)
Hua, H.
2016-12-01
Next generation science data systems are needed to address the incoming flood of data from new missions such as NASA's SWOT and NISAR, whose SAR data volumes and data throughput rates are an order of magnitude larger than those of present-day missions. Existing missions, such as OCO-2, may also require fast turnaround for processing different science scenarios, where on-premise and even traditional HPC computing environments may not meet the high processing needs. Additionally, traditional means of procuring hardware on-premise are already limited by facility capacity constraints for these new missions. Experience has shown that embracing efficient cloud computing approaches for large-scale science data systems requires more than just moving existing code to cloud environments. At large cloud scales, we need to deal with scaling and cost issues. We present our experiences in deploying multiple instances of our hybrid-cloud computing science data system (HySDS) to support large-scale processing of Earth Science data products. We will explore optimization approaches for getting the best performance out of hybrid-cloud computing, as well as common issues that arise when dealing with large-scale computing. Novel approaches were used to do processing on Amazon's spot market, which can potentially offer 75%-90% cost savings but with an unpredictable computing environment based on market forces.
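As a short worked relation (an illustration added here, not taken from the abstract), a fractional saving s on the unit price corresponds to a cost-reduction factor of 1/(1 - s), assuming the same amount of compute is bought and ignoring any work repeated after spot interruptions:

```latex
\[
  \text{factor} = \frac{C_{\text{on-demand}}}{C_{\text{spot}}} = \frac{1}{1 - s},
  \qquad
  s = 0.75 \;\Rightarrow\; \frac{1}{0.25} = 4\times,
  \qquad
  s = 0.90 \;\Rightarrow\; \frac{1}{0.10} = 10\times .
\]
```

Under these assumptions the quoted 75%-90% range corresponds to roughly a 4x to 10x reduction in compute cost.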
Modeling the Cloud to Enhance Capabilities for Crises and Catastrophe Management
2016-11-16
order for cloud computing infrastructures to be successfully deployed in real world scenarios as tools for crisis and catastrophe management, where...Statement of the Problem Studied As cloud computing becomes the dominant computational infrastructure[1] and cloud technologies make a transition to hosting...1. Formulate rigorous mathematical models representing technological capabilities and resources in cloud computing for performance modeling and
Distributed MRI reconstruction using Gadgetron-based cloud computing.
Xue, Hui; Inati, Souheil; Sørensen, Thomas Sangild; Kellman, Peter; Hansen, Michael S
2015-03-01
To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency. The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (e.g., cardiac and neuroimaging applications). The proposed setup was able to handle acquisition and ℓ1-SPIRiT reconstruction of nine high temporal resolution real-time, cardiac short axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm³ isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min. A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. © 2014 Wiley Periodicals, Inc.
Grids, virtualization, and clouds at Fermilab
Timm, S.; Chadwick, K.; Garzoglio, G.; ...
2014-06-11
Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud & GPCF) and core computing (Virtual Services). Lastly, this work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.
Grids, virtualization, and clouds at Fermilab
NASA Astrophysics Data System (ADS)
Timm, S.; Chadwick, K.; Garzoglio, G.; Noh, S.
2014-06-01
Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud & GPCF) and core computing (Virtual Services). This work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.
An energy-efficient failure detector for vehicular cloud computing.
Liu, Jiaxi; Wu, Zhibo; Dong, Jian; Wu, Jin; Wen, Dongxin
2018-01-01
Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, many roadside units (RSUs) are deployed along the road to improve connectivity. Many of them are equipped with solar batteries because wired electrical power is unavailable or too expensive. It is therefore important to reduce the battery consumption of RSUs. However, existing failure detection algorithms are not designed to reduce the battery consumption of RSUs. To solve this problem, a new energy-efficient failure detector, 2E-FD, has been proposed specifically for vehicular cloud computing. 2E-FD not only provides an acceptable failure detection service but also reduces the battery consumption of RSUs. Comparative experiments show that our failure detector performs better in terms of speed, accuracy and battery consumption.
An energy-efficient failure detector for vehicular cloud computing
Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Wen, Dongxin
2018-01-01
Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, many roadside units (RSUs) are deployed along the road to improve connectivity. Many of them are equipped with solar batteries because wired electrical power is unavailable or too expensive. It is therefore important to reduce the battery consumption of RSUs. However, existing failure detection algorithms are not designed to reduce the battery consumption of RSUs. To solve this problem, a new energy-efficient failure detector, 2E-FD, has been proposed specifically for vehicular cloud computing. 2E-FD not only provides an acceptable failure detection service but also reduces the battery consumption of RSUs. Comparative experiments show that our failure detector performs better in terms of speed, accuracy and battery consumption. PMID:29352282
Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing
NASA Astrophysics Data System (ADS)
Chen, A.; Pham, L.; Kempler, S.; Theobald, M.; Esfandiari, A.; Campino, J.; Vollmer, B.; Lynnes, C.
2011-12-01
Cloud Computing technology has been used to offer high-performance and low-cost computing and storage resources for both scientific problems and business services. Several cloud computing services have been implemented in the commercial arena, e.g. Amazon's EC2 & S3, Microsoft's Azure, and Google App Engine. There are also some research and application programs being launched in academia and governments to utilize Cloud Computing. NASA launched the Nebula Cloud Computing platform in 2008, which is an Infrastructure as a Service (IaaS) to deliver on-demand distributed virtual computers. Nebula users can receive required computing resources as a fully outsourced service. NASA Goddard Earth Science Data and Information Service Center (GES DISC) migrated several GES DISC's applications to the Nebula as a proof of concept, including: a) The Simple, Scalable, Script-based Science Processor for Measurements (S4PM) for processing scientific data; b) the Atmospheric Infrared Sounder (AIRS) data process workflow for processing AIRS raw data; and c) the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) for online access to, analysis, and visualization of Earth science data. This work aims to evaluate the practicability and adaptability of the Nebula. The initial work focused on the AIRS data process workflow to evaluate the Nebula. The AIRS data process workflow consists of a series of algorithms being used to process raw AIRS level 0 data and output AIRS level 2 geophysical retrievals. Migrating the entire workflow to the Nebula platform is challenging, but practicable. After installing several supporting libraries and the processing code itself, the workflow is able to process AIRS data in a similar fashion to its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS level 0 data through level 2 using a Nebula virtual computer and a local Linux computer. 
The result shows that Nebula has significantly better performance than the local machine. Much of the difference was due to the Nebula's newer equipment compared with the legacy computer, which suggests a potential economic advantage beyond elastic power, i.e., access to up-to-date hardware vs. legacy hardware that must be maintained past its prime to amortize the cost. In addition to a trade study of the advantages and challenges of porting complex processing to the cloud, a tutorial was developed to enable further progress in utilizing the Nebula for Earth Science applications and to better understand the potential of Cloud Computing for further data- and computing-intensive Earth Science research. In particular, highly bursty computing such as that experienced in the user-demand-driven Giovanni system may become more tractable in a Cloud environment. Our future work will continue to focus on migrating more GES DISC applications/instances, e.g. Giovanni instances, to the Nebula platform and on bringing mature migrated applications into operation on the Nebula.
Investigating the Use of Cloudbursts for High-Throughput Medical Image Registration
Kim, Hyunjoo; Parashar, Manish; Foran, David J.; Yang, Lin
2010-01-01
This paper investigates the use of clouds and autonomic cloudbursting to support medical image registration. The goal is to enable a virtual computational cloud that integrates local computational environments and public cloud services on-the-fly, and to support image registration requests from different distributed research groups with varied computational requirements and QoS constraints. The virtual cloud essentially implements shared and coordinated task-spaces, which coordinate the scheduling of jobs submitted by a dynamic set of research groups to their local job queues. A policy-driven scheduling agent uses the QoS constraints along with performance history and the state of the resources to determine the appropriate size and mix of the public and private cloud resources that should be allocated to a specific request. The virtual computational cloud and the medical image registration service have been developed using the CometCloud engine and have been deployed on a combination of private clouds at Rutgers University and the Cancer Institute of New Jersey and Amazon EC2. An experimental evaluation is presented and demonstrates the effectiveness of autonomic cloudbursts and policy-based autonomic scheduling for this application. PMID:20640235
Experience in using commercial clouds in CMS
NASA Astrophysics Data System (ADS)
Bauerdick, L.; Bockelman, B.; Dykstra, D.; Fuess, S.; Garzoglio, G.; Girone, M.; Gutsche, O.; Holzman, B.; Hufnagel, D.; Kim, H.; Kennedy, R.; Mason, D.; Spentzouris, P.; Timm, S.; Tiradani, A.; Vaandering, E.; CMS Collaboration
2017-10-01
Historically, high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resources provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and compare cost and operational efficiency with our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.
Experience in using commercial clouds in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bauerdick, L.; Bockelman, B.; Dykstra, D.
Historically, high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resources provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and compare cost and operational efficiency with our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.
The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences
USDA-ARS's Scientific Manuscript database
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...
A service based adaptive U-learning system using UX.
Jeong, Hwa-Young; Yi, Gangman
2014-01-01
In recent years, traditional development techniques for e-learning systems have been changing to become more convenient and efficient. One new approach to developing application systems combines cloud and ubiquitous computing. Cloud computing can support learning system processes by using services, while ubiquitous computing can provide system operation and management via a high performance technical process and network. In the cloud computing environment, a learning service application can provide a business module or process to the user via the internet. This research focuses on providing the learning material and processes of courses by learning units using the services in a ubiquitous computing environment. We also investigate functions that tailor materials to users according to their learning style. That is, we analyzed the users' data and characteristics in accordance with their user experience. We subsequently adapted the learning process to fit their learning performance and preferences. Finally, we demonstrate that the proposed system yields better learning outcomes for learners than existing techniques.
A Service Based Adaptive U-Learning System Using UX
Jeong, Hwa-Young
2014-01-01
In recent years, traditional development techniques for e-learning systems have been changing to become more convenient and efficient. One new approach to developing application systems combines cloud and ubiquitous computing. Cloud computing can support learning system processes by using services, while ubiquitous computing can provide system operation and management via a high performance technical process and network. In the cloud computing environment, a learning service application can provide a business module or process to the user via the internet. This research focuses on providing the learning material and processes of courses by learning units using the services in a ubiquitous computing environment. We also investigate functions that tailor materials to users according to their learning style. That is, we analyzed the users' data and characteristics in accordance with their user experience. We subsequently adapted the learning process to fit their learning performance and preferences. Finally, we demonstrate that the proposed system yields better learning outcomes for learners than existing techniques. PMID:25147832
An Architecture for Cross-Cloud System Management
NASA Astrophysics Data System (ADS)
Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad
The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in a homogeneous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
High-Resiliency and Auto-Scaling of Large-Scale Cloud Computing for OCO-2 L2 Full Physics Processing
NASA Astrophysics Data System (ADS)
Hua, H.; Manipon, G.; Starch, M.; Dang, L. B.; Southam, P.; Wilson, B. D.; Avis, C.; Chang, A.; Cheng, C.; Smyth, M.; McDuffie, J. L.; Ramirez, P.
2015-12-01
Next generation science data systems are needed to address the incoming flood of data from new missions such as SWOT and NISAR, whose data volumes and data throughput rates are an order of magnitude larger than those of present-day missions. Additionally, traditional means of procuring hardware on-premise are already limited by facility capacity constraints for these new missions. Existing missions, such as OCO-2, may also require fast turnaround for processing different science scenarios, where on-premise and even traditional HPC computing environments may not meet the high processing needs. We present our experiences in deploying a hybrid-cloud computing science data system (HySDS) for the OCO-2 Science Computing Facility to support large-scale processing of their Level-2 full physics data products. We will explore optimization approaches for getting the best performance out of hybrid-cloud computing, as well as common issues that arise when dealing with large-scale computing. Novel approaches were used to do processing on Amazon's spot market, which can potentially offer ~10X cost savings but with an unpredictable computing environment based on market forces. We will present how we enabled high-tolerance computing in order to achieve large-scale computing as well as operational cost savings.
Research on Influence of Cloud Environment on Traditional Network Security
NASA Astrophysics Data System (ADS)
Ming, Xiaobo; Guo, Jinhua
2018-02-01
Cloud computing is a symbol of the progress of modern information networks. It provides a great deal of convenience to Internet users, but it also brings them considerable risk. Moreover, one of the main reasons Internet users choose cloud computing is its strong network security performance, which is also the cornerstone of cloud computing applications. This paper briefly explores the impact of the cloud environment on traditional network security and puts forward corresponding solutions.
Performance, Agility and Cost of Cloud Computing Services for NASA GES DISC Giovanni Application
NASA Astrophysics Data System (ADS)
Pham, L.; Chen, A.; Wharton, S.; Winter, E. L.; Lynnes, C.
2013-12-01
The NASA Goddard Earth Science Data and Information Services Center (GES DISC) is investigating the performance, agility and cost of Cloud computing for GES DISC applications. Giovanni (Geospatial Interactive Online Visualization ANd aNalysis Infrastructure), one of the core applications at the GES DISC for online climate-related Earth science data access, subsetting, analysis, visualization, and downloading, was used to evaluate the feasibility and effort of porting an application to the Amazon Cloud Services platform. The performance and the cost of running Giovanni on the Amazon Cloud were compared to similar parameters for the GES DISC local operational system. A Giovanni Time-Series analysis of aerosol absorption optical depth (388nm) from OMI (Ozone Monitoring Instrument)/Aura was selected for these comparisons. All required data were pre-cached in both the Cloud and local system to avoid data transfer delays. The 3-, 6-, 12-, and 24-month data were used for analysis on the Cloud and local system respectively, and the processing times for the analysis were used to evaluate system performance. To investigate application agility, Giovanni was installed and tested on multiple Cloud platforms. The cost of using a Cloud computing platform mainly consists of: computing, storage, data requests, and data transfer in/out. The Cloud computing cost is calculated based on the hourly rate, and the storage cost is calculated based on the rate of Gigabytes per month. Cost for incoming data transfer is free, and for data transfer out, the cost is based on the rate in Gigabytes. The costs for a local server system consist of buying hardware/software, system maintenance/updating, and operating cost. The results showed that the Cloud platform had a 38% better performance and cost 36% less than the local system. This investigation shows the potential of cloud computing to increase system performance and lower the overall cost of system management.
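The cloud cost components listed in the abstract above (computing billed per hour, storage billed per gigabyte per month, data requests, free incoming transfer, and outgoing transfer billed per gigabyte) can be sketched as a simple sum. This is an illustrative sketch only: the class and function names and every rate below are hypothetical placeholders, not figures from the study or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    """Hypothetical unit prices; substitute a provider's published rates."""
    compute_per_hour: float       # currency units per instance-hour
    storage_per_gb_month: float   # currency units per gigabyte per month
    transfer_out_per_gb: float    # currency units per gigabyte sent out
    per_request: float = 0.0      # currency units per data request


def cloud_cost(rates, instance_hours, stored_gb, months, gb_out, requests=0):
    """Sum the cost components named in the abstract.

    Incoming data transfer is treated as free, as the abstract states.
    """
    compute = rates.compute_per_hour * instance_hours
    storage = rates.storage_per_gb_month * stored_gb * months
    transfer = rates.transfer_out_per_gb * gb_out
    request_cost = rates.per_request * requests
    return compute + storage + transfer + request_cost


if __name__ == "__main__":
    # All numbers here are made up for illustration.
    rates = CloudRates(compute_per_hour=0.50,
                       storage_per_gb_month=0.10,
                       transfer_out_per_gb=0.12)
    total = cloud_cost(rates, instance_hours=100, stored_gb=200,
                       months=1, gb_out=50)
    print(f"Estimated cloud cost: {total:.2f}")
```

A comparison with a local system would additionally need the hardware, maintenance and operating costs mentioned in the abstract, amortized over the same period; those are left out of this sketch.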
Service Migration from Cloud to Multi-tier Fog Nodes for Multimedia Dissemination with QoE Support
Camargo, João; Rochol, Juergen; Gerla, Mario
2018-01-01
A wide range of multimedia services is expected to be offered to mobile users via various wireless access networks. Even the integration of Cloud Computing in such networks does not support an adequate Quality of Experience (QoE) in areas with high demand for multimedia content. Fog computing has been conceptualized to facilitate the deployment of new services that cloud computing cannot provide, particularly those demanding QoE guarantees. These services are provided using fog nodes located at the network edge, which are capable of virtualizing their functions/applications. Service migration from the cloud to fog nodes can be triggered by request patterns and timing issues. To the best of our knowledge, existing works on fog computing focus on architecture and fog node deployment issues. In this article, we describe the operational impacts and benefits associated with service migration from the cloud to multi-tier fog computing for video distribution with QoE support. In addition, we evaluate such service migration for video services. Finally, we present potential research challenges and trends. PMID:29364172
Service Migration from Cloud to Multi-tier Fog Nodes for Multimedia Dissemination with QoE Support.
Rosário, Denis; Schimuneck, Matias; Camargo, João; Nobre, Jéferson; Both, Cristiano; Rochol, Juergen; Gerla, Mario
2018-01-24
A wide range of multimedia services is expected to be offered to mobile users via various wireless access networks. Even the integration of Cloud Computing in such networks does not support an adequate Quality of Experience (QoE) in areas with high demand for multimedia content. Fog computing has been conceptualized to facilitate the deployment of new services that cloud computing cannot provide, particularly those demanding QoE guarantees. These services are provided using fog nodes located at the network edge, which are capable of virtualizing their functions/applications. Service migration from the cloud to fog nodes can be triggered by request patterns and timing issues. To the best of our knowledge, existing works on fog computing focus on architecture and fog node deployment issues. In this article, we describe the operational impacts and benefits associated with service migration from the cloud to multi-tier fog computing for video distribution with QoE support. In addition, we evaluate such service migration for video services. Finally, we present potential research challenges and trends.
Exploiting Parallel R in the Cloud with SPRINT
Piotrowski, M.; McGilvary, G.A.; Sloan, T. M.; Mewissen, M.; Lloyd, A.D.; Forster, T.; Mitchell, L.; Ghazal, P.; Hill, J.
2012-01-01
Background: Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Objectives: Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. Methods: The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results: It is possible to obtain good, scalable performance but the level of improvement depends on the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user’s location affects costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and offer new possibilities for smaller organisations with limited funds. PMID:23223611
Exploiting parallel R in the cloud with SPRINT.
Piotrowski, M; McGilvary, G A; Sloan, T M; Mewissen, M; Lloyd, A D; Forster, T; Mitchell, L; Ghazal, P; Hill, J
2013-01-01
Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. It is possible to obtain good, scalable performance but the level of improvement depends on the nature of the algorithm. Resource underutilization can further improve the time to result. The end-user's location affects costs due to factors such as local taxation. Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and offer new possibilities for smaller organisations with limited funds.
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations
NASA Technical Reports Server (NTRS)
Lynnes, Chris; Little, Mike; Huang, Thomas; Jacob, Joseph; Yang, Phil; Kuo, Kwo-Sen
2016-01-01
Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based file systems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations
NASA Astrophysics Data System (ADS)
Lynnes, C.; Little, M. M.; Huang, T.; Jacob, J. C.; Yang, C. P.; Kuo, K. S.
2016-12-01
Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based filesystems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
NASA Astrophysics Data System (ADS)
Chen, Xiuhong; Huang, Xianglei; Jiao, Chaoyi; Flanner, Mark G.; Raeker, Todd; Palen, Brock
2017-01-01
The suites of numerical models used for simulating the climate of our planet are usually run on dedicated high-performance computing (HPC) resources. This study investigates an alternative to the usual approach, i.e. carrying out climate model simulations in a commercially available cloud computing environment. We test the performance and reliability of running the CESM (Community Earth System Model), a flagship climate model in the United States developed by the National Center for Atmospheric Research (NCAR), on Amazon Web Services (AWS) EC2, the cloud computing environment by Amazon.com, Inc. StarCluster is used to create a virtual computing cluster on AWS EC2 for the CESM simulations. The wall-clock time for one year of CESM simulation on the AWS EC2 virtual cluster is comparable to the time spent for the same simulation on a local dedicated high-performance computing cluster with InfiniBand connections. The CESM simulation scales efficiently with the number of CPU cores on the AWS EC2 virtual cluster up to 64 cores. For the standard configuration of the CESM at a spatial resolution of 1.9° latitude by 2.5° longitude, increasing the number of cores from 16 to 64 reduces the wall-clock running time by more than 50% and the scaling is nearly linear. Beyond 64 cores, the communication latency starts to outweigh the benefit of distributed computing and the parallel speedup becomes nearly unchanged.
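The scaling statements in this abstract can be made concrete with the usual speedup and parallel-efficiency ratios. The following is a minimal sketch; the function names and the timings in the usage note are hypothetical and are not taken from the study.

```python
def speedup(t_ref: float, t_new: float) -> float:
    """Speedup of a run taking t_new relative to a reference run taking t_ref."""
    if t_ref <= 0 or t_new <= 0:
        raise ValueError("wall-clock times must be positive")
    return t_ref / t_new


def relative_efficiency(cores_ref: int, t_ref: float,
                        cores_new: int, t_new: float) -> float:
    """Fraction of ideal (linear) scaling achieved when going from
    cores_ref to cores_new cores; 1.0 means perfectly linear scaling."""
    if cores_ref <= 0 or cores_new <= 0:
        raise ValueError("core counts must be positive")
    ideal = cores_new / cores_ref
    return speedup(t_ref, t_new) / ideal


def time_reduction(t_ref: float, t_new: float) -> float:
    """Fractional reduction in wall-clock time (0.5 means 50% less time)."""
    return 1.0 - t_new / t_ref
```

With made-up timings of 100 hours on 16 cores and 30 hours on 64 cores, `time_reduction(100.0, 30.0)` gives a 70% reduction and `relative_efficiency(16, 100.0, 64, 30.0)` gives roughly 0.83 of ideal linear scaling.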
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.
2015-01-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363
Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L
2015-02-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
The EPOS Vision for the Open Science Cloud
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Harrison, Matt; Cocco, Massimo
2016-04-01
Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time, so cloud computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of cloud computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications so that cloud middleware can optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are in discussion with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use.
Discussions are already underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS. Provision of access to ICS-ds from ICS-C concerns several aspects: (a) technical: it may be more or less difficult to connect and pass from ICS-C to the ICS-d/CES the 'package' (probably a virtual machine) of data and software; (b) security/privacy: including passing personal information, e.g. related to AAAI (Authentication, Authorization, Accounting Infrastructure); (c) financial and legal: such as payment and licence conditions. Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of a federated approach to HPC (High Performance Computing) and HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.
MaMR: High-performance MapReduce programming model for material cloud applications
NASA Astrophysics Data System (ADS)
Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng
2017-02-01
With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related datasets, and its processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications, called MaMR, that supports multiple different Map and Reduce functions running concurrently based on hybrid shared-memory BSP. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.
Cloud computing approaches to accelerate drug discovery value chain.
Garg, Vibhav; Arora, Suchir; Gupta, Chitra
2011-12-01
Continued advancements in technology have helped high throughput screening (HTS) evolve from a linear to a parallel approach by performing system-level screening. Advanced experimental methods used for HTS at various steps of drug discovery (i.e. target identification, target validation, lead identification and lead validation) can generate data of the order of terabytes. As a consequence, there is a pressing need to store, manage, mine and analyze these data to identify informational tags. This need in turn challenges computer scientists to offer matching hardware and software infrastructure while managing the varying degree of desired computational power. Therefore, the potential of "On-Demand Hardware" and "Software as a Service (SAAS)" delivery mechanisms cannot be denied. This on-demand computing, largely referred to as Cloud Computing, is now transforming drug discovery research. Also, integration of cloud computing with parallel computing is certainly expanding its footprint in the life sciences community. Speed, efficiency and cost effectiveness have made cloud computing a 'good to have tool' for researchers, providing them significant flexibility and allowing them to focus on the 'what' of science and not the 'how'. Once it reaches maturity, Discovery-Cloud would be best suited to manage drug discovery and clinical development data generated using advanced HTS techniques, hence supporting the vision of personalized medicine.
Homomorphic encryption experiments on IBM's cloud quantum computing platform
NASA Astrophysics Data System (ADS)
Huang, He-Liang; Zhao, You-Wei; Li, Tan; Li, Feng-Guang; Du, Yu-Tao; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su
2017-02-01
Quantum computing has undergone rapid development in recent years. Owing to limitations on scalability, personal quantum computers still seem slightly unrealistic in the near future. The first practical quantum computer for ordinary users is likely to be on the cloud. However, the adoption of cloud computing is possible only if security is ensured. Homomorphic encryption is a cryptographic protocol that allows computation to be performed on encrypted data without decrypting them, so it is well suited to cloud computing. Here, we first applied homomorphic encryption on IBM's cloud quantum computer platform. In our experiments, we successfully implemented a quantum algorithm for linear equations while protecting our privacy. This demonstration opens a feasible path to the next stage of development of cloud quantum information technology.
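The idea of computing on encrypted data can be illustrated with a purely classical toy: unpadded ("textbook") RSA is multiplicatively homomorphic, so multiplying two ciphertexts yields a ciphertext of the product. This is an assumption-laden sketch for intuition only. It is not the quantum homomorphic scheme used in the experiments above, the tiny key is insecure, and all names are hypothetical.

```python
# Toy textbook-RSA key (insecure, illustration only).
P, Q = 61, 53
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent (needs Python 3.8+)


def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < N with the public key."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Decrypt with the private key."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Server-side step: multiply ciphertexts without ever decrypting them."""
    return (c1 * c2) % N


# The client encrypts, the untrusted server multiplies, the client decrypts.
product = decrypt(multiply_encrypted(encrypt(7), encrypt(6)))
```

The decrypted result equals the product of the plaintexts as long as that product is smaller than the modulus; the server only ever sees ciphertexts.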
Evaluating the Influence of the Client Behavior in Cloud Computing.
Souza Pardo, Mário Henrique; Centurion, Adriana Molina; Franco Eustáquio, Paulo Sérgio; Carlucci Santana, Regina Helena; Bruschi, Sarita Mazzini; Santana, Marcos José
2016-01-01
This paper proposes a novel approach for the implementation of simulation scenarios, providing a client entity for cloud computing systems. The client entity allows the creation of scenarios in which the client behavior has an influence on the simulation, making the results more realistic. The proposed client entity is based on several characteristics that affect the performance of a cloud computing system, including different modes of submission and their behavior when the waiting time between requests (think time) is considered. The proposed characterization of the client enables the sending of either individual requests or group of Web services to scenarios where the workload takes the form of bursts. The client entity is included in the CloudSim, a framework for modelling and simulation of cloud computing. Experimental results show the influence of the client behavior on the performance of the services executed in a cloud computing system.
Evaluating the Influence of the Client Behavior in Cloud Computing
Centurion, Adriana Molina; Franco Eustáquio, Paulo Sérgio; Carlucci Santana, Regina Helena; Bruschi, Sarita Mazzini; Santana, Marcos José
2016-01-01
This paper proposes a novel approach for the implementation of simulation scenarios, providing a client entity for cloud computing systems. The client entity allows the creation of scenarios in which the client behavior has an influence on the simulation, making the results more realistic. The proposed client entity is based on several characteristics that affect the performance of a cloud computing system, including different modes of submission and their behavior when the waiting time between requests (think time) is considered. The proposed characterization of the client enables the sending of either individual requests or group of Web services to scenarios where the workload takes the form of bursts. The client entity is included in the CloudSim, a framework for modelling and simulation of cloud computing. Experimental results show the influence of the client behavior on the performance of the services executed in a cloud computing system. PMID:27441559
Cloud computing task scheduling strategy based on improved differential evolution algorithm
NASA Astrophysics Data System (ADS)
Ge, Junwei; He, Qian; Fang, Yiqiu
2017-04-01
In order to optimize the cloud computing task scheduling scheme, an improved differential evolution algorithm for cloud computing task scheduling is proposed. First, a cloud computing task scheduling model is established and a fitness function is derived from the model. The improved differential evolution algorithm is then used to optimize the fitness function, with a selection strategy that adapts dynamically to the generation count and a dynamic mutation strategy to ensure both global and local search ability. Performance tests were carried out on the CloudSim simulation platform. The experimental results show that the improved differential evolution algorithm can reduce cloud computing task execution time and user cost, achieving good scheduling of cloud computing tasks.
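For orientation, the baseline that such work builds on can be sketched as plain differential evolution (DE/rand/1/bin) minimizing makespan. This is not the improved algorithm of the paper: the dynamic selection and mutation strategies are omitted, and the fitness function, encoding and parameter values below are assumptions chosen for illustration.

```python
import random


def makespan(assignment, task_lengths, vm_speeds):
    """Finish time of the busiest VM for a task -> VM assignment."""
    loads = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        loads[vm] += task_lengths[task] / vm_speeds[vm]
    return max(loads)


def decode(vector, n_vms):
    """Map a continuous DE vector to integer VM indices."""
    return [min(int(x), n_vms - 1) for x in vector]


def de_schedule(task_lengths, vm_speeds, pop_size=20, generations=100,
                f=0.5, cr=0.9, seed=0):
    """Standard DE/rand/1/bin; returns (assignment, makespan)."""
    rng = random.Random(seed)
    n_tasks, n_vms = len(task_lengths), len(vm_speeds)

    def cost(vector):
        return makespan(decode(vector, n_vms), task_lengths, vm_speeds)

    pop = [[rng.uniform(0, n_vms) for _ in range(n_tasks)]
           for _ in range(pop_size)]
    costs = [cost(v) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_tasks)  # guarantees one mutated gene
            trial = []
            for j in range(n_tasks):
                if rng.random() < cr or j == j_rand:
                    x = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    x = min(max(x, 0.0), n_vms - 1e-9)  # keep in range
                else:
                    x = pop[i][j]
                trial.append(x)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return decode(pop[best], n_vms), costs[best]
```

A call such as `de_schedule([4.0, 2.0, 7.0, 1.0, 3.0, 5.0], [1.0, 1.0, 1.0])` returns one VM index per task together with the resulting makespan. A real study would evaluate candidate schedules in a simulator such as CloudSim and would usually include cost terms in the fitness function.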
Secure data sharing in public cloud
NASA Astrophysics Data System (ADS)
Venkataramana, Kanaparti; Naveen Kumar, R.; Tatekalva, Sandhya; Padmavathamma, M.
2012-04-01
Secure multi-party protocols have been proposed for entities (organizations or individuals) that don't fully trust each other to share sensitive information. Many types of entities need to collect, analyze, and disseminate data rapidly and accurately, without exposing sensitive information to unauthorized or untrusted parties. Solutions based on secure multiparty computation (SMC) guarantee privacy and correctness, but at extra computation and communication cost, and the communication cost is often too high to be practical. This high overhead motivates us to extend SMC to the cloud environment, which provides large computation and communication capacity and allows SMC to be used between multiple clouds (private, public or hybrid). A cloud may encompass many high-capacity servers that act as hosts and participate in the computation of the final result (IaaS and PaaS); this is controlled by a Cloud Trusted Authority (CTA) responsible for secret sharing within the cloud. Communication between two clouds is controlled by a High Level Trusted Authority (HLTA), one of the hosts in a cloud, which provides MgaaS (Management as a Service). Because of the high security risk in clouds, the HLTA generates and distributes public and private keys using the Carmichael-R-Prime-RSA algorithm for the exchange of private data in SMC between itself and the clouds. Within a cloud, the CTA creates a group key for secure communication between the hosts, based on the keys sent by the HLTA, for the exchange of intermediate values and shares used to compute the final result. Since this scheme is extended to clouds, whose high availability and scalability increase the available computation power, it becomes possible to implement SMC in practice for privacy-preserving data mining at low cost for the clients.
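The notion of hosts exchanging shares and intermediate values to compute a final result can be illustrated with additive secret sharing for a secure sum. This is a generic textbook sketch, not the CTA/HLTA protocol of the paper; the modulus, function names and party layout are assumptions, and key distribution and transport encryption are left out entirely.

```python
import secrets

PRIME = 2**61 - 1  # public modulus; inputs and their sum must stay below it


def share(secret: int, n_parties: int) -> list:
    """Split a secret into n additive shares that sum to it modulo PRIME."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: list) -> int:
    """Recombine additive shares."""
    return sum(shares) % PRIME


def secure_sum(private_values: list) -> int:
    """Each party shares its value; party j adds up the j-th shares it
    receives (an intermediate value); the partial sums reveal only the total."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    return reconstruct(partial_sums)
```

For example, `secure_sum([10, 20, 12])` recovers the total of the three private inputs, while any single share or partial sum on its own is a uniformly random value modulo `PRIME`.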
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL.
Stone, John E; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-05-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications.
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL
Stone, John E.; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-01-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications. PMID:27747137
Speeding Up Geophysical Research Using Docker Containers Within Multi-Cloud Environment.
NASA Astrophysics Data System (ADS)
Synytsky, R.; Henadiy, S.; Lobzakov, V.; Kolesnikov, L.; Starovoit, Y. O.
2016-12-01
How useful are geophysical observations today for minimizing losses from natural disasters? Do they help to decrease the number of human victims of tsunamis and earthquakes? Unfortunately, this work is still at an early stage. Making such observations more useful by improving early warning and prediction systems with the help of cloud computing is an important goal. Cloud computing technologies have been speeding up application development in many areas for 10 years already. The cloud unlocks new opportunities for geoscientists by providing access to modern data processing tools and algorithms, including real-time high-performance computing, big data processing, artificial intelligence and others. Emerging lightweight cloud technologies, such as Docker containers, are gaining wide traction in IT because they enable faster and more efficient deployment of applications in a cloud environment. They make it possible to deploy and manage geophysical applications and systems in minutes across multiple clouds and data centers, which is of utmost importance for next-generation applications. In this session we'll demonstrate how Docker container technology in a multi-cloud environment can accelerate the development of applications specifically designed for geophysical research.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-12-01
Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldkamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10^-7.
Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
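The Map/Reduce split described in this abstract can be sketched in a few lines: each Map task backprojects a subset of projections into a partial volume, and the Reduce step sums the partial volumes. The backprojection below is a deliberately trivial stand-in and not the FDK algorithm, the tasks run sequentially rather than on Hadoop, and every name is hypothetical.

```python
from functools import reduce


def toy_backproject(projection, volume_size):
    """Stand-in for filtering + backprojection: smear one projection
    periodically over a flat 'volume' of volume_size voxels."""
    return [projection[i % len(projection)] for i in range(volume_size)]


def add_volumes(vol_a, vol_b):
    """Voxel-wise sum of two volumes."""
    return [a + b for a, b in zip(vol_a, vol_b)]


def map_task(projections, volume_size):
    """Map: turn a subset of projections into one partial volume."""
    partial = [0.0] * volume_size
    for projection in projections:
        partial = add_volumes(partial, toy_backproject(projection, volume_size))
    return partial


def reduce_task(partial_volumes):
    """Reduce: aggregate the partial volumes into the whole volume."""
    return reduce(add_volumes, partial_volumes)


def split(items, n_chunks):
    """Deal the projections out to n_chunks map tasks."""
    return [items[i::n_chunks] for i in range(n_chunks)]


projections = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
serial = map_task(projections, 4)
distributed = reduce_task([map_task(chunk, 4) for chunk in split(projections, 2)])
```

Because backprojection is a sum over projections, the order and grouping of the partial sums do not change the result (up to floating-point rounding), which is why a lost Map task can simply be re-run on another node.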
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-01-01
Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldkamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10^-7.
Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Performance Evaluation of Cloud Service Considering Fault Recovery
NASA Astrophysics Data System (ADS)
Yang, Bo; Tan, Feng; Dai, Yuan-Shun; Guo, Suchang
In cloud computing, cloud service performance is an important issue. To improve cloud service reliability, fault recovery may be used. However, the use of fault recovery can affect the performance of the cloud service. In this paper, we conduct a preliminary study on this issue. Cloud service performance is quantified by service response time, whose probability density function and mean are derived.
Grids and clouds in the Czech NGI
NASA Astrophysics Data System (ADS)
Kundrát, Jan; Adam, Martin; Adamová, Dagmar; Chudoba, Jiří; Kouba, Tomáš; Lokajíček, Miloš; Mikula, Alexandr; Říkal, Václav; Švec, Jan; Vohnout, Rudolf
2016-09-01
There are several infrastructure operators within the Czech Republic NGI (National Grid Initiative) which provide users with access to high-performance computing facilities over a grid and cloud interface. This article focuses on those where the primary author has personal first-hand experience. We cover some operational issues as well as the history of these facilities.
Energy Consumption Management of Virtual Cloud Computing Platform
NASA Astrophysics Data System (ADS)
Li, Lin
2017-11-01
Research on energy consumption management for virtual cloud computing platforms requires a deeper understanding of the energy consumption of both virtual computers and the cloud computing platform itself. Only in this way can the problems facing energy consumption management be solved. The key to solving them lies in data centers with high energy consumption, which creates a strong need for new scientific techniques. Virtualization technology and cloud computing have become powerful tools in people’s daily life, work and production because of their many advantages. Both are developing rapidly and achieve a very high resource utilization rate, which makes them necessary in the constantly developing information age. This paper summarizes, explains and further analyzes energy consumption management issues for virtual cloud computing platforms. It aims to give readers a clearer understanding of energy consumption management of such platforms and to be of help in various aspects of people’s life, work and so on.
Symmetrical compression distance for arrhythmia discrimination in cloud-based big-data services.
Lillo-Castellano, J M; Mora-Jiménez, I; Santiago-Mozos, R; Chavarría-Asso, F; Cano-González, A; García-Alberola, A; Rojo-Álvarez, J L
2015-07-01
The current development of cloud computing is completely changing the paradigm of data knowledge extraction in huge databases. An example of this technology in the cardiac arrhythmia field is the SCOOP platform, a national-level scientific cloud-based big data service for implantable cardioverter defibrillators. In this scenario, we here propose a new methodology for automatic classification of intracardiac electrograms (EGMs) in a cloud computing system, designed for minimal signal preprocessing. A new compression-based similarity measure (CSM) is created for low computational burden, so-called weighted fast compression distance, which provides better performance when compared with other CSMs in the literature. Using simple machine learning techniques, a set of 6848 EGMs extracted from SCOOP platform were classified into seven cardiac arrhythmia classes and one noise class, reaching near to 90% accuracy when previous patient arrhythmia information was available and 63% otherwise, hence overcoming in all cases the classification provided by the majority class. Results show that this methodology can be used as a high-quality service of cloud computing, providing support to physicians for improving the knowledge on patient diagnosis.
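To make the idea of a compression-based similarity measure concrete, here is a minimal sketch of the standard normalized compression distance (NCD). It is not the weighted fast compression distance proposed in the abstract above, whose definition is not given there. The choice of `zlib` as the compressor and the function names are assumptions made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed representation of data."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share very little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A signal compared with itself should score lower than the same signal compared with unrelated data, so the distance can feed a simple classifier such as nearest neighbour with little or no signal preprocessing.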
Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-01-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-06-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
Bao, Riyue; Hernandez, Kyle; Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge
2015-01-01
Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.
On the Modeling and Management of Cloud Data Analytics
NASA Astrophysics Data System (ADS)
Castillo, Claris; Tantawi, Asser; Steinder, Malgorzata; Pacifici, Giovanni
A new era is dawning where vast amount of data is subjected to intensive analysis in a cloud computing environment. Over the years, data about a myriad of things, ranging from user clicks to galaxies, have been accumulated, and continue to be collected, on storage media. The increasing availability of such data, along with the abundant supply of compute power and the urge to create useful knowledge, gave rise to a new data analytics paradigm in which data is subjected to intensive analysis, and additional data is created in the process. Meanwhile, a new cloud computing environment has emerged where seemingly limitless compute and storage resources are being provided to host computation and data for multiple users through virtualization technologies. Such a cloud environment is becoming the home for data analytics. Consequently, providing good performance at run-time to data analytics workload is an important issue for cloud management. In this paper, we provide an overview of the data analytics and cloud environment landscapes, and investigate the performance management issues related to running data analytics in the cloud. In particular, we focus on topics such as workload characterization, profiling analytics applications and their pattern of data usage, cloud resource allocation, placement of computation and data and their dynamic migration in the cloud, and performance prediction. In solving such management problems one relies on various run-time analytic models. We discuss approaches for modeling and optimizing the dynamic data analytics workload in the cloud environment. All along, we use the Map-Reduce paradigm as an illustration of data analytics.
A Cloud-Based Simulation Architecture for Pandemic Influenza Simulation
Eriksson, Henrik; Raciti, Massimiliano; Basile, Maurizio; Cunsolo, Alessandro; Fröberg, Anders; Leifler, Ola; Ekberg, Joakim; Timpka, Toomas
2011-01-01
High-fidelity simulations of pandemic outbreaks are resource consuming. Cluster-based solutions have been suggested for executing such complex computations. We present a cloud-based simulation architecture that utilizes computing resources both locally available and dynamically rented online. The approach uses the Condor framework for job distribution and management of the Amazon Elastic Computing Cloud (EC2) as well as local resources. The architecture has a web-based user interface that allows users to monitor and control simulation execution. In a benchmark test, the best cost-adjusted performance was recorded for the EC2 H-CPU Medium instance, while a field trial showed that the job configuration had significant influence on the execution time and that the network capacity of the master node could become a bottleneck. We conclude that it is possible to develop a scalable simulation environment that uses cloud-based solutions, while providing an easy-to-use graphical user interface. PMID:22195089
Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.
Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E
2012-03-19
A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. 
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community
2012-01-01
Background: A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Results: Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Conclusions: Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. 
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538
An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.
Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei
2017-12-01
Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the "allocate-when-needed" paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.
A world-wide databridge supported by a commercial cloud provider
NASA Astrophysics Data System (ADS)
Tat Cheung, Kwong; Field, Laurence; Furano, Fabrizio
2017-10-01
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. One of the challenges with exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources. However, high energy physics applications require more data input and output than the CPU intensive applications that are typically used by other volunteer computing projects. While the so-called databridge has already been successfully proposed as a method to span the untrusted and trusted domains of volunteer computing and Grid computing respectively, globally transferring data between potentially poor-performing residential networks and CERN could be unreliable, leading to wasted resource usage. The expectation is that by placing a storage endpoint that is part of a wider, flexible geographical databridge deployment closer to the volunteers, the transfer success rate and the overall performance can be improved. This contribution investigates the provision of a globally distributed databridge implemented upon a commercial cloud provider.
Research on OpenStack of open source cloud computing in colleges and universities’ computer room
NASA Astrophysics Data System (ADS)
Wang, Lei; Zhang, Dandan
2017-06-01
In recent years, cloud computing technology has developed rapidly, especially open source cloud computing. Open source cloud computing has attracted a large number of users through its advantages of openness and low cost, and is now being promoted and applied on a large scale. In this paper, we first briefly introduce the main functions and architecture of the open source cloud computing tool OpenStack, and then discuss in depth the core problems of computer labs in colleges and universities. Building on this analysis, we describe the specific application and deployment of OpenStack in university computer rooms. The experimental results show that OpenStack can efficiently and conveniently deploy a cloud for a university computer room, with stable performance and good functional value.
Improving Individual Acceptance of Health Clouds through Confidentiality Assurance.
Ermakova, Tatiana; Fabian, Benjamin; Zarnekow, Rüdiger
2016-10-26
Cloud computing promises to essentially improve healthcare delivery performance. However, shifting sensitive medical records to third-party cloud providers could create an adoption hurdle because of security and privacy concerns. This study examines the effect of confidentiality assurance in a cloud-computing environment on individuals' willingness to accept the infrastructure for inter-organizational sharing of medical data. We empirically investigate our research question by a survey with over 260 full responses. For the setting with a high confidentiality assurance, we build on a recent multi-cloud architecture which provides very high confidentiality assurance through a secret-sharing mechanism: Health information is cryptographically encoded and distributed in a way that no single and no small group of cloud providers is able to decode it. Our results indicate the importance of confidentiality assurance in individuals' acceptance of health clouds for sensitive medical data. Specifically, this finding holds for a variety of practically relevant circumstances, i.e., in the absence and despite the presence of conventional offline alternatives and along with pseudonymization. On the other hand, we do not find support for the effect of confidentiality assurance in individuals' acceptance of health clouds for non-sensitive medical data. These results could support the process of privacy engineering for health-cloud solutions.
Improving Individual Acceptance of Health Clouds through Confidentiality Assurance
Fabian, Benjamin; Zarnekow, Rüdiger
2016-01-01
Summary. Background: Cloud computing promises to essentially improve healthcare delivery performance. However, shifting sensitive medical records to third-party cloud providers could create an adoption hurdle because of security and privacy concerns. Objectives: This study examines the effect of confidentiality assurance in a cloud-computing environment on individuals’ willingness to accept the infrastructure for inter-organizational sharing of medical data. Methods: We empirically investigate our research question by a survey with over 260 full responses. For the setting with a high confidentiality assurance, we build on a recent multi-cloud architecture which provides very high confidentiality assurance through a secret-sharing mechanism: Health information is cryptographically encoded and distributed in a way that no single and no small group of cloud providers is able to decode it. Results: Our results indicate the importance of confidentiality assurance in individuals’ acceptance of health clouds for sensitive medical data. Specifically, this finding holds for a variety of practically relevant circumstances, i.e., in the absence and despite the presence of conventional offline alternatives and along with pseudonymization. On the other hand, we do not find support for the effect of confidentiality assurance in individuals’ acceptance of health clouds for non-sensitive medical data. These results could support the process of privacy engineering for health-cloud solutions. PMID:27781238
The performance of low-cost commercial cloud computing as an alternative in computational chemistry.
Thackston, Russell; Fortenberry, Ryan C
2015-05-05
The growth of commercial cloud computing (CCC) as a viable means of computational infrastructure is largely unexplored for the purposes of quantum chemistry. In this work, the PSI4 suite of computational chemistry programs is installed on five different types of Amazon Web Services CCC platforms. The performance for a set of electronically excited state single-point energies is compared between these CCC platforms and typical, "in-house" physical machines. Further considerations are made for the number of cores or virtual CPUs (vCPUs, for the CCC platforms), but no considerations are made for full parallelization of the program (even though parallelization of the BLAS library is implemented), complete high-performance computing cluster utilization, or steal time. Even with this most pessimistic view of the computations, CCC resources are shown to be more cost effective for significant numbers of typical quantum chemistry computations. Large numbers of large computations are still best utilized by more traditional means, but smaller-scale research may be more effectively undertaken through CCC services. © 2015 Wiley Periodicals, Inc.
e-Collaboration for Earth observation (E-CEO): the Cloud4SAR interferometry data challenge
NASA Astrophysics Data System (ADS)
Casu, Francesco; Manunta, Michele; Boissier, Enguerran; Brito, Fabrice; Aas, Christina; Lavender, Samantha; Ribeiro, Rita; Farres, Jordi
2014-05-01
The e-Collaboration for Earth Observation (E-CEO) project addresses the technologies and architectures needed to provide a collaborative research Platform for automating data mining and processing, and information extraction experiments. The Platform serves for the implementation of Data Challenge Contests focusing on Information Extraction for Earth Observations (EO) applications. The possibility to implement multiple processors within a Common Software Environment facilitates the validation, evaluation and transparent peer comparison among different methodologies, which is one of the main requirements raised by scientists who develop algorithms in the EO field. In this scenario, we set up a Data Challenge, referred to as Cloud4SAR (http://wiki.services.eoportal.org/tiki-index.php?page=ECEO), to foster the deployment of Interferometric SAR (InSAR) processing chains within a Cloud Computing platform. While a large variety of InSAR processing software tools are available, they require a high level of expertise and a complex user interaction to be effectively run. Computing a co-seismic interferogram or a 20-year deformation time series on a volcanic area is not an easy task to perform in a fully unsupervised way and/or in a very short time (hours or less). Benefiting from ESA's E-CEO platform, participants can optimise algorithms on a Virtual Sandbox environment without being expert programmers, and compute results on high performing Cloud platforms. Cloud4SAR requires solving a relatively easy InSAR problem by trying to maximize the exploitation of the processing capabilities provided by a Cloud Computing infrastructure. The proposed challenge offers two different frameworks, each dedicated to participants with different skills, identified as Beginners and Experts. 
For both of them, the contest mainly resides in the degree of automation of the deployed algorithms, no matter which one is used, as well as in the capability of taking effective benefit from a parallel computing environment.
NASA Astrophysics Data System (ADS)
Yu, Xiaoyuan; Yuan, Jian; Chen, Shi
2013-03-01
Cloud computing is one of the most popular topics in the IT industry and has recently been adopted by many companies. It has four deployment models: public cloud, community cloud, hybrid cloud and private cloud. Unlike the others, a private cloud can be implemented in a private network, and delivers some benefits of cloud computing without the pitfalls. This paper compares typical open source platforms with which a private cloud can be implemented. After this comparison, we choose Eucalyptus and Wavemaker for a case study on the private cloud. We also estimate the performance of cloud platform services and develop prototype software as cloud services.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-08
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves a 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve performance by about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.
Real-time WAMI streaming target tracking in fog
NASA Astrophysics Data System (ADS)
Chen, Yu; Blasch, Erik; Chen, Ning; Deng, Anna; Ling, Haibin; Chen, Genshe
2016-05-01
Real-time information fusion based on WAMI (Wide-Area Motion Imagery), FMV (Full Motion Video), and Text data is highly desired for many mission critical emergency or security applications. Cloud Computing has been considered promising to achieve big data integration from multi-modal sources. In many mission critical tasks, however, powerful Cloud technology cannot satisfy the tight latency tolerance because the servers are located far from the sensing platform, and a connection is not guaranteed in emergency situations. Therefore, data processing, information fusion, and decision making are required to be executed on-site (i.e., near the data collection). Fog Computing, a recently proposed extension and complement for Cloud Computing, enables computing on-site without outsourcing jobs to a remote Cloud. In this work, we have investigated the feasibility of processing streaming WAMI in the Fog for real-time, online, uninterrupted target tracking. Using a single target tracking algorithm, we studied the performance of a Fog Computing prototype. The experimental results are very encouraging and validate the effectiveness of our Fog approach in achieving real-time frame rates.
ERIC Educational Resources Information Center
Tweel, Abdeneaser
2012-01-01
High uncertainties related to cloud computing adoption may hinder IT managers from making solid decisions about adopting cloud computing. The problem addressed in this study was the lack of understanding of the relationship between factors related to the adoption of cloud computing and IT managers' interest in adopting this technology. In…
Visual Analysis of Cloud Computing Performance Using Behavioral Lines.
Muelder, Chris; Zhu, Biao; Chen, Wei; Zhang, Hongxin; Ma, Kwan-Liu
2016-02-29
Cloud computing is an essential technology to Big Data analytics and services. A cloud computing system is often comprised of a large number of parallel computing and storage devices. Monitoring the usage and performance of such a system is important for efficient operations, maintenance, and security. Tracing every application on a large cloud system is untenable due to scale and privacy issues. But profile data can be collected relatively efficiently by regularly sampling the state of the system, including properties such as CPU load, memory usage, network usage, and others, creating a set of multivariate time series for each system. Adequate tools for studying such large-scale, multidimensional data are lacking. In this paper, we present a visual based analysis approach to understanding and analyzing the performance and behavior of cloud computing systems. Our design is based on similarity measures and a layout method to portray the behavior of each compute node over time. When visualizing a large number of behavioral lines together, distinct patterns often appear suggesting particular types of performance bottleneck. The resulting system provides multiple linked views, which allow the user to interactively explore the data by examining the data or a selected subset at different levels of detail. Our case studies, which use datasets collected from two different cloud systems, show that this visual based approach is effective in identifying trends and anomalies of the systems.
Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda
2017-01-01
Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing.
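Three of the heuristics named in the abstract above (MCT, Min-min and Max-min) can be sketched compactly. This is an illustrative sketch following their common textbook definitions, not the authors' implementation. The input `etc` is a hypothetical matrix in which `etc[t][m]` is the expected execution time of task `t` on machine `m`, and each function returns the resulting makespan.

```python
def mct(etc):
    """Minimum Completion Time: place tasks one by one in arrival order,
    each on the machine where it would finish earliest."""
    ready = [0.0] * len(etc[0])  # time at which each machine becomes free
    for row in etc:
        m = min(range(len(ready)), key=lambda j: ready[j] + row[j])
        ready[m] += row[m]
    return max(ready)


def _batch_schedule(etc, choose):
    """Shared loop for Min-min (choose=min) and Max-min (choose=max)."""
    ready = [0.0] * len(etc[0])
    pending = list(range(len(etc)))
    while pending:
        # Earliest (completion time, machine) pair for every pending task.
        best = {
            t: min((ready[j] + etc[t][j], j) for j in range(len(ready)))
            for t in pending
        }
        # Pick the task whose earliest completion time is smallest/largest.
        task = choose(pending, key=lambda t: best[t][0])
        finish, machine = best[task]
        ready[machine] = finish
        pending.remove(task)
    return max(ready)


def min_min(etc):
    """Min-min: schedule the task with the smallest earliest completion first."""
    return _batch_schedule(etc, min)


def max_min(etc):
    """Max-min: schedule the task with the largest earliest completion first."""
    return _batch_schedule(etc, max)
```

On a small matrix such as `[[4, 6], [3, 5], [10, 12]]` the three heuristics give different makespans, which is the kind of difference a comparison across heuristics measures. Which one does best depends on the task mix, so no single example should be read as a ranking.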
Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Usman, Mohammed Joda
2017-01-01
Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing. PMID:28467505
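Two of the heuristics named above, MCT and Min-min, can be sketched in a few lines using their common textbook definitions. This is an illustrative sketch, not the authors' implementation: the expected-time-to-compute matrix `etc` (rows are tasks, columns are machines) and its values are made up, and only makespan is computed, not cost, degree of imbalance, or throughput.

```python
def mct(etc):
    """Minimum Completion Time: take tasks in arrival order and assign each
    to the machine on which it would finish earliest. Returns the makespan."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free
    for row in etc:
        m = min(range(n_machines), key=lambda j: ready[j] + row[j])
        ready[m] += row[m]
    return max(ready)

def min_min(etc):
    """Min-min: repeatedly pick the unscheduled task with the smallest
    achievable completion time and assign it to that machine.
    Returns the makespan."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines
    pending = set(range(len(etc)))
    while pending:
        best = None  # (completion_time, task, machine)
        for t in sorted(pending):  # sorted for deterministic tie-breaking
            for m in range(n_machines):
                c = ready[m] + etc[t][m]
                if best is None or c < best[0]:
                    best = (c, t, m)
        c, t, m = best
        ready[m] = c
        pending.remove(t)
    return max(ready)

# Hypothetical ETC matrix: 4 tasks on 2 heterogeneous machines.
etc = [
    [10, 20],
    [3, 6],
    [4, 2],
    [8, 5],
]

mct_makespan = mct(etc)
min_min_makespan = min_min(etc)
```

On this small matrix both heuristics reach the same makespan; on larger, more heterogeneous inputs they generally differ, which is the kind of comparison the study reports.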
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
ERIC Educational Resources Information Center
Younge, Andrew J.
2016-01-01
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Murphy, K. J.; Baynes, K.; Lynnes, C.
2016-12-01
With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way Earth observation data is processed, analyzed, and visualized. The cloud infrastructure provides the flexibility to scale up to large volumes of data and handle high velocity data streams efficiently. Having freely available Earth observation data collocated on a cloud infrastructure creates opportunities for innovation and value-added data re-use in ways unforeseen by the original data provider. These innovations spur new industries and applications and spawn new scientific pathways that were previously limited due to data volume and computational infrastructure issues. NASA, in collaboration with Amazon, Google, and Microsoft, has developed a set of recommendations to enable efficient transfer of Earth observation data from existing data systems to a cloud computing infrastructure. The purpose of these recommendations is to provide guidelines against which all data providers can evaluate existing data systems and which can be used to resolve any issues uncovered, enabling efficient search, access, and use of large volumes of data. Additionally, these guidelines ensure that all cloud providers utilize a common methodology for bulk-downloading data from data providers, thus sparing the data providers from building custom capabilities to meet the needs of individual cloud providers. The intent is to share these recommendations with other Federal agencies and organizations that serve Earth observation data to enable efficient search, access, and use of large volumes of data. Additionally, the adoption of these recommendations will benefit data users interested in moving large volumes of data from data systems to any other location. These data users include the cloud providers, cloud users such as scientists, and other users working in a high performance computing environment who need to move large volumes of data.
Cloud Computing at the Tactical Edge
2012-10-01
Cloud Computing (CloudCom ’09). Beijing, China, December 2009. Springer-Verlag, 2009. [Marinelli 2009] Marinelli, E. Hyrax: Cloud Computing on Mobile...offloading is appropriate. Each application overlay is generated from the same Base VM Image that resides in the cloudlet. In an operational setting...overlay, the following operations execute: 1. The overlay is decompressed using the tools listed in Section 4.2. 2. VM synthesis is performed through
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-01-01
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
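The page-sharing idea behind memory deduplication can be illustrated with a small sketch. This is not the SMD algorithm itself: it assumes a 4096-byte page size, uses SHA-256 content hashes in place of whatever comparison the authors use, and the names `split_pages` and `deduplicate` and the VM contents are hypothetical. It only shows how identical code-segment pages from several VMs can be detected in an offline pass and mapped to a single shared frame.

```python
import hashlib

PAGE_SIZE = 4096  # bytes; a common page size, assumed here

def split_pages(segment, page_size=PAGE_SIZE):
    """Split a code segment (bytes) into fixed-size pages."""
    return [segment[i:i + page_size] for i in range(0, len(segment), page_size)]

def deduplicate(code_segments):
    """Map every distinct page content to one shared frame.

    Returns (frames, page_tables): `frames` maps a content digest to the
    page bytes stored once; `page_tables` maps each VM name to the list of
    digests its pages point at.
    """
    frames = {}
    page_tables = {}
    for name, segment in code_segments.items():
        table = []
        for page in split_pages(segment):
            digest = hashlib.sha256(page).hexdigest()
            frames.setdefault(digest, page)  # store the content only once
            table.append(digest)
        page_tables[name] = table
    return frames, page_tables

# Hypothetical code segments: both VMs share their first page.
code_segments = {
    "vm1": b"A" * PAGE_SIZE + b"B" * PAGE_SIZE,
    "vm2": b"A" * PAGE_SIZE + b"C" * PAGE_SIZE,
}

frames, page_tables = deduplicate(code_segments)
total_pages = sum(len(t) for t in page_tables.values())
saved_pages = total_pages - len(frames)
```

Because the scan runs over static code-segment content before the VMs execute, no page comparisons are needed at run time, which matches the motivation the abstract gives for performing detection offline.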
A Hybrid Cloud Computing Service for Earth Sciences
NASA Astrophysics Data System (ADS)
Yang, C. P.
2016-12-01
Cloud computing is becoming the norm for providing computing capabilities to advance Earth sciences, including big Earth data management, processing, analytics, model simulations, and many other aspects. A hybrid spatiotemporal cloud computing service has been built at the George Mason NSF spatiotemporal innovation center to meet this demand. This paper reports on several aspects of the service: 1) the hardware includes 500 computing services and close to 2 PB of storage, as well as connections to XSEDE Jetstream and the Caltech experimental cloud computing environment for sharing resources; 2) the cloud service is geographically distributed across the east coast, west coast, and central region; 3) the cloud includes private clouds managed using OpenStack and Eucalyptus, and DC2 is used to bridge these and the public AWS cloud for interoperability and for sharing computing resources when demand surges; 4) the cloud service is used to support the NSF EarthCube program through the ECITE project, and ESIP through the ESIP cloud computing cluster, semantics testbed cluster, and other clusters; 5) the cloud service is also available for the earth science communities to conduct geoscience. A brief introduction on how to use the cloud service will be included.
Toward a Big Data Science: A challenge of "Science Cloud"
NASA Astrophysics Data System (ADS)
Murata, Ken T.; Watanabe, Hidenobu
2013-04-01
Over the past 50 years, with the appearance and development of high-performance computers (and supercomputers), numerical simulation has come to be regarded as a third methodology for science, following the theoretical (first) and the experimental and/or observational (second) approaches. The variety of data yielded by the second approach keeps growing, owing to progress in experimental and observational technologies. The amount of data generated by the third methodology is also growing, owing to the tremendous development of supercomputers and their programming techniques. Most of the data files created by both experiments/observations and numerical simulations are saved in digital formats and analyzed on computers. Researchers (domain experts) are interested not only in how to carry out experiments and/or observations or perform numerical simulations, but also in what information (new findings) to extract from the data. However, data does not usually tell anything about the science by itself; the science is implicitly hidden in the data. Researchers have to extract information from the data files to make new scientific findings. This is the basic concept of data-intensive (data-oriented) science for Big Data. As the scales of experiments and/or observations and numerical simulations grow, new techniques and facilities are required to extract information from large numbers of data files. This technique is called informatics, a fourth methodology for new science. Every methodology needs its own facilities: in space science, for example, the space environment is observed via spacecraft and numerical simulations are performed on supercomputers. The facility for informatics, which deals with large-scale data, is a computational cloud system for science. This paper proposes a cloud system for informatics that has been developed at NICT (National Institute of Information and Communications Technology), Japan. 
The NICT science cloud, which we named OneSpaceNet (OSN), is the first open cloud system for scientists who want to carry out informatics for their own science. The science cloud is not for simple uses. Many functions are expected of it, such as data standardization, data collection and crawling, large and distributed data storage, security and reliability, database and meta-database, data stewardship, long-term data preservation, data rescue and preservation, data mining, parallel processing, data publication and provision, semantic web, 3D and 4D visualization, out-reach and in-reach, and capacity building. The figure (not shown here) is a schematic picture of the NICT science cloud. Data from both observation and simulation are stored in the storage system of the science cloud. It should be noted that there are two types of observation data. One comes from archive sites outside the cloud: this data is downloaded to the cloud through the Internet. The other is data from equipment directly connected to the science cloud, often called a sensor cloud. In the present talk, we first introduce the NICT science cloud. We next demonstrate the efficiency of the science cloud by showing several scientific results that we achieved with this cloud system. Through the discussions and demonstrations, the potential of the science cloud for any research field will be revealed.
Context-aware distributed cloud computing using CloudScheduler
NASA Astrophysics Data System (ADS)
Seuster, R.; Leavett-Brown, CR; Casteels, K.; Driemel, C.; Paterson, M.; Ring, D.; Sobie, RJ; Taylor, RP; Weldon, J.
2017-10-01
The distributed cloud using the CloudScheduler VM provisioning service is one of the longest running systems for HEP workloads. It has run millions of jobs for ATLAS and Belle II over the past few years using private and commercial clouds around the world. Our goal is to scale the distributed cloud to the 10,000-core level, with the ability to run any type of application (low I/O, high I/O and high memory) on any cloud. To achieve this goal, we have been implementing changes that utilize context-aware computing designs that are currently employed in the mobile communication industry. Context-awareness makes use of real-time and archived data to respond to user or system requirements. In our distributed cloud, we have many opportunistic clouds with no local HEP services, software or storage repositories. A context-aware design significantly improves the reliability and performance of our system by locating the nearest location of the required services. We describe how we are collecting and managing contextual information from our workload management systems, the clouds, the virtual machines and our services. This information is used not only to monitor the system but also to carry out automated corrective actions. We are incrementally adding new alerting and response services to our distributed cloud. This will enable us to scale the number of clouds and virtual machines. Further, a context-aware design will enable us to run analysis or high I/O application on opportunistic clouds. We envisage an open-source HTTP data federation (for example, the DynaFed system at CERN) as a service that would provide us access to existing storage elements used by the HEP experiments.
Security Risks of Cloud Computing and Its Emergence as 5th Utility Service
NASA Astrophysics Data System (ADS)
Ahmad, Mushtaq
Cloud Computing is being projected by major cloud service providers such as IBM, Google, Yahoo, Amazon and others as a fifth utility, in which clients will have access to processing for applications and/or software projects that need very high processing speed for compute-intensive work and huge data capacity for scientific and engineering research problems, as well as for e-business and data content network applications. These services for different types of clients are provided under DASM (Direct Access Service Management), based on virtualization of hardware and software and on very high bandwidth Internet (Web 2.0) communication. The paper reviews these developments for Cloud Computing and the hardware/software configuration of the cloud paradigm. The paper also examines the vital aspects of security risks projected by IT industry experts and cloud clients. The paper also highlights the cloud providers' response to cloud security risks.
2012-01-01
Background: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941
El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter
2012-01-01
Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.
Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge
2015-01-01
Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud. PMID:26271043
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
2014-01-01
Background: To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results: This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions: Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
When cloud computing meets bioinformatics: a review.
Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong
2013-10-01
In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
A resource management architecture based on complex network theory in cloud computing federation
NASA Astrophysics Data System (ADS)
Zhang, Zehua; Zhang, Xuejie
2011-10-01
Cloud Computing Federation is a main trend of Cloud Computing. Resource management has a significant effect on the design, realization, and efficiency of a Cloud Computing Federation. Because a Cloud Computing Federation has the typical characteristics of a complex system, we propose in this paper a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC), with a detailed design of the resource discovery and resource announcement mechanisms. Compared with the existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use the historical information and current state data obtained from other Task Managers to evolve the complex network composed of Task Managers, and thus has advantages in resource discovery speed, fault tolerance, and adaptive ability. The results of the model experiment confirmed the advantage of RMABC in resource discovery performance.
Cloudbus Toolkit for Market-Oriented Cloud Computing
NASA Astrophysics Data System (ADS)
Buyya, Rajkumar; Pandey, Suraj; Vecchiola, Christian
This keynote paper: (1) presents the 21st century vision of computing and identifies various IT paradigms promising to deliver computing as a utility; (2) defines the architecture for creating market-oriented Clouds and computing atmosphere by leveraging technologies such as virtual machines; (3) provides thoughts on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain SLA-oriented resource allocation; (4) presents the work carried out as part of our new Cloud Computing initiative, called Cloudbus: (i) Aneka, a Platform as a Service software system containing SDK (Software Development Kit) for construction of Cloud applications and deployment on private or public Clouds, in addition to supporting market-oriented resource management; (ii) internetworking of Clouds for dynamic creation of federated computing environments for scaling of elastic applications; (iii) creation of 3rd party Cloud brokering services for building content delivery networks and e-Science applications and their deployment on capabilities of IaaS providers such as Amazon along with Grid mashups; (iv) CloudSim supporting modelling and simulation of Clouds for performance studies; (v) Energy Efficient Resource Allocation Mechanisms and Techniques for creation and management of Green Clouds; and (vi) pathways for future research.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.
Cole, Brian S; Moore, Jason H
2018-03-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing
Moore, Jason H.
2018-01-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-01
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which makes the simulation both a data-intensive and a computing-intensive problem. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation in raw data simulation, an access pattern that greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve performance by about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration. PMID:28075343
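The accumulation pattern that MapReduce handles here can be sketched without Hadoop. The following toy example is an assumption-laden illustration, not the paper's algorithm: each hypothetical point target contributes a fixed amplitude to a handful of raw-data cells (the map step), contributions are grouped by cell (the shuffle), and each cell's values are summed (the reduce step). A real simulator would compute complex-valued echoes from the radar geometry. The helper `parallel_efficiency` simply divides speedup by node count.

```python
from collections import defaultdict

def map_target(target):
    """Map step: emit (cell, contribution) pairs for one point target.
    Toy model: the target adds its amplitude to each listed cell."""
    amplitude, cells = target
    for cell in cells:
        yield cell, amplitude

def reduce_cell(values):
    """Reduce step: accumulate all contributions that landed on one cell."""
    return sum(values)

def simulate(targets):
    """Run map, shuffle (group by cell) and reduce in a single process."""
    grouped = defaultdict(list)
    for target in targets:
        for cell, value in map_target(target):
            grouped[cell].append(value)
    return {cell: reduce_cell(values) for cell, values in grouped.items()}

def parallel_efficiency(speedup, nodes):
    """Speedup divided by the number of nodes used."""
    return speedup / nodes

# Hypothetical targets: (amplitude, cells touched in the raw-data grid).
targets = [
    (1.0, [(0, 0), (0, 1)]),
    (2.0, [(0, 1), (1, 1)]),
]

raw = simulate(targets)
efficiency = parallel_efficiency(4, 8)
```

With the figures reported in the abstract, a 4× speedup on 8 nodes corresponds to a parallel efficiency of 0.5. Because different targets write to overlapping cells in an irregular pattern, grouping by key before summing avoids the write conflicts that a naive parallel accumulation would have to resolve.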
Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D. T.; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung
2014-01-01
Background: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. Results: We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. Conclusions: CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management.
Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/. PMID:24897343
Cloud-based crowd sensing: a framework for location-based crowd analyzer and advisor
NASA Astrophysics Data System (ADS)
Aishwarya, K. C.; Nambi, A.; Hudson, S.; Nadesh, R. K.
2017-11-01
Cloud computing is an emerging field of computer science that integrates and exploits large, powerful computing systems and storage for personal and enterprise requirements. Mobile cloud computing extends this concept to mobile hand-held devices. Crowdsensing, or more precisely mobile crowdsensing, is the process in which an available group of mobile handheld devices shares resources such as data, memory and bandwidth to perform a single task for a collective purpose. In this paper, we propose a framework that uses crowdsensing to implement a crowd analyzer and advisor that tells the user whether or not to go to a given place. This is ongoing research on a new concept toward which cloud computing has shifted, and it is viable for further expansion in the near future.
NASA Astrophysics Data System (ADS)
Sareen, Sanjay; Gupta, Sunil Kumar; Sood, Sandeep K.
2017-10-01
Zika virus is a mosquito-borne disease that spreads very quickly in different parts of the world. In this article, we proposed a system to prevent and control the spread of Zika virus disease using integration of Fog computing, cloud computing, mobile phones and the Internet of things (IoT)-based sensor devices. Fog computing is used as an intermediary layer between the cloud and end users to reduce the latency time and extra communication cost that is usually found high in cloud-based systems. A fuzzy k-nearest neighbour is used to diagnose the possibly infected users, and Google map web service is used to provide the geographic positioning system (GPS)-based risk assessment to prevent the outbreak. It is used to represent each Zika virus (ZikaV)-infected user, mosquito-dense sites and breeding sites on the Google map that help the government healthcare authorities to control such risk-prone areas effectively and efficiently. The proposed system is deployed on Amazon EC2 cloud to evaluate its performance and accuracy using data set for 2 million users. Our system provides high accuracy of 94.5% for initial diagnosis of different users according to their symptoms and appropriate GPS-based risk assessment.
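The diagnosis step in this abstract relies on a fuzzy k-nearest-neighbour classifier. A minimal, generic sketch of that technique follows. It is not the authors' implementation: the feature vectors, the labels, and the parameters `k` and `m` are hypothetical, training labels are assumed crisp, and neighbours are weighted by the usual inverse-distance term 1/d^(2/(m-1)).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical training sample: (feature vector, crisp class label).
Sample = Tuple[Sequence[float], str]


def fuzzy_knn(train: List[Sample], query: Sequence[float],
              k: int = 3, m: float = 2.0) -> Dict[str, float]:
    """Return class memberships in [0, 1] that sum to 1 for the query point."""
    # Keep the k training samples closest to the query (Euclidean distance).
    neighbours = sorted((math.dist(x, query), label) for x, label in train)[:k]
    exponent = 2.0 / (m - 1.0)
    weights: Dict[str, float] = defaultdict(float)
    for distance, label in neighbours:
        # Clamp the distance so an exact match does not divide by zero.
        weights[label] += 1.0 / (max(distance, 1e-12) ** exponent)
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}
```

Unlike plain k-NN voting, the result is a graded membership per class, so a borderline user can be flagged as "possibly infected" rather than forced into a hard label.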
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shorgin, Sergey Ya.; Pechinkin, Alexander V.; Samouylov, Konstantin E.
Cloud computing is a promising technology to manage and improve utilization of computing center resources to deliver various computing and IT services. To save energy, there is no need to operate many servers under light loads, so they are switched off. On the other hand, some servers should be switched on under heavy load to prevent very long delays. Thus, waiting times and system operating cost can be kept at an acceptable level by dynamically adding or removing servers. One more fact that should be taken into account is significant server setup costs and activation times. For better energy efficiency, a cloud computing system should not react to an instantaneous increase or decrease of load. That is the main motivation for using queuing systems with hysteresis for cloud computing system modelling. In the paper, we provide a model of a cloud computing system in terms of a multiple-server threshold-based infinite-capacity queuing system with hysteresis and noninstantaneous server activation. For the proposed model, we develop a method for computing steady-state probabilities that allow estimation of a number of performance measures.
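The hysteresis idea in this abstract can be sketched as a two-threshold control rule: a server is added only when the queue reaches a high threshold and removed only when it falls to a lower one, so small fluctuations between the two thresholds cause no switching. The function names, thresholds and server limits below are assumptions for illustration. The sketch ignores the noninstantaneous activation time and is not the paper's queuing model.

```python
from typing import List


def next_server_count(active: int, queue_length: int, low: int, high: int,
                      min_servers: int = 1, max_servers: int = 4) -> int:
    """Apply one step of a two-threshold (hysteresis) switching rule."""
    if queue_length >= high and active < max_servers:
        return active + 1  # heavy load: switch one more server on
    if queue_length <= low and active > min_servers:
        return active - 1  # light load: switch one server off
    return active          # between the thresholds: keep the current state


def run(queue_trace: List[int], low: int, high: int, start: int = 1) -> List[int]:
    """Replay a trace of observed queue lengths and record the server count."""
    active = start
    history = []
    for queue_length in queue_trace:
        active = next_server_count(active, queue_length, low, high)
        history.append(active)
    return history
```

With `low=3` and `high=10`, a queue that hovers between 5 and 9 never triggers a change, which is the behaviour that avoids paying setup costs for momentary load spikes.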
State of the Art of Network Security Perspectives in Cloud Computing
NASA Astrophysics Data System (ADS)
Oh, Tae Hwan; Lim, Shinyoung; Choi, Young B.; Park, Kwang-Roh; Lee, Heejo; Choi, Hyunsang
Cloud computing is now regarded as one of the social phenomena that satisfy customers' needs. It is possible that customers' needs and the primary principle of economy (gain maximum benefit from minimum investment) are reflected in the realization of cloud computing. We live in a connected society flooded with information, and without computers connected to the Internet, our daily activities and work would be impossible. Cloud computing can provide customers with custom-tailored application software features and user environments based on their needs by adopting on-demand outsourcing of computing resources through the Internet. It also provides cloud computing users with high-end computing power and expensive application software packages; accordingly, users access their data and application software on the remote system where they are located. As the cloud computing system is connected to the Internet, network security issues of cloud computing must be addressed prior to real-world service. In this paper, a survey of network security in cloud computing and its issues is discussed from the perspective of real-world service environments.
Capturing and analyzing wheelchair maneuvering patterns with mobile cloud computing.
Fu, Jicheng; Hao, Wei; White, Travis; Yan, Yuqing; Jones, Maria; Jan, Yih-Kuen
2013-01-01
Power wheelchairs have been widely used to provide independent mobility to people with disabilities. Despite great advancements in power wheelchair technology, research shows that wheelchair related accidents occur frequently. To ensure safe maneuverability, capturing wheelchair maneuvering patterns is fundamental to enable other research, such as safe robotic assistance for wheelchair users. In this study, we propose to record, store, and analyze wheelchair maneuvering data by means of mobile cloud computing. Specifically, the accelerometer and gyroscope sensors in smart phones are used to record wheelchair maneuvering data in real-time. Then, the recorded data are periodically transmitted to the cloud for storage and analysis. The analyzed results are then made available to various types of users, such as mobile phone users, traditional desktop users, etc. The combination of mobile computing and cloud computing leverages the advantages of both techniques and extends the smart phone's capabilities of computing and data storage via the Internet. We performed a case study to implement the mobile cloud computing framework using Android smart phones and Google App Engine, a popular cloud computing platform. Experimental results demonstrated the feasibility of the proposed mobile cloud computing framework.
A General Cross-Layer Cloud Scheduling Framework for Multiple IoT Computer Tasks.
Wu, Guanlin; Bao, Weidong; Zhu, Xiaomin; Zhang, Xiongtao
2018-05-23
The diversity of IoT services and applications brings enormous challenges to improving the performance of multiple computer tasks' scheduling in cross-layer cloud computing systems. Unfortunately, the commonly-employed frameworks fail to adapt to the new patterns on the cross-layer cloud. To solve this issue, we design a new computer task scheduling framework for multiple IoT services in cross-layer cloud computing systems. Specifically, we first analyze the features of the cross-layer cloud and computer tasks. Then, we design the scheduling framework based on the analysis and present detailed models to illustrate the procedures of using the framework. With the proposed framework, the IoT services deployed in cross-layer cloud computing systems can dynamically select suitable algorithms and use resources more effectively to finish computer tasks with different objectives. Finally, the algorithms are given based on the framework, and extensive experiments are also given to validate its effectiveness, as well as its superiority.
Design for Run-Time Monitor on Cloud Computing
NASA Astrophysics Data System (ADS)
Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications, as well as infrastructure, as services over the Internet. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM), system software that monitors the application behavior at run-time, analyzes the collected information, and optimizes resources on cloud computing. RTM monitors application software through library instrumentation as well as underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.
Evolving the Land Information System into a Cloud Computing Service
DOE Office of Scientific and Technical Information (OSTI.GOV)
Houser, Paul R.
The Land Information System (LIS) was developed to use advanced flexible land surface modeling and data assimilation frameworks to integrate extremely large satellite- and ground-based observations with advanced land surface models to produce continuous high-resolution fields of land surface states and fluxes. The resulting fields are extremely useful for drought and flood assessment, agricultural planning, disaster management, weather and climate forecasting, water resources assessment, and the like. We envisioned transforming the LIS modeling system into a scientific cloud computing-aware web and data service that would allow clients to easily set up and configure for use in addressing large water management issues. The focus of this Phase 1 project was to determine the scientific, technical, commercial merit and feasibility of the proposed LIS-cloud innovations that are currently barriers to broad LIS applicability. We (a) quantified the barriers to broad LIS utility and commercialization (high performance computing, big data, user interface, and licensing issues); (b) designed the proposed LIS-cloud web service, model-data interface, database services, and user interfaces; (c) constructed a prototype LIS user interface including abstractions for simulation control, visualization, and data interaction; (d) used the prototype to conduct a market analysis and survey to determine potential market size and competition; (e) identified LIS software licensing and copyright limitations and developed solutions; and (f) developed a business plan for development and marketing of the LIS-cloud innovation. While some significant feasibility issues were found in the LIS licensing, overall a high degree of LIS-cloud technical feasibility was found.
A Cloud-Computing Service for Environmental Geophysics and Seismic Data Processing
NASA Astrophysics Data System (ADS)
Heilmann, B. Z.; Maggi, P.; Piras, A.; Satta, G.; Deidda, G. P.; Bonomi, E.
2012-04-01
Cloud computing is establishing worldwide as a new high performance computing paradigm that offers formidable possibilities to industry and science. The presented cloud-computing portal, part of the Grida3 project, provides an innovative approach to seismic data processing by combining open-source state-of-the-art processing software and cloud-computing technology, making possible the effective use of distributed computation and data management with administratively distant resources. We substituted the user-side demanding hardware and software requirements by remote access to high-performance grid-computing facilities. As a result, data processing can be done quasi in real-time being ubiquitously controlled via Internet by a user-friendly web-browser interface. Besides the obvious advantages over locally installed seismic-processing packages, the presented cloud-computing solution creates completely new possibilities for scientific education, collaboration, and presentation of reproducible results. The web-browser interface of our portal is based on the commercially supported grid portal EnginFrame, an open framework based on Java, XML, and Web Services. We selected the hosted applications with the objective to allow the construction of typical 2D time-domain seismic-imaging workflows as used for environmental studies and, originally, for hydrocarbon exploration. For data visualization and pre-processing, we chose the free software package Seismic Un*x. We ported tools for trace balancing, amplitude gaining, muting, frequency filtering, dip filtering, deconvolution and rendering, with a customized choice of options as services onto the cloud-computing portal. For structural imaging and velocity-model building, we developed a grid version of the Common-Reflection-Surface stack, a data-driven imaging method that requires no user interaction at run time such as manual picking in prestack volumes or velocity spectra. 
Due to its high level of automation, CRS stacking can benefit largely from the hardware parallelism provided by the cloud deployment. The resulting output, post-stack section, coherence, and NMO-velocity panels are used to generate a smooth migration-velocity model. Residual static corrections are calculated as a by-product of the stack and can be applied iteratively. As a final step, a time migrated subsurface image is obtained by a parallelized Kirchhoff time migration scheme. Processing can be done step-by-step or using a graphical workflow editor that can launch a series of pipelined tasks. The status of the submitted jobs is monitored by a dedicated service. All results are stored in project directories, where they can be downloaded or viewed directly in the browser. Currently, the portal has access to three research clusters having a total number of 70 nodes with 4 cores each. They are shared with four other cloud-computing applications bundled within the GRIDA3 project. To demonstrate the functionality of our "seismic cloud lab", we will present results obtained for three different types of data, all taken from hydrogeophysical studies: (1) a seismic reflection data set, made of compressional waves from explosive sources, recorded in Muravera, Sardinia; (2) a shear-wave data set from, Sardinia; (3) a multi-offset Ground-Penetrating-Radar data set from Larreule, France. The presented work was funded by the government of the Autonomous Region of Sardinia and by the Italian Ministry of Research and Education.
An ARM data-oriented diagnostics package to evaluate the climate model simulation
NASA Astrophysics Data System (ADS)
Zhang, C.; Xie, S.
2016-12-01
A set of diagnostics that utilize long-term high frequency measurements from the DOE Atmospheric Radiation Measurement (ARM) program is developed for evaluating the regional simulation of clouds, radiation and precipitation in climate models. The diagnostics results are computed and visualized automatically in a python-based package that aims to serve as an easy entry point for evaluating climate simulations using the ARM data, as well as the CMIP5 multi-model simulations. Basic performance metrics are computed to measure the accuracy of mean state and variability of simulated regional climate. The evaluated physical quantities include vertical profiles of clouds, temperature, relative humidity, cloud liquid water path, total column water vapor, precipitation, sensible and latent heat fluxes, radiative fluxes, aerosol and cloud microphysical properties. Process-oriented diagnostics focusing on individual cloud and precipitation-related phenomena are developed for the evaluation and development of specific model physical parameterizations. Application of the ARM diagnostics package will be presented in the AGU session. This work is performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, IM release number is: LLNL-ABS-698645.
Elastic Cloud Computing Infrastructures in the Open Cirrus Testbed Implemented via Eucalyptus
NASA Astrophysics Data System (ADS)
Baun, Christian; Kunze, Marcel
Cloud computing realizes the advantages and overcomes some restrictions of the grid computing paradigm. Elastic infrastructures can easily be created and managed by cloud users. In order to accelerate the research on data center management and cloud services the OpenCirrus™ research testbed has been started by HP, Intel and Yahoo!. Although commercial cloud offerings are proprietary, Open Source solutions exist in the field of IaaS with Eucalyptus, PaaS with AppScale and at the applications layer with Hadoop MapReduce. This paper examines the I/O performance of cloud computing infrastructures implemented with Eucalyptus in contrast to Amazon S3.
NASA Astrophysics Data System (ADS)
Sánchez-Martínez, V.; Borges, G.; Borrego, C.; del Peso, J.; Delfino, M.; Gomes, J.; González de la Hoz, S.; Pacheco Pages, A.; Salt, J.; Sedov, A.; Villaplana, M.; Wolters, H.
2014-06-01
In this contribution we describe the performance of the Iberian (Spain and Portugal) ATLAS cloud during the first LHC running period (March 2010-January 2013) in the context of the GRID Computing and Data Distribution Model. The evolution of the resources for CPU, disk and tape in the Iberian Tier-1 and Tier-2s is summarized. The data distribution over all ATLAS destinations is shown, focusing on the number of files transferred and the size of the data. The status and distribution of simulation and analysis jobs within the cloud are discussed. The Distributed Analysis tools used to perform physics analysis are explained as well. Cloud performance in terms of the availability and reliability of its sites is discussed. The effect of the changes in the ATLAS Computing Model on the cloud is analyzed. Finally, the readiness of the Iberian Cloud towards the first Long Shutdown (LS1) is evaluated and an outline of the foreseen actions to take in the coming years is given. The shutdown will be a good opportunity to improve and evolve the ATLAS Distributed Computing system to prepare for the future challenges of the LHC operation.
NASA Astrophysics Data System (ADS)
Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.
2012-12-01
Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing
NASA Technical Reports Server (NTRS)
Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane
2012-01-01
Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. In particular, for disasters, response requires rapid access to large data volumes, substantial storage space and high performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation is of work being conducted by Applied Sciences Program Office at NASA-Stennis Space Center. A prototypical set of image manipulation and transformation processes that incorporate sample Unmanned Airborne System data were developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing of open source software developed process code on a local prototype platform, and then transitioning this code with associated environment requirements into an analogous, but memory and processor enhanced cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar type data processes. 
Cloud infrastructure service providers were evaluated by taking these locally tested processing functions and then applying them to a given cloud-enabled infrastructure to assess and compare environment setup options and enabled technologies. This project reviews findings that were observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.
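The two indices named in this abstract are normalized band differences: NDVI = (NIR - Red) / (NIR + Red) and NDMI = (NIR - SWIR) / (NIR + SWIR). The per-pixel sketch below shows the arithmetic only. The function names are illustrative, and returning 0.0 when both bands are zero is an assumption made here, not something the abstract specifies.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero (assumed convention)."""
    denominator = a + b
    if denominator == 0:
        return 0.0
    return (a - b) / denominator


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    return normalized_difference(nir, red)


def ndmi(nir: float, swir: float) -> float:
    """Normalized Difference Moisture Index from near-infrared and shortwave-infrared reflectance."""
    return normalized_difference(nir, swir)
```

For example, a pixel with NIR reflectance 0.5 and red reflectance 0.1 gives an NDVI of 0.4 / 0.6, about 0.67, which is typical of dense green vegetation.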
Development and Validation of a New Fallout Transport Method Using Variable Spectral Winds
NASA Astrophysics Data System (ADS)
Hopkins, Arthur Thomas
A new method has been developed to incorporate variable winds into fallout transport calculations. The method uses spectral coefficients derived by the National Meteorological Center. Wind vector components are computed with the coefficients along the trajectories of falling particles. Spectral winds are used in the two-step method to compute dose rate on the ground, downwind of a nuclear cloud. First, the hotline is located by computing trajectories of particles from an initial, stabilized cloud, through spectral winds, to the ground. The connection of particle landing points is the hotline. Second, dose rate on and around the hotline is computed by analytically smearing the falling cloud's activity along the ground. The feasibility of using spectral winds for fallout particle transport was validated by computing Mount St. Helens ashfall locations and comparing calculations to fallout data. In addition, an ashfall equation was derived for computing volcanic ash mass/area on the ground. Ashfall data and the ashfall equation were used to back-calculate an aggregated particle size distribution for the Mount St. Helens eruption cloud. Further validation was performed by comparing computed and actual trajectories of a high explosive dust cloud (DIRECT COURSE). Using an error propagation formula, it was determined that uncertainties in spectral wind components produce less than four percent of the total dose rate variance. In summary, this research demonstrated the feasibility of using spectral coefficients for fallout transport calculations, developed a two-step smearing model to treat variable winds, and showed that uncertainties in spectral winds do not contribute significantly to the error in computed dose rate.
Lai, Chin-Feng; Chen, Min; Pan, Jeng-Shyang; Youn, Chan-Hyun; Chao, Han-Chieh
2014-03-01
As cloud computing and wireless body sensor network technologies gradually mature, ubiquitous healthcare services can prevent accidents instantly and effectively, as well as provide relevant information to reduce related processing time and cost. This study proposes a co-processing intermediary framework that integrates cloud and wireless body sensor networks and is mainly applied to fall detection and 3-D motion reconstruction. The main focuses of this study include distributed computing and resource allocation for processing sensing data over the computing architecture, network conditions and performance evaluation. Through this framework, the transmission and computing time of sensing data are reduced to enhance overall performance for the services of fall event detection and 3-D motion reconstruction.
Scalable cloud without dedicated storage
NASA Astrophysics Data System (ADS)
Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.
2015-05-01
We present a prototype of a scalable computing cloud. It is intended to be deployed on a cluster without separate dedicated storage. The dedicated storage is replaced by distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources and improves the fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with relatively low initial and maintenance costs. The solution is built from open source components such as OpenStack and Ceph, among others.
Reconciliation of the cloud computing model with US federal electronic health record regulations
2011-01-01
Cloud computing refers to subscription-based, fee-for-service utilization of computer hardware and software over the Internet. The model is gaining acceptance for business information technology (IT) applications because it allows capacity and functionality to increase on the fly without major investment in infrastructure, personnel or licensing fees. Large IT investments can be converted to a series of smaller operating expenses. Cloud architectures could potentially be superior to traditional electronic health record (EHR) designs in terms of economy, efficiency and utility. A central issue for EHR developers in the US is that these systems are constrained by federal regulatory legislation and oversight. These laws focus on security and privacy, which are well-recognized challenges for cloud computing systems in general. EHRs built with the cloud computing model can achieve acceptable privacy and security through business associate contracts with cloud providers that specify compliance requirements, performance metrics and liability sharing. PMID:21727204
Reconciliation of the cloud computing model with US federal electronic health record regulations.
Schweitzer, Eugene J
2012-01-01
Cloud computing refers to subscription-based, fee-for-service utilization of computer hardware and software over the Internet. The model is gaining acceptance for business information technology (IT) applications because it allows capacity and functionality to increase on the fly without major investment in infrastructure, personnel or licensing fees. Large IT investments can be converted to a series of smaller operating expenses. Cloud architectures could potentially be superior to traditional electronic health record (EHR) designs in terms of economy, efficiency and utility. A central issue for EHR developers in the US is that these systems are constrained by federal regulatory legislation and oversight. These laws focus on security and privacy, which are well-recognized challenges for cloud computing systems in general. EHRs built with the cloud computing model can achieve acceptable privacy and security through business associate contracts with cloud providers that specify compliance requirements, performance metrics and liability sharing.
Jade: using on-demand cloud analysis to give scientists back their flow
NASA Astrophysics Data System (ADS)
Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.
2017-12-01
The UK's Met Office generates 400 TB of weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap. Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: an under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts; hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores; clusters which autoscale up/down in response to calculation load using Kubernetes, and balance the cluster across providers based on the current price of compute; and lazy data access from cloud storage via containerised OpenDAP. This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger sizes. The use of ephemeral compute resources also makes this implementation cost efficient.
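As a rough illustration of balancing a cluster across providers by the current price of compute, the Python sketch below fills the cheapest providers first. The abstract does not describe Jade's actual scheduling logic, so the greedy rule, the function name `allocate_cores`, and the provider prices and capacities are all hypothetical.

```python
def allocate_cores(demand, providers):
    """Greedy, price-ordered allocation (an assumed policy, not Jade's).

    demand:    number of cores the current calculation load requires
    providers: list of (name, price_per_core_hour, max_cores) tuples
    Returns (allocation dict, number of cores that could not be placed).
    """
    allocation = {}
    remaining = demand
    # Visit providers from cheapest to most expensive.
    for name, _price, capacity in sorted(providers, key=lambda p: p[1]):
        if remaining <= 0:
            break
        take = min(remaining, capacity)
        allocation[name] = take
        remaining -= take
    return allocation, remaining


# Hypothetical prices and capacities, for illustration only.
providers = [("aws", 0.05, 600), ("alibaba", 0.04, 300), ("local", 0.0, 200)]
allocation, unplaced = allocate_cores(1000, providers)
```

Re-running the allocation whenever prices or load change would give the autoscaling behaviour described in the abstract, under the assumptions stated here.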
NASA Technical Reports Server (NTRS)
Chaudhary, Aashish; Votava, Petr; Nemani, Ramakrishna R.; Michaelis, Andrew; Kotfila, Chris
2016-01-01
We are developing capabilities for an integrated petabyte-scale Earth science collaborative analysis and visualization environment. The ultimate goal is to deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging of HPC and cloud is a fairly new concept under active research and this system significantly enhances the ability of the scientific community to accelerate analysis and visualization of Earth science data from NASA missions, model outputs and other sources. We have developed a web-based system that seamlessly interfaces with both high-performance computing (HPC) and cloud environments, providing tools that enable science teams to develop and deploy large-scale analysis, visualization and QA pipelines of both the production process and the data products, and enable sharing results with the community. Our project is developed in several stages, each addressing a separate challenge: workflow integration, parallel execution in either cloud or HPC environments, and big-data analytics or visualization. This work benefits a number of existing and upcoming projects supported by NEX, such as the Web Enabled Landsat Data (WELD), where we are developing a new QA pipeline for the 25 PB system.
Analytics and Visualization Pipelines for Big Data on the NASA Earth Exchange (NEX) and OpenNEX
NASA Astrophysics Data System (ADS)
Chaudhary, A.; Votava, P.; Nemani, R. R.; Michaelis, A.; Kotfila, C.
2016-12-01
We are developing capabilities for an integrated petabyte-scale Earth science collaborative analysis and visualization environment. The ultimate goal is to deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging of HPC and cloud is a fairly new concept under active research and this system significantly enhances the ability of the scientific community to accelerate analysis and visualization of Earth science data from NASA missions, model outputs and other sources. We have developed a web-based system that seamlessly interfaces with both high-performance computing (HPC) and cloud environments, providing tools that enable science teams to develop and deploy large-scale analysis, visualization and QA pipelines of both the production process and the data products, and enable sharing results with the community. Our project is developed in several stages, each addressing a separate challenge: workflow integration, parallel execution in either cloud or HPC environments, and big-data analytics or visualization. This work benefits a number of existing and upcoming projects supported by NEX, such as the Web Enabled Landsat Data (WELD), where we are developing a new QA pipeline for the 25 PB system.
Toward a Proof of Concept Cloud Framework for Physics Applications on Blue Gene Supercomputers
NASA Astrophysics Data System (ADS)
Dreher, Patrick; Scullin, William; Vouk, Mladen
2015-09-01
Traditional high performance supercomputers are capable of delivering large sustained state-of-the-art computational resources to physics applications over extended periods of time using batch processing mode operating environments. However, today there is an increasing demand for more complex workflows that involve large fluctuations in the levels of HPC physics computational requirements during the simulations. Some of the workflow components may also require a richer set of operating system features and schedulers than normally found in a batch oriented HPC environment. This paper reports on progress toward a proof of concept design that implements a cloud framework onto BG/P and BG/Q platforms at the Argonne Leadership Computing Facility. The BG/P implementation utilizes the Kittyhawk utility and the BG/Q platform uses an experimental heterogeneous FusedOS operating system environment. Both platforms use the Virtual Computing Laboratory as the cloud computing system embedded within the supercomputer. This proof of concept design allows a cloud to be configured so that it can capitalize on the specialized infrastructure capabilities of a supercomputer and the flexible cloud configurations without resorting to virtualization. Initial testing of the proof of concept system is done using the lattice QCD MILC code. These types of user reconfigurable environments have the potential to deliver experimental schedulers and operating systems within a working HPC environment for physics computations that may be different from the native OS and schedulers on production HPC supercomputers.
GenomeVIP: a cloud platform for genomic variant discovery and interpretation
Mashl, R. Jay; Scott, Adam D.; Huang, Kuan-lin; Wyczalkowski, Matthew A.; Yoon, Christopher J.; Niu, Beifang; DeNardo, Erin; Yellapantula, Venkata D.; Handsaker, Robert E.; Chen, Ken; Koboldt, Daniel C.; Ye, Kai; Fenyö, David; Raphael, Benjamin J.; Wendl, Michael C.; Ding, Li
2017-01-01
Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets. PMID:28522612
Cloud Computing - A Unified Approach for Surveillance Issues
NASA Astrophysics Data System (ADS)
Rachana, C. R.; Banu, Reshma, Dr.; Ahammed, G. F. Ali, Dr.; Parameshachari, B. D., Dr.
2017-08-01
Cloud computing describes highly scalable resources provided as an external service via the Internet on a pay-per-use basis. From the economic point of view, the main attraction of cloud computing is that users only use what they need, and only pay for what they actually use. Resources are available for access from the cloud at any time, and from any location through networks. Cloud computing is gradually replacing the traditional Information Technology Infrastructure. Securing data is one of the leading concerns and biggest issues for cloud computing. Privacy of information is always a crucial point, especially when an individual’s personal information or sensitive information is being stored in the organization. It is indeed true that today's cloud authorization systems are not robust enough. This paper presents a unified approach for analyzing the various security issues and techniques to overcome the challenges in the cloud environment.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811
Design and development of a run-time monitor for multi-core architectures in cloud computing.
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
High Resolution Nature Runs and the Big Data Challenge
NASA Technical Reports Server (NTRS)
Webster, W. Phillip; Duffy, Daniel Q.
2015-01-01
NASA's Global Modeling and Assimilation Office at Goddard Space Flight Center is undertaking a series of very computationally intensive Nature Runs and a downscaled reanalysis. The nature runs use the GEOS-5 as an Atmospheric General Circulation Model (AGCM) while the reanalysis uses the GEOS-5 in Data Assimilation mode. This paper will present computational challenges from three runs, two of which are AGCM runs and one of which is a downscaled reanalysis using the full DAS. The nature runs will be completed at two surface grid resolutions, 7 and 3 kilometers, with 72 vertical levels. The 7 km run spanned 2 years (2005-2006) and produced 4 PB of data while the 3 km run will span one year and generate 4 PB of data. The downscaled reanalysis (MERRA-II, Modern-Era Reanalysis for Research and Applications) will cover 15 years and generate 1 PB of data. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of the concept of business process-as-a-service that is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. In this presentation, we will describe two projects that demonstrate this shift. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. MERRA/AS enables MapReduce analytics over the MERRA reanalysis data collection by bringing together high-performance computing, scalable data management, and a domain-specific climate data services API. NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC comprises a high speed InfiniBand network, high performance file systems and object storage, and virtual system environments specific to data intensive science applications.
These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs.
Research on elastic resource management for multi-queue under cloud computing environment
NASA Astrophysics Data System (ADS)
CHENG, Zhenjing; LI, Haibo; HUANG, Qiulan; Cheng, Yaodong; CHEN, Gang
2017-10-01
As a new approach to managing computing resources, virtualization technology is more and more widely applied in the high-energy physics field. A virtual computing cluster based on OpenStack was built at IHEP, using HTCondor as the job queue management system. In a traditional static cluster, a fixed number of virtual machines are pre-allocated to the job queues of different experiments. However, this method cannot adapt well to the volatility of computing resource requirements. To solve this problem, an elastic computing resource management system for the cloud computing environment has been designed. This system performs unified management of virtual computing nodes on the basis of the job queues in HTCondor, using dual resource thresholds as well as the quota service. A two-stage pool is designed to improve the efficiency of resource pool expansion. This paper will present several use cases of the elastic resource management system in IHEPCloud. In practical runs, virtual computing resources expanded or shrank dynamically as computing requirements changed. Additionally, the CPU utilization ratio of computing resources was significantly increased compared with traditional resource management. The system also performs well when there are multiple condor schedulers and multiple job queues.
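The dual-threshold idea can be sketched as a small decision function. This is only an illustration under assumed rules: the abstract does not give the actual thresholds or policy, so the function name `resize`, the meaning of `low` and `high` (bounds on idle nodes per queue), and the default values are hypothetical.

```python
def resize(queued_jobs, idle_nodes, quota_left, low=2, high=8):
    """Return the change in node count for one job queue.

    Positive values mean expand, negative mean shrink, 0 means hold.
    Assumed policy (not taken from the paper):
      * expand when jobs are waiting and idle nodes fall below `low`,
        never exceeding the quota still available to the queue;
      * shrink when nothing is waiting and idle nodes exceed `high`.
    """
    if queued_jobs > 0 and idle_nodes < low:
        # Cover the waiting jobs and restore the lower idle threshold.
        wanted = queued_jobs + low - idle_nodes
        return min(wanted, quota_left)
    if queued_jobs == 0 and idle_nodes > high:
        # Release only the surplus above the upper threshold.
        return -(idle_nodes - high)
    return 0


print(resize(queued_jobs=5, idle_nodes=0, quota_left=4))   # capped by quota
print(resize(queued_jobs=0, idle_nodes=12, quota_left=4))  # shrink surplus
```

Keeping a gap between the two thresholds avoids repeatedly creating and destroying virtual machines when demand hovers around a single cut-off.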
Use of parallel computing in mass processing of laser data
NASA Astrophysics Data System (ADS)
Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.
2015-12-01
The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
Use of cloud computing in biomedicine.
Sobeslav, Vladimir; Maresova, Petra; Krejcar, Ondrej; Franca, Tanos C C; Kuca, Kamil
2016-12-01
Nowadays, biomedicine is characterised by a growing need for processing of large amounts of data in real time. This leads to new requirements for information and communication technologies (ICT). Cloud computing offers a solution to these requirements and provides many advantages, such as cost savings, elasticity and scalability of using ICT. The aim of this paper is to explore the concept of cloud computing and the related use of this concept in the area of biomedicine. Authors offer a comprehensive analysis of the implementation of the cloud computing approach in biomedical research, decomposed into infrastructure, platform and service layer, and a recommendation for processing large amounts of data in biomedicine. Firstly, the paper describes the appropriate forms and technological solutions of cloud computing. Secondly, the high-end computing paradigm of cloud computing aspects is analysed. Finally, the potential and current use of applications in scientific research of this technology in biomedicine is discussed.
NASA Astrophysics Data System (ADS)
Nguyen, L.; Chee, T.; Minnis, P.; Spangenberg, D.; Ayers, J. K.; Palikonda, R.; Vakhnin, A.; Dubois, R.; Murphy, P. R.
2014-12-01
The processing, storage and dissemination of satellite cloud and radiation products produced at NASA Langley Research Center are key activities for the Climate Science Branch. A constellation of systems operates in sync to accomplish these goals. Because of the complexity involved with operating such intricate systems, there are both high failure rates and high costs for hardware and system maintenance. Cloud computing has the potential to ameliorate cost and complexity issues. Over time, the cloud computing model has evolved and hybrid systems comprising off-site as well as on-site resources are now common. Towards our mission of providing the highest quality research products to the widest audience, we have explored the use of the Amazon Web Services (AWS) Cloud and Storage and present a case study of our results and efforts. This project builds upon NASA Langley Cloud and Radiation Group's experience with operating large and complex computing infrastructures in a reliable and cost effective manner to explore novel ways to leverage cloud computing resources in the atmospheric science environment. Our case study presents the project requirements and then examines the fit of AWS with the LaRC computing model. We also discuss the evaluation metrics, feasibility, and outcomes and close the case study with the lessons we learned that would apply to others interested in exploring the implementation of the AWS system in their own atmospheric science computing environments.
Uncover the Cloud for Geospatial Sciences and Applications to Adopt Cloud Computing
NASA Astrophysics Data System (ADS)
Yang, C.; Huang, Q.; Xia, J.; Liu, K.; Li, J.; Xu, C.; Sun, M.; Bambacus, M.; Xu, Y.; Fay, D.
2012-12-01
Cloud computing is emerging as the future infrastructure for providing computing resources to support and enable scientific research, engineering development, and application construction, as well as workforce education. On the other hand, there is a lot of doubt about the readiness of cloud computing to support a variety of scientific research, development and education. This research is a project funded by NASA SMD to investigate, through holistic studies, how ready cloud computing is to support the geosciences. Four applications with different computing characteristics, including data, computing, concurrent, and spatiotemporal intensities, are used to test the readiness of cloud computing to support the geosciences. Three popular and representative cloud platforms, Amazon EC2, Microsoft Azure, and NASA Nebula, as well as a traditional cluster are utilized in the study. Results illustrate that the cloud is ready to some degree, but more research needs to be done to fully implement the cloud benefits as advertised by many vendors and defined by NIST. Specifically, 1) most cloud platforms can stand up a new computing instance (a new computer) in a few minutes as envisioned, and are therefore ready to support most computing needs in an on-demand fashion; 2) load balancing and elasticity, a defining characteristic, are ready in some cloud platforms, such as Amazon EC2, to support bigger jobs (e.g., those needing a response in minutes), while other platforms do not yet support elasticity and load balancing well.
All cloud platforms need further research and development to support real-time applications at the sub-minute level; 3) the user interface and functionality of cloud platforms vary a lot: some are very professional and well supported/documented, such as Amazon EC2, while others need significant improvement before the general public can adopt cloud computing without professional training or knowledge about computing infrastructure; 4) security is a big concern in cloud computing platforms: with the sharing spirit of cloud computing, it is very hard to ensure higher-level security unless a private cloud is built for a specific organization without public access; public cloud platforms do not support the FISMA medium level yet and may never be able to support the FISMA high level; 5) the HPC needs of cloud computing are not well supported, and only Amazon EC2 supports them well. The research is being used by NASA and other agencies in considering cloud computing adoption. We hope the publication of this research will also help the public adopt cloud computing.
Platform for High-Assurance Cloud Computing
2016-06-01
to create today’s standard cloud computing applications and services. Additionally, our SuperCloud (a related but distinct project under the same MRC funding) reduces vendor lock-in and permits application to migrate, to follow...managing key-value storage with strong assurance properties. This first accomplishment allows us to climb the cloud technical stack, by offering
Neylon, J; Min, Y; Kupelian, P; Low, D A; Santhanam, A
2017-04-01
In this paper, a multi-GPU cloud-based server (MGCS) framework is presented for dose calculations, exploring the feasibility of remote computing power for parallelization and acceleration of computationally and time intensive radiotherapy tasks in moving toward online adaptive therapies. An analytical model was developed to estimate theoretical MGCS performance acceleration and intelligently determine workload distribution. Numerical studies were performed with a computing setup of 14 GPUs distributed over 4 servers interconnected by a 1 gigabit per second (Gbps) network. Inter-process communication methods were optimized to facilitate resource distribution and minimize data transfers over the server interconnect. The analytically predicted computation time matched experimental observations within 1-5%. MGCS performance approached a theoretical limit of acceleration proportional to the number of GPUs utilized when computational tasks far outweighed memory operations. The MGCS implementation reproduced ground-truth dose computations with negligible differences by distributing the work among several processes and implementing optimization strategies. The results showed that a cloud-based computation engine was a feasible solution for enabling clinics to make use of fast dose calculations for advanced treatment planning and adaptive radiotherapy. The cloud-based system was able to exceed the performance of a local machine even for optimized calculations, and provided significant acceleration for computationally intensive tasks. Such a framework can provide access to advanced technology and computational methods to many clinics, providing an avenue for standardization across institutions without the requirements of purchasing, maintaining, and continually updating hardware.
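The abstract does not give the form of the analytical model. As a minimal sketch of the kind of estimate such a model might produce, the Python below assumes that compute time divides evenly across GPUs while a fixed volume of data crosses the interconnect once. The function names, the additive form, and the example workload are assumptions, not the authors' model.

```python
def predicted_time(compute_s, data_bytes, n_gpus, link_bps=1e9):
    """Assumed model: T(n) = T_compute / n + (8 * bytes) / link_rate.

    compute_s:  single-GPU compute time in seconds
    data_bytes: data moved over the server interconnect
    link_bps:   interconnect rate in bits per second (1 Gbps by default)
    """
    transfer_s = data_bytes * 8 / link_bps
    return compute_s / n_gpus + transfer_s


def speedup(compute_s, data_bytes, n_gpus, link_bps=1e9):
    """Acceleration relative to one GPU under the same assumed model."""
    baseline = predicted_time(compute_s, data_bytes, 1, link_bps)
    return baseline / predicted_time(compute_s, data_bytes, n_gpus, link_bps)


# Hypothetical workload: 140 s of compute and 125 MB of transfers on 14 GPUs.
print(predicted_time(140, 125_000_000, 14))
print(speedup(140, 125_000_000, 14))
```

Under this form the speedup tends to the number of GPUs as the transfer term becomes negligible, which is consistent with the reported limit when computational tasks far outweigh memory operations.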
NASA Astrophysics Data System (ADS)
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
NASA Astrophysics Data System (ADS)
Puzyrkov, Dmitry; Polyakov, Sergey; Podryga, Viktoriia; Markizov, Sergey
2018-02-01
At the present stage of computer technology development it is possible to study the properties and processes in complex systems at molecular and even atomic levels, for example, by means of molecular dynamics methods. The most interesting problems are those related to the study of complex processes under real physical conditions. Solving such problems requires the use of high performance computing systems of various types, for example, GRID systems and HPC clusters. Given how time consuming these computational tasks are, the need arises for software for automatic and unified monitoring of such computations. A complex computational task can be performed over different HPC systems. This requires synchronization of output data between the storage chosen by a scientist and the HPC system used for computations. The design of the computational domain is also a considerable problem. It requires complex software tools and algorithms for proper atomistic data generation on HPC systems. The paper describes the prototype of a cloud service intended for the design of large atomistic systems for further detailed molecular dynamics calculations and for computational management of these calculations, and presents the part of its concept aimed at initial data generation on HPC systems.
Point Cloud Management Through the Realization of the Intelligent Cloud Viewer Software
NASA Astrophysics Data System (ADS)
Costantino, D.; Angelini, M. G.; Settembrini, F.
2017-05-01
The paper presents software dedicated to the processing of point clouds, called Intelligent Cloud Viewer (ICV), made in-house by AESEI software (a spin-off of Politecnico di Bari), which allows point clouds of several tens of millions of points to be viewed even on systems that are not very high performance. Processing is carried out on the whole point cloud, while only part of it is displayed in order to speed up rendering. It is designed for 64-bit Windows, is fully written in C++, and integrates different specialized modules for computer graphics (Open Inventor by SGI, Silicon Graphics Inc), maths (BLAS, EIGEN), computational geometry (CGAL, Computational Geometry Algorithms Library), registration and advanced algorithms for point clouds (PCL, Point Cloud Library), advanced data structures (BOOST, Basic Object Oriented Supporting Tools), etc. ICV incorporates a number of features such as cropping, transformation and georeferencing, matching, registration, decimation, sections, calculation of distances between clouds, etc. It has been tested on photographic and TLS (Terrestrial Laser Scanner) data, obtaining satisfactory results. The potential of the software has been tested by carrying out a photogrammetric survey of Castel del Monte, for which a previous ground-based laser scanner survey by the same authors was already available. For the aerophotogrammetric survey, a flight height of approximately 1000 ft AGL (Above Ground Level) was adopted and, overall, over 800 photos were acquired in just over 15 minutes, with a coverage of not less than 80% and a planned speed of about 90 knots.
Radiotherapy Monte Carlo simulation using cloud computing technology.
Poole, C M; Cornelius, I; Trapp, J V; Langton, C M
2012-12-01
Cloud computing allows for vast computational resources to be leveraged quickly and easily in bursts as and when required. Here we describe a technique that allows for Monte Carlo radiotherapy dose calculations to be performed using GEANT4 and executed in the cloud, with relative simulation cost and completion time evaluated as a function of machine count. As expected, simulation completion time decreases as 1/n for n parallel machines, and relative simulation cost is found to be optimal where n is a factor of the total simulation time in hours. Using the technique, we demonstrate, as a proof of principle, the potential usefulness of cloud computing as a solution for rapid Monte Carlo simulation for radiotherapy dose calculation without the need for dedicated local computer hardware.
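The cost result follows from simple arithmetic if each machine is billed per started hour, which is an assumption made here and not stated in the abstract. The Python sketch below, with hypothetical function names and a hypothetical 12-hour job, shows why machine counts that divide the total simulation time waste no billed time.

```python
import math


def completion_hours(total_hours, n):
    """Wall-clock time when the work is split evenly over n machines."""
    return total_hours / n


def billed_machine_hours(total_hours, n):
    """Machine-hours charged, assuming every machine is billed per
    started hour (an assumed billing model)."""
    return n * math.ceil(total_hours / n)


# Hypothetical job needing 12 machine-hours of simulation in total.
for n in (4, 5, 6, 7):
    print(n, completion_hours(12, n), billed_machine_hours(12, n))
```

With 12 hours of work, n = 4 or n = 6 is billed exactly 12 machine-hours, whereas n = 5 is billed 15 and n = 7 is billed 14, because the final partly used hour on each machine is charged in full.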
Evaluation of wind field statistics near and inside clouds using a coherent Doppler lidar
NASA Astrophysics Data System (ADS)
Lottman, Brian Todd
1998-09-01
This work proposes advanced techniques for measuring the spatial wind field statistics near and inside clouds using a vertically pointing solid state coherent Doppler lidar on a fixed ground based platform. The coherent Doppler lidar is an ideal instrument for high spatial and temporal resolution velocity estimates. The basic parameters of lidar are discussed, including a complete statistical description of the Doppler lidar signal. This description is extended to cases with simple functional forms for aerosol backscatter and velocity. An estimate for the mean velocity over a sensing volume is produced by estimating the mean spectra. There are many traditional spectral estimators, which are useful for conditions with slowly varying velocity and backscatter. A new class of estimators (novel) is introduced that produces reliable velocity estimates for conditions with large variations in aerosol backscatter and velocity with range, such as cloud conditions. Performance of traditional and novel estimators is computed for a variety of deterministic atmospheric conditions using computer simulated data. Wind field statistics are produced for actual data for a cloud deck, and for multi-layer clouds. Unique results include detection of possible spectral signatures for rain, estimates for the structure function inside a cloud deck, reliable velocity estimation techniques near and inside thin clouds, and estimates for simple wind field statistics between cloud layers.
NASA Astrophysics Data System (ADS)
Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas
2016-03-01
Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R² values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.
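The abstract does not define its scores. The sketch below assumes the common contingency-table definitions of probability of detection and frequency bias, and the pixel counts are hypothetical values chosen only to reproduce numbers of the same size as those quoted; they are not data from the study.

```python
def detection_scores(hits, misses, false_alarms):
    """Return (probability of detection, frequency bias) for a binary
    rain / no-rain classification.

    hits:         pixels predicted rainy and observed rainy
    misses:       pixels predicted dry but observed rainy
    false_alarms: pixels predicted rainy but observed dry
    """
    observed_rainy = hits + misses
    pod = hits / observed_rainy
    # A bias above 1 means the rain area is overestimated.
    bias = (hits + false_alarms) / observed_rainy
    return pod, bias


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(detection_scores(hits=85, misses=15, false_alarms=95))
```

With these definitions a classifier can combine a high probability of detection with a bias well above 1, which is the pattern the abstract reports: most rainy pixels are found, but many dry pixels are flagged as rainy too.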
SIMPLEX: Cloud-Enabled Pipeline for the Comprehensive Analysis of Exome Sequencing Data
Fischer, Maria; Snajder, Rene; Pabinger, Stephan; Dander, Andreas; Schossig, Anna; Zschocke, Johannes; Trajanoski, Zlatko; Stocker, Gernot
2012-01-01
In recent studies, exome sequencing has proven to be a successful screening tool for the identification of candidate genes causing rare genetic diseases. Although underlying targeted sequencing methods are well established, necessary data handling and focused, structured analysis still remain demanding tasks. Here, we present a cloud-enabled autonomous analysis pipeline, which comprises the complete exome analysis workflow. The pipeline combines several in-house developed and published applications to perform the following steps: (a) initial quality control, (b) intelligent data filtering and pre-processing, (c) sequence alignment to a reference genome, (d) SNP and DIP detection, (e) functional annotation of variants using different approaches, and (f) detailed report generation during various stages of the workflow. The pipeline connects the selected analysis steps, exposes all available parameters for customized usage, performs required data handling, and distributes computationally expensive tasks either on a dedicated high-performance computing infrastructure or on the Amazon cloud environment (EC2). The presented application has already been used in several research projects including studies to elucidate the role of rare genetic diseases. The pipeline is continuously tested and is publicly available under the GPL as a VirtualBox or Cloud image at http://simplex.i-med.ac.at; additional supplementary data is provided at http://www.icbi.at/exome. PMID:22870267
Flexible services for the support of research.
Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John
2013-01-28
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amounts of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.
Bigdata Driven Cloud Security: A Survey
NASA Astrophysics Data System (ADS)
Raja, K.; Hanifa, Sabibullah Mohamed
2017-08-01
Cloud Computing (CC) is a fast-growing technology for performing massive-scale and complex computing. It eliminates the need to maintain expensive computing hardware, dedicated space, and software. Recently, massive growth has been observed in the scale of data, or big data, generated through cloud computing. CC consists of a front end, which includes the users' computers and the software required to access the cloud network, and a back end, which consists of the various computers, servers and database systems that create the cloud. The leading, traditional cloud ecosystem delivers its services through SaaS (Software as a Service: end users utilize outsourced software), PaaS (Platform as a Service: a platform is provided), IaaS (Infrastructure as a Service: the physical environment is outsourced) and DaaS (Database as a Service: data can be housed within a cloud), which have become a powerful and popular architecture. Security threats raise many challenges and issues and are the most vital barrier for the cloud computing environment. The main barrier to the adoption of CC in health care relates to data security. When data are placed and transmitted over public networks, cyber attacks in any form are to be anticipated in CC. Hence, cloud service users need to understand the risk of data breaches and the choice of service delivery model during deployment. This survey covers CC security issues in depth (including data security in health care) so that researchers can develop robust security application models using Big Data (BD) on CC, which can be created and deployed easily. BD evaluation is driven by fast-growing cloud-based applications developed using virtualized technologies. In this context, MapReduce [12] is a good example of big data processing in a cloud environment, and a model for cloud providers.
Unidata's Vision for Transforming Geoscience by Moving Data Services and Software to the Cloud
NASA Astrophysics Data System (ADS)
Ramamurthy, M. K.; Fisher, W.; Yoksas, T.
2014-12-01
Universities are facing many challenges: shrinking budgets, rapidly evolving information technologies, exploding data volumes, multidisciplinary science requirements, and high student expectations. These changes are upending traditional approaches to accessing and using data and software. It is clear that Unidata's products and services must evolve to support new approaches to research and education. After years of hype and ambiguity, cloud computing is maturing in usability in many areas of science and education, bringing the benefits of virtualized and elastic remote services to infrastructure, software, computation, and data. Cloud environments reduce the amount of time and money spent to procure, install, and maintain new hardware and software, and reduce costs through resource pooling and shared infrastructure. Cloud services aimed at providing any resource, at any time, from any place, using any device are increasingly being embraced by all types of organizations. Given this trend and the enormous potential of cloud-based services, Unidata is moving to augment its products, services, data delivery mechanisms and applications to align with the cloud-computing paradigm. Specifically, Unidata is working toward establishing a community-based development environment that supports the creation and use of software services to build end-to-end data workflows. The design encourages the creation of services that can be broken into small, independent chunks that provide simple capabilities. Chunks could be used individually to perform a task, or chained into simple or elaborate workflows. The services will also be portable, allowing their use in researchers' own cloud-based computing environments.
In this talk, we present a vision for Unidata's future in cloud-enabled data services and discuss our initial efforts to deploy a subset of Unidata data services and tools in the Amazon EC2 and Microsoft Azure cloud environments, including the transfer of real-time meteorological data into its cloud instances, product generation using those data, and the deployment of TDS, McIDAS ADDE and AWIPS II data servers and the Integrated Data Server visualization tool.
NASA Technical Reports Server (NTRS)
Berendes, Todd; Sengupta, Sailes K.; Welch, Ron M.; Wielicki, Bruce A.; Navar, Murgesh
1992-01-01
A semiautomated methodology is developed for estimating cumulus cloud base heights on the basis of high spatial resolution Landsat MSS data, using various image-processing techniques to match cloud edges with their corresponding shadow edges. The cloud base height is then estimated by computing the separation distance between the corresponding generalized Hough transform reference points. The differences between the cloud base heights computed by these means and a manual verification technique are of the order of 100 m or less; accuracies of 50-70 m may soon be possible via EOS instruments.
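The abstract does not spell out how the cloud-to-shadow separation is converted into a height. A minimal sketch of the usual shadow geometry follows, assuming flat terrain, a nadir-looking sensor and a known solar elevation angle; these assumptions and the function name are illustrative and are not taken from the paper.

```python
import math


def cloud_base_height_m(separation_m, sun_elevation_deg):
    """Height of a cloud edge above flat ground, given the horizontal
    distance between the edge and its shadow edge (nadir view assumed)."""
    return separation_m * math.tan(math.radians(sun_elevation_deg))


if __name__ == "__main__":
    # Hypothetical example: 1500 m separation with the sun 40 degrees
    # above the horizon.
    print(round(cloud_base_height_m(1500.0, 40.0)))
```

Under this geometry an error in the measured separation maps directly into a height error scaled by the tangent of the solar elevation, so the pixel size of the imagery limits the achievable accuracy.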
A resource-sharing model based on a repeated game in fog computing.
Sun, Yan; Zhang, Nan
2017-03-01
With the rapid development of cloud computing techniques, the number of users is undergoing exponential growth. It is difficult for traditional data centers to perform many tasks in real time because of the limited bandwidth of resources. The concept of fog computing is proposed to support traditional cloud computing and to provide cloud services. In fog computing, the resource pool is composed of sporadic distributed resources that are more flexible and movable than a traditional data center. In this paper, we propose a fog computing structure and present a crowd-funding algorithm to integrate spare resources in the network. Furthermore, to encourage more resource owners to share their resources with the resource pool and to supervise the resource supporters as they actively perform their tasks, we propose an incentive mechanism in our algorithm. Simulation results show that our proposed incentive mechanism can effectively reduce the SLA violation rate and accelerate the completion of tasks.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.
Detecting Abnormal Machine Characteristics in Cloud Infrastructures
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Matthews, Bryan L.
2011-01-01
In the cloud computing environment, resources are accessed as services rather than as a product. Monitoring this system for performance is crucial because of typical pay-per-use packages bought by the users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia, which lacks system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine on the cloud in order to rank the machines in order of their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis and at the end of the analysis, our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure.
Monte Carlo Calculations of Polarized Microwave Radiation Emerging from Cloud Structures
NASA Technical Reports Server (NTRS)
Kummerow, Christian; Roberti, Laura
1998-01-01
The last decade has seen tremendous growth in cloud dynamical and microphysical models that are able to simulate storms and storm systems with very high spatial resolution, typically of the order of a few kilometers. The fairly realistic distributions of cloud and hydrometeor properties that these models generate have in turn led to a renewed interest in the three-dimensional microwave radiative transfer modeling needed to understand the effect of cloud and rainfall inhomogeneities upon microwave observations. Monte Carlo methods, and particularly backwards Monte Carlo methods, have shown themselves to be very desirable due to the quick convergence of the solutions. Unfortunately, backwards Monte Carlo methods are not well suited to treat polarized radiation. This study reviews the existing Monte Carlo methods and presents a new polarized Monte Carlo radiative transfer code. The code is based on a forward scheme but uses aliasing techniques to keep the computational requirements equivalent to the backwards solution. Radiative transfer computations have been performed using a microphysical-dynamical cloud model and the results are presented together with the algorithm description.
Assessment of different models for computing the probability of a clear line of sight
NASA Astrophysics Data System (ADS)
Bojin, Sorin; Paulescu, Marius; Badescu, Viorel
2017-12-01
This paper is focused on modeling the morphological properties of cloud fields in terms of the probability of a clear line of sight (PCLOS). PCLOS is defined as the probability that a line of sight between an observer and a given point of the celestial vault passes freely without intersecting a cloud. A variety of PCLOS models assuming hemispherical, semi-ellipsoidal and ellipsoidal cloud shapes are tested. The effective parameters (cloud aspect ratio and absolute cloud fraction) are extracted from high-resolution series of sunshine number measurements. The performance of the PCLOS models is evaluated from the perspective of their ability to retrieve the point cloudiness. The advantages and disadvantages of the tested models are discussed, aiming at a simplified parameterization of PCLOS models.
ProteoCloud: a full-featured open source proteomics cloud computing pipeline.
Muth, Thilo; Peters, Julian; Blackburn, Jonathan; Rapp, Erdmann; Martens, Lennart
2013-08-02
We here present the ProteoCloud pipeline, a freely available, full-featured cloud-based platform to perform computationally intensive, exhaustive searches in a cloud environment using five different peptide identification algorithms. ProteoCloud is entirely open source, and is built around an easy to use and cross-platform software client with a rich graphical user interface. This client allows full control of the number of cloud instances to initiate and of the spectra to assign for identification. It also enables the user to track progress, and to visualize and interpret the results in detail. Source code, binaries and documentation are all available at http://proteocloud.googlecode.com. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Alvarez, César I.; Teodoro, Ana; Tierra, Alfonso
2017-10-01
Thin clouds are frequent in optical remote sensing data and in most cases prevent the retrieval of the pure surface data needed to calculate indices such as the Normalized Difference Vegetation Index (NDVI). This paper aims to evaluate the Automatic Cloud Removal Method (ACRM) algorithm over a high-elevation city, Quito (Ecuador), at an altitude of 2800 meters above sea level, where clouds are present all year. The ACRM is an algorithm that performs a linear regression between each Landsat 8 OLI band and the Cirrus band and uses the resulting slope for the correction. The algorithm was applied without any reference image or mask to try to remove the clouds. Applied over Quito, the ACRM algorithm did not perform well. Therefore, an improvement of the algorithm using a different slope value was considered (ACRM Improved). The resulting NDVI was then compared with reference MODIS NDVI data (MOD13Q1). The ACRM Improved algorithm was successful when compared with the original ACRM algorithm. In the future, this improved ACRM algorithm needs to be tested in different regions of the world under different conditions to evaluate whether it works successfully in all conditions.
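The regression step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it fits an ordinary least-squares slope over all pixels, subtracts the cirrus contribution relative to the clearest pixel, and then computes NDVI. The function names, the optional `slope` override (standing in for the different slope value of the improved variant) and the sample reflectances are assumptions.

```python
def fit_slope(x, y):
    """Ordinary least-squares slope of y against x (x must not be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def remove_cirrus(band, cirrus, slope=None):
    """Subtract the cirrus contribution from one band.

    If slope is None it is estimated by regressing the band on the cirrus
    band; passing a value overrides the fitted slope.
    """
    if slope is None:
        slope = fit_slope(cirrus, band)
    clearest = min(cirrus)  # reference level of the least cloudy pixel
    return [b - slope * (c - clearest) for b, c in zip(band, cirrus)]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index, pixel by pixel."""
    return [(n - r) / (n + r) for n, r in zip(nir, red)]


if __name__ == "__main__":
    # Hypothetical reflectances for three pixels with increasing thin cloud.
    cirrus = [0.00, 0.10, 0.20]
    red = [0.10, 0.15, 0.20]
    nir = [0.50, 0.55, 0.60]
    print(ndvi(remove_cirrus(nir, cirrus), remove_cirrus(red, cirrus)))
```

Because the correction is linear in the slope, a poorly fitted slope leaves cloud signal in the corrected bands or removes surface signal, which is one plausible reason why changing the slope value changes the resulting NDVI.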
NASA Astrophysics Data System (ADS)
Kang, Zhizhong
2013-10-01
This paper presents a new approach to automatic registration of terrestrial laser scanning (TLS) point clouds utilizing a novel robust estimation method by an efficient BaySAC (BAYes SAmpling Consensus). The proposed method directly generates reflectance images from 3D point clouds, and then uses the SIFT algorithm to extract keypoints and identify corresponding image points. The 3D corresponding points, from which transformation parameters between point clouds are computed, are acquired by mapping the 2D ones onto the point cloud. To remove falsely accepted correspondences, we implement a conditional sampling method to select the n data points with the highest inlier probabilities as a hypothesis set and update the inlier probabilities of each data point using a simplified Bayes' rule for the purpose of improving the computation efficiency. The prior probability is estimated by the verification of the distance invariance between correspondences. The proposed approach is tested on four data sets acquired by three different scanners. The results show that, compared with RANSAC, BaySAC requires fewer iterations and lower computation cost when the hypothesis set is contaminated with more outliers. The registration results also indicate that the proposed algorithm can achieve high registration accuracy on all experimental datasets.
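The conditional sampling idea can be illustrated on a toy problem. The sketch below fits a 2D line rather than a rigid transformation between point clouds, and it updates probabilities with the exact Bayes rule under an independence assumption, which may differ from the simplified rule used in the paper. All names, thresholds and data are hypothetical.

```python
from math import prod


def fit_line(p, q):
    """Line through two points with distinct x, as (slope, intercept)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def count_inliers(points, model, tol):
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) <= tol for x, y in points)


def baysac_line(points, priors, tol=0.1, min_inliers=6, max_iter=50):
    """Toy BaySAC: always test the points most likely to be inliers.

    priors must lie strictly between 0 and 1.
    """
    probs = list(priors)
    best_model, best_count = None, -1
    for _ in range(max_iter):
        # Deterministic hypothesis set: the two highest inlier probabilities
        # (ties keep their original order because the sort is stable).
        order = sorted(range(len(points)), key=lambda i: probs[i], reverse=True)
        hyp = order[:2]
        model = fit_line(points[hyp[0]], points[hyp[1]])
        count = count_inliers(points, model, tol)
        if count > best_count:
            best_model, best_count = model, count
        if count >= min_inliers:
            break
        # The set is assumed to contain at least one outlier, so lower the
        # inlier probability of each member: P(i | set bad) =
        # (P(i) - P(all inliers)) / (1 - P(all inliers)).
        p_all = prod(probs[i] for i in hyp)
        for i in hyp:
            probs[i] = (probs[i] - p_all) / (1.0 - p_all)
    return best_model, best_count


if __name__ == "__main__":
    # Four outliers followed by six points on y = 2x + 1 (hypothetical data).
    pts = [(0, 5), (1, -3), (2, 9), (3, 0),
           (4, 9), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19)]
    print(baysac_line(pts, [0.5] * len(pts)))
```

Unlike RANSAC, no random numbers are drawn: each failed hypothesis lowers the probabilities of its members, so the next hypothesis is built from different points, and informative priors (such as the distance-invariance check mentioned above) can steer the very first hypothesis toward true correspondences.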
Dynamic VMs placement for energy efficiency by PSO in cloud computing
NASA Astrophysics Data System (ADS)
Dashti, Seyed Ebrahim; Rahmani, Amir Masoud
2016-03-01
Cloud computing is growing fast and helps to realise other high technologies. In this paper, we propose a hierarchical architecture to satisfy both providers' and consumers' requirements in these technologies. We design a new service in the PaaS layer for scheduling consumer tasks. From the providers' perspective, incompatibility between the specification of physical machines and user requests in the cloud leads to problems such as the energy-performance trade-off and large power consumption, so that profits decrease. To guarantee the quality of service of users' tasks and reduce energy consumption, we propose a modified Particle Swarm Optimisation to reallocate virtual machines migrated from overloaded hosts. We also dynamically consolidate under-loaded hosts, which provides power saving. Simulation results in CloudSim demonstrate that, when the simulation conditions are close to the real environment, our method is able to save as much as 14% more energy, and the number of migrations and the simulation time are significantly reduced compared with previous works.
An integrated system for land resources supervision based on the IoT and cloud computing
NASA Astrophysics Data System (ADS)
Fang, Shifeng; Zhu, Yunqiang; Xu, Lida; Zhang, Jinqu; Zhou, Peiji; Luo, Kan; Yang, Jie
2017-01-01
Integrated information systems are important safeguards for the utilisation and development of land resources. Information technologies, including the Internet of Things (IoT) and cloud computing, are inevitable requirements for the quality and efficiency of land resources supervision tasks. In this study, an economical and highly efficient supervision system for land resources has been established based on IoT and cloud computing technologies; a novel online and offline integrated system with synchronised internal and field data that includes the entire process of 'discovering breaches, analysing problems, verifying fieldwork and investigating cases' was constructed. The system integrates key technologies, such as the automatic extraction of high-precision information based on remote sensing, semantic ontology-based technology to excavate and discriminate public sentiment on the Internet that is related to illegal incidents, high-performance parallel computing based on MapReduce, uniform storing and compressing (bitwise) technology, global positioning system data communication and data synchronisation mode, intelligent recognition and four-level ('device, transfer, system and data') safety control technology. The integrated system based on a 'One Map' platform has been officially implemented by the Department of Land and Resources of Guizhou Province, China, and was found to significantly increase the efficiency and level of land resources supervision. The system promoted the overall development of informatisation in fields related to land resource management.
NASA Astrophysics Data System (ADS)
Zhao, Yu; Shi, Chen-Xiao; Kwon, Ki-Chul; Piao, Yan-Ling; Piao, Mei-Lan; Kim, Nam
2018-03-01
We propose a fast calculation method for a computer-generated hologram (CGH) of real objects that uses a point cloud gridding method. The depth information of the scene is acquired using a depth camera and the point cloud model is reconstructed virtually. Because each point of the point cloud is distributed precisely to the exact coordinates of each layer, each point of the point cloud can be classified into grids according to its depth. A diffraction calculation is performed on the grids using a fast Fourier transform (FFT) to obtain a CGH. The computational complexity is reduced dramatically in comparison with conventional methods. The feasibility of the proposed method was confirmed by numerical and optical experiments.
Cloud archiving and data mining of High-Resolution Rapid Refresh forecast model output
NASA Astrophysics Data System (ADS)
Blaylock, Brian K.; Horel, John D.; Liston, Samuel T.
2017-12-01
Weather-related research often requires synthesizing vast amounts of data that need archival solutions that are both economical and viable during and past the lifetime of the project. Public cloud computing services (e.g., from Amazon, Microsoft, or Google) or private clouds managed by research institutions are providing object data storage systems potentially appropriate for long-term archives of such large geophysical data sets. We illustrate the use of a private cloud object store developed by the Center for High Performance Computing (CHPC) at the University of Utah. Since early 2015, we have been archiving thousands of two-dimensional gridded fields (each one containing over 1.9 million values over the contiguous United States) from the High-Resolution Rapid Refresh (HRRR) data assimilation and forecast modeling system. The archive is being used for retrospective analyses of meteorological conditions during high-impact weather events, assessing the accuracy of the HRRR forecasts, and providing initial and boundary conditions for research simulations. The archive is accessible interactively and through automated download procedures for researchers at other institutions that can be tailored by the user to extract individual two-dimensional grids from within the highly compressed files. Characteristics of the CHPC object storage system are summarized relative to network file system storage or tape storage solutions. The CHPC storage system is proving to be a scalable, reliable, extensible, affordable, and usable archive solution for our research.
Arc4nix: A cross-platform geospatial analytical library for cluster and cloud computing
NASA Astrophysics Data System (ADS)
Tang, Jingyin; Matyas, Corene J.
2018-02-01
Big Data in geospatial technology is a grand challenge for processing capacity. The ability to use a GIS for geospatial analysis on Cloud Computing and High Performance Computing (HPC) clusters has emerged as a new approach to provide feasible solutions. However, users lack the ability to migrate existing research tools to a Cloud Computing or HPC-based environment because of the incompatibility of the market-dominating ArcGIS software stack and Linux operating system. This manuscript details a cross-platform geospatial library "arc4nix" to bridge this gap. Arc4nix provides an application programming interface compatible with ArcGIS and its Python library "arcpy". Arc4nix uses a decoupled client-server architecture that permits geospatial analytical functions to run on the remote server and other functions to run on the native Python environment. It uses functional programming and meta-programming language to dynamically construct Python codes containing actual geospatial calculations, send them to a server and retrieve results. Arc4nix allows users to employ their arcpy-based script in a Cloud Computing and HPC environment with minimal or no modification. It also supports parallelizing tasks using multiple CPU cores and nodes for large-scale analyses. A case study of geospatial processing of a numerical weather model's output shows that arcpy scales linearly in a distributed environment. Arc4nix is open-source software.
Beating the tyranny of scale with a private cloud configured for Big Data
NASA Astrophysics Data System (ADS)
Lawrence, Bryan; Bennett, Victoria; Churchill, Jonathan; Juckes, Martin; Kershaw, Philip; Pepler, Sam; Pritchard, Matt; Stephens, Ag
2015-04-01
The Joint Analysis System, JASMIN, consists of five significant hardware components: a batch computing cluster, a hypervisor cluster, bulk disk storage, high performance disk storage, and access to a tape robot. Each of the computing clusters consists of a heterogeneous set of servers, supporting a range of possible data analysis tasks - and a unique network environment makes it relatively trivial to migrate servers between the two clusters. The high performance disk storage will include the world's largest (publicly visible) deployment of the Panasas parallel disk system. Initially deployed in April 2012, JASMIN has already undergone two major upgrades, culminating in a system which, by April 2015, will have in excess of 16 PB of disk and 4000 cores. Layered on the basic hardware is a range of services, from managed services, such as the curated archives of the Centre for Environmental Data Archival or the data analysis environment for the National Centres for Atmospheric Science and Earth Observation, to a generic Infrastructure as a Service (IaaS) offering for the UK environmental science community. Here we present examples of some of the big data workloads being supported in this environment - ranging from data management tasks, such as checksumming 3 PB of data held in over one hundred million files, to science tasks, such as re-processing satellite observations with new algorithms, or calculating new diagnostics on petascale climate simulation outputs. We will demonstrate how the provision of a cloud environment closely coupled to a batch computing environment, all sharing the same high performance disk system, allows massively parallel processing without the necessity to shuffle data excessively - even as it supports many different virtual communities, each with guaranteed performance. We will discuss the advantages of having a heterogeneous range of servers with available memory from tens of GB at the low end to (currently) two TB at the high end.
There are some limitations of the JASMIN environment: the high performance disk environment is not fully available in the IaaS environment, and a planned ability to burst compute heavy jobs into the public cloud is not yet fully available. There are load balancing and performance issues that need to be understood. We will conclude with projections for future usage, and our plans to meet those requirements.
Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce
Pratx, Guillem; Xing, Lei
2011-01-01
Monte Carlo simulation is considered the most reliable method for modeling photon migration in heterogeneous media. However, its widespread use is hindered by the high computational cost. The purpose of this work is to report on our implementation of a simple MapReduce method for performing fault-tolerant Monte Carlo computations in a massively-parallel cloud computing environment. We ported the MC321 Monte Carlo package to Hadoop, an open-source MapReduce framework. In this implementation, Map tasks compute photon histories in parallel while a Reduce task scores photon absorption. The distributed implementation was evaluated on a commercial compute cloud. The simulation time was found to be linearly dependent on the number of photons and inversely proportional to the number of nodes. For a cluster size of 240 nodes, the simulation of 100 billion photon histories took 22 min, a 1258× speed-up compared to the single-threaded Monte Carlo program. The overall computational throughput was 85,178 photon histories per node per second, with a latency of 100 s. The distributed simulation produced the same output as the original implementation and was resilient to hardware failure: the correctness of the simulation was unaffected by the shutdown of 50% of the nodes. PMID:22191916
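The division of work between Map and Reduce tasks can be sketched in plain Python. This toy does not use Hadoop or MC321: it tracks photons along one dimension in a semi-infinite medium, and the optical coefficients, bin width and function names are all hypothetical.

```python
import random
from collections import Counter
from functools import reduce

MU_A = 1.0       # absorption coefficient in 1/cm (assumed value)
MU_S = 9.0       # scattering coefficient in 1/cm (assumed value)
BIN_WIDTH = 0.1  # depth bin size in cm (assumed value)


def map_task(seed, n_photons):
    """Simulate photon histories independently and return a histogram of
    absorption events per depth bin."""
    rng = random.Random(seed)  # one seed per task keeps tasks reproducible
    mu_t = MU_A + MU_S
    absorbed = Counter()
    for _ in range(n_photons):
        depth, direction = 0.0, 1.0
        while True:
            depth += direction * rng.expovariate(mu_t)  # free path length
            if depth < 0.0:
                break  # photon escaped through the surface
            if rng.random() < MU_A / mu_t:
                absorbed[int(depth / BIN_WIDTH)] += 1
                break  # photon absorbed
            direction = rng.choice((-1.0, 1.0))  # 1D scattering
    return absorbed


def reduce_task(left, right):
    """Merge two partial absorption histograms."""
    return left + right


def run_job(n_tasks, photons_per_task):
    partials = [map_task(seed, photons_per_task) for seed in range(n_tasks)]
    return reduce(reduce_task, partials, Counter())


if __name__ == "__main__":
    histogram = run_job(n_tasks=4, photons_per_task=1000)
    print(sorted(histogram.items())[:5])
```

Because each map task depends only on its seed and photon count, a task lost to a node failure can simply be rerun elsewhere and yields the same partial histogram, which is the property that makes this decomposition fault tolerant.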
NASA Astrophysics Data System (ADS)
Watari, S.; Morikawa, Y.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Kato, H.; Shimojo, S.; Murata, K. T.
2010-12-01
In the Solar-Terrestrial Physics (STP) field, the spatio-temporal resolution of computer simulations keeps increasing thanks to the tremendous advancement of supercomputers. A more advanced technology is Grid Computing, which integrates distributed computational resources to provide scalable computing resources. In simulation research, it is effective for researchers to design their own physical model, perform calculations on a supercomputer, and analyze and visualize the results using familiar methods. However, a supercomputer is far from an analysis and visualization environment. In general, a researcher analyzes and visualizes data on a locally managed workstation (WS) because installing and operating software on the WS is easy. It is therefore necessary to copy the data from the supercomputer to the WS manually. In practice, the time needed for data transfer over a long-delay network hinders high-accuracy simulations. In terms of usefulness, it is important to integrate a supercomputer and an analysis and visualization environment seamlessly, using methods familiar to the researcher. NICT has been developing a cloud computing environment (the NICT Space Weather Cloud). In the NICT Space Weather Cloud, disk servers are located near its supercomputer and the WSs for data analysis and visualization. They are connected to JGN2plus, a high-speed network for research and development. Distributed virtual high-capacity storage is also constructed with Grid Datafarm (Gfarm v2). Huge data output from the supercomputer is transferred to the virtual storage through JGN2plus. A researcher can concentrate on research using familiar methods, without regard to the distance between the supercomputer and the analysis and visualization environment. Currently, a total of 16 disk servers are set up at NICT headquarters (Koganei, Tokyo), the JGN2plus NOC (Otemachi, Tokyo), the Okinawa Subtropical Environment Remote-Sensing Center, and the Cybermedia Center, Osaka University. 
They are connected via JGN2plus and constitute 1 PB (physical size) of virtual storage through Gfarm v2. These disk servers are connected to the supercomputers of NICT and Osaka University. A system has been built that automatically transfers data output from the supercomputers to the virtual storage. The measured transfer rate is about 50 GB/hr. This performance is estimated to be reasonable for a certain simulation and analysis for the reconstruction of the coronal magnetic field. This research serves as an experiment on the system, and its practicality is being verified at the same time. Herein we give an overview of the space weather cloud system that we have developed so far. We also demonstrate several scientific results obtained using the space weather cloud system, and we introduce several web applications offered as a service of the space weather cloud, named "e-SpaceWeather" (e-SW). The e-SW provides a variety of online space weather services from many aspects.
NASA Astrophysics Data System (ADS)
Perez Montes, Diego A.; Añel Cabanelas, Juan A.; Wallom, David C. H.; Arribas, Alberto; Uhe, Peter; Caderno, Pablo V.; Pena, Tomas F.
2017-04-01
Cloud Computing is a technological option that offers great possibilities for modelling in geosciences. We have studied how two different climate models, HadAM3P-HadRM3P and CESM-WACCM, can be adapted in two different ways to run on Cloud Computing Environments from three different vendors: Amazon, Google and Microsoft. We have also evaluated qualitatively how the use of Cloud Computing can affect the allocation of resources by funding bodies and issues related to computing security, including scientific reproducibility. Our first experiments were developed using the well-known ClimatePrediction.net (CPDN), which uses BOINC, over the infrastructure of two cloud providers, namely Microsoft Azure and Amazon Web Services (hereafter AWS). For this comparison we ran a set of thirteen-month climate simulations for CPDN in Azure and AWS using a range of different virtual machines (VMs) for HadRM3P (50 km resolution over the South America CORDEX region) nested in the global atmosphere-only model HadAM3P. These simulations were run on a single processor and took between 3 and 5 days to compute depending on the VM type. The last part of our simulation experiments consisted of running WACCM over different VMs on the Google Compute Engine (GCE) and comparing the results with the supercomputer (SC) Finisterrae1 from the Centro de Supercomputacion de Galicia. It was shown that GCE gives better performance than the SC for smaller numbers of cores/MPI tasks, but the model throughput shows clearly that the SC performs better beyond approximately 100 cores (related to network speed and latency differences). From a cost point of view, Cloud Computing moves researchers from a traditional approach, in which experiments were limited by the available hardware resources, to one in which they are limited by monetary resources (how many resources can be afforded). 
Because there is a growing movement toward, and recommendation for, budgeting HPC projects on this technology (budgets can be calculated more realistically), we could see a shift in trends over the coming years that consolidates the Cloud as the preferred solution.
Machine learning patterns for neuroimaging-genetic studies in the cloud.
Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand
2014-01-01
Brain imaging is a natural intermediate phenotype to understand the link between genetic information and behavior or brain pathologies risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a two-week deployment on hundreds of virtual machines.
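One way to picture a non-parametric procedure distributed in MapReduce style is a permutation test whose permutations are split into batches. The sketch below assumes that setting; the abstract does not specify the statistic, and none of this uses TomusBLOB or Scikit-learn. The names `correlation`, `map_permutations` and `reduce_counts` are hypothetical.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def map_permutations(task):
    """Count permutations in one batch whose |r| reaches the observed |r|."""
    seed, n_perm, genotype, signal, observed = task
    rng = random.Random(seed)  # each batch gets its own random stream
    shuffled = list(signal)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype/signal association
        if abs(correlation(genotype, shuffled)) >= observed:
            hits += 1
    return hits, n_perm


def reduce_counts(results):
    """Merge batch counts into a permutation p-value (add-one corrected)."""
    hits = sum(h for h, _ in results)
    total = sum(n for _, n in results)
    return (hits + 1) / (total + 1)
```

The same functions run unchanged on a laptop with a handful of batches or on many workers with many batches, which mirrors the test-locally-then-scale workflow the abstract describes.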
Polyphony: A Workflow Orchestration Framework for Cloud Computing
NASA Technical Reports Server (NTRS)
Shams, Khawaja S.; Powell, Mark W.; Crockett, Tom M.; Norris, Jeffrey S.; Rossi, Ryan; Soderstrom, Tom
2010-01-01
Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, of machines for short durations without making any hardware procurements. In this paper, we describe Polyphony, a resilient, scalable, and modular framework that efficiently leverages a large set of computing resources to perform parallel computations. Polyphony can employ resources on the cloud, excess capacity on local machines, as well as spare resources on the supercomputing center, and it enables these resources to work in concert to accomplish a common goal. Polyphony is resilient to node failures, even if they occur in the middle of a transaction. We will conclude with an evaluation of a production-ready application built on top of Polyphony to perform image-processing operations of images from around the solar system, including Mars, Saturn, and Titan.
Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects
Clematis, Andrea; Quarati, Alfonso; Cesini, Daniele; Milanesi, Luciano; Merelli, Ivan
2013-01-01
Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements. PMID:24106693
NASA Astrophysics Data System (ADS)
Yang, Wei; Hall, Trevor J.
2013-12-01
The Internet is entering an era of cloud computing to provide more cost effective, eco-friendly and reliable services to consumer and business users. As a consequence, the nature of the Internet traffic has been fundamentally transformed from a pure packet-based pattern to today's predominantly flow-based pattern. Cloud computing has also brought about an unprecedented growth in the Internet traffic. In this paper, a hybrid optical switch architecture is presented to deal with the flow-based Internet traffic, aiming to offer flexible and intelligent bandwidth on demand to improve fiber capacity utilization. The hybrid optical switch is capable of integrating IP into optical networks for cloud-based traffic with predictable performance, for which the delay performance of the electronic module in the hybrid optical switch architecture is evaluated through simulation.
Integration of Cloud resources in the LHCb Distributed Computing
NASA Astrophysics Data System (ADS)
Úbeda García, Mario; Méndez Muñoz, Víctor; Stagni, Federico; Cabarrou, Baptiste; Rauschmayr, Nathalie; Charpentier, Philippe; Closier, Joel
2014-06-01
This contribution describes how Cloud resources have been integrated in the LHCb Distributed Computing. LHCb is using its specific DIRAC extension (LHCbDirac) as an interware for its Distributed Computing. So far, it has seamlessly integrated Grid resources and computer clusters. The cloud extension of DIRAC (VMDIRAC) allows the integration of Cloud computing infrastructures. It is able to interact with multiple types of infrastructures in commercial and institutional clouds, supported by multiple interfaces (Amazon EC2, OpenNebula, OpenStack and CloudStack), and it instantiates, monitors and manages Virtual Machines running on this aggregation of Cloud resources. Moreover, specifications for institutional Cloud resources proposed by the Worldwide LHC Computing Grid (WLCG), mainly by the High Energy Physics Unix Information Exchange (HEPiX) group, have been taken into account. Several initiatives and computing resource providers in the eScience environment already deployed IaaS in production during 2013. Keeping this in mind, the pros and cons of a cloud-based infrastructure have been studied in contrast with the current setup. As a result, this work addresses four different use cases which represent a major improvement on several levels of our infrastructure. We describe the solution implemented by LHCb for the contextualisation of the VMs based on the idea of Cloud Site. We report on operational experience of using in production several institutional Cloud resources that are thus becoming an integral part of the LHCb Distributed Computing resources. Furthermore, we also describe the gradual migration of our Service Infrastructure towards a fully distributed architecture following the Service as a Service (SaaS) model.
Comparison of the different approaches to generate holograms from data acquired with a Kinect sensor
NASA Astrophysics Data System (ADS)
Kang, Ji-Hoon; Leportier, Thibault; Ju, Byeong-Kwon; Song, Jin Dong; Lee, Kwang-Hoon; Park, Min-Chul
2017-05-01
Data of real scenes acquired in real time with a Kinect sensor can be processed with different approaches to generate a hologram. 3D models can be generated from a point cloud or a mesh representation. The advantage of the point cloud approach is that the computation process is well established, since it involves only diffraction and propagation of point sources between parallel planes. On the other hand, the mesh representation makes it possible to reduce the number of elements necessary to represent the object. Then, even though the computation time for the contribution of a single element increases compared to a simple point, the total computation time can be reduced significantly. However, the algorithm is more complex, since propagation of elemental polygons between non-parallel planes must be implemented. Finally, since a depth map of the scene is acquired at the same time as the intensity image, a depth layer approach can also be adopted. This technique is appropriate for fast computation, since propagation of an optical wavefront from one plane to another can be handled efficiently with the fast Fourier transform. Fast computation with the depth layer approach is convenient for real-time applications, but the point cloud method is more appropriate when high resolution is needed. In this study, since Kinect can be used to obtain both a point cloud and a depth map, we examine the different approaches that can be adopted for hologram computation and compare their performance.
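The point cloud approach described in this abstract can be sketched as a direct sum of spherical waves on the hologram plane. This is a minimal illustration, not the authors' implementation; the function name `point_cloud_hologram`, the pixel `pitch`, and the choice of placing the hologram at z = 0 are assumptions.

```python
import cmath
import math


def point_cloud_hologram(points, nx, ny, pitch, wavelength):
    """Complex field on an nx-by-ny hologram plane located at z = 0.

    points: iterable of (x, y, z, amplitude) with z > 0, lengths in metres.
    Each point contributes a spherical wave (amplitude / r) * exp(i * k * r).
    """
    k = 2.0 * math.pi / wavelength  # wavenumber
    field = [[0j] * nx for _ in range(ny)]
    for row in range(ny):
        y = (row - (ny - 1) / 2.0) * pitch  # pixel centre, plane centred on 0
        for col in range(nx):
            x = (col - (nx - 1) / 2.0) * pitch
            total = 0j
            for px, py, pz, amp in points:
                r = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                total += amp / r * cmath.exp(1j * k * r)
            field[row][col] = total
    return field
```

The cost grows as (number of pixels) x (number of points), which is why the abstract's mesh and depth layer approaches, with fewer elements or FFT-based propagation, can be faster.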
Genomic cloud computing: legal and ethical points to consider
Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Burton, Paul; Chisholm, Rex; Fortier, Isabel; Goodwin, Pat; Harris, Jennifer; Hveem, Kristian; Kaye, Jane; Kent, Alistair; Knoppers, Bartha Maria; Lindpaintner, Klaus; Little, Julian; Riegman, Peter; Ripatti, Samuli; Stolk, Ronald; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn; Joly, Yann; Kato, Kazuto; Knoppers, Bartha Maria; Rodriguez, Laura Lyman; McPherson, Treasa; Nicolás, Pilar; Ouellette, Francis; Romeo-Casabona, Carlos; Sarin, Rajiv; Wallace, Susan; Wiesner, Georgia; Wilson, Julia; Zeps, Nikolajs; Simkevitz, Howard; De Rienzo, Assunta; Knoppers, Bartha M
2015-01-01
The biggest challenge in twenty-first century data-intensive genomic science is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key 'points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These 'points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure. PMID:25248396
1984-07-01
aerosols and sub pixel-sized clouds all tend to increase Channel 1 with respect to Channel 2 and reduce the computed VIN. Further, the Guide states that... computation of the VIN. Large scale cloud contamination of pixels, while difficult to correct for, can at least be monitored and affected pixels...techniques have been developed for computer cloud screening. See, for example, Horvath et al. (1982), Gray and McCrary (1981a) and Nixon et al. (1983
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Shawn
This code consists of Matlab routines which enable the user to perform non-manifold surface reconstruction via triangulation from high dimensional point cloud data. The code was based on an algorithm originally developed in [Freedman (2007), An Incremental Algorithm for Reconstruction of Surfaces of Arbitrary Codimension, Computational Geometry: Theory and Applications, 36(2):106-116]. This algorithm has been modified to accommodate non-manifold surfaces according to the work described in [S. Martin and J.-P. Watson (2009), Non-Manifold Surface Reconstruction from High Dimensional Point Cloud Data, SAND #5272610]. The motivation for developing the code was a point cloud describing the molecular conformation space of cyclooctane (C8H16). Cyclooctane conformation space was represented using points in 72 dimensions (3 coordinates for each of the 24 atoms). The code was used to triangulate the point cloud and thereby study the geometry and topology of cyclooctane. Future applications are envisioned for peptides and proteins.
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
NASA Astrophysics Data System (ADS)
Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.
2014-06-01
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
COMBAT: mobile-Cloud-based cOmpute/coMmunications infrastructure for BATtlefield applications
NASA Astrophysics Data System (ADS)
Soyata, Tolga; Muraleedharan, Rajani; Langdon, Jonathan; Funai, Colin; Ames, Scott; Kwon, Minseok; Heinzelman, Wendi
2012-05-01
The amount of data processed annually over the Internet has crossed the zettabyte boundary, yet this Big Data cannot be efficiently processed or stored using today's mobile devices. Parallel to this explosive growth in data, a substantial increase in mobile compute-capability and the advances in cloud computing have brought the state-of-the-art in mobile-cloud computing to an inflection point, where the right architecture may allow mobile devices to run applications utilizing Big Data and intensive computing. In this paper, we propose the MObile Cloud-based Hybrid Architecture (MOCHA), which formulates a solution to permit mobile-cloud computing applications such as object recognition in the battlefield by introducing a mid-stage compute- and storage-layer, called the cloudlet. MOCHA is built on the key observation that many mobile-cloud applications have the following characteristics: 1) they are compute-intensive, requiring the compute-power of a supercomputer, and 2) they use Big Data, requiring a communications link to cloud-based database sources in near-real-time. In this paper, we describe the operation of MOCHA in battlefield applications, by formulating the aforementioned mobile and cloudlet to be housed within a soldier's vest and inside a military vehicle, respectively, and enabling access to the cloud through high latency satellite links. We provide simulations using the traditional mobile-cloud approach as well as utilizing MOCHA with a mid-stage cloudlet to quantify the utility of this architecture. We show that the MOCHA platform for mobile-cloud computing promises a future for critical battlefield applications that access Big Data, which is currently not possible using existing technology.
Climbing the Slope of Enlightenment during NASA's Arctic Boreal Vulnerability Experiment
NASA Astrophysics Data System (ADS)
Griffith, P. C.; Hoy, E.; Duffy, D.; McInerney, M.
2015-12-01
The Arctic Boreal Vulnerability Experiment (ABoVE) is a new field campaign sponsored by NASA's Terrestrial Ecology Program and designed to improve understanding of the vulnerability and resilience of Arctic and boreal social-ecological systems to environmental change (http://above.nasa.gov). ABoVE is integrating field-based studies, modeling, and data from airborne and satellite remote sensing. The NASA Center for Climate Simulation (NCCS) has partnered with the NASA Carbon Cycle and Ecosystems Office (CCEO) to create a high performance science cloud for this field campaign. The ABoVE Science Cloud combines high performance computing with emerging technologies and data management with tools for analyzing and processing geographic information to create an environment specifically designed for large-scale modeling, analysis of remote sensing data, copious disk storage for "big data" with integrated data management, and integration of core variables from in-situ networks. The ABoVE Science Cloud is a collaboration that is accelerating the pace of new Arctic science for researchers participating in the field campaign. Specific examples of the utilization of the ABoVE Science Cloud by several funded projects will be presented.
Cloud-Based Numerical Weather Prediction for Near Real-Time Forecasting and Disaster Response
NASA Technical Reports Server (NTRS)
Molthan, Andrew; Case, Jonathan; Venners, Jason; Schroeder, Richard; Checchi, Milton; Zavodsky, Bradley; Limaye, Ashutosh; O'Brien, Raymond
2015-01-01
The use of cloud computing resources continues to grow within the public and private sector components of the weather enterprise as users become more familiar with cloud-computing concepts, and competition among service providers continues to reduce costs and other barriers to entry. Cloud resources can also provide capabilities similar to high-performance computing environments, supporting multi-node systems required for near real-time, regional weather predictions. Referred to as "Infrastructure as a Service", or IaaS, the use of cloud-based computing hardware in an on-demand payment system allows for rapid deployment of a modeling system in environments lacking access to a large, supercomputing infrastructure. Use of IaaS capabilities to support regional weather prediction may be of particular interest to developing countries that have not yet established large supercomputing resources, but would otherwise benefit from a regional weather forecasting capability. Recently, collaborators from NASA Marshall Space Flight Center and Ames Research Center have developed a scripted, on-demand capability for launching the NOAA/NWS Science and Training Resource Center (STRC) Environmental Modeling System (EMS), which includes pre-compiled binaries of the latest version of the Weather Research and Forecasting (WRF) model. The WRF-EMS provides scripting for downloading appropriate initial and boundary conditions from global models, along with higher-resolution vegetation, land surface, and sea surface temperature data sets provided by the NASA Short-term Prediction Research and Transition (SPoRT) Center. This presentation will provide an overview of the modeling system capabilities and benchmarks performed on the Amazon Elastic Compute Cloud (EC2) environment. 
In addition, the presentation will discuss future opportunities to deploy the system in support of weather prediction in developing countries supported by NASA's SERVIR Project, which provides capacity building activities in environmental monitoring and prediction across a growing number of regional hubs throughout the world. Capacity-building applications that extend numerical weather prediction to developing countries are intended to provide near real-time applications to benefit public health, safety, and economic interests, but may have a greater impact during disaster events by providing a source for local predictions of weather-related hazards, or impacts that local weather events may have during the recovery phase.
A Secure Alignment Algorithm for Mapping Short Reads to Human Genome.
Zhao, Yongan; Wang, Xiaofeng; Tang, Haixu
2018-05-09
Elastic and inexpensive computing resources such as clouds have been recognized as a useful solution for analyzing massive human genomic data (e.g., acquired by using next-generation sequencers) in biomedical research. However, outsourcing human genome computation to public or commercial clouds has been hindered by privacy concerns: even a small number of human genome sequences contain sufficient information for identifying the donor of the genomic data. This issue cannot be directly addressed by existing security and cryptographic techniques (such as homomorphic encryption), because they are too heavyweight to carry out practical genome computation tasks on massive data. In this article, we present a secure algorithm to accomplish read mapping, one of the most basic tasks in human genomic data analysis, based on a hybrid cloud computing model. Compared with existing approaches, our algorithm delegates most computation to the public cloud, while only performing encryption and decryption on the private cloud, and thus makes maximum use of the computing resources of the public cloud. Furthermore, our algorithm reports results similar to those of nonsecure read mapping algorithms, including the alignment between reads and the reference genome, which can be directly used in downstream analysis such as the inference of genomic variations. We implemented the algorithm in C++ and Python on a hybrid cloud system, in which the public cloud uses an Apache Spark system.
Abdullahi, Mohammed; Ngadi, Md Asri
2016-01-01
Cloud computing has attracted significant attention from the research community because of the rapid migration of Information Technology services to its domain. Advances in virtualization technology have made cloud computing very popular as a result of easier deployment of application services. Tasks are submitted to cloud datacenters to be processed in a pay-as-you-go fashion. Task scheduling is one of the significant research challenges in the cloud computing environment. The current formulation of task scheduling problems has been shown to be NP-complete, hence finding the exact solution, especially for large problem sizes, is intractable. The heterogeneous and dynamic nature of cloud resources makes optimum task scheduling non-trivial. Therefore, efficient task scheduling algorithms are required for optimum resource utilization. Symbiotic Organisms Search (SOS) has been shown to perform competitively with Particle Swarm Optimization (PSO). The aim of this study is to optimize task scheduling in the cloud computing environment based on a proposed Simulated Annealing (SA) based SOS (SASOS) in order to improve the convergence rate and quality of solution of SOS. The SOS algorithm has a strong global exploration capability and uses fewer parameters. The systematic reasoning ability of SA is employed to find better solutions on local solution regions, hence adding exploration ability to SOS. Also, a fitness function is proposed which takes into account the utilization level of virtual machines (VMs), which reduces makespan and the degree of imbalance among VMs. The CloudSim toolkit was used to evaluate the efficiency of the proposed method using both synthetic and standard workloads. Results of simulation showed that hybrid SOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan. PMID:27348127
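To make the scheduling objective concrete, the sketch below applies plain simulated annealing to a task-to-VM assignment and minimizes makespan. It is not the SASOS algorithm from this abstract and does not use CloudSim; the function names, the single-task move, and the cooling parameters are all illustrative assumptions.

```python
import math
import random


def makespan(assignment, task_lengths, vm_speeds):
    """Finish time of the busiest VM for a given task-to-VM assignment."""
    loads = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        loads[vm] += task_lengths[task] / vm_speeds[vm]
    return max(loads)


def anneal_schedule(task_lengths, vm_speeds, steps=5000, t0=10.0,
                    cooling=0.999, seed=0):
    """Return (best assignment, best makespan) found by simulated annealing."""
    rng = random.Random(seed)
    n_vms = len(vm_speeds)
    current = [rng.randrange(n_vms) for _ in task_lengths]
    current_cost = makespan(current, task_lengths, vm_speeds)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        task = rng.randrange(len(task_lengths))
        old_vm = current[task]
        current[task] = rng.randrange(n_vms)  # neighbour: move one task
        new_cost = makespan(current, task_lengths, vm_speeds)
        delta = new_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = list(current), new_cost
        else:
            current[task] = old_vm  # reject the move and restore the state
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost
```

Accepting some worse moves early on lets the search leave poor local assignments, while the falling temperature makes it increasingly greedy. A fitness function like the one in the abstract would also weigh VM utilization and imbalance rather than makespan alone.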
Infrastructures for Distributed Computing: the case of BESIII
NASA Astrophysics Data System (ADS)
Pellegrino, J.
2018-05-01
BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing that aims to investigate tau-charm physics. BESIII has now been running for several years and has gathered more than 1 PB of raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based upon DIRAC and has been in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites of the BESIII Collaboration from all over the world have joined this distributed computing infrastructure, making a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantage of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources of commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover, new computing tools and upcoming infrastructures will be addressed.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627
Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support.
Abouelhoda, Mohamed; Issa, Shadi Alaa; Ghanem, Moustafa
2012-05-04
Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. 
Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.
NASA Astrophysics Data System (ADS)
Furht, Borko
In the introductory chapter we define the concept of cloud computing and cloud services, and we introduce layers and types of cloud computing. We discuss the differences between cloud computing and cloud services. New technologies that enabled cloud computing are presented next. We also discuss cloud computing features, standards, and security issues. We introduce the key cloud computing platforms, their vendors, and their offerings. We discuss cloud computing challenges and the future of cloud computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules, an in-memory data store, with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
Legal issues in clouds: towards a risk inventory.
Djemame, Karim; Barnitzke, Benno; Corrales, Marcelo; Kiran, Mariam; Jiang, Ming; Armstrong, Django; Forgó, Nikolaus; Nwankwo, Iheanyi
2013-01-28
Cloud computing technologies have reached a high level of development, yet a number of obstacles still exist that must be overcome before widespread commercial adoption can become a reality. In a cloud environment, end users requesting services and cloud providers negotiate service-level agreements (SLAs) that provide explicit statements of all expectations and obligations of the participants. If cloud computing is to experience widespread commercial adoption, then incorporating risk assessment techniques is essential during SLA negotiation and service operation. This article focuses on the legal issues surrounding risk assessment in cloud computing. Specifically, it analyses risk regarding data protection and security, and presents the requirements of an inherent risk inventory. The usefulness of such a risk inventory is described in the context of the OPTIMIS project.
Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.
Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu
2015-01-01
The power of cloud computing and distributed computing has been harnessed to handle the vast and heterogeneous data that must be processed in any virtual screening protocol. A cloud computing platform, ChemInfoCloud, was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It has also been provided with some bioinformatics functionalities, including sequence alignment, active site pose prediction and protein-ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem-solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users using container-based virtualization, OpenVZ.
JINR cloud infrastructure evolution
NASA Astrophysics Data System (ADS)
Baranov, A. V.; Balashov, N. A.; Kutovskiy, N. A.; Semenov, R. N.
2016-09-01
To fulfil JINR commitments in various national and international projects related to the use of modern information technologies such as cloud and grid computing, and to provide JINR users with a modern tool for their scientific research, a cloud infrastructure was deployed at the Laboratory of Information Technologies of the Joint Institute for Nuclear Research. OpenNebula software was chosen as the cloud platform. Initially it was set up in a simple configuration with a single front-end host and a few cloud nodes. Some custom development was done to tune the JINR cloud installation to local needs: a web form in the cloud web interface for resource requests, a menu item with cloud utilization statistics, user authentication via Kerberos, and a custom driver for OpenVZ containers. Because of high demand for the cloud service and over-utilization of its resources, it was redesigned to meet users' increasing needs for capacity, availability and reliability. Recently a new cloud instance has been deployed in a high-availability configuration with a distributed network file system and additional computing power.
Using the cloud to speed-up calibration of watershed-scale hydrologic models (Invited)
NASA Astrophysics Data System (ADS)
Goodall, J. L.; Ercan, M. B.; Castronova, A. M.; Humphrey, M.; Beekwilder, N.; Steele, J.; Kim, I.
2013-12-01
This research focuses on using the cloud to address computational challenges associated with hydrologic modeling. One example is calibration of a watershed-scale hydrologic model, which can take days of execution time on typical computers. While parallel algorithms for model calibration exist and some researchers have used multi-core computers or clusters to run these algorithms, these solutions do not fully address the challenge because (i) calibration can still be too time consuming even on multicore personal computers and (ii) few in the community have the time and expertise needed to manage a compute cluster. Given this, another option that we are exploring through this work is the use of the cloud for speeding up calibration of watershed-scale hydrologic models. The cloud used in this capacity provides a means for renting a specific number and type of machines for only the time needed to perform a calibration model run. The cloud allows one to precisely balance the duration of the calibration with the financial costs so that, if the budget allows, the calibration can be performed more quickly by renting more machines. Focusing specifically on the SWAT hydrologic model and a parallel version of the DDS calibration algorithm, we show significant speed-up across a range of watershed sizes using up to 256 cores to perform a model calibration. The tool provides a simple web-based user interface and the ability to monitor the job submission process during calibration. Finally, this talk concludes with initial work to leverage the cloud for other tasks associated with hydrologic modeling, including tasks related to preparing inputs for constructing place-based hydrologic models.
On-demand Simulation of Atmospheric Transport Processes on the AlpEnDAC Cloud
NASA Astrophysics Data System (ADS)
Hachinger, S.; Harsch, C.; Meyer-Arnek, J.; Frank, A.; Heller, H.; Giemsa, E.
2016-12-01
The "Alpine Environmental Data Analysis Centre" (AlpEnDAC) develops a data-analysis platform for high-altitude research facilities within the "Virtual Alpine Observatory" project (VAO). This platform, with its web portal, will support use cases going far beyond data management: on user request, the data are augmented with "on-demand" simulation results, such as air-parcel trajectories for tracing the source of pollutants when they appear in high concentration. The respective back-end mechanism uses the Compute Cloud of the Leibniz Supercomputing Centre (LRZ) to transparently calculate results requested by the user, insofar as they have not yet been stored in AlpEnDAC. The queuing-system operation model common in supercomputing is replaced by a model in which Virtual Machines (VMs) on the cloud are automatically created/destroyed, providing the necessary computing power immediately on demand. From a security point of view, this allows simulations to be performed in a sandbox defined by the VM configuration, without direct access to a computing cluster. Within a few minutes, the user receives conveniently visualized results. The AlpEnDAC infrastructure is distributed between two participating institutes [front-end at German Aerospace Centre (DLR), simulation back-end at LRZ], requiring an efficient mechanism for synchronization of measured and augmented data. We discuss our iRODS-based solution for these data-management tasks as well as the general AlpEnDAC framework. Our cloud-based offerings aim at making scientific computing for our users much more convenient and flexible than it has been, and at allowing scientists without a broad background in scientific computing to benefit from complex numerical simulations.
Cloud computing for comparative genomics
2010-01-01
Background: Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results: We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions: The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786
Application of microarray analysis on computer cluster and cloud platforms.
Bernau, C; Boulesteix, A-L; Knaus, J
2013-01-01
Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
Cloud computing for comparative genomics.
Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J
2010-05-18
Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
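The figures reported in the abstract above (more than 300,000 processes, 100 nodes, just under 70 hours, $6,302) allow a rough back-of-envelope estimate of unit costs. The sketch below is illustrative arithmetic only; it assumes that all 100 nodes ran for the full duration and that the reported total is entirely compute cost, neither of which the abstract states. Because the process count is a lower bound and the run time an upper bound, the derived values are approximate.

```python
# Figures quoted in the abstract; rounded bounds, not exact measurements.
processes = 300_000        # "more than 300,000" RSD-cloud processes
nodes = 100                # high-capacity compute nodes
hours = 70                 # "just under 70 hours"
total_cost_usd = 6_302

# Assumption: every node was busy for the whole run.
node_hours = nodes * hours

cost_per_process = total_cost_usd / processes        # at most about this much
cost_per_node_hour = total_cost_usd / node_hours     # at least about this much
processes_per_node_hour = processes / node_hours

print(f"node-hours:                {node_hours}")
print(f"cost per process (USD):    {cost_per_process:.4f}")
print(f"cost per node-hour (USD):  {cost_per_node_hour:.2f}")
print(f"processes per node-hour:   {processes_per_node_hour:.1f}")
```

Under these assumptions the run comes to roughly two cents per RSD process and about ninety cents per node-hour.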
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu PMID:26989153
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press.
Bioinformatics clouds for big data manipulation.
Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang
2012-11-28
As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.
Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing
Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav
2012-01-01
Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640
Cloud based intelligent system for delivering health care as a service.
Kaur, Pankaj Deep; Chana, Inderveer
2014-01-01
The promising potential of cloud computing and its convergence with technologies such as mobile computing, wireless networks, and sensor technologies allow for the creation and delivery of newer types of cloud services. In this paper, we advocate the use of cloud computing for the creation and management of cloud-based health care services. As a representative case study, we design a Cloud Based Intelligent Health Care Service (CBIHCS) that performs real-time monitoring of user health data for diagnosis of chronic illnesses such as diabetes. Advanced body sensor components are utilized to gather user-specific health data and store them in cloud-based storage repositories for subsequent analysis and classification. In addition, infrastructure-level mechanisms are proposed to provide dynamic resource elasticity for CBIHCS. Experimental results demonstrate that a classification accuracy of 92.59% is achieved with our prototype system and that the predicted patterns of CPU usage offer better opportunities for adaptive resource elasticity. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Cloud-based hospital information system as a service for grassroots healthcare institutions.
Yao, Qin; Han, Xiong; Ma, Xi-Kun; Xue, Yi-Feng; Chen, Yi-Jun; Li, Jing-Song
2014-09-01
Grassroots healthcare institutions (GHIs) are the smallest administrative levels of medical institutions, where most patients access health services. The latest report from the National Bureau of Statistics of China showed that 96.04% of 950,297 medical institutions in China were at the grassroots level in 2012, including county-level hospitals, township central hospitals, community health service centers, and rural clinics. In developing countries, these institutions face challenges during the adoption of health information technologies (HIT), including a shortage of funds and talent, inconsistent medical standards, inefficient information sharing, and difficulties in management. Given the importance of GHIs and the severity of these challenges, our aim is to provide hospital information services for GHIs using Cloud computing technologies and service modes. In this medical scenario, the computing resources are pooled by means of a Cloud-based Virtual Desktop Infrastructure (VDI) to serve multiple GHIs, with different hospital information systems dynamically assigned and reassigned according to demand. This paper is concerned with establishing a Cloud-based Hospital Information Service Center to provide hospital information software as a service (HI-SaaS) with the aim of providing GHIs with an attractive and high-performance medical information service. Compared with individually establishing all hospital information systems, this approach is more cost-effective and affordable for GHIs and does not compromise HIT performance.
Implementation of Virtualization Oriented Architecture: A Healthcare Industry Case Study
NASA Astrophysics Data System (ADS)
Rao, G. Subrahmanya Vrk; Parthasarathi, Jinka; Karthik, Sundararaman; Rao, Gvn Appa; Ganesan, Suresh
This paper presents a Virtualization Oriented Architecture (VOA) and an implementation of VOA for Hridaya, a telemedicine initiative. A Hadoop compute cloud was established at our labs, and jobs that require massive computing capability, such as ECG signal analysis, were submitted to it; the study is presented in this paper. VOA takes advantage of inexpensive community PCs and provides added advantages such as fault tolerance, scalability, performance, and high availability.
Unidata's Vision for Transforming Geoscience by Moving Data Services and Software to the Cloud
NASA Astrophysics Data System (ADS)
Ramamurthy, Mohan; Fisher, Ward; Yoksas, Tom
2015-04-01
Universities are facing many challenges: shrinking budgets, rapidly evolving information technologies, exploding data volumes, multidisciplinary science requirements, and high expectations from students who have grown up with smartphones and tablets. These changes are upending traditional approaches to accessing and using data and software. Unidata recognizes that its products and services must evolve to support new approaches to research and education. After years of hype and ambiguity, cloud computing is maturing in usability in many areas of science and education, bringing the benefits of virtualized and elastic remote services to infrastructure, software, computation, and data. Cloud environments reduce the amount of time and money spent to procure, install, and maintain new hardware and software, and reduce costs through resource pooling and shared infrastructure. Cloud services aimed at providing any resource, at any time, from any place, using any device are increasingly being embraced by all types of organizations. Given this trend and the enormous potential of cloud-based services, Unidata is moving to augment its products, services, data delivery mechanisms and applications to align with the cloud-computing paradigm. Specifically, Unidata is working toward establishing a community-based development environment that supports the creation and use of software services to build end-to-end data workflows. The design encourages the creation of services that can be broken into small, independent chunks that provide simple capabilities. Chunks could be used individually to perform a task, or chained into simple or elaborate workflows. The services will also be portable in the form of downloadable Unidata-in-a-box virtual images, allowing their use in researchers' own cloud-based computing environments.
In this talk, we present a vision for Unidata's future in cloud-enabled data services and discuss our ongoing efforts to deploy a suite of Unidata data services and tools in the Amazon EC2 and Microsoft Azure cloud environments, including the transfer of real-time meteorological data into its cloud instances, product generation using those data, and the deployment of TDS, McIDAS ADDE and AWIPS II data servers and the Integrated Data Server visualization tool.
Interfacing HTCondor-CE with OpenStack
NASA Astrophysics Data System (ADS)
Bockelman, B.; Caballero Bejar, J.; Hover, J.
2017-10-01
Over the past few years, Grid Computing technologies have reached a high level of maturity. One key aspect of this success has been the development and adoption of newer Compute Elements to interface the external Grid users with local batch systems. These new Compute Elements allow for better handling of job requirements and more precise management of diverse local resources. However, despite this level of maturity, the Grid Computing world lacks diversity in local execution platforms. As Grid Computing technologies have historically been driven by the needs of the High Energy Physics community, most resource providers run the platform (operating system version and architecture) that best suits the needs of their particular users. In parallel, the development of virtualization and cloud technologies has accelerated recently, making available a variety of solutions, both commercial and academic, proprietary and open source. Virtualization facilitates performing computational tasks on platforms not available at most computing sites. This work attempts to join the technologies, allowing users to interact with computing sites through one of the standard Computing Elements, HTCondor-CE, but running their jobs within VMs on a local cloud platform, OpenStack, when needed. The system will transparently re-route end user jobs into dynamically launched VM worker nodes when they have requirements that cannot be satisfied by the static local batch system nodes. Also, once the automated mechanisms are in place, it becomes straightforward to allow an end user to invoke a custom Virtual Machine at the site. This will allow cloud resources to be used without requiring the user to establish a separate account. Both scenarios are described in this work.
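The re-routing decision described in this abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not HTCondor-CE's actual job-router configuration or API; the `Node` and `Job` classes, their fields, and the `route` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A static batch worker node (hypothetical description)."""
    os: str
    arch: str
    memory_mb: int


@dataclass(frozen=True)
class Job:
    """Platform requirements declared by a job (hypothetical description)."""
    os: str
    arch: str
    memory_mb: int


def satisfies(node: Node, job: Job) -> bool:
    """True if the node matches the job's platform and has enough memory."""
    return (node.os == job.os
            and node.arch == job.arch
            and node.memory_mb >= job.memory_mb)


def route(job: Job, static_nodes: list[Node]) -> str:
    """Send the job to the static batch system when any node can run it;
    otherwise mark it for a dynamically launched VM worker node."""
    if any(satisfies(node, job) for node in static_nodes):
        return "batch"
    return "vm"
```

With a single static node running one OS release, a job asking for a different release would be returned as `"vm"`, which is the point at which a real system would ask the cloud platform for a matching image.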
The International Symposium on Grids and Clouds
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the decennium anniversary of the ISGC which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia Pacific region to a coherent community. With the continuous support and dedication from the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that has produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.
Enterprise Cloud Architecture for Chinese Ministry of Railway
NASA Astrophysics Data System (ADS)
Shan, Xumei; Liu, Hefeng
Enterprises like the PRC Ministry of Railways (MOR) face various challenges, ranging from a highly distributed computing environment to low utilization of legacy systems. Cloud computing is increasingly regarded as a workable solution to these challenges. This article describes a full-scale cloud solution with Intel Tashi as the virtual machine infrastructure layer, Hadoop HDFS as the computing platform, and a self-developed SaaS interface that glues the virtual machines and HDFS together with the Xen hypervisor. Finally, the article shows how on-demand application and deployment of computing tasks have been handled in real MOR working scenarios.
Dynamic VM Provisioning for TORQUE in a Cloud Environment
NASA Astrophysics Data System (ADS)
Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.
2014-06-01
Cloud computing, also known as an Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.
TOSCA-based orchestration of complex clusters at the IaaS level
NASA Astrophysics Data System (ADS)
Caballer, M.; Donvito, G.; Moltó, G.; Rocha, R.; Velten, M.
2017-10-01
This paper describes the adoption and extension of the TOSCA standard by the INDIGO-DataCloud project for the definition and deployment of complex computing clusters together with the required support in both OpenStack and OpenNebula, carried out in close collaboration with industry partners such as IBM. Two examples of these clusters are described in this paper: the definition of an elastic computing cluster to support the Galaxy bioinformatics application, where nodes are dynamically added to and removed from the cluster to adapt to the workload, and the definition of a scalable Apache Mesos cluster for the execution of batch jobs and support for long-running services. The coupling of TOSCA with Ansible Roles to perform automated installation has resulted in the definition of high-level, deterministic templates to provision complex computing clusters across different Cloud sites.
A Cost-Benefit Study of Doing Astrophysics On The Cloud: Production of Image Mosaics
NASA Astrophysics Data System (ADS)
Berriman, G. B.; Good, J. C.; Deelman, E.; Singh, G.; Livny, M.
2009-09-01
Utility grids such as the Amazon EC2 and Amazon S3 clouds offer computational and storage resources that can be used on-demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. We studied via simulation the cost performance trade-offs of different execution and resource provisioning plans by creating, under the Amazon cloud fee structure, mosaics with the Montage image mosaic engine, a widely used data- and compute-intensive application. Specifically, we studied the cost of building mosaics of 2MASS data that have sizes of 1, 2 and 4 square degrees, and a 2MASS all-sky mosaic. These are examples of mosaics commonly generated by astronomers. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archiving. Our results show that by provisioning the right amount of storage and compute resources cost can be significantly reduced with no significant impact on application performance.
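The cost structure this abstract refers to (compute, storage and communication charges) can be illustrated with a small additive model. This is only a sketch under stated assumptions: the function name `run_cost` and all rate parameters are hypothetical, the rates must be supplied by the caller, and the study's actual simulation is more detailed than a single sum.

```python
def run_cost(cpu_hours: float, storage_gb_months: float,
             gb_in: float, gb_out: float, *,
             cpu_rate: float, storage_rate: float,
             in_rate: float, out_rate: float) -> float:
    """Total fee for one execution plan under a pay-per-use fee structure.

    Each resource is billed linearly: CPU time per hour, storage per
    GB-month, and data transfer per GB in each direction.
    """
    return (cpu_hours * cpu_rate
            + storage_gb_months * storage_rate
            + gb_in * in_rate
            + gb_out * out_rate)
```

Comparing two execution plans then amounts to calling `run_cost` with each plan's resource usage and the same rates; a plan that provisions more processors for a shorter time can cost about the same in CPU-hours while changing the storage and transfer terms.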
Climate simulations and services on HPC, Cloud and Grid infrastructures
NASA Astrophysics Data System (ADS)
Cofino, Antonio S.; Blanco, Carlos; Minondo Tshuma, Antonio
2017-04-01
Cloud, Grid and High Performance Computing have changed the accessibility and availability of computing resources for Earth Science research communities, especially for the climate community. These paradigms are modifying the way climate applications are executed. With these technologies, the number, variety and complexity of experiments and resources are increasing substantially. However, although computational capacity is increasing, the traditional applications and tools used by the community are not good enough to manage this large volume and variety of experiments and computing resources. In this contribution, we evaluate the challenges of running climate simulations and services on Grid, Cloud and HPC infrastructures and how to tackle them. The Grid and Cloud infrastructures provided by EGI's VOs (esr, earth.vo.ibergrid and fedcloud.egi.eu) will be evaluated, as well as HPC resources from the PRACE infrastructure and institutional clusters. To address those challenges, solutions using the DRM4G framework will be shown. DRM4G provides a good framework for managing a large volume and variety of computing resources for climate experiments. This work has been supported by the Spanish National R&D Plan under projects WRF4G (CGL2011-28864), INSIGNIA (CGL2016-79210-R) and MULTI-SDM (CGL2015-66583-R); the IS-ENES2 project from the 7FP of the European Commission (grant agreement no. 312979); the European Regional Development Fund (ERDF) and the Programa de Personal Investigador en Formación Predoctoral from Universidad de Cantabria and Government of Cantabria.
The Magellan Final Report on Cloud Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coghlan, Susan; Yelick, Katherine
The goal of Magellan, a project funded through the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR), was to investigate the potential role of cloud computing in addressing the computing needs for the DOE Office of Science (SC), particularly related to serving the needs of mid-range computing and future data-intensive computing workloads. A set of research questions was formed to probe various aspects of cloud computing from performance, usability, and cost. To address these questions, a distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). The testbed was designed to be flexible and capable enough to explore a variety of computing models and hardware design points in order to understand the impact for various scientific applications. During the project, the testbed also served as a valuable resource to application scientists. Applications from a diverse set of projects such as MG-RAST (a metagenomics analysis server), the Joint Genome Institute, the STAR experiment at the Relativistic Heavy Ion Collider, and the Laser Interferometer Gravitational Wave Observatory (LIGO), were used by the Magellan project for benchmarking within the cloud, but the project teams were also able to accomplish important production science utilizing the Magellan cloud resources.
A Big Data Platform for Storing, Accessing, Mining and Learning Geospatial Data
NASA Astrophysics Data System (ADS)
Yang, C. P.; Bambacus, M.; Duffy, D.; Little, M. M.
2017-12-01
Big Data is becoming the norm in geoscience domains. A platform capable of efficiently managing, accessing, analyzing, mining, and learning from big data for new information and knowledge is desired. This paper introduces our latest effort to develop such a platform, based on our past years' experience with cloud and high performance computing, analyzing big data, comparing big data containers, and mining big geospatial data for new information. The platform includes four layers: a) the bottom layer includes a computing infrastructure with proper network, computer, and storage systems; b) the 2nd layer is a cloud computing layer based on virtualization to provide on-demand computing services for upper layers; c) the 3rd layer is big data containers that are customized for dealing with different types of data and functionalities; d) the 4th layer is a big data presentation layer that supports the efficient management, access, analysis, mining and learning of big geospatial data.
A Brief Analysis of Development Situations and Trend of Cloud Computing
NASA Astrophysics Data System (ADS)
Yang, Wenyan
2017-12-01
In recent years, the rapid development of Internet technology has radically changed how people work, learn, and live. More and more activities are carried out with computers and networks. The amount of information and data generated grows day by day, and people rely increasingly on computers, so that the computing power of computers can no longer meet people's demands for accuracy and speed. Cloud computing technology has developed rapidly and is widely applied in the computer industry because of its high precision, fast computation, and ease of use. It has also become a focus of current information research. This paper analyzes the development status and trends of cloud computing.
NASA Astrophysics Data System (ADS)
Yang, Hui; Zhang, Jie; Ji, Yuefeng; He, Yongqi; Lee, Young
2016-07-01
Cloud radio access network (C-RAN) has become a promising scenario to accommodate high-performance services with ubiquitous user coverage and real-time cloud computing in the 5G era. However, the radio network, optical network and processing unit cloud have been decoupled from each other, so that their resources are controlled independently. With the growing number of mobile internet users, traditional architecture cannot implement the resource optimization and scheduling needed for high-level service guarantees because of the communication obstacles among them. In this paper, we report a study on multi-dimensional resources integration (MDRI) for service provisioning in cloud radio over fiber network (C-RoFN). A resources integrated provisioning (RIP) scheme using an auxiliary graph is introduced based on the proposed architecture. The MDRI can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical network and processing resources effectively to maximize radio coverage. The feasibility of the proposed architecture is experimentally verified on an OpenFlow-based enhanced SDN testbed. The performance of the RIP scheme under a heavy traffic load scenario is also quantitatively evaluated to demonstrate the efficiency of the proposal based on the MDRI architecture in terms of resource utilization, path blocking probability, network cost and path provisioning latency, compared with other provisioning schemes.
Yang, Hui; Zhang, Jie; Ji, Yuefeng; He, Yongqi; Lee, Young
2016-01-01
Cloud radio access network (C-RAN) has become a promising scenario to accommodate high-performance services with ubiquitous user coverage and real-time cloud computing in the 5G era. However, the radio network, optical network and processing unit cloud have been decoupled from each other, so that their resources are controlled independently. With the growing number of mobile internet users, traditional architecture cannot implement the resource optimization and scheduling needed for high-level service guarantees because of the communication obstacles among them. In this paper, we report a study on multi-dimensional resources integration (MDRI) for service provisioning in cloud radio over fiber network (C-RoFN). A resources integrated provisioning (RIP) scheme using an auxiliary graph is introduced based on the proposed architecture. The MDRI can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical network and processing resources effectively to maximize radio coverage. The feasibility of the proposed architecture is experimentally verified on an OpenFlow-based enhanced SDN testbed. The performance of the RIP scheme under a heavy traffic load scenario is also quantitatively evaluated to demonstrate the efficiency of the proposal based on the MDRI architecture in terms of resource utilization, path blocking probability, network cost and path provisioning latency, compared with other provisioning schemes. PMID:27465296
Cloudbursting - Solving the 3-body problem
NASA Astrophysics Data System (ADS)
Chang, G.; Heistand, S.; Vakhnin, A.; Huang, T.; Zimdars, P.; Hua, H.; Hood, R.; Koenig, J.; Mehrotra, P.; Little, M. M.; Law, E.
2014-12-01
Many science projects in the future will be accomplished through collaboration among two or more NASA centers along with, potentially, external scientists. Science teams will be composed of more geographically dispersed individuals and groups. However, the current computing environment does not make this easy and seamless. By sharing computing resources among members of a multi-center team working on a science/engineering project, limited pre-competition funds could be applied more efficiently and technical work could be conducted more effectively, with less time spent moving data or waiting for computing resources to free up. Based on the work from a NASA CIO IT Labs task, this presentation will highlight our prototype work in assessing the feasibility of, and identifying the obstacles (both technical and management) to, performing "Cloudbursting" among private clouds located at three different centers. We will demonstrate the use of private cloud computing infrastructure at the Jet Propulsion Laboratory, Langley Research Center, and Ames Research Center to provide elastic computation to each other to perform parallel Earth Science data imaging. We leverage elastic load balancing and auto-scaling features at each data center so that each location can independently define how many resources to allocate to a particular job that was "bursted" from another data center and demonstrate that compute capacity scales up and down with the job. We will also discuss future work in the area, which could include the use of cloud infrastructure from different cloud framework providers as well as other cloud service providers.
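The idea that each center independently decides how much capacity to give a "bursted" job can be illustrated with a simple scaling rule. This is a hedged sketch, not the prototype's actual policy: the function name `desired_instances` and its parameters are hypothetical, and real auto-scaling services typically react to measured metrics over time rather than to a single queue length.

```python
import math


def desired_instances(queued_tasks: int, tasks_per_instance: int,
                      min_instances: int, max_instances: int) -> int:
    """Instances a site would allocate to a job, bounded by local limits.

    The site sizes the pool to the pending work, but never goes below its
    own minimum or above the maximum it is willing to lend out.
    """
    needed = math.ceil(queued_tasks / tasks_per_instance)
    return max(min_instances, min(max_instances, needed))
```

Because the bounds are local parameters, two centers receiving the same job can answer with different pool sizes, and the pool shrinks back to `min_instances` as the queue drains.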
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets
Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L
2014-01-01
Background: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852
Predictive Control of Networked Multiagent Systems via Cloud Computing.
Liu, Guo-Ping
2017-01-18
This paper studies the design and analysis of networked multiagent predictive control systems via cloud computing. A cloud predictive control scheme for networked multiagent systems (NMASs) is proposed to achieve consensus and stability simultaneously and to compensate for network delays actively. The design of the cloud predictive controller for NMASs is detailed. The analysis of the cloud predictive control scheme gives the necessary and sufficient conditions of stability and consensus of closed-loop networked multiagent control systems. The proposed scheme is verified to characterize the dynamical behavior and control performance of NMASs through simulations. The outcome provides a foundation for the development of cooperative and coordinative control of NMASs and its applications.
NASA Astrophysics Data System (ADS)
Lin, Guofen; Hong, Hanshu; Xia, Yunhao; Sun, Zhixin
2017-10-01
Attribute-based encryption (ABE) is an interesting cryptographic technique for flexible access control in cloud data sharing. However, some open challenges hinder its practical application. In previous schemes, all attributes are treated as having the same status, although in most practical scenarios they do not. Meanwhile, the size of the access policy increases dramatically as its expressiveness grows. In addition, current research rarely takes into account that mobile front-end devices, such as smartphones, have limited computational performance, while ABE requires extensive bilinear pairing computation. In this paper, we propose a key-policy weighted attribute-based encryption without bilinear pairing computation (KP-WABE-WB) for secure access control in cloud data sharing. A simple weighting mechanism is presented to describe the different importance of each attribute. We introduce a novel construction of ABE that does not execute any bilinear pairing computation. Compared to previous schemes, our scheme performs better in access policy expressiveness and computational efficiency.
A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing.
Fan, Kai; Wang, Junxiong; Wang, Xin; Li, Hui; Yang, Yintang
2017-07-24
With the rapid development of big data and Internet of things (IOT), the number of networking devices and data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges are also arising in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. In this paper, we propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In our construction, most encryption and decryption computations are outsourced to fog devices and the computation results can be verified by using our verification method. Meanwhile, to address the revocation issue, we design an efficient user and attribute revocation method for it. Finally, analysis and simulation results show that our scheme is both secure and highly efficient.
Impact of office productivity cloud computing on energy consumption and greenhouse gas emissions.
Williams, Daniel R; Tang, Yinshan
2013-05-07
Cloud computing is usually regarded as being energy efficient and thus emitting fewer greenhouse gases (GHG) than traditional forms of computing. When the energy consumption of Microsoft's cloud computing Office 365 (O365) and traditional Office 2010 (O2010) software suites was tested and modeled, some cloud services were found to consume more energy than the traditional form. The model developed in this research took into consideration the energy consumption at the three main stages of data transmission: data center, network, and end user device. Comparable products from each suite were selected and activities were defined for each product to represent a different computing type. Microsoft provided highly confidential data for the data center stage, while the networking and user device stages were measured directly. A new measurement and software apportionment approach was defined and utilized, allowing the power consumption of cloud services to be directly measured for the user device stage. Results indicated that cloud computing is more energy efficient for Excel and Outlook, which consumed less energy and emitted less GHG than their standalone counterparts. The power consumption of the cloud-based Outlook (8%) and Excel (17%) was lower than that of their traditional counterparts. However, the power consumption of the cloud version of Word was 17% higher than its traditional equivalent. A third mixed access method was also measured for Word, which emitted 5% more GHG than the traditional version. It is evident that cloud computing may not provide a unified way forward to reduce energy consumption and GHG. Direct conversion from the standalone package into the cloud provision platform can now consider energy and GHG emissions at the software development and cloud service design stage using the methods described in this research.
Bioinformatics clouds for big data manipulation
2012-01-01
Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475
Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments
Kadima, Hubert; Granado, Bertrand
2013-01-01
We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361
Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.
Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand
2013-01-01
We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
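The trade-off that DVFS introduces between schedule length and energy can be made concrete with a small calculation. The sketch below assumes the standard CMOS dynamic power model, P = C·V²·f, which is a textbook approximation and is not stated in the abstract; the function name `dvfs_cost`, the default capacitance, and the example voltage/frequency pairs are hypothetical.

```python
def dvfs_cost(cycles: float, voltage: float, freq_hz: float,
              capacitance: float = 1e-9) -> tuple[float, float]:
    """Execution time (s) and dynamic energy (J) of a task at one DVFS level.

    Assumes dynamic power P = C * V^2 * f, so energy is C * V^2 * cycles:
    lowering the voltage saves energy, while the lower clock frequency that
    accompanies it lengthens the run time.
    """
    time_s = cycles / freq_hz
    power_w = capacitance * voltage ** 2 * freq_hz
    return time_s, power_w * time_s


# Two hypothetical operating points for the same 2e9-cycle task.
fast_time, fast_energy = dvfs_cost(2e9, voltage=1.2, freq_hz=2e9)
slow_time, slow_energy = dvfs_cost(2e9, voltage=0.9, freq_hz=1e9)
```

Under these assumed numbers the slower level takes twice as long but uses less energy, which is the compromise a multi-objective scheduler has to weigh for every task placement.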
A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path.
Xie, Zhiqiang; Shao, Xia; Xin, Yu
2016-01-01
To solve the problem of task scheduling in the cloud computing system, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). This algorithm applies a predecessor-task layer priority strategy to solve the problem of constraint relations among task nodes. The strategy assigns different priority values to every task node based on the scheduling order of task node as affected by the constraint relations among task nodes, and the task node list is generated by the different priority value. To address the scheduling order problem in which task nodes have the same priority value, the dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduling task nodes based on the actual computation cost and communication cost of task node in the scheduling process. The task node that has the longest dynamic essential path is scheduled first as the completion time of task graph is indirectly influenced by the finishing time of task nodes in the longest dynamic essential path. Finally, we demonstrate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm can effectively reduce the task Makespan in most cases and meet a high quality performance objective.
A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path
Xie, Zhiqiang; Shao, Xia; Xin, Yu
2016-01-01
To solve the problem of task scheduling in cloud computing systems, this paper proposes a scheduling algorithm based on the driver of dynamic essential path (DDEP). The algorithm applies a predecessor-task layer priority strategy to handle the constraint relations among task nodes. The strategy assigns a priority value to every task node based on its scheduling order as affected by those constraint relations, and the task node list is generated from these priority values. To resolve the scheduling order of task nodes that have the same priority value, the dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduling task nodes based on the actual computation cost and communication cost of each task node in the scheduling process. The task node that has the longest dynamic essential path is scheduled first, because the completion time of the task graph is indirectly influenced by the finishing time of the task nodes on the longest dynamic essential path. Finally, we demonstrate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm can effectively reduce the task makespan in most cases and meet a high-quality performance objective. PMID:27490901
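The idea of scheduling first the task with the longest remaining path can be sketched as follows. This is a simplified, static illustration and not the authors' DDEP algorithm: it computes, for each task, the longest path to the end of the task graph (computation plus communication costs) and uses that length to order the tasks that are ready. The function names and the example graph are hypothetical.

```python
from functools import lru_cache


def longest_path_lengths(comp, comm):
    """comp: {task: computation cost}; comm: {(u, v): communication cost of edge u -> v}."""
    successors = {t: [] for t in comp}
    for (u, v) in comm:
        successors[u].append(v)

    @lru_cache(maxsize=None)
    def length(task):
        # Cost of the task plus the most expensive continuation, if any.
        tails = (comm[(task, s)] + length(s) for s in successors[task])
        return comp[task] + max(tails, default=0)

    return {t: length(t) for t in comp}


def schedule_order(comp, comm):
    """Return a precedence-respecting order; ready tasks with longer paths go first."""
    lengths = longest_path_lengths(comp, comm)
    indegree = {t: 0 for t in comp}
    for (_, v) in comm:
        indegree[v] += 1
    ready = [t for t in comp if indegree[t] == 0]
    order = []
    while ready:
        ready.sort(key=lambda t: (-lengths[t], t))  # longest path first, name as tie-break
        task = ready.pop(0)
        order.append(task)
        for (u, v) in comm:
            if u == task:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return order


# Hypothetical four-task graph: A precedes B and C, which both precede D.
COMP = {"A": 2, "B": 3, "C": 1, "D": 2}
COMM = {("A", "B"): 1, ("A", "C"): 4, ("B", "D"): 1, ("C", "D"): 1}
```

In the paper the path lengths are recomputed dynamically from actual costs during scheduling; the sketch fixes them up front to keep the example short.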
NASA Astrophysics Data System (ADS)
Alimi, Isiaka A.; Monteiro, Paulo P.; Teixeira, António L.
2017-11-01
The key paths toward meeting the fifth generation (5G) network requirements are centralized processing and small-cell densification, implemented on cloud computing-based radio access networks (CC-RANs). The increasing recognition of CC-RANs can be attributed to their valuable features regarding system performance optimization and cost-effectiveness. Nevertheless, meeting the stringent requirements of the fronthaul that connects the network elements is highly demanding. In this paper, considering small-cell network architectures, we present multiuser mixed radio-frequency/free-space optical (RF/FSO) relay networks as feasible technologies for alleviating the stringent requirements in CC-RANs. We use the end-to-end (e2e) outage probability, average symbol error probability (ASEP), and ergodic channel capacity as the performance metrics in our analysis. Simulation results show the suitability of deploying mixed RF/FSO schemes in real-life scenarios.
Cloud computing approaches for prediction of ligand binding poses and pathways.
Lawrenz, Morgan; Shukla, Diwakar; Pande, Vijay S
2015-01-22
We describe an innovative protocol for ab initio prediction of ligand crystallographic binding poses and highly effective analysis of large datasets generated for protein-ligand dynamics. We include a procedure for setup and performance of distributed molecular dynamics simulations on cloud computing architectures, a model for efficient analysis of simulation data, and a metric for evaluation of model convergence. We give accurate binding pose predictions for five ligands ranging in affinity from 7 nM to > 200 μM for the immunophilin protein FKBP12, for expedited results in cases where experimental structures are difficult to produce. Our approach goes beyond single, low energy ligand poses to give quantitative kinetic information that can inform protein engineering and ligand design.
NASA Astrophysics Data System (ADS)
Morikawa, Y.; Murata, K. T.; Watari, S.; Kato, H.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Shimojo, S.
2010-12-01
The main methodologies of Solar-Terrestrial Physics (STP) so far have been theoretical, experimental and observational, and computer simulation approaches. Recently, "informatics" has come to be expected as a new (fourth) approach to STP studies. Informatics is a methodology for analyzing large-scale data (observation data and computer simulation data) to obtain new findings using a variety of data processing techniques. At NICT (National Institute of Information and Communications Technology, Japan) we are now developing a new research environment named "OneSpaceNet". OneSpaceNet is a cloud-computing environment specialized for scientific work, which connects many researchers through a high-speed network (JGN: Japan Gigabit Network). JGN is a wide-area backbone network operated by NICT; it provides a 10G network and many access points (AP) across Japan. OneSpaceNet also provides rich computer resources for research, such as supercomputers, large-scale data storage, licensed applications, visualization devices (such as a tiled display wall, TDW), databases/DBMS, cluster computers (4-8 nodes) for data processing, and communication devices. What is remarkable about using the science cloud is that a user only needs to prepare a terminal (a low-cost PC). Once the PC is connected to JGN2plus, the user can make full use of the rich resources of the science cloud. Using communication devices such as a video-conference system, streaming and reflector servers, and media players, OneSpaceNet users can communicate about their research as if they belonged to the same laboratory: they are members of a virtual laboratory. The specifications of the computer resources on OneSpaceNet are as follows. The data storage we have developed so far is almost 1 PB in size. The number of data files managed on the cloud storage is growing and now exceeds 40,000,000.
What is notable is that the disks forming the large-scale storage are distributed across 5 data centers in Japan (but the storage system performs as one disk). There are three supercomputers allocated on the cloud: one from Tokyo, one from Osaka, and one from Nagoya. A user's simulation job data from any of the supercomputers are saved on the cloud data storage (in the same directory); it is a kind of virtual computing environment. The tiled display wall has 36 panels acting as one display, with a resolution as large as 18000x4300 pixels. This size is enough to preview or analyze large-scale computer simulation data. It also allows many researchers to look at multiple images (e.g., 100 pictures) together on one screen. In our talk we also present a brief report of the initial results of using OneSpaceNet for Global MHD simulations as an example of successful use of our science cloud: (i) ultra-high time resolution visualization of Global MHD simulations on the large-scale storage and parallel processing system on the cloud, (ii) a database of real-time Global MHD simulations and statistical analyses of the data, and (iii) a 3D Web service for Global MHD simulations.
Atlas2 Cloud: a framework for personal genome analysis in the cloud
2012-01-01
Background: Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results: We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions: We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663
Atlas2 Cloud: a framework for personal genome analysis in the cloud.
Evani, Uday S; Challis, Danny; Yu, Jin; Jackson, Andrew R; Paithankar, Sameer; Bainbridge, Matthew N; Jakkamsetti, Adinarayana; Pham, Peter; Coarfa, Cristian; Milosavljevic, Aleksandar; Yu, Fuli
2012-01-01
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
RACORO Extended-Term Aircraft Observations of Boundary-Layer Clouds
NASA Technical Reports Server (NTRS)
Vogelmann, Andrew M.; McFarquhar, Greg M.; Ogren, John A.; Turner, David D.; Comstock, Jennifer M.; Feingold, Graham; Long, Charles N.; Jonsson, Haflidi H.; Bucholtz, Anthony; Collins, Don R.;
2012-01-01
Small boundary-layer clouds are ubiquitous over many parts of the globe and strongly influence the Earth's radiative energy balance. However, our understanding of these clouds is insufficient to solve pressing scientific problems. For example, cloud feedback represents the largest uncertainty amongst all climate feedbacks in general circulation models (GCM). Several issues complicate understanding boundary-layer clouds and simulating them in GCMs. The high spatial variability of boundary-layer clouds poses an enormous computational challenge, since their horizontal dimensions and internal variability occur at spatial scales much finer than the computational grids used in GCMs. Aerosol-cloud interactions further complicate boundary-layer cloud measurement and simulation. Additionally, aerosols influence processes such as precipitation and cloud lifetime. An added complication is that at small scales (on the order of meters to tens of meters) distinguishing cloud from aerosol is increasingly difficult, due to the effects of aerosol humidification, cloud fragments and photon scattering between clouds.
Fault Tolerant Software Technology for Distributed Computer Systems
1989-03-01
TR-88-296, Final Technical Report. FAULT TOLERANT SOFTWARE TECHNOLOGY FOR DISTRIBUTED COMPUTER SYSTEMS, Georgia Institute... TITLE (Include Security Classification): FAULT TOLERANT SOFTWARE FOR DISTRIBUTED COMPUTER ...Technology for Distributed Computing Systems," a two year effort performed at Georgia Institute of Technology as part of the Clouds Project. The Clouds
NASA Astrophysics Data System (ADS)
Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.
2017-12-01
Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role in SAR data availability and dissemination all over the world. Indeed, the free and open access data policy adopted by the European Copernicus program, together with the global coverage acquisition strategy, makes the Sentinel constellation a game changer in the Earth Observation scenario. With SAR data becoming ubiquitous, the technological and scientific challenge is to maximize the exploitation of this huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in an efficient, automatic and systematic way. This DInSAR chain ingests Sentinel-1 SLC images and carries out several processing steps to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform, and a thorough analysis of the attained parallel performance has been carried out to identify and overcome the major bottlenecks to scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits.
Such an experiment confirms the great advantage of exploiting the large computational and storage resources of Cloud Computing platforms for large-scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a valuable tool for developing operational services available to the EO scientific community for hazard monitoring and risk prevention and mitigation.
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
2013-01-28
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
A case study of tuning MapReduce for efficient Bioinformatics in the cloud
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Lizhen; Wang, Zhong; Yu, Weikuan
The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics data analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.
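For readers unfamiliar with the study case, k-mer counting maps naturally onto the MapReduce pattern. The following is a minimal single-process sketch of that pattern in plain Python, not Hadoop code and not the tool used in the paper; the function names and the example reads are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step: count every length-k substring (k-mer) in one sequencing read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step: merge two partial k-mer count tables."""
    return left + right


def count_kmers(reads, k):
    """Apply the map step to each read and fold the partial results together."""
    partial_counts = (map_kmers(read, k) for read in reads)
    return reduce(reduce_counts, partial_counts, Counter())


# Hypothetical reads, for illustration only.
READS = ["ACGTAC", "GTACG"]
```

In a real Hadoop job the map outputs are shuffled across nodes before reduction, which is where parameters such as the number of tasks and buffer sizes start to affect cost and run time.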
Strategic Implications of Cloud Computing for Modeling and Simulation (Briefing)
2016-04-01
of Promises with Cloud • Cost efficiency • Unlimited storage • Backup and recovery • Automatic software integration • Easy access to information...activities that wrap the actual exercise itself (e.g., travel for exercise support, data collection, integration, etc.). Cloud-based simulation would...requiring quick delivery rather than fewer large messages requiring high bandwidth. Cloud environments tend to be better at providing high-bandwidth
Prediction based proactive thermal virtual machine scheduling in green clouds.
Kinger, Supriya; Kumar, Rajesh; Sharma, Anju
2014-01-01
Cloud computing has rapidly emerged as a widely accepted computing paradigm, but research on Cloud computing is still at an early stage. Cloud computing provides many advanced features, but it still has shortcomings such as relatively high operating cost and environmental hazards like an increasing carbon footprint. These hazards can be reduced to some extent by efficient scheduling of Cloud resources. The temperature at which a machine is currently running can be taken as a criterion for Virtual Machine (VM) scheduling. This paper proposes a new proactive technique that, with the help of a temperature predictor, considers the current and maximum threshold temperatures of Server Machines (SMs) before making scheduling decisions, so that the maximum temperature is never reached. Different workload scenarios have been taken into consideration. The results obtained show that the proposed system is better than existing VM scheduling systems, which do not consider the current temperature of nodes before making scheduling decisions. Thus, a reduction in the need for cooling systems in a Cloud environment has been obtained and validated.
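A proactive, temperature-aware placement decision of the kind described here might look like the sketch below. The linear temperature predictor, its coefficient, the server names, and the temperatures are all assumptions made for illustration; the abstract does not specify the predictor used in the paper.

```python
def predict_temperature(current_c, added_load, degrees_per_load=10.0):
    """Hypothetical predictor: temperature rises linearly with the added load."""
    return current_c + degrees_per_load * added_load


def pick_server(servers, vm_load):
    """Choose the server with the lowest predicted temperature below its threshold.

    servers: {name: (current temperature in C, maximum threshold in C)}
    Returns None when every server would reach or exceed its threshold.
    """
    predicted = {}
    for name, (current_c, threshold_c) in servers.items():
        temperature = predict_temperature(current_c, vm_load)
        if temperature < threshold_c:
            predicted[name] = temperature
    if not predicted:
        return None
    return min(predicted, key=lambda name: (predicted[name], name))


# Hypothetical server machines (SMs) with current and threshold temperatures.
SERVERS = {"sm1": (60.0, 70.0), "sm2": (50.0, 70.0), "sm3": (68.0, 70.0)}
```

The point of checking the prediction rather than the current reading is that a server such as sm3 above looks acceptable now but would cross its threshold once the VM is placed.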
SCEAPI: A unified Restful Web API for High-Performance Computing
NASA Astrophysics Data System (ADS)
Rongqiang, Cao; Haili, Xiao; Shasha, Lu; Yining, Zhao; Xiaoning, Wang; Xuebin, Chi
2017-10-01
The development of scientific computing is increasingly moving to collaborative web and mobile applications. All these applications need a high-quality programming interface for accessing heterogeneous computing resources consisting of clusters, grid computing or cloud computing. In this paper, we introduce our high-performance computing environment that integrates computing resources from 16 HPC centers across China. Then we present a bundle of web services called SCEAPI and describe how it can be used to access HPC resources with the HTTP or HTTPS protocols. We discuss SCEAPI from several aspects including architecture, implementation and security, and address specific challenges in designing compatible interfaces and protecting sensitive data. We describe the functions of SCEAPI, including authentication, file transfer and job management (creating, submitting and monitoring jobs), and how to use SCEAPI in an easy-to-use way. Finally, we discuss how to exploit more HPC resources quickly for the ATLAS experiment by implementing a custom ARC compute element based on SCEAPI, and our work shows that SCEAPI is an easy-to-use and effective solution to extend opportunistic HPC resources.
SPARSE—A subgrid particle averaged Reynolds stress equivalent model: testing with a priori closure
Davis, Sean L.; Sen, Oishik; Udaykumar, H. S.
2017-01-01
A Lagrangian particle cloud model is proposed that accounts for the effects of Reynolds-averaged particle and turbulent stresses and the averaged carrier-phase velocity of the subparticle cloud scale on the averaged motion and velocity of the cloud. The SPARSE (subgrid particle averaged Reynolds stress equivalent) model is based on a combination of a truncated Taylor expansion of a drag correction function and Reynolds averaging. It reduces the required number of computational parcels to trace a cloud of particles in Eulerian–Lagrangian methods for the simulation of particle-laden flow. Closure is performed in an a priori manner using a reference simulation where all particles in the cloud are traced individually with a point-particle model. Comparison of a first-order model and SPARSE with the reference simulation in one dimension shows that both the stress and the averaging of the carrier-phase velocity on the cloud subscale affect the averaged motion of the particle. A three-dimensional isotropic turbulence computation shows that only one computational parcel is sufficient to accurately trace a cloud of tens of thousands of particles. PMID:28413341
SPARSE-A subgrid particle averaged Reynolds stress equivalent model: testing with a priori closure.
Davis, Sean L; Jacobs, Gustaaf B; Sen, Oishik; Udaykumar, H S
2017-03-01
A Lagrangian particle cloud model is proposed that accounts for the effects of Reynolds-averaged particle and turbulent stresses and the averaged carrier-phase velocity of the subparticle cloud scale on the averaged motion and velocity of the cloud. The SPARSE (subgrid particle averaged Reynolds stress equivalent) model is based on a combination of a truncated Taylor expansion of a drag correction function and Reynolds averaging. It reduces the required number of computational parcels to trace a cloud of particles in Eulerian-Lagrangian methods for the simulation of particle-laden flow. Closure is performed in an a priori manner using a reference simulation where all particles in the cloud are traced individually with a point-particle model. Comparison of a first-order model and SPARSE with the reference simulation in one dimension shows that both the stress and the averaging of the carrier-phase velocity on the cloud subscale affect the averaged motion of the particle. A three-dimensional isotropic turbulence computation shows that only one computational parcel is sufficient to accurately trace a cloud of tens of thousands of particles.
Phenomenology tools on cloud infrastructures using OpenStack
NASA Astrophysics Data System (ADS)
Campos, I.; Fernández-del-Castillo, E.; Heinemeyer, S.; Lopez-Garcia, A.; Pahlen, F.; Borges, G.
2013-04-01
We present a new environment for computations in particle physics phenomenology employing recent developments in cloud computing. In this environment users can create and manage "virtual" machines on which the phenomenology codes/tools can be deployed easily in an automated way. We analyze the performance of this environment based on "virtual" machines versus the utilization of physical hardware. In this way we provide a qualitative result for the influence of the host operating system on the performance of a representative set of applications for phenomenology calculations.
Applying analytic hierarchy process to assess healthcare-oriented cloud computing service systems.
Liao, Wen-Hwa; Qiu, Wan-Li
2016-01-01
Numerous differences exist between the healthcare industry and other industries. Difficulties in the business operation of the healthcare industry have continually increased because of the volatility and importance of health care, changes to and requirements of health insurance policies, and the statuses of healthcare providers, which are typically considered not-for-profit organizations. Moreover, because of the financial risks associated with constant changes in healthcare payment methods and constantly evolving information technology, healthcare organizations must continually adjust their business operation objectives; therefore, cloud computing presents both a challenge and an opportunity. As a response to aging populations and the prevalence of the Internet in fast-paced contemporary societies, cloud computing can be used to facilitate the task of balancing the quality and costs of health care. To evaluate cloud computing service systems for use in health care, providing decision makers with a comprehensive assessment method for prioritizing decision-making factors is highly beneficial. Hence, this study applied the analytic hierarchy process, compared items related to cloud computing and health care, executed a questionnaire survey, and then classified the critical factors influencing healthcare cloud computing service systems on the basis of statistical analyses of the questionnaire results. The results indicate that the primary factor affecting the design or implementation of optimal cloud computing healthcare service systems is cost effectiveness, with the secondary factors being practical considerations such as software design and system architecture.
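As an illustration of the analytic hierarchy process mentioned above, priority weights can be derived from a pairwise comparison matrix. The sketch below uses the row geometric mean method, one standard way to approximate AHP weights; the abstract does not say which variant the study used. The three criteria names come from the abstract, while the judgment values in the matrix are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights using normalized row geometric means.

    matrix[i][j] states how much more important criterion i is than criterion j,
    with matrix[j][i] == 1 / matrix[i][j] and ones on the diagonal.
    """
    size = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / size) for row in matrix]
    total = sum(geometric_means)
    return [value / total for value in geometric_means]


CRITERIA = ["cost effectiveness", "software design", "system architecture"]

# Hypothetical pairwise judgments, for illustration only.
JUDGMENTS = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
```

With these made-up judgments, ahp_weights(JUDGMENTS) ranks cost effectiveness first, which mirrors the ordering reported in the abstract but says nothing about the study's actual survey data.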
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, J
Purpose: This study evaluated the efficiency of 4D lung radiation treatment planning using Monte Carlo simulation on the cloud. The EGSnrc Monte Carlo code was used in dose calculation on the 4D-CT image set. Methods: A 4D lung radiation treatment plan was created by the DOSCTP linked to the cloud, based on the Amazon elastic compute cloud platform. Dose calculation was carried out by Monte Carlo simulation on the 4D-CT image set on the cloud, and results were sent to the FFD4D image deformation program for dose reconstruction. The dependence of computing time for the treatment plan on the number of compute nodes was optimized with variations of the number of CT image sets in the breathing cycle and the dose reconstruction time of the FFD4D. Results: It is found that the dependence of computing time on the number of compute nodes was affected by the diminishing return of the number of nodes used in Monte Carlo simulation. Moreover, the performance of the 4D treatment planning could be optimized by using fewer than 10 compute nodes on the cloud. The effects of the number of image sets and dose reconstruction time on the dependence of computing time on the number of nodes were not significant when more than 15 compute nodes were used in Monte Carlo simulations. Conclusion: The issue of long computing time in a 4D treatment plan, requiring Monte Carlo dose calculations in all CT image sets in the breathing cycle, can be solved using cloud computing technology. It is concluded that the optimized number of compute nodes selected in simulation should be between 5 and 15, as the dependence of computing time on the number of nodes is significant.
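The diminishing return reported here is what a simple Amdahl-style timing model predicts. The sketch below assumes that the Monte Carlo part divides evenly across compute nodes while a fixed part (for example, dose reconstruction) does not; both the model and the timing values are assumptions for illustration, not measurements from the study.

```python
def total_time(nodes, monte_carlo_s, fixed_s):
    """Assumed model: the Monte Carlo work splits across nodes, the fixed part does not."""
    return fixed_s + monte_carlo_s / nodes


def time_saved_by_next_node(nodes, monte_carlo_s, fixed_s):
    """Seconds saved by going from `nodes` to `nodes + 1` compute nodes."""
    return total_time(nodes, monte_carlo_s, fixed_s) - total_time(nodes + 1, monte_carlo_s, fixed_s)


# Hypothetical timings: 1000 s of Monte Carlo work and 100 s of fixed work.
MONTE_CARLO_S = 1000.0
FIXED_S = 100.0
```

Under these assumptions the second node saves 500 s, whereas the sixteenth saves only about 4 s, so adding nodes beyond a moderate count buys little.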
Cloud Computing for Geosciences--GeoCloud for standardized geospatial service platforms (Invited)
NASA Astrophysics Data System (ADS)
Nebert, D. D.; Huang, Q.; Yang, C.
2013-12-01
Geoscience in the 21st century faces the challenges of Big Data, spiking computing requirements (e.g., when a natural disaster happens), and sharing resources through cyberinfrastructure across different organizations (Yang et al., 2011). With flexibility and cost-efficiency of computing resources a primary concern, cloud computing emerges as a promising solution to provide core capabilities to address these challenges. Many governmental and federal agencies are adopting cloud technologies to cut costs and to make federal IT operations more efficient (Huang et al., 2010). However, it is still difficult for geoscientists to take advantage of the benefits of cloud computing to facilitate scientific research and discoveries. This presentation uses GeoCloud to illustrate the process and strategies used in building a common platform for geoscience communities to enable the sharing and integration of geospatial data, information and knowledge across different domains. GeoCloud is an annual incubator project coordinated by the Federal Geographic Data Committee (FGDC) in collaboration with the U.S. General Services Administration (GSA) and the Department of Health and Human Services. It is designed as a staging environment to test and document the deployment of a common GeoCloud community platform that can be implemented by multiple agencies. With these standardized virtual geospatial servers, a variety of government geospatial applications can be quickly migrated to the cloud. In order to achieve this objective, multiple projects are nominated each year by federal agencies as existing public-facing geospatial data services. From the initial candidate projects, a set of common operating system and software requirements was identified as the baseline for platform as a service (PaaS) packages. Based on these developed common platform packages, each project deploys and monitors its web application, develops best practices, and documents cost and performance information.
This paper presents the background, architectural design, and activities of GeoCloud in support of the Geospatial Platform Initiative. System security strategies and approval processes for migrating federal geospatial data, information, and applications into cloud, and cost estimation for cloud operations are covered. Finally, some lessons learned from the GeoCloud project are discussed as reference for geoscientists to consider in the adoption of cloud computing.
Performance implications from sizing a VM on multi-core systems: A data analytic application's view
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lim, Seung-Hwan; Horey, James L; Begoli, Edmon
In this paper, we present a quantitative performance analysis of data analytics applications running on multi-core virtual machines. Such environments form the core of cloud computing. In addition, data analytics applications, such as Cassandra and Hadoop, are becoming increasingly popular on cloud computing platforms. This convergence necessitates a better understanding of the performance and cost implications of such hybrid systems. For example, the very first step in hosting applications in virtualized environments requires the user to configure the number of virtual processors and the size of memory. To understand performance implications of this step, we benchmarked three Yahoo Cloud Serving Benchmark (YCSB) workloads in a virtualized multi-core environment. Our measurements indicate that the performance of Cassandra for YCSB workloads does not heavily depend on the processing capacity of a system, while the size of the data set is critical to performance relative to allocated memory. We also identified a strong relationship between the running time of workloads and various hardware events (last level cache loads, misses, and CPU migrations). From this analysis, we provide several suggestions to improve the performance of data analytics applications running on cloud computing environments.
Performance Evaluation of Resource Management in Cloud Computing Environments
Batista, Bruno Guazzelli; Estrella, Julio Cezar; Ferreira, Carlos Henrique Gomes; Filho, Dionisio Machado Leite; Nakamura, Luis Hideo Vasconcelos; Reiff-Marganiec, Stephan; Santana, Marcos José; Santana, Regina Helena Carlucci
2015-01-01
Cloud computing is a computational model in which resource providers can offer on-demand services to clients in a transparent way. However, to be able to guarantee quality of service without limiting the number of accepted requests, providers must be able to dynamically manage the available resources so that they can be optimized. This dynamic resource management is not a trivial task, since it involves meeting several challenges related to workload modeling, virtualization, performance modeling, deployment and monitoring of applications on virtualized resources. This paper carries out a performance evaluation of a module for resource management in a cloud environment that includes handling available resources during execution time and ensuring the quality of service defined in the service level agreement. An analysis was conducted of different resource configurations to define which dimension of resource scaling has a real influence on client requests. The results were used to model and implement a simulated cloud system, in which the allocated resource can be changed on-the-fly, with a corresponding change in price. In this way, the proposed module seeks to satisfy both the client by ensuring quality of service, and the provider by ensuring the best use of resources at a fair price. PMID:26555730
A compressive sensing based secure watermark detection and privacy preserving storage framework.
Qia Wang; Wenjun Zeng; Jun Tian
2014-03-01
Privacy is a critical issue when the data owners outsource data storage or processing to a third party computing service, such as the cloud. In this paper, we identify a cloud computing application scenario that requires simultaneously performing secure watermark detection and privacy preserving multimedia data storage. We then propose a compressive sensing (CS)-based framework using secure multiparty computation (MPC) protocols to address such a requirement. In our framework, the multimedia data and secret watermark pattern are presented to the cloud for secure watermark detection in a CS domain to protect the privacy. During CS transformation, the privacy of the CS matrix and the watermark pattern is protected by the MPC protocols under the semi-honest security model. We derive the expected watermark detection performance in the CS domain, given the target image, watermark pattern, and the size of the CS matrix (but without the CS matrix itself). The correctness of the derived performance has been validated by our experiments. Our theoretical analysis and experimental results show that secure watermark detection in the CS domain is feasible. Our framework can also be extended to other collaborative secure signal processing and data-mining applications in the cloud.
A Lightweight Distributed Framework for Computational Offloading in Mobile Cloud Computing
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC. PMID:25127245
2010-04-29
Cloud Computing (Bingue - Cook, STSC 2010). "The answer, my friend, is blowing in the wind." Objectives: define the cloud; risks of cloud computing; essence of cloud computing; deployed clouds in DoD. Definitions of cloud computing: cloud computing is a model for enabling ...
Cloud Computing Services for Seismic Networks
NASA Astrophysics Data System (ADS)
Olson, Michael
This thesis describes a compositional framework for developing situation awareness applications: applications that provide ongoing information about a user's changing environment. The thesis describes how the framework is used to develop a situation awareness application for earthquakes. The applications are implemented as Cloud computing services connected to sensors and actuators. The architecture and design of the Cloud services are described and measurements of performance metrics are provided. The thesis includes results of experiments on earthquake monitoring conducted over a year. The applications developed by the framework are (1) the CSN---the Community Seismic Network---which uses relatively low-cost sensors deployed by members of the community, and (2) SAF---the Situation Awareness Framework---which integrates data from multiple sources, including the CSN, CISN---the California Integrated Seismic Network, a network consisting of high-quality seismometers deployed carefully by professionals in the CISN organization and spread across Southern California---and prototypes of multi-sensor platforms that include carbon monoxide, methane, dust and radiation sensors.
Chen, Shang-Liang; Chen, Yun-Yao; Hsu, Chiang
2014-03-28
Cloud computing is changing the ways software is developed and managed in enterprises, which is changing the way of doing business in that dynamically scalable and virtualized resources are regarded as services over the Internet. Traditional manufacturing systems such as supply chain management (SCM), customer relationship management (CRM), and enterprise resource planning (ERP) are often developed case by case. However, effective collaboration between different systems, platforms, programming languages, and interfaces has been suggested by researchers. In cloud-computing-based systems, distributed resources are encapsulated into cloud services and centrally managed, which allows high automation, flexibility, fast provision, and ease of integration at low cost. The integration between physical resources and cloud services can be improved by combining Internet of things (IoT) technology and Software-as-a-Service (SaaS) technology. This study proposes a new approach for developing cloud-based manufacturing systems based on a four-layer SaaS model. There are three main contributions of this paper: (1) enterprises can develop their own cloud-based logistic management information systems based on the approach proposed in this paper; (2) a case study based on literature reviews with experimental results is proposed to verify that the system performance is remarkable; (3) challenges encountered and feedback collected from T Company in the case study are discussed in this paper for the purpose of enterprise deployment. PMID:24686728
Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support
2012-01-01
Background Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. Results In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Conclusions Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. 
The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org. PMID:22559942
A Fast Infrared Radiative Transfer Model for Overlapping Clouds
NASA Technical Reports Server (NTRS)
Niu, Jianguo; Yang, Ping; Huang, Huang-Lung; Davies, James E.; Li, Jun; Baum, Bryan A.; Hu, Yong X.
2006-01-01
A fast infrared radiative transfer model (FIRTM2) appropriate for application to both single-layered and overlapping cloud situations is developed for simulating the outgoing infrared spectral radiance at the top of the atmosphere (TOA). In FIRTM2 a pre-computed library of cloud reflectance and transmittance values is employed to account for one or two cloud layers, whereas the background atmospheric optical thickness due to gaseous absorption can be computed from a clear-sky radiative transfer model. FIRTM2 is applicable to three atmospheric conditions: 1) clear-sky, 2) single-layered ice or water cloud, and 3) two simultaneous cloud layers in a column (e.g., ice cloud overlying water cloud). Moreover, FIRTM2 outputs the derivatives (i.e., Jacobians) of the TOA brightness temperature with respect to cloud optical thickness and effective particle size. Sensitivity analyses have been carried out to assess the performance of FIRTM2 for two spectral regions, namely the longwave (LW) band (587.3 - 1179.5/cm) and the short-to-medium wave (SMW) band (1180.1 - 2228.9/cm). The assessment is carried out in terms of brightness temperature differences (BTD) between FIRTM2 and the well-known discrete ordinates radiative transfer model (DISORT), henceforth referred to as BTD (F-D). The BTD (F-D) values for single-layered clouds are generally less than 0.8 K. For the case of two cloud layers (specifically ice cloud over water cloud), the BTD(F-D) values are also generally less than 0.8 K except for the SMW band for the case of a very high altitude (>15 km) cloud comprised of small ice particles. Note that for clear-sky atmospheres, FIRTM2 reduces to the clear-sky radiative transfer model that is incorporated into FIRTM2, and the errors in this case are essentially those of the clear-sky radiative transfer model.
Yang, Hui; He, Yongqi; Zhang, Jie; Ji, Yuefeng; Bai, Wei; Lee, Young
2016-04-18
Cloud radio access network (C-RAN) has become a promising scenario to accommodate high-performance services with ubiquitous user coverage and real-time cloud computing using cloud BBUs. In our previous work, we implemented cross stratum optimization of optical network and application stratums resources that allows to accommodate the services in optical networks. In view of this, this study extends to consider the multiple dimensional resources optimization of radio, optical and BBU processing in 5G age. We propose a novel multi-stratum resources optimization (MSRO) architecture with network functions virtualization for cloud-based radio over optical fiber networks (C-RoFN) using software defined control. A global evaluation scheme (GES) for MSRO in C-RoFN is introduced based on the proposed architecture. The MSRO can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical and BBU resources effectively to maximize radio coverage. The efficiency and feasibility of the proposed architecture are experimentally demonstrated on OpenFlow-based enhanced SDN testbed. The performance of GES under heavy traffic load scenario is also quantitatively evaluated based on MSRO architecture in terms of resource occupation rate and path provisioning latency, compared with other provisioning scheme.
Monte Carlo verification of radiotherapy treatments with CloudMC.
Miras, Hector; Jiménez, Rubén; Perales, Álvaro; Terrón, José Antonio; Bertolet, Alejandro; Ortiz, Antonio; Macías, José
2018-06-27
A new implementation has been made on CloudMC, a cloud-based platform presented in a previous work, in order to provide services for radiotherapy treatment verification by means of Monte Carlo in a fast, easy and economical way. A description of the architecture of the application and the new developments implemented is presented together with the results of the tests carried out to validate its performance. CloudMC has been developed over Microsoft Azure cloud. It is based on a map/reduce implementation for Monte Carlo calculations distribution over a dynamic cluster of virtual machines in order to reduce calculation time. CloudMC has been updated with new methods to read and process the information related to radiotherapy treatment verification: CT image set, treatment plan, structures and dose distribution files in DICOM format. Some tests have been designed in order to determine, for the different tasks, the most suitable type of virtual machines from those available in Azure. Finally, the performance of Monte Carlo verification in CloudMC is studied through three real cases that involve different treatment techniques, linac models and Monte Carlo codes. Considering computational and economic factors, D1_v2 and G1 virtual machines were selected as the default type for the Worker Roles and the Reducer Role respectively. Calculation times up to 33 min and costs of 16 € were achieved for the verification cases presented when a statistical uncertainty below 2% (2σ) was required. The costs were reduced to 3-6 € when uncertainty requirements are relaxed to 4%. Advantages like high computational power, scalability, easy access and pay-per-usage model, make Monte Carlo cloud-based solutions, like the one presented in this work, an important step forward to solve the long-lived problem of truly introducing the Monte Carlo algorithms in the daily routine of the radiotherapy planning process.
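As a rough check on the figures above: Monte Carlo statistical uncertainty normally falls as 1/sqrt(N) with the number of histories N, so relaxing the tolerance from 2% to 4% should need about a quarter of the histories, which is broadly in line with the reported drop from 16 € to 3-6 €. The sketch below illustrates that arithmetic together with a map/reduce-style merge of per-worker tallies. It is an illustration under stated assumptions, not CloudMC's code: the function names and the uniform-random toy tally are hypothetical.

```python
import math
import random


def histories_needed(n_ref, sigma_ref, sigma_target):
    """Histories needed to reach sigma_target, given that n_ref gave sigma_ref.

    Assumes the usual Monte Carlo scaling: sigma is proportional to 1/sqrt(N).
    """
    return math.ceil(n_ref * (sigma_ref / sigma_target) ** 2)


def map_worker(seed, n):
    """Hypothetical worker: run n histories, return (count, sum, sum of squares)."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = rng.random()  # stand-in for the quantity scored by one history
        total += x
        total_sq += x * x
    return n, total, total_sq


def reduce_tallies(parts):
    """Merge worker tallies into a mean and the standard error of that mean."""
    n = sum(p[0] for p in parts)
    s = sum(p[1] for p in parts)
    s2 = sum(p[2] for p in parts)
    mean = s / n
    variance = max(s2 / n - mean * mean, 0.0)
    return mean, math.sqrt(variance / n)
```

Calling `histories_needed(1_000_000, 0.02, 0.04)` returns 250000, one quarter of the reference run. Because each worker returns only three numbers, the reduce step stays cheap however many virtual machines take part.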
Abdulhamid, Shafi’i Muhammad; Abd Latiff, Muhammad Shafie; Abdul-Salaam, Gaddafi; Hussain Madni, Syed Hamid
2016-01-01
Cloud computing system is a huge cluster of interconnected servers residing in a datacenter and dynamically provisioned to clients on-demand via a front-end interface. Scientific applications scheduling in the cloud computing environment is identified as NP-hard problem due to the dynamic nature of heterogeneous resources. Recently, a number of metaheuristics optimization schemes have been applied to address the challenges of applications scheduling in the cloud system, without much emphasis on the issue of secure global scheduling. In this paper, scientific applications scheduling techniques using the Global League Championship Algorithm (GBLCA) optimization technique is first presented for global task scheduling in the cloud environment. The experiment is carried out using CloudSim simulator. The experimental results show that, the proposed GBLCA technique produced remarkable performance improvement rate on the makespan that ranges between 14.44% to 46.41%. It also shows significant reduction in the time taken to securely schedule applications as parametrically measured in terms of the response time. In view of the experimental results, the proposed technique provides better-quality scheduling solution that is suitable for scientific applications task execution in the Cloud Computing environment than the MinMin, MaxMin, Genetic Algorithm (GA) and Ant Colony Optimization (ACO) scheduling techniques. PMID:27384239
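For readers unfamiliar with the metric, makespan is the time at which the last machine finishes its assigned tasks. The sketch below shows Min-Min, one of the baseline heuristics named above, on a small hypothetical execution-time matrix. It is not the GBLCA technique from the paper, and the data layout (`exec_time[task][machine]`) is an assumption made for illustration.

```python
def min_min_schedule(exec_time):
    """Min-Min heuristic.

    exec_time[t][m] is the run time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(exec_time[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unscheduled = set(range(len(exec_time)))
    assignment = {}
    while unscheduled:
        # Pick the (task, machine) pair with the smallest completion time.
        # This equals choosing the task whose best completion time is minimal.
        finish, task, machine = min(
            (ready[m] + exec_time[t][m], t, m)
            for t in unscheduled
            for m in range(n_machines)
        )
        assignment[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)
    return assignment, max(ready)


def improvement_rate(baseline_makespan, new_makespan):
    """Percentage reduction in makespan relative to a baseline."""
    return 100.0 * (baseline_makespan - new_makespan) / baseline_makespan
```

With `exec_time = [[4, 6], [3, 5], [8, 2]]`, Min-Min places task 2 on machine 1 and tasks 1 and 0 on machine 0, giving a makespan of 7. A percentage such as those quoted above would then be `improvement_rate(baseline, proposed)` for some pair of schedules.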
Survey on Security Issues in Cloud Computing and Associated Mitigation Techniques
NASA Astrophysics Data System (ADS)
Bhadauria, Rohit; Sanyal, Sugata
2012-06-01
Cloud Computing holds the potential to eliminate the requirements for setting up of high-cost computing infrastructure for IT-based solutions and services that the industry uses. It promises to provide a flexible IT architecture, accessible through internet for lightweight portable devices. This would allow multi-fold increase in the capacity or capabilities of the existing and new software. In a cloud computing environment, the entire data reside over a set of networked resources, enabling the data to be accessed through virtual machines. Since these data-centers may lie in any corner of the world beyond the reach and control of users, there are multifarious security and privacy challenges that need to be understood and taken care of. Also, one can never deny the possibility of a server breakdown that has been witnessed, rather quite often in the recent times. There are various issues that need to be dealt with respect to security and privacy in a cloud computing scenario. This extensive survey paper aims to elaborate and analyze the numerous unresolved issues threatening the cloud computing adoption and diffusion affecting the various stake-holders linked to it.
An Elliptic Curve Based Schnorr Cloud Security Model in Distributed Environment
Muthurajan, Vinothkumar; Narayanasamy, Balaji
2016-01-01
Cloud computing requires the security upgrade in data transmission approaches. In general, key-based encryption/decryption (symmetric and asymmetric) mechanisms ensure the secure data transfer between the devices. The symmetric key mechanisms (pseudorandom function) provide minimum protection level compared to asymmetric key (RSA, AES, and ECC) schemes. The presence of expired content and the irrelevant resources cause unauthorized data access adversely. This paper investigates how the integrity and secure data transfer are improved based on the Elliptic Curve based Schnorr scheme. This paper proposes a virtual machine based cloud model with Hybrid Cloud Security Algorithm (HCSA) to remove the expired content. The HCSA-based auditing improves the malicious activity prediction during the data transfer. The duplication in the cloud server degrades the performance of EC-Schnorr based encryption schemes. This paper utilizes the blooming filter concept to avoid the cloud server duplication. The combination of EC-Schnorr and blooming filter efficiently improves the security performance. The comparative analysis between proposed HCSA and the existing Distributed Hash Table (DHT) regarding execution time, computational overhead, and auditing time with auditing requests and servers confirms the effectiveness of HCSA in the cloud security model creation. PMID:26981584
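The "blooming filter" mentioned above is presumably the standard Bloom filter, a compact bit array that answers "possibly seen" or "definitely not seen" and so can screen out duplicate uploads before the more expensive signature work. The sketch below is a generic Bloom filter under that assumption; the class name, the default sizes, and the use of SHA-256 to derive bit positions are illustrative choices, not details taken from the paper.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (no false negatives)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item)
        )
```

A server could test `block in seen` before storing a block and call `seen.add(block)` afterwards. A hit may be a false positive, so an exact comparison would still be needed before discarding data.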
Local storage federation through XRootD architecture for interactive distributed analysis
NASA Astrophysics Data System (ADS)
Colamaria, F.; Colella, D.; Donvito, G.; Elia, D.; Franco, A.; Luparello, G.; Maggi, G.; Miniello, G.; Vallero, S.; Vino, G.
2015-12-01
A cloud-based Virtual Analysis Facility (VAF) for the ALICE experiment at the LHC has been deployed in Bari. Similar facilities are currently running in other Italian sites with the aim to create a federation of interoperating farms able to provide their computing resources for interactive distributed analysis. The use of cloud technology, along with elastic provisioning of computing resources as an alternative to the grid for running data intensive analyses, is the main challenge of these facilities. One of the crucial aspects of the user-driven analysis execution is the data access. A local storage facility has the disadvantage that the stored data can be accessed only locally, i.e. from within the single VAF. To overcome such a limitation a federated infrastructure, which provides full access to all the data belonging to the federation independently from the site where they are stored, has been set up. The federation architecture exploits both cloud computing and XRootD technologies, in order to provide a dynamic, easy-to-use and well performing solution for data handling. It should allow the users to store the files and efficiently retrieve the data, since it implements a dynamic distributed cache among many datacenters in Italy connected to one another through the high-bandwidth national network. Details on the preliminary architecture implementation and performance studies are discussed.
NASA Astrophysics Data System (ADS)
Moro, A. C.; Nadesh, R. K.
2017-11-01
The cloud computing paradigm has transformed the way we do business in today's world. Services on the cloud have come a long way since just providing basic storage or software on demand. One of the fastest growing areas is mobile cloud computing. With offloading now available, mobile users can offload entire applications onto cloudlets. Given the problems of availability and limited storage capacity of these mobile cloudlets, it becomes difficult for the mobile user to decide when to use local memory and when to use the cloudlets. Hence, we take a look at a fast algorithm that decides whether the mobile user should go for a cloudlet or rely on local memory based on an offloading probability. We have partially implemented the algorithm, which decides whether the task can be carried out locally or given to a cloudlet. Because performing the complete computation is a burden on mobile devices, we look to offload it onto a cloud in our paper. We further use a file compression technique before sending the file to the cloud to reduce the load further.
Hybrid cloud and cluster computing paradigms for life science applications
2010-01-01
Background: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods: We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.
Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey
2010-12-21
Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
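The iterative structure mentioned in the abstract above can be illustrated with a minimal sketch. It is not Twister's or Hadoop's API; it is a plain-Python, single-process illustration, assuming one-dimensional k-means as the iterative workload. The names `map_assign`, `reduce_mean`, and `kmeans_iterative_mapreduce` are hypothetical. Each loop iteration is one full map-plus-reduce pass over the same static points, which is the pattern that a basic MapReduce runtime typically has to launch as a separate job per iteration.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points that share a centroid key."""
    return sum(points) / len(points)


def kmeans_iterative_mapreduce(points, centroids, max_iters=20, tol=1e-9):
    """Repeat map + reduce passes until the centroids stop moving.

    Hypothetical illustration only: the static input (points) is reused
    unchanged in every iteration; only the small centroid list changes.
    """
    centroids = list(centroids)
    for _ in range(max_iters):
        # Shuffle phase: group mapped values by key.
        groups = defaultdict(list)
        for key, value in (map_assign(p, centroids) for p in points):
            groups[key].append(value)
        # Reduce phase: one new centroid per key (keep old one if empty).
        new_centroids = [
            reduce_mean(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]
        shift = max(abs(a - b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_iterative_mapreduce(data, [0.0, 5.0]))
```

In this sketch the only state carried between iterations is the centroid list, which suggests why a runtime that keeps the static data in place across iterations could help with workloads of this shape.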
Cloud prediction of protein structure and function with PredictProtein for Debian.
Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard
2013-01-01
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.
Cloud Prediction of Protein Structure and Function with PredictProtein for Debian
Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard
2013-01-01
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032
NASA Astrophysics Data System (ADS)
Schnase, J. L.; Duffy, D.; Tamkin, G. S.; Nadeau, D.; Thompson, J. H.; Grieg, C. M.; McInerney, M.; Webster, W. P.
2013-12-01
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produces societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS built on this principle. MERRA/AS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRA/AS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications.
In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilities is perhaps the most important link between Cloud Computing and Big Data.
NASA Technical Reports Server (NTRS)
Schnase, John L.; Duffy, Daniel Quinn; Tamkin, Glenn S.; Nadeau, Denis; Thompson, John H.; Grieg, Christina M.; McInerney, Mark A.; Webster, William P.
2014-01-01
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produces societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS built on this principle. MERRA/AS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRA/AS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications.
In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilities is perhaps the most important link between Cloud Computing and Big Data.
Feasibility of Virtual Machine and Cloud Computing Technologies for High Performance Computing
2014-05-01
Hat Enterprise Linux SaaS software as a service VM virtual machine vNUMA virtual non-uniform memory access WRF weather research and forecasting...previously mentioned in Chapter I Section B1 of this paper, which is used to run the weather research and forecasting ( WRF ) model in their experiments...against a VMware virtualization solution of WRF . The experiment consisted of running WRF in a standard configuration between the D-VTM and VMware while
Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education
ERIC Educational Resources Information Center
Wheeler, Brad; Waggener, Shelton
2009-01-01
The concept of today's cloud computing may date back to 1961, when John McCarthy, retired Stanford professor and Turing Award winner, delivered a speech at MIT's Centennial. In that speech, he predicted that in the future, computing would become a "public utility." Yet for colleges and universities, the recent growth of pervasive, very high speed…
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.
Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L
2014-01-01
As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as the multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share the common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
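The scalability argument above rests on Amdahl's law. As a rough sketch, the classic form of the law is S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of cores. The abstract refers to a variant for symmetric multicore chips; the code below uses only the classic form, and the function name `amdahl_speedup` and the sample fractions are assumptions chosen for illustration, not values from the paper.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Classic Amdahl's law: speedup on `cores` cores when a fraction
    `parallel_fraction` of the work can be parallelized perfectly."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions only; they are not measurements from the study.
    for fraction in (0.90, 0.99, 1.0):
        for cores in (12, 48, 192):
            speedup = amdahl_speedup(fraction, cores)
            print(f"p={fraction:.2f} cores={cores:3d} speedup={speedup:.2f}")
```

Under this simplified model, a 12-fold speedup on 12 cores corresponds to a parallel fraction very close to 1, which is consistent with the abstract's expectation that the platform would keep scaling on larger core counts.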
Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering.
Guo, Xuan; Meng, Yu; Yu, Ning; Pan, Yi
2014-04-10
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect interactions involving multiple loci, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions with more than 2 SNPs pose computational and analytical challenges because the computation increases exponentially as the cardinality of SNP combinations gets larger. In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare its power against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, Age-related macular degeneration (AMD) and Rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors which do not show up in detections of 2-loci epistatic interactions. Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD dataset and the RA dataset are roughly 2 hours and 50 hours, respectively, on a cluster with forty small virtual machines for detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multiple-locus epistatic interactions in GWAS.
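The computational challenge described above comes from the number of SNP combinations an exhaustive search would have to test. A minimal sketch of that count, assuming a hypothetical panel size, is the binomial coefficient C(n, k) for n SNPs and interaction order k. The function name `snp_combinations` and the panel sizes are illustrative assumptions, not figures from the study.

```python
from math import comb


def snp_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of size `order` drawn from `n_snps` SNPs,
    i.e. how many candidate interactions an exhaustive test would visit."""
    return comb(n_snps, order)


if __name__ == "__main__":
    # Hypothetical panel sizes, chosen only to show how the count grows.
    for n_snps in (1_000, 100_000):
        for order in (2, 3, 4):
            count = snp_combinations(n_snps, order)
            print(f"{n_snps:>7} SNPs, order {order}: {count:,} combinations")
```

Each extra locus multiplies the search space by roughly the panel size, which is why exhaustive testing beyond two loci quickly becomes impractical without pruning or clustering.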
Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering
2014-01-01
Background: Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect interactions involving multiple loci, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions with more than 2 SNPs pose computational and analytical challenges because the computation increases exponentially as the cardinality of SNP combinations gets larger. Results: In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare its power against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, Age-related macular degeneration (AMD) and Rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors which do not show up in detections of 2-loci epistatic interactions. Conclusions: Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD dataset and the RA dataset are roughly 2 hours and 50 hours, respectively, on a cluster with forty small virtual machines for detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multiple-locus epistatic interactions in GWAS. PMID:24717145
A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing
Fan, Kai; Wang, Junxiong; Wang, Xin; Li, Hui; Yang, Yintang
2017-01-01
With the rapid development of big data and Internet of things (IOT), the number of networking devices and data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges are also arising in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. In this paper, we propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In our construction, most encryption and decryption computations are outsourced to fog devices and the computation results can be verified by using our verification method. Meanwhile, to address the revocation issue, we design an efficient user and attribute revocation method for it. Finally, analysis and simulation results show that our scheme is both secure and highly efficient. PMID:28737733
An Overview of Cloud Computing in Distributed Systems
NASA Astrophysics Data System (ADS)
Divakarla, Usha; Kumari, Geetha
2010-11-01
Cloud computing is the emerging trend in the field of distributed computing. Cloud computing evolved from grid computing and distributed computing. Cloud plays an important role in huge organizations in maintaining huge data with limited resources. Cloud also helps in resource sharing through some specific virtual machines provided by the cloud service provider. This paper gives an overview of the cloud organization and some of the basic security issues pertaining to the cloud.
NASA Astrophysics Data System (ADS)
Stillinger, T.; Dozier, J.; Phares, N.; Rittger, K.
2015-12-01
Discrimination between snow and clouds poses a serious but tractable challenge to the consistent delivery of high-quality information on mountain snow from remote sensing. Clouds obstruct the surface from the sensor's view, and the similar optical properties of clouds and snow make accurate discrimination difficult. We assess the performance of the current Landsat 8 operational snow and cloud mask products (LDCM CCA and CFmask), along with a new method, using over one million manually identified snow and clouds pixels in Landsat 8 scenes. The new method uses physically based scattering models to generate spectra in each Landsat 8 band, at that scene's solar illumination, for snow and cloud particle sizes that cover the plausible range for each. The modeled spectra are compared to pixels' spectra via several independent ways to identify snow and clouds. The results are synthesized to create a final snow/cloud mask, and the method can be applied to any multispectral imager with bands covering the visible, near-infrared, and shortwave-infrared regions. Each algorithm we tested misidentifies snow and clouds in both directions to varying degrees. We assess performance with measures of Precision, Recall, and the F statistic, which are based on counts of true and false positives and negatives. Tests for significance in differences between spectra in the measured and modeled values among incorrectly identified pixels help ascertain reasons for misidentification. A cloud mask specifically designed to separate snow from clouds is a valuable tool for those interested in remotely sensing snow cover. Given freely available remote sensing datasets and computational tools to feasibly process entire mission histories for an area of interest, enabling researchers to reliably identify and separate snow and clouds increases the usability of the data for hydrological and climatological studies.
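The performance measures named above follow directly from counts of true positives, false positives, and false negatives. The sketch below assumes that the "F statistic" in the abstract is the usual F-score (F1 when `beta` is 1); the function name `precision_recall_f` and the pixel counts in the example are hypothetical and not taken from the study.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F-score) from confusion-matrix counts.

    tp: pixels correctly labeled as the target class (e.g. snow)
    fp: pixels wrongly labeled as the target class
    fn: target-class pixels the mask missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    beta_sq = beta * beta
    f_score = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical counts for a snow mask evaluated against manual labels.
    p, r, f = precision_recall_f(tp=8_000, fp=1_000, fn=2_000)
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Reporting precision and recall separately shows the direction of the error: low precision means clouds are being labeled as snow, while low recall means snow is being missed.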
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage
Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-01-01
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query. PMID:29652810
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.
Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-04-13
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query.
Analysis on the security of cloud computing
NASA Astrophysics Data System (ADS)
He, Zhonglin; He, Yuhua
2011-02-01
Cloud computing is a new technology that fuses computer technology and Internet development. It is expected to drive major change in IT and the information field. However, in cloud computing, data and application software are stored at large data centers, and the management of data and services is not completely trustworthy, resulting in security problems that make it difficult to improve the quality of cloud services. This paper briefly introduces the concept of cloud computing. Considering the characteristics of cloud computing, it constructs a security architecture for cloud computing. At the same time, with an eye toward the security threats cloud computing faces, several corresponding strategies are provided from the perspective of cloud computing users and service providers.
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...
2016-10-06
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules, an in-memory data store, with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systems demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.
The HEPiX Virtualisation Working Group: Towards a Grid of Clouds
NASA Astrophysics Data System (ADS)
Cass, Tony
2012-12-01
The use of virtual machine images, as for example with Cloud services such as Amazon's Elastic Compute Cloud, is attractive for users as they have a guaranteed execution environment, something that cannot today be provided across sites participating in computing grids such as the Worldwide LHC Computing Grid. However, Grid sites often operate within computer security frameworks which preclude the use of remotely generated images. The HEPiX Virtualisation Working Group was set up with the objective of enabling the use of remotely generated virtual machine images at Grid sites and, to this end, has introduced the idea of trusted virtual machine images which are guaranteed to be secure and configurable by sites such that security policy commitments can be met. This paper describes the requirements and details of these trusted virtual machine images and presents a model for their use to facilitate the integration of Grid- and Cloud-based computing environments for High Energy Physics.
BlueSky Cloud Framework: An E-Learning Framework Embracing Cloud Computing
NASA Astrophysics Data System (ADS)
Dong, Bo; Zheng, Qinghua; Qiao, Mu; Shu, Jian; Yang, Jie
Currently, E-Learning has grown into a widely accepted way of learning. With the huge growth of users, services, education contents and resources, E-Learning systems are facing challenges of optimizing resource allocations, dealing with dynamic concurrency demands, handling rapid storage growth requirements and cost controlling. In this paper, an E-Learning framework based on cloud computing is presented, namely BlueSky cloud framework. Particularly, the architecture and core components of BlueSky cloud framework are introduced. In BlueSky cloud framework, physical machines are virtualized, and allocated on demand for E-Learning systems. Moreover, BlueSky cloud framework combines with traditional middleware functions (such as load balancing and data caching) to serve for E-Learning systems as a general architecture. It delivers reliable, scalable and cost-efficient services to E-Learning systems, and E-Learning organizations can establish systems through these services in a simple way. BlueSky cloud framework solves the challenges faced by E-Learning, and improves the performance, availability and scalability of E-Learning systems.
A new airborne sampler for interstitial particles in ice and liquid clouds
NASA Astrophysics Data System (ADS)
Moharreri, A.; Craig, L.; Rogers, D. C.; Brown, M.; Dhaniyala, S.
2011-12-01
In-situ measurements of cloud droplets and aerosols using aircraft platforms are required for understanding aerosol-cloud processes and aiding development of improved aerosol-cloud models. A variety of clouds with different temperature ranges and cloud particle sizes/phases must be studied for comprehensive knowledge about the role of aerosols in the formation and evolution of cloud systems under different atmospheric conditions. While representative aerosol measurements are regularly made from aircraft under clear air conditions, aerosol measurements in clouds are often contaminated by the generation of secondary particles from the high speed impaction of ice particles and liquid droplets on the surfaces of the aircraft probes/inlets. A new interstitial particle sampler, called the blunt-body aerosol sampler (BASE), has been designed and used for aerosol sampling during two recent airborne campaigns using NCAR/NSF C-130 aircraft: PLOWS (2009-2010) and ICE-T (2011). Central to the design of the new interstitial inlet is an upstream blunt body housing that acts to shield/deflect large cloud droplets and ice particles from an aft sampling region. The blunt-body design also ensures that small shatter particles created from the impaction of cloud droplets on the blunt body are not present in the aft region where the interstitial inlet is located. Computational fluid dynamics (CFD) simulations along with particle transport modeling and wind tunnel studies have been utilized in different stages of design and development of this inlet. The initial flight tests during the PLOWS campaign showed that the inlet had satisfactory performance only in warm clouds and when large precipitation droplets were absent. In the presence of large droplets and ice, the inlet samples were contaminated with significant shatter artifacts.
These initial results were reanalyzed in conjunction with a computational droplet shatter model and the numerical results were used to arrive at an improved sampler design. Analysis of the data from the recent ICE-T campaign with the improved sampler design shows that the modified version of BASE can provide shatter-artifact free sampling of aerosol particles in the presence of ice particles and significantly reduced shatter artifacts in warm clouds. Detailed design and modeling aspects of the sampler will be discussed and the sampler performance in warm and cold clouds will be presented and compared with measurements made using other aerosol inlets flown on the NCAR/NSF C-130 aircraft.
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turnaround time that is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, owing to the massive amount of data that needs to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a tenfold performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
Future of Department of Defense Cloud Computing Amid Cultural Confusion
2013-03-01
enterprise cloud-computing environment and transition to a public cloud service provider. Services have started the development of individual cloud-computing environments...endorsing cloud computing. It addresses related issues in matters of service culture changes and how strategic leaders will dictate the future of cloud...through data center consolidation and individual Service provided cloud computing.
phpMs: A PHP-Based Mass Spectrometry Utilities Library.
Collins, Andrew; Jones, Andrew R
2018-03-02
The recent establishment of cloud computing, high-throughput networking, and more versatile web standards and browsers has led to a renewed interest in web-based applications. While traditionally big data has been the domain of optimized desktop and server applications, it is now possible to store vast amounts of data and perform the necessary calculations offsite in cloud storage and computing providers, with the results visualized in a high-quality cross-platform interface via a web browser. There are a number of emerging platforms for cloud-based mass spectrometry data analysis; however, there is limited pre-existing code accessible to web developers, especially for those who are constrained to a shared hosting environment where Java and C applications are often forbidden by the hosting provider. To remedy this, we provide an open-source mass spectrometry library for one of the most commonly used web development languages, PHP. Our new library, phpMs, provides objects for storing and manipulating spectra and identification data, utilities for file reading, file writing, calculations, peptide fragmentation, and protein digestion, and a software interface for controlling search engines. We provide a working demonstration of some of the capabilities at http://pgb.liv.ac.uk/phpMs .
Helix Nebula and CERN: A Symbiotic approach to exploiting commercial clouds
NASA Astrophysics Data System (ADS)
Barreiro Megino, Fernando H.; Jones, Robert; Kucharczyk, Katarzyna; Medrano Llamas, Ramón; van der Ster, Daniel
2014-06-01
The recent paradigm shift toward cloud computing in IT, and general interest in "Big Data" in particular, have demonstrated that the computing requirements of HEP are no longer globally unique. Indeed, the CERN IT department and LHC experiments have already made significant R&D investments in delivering and exploiting cloud computing resources. While a number of technical evaluations of interesting commercial offerings from global IT enterprises have been performed by various physics labs, further technical, security, sociological, and legal issues need to be addressed before their large-scale adoption by the research community can be envisaged. Helix Nebula - the Science Cloud is an initiative that explores these questions by joining the forces of three European research institutes (CERN, ESA and EMBL) with leading European commercial IT enterprises. The goals of Helix Nebula are to establish a cloud platform federating multiple commercial cloud providers, along with new business models, which can sustain the cloud marketplace for years to come. This contribution will summarize the participation of CERN in Helix Nebula. We will explain CERN's flagship use-case and the model used to integrate several cloud providers with an LHC experiment's workload management system. During the first proof of concept, this project contributed over 40,000 CPU-days of Monte Carlo production throughput to the ATLAS experiment with marginal manpower required. CERN's experience, together with that of ESA and EMBL, is providing great insight into the cloud computing industry and has highlighted several challenges that are being tackled in order to ease the export of scientific workloads to cloud environments.
NASA Astrophysics Data System (ADS)
McInerney, M.; Schnase, J. L.; Duffy, D.; Tamkin, G.; Nadeau, D.; Strong, S.; Thompson, J. H.; Sinno, S.; Lazar, D.
2014-12-01
The climate sciences represent a big data domain that is experiencing unprecedented growth. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with big data that ultimately produces societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. Within this framework, cloud computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics-as-a-service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the big data challenges in this domain. This poster will highlight specific examples of CAaaS using climate reanalysis data, high-performance cloud computing, map reduce, and the Climate Data Services API.
Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola
2015-01-01
New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. 
From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources. We also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
Prediction Based Proactive Thermal Virtual Machine Scheduling in Green Clouds
Kinger, Supriya; Kumar, Rajesh; Sharma, Anju
2014-01-01
Cloud computing has rapidly emerged as a widely accepted computing paradigm, but research on Cloud computing is still at an early stage. Cloud computing provides many advanced features, but it still has some shortcomings, such as relatively high operating cost and environmental hazards like increasing carbon footprints. These hazards can be reduced to some extent by efficient scheduling of Cloud resources. The temperature at which a machine is currently running can be taken as a criterion for Virtual Machine (VM) scheduling. This paper proposes a new proactive technique that considers the current and maximum threshold temperatures of Server Machines (SMs) before making scheduling decisions with the help of a temperature predictor, so that the maximum temperature is never reached. Different workload scenarios have been taken into consideration. The results obtained show that the proposed system is better than existing VM scheduling systems, which do not consider the current temperature of nodes before making scheduling decisions. Thus, a reduction in the need for cooling systems in a Cloud environment has been obtained and validated. PMID:24737962
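The proactive scheduling idea in the abstract above (predict a server's temperature before placing a VM, and never let it reach its threshold) can be illustrated with a minimal sketch. The abstract does not describe the predictor or the selection rule, so everything here is an assumption made for illustration: the `Server` fields, the linear `predict_temp` model, the `degrees_per_load` constant, and the "largest thermal margin wins" policy are hypothetical and are not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Server:
    name: str
    current_temp: float  # measured temperature, degrees Celsius
    max_temp: float      # threshold that must never be reached


def predict_temp(server: Server, vm_load: float, degrees_per_load: float = 10.0) -> float:
    """Hypothetical linear predictor: each unit of load adds a fixed number of degrees."""
    return server.current_temp + degrees_per_load * vm_load


def pick_server(servers: Sequence[Server], vm_load: float) -> Optional[Server]:
    """Return the server with the largest predicted thermal margin, or None if none is safe."""
    best = None
    best_margin = 0.0
    for server in servers:
        margin = server.max_temp - predict_temp(server, vm_load)
        # A server qualifies only if its predicted temperature stays strictly below the threshold.
        if margin > best_margin:
            best, best_margin = server, margin
    return best
```

Returning `None` when no server has a positive margin lets the caller defer or reject the placement, which is one simple way to honor a "threshold is never reached" constraint.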
A Framework and Improvements of the Korea Cloud Services Certification System.
Jeon, Hangoo; Seo, Kwang-Kyu
2015-01-01
Cloud computing service is an evolving paradigm that affects a large part of the ICT industry and provides new opportunities for ICT service providers, such as the deployment of new business models and the realization of economies of scale by increasing the efficiency of resource utilization. However, despite the benefits of cloud services, there are obstacles to their adoption, such as the lack of means for assessing and comparing the service quality of cloud services with regard to availability, security, and reliability. To adopt cloud services successfully and promote their use, it is necessary to establish a cloud service certification system that ensures the service quality and performance of cloud services. This paper proposes a framework for, and improvements of, the Korea certification system for cloud services. To develop it, the critical issues related to service quality, performance, and certification of cloud services are identified, and a systematic framework for the certification system covering cloud services and service provider domains is developed. Improvements of the developed Korea certification system for cloud services are also proposed. PMID:26125049
International Symposium on Grids and Clouds (ISGC) 2016
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2016 will be held at Academia Sinica in Taipei, Taiwan, from 13-18 March 2016, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). The theme of ISGC 2016 focuses on “Ubiquitous e-infrastructures and Applications”. Contemporary research is impossible without a strong IT component - researchers rely on the existence of stable and widely available e-infrastructures and their higher level functions and properties. As a result of these expectations, e-Infrastructures are becoming ubiquitous, providing an environment that supports large scale collaborations that deal with global challenges as well as smaller and temporal research communities focusing on particular scientific problems. To support those diversified communities and their needs, the e-Infrastructures themselves are becoming more layered and multifaceted, supporting larger groups of applications. Following the call for last year's conference, ISGC 2016 continues its aim to bring together users and application developers with those responsible for the development and operation of multi-purpose ubiquitous e-Infrastructures. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities, Arts, and Social Sciences (HASS) Applications, Virtual Research Environment (including Middleware, tools, services, workflow, etc.), Data Management, Big Data, Networking & Security, Infrastructure & Operations, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC), etc.
Atmospheric cloud physics thermal systems analysis
NASA Technical Reports Server (NTRS)
1977-01-01
Engineering analyses performed on the Atmospheric Cloud Physics (ACPL) Science Simulator expansion chamber and associated thermal control/conditioning system are reported. Analyses were made to develop a verified thermal model and to perform parametric thermal investigations to evaluate systems performance characteristics. Thermal network representations of solid components and the complete fluid conditioning system were solved simultaneously using the Systems Improved Numerical Differencing Analyzer (SINDA) computer program.
Scalable and responsive event processing in the cloud
Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul
2013-01-01
Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164
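The abstract above models several query units on one node as a multiple-class M/G/1 system and uses the model to predict response times. A minimal sketch of such a prediction follows, assuming first-come-first-served service and Poisson arrivals, so that the Pollaczek-Khinchine formula gives a mean waiting time shared by all classes. The class names, the `fits_on_node` helper, and the choice of mean response time as the target metric are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QueryClass:
    arrival_rate: float   # events per second for this query (lambda_i)
    mean_service: float   # E[S_i], seconds per event
    second_moment: float  # E[S_i^2], seconds squared


def utilization(classes: Sequence[QueryClass]) -> float:
    """Total load on the node: sum of lambda_i * E[S_i]."""
    return sum(c.arrival_rate * c.mean_service for c in classes)


def mean_response_times(classes: Sequence[QueryClass]) -> List[float]:
    """Mean response time per class for a FCFS multi-class M/G/1 queue."""
    rho = utilization(classes)
    if rho >= 1.0:
        raise ValueError("node is overloaded (utilization >= 1)")
    # Pollaczek-Khinchine: under FCFS every class sees the same mean wait.
    wait = sum(c.arrival_rate * c.second_moment for c in classes) / (2.0 * (1.0 - rho))
    return [wait + c.mean_service for c in classes]


def fits_on_node(classes: Sequence[QueryClass], targets: Sequence[float]) -> bool:
    """True if every class is predicted to meet its response-time target."""
    try:
        times = mean_response_times(classes)
    except ValueError:
        return False
    return all(t <= target for t, target in zip(times, targets))
```

A provisioning loop could call `fits_on_node` as arrival rates change and request another cloud node only when the prediction fails, which avoids intrusive run-time measurements in the spirit of the abstract.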
NASA Astrophysics Data System (ADS)
Evans, J. D.; Hao, W.; Chettri, S.
2013-12-01
The cloud is proving to be a uniquely promising platform for scientific computing. Our experience with processing satellite data using Amazon Web Services highlights several opportunities for enhanced performance, flexibility, and cost effectiveness in the cloud relative to traditional computing -- for example:
- Direct readout from a polar-orbiting satellite such as the Suomi National Polar-Orbiting Partnership (S-NPP) requires bursts of processing a few times a day, separated by quiet periods when the satellite is out of receiving range. In the cloud, by starting and stopping virtual machines in minutes, we can marshal significant computing resources quickly when needed, but not pay for them when not needed. To take advantage of this capability, we are automating a data-driven approach to the management of cloud computing resources, in which new data availability triggers the creation of new virtual machines (of variable size and processing power) which last only until the processing workflow is complete.
- 'Spot instances' are virtual machines that run as long as one's asking price is higher than the provider's variable spot price. Spot instances can greatly reduce the cost of computing -- for software systems that are engineered to withstand unpredictable interruptions in service (as occurs when a spot price exceeds the asking price). We are implementing an approach to workflow management that allows data processing workflows to resume with minimal delays after temporary spot price spikes. This will allow systems to take full advantage of variably-priced 'utility computing.'
- Thanks to virtual machine images, we can easily launch multiple, identical machines differentiated only by 'user data' containing individualized instructions (e.g., to fetch particular datasets or to perform certain workflows or algorithms). This is particularly useful when (as is the case with S-NPP data) we need to launch many very similar machines to process an unpredictable number of data files concurrently. Our experience shows the viability and flexibility of this approach to workflow management for scientific data processing.
- Finally, cloud computing is a promising platform for distributed volunteer ('interstitial') computing, via mechanisms such as the Berkeley Open Infrastructure for Network Computing (BOINC) popularized with the SETI@Home project and others such as ClimatePrediction.net and NASA's Climate@Home. Interstitial computing faces significant challenges as commodity computing shifts from (always on) desktop computers towards smartphones and tablets (untethered and running on scarce battery power); but cloud computing offers significant slack capacity. This capacity includes virtual machines with unused RAM or underused CPUs; virtual storage volumes allocated (& paid for) but not full; and virtual machines that are paid up for the current hour but whose work is complete. We are devising ways to facilitate the reuse of these resources (i.e., cloud-based interstitial computing) for satellite data processing and related analyses.
We will present our findings and research directions on these and related topics.
Compression of 3D Point Clouds Using a Region-Adaptive Hierarchical Transform.
De Queiroz, Ricardo; Chou, Philip A
2016-06-01
In free-viewpoint video, there is a recent trend to represent scene objects as solids rather than using multiple depth maps. Point clouds have been used in computer graphics for a long time, and with the recent possibility of real-time capturing and rendering, point clouds have been favored over meshes in order to save computation. Each point in the cloud is associated with its 3D position and its color. We devise a method to compress the colors in point clouds which is based on a hierarchical transform and arithmetic coding. The transform is a hierarchical sub-band transform that resembles an adaptive variation of a Haar wavelet. The arithmetic encoding of the coefficients assumes Laplace distributions, one per sub-band. The Laplace parameter for each distribution is transmitted to the decoder using a custom method. The geometry of the point cloud is encoded using the well-established octree scanning. Results show that the proposed solution performs comparably to the current state-of-the-art, on many occasions outperforming it, while being much more computationally efficient. We believe this work represents the state-of-the-art in intra-frame compression of point clouds for real-time 3D video.
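To make "an adaptive variation of a Haar wavelet" in the abstract above concrete, the sketch below shows one way a weighted Haar-like step can be written: two neighbouring color values are combined by an orthonormal 2x2 rotation whose entries depend on how many points each value already represents. The abstract does not give the formula, so this particular weighting, along with the function names `raht_merge` and `raht_split`, should be read as an illustrative assumption. With equal weights it reduces to the ordinary Haar step.

```python
import math
from typing import Tuple


def raht_merge(c1: float, w1: float, c2: float, w2: float) -> Tuple[float, float, float]:
    """Combine two neighbouring values; return (low_pass, high_pass, merged_weight)."""
    a = math.sqrt(w1)
    b = math.sqrt(w2)
    norm = math.sqrt(w1 + w2)
    low = (a * c1 + b * c2) / norm    # passed up to the next level of the hierarchy
    high = (-b * c1 + a * c2) / norm  # detail coefficient to be entropy coded
    return low, high, w1 + w2


def raht_split(low: float, high: float, w1: float, w2: float) -> Tuple[float, float]:
    """Invert raht_merge using the transpose of the same orthonormal matrix."""
    a = math.sqrt(w1)
    b = math.sqrt(w2)
    norm = math.sqrt(w1 + w2)
    c1 = (a * low - b * high) / norm
    c2 = (b * low + a * high) / norm
    return c1, c2
```

Applying `raht_merge` repeatedly along the occupied cells of an octree, and carrying the merged weight upward, yields a hierarchy of low-pass values and per-level detail coefficients. Because each step is orthonormal, signal energy is preserved and the decoder can undo it exactly with `raht_split`.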
High-Productivity Computing in Computational Physics Education
NASA Astrophysics Data System (ADS)
Tel-Zur, Guy
2011-03-01
We describe the development of a new course in Computational Physics at the Ben-Gurion University. This elective course for third-year undergraduates and MSc students is taught during one semester. Computational Physics is by now well accepted as the Third Pillar of Science. This paper's claim is that modern Computational Physics education should also deal with High-Productivity Computing. The traditional approach to teaching Computational Physics emphasizes "Correctness" and then "Accuracy", and we also add "Performance." Along with topics in Mathematical Methods and case studies in Physics, the course devotes a significant amount of time to "Mini-Courses" in topics such as: High-Throughput Computing - Condor, Parallel Programming - MPI and OpenMP, How to build a Beowulf, Visualization, and Grid and Cloud Computing. The course is not intended to teach either new physics or new mathematics; rather, it focuses on an integrated approach to solving problems, starting from the physics problem and proceeding through the corresponding mathematical solution, the numerical scheme, and writing efficient computer code, to analysis and visualization.
A cloud-based system for measuring radiation treatment plan similarity
NASA Astrophysics Data System (ADS)
Andrea, Jennifer
PURPOSE: Radiation therapy is used to treat cancer using carefully designed plans that maximize the radiation dose delivered to the target and minimize damage to healthy tissue, with the dose administered over multiple occasions. Creating treatment plans is a laborious process and presents an obstacle to more frequent replanning, which remains an unsolved problem. However, in between new plans being created, the patient's anatomy can change due to multiple factors including reduction in tumor size and loss of weight, which results in poorer patient outcomes. Cloud computing is a newer technology that is slowly being used for medical applications with promising results. The objective of this work was to design and build a system that could analyze a database of previously created treatment plans, which are stored with their associated anatomical information in studies, to find the one with the most similar anatomy to a new patient. The analyses would be performed in parallel on the cloud to decrease the computation time of finding this plan. METHODS: The system used SlicerRT, a radiation therapy toolkit for the open-source platform 3D Slicer, for its tools to perform the similarity analysis algorithm. Amazon Web Services was used for the cloud instances on which the analyses were performed, as well as for storage of the radiation therapy studies and messaging between the instances and a master local computer. A module was built in SlicerRT to provide the user with an interface to direct the system on the cloud, as well as to perform other related tasks. RESULTS: The cloud-based system out-performed previous methods of conducting the similarity analyses in terms of time, as it analyzed 100 studies in approximately 13 minutes, and produced the same similarity values as those methods. It also scaled up to larger numbers of studies to analyze in the database with a small increase in computation time of just over 2 minutes. 
CONCLUSION: This system successfully analyzes a large database of radiation therapy studies and finds the one that is most similar to a new patient, which represents a potential step forward in achieving feasible adaptive radiation therapy replanning.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-08-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089
DOE Office of Scientific and Technical Information (OSTI.GOV)
Na, Y; Kapp, D; Kim, Y
2014-06-01
Purpose: To report the first experience with the development of a cloud-based treatment planning system and to investigate the performance improvement of dose calculation and treatment plan optimization on the cloud computing platform. Methods: A cloud computing-based radiation treatment planning system (cc-TPS) was developed for clinical treatment planning. Three de-identified clinical head and neck, lung, and prostate cases were used to evaluate the cloud computing platform. The de-identified clinical data were encrypted with the 256-bit Advanced Encryption Standard (AES) algorithm. VMAT and IMRT plans were generated for the three de-identified clinical cases to determine the quality of the treatment plans and computational efficiency. All plans generated from the cc-TPS were compared to those obtained with the PC-based TPS (pc-TPS). The performance evaluation of the cc-TPS was quantified as the speedup factors for Monte Carlo (MC) dose calculations and large-scale plan optimizations, as well as the performance ratios (PRs) of the amount of performance improvement compared to the pc-TPS. Results: Speedup factors were improved up to 14.0-fold, depending on the clinical cases and plan types. The computation times for VMAT and IMRT plans with the cc-TPS were reduced by 91.1% and 89.4%, respectively, on average across the clinical cases compared to those with the pc-TPS. The PRs were mostly better for VMAT plans (1.0 ≤ PRs ≤ 10.6 for the head and neck case, 1.2 ≤ PRs ≤ 13.3 for the lung case, and 1.0 ≤ PRs ≤ 10.3 for the prostate cancer case) than for IMRT plans. The isodose curves of plans on both cc-TPS and pc-TPS were identical for each of the clinical cases. Conclusion: A cloud-based treatment planning system has been set up, and our results demonstrate that the computation efficiency of treatment planning with the cc-TPS can be dramatically improved while maintaining the same plan quality as that obtained with the pc-TPS.
This work was supported in part by the National Cancer Institute (1R01 CA133474) and by the Leading Foreign Research Institute Recruitment Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (MSIP) (Grant No. 2009-00420).
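The results above quote both speedup factors and percentage reductions in computation time. The two are related by speedup = 1 / (1 - reduction), and the short sketch below works that arithmetic through for the reported averages. The function name is illustrative. Note that a speedup derived from an average reduction is not the same quantity as the per-case maximum of 14.0-fold, so the two figures are not expected to coincide.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in run time into a speedup factor."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction must be in the range [0, 100)")
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


if __name__ == "__main__":
    # Average reductions reported in the abstract for VMAT and IMRT plans.
    for label, reduction in (("VMAT", 91.1), ("IMRT", 89.4)):
        print(f"{label}: {reduction}% less time is about {speedup_from_reduction(reduction):.1f}x faster")
```

Under this relation, the reported 91.1% and 89.4% reductions correspond to roughly 11.2-fold and 9.4-fold average speedups.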
Interoperating Cloud-based Virtual Farms
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Colamaria, F.; Colella, D.; Casula, E.; Elia, D.; Franco, A.; Lusso, S.; Luparello, G.; Masera, M.; Miniello, G.; Mura, D.; Piano, S.; Vallero, S.; Venaruzzo, M.; Vino, G.
2015-12-01
The present work aims at optimizing the use of computing resources available at the Italian grid Tier-2 sites of the ALICE experiment at the CERN LHC by making them accessible to interactive distributed analysis, thanks to modern solutions based on cloud computing. The scalability and elasticity of the computing resources via dynamic (“on-demand”) provisioning is essentially limited by the size of the computing site, reaching the theoretical optimum only in the asymptotic case of infinite resources. The main challenge of the project is to overcome this limitation by federating different sites through a distributed cloud facility. Storage capacities of the participating sites are seen as a single federated storage area, avoiding the need to mirror data across them: high data access efficiency is guaranteed by location-aware analysis software and storage interfaces, in a way that is transparent from an end-user perspective. Moreover, the interactive analysis on the federated cloud reduces the execution time with respect to grid batch jobs. The tests of the investigated solutions for both cloud computing and distributed storage on a wide area network will be presented.
Dinh, Thanh; Kim, Younghan; Lee, Hyukjoon
2017-03-01
This paper presents a location-based interactive model of Internet of Things (IoT) and cloud integration (IoT-cloud) for mobile cloud computing applications, in comparison with the periodic sensing model. In the latter, sensing collections are performed without awareness of sensing demands. Sensors are required to report their sensing data periodically regardless of whether or not there are demands for their sensing services. This leads to unnecessary energy loss due to redundant transmission. In the proposed model, IoT-cloud provides sensing services on demand based on the interests and locations of mobile users. With the cloud acting as a coordinator, the sensing schedule of the sensors is controlled by the cloud, which knows when and where mobile users request sensing services. Therefore, when there is no demand, sensors are put into an inactive mode to save energy. Through extensive analysis and experimental results, we show that the location-based model achieves a significant improvement in terms of network lifetime compared to the periodic model.
Dinh, Thanh; Kim, Younghan; Lee, Hyukjoon
2017-01-01
This paper presents a location-based interactive model of Internet of Things (IoT) and cloud integration (IoT-cloud) for mobile cloud computing applications, in comparison with the periodic sensing model. In the latter, sensing collections are performed without awareness of sensing demands. Sensors are required to report their sensing data periodically regardless of whether or not there are demands for their sensing services. This leads to unnecessary energy loss due to redundant transmission. In the proposed model, IoT-cloud provides sensing services on demand based on the interests and locations of mobile users. With the cloud acting as a coordinator, the sensing schedule of the sensors is controlled by the cloud, which knows when and where mobile users request sensing services. Therefore, when there is no demand, sensors are put into an inactive mode to save energy. Through extensive analysis and experimental results, we show that the location-based model achieves a significant improvement in terms of network lifetime compared to the periodic model. PMID:28257067
NASA Astrophysics Data System (ADS)
Ford, Eric B.; Dindar, Saleh; Peters, Jorg
2015-08-01
The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphics Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield speed-ups of more than an order of magnitude and which problems are unlikely to parallelize efficiently enough to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills.
Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
Teaching Cybersecurity Using the Cloud
ERIC Educational Resources Information Center
Salah, Khaled; Hammoud, Mohammad; Zeadally, Sherali
2015-01-01
Cloud computing platforms can be highly attractive to conduct course assignments and empower students with valuable and indispensable hands-on experience. In particular, the cloud can offer teaching staff and students (whether local or remote) on-demand, elastic, dedicated, isolated, (virtually) unlimited, and easily configurable virtual machines.…
Stochastic Convection Parameterizations
NASA Technical Reports Server (NTRS)
Teixeira, Joao; Reynolds, Carolyn; Suselj, Kay; Matheou, Georgios
2012-01-01
computational fluid dynamics, radiation, clouds, turbulence, convection, gravity waves, surface interaction, radiation interaction, cloud and aerosol microphysics, complexity (vegetation, biogeochemistry, radiation versus turbulence/convection stochastic approach, non-linearities, Monte Carlo, high resolutions, large-eddy simulations, cloud structure, plumes, saturation in tropics, forecasting, parameterizations, stochastic, radiation-cloud interaction, hurricane forecasts
Diaz, Javier; Arrizabalaga, Saioa; Bustamante, Paul; Mesa, Iker; Añorga, Javier; Goya, Jon
2013-01-01
Portable systems and global communications open a broad spectrum for new health applications. In the framework of electrophysiological applications, several challenges are faced when developing portable systems embedded in Cloud computing services. To help new developers in this area, and drawing on our experience, this paper presents five areas of interest where strategies can be applied to improve the performance of portable systems: transducer and conditioning, processing, wireless communications, battery and power management. Likewise, for Cloud services, scalability, portability, privacy and security guidelines have been highlighted.
Cloud Computing for radiologists.
Kharat, Amit T; Safvi, Amjad; Thind, Ss; Singh, Amarjit
2012-07-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources, such as computer software and hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It frees radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues, which need to be addressed to ensure the success of Cloud computing in the future.
Cloud Computing for radiologists
Kharat, Amit T; Safvi, Amjad; Thind, SS; Singh, Amarjit
2012-01-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources, such as computer software and hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It frees radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues, which need to be addressed to ensure the success of Cloud computing in the future. PMID:23599560
OpenTopography: Addressing Big Data Challenges Using Cloud Computing, HPC, and Data Analytics
NASA Astrophysics Data System (ADS)
Crosby, C. J.; Nandigam, V.; Phan, M.; Youn, C.; Baru, C.; Arrowsmith, R.
2014-12-01
OpenTopography (OT) is a geoinformatics-based data facility initiated in 2009 for democratizing access to high-resolution topographic data, derived products, and tools. Hosted at the San Diego Supercomputer Center (SDSC), OT utilizes cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures to provide efficient Web based access to large, high-resolution topographic datasets. OT collocates data with processing tools to enable users to quickly access custom data and derived products for their application. OT's ongoing R&D efforts aim to solve emerging technical challenges associated with exponential growth in data, higher order data products, as well as user base. Optimization of data management strategies can be informed by a comprehensive set of OT user access metrics that allows us to better understand usage patterns with respect to the data. By analyzing the spatiotemporal access patterns within the datasets, we can map areas of the data archive that are highly active (hot) versus the ones that are rarely accessed (cold). This enables us to architect a tiered storage environment consisting of high performance disk storage (SSD) for the hot areas and less expensive slower disk for the cold ones, thereby optimizing price to performance. From a compute perspective, OT is looking at cloud based solutions such as the Microsoft Azure platform to handle sudden increases in load. An OT virtual machine image in Microsoft's VM Depot can be invoked and deployed quickly in response to increased system demand. OT has also integrated SDSC HPC systems like the Gordon supercomputer into our infrastructure tier to enable compute intensive workloads like parallel computation of hydrologic routing on high resolution topography. This capability also allows OT to scale to HPC resources during high loads to meet user demand and provide more efficient processing. 
With a growing user base and maturing scientific user community come new requests for algorithms and processing capabilities. To address this demand, OT is developing an extensible service-based architecture for integrating community-developed software. This "pluggable" approach to Web service deployment will enable new processing and analysis tools to run collocated with OT-hosted data.
Dalpé, Gratien; Joly, Yann
2014-09-01
Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.
An Interactive Web-Based Analysis Framework for Remote Sensing Cloud Computing
NASA Astrophysics Data System (ADS)
Wang, X. Z.; Zhang, H. M.; Zhao, J. H.; Lin, Q. H.; Zhou, Y. C.; Li, J. H.
2015-07-01
Spatiotemporal data, especially remote sensing data, are widely used in ecological, geographical, agricultural, and military research and applications. With the development of remote sensing technology, more and more remote sensing data are accumulated and stored in the cloud. Providing an effective way for cloud users to access and analyse these massive spatiotemporal data in web clients has become an urgent issue. In this paper, we propose a new scalable, interactive and web-based cloud computing solution for massive remote sensing data analysis. We build a spatiotemporal analysis platform to provide the end-user with a safe and convenient way to access massive remote sensing data stored in the cloud. The lightweight cloud storage system used to store public data and users' private data is built on an open-source distributed file system. In it, massive remote sensing data are stored as public data, while the intermediate and input data are stored as private data. The elastic, scalable, and flexible cloud computing environment is built using Docker, an open-source lightweight container technology for the Linux operating system. In the Docker container, open-source software such as IPython, NumPy, GDAL, and GRASS GIS is deployed. Users can write scripts in the IPython Notebook web page through the web browser to process data, and the scripts are submitted to the IPython kernel to be executed. By comparing the performance of remote sensing data analysis tasks executed in Docker containers, KVM virtual machines and physical machines respectively, we conclude that the cloud computing environment built with Docker makes the greatest use of the host system resources and can handle more concurrent spatiotemporal computing tasks. Docker provides resource isolation mechanisms for IO, CPU, and memory, which offer a security guarantee when processing remote sensing data in the IPython Notebook.
Users can write complex data processing code on the web directly, so they can design their own data processing algorithm.
Cloud Computing Applications in Support of Earth Science Activities at Marshall Space Flight Center
NASA Astrophysics Data System (ADS)
Molthan, A.; Limaye, A. S.
2011-12-01
Currently, the NASA Nebula Cloud Computing Platform is available to Agency personnel in a pre-release status as the system undergoes a formal operational readiness review. Over the past year, two projects within the Earth Science Office at NASA Marshall Space Flight Center have been investigating the performance and value of Nebula's "Infrastructure as a Service", or "IaaS" concept and applying cloud computing concepts to advance their respective mission goals. The Short-term Prediction Research and Transition (SPoRT) Center focuses on the transition of unique NASA satellite observations and weather forecasting capabilities for use within the operational forecasting community through partnerships with NOAA's National Weather Service (NWS). SPoRT has evaluated the performance of the Weather Research and Forecasting (WRF) model on virtual machines deployed within Nebula and used Nebula instances to simulate local forecasts in support of regional forecast studies of interest to select NWS forecast offices. In addition to weather forecasting applications, rapidly deployable Nebula virtual machines have supported the processing of high resolution NASA satellite imagery to support disaster assessment following the historic severe weather and tornado outbreak of April 27, 2011. Other modeling and satellite analysis activities are underway in support of NASA's SERVIR program, which integrates satellite observations, ground-based data and forecast models to monitor environmental change and improve disaster response in Central America, the Caribbean, Africa, and the Himalayas. Leveraging SPoRT's experience, SERVIR is working to establish a real-time weather forecasting model for Central America. Other modeling efforts include hydrologic forecasts for Kenya, driven by NASA satellite observations and reanalysis data sets provided by the broader meteorological community. 
Forecast modeling efforts are supplemented by short-term forecasts of convective initiation, determined by geostationary satellite observations processed on virtual machines powered by Nebula. This presentation will provide an overview of these activities from a scientific and cloud computing applications perspective, identifying the strengths and weaknesses of deploying each project within an IaaS environment, and ways to collaborate with the Nebula or other cloud-user communities on projects as they go forward.
Cloud@Home: A New Enhanced Computing Paradigm
NASA Astrophysics Data System (ADS)
Distefano, Salvatore; Cunsolo, Vincenzo D.; Puliafito, Antonio; Scarpa, Marco
Cloud computing is a distributed computing paradigm that mixes aspects of Grid computing ("… hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Foster, 2002)), Internet computing ("…a computing platform geographically distributed across the Internet" (Milenkovic et al., 2003)), Utility computing ("a collection of technologies and business practices that enables computing to be delivered seamlessly and reliably across multiple computers, ... available as needed and billed according to usage, much like water and electricity are today" (Ross & Westerman, 2004)), Autonomic computing ("computing systems that can manage themselves given high-level objectives from administrators" (Kephart & Chess, 2003)), Edge computing ("… provides a generic template facility for any type of application to spread its execution across a dedicated grid, balancing the load …" (Davis, Parikh, & Weihl, 2004)) and Green computing (a new frontier of Ethical computing, starting from the assumption that in the near future energy costs will be related to environmental pollution).
Privacy-Preserving Location-Based Service Scheme for Mobile Sensing Data.
Xie, Qingqing; Wang, Liangmin
2016-11-25
With the wide use of mobile sensing applications, more and more location-embedded data are collected and stored in mobile clouds, such as iCloud, Samsung cloud, etc. Using these data, the cloud service provider (CSP) can provide location-based service (LBS) for users. However, the mobile cloud is untrustworthy. Privacy concerns force sensitive locations to be stored on the mobile cloud in an encrypted form, which makes it very challenging to use these data to provide efficient LBS. To solve this problem, we propose a privacy-preserving LBS scheme for mobile sensing data, based on the RSA (for Rivest, Shamir and Adleman) algorithm and the ciphertext policy attribute-based encryption (CP-ABE) scheme. The mobile cloud can perform location distance computing and comparison efficiently for authorized users, without location privacy leakage. Finally, theoretical security analysis and experimental evaluation demonstrate that our scheme is secure against the chosen plaintext attack (CPA) and efficient enough for practical applications in terms of user-side computation overhead.
Privacy-Preserving Location-Based Service Scheme for Mobile Sensing Data
Xie, Qingqing; Wang, Liangmin
2016-01-01
With the wide use of mobile sensing applications, more and more location-embedded data are collected and stored in mobile clouds, such as iCloud, Samsung cloud, etc. Using these data, the cloud service provider (CSP) can provide location-based service (LBS) for users. However, the mobile cloud is untrustworthy. Privacy concerns force sensitive locations to be stored on the mobile cloud in an encrypted form, which makes it very challenging to use these data to provide efficient LBS. To solve this problem, we propose a privacy-preserving LBS scheme for mobile sensing data, based on the RSA (for Rivest, Shamir and Adleman) algorithm and the ciphertext policy attribute-based encryption (CP-ABE) scheme. The mobile cloud can perform location distance computing and comparison efficiently for authorized users, without location privacy leakage. Finally, theoretical security analysis and experimental evaluation demonstrate that our scheme is secure against the chosen plaintext attack (CPA) and efficient enough for practical applications in terms of user-side computation overhead. PMID:27897984
Gaussian Radial Basis Function for Efficient Computation of Forest Indirect Illumination
NASA Astrophysics Data System (ADS)
Abbas, Fayçal; Babahenini, Mohamed Chaouki
2018-06-01
Real-time global illumination of natural scenes such as forests is one of the most complex problems to solve, because of the multiple inter-reflections between the light and the materials of the objects composing the scene. The major problem that arises is visibility computation. Visibility is computed for the whole set of leaves visible from the center of a given leaf; given the enormous number of leaves in a tree, repeating this computation for each leaf of the tree reduces performance. We describe a new approach that approximates visibility queries and proceeds in two steps. The first step is to generate a point cloud representing the foliage. We assume that the point cloud is composed of two classes (visible, not visible) that are not linearly separable. The second step is to classify the point cloud by applying the Gaussian radial basis function, which measures the similarity in terms of distance between each leaf and a landmark leaf. This makes it possible to approximate the visibility queries and extract the leaves that will be used to calculate the amount of indirect illumination exchanged between neighboring leaves. Our approach treats the light exchanges in a forest scene efficiently, allows fast computation, and produces images of good visual quality, all while taking advantage of the immense computational power of the GPU.
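As an illustration of the similarity measure named in the abstract above, the following minimal sketch evaluates the standard Gaussian radial basis function, exp(-d² / (2σ²)), between leaf positions and a landmark leaf. The abstract does not give the authors' classifier, so the thresholding step, the `sigma` and `threshold` parameters, and the function names are hypothetical, and the sketch runs on the CPU rather than the GPU.

```python
import math


def gaussian_rbf(point, landmark, sigma=1.0):
    """Gaussian RBF similarity: 1.0 at the landmark, decaying with distance."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(point, landmark))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def select_similar_leaves(leaves, landmark, sigma, threshold):
    """Keep leaves whose similarity to the landmark reaches the threshold.

    The threshold rule is an illustrative stand-in for a trained
    classifier; it is not taken from the abstract.
    """
    return [leaf for leaf in leaves
            if gaussian_rbf(leaf, landmark, sigma) >= threshold]


# Three hypothetical leaf centers and a landmark leaf at the origin.
leaves = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
landmark = (0.0, 0.0, 0.0)
print(select_similar_leaves(leaves, landmark, sigma=1.0, threshold=0.5))
```

A smaller `sigma` makes the similarity fall off faster, so fewer distant leaves pass the threshold.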
Merging of multi-temporal SST data at South China Sea
NASA Astrophysics Data System (ADS)
Ng, H. G.; MatJafri, M. Z.; Abdullah, K.; Lim, H. S.
2008-10-01
Sea surface temperature (SST) mapping can be performed over a wide spatial and temporal extent within a reasonable time. The space-borne AVHRR sensor has been widely used for this purpose. However, current SST retrieval techniques for infrared channels are limited to cloud-free areas, because electromagnetic waves at infrared wavelengths cannot penetrate cloud. Therefore, SST availability is low for a single image. To overcome this problem, we studied the production of a three-day composite SST map. The diurnal changes of SST are quite stable over a short period of time if no abrupt natural disaster occurs. Therefore, the SST data of three consecutive days with nearly coincident daily acquisition times were merged to create three-day composite SST data. The composite image could increase the SST availability. In this study, we acquired level 1b AVHRR (Advanced Very High Resolution Radiometer) images from the Malaysia Center of Remote Sensing (MACRES). The images were first preprocessed, and the cloud and land areas were masked. We made some modifications to the technique for obtaining the threshold value for cloud masking. The SST was estimated using the day split MCSST algorithm. The mean SST for the three-day composite data was calculated and an SST map was generated. The availability of cloud-free water pixels was computed and compared. The SST data availability was increased by merging the SST data.
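The merging step described in the abstract above can be sketched as a per-pixel mean over the days on which a pixel was not masked. This is a minimal illustration under stated assumptions: masked pixels are represented as `None`, the grids and temperature values are invented, and availability is computed over all pixels rather than over water pixels only. It is not the authors' processing chain.

```python
def composite_sst(daily_maps):
    """Average each pixel over the days on which it is not masked (None)."""
    n_rows = len(daily_maps[0])
    n_cols = len(daily_maps[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            valid = [day[r][c] for day in daily_maps if day[r][c] is not None]
            # A pixel masked on every day stays masked in the composite.
            row.append(sum(valid) / len(valid) if valid else None)
        composite.append(row)
    return composite


def availability(sst_map):
    """Fraction of pixels that carry an SST value."""
    pixels = [value for row in sst_map for value in row]
    return sum(value is not None for value in pixels) / len(pixels)


# Hypothetical 2x2 SST grids (degrees Celsius) for three consecutive days.
day1 = [[28.0, None], [None, 29.0]]
day2 = [[None, None], [30.0, 29.5]]
day3 = [[29.0, None], [None, None]]

merged = composite_sst([day1, day2, day3])
print(merged)
print(availability(day1), availability(merged))
```

In this toy case each single day covers at most half of the pixels, while the composite covers three quarters, which mirrors the availability gain the abstract reports qualitatively.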
Collaborative Working Architecture for IoT-Based Applications.
Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus
2018-05-23
The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications; however, there is still much research to be done to properly gear all the systems to work together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel multi-level network design that combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.
Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation
NASA Astrophysics Data System (ADS)
Anisenkov, A. V.
2018-03-01
In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes of raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on a heterogeneous, geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, and to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in the development of a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate the supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).
BigData and computing challenges in high energy and nuclear physics
NASA Astrophysics Data System (ADS)
Klimentov, A.; Grigorieva, M.; Kiryanov, A.; Zarochentsev, A.
2017-06-01
In this contribution we discuss the various aspects of the computing resource needs of experiments in High Energy and Nuclear Physics, in particular at the Large Hadron Collider. These needs will evolve in the future when moving from LHC to HL-LHC in ten years from now, when the already exascale levels of data we are processing could increase by a further order of magnitude. The distributed computing environment has been a great success, and the inclusion of new super-computing facilities, cloud computing and volunteer computing for the future is a big challenge, which we are successfully mastering with a considerable contribution from many super-computing centres around the world, academic and commercial cloud providers. We also discuss R&D computing projects started recently in National Research Center "Kurchatov Institute".
Towards Cloud-based Asynchronous Elasticity for Iterative HPC Applications
NASA Astrophysics Data System (ADS)
da Rosa Righi, Rodrigo; Facco Rodrigues, Vinicius; André da Costa, Cristiano; Kreutz, Diego; Heiss, Hans-Ulrich
2015-10-01
Elasticity is one of the key features of cloud computing. It allows applications to dynamically scale computing and storage resources, avoiding over- and under-provisioning. In high performance computing (HPC), initiatives are normally modeled to handle bag-of-tasks or key-value applications through a load balancer and a loosely-coupled set of virtual machine (VM) instances. In the joint field of Message Passing Interface (MPI) and tightly-coupled HPC applications, we observe the need for rewriting source code, previous knowledge of the application and/or stop-reconfigure-and-go approaches to address cloud elasticity. Besides, there are problems related to how to profit from this new feature in the HPC scope, since in MPI 2.0 applications the programmers need to handle communicators by themselves, and a sudden consolidation of a VM, together with a process, can compromise the entire execution. To address these issues, we propose a PaaS-based elasticity model, named AutoElastic. It acts as a middleware that allows iterative HPC applications to take advantage of dynamic resource provisioning of cloud infrastructures without any major modification. AutoElastic provides a new concept denoted here as asynchronous elasticity, i.e., it provides a framework to allow applications to either increase or decrease their computing resources without blocking the current execution. The feasibility of AutoElastic is demonstrated through a prototype that runs a CPU-bound numerical integration application on top of the OpenNebula middleware. The results showed a saving of about 3 min at each scaling-out operation, emphasizing the contribution of the new concept in contexts where seconds are precious.
Derivation of Tropospheric Ozone Climatology and Trends from TOMS Data
NASA Technical Reports Server (NTRS)
Newchurch, Michael J.; McPeters, Rich; Logan, Jennifer; Kim, Jae-Hwan
2002-01-01
This research addresses the following three objectives: (1) Derive tropospheric ozone columns from the TOMS instruments by computing the difference between total-ozone columns over cloudy areas and over clear areas in the tropics; (2) Compute secular trends in Nimbus-7 derived tropospheric Ozone column amounts and associated potential trends in the decadal-scale tropical cloud climatology; (3) Explain the occurrence of anomalously high ozone retrievals over high ice clouds.
Cloud identification using genetic algorithms and massively parallel computation
NASA Technical Reports Server (NTRS)
Buckles, Bill P.; Petry, Frederick E.
1996-01-01
As a Guest Computational Investigator under the NASA administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used although most researchers settle for three. Results were remarkably consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for Global Circulation Models (GCMs) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer having similar characteristics to AVHRR. Here the results were mixed (60% to 80% accuracy). Given that one is willing to run the experiment several times (say 10), then it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated. Therefore, these results are encouraging even though less impressive than the cloud experiment. Successful conclusion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally, we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain.
An extensive user's manual was written and distributed nationwide to scientists whose work might benefit from its availability. Several papers, including two journal articles, were produced.
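The subpopulation-and-migration idea mentioned in the abstract can be illustrated with a toy island-model genetic algorithm. This is a minimal sketch, not the authors' MasPar implementation: the OneMax fitness function, the ring migration topology, and every parameter value below are assumptions chosen only for illustration.

```python
import random

def fitness(bits):
    # OneMax: number of 1-bits (a stand-in objective, not from the abstract).
    return sum(bits)

def evolve(pop, rng, mutation_rate=0.02):
    # One generation: binary tournament selection, one-point crossover,
    # bit-flip mutation, with the best individual carried over unchanged.
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    nxt = [max(pop, key=fitness)]
    while len(nxt) < len(pop):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
        nxt.append(child)
    return nxt

def migrate(islands):
    # Ring migration: the best individual of island i-1 replaces
    # the worst individual of island i.
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = list(bests[i - 1])

def island_ga(n_islands=4, pop_size=20, n_bits=32,
              generations=60, interval=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [evolve(pop, rng) for pop in islands]
        if g % interval == 0:
            migrate(islands)  # subpopulations exchange individuals periodically
    return max((ind for pop in islands for ind in pop), key=fitness)
```

Calling `island_ga()` returns the fittest bit string found across all islands; varying `interval` is one simple way to explore how migration frequency affects convergence.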
Real-time video streaming in mobile cloud over heterogeneous wireless networks
NASA Astrophysics Data System (ADS)
Abdallah-Saleh, Saleh; Wang, Qi; Grecos, Christos
2012-06-01
Recently, the concept of Mobile Cloud Computing (MCC) has been proposed to offload the resource requirements in computational capabilities, storage and security from mobile devices into the cloud. Internet video applications such as real-time streaming are expected to be ubiquitously deployed and supported over the cloud for mobile users, who typically encounter a range of wireless networks of diverse radio access technologies during their roaming. However, real-time video streaming for mobile cloud users across heterogeneous wireless networks presents multiple challenges. The network-layer quality of service (QoS) provision to support high-quality mobile video delivery in this demanding scenario remains an open research question, and this in turn affects the application-level visual quality and impedes mobile users' perceived quality of experience (QoE). In this paper, we devise a framework to support real-time video streaming in this new mobile video networking paradigm and evaluate the performance of the proposed framework empirically through a lab-based yet realistic testing platform. One particular issue we focus on is the effect of users' mobility on the QoS of video streaming over the cloud. We design and implement a hybrid platform comprising a test-bed and an emulator, on which mobile cloud computing, video streaming and heterogeneous wireless networks are implemented and integrated to allow the testing of our framework. As representative heterogeneous wireless networks, the popular WLAN (Wi-Fi) and MAN (WiMAX) networks are incorporated in order to evaluate the effects of handovers between these different radio access technologies. The H.264/AVC (Advanced Video Coding) standard is employed for real-time video streaming from a server to mobile users (client nodes) in the networks. Mobility support is introduced to enable a continuous streaming experience for a mobile user across the heterogeneous wireless network. 
Real-time video stream packets are captured for analytical purposes on the mobile user node. Experimental results are obtained and analysed. Future work is identified towards further improvement of the current design and implementation. With this new mobile video networking concept and paradigm implemented and evaluated, results and observations obtained from this study would form the basis of a more in-depth, comprehensive understanding of various challenges and opportunities in supporting high-quality real-time video streaming in mobile cloud over heterogeneous wireless networks.
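The abstract does not say which QoS metrics were derived from the captured packets. As a hedged illustration, the sketch below computes three common ones from a hypothetical capture: packet loss rate, mean one-way delay, and the running interarrival jitter estimate in the style of RFC 3550. The tuple layout `(seq, send_time_s, recv_time_s)` and the function name are assumptions, not part of the described platform.

```python
def qos_metrics(packets, expected_count):
    """packets: (seq, send_time_s, recv_time_s) tuples for packets that
    arrived, in arrival order. expected_count: number of packets sent."""
    received = len({seq for seq, _, _ in packets})
    loss_rate = 1.0 - received / expected_count

    # RFC 3550-style jitter: smoothed absolute change in transit time
    # between consecutive packets, with gain 1/16.
    jitter = 0.0
    for (_, s0, r0), (_, s1, r1) in zip(packets, packets[1:]):
        d = (r1 - r0) - (s1 - s0)
        jitter += (abs(d) - jitter) / 16.0

    if packets:
        mean_delay = sum(r - s for _, s, r in packets) / len(packets)
    else:
        mean_delay = float("nan")
    return loss_rate, jitter, mean_delay
```

Comparing these values before and after a Wi-Fi to WiMAX handover would be one way to quantify the mobility effect the paper studies.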
Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms, which are generally more accurate than greedy heuristic algorithms, struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure that stores all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to handle larger datasets and to scale computationally better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environment. Under the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. 
CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507
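The EC2 running times quoted in the abstract (about 20, 14, and 11 hours on 20, 30, and 40 nodes) can be turned into relative speedup and parallel efficiency. The short calculation below is a worked example only; it takes the 20-node run as the baseline because no smaller configuration is reported for the one-million-read dataset, and the input times are the approximate figures from the abstract.

```python
# Approximate EC2 running times (hours) for one million reads, keyed by node count.
runtimes = {20: 20.0, 30: 14.0, 40: 11.0}

def relative_scaling(runtimes):
    """Return (nodes, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    rows = []
    for nodes in sorted(runtimes):
        speedup = base_time / runtimes[nodes]   # measured speedup vs. baseline
        ideal = nodes / base_nodes              # speedup under perfect scaling
        rows.append((nodes, speedup, speedup / ideal))
    return rows

for nodes, speedup, efficiency in relative_scaling(runtimes):
    print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

On these rounded figures the relative efficiency is roughly 95% at 30 nodes and 91% at 40 nodes, which is consistent with the abstract's description of the system as scalable.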
Bioinformatics and Microarray Data Analysis on the Cloud.
Calabrese, Barbara; Cannataro, Mario
2016-01-01
High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massively scalable computing and storage, data sharing, and on-demand, anytime and anywhere access to resources and applications, and thus it may represent the key technology for facing those issues. In fact, in recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in industry. Despite this, cloud computing presents several issues regarding the security and privacy of data, which are particularly important when analyzing patient data, such as in personalized medicine. This chapter reviews the main academic and industrial cloud-based bioinformatics solutions, with a special focus on microarray data analysis solutions, and underlines the main issues and problems related to the use of such platforms for the storage and analysis of patient data.
A Secure Cloud-Assisted Wireless Body Area Network in Mobile Emergency Medical Care System.
Li, Chun-Ta; Lee, Cheng-Chi; Weng, Chi-Yao
2016-05-01
With recent advances in medical treatment and emergency applications, integrating wireless body area networks (WBAN) with cloud computing is motivated by the need to provide useful, real-time information about patients' health state to doctors and emergency staff. A WBAN is a set of body sensors carried by the patient to collect and transmit numerous health items to medical clouds via wireless and public communication channels. Therefore, a cloud-assisted WBAN facilitates response in case of emergency, which can save patients' lives. Since the patient's data is sensitive and private, it is important to provide strong security and protection for the patient's medical data over public and insecure communication channels. In this paper, we address the challenge of participant authentication in mobile emergency medical care systems for patient supervision and propose a secure cloud-assisted architecture for accessing and monitoring health items collected by WBAN. To ensure a high level of security and provide mutual authentication, chaotic-map-based authentication and key agreement mechanisms are designed according to the concept of Diffie-Hellman key exchange, which depends on the CMBDLP and CMBDHP problems. Security and performance analyses show how the proposed system guarantees patient privacy and the confidentiality of sensitive medical data while preserving the low computation property in medical treatment and remote medical monitoring.
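The abstract gives no protocol details, so the following is only a sketch of the general idea behind Diffie-Hellman-style key agreement with Chebyshev chaotic maps. It relies on the semigroup property T_r(T_s(x)) = T_s(T_r(x)) = T_rs(x), here evaluated modulo a prime. The function name, the modulus, and the toy secrets are assumptions; the sketch omits the authentication steps entirely and is not the scheme proposed in the paper.

```python
def chebyshev_mod(n, x, p):
    """Return T_n(x) mod p, where T_0 = 1, T_1 = x and
    T_n = 2*x*T_{n-1} - T_{n-2}. Uses 2x2 matrix exponentiation."""
    if n == 0:
        return 1 % p

    def matmul(a, b):
        return [
            [(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % p,
             (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % p],
            [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % p,
             (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % p],
        ]

    result = [[1, 0], [0, 1]]
    base = [[(2 * x) % p, (-1) % p], [1, 0]]
    e = n - 1
    while e:
        if e & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        e >>= 1
    # [T_n, T_{n-1}]^T = M^(n-1) [T_1, T_0]^T
    return (result[0][0] * x + result[0][1]) % p


# Toy parameters (assumed for illustration; far too simple for real use).
p = 2**61 - 1          # public prime modulus
x = 123456789          # public seed value
r = 987654321          # first party's secret
s = 192837465          # second party's secret

msg_a = chebyshev_mod(r, x, p)   # sent by the first party
msg_b = chebyshev_mod(s, x, p)   # sent by the second party
key_a = chebyshev_mod(r, msg_b, p)
key_b = chebyshev_mod(s, msg_a, p)
```

Both sides arrive at the same value (`key_a == key_b`) without ever transmitting `r` or `s`; recovering a secret from the exchanged values is the kind of problem the hardness assumptions named in the abstract refer to.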
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lingerfelt, Eric J; Endeve, Eirik; Hui, Yawei
Improvements in scientific instrumentation allow imaging at mesoscopic to atomic length scales, many spectroscopic modes, and now, with the rise of multimodal acquisition systems and the associated processing capability, the era of multidimensional, informationally dense data sets has arrived. Technical issues in these combinatorial scientific fields are exacerbated by computational challenges best summarized as a necessity for drastic improvement in the capability to transfer, store, and analyze large volumes of data. The Bellerophon Environment for Analysis of Materials (BEAM) platform provides material scientists the capability to directly leverage the integrated computational and analytical power of High Performance Computing (HPC) to perform scalable data analysis and simulation and manage uploaded data files via an intuitive, cross-platform client user interface. This framework delivers authenticated, "push-button" execution of complex user workflows that deploy data analysis algorithms and computational simulations utilizing compute-and-data cloud infrastructures and HPC environments like Titan at the Oak Ridge Leadership Computing Facility (OLCF).
Satellite Imagery Analysis for Automated Global Food Security Forecasting
NASA Astrophysics Data System (ADS)
Moody, D.; Brumby, S. P.; Chartrand, R.; Keisler, R.; Mathis, M.; Beneke, C. M.; Nicholaeff, D.; Skillman, S.; Warren, M. S.; Poehnelt, J.
2017-12-01
The recent computing performance revolution has driven improvements in sensor, communication, and storage technology. Multi-decadal remote sensing datasets at the petabyte scale are now available in commercial clouds, with new satellite constellations generating petabytes/year of daily high-resolution global coverage imagery. Cloud computing and storage, combined with recent advances in machine learning, are enabling understanding of the world at a scale and at a level of detail never before feasible. We present results from an ongoing effort to develop satellite imagery analysis tools that aggregate temporal, spatial, and spectral information and that can scale with the high-rate and dimensionality of imagery being collected. We focus on the problem of monitoring food crop productivity across the Middle East and North Africa, and show how an analysis-ready, multi-sensor data platform enables quick prototyping of satellite imagery analysis algorithms, from land use/land cover classification and natural resource mapping, to yearly and monthly vegetative health change trends at the structural field level.
Development of an atmospheric infrared radiation model with high clouds for target detection
NASA Astrophysics Data System (ADS)
Bellisario, Christophe; Malherbe, Claire; Schweitzer, Caroline; Stein, Karin
2016-10-01
In the field of target detection, the simulation of the camera FOV (field of view) background is a significant issue. The presence of heterogeneous clouds might have a strong impact on a target detection algorithm. In order to address this issue, we present here the construction of the CERAMIC package (Cloudy Environment for RAdiance and MIcrophysics Computation) that combines cloud microphysical computation and 3D radiance computation to produce a 3D atmospheric infrared radiance in the presence of clouds. The input of CERAMIC starts with an observer with a spatial position and a defined FOV (by means of a zenithal angle and an azimuthal angle). We introduce a 3D cloud generator provided by the French LaMP for statistical and simplified physics. The cloud generator is implemented with atmospheric profiles including a heterogeneity factor for 3D fluctuations. CERAMIC also includes a cloud database from the French CNRM for a physical approach. We present here some statistics developed about the spatial and time evolution of the clouds. Molecular optical properties are provided by the model MATISSE (Modélisation Avancée de la Terre pour l'Imagerie et la Simulation des Scènes et de leur Environnement). The 3D radiance is computed with the model LUCI (for LUminance de CIrrus). It takes into account 3D microphysics with a resolution of 5 cm-1 over a SWIR bandwidth. In order to have a fast computation time, most of the radiance contributors are calculated with analytical expressions. The multiple scattering phenomena are more difficult to model. Here a discrete ordinate method with correlated-k precision is used to compute the average radiance. We add a 3D fluctuations model (based on a behavioral model) taking into account microphysics variations. Finally, the following parameters are calculated: transmission, thermal radiance, single scattering radiance, radiance observed through the cloud and multiple scattering radiance. 
Spatial images are produced, with a dimension of 10 km x 10 km and a resolution of 0.1 km, with each contribution to the radiance separated. We present here the first results with examples of typical scenarios. A 1D comparison is made with the MATISSE model by separating each calculated radiance, in order to validate outputs. The code performance in 3D is shown by comparing LUCI to the SHDOM model, a reference code which uses the Spherical Harmonic Discrete Ordinate Method for 3D atmospheric radiative transfer. The results obtained by the different codes show strong agreement, and the sources of small differences are considered. An important gain in time is observed for LUCI versus SHDOM. We finally conclude on various scenarios for case analysis.
2012-05-01
cloud computing 17 NASA Nebula Platform • Cloud computing pilot program at NASA Ames • Integrates open-source components into seamless, self...Mission support • Education and public outreach (NASA Nebula , 2010) 18 NSF Supported Cloud Research • Support for Cloud Computing in...Mell, P. & Grance, T. (2011). The NIST Definition of Cloud Computing. NIST Special Publication 800-145 • NASA Nebula (2010). Retrieved from
GT-WGS: an efficient and economic tool for large-scale WGS analyses based on the AWS cloud service.
Wang, Yiqi; Li, Gen; Ma, Mark; He, Fazhong; Song, Zhuo; Zhang, Wei; Wu, Chengkun
2018-01-19
Whole-genome sequencing (WGS) plays an increasingly important role in clinical practice and public health. Due to the large data size, WGS data analysis is usually compute-intensive and IO-intensive. Currently it usually takes 30 to 40 h to finish a 50× WGS analysis task, which is far from the ideal speed required by the industry. Furthermore, the high-end infrastructure required by WGS computing is costly in terms of time and money. In this paper, we aim to improve the time efficiency of WGS analysis and minimize the cost by elastic cloud computing. We developed a distributed system, GT-WGS, for large-scale WGS analyses utilizing Amazon Web Services (AWS). Our system won the first prize in the Wind and Cloud challenge held by the Genomics and Cloud Technology Alliance conference (GCTA) committee. The system makes full use of the dynamic pricing mechanism of AWS. We evaluate the performance of GT-WGS with a 55× WGS dataset (400GB fastq) provided by the GCTA 2017 competition. In the best case, it only took 18.4 min to finish the analysis and the AWS cost of the whole process is only 16.5 US dollars. The accuracy of GT-WGS is 99.9% consistent with that of the Genome Analysis Toolkit (GATK) best practice. We also evaluated the performance of GT-WGS on a real-world dataset provided by the XiangYa hospital, which consists of 5× whole-genome data for 500 samples; on average, GT-WGS managed to finish one 5× WGS analysis task in 2.4 min at a cost of $3.6. WGS is already playing an important role in guiding therapeutic intervention. However, its application is limited by the time cost and computing cost. GT-WGS excelled as an efficient and affordable WGS analysis tool to address this problem. The demo video and supplementary materials of GT-WGS can be accessed at https://github.com/Genetalks/wgs_analysis_demo .
A Systematic Literature Mapping of Risk Analysis of Big Data in Cloud Computing Environment
NASA Astrophysics Data System (ADS)
Bee Yusof Ali, Hazirah; Marziana Abdullah, Lili; Kartiwi, Mira; Nordin, Azlin; Salleh, Norsaremah; Sham Awang Abu Bakar, Normi
2018-05-01
This paper investigates previous literature that focusses on the three elements: risk assessment, big data and cloud. We use a systematic literature mapping method to search for journals and proceedings. The systematic literature mapping process is utilized to get a properly screened and focused literature. With the help of inclusion and exclusion criteria, the search of literature is further narrowed. Classification helps us in grouping the literature into categories. At the end of the mapping, gaps can be seen. The gap is where our focus should be in analysing risk of big data in cloud computing environment. Thus, a framework of how to assess the risk of security, privacy and trust associated with big data and cloud computing environment is highly needed.
Do Clouds Compute? A Framework for Estimating the Value of Cloud Computing
NASA Astrophysics Data System (ADS)
Klems, Markus; Nimis, Jens; Tai, Stefan
On-demand provisioning of scalable and reliable compute services, along with a cost model that charges consumers based on actual service usage, has been an objective in distributed computing research and industry for a while. Cloud Computing promises to deliver on this objective: consumers are able to rent infrastructure in the Cloud as needed, deploy applications and store data, and access them via Web protocols on a pay-per-use basis. The acceptance of Cloud Computing, however, depends on the ability for Cloud Computing providers and consumers to implement a model for business value co-creation. Therefore, a systematic approach to measure costs and benefits of Cloud Computing is needed. In this paper, we discuss the need for valuation of Cloud Computing, identify key components, and structure these components in a framework. The framework assists decision makers in estimating Cloud Computing costs and to compare these costs to conventional IT solutions. We demonstrate by means of representative use cases how our framework can be applied to real world scenarios.
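The abstract does not list the framework's cost components, so the sketch below only illustrates the kind of comparison it describes: a pay-per-use cloud bill against an amortized on-premises cost, plus the usage level at which the two are equal. All function names, cost categories, and figures are hypothetical.

```python
def monthly_cloud_cost(instance_hours, hourly_rate, storage_gb=0.0, storage_rate=0.0):
    # Pay-per-use: cost scales with actual consumption.
    return instance_hours * hourly_rate + storage_gb * storage_rate

def monthly_on_prem_cost(capex, amortization_months, monthly_opex):
    # Conventional IT: up-front investment spread over its lifetime,
    # plus fixed running costs, regardless of utilization.
    return capex / amortization_months + monthly_opex

def break_even_hours(capex, amortization_months, monthly_opex, hourly_rate):
    # Instance-hours per month at which cloud and on-premises costs match.
    return monthly_on_prem_cost(capex, amortization_months, monthly_opex) / hourly_rate
```

For example, with an assumed 7200 capital outlay amortized over 36 months, 100 per month in operating costs, and a 0.5 hourly instance rate, `break_even_hours(7200, 36, 100, 0.5)` gives 600 instance-hours per month; below that level of usage the pay-per-use option is cheaper under these simplified assumptions.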
A Novel Artificial Bee Colony Approach of Live Virtual Machine Migration Policy Using Bayes Theorem
Xu, Gaochao; Ding, Yan; Zhao, Jia; Hu, Liang; Fu, Xiaodong
2013-01-01
The green cloud data center has become a research hotspot of virtualized cloud computing architecture. Since live virtual machine (VM) migration technology is widely used and studied in cloud computing, we have focused on the VM placement selection of live migration for power saving. We present a novel heuristic approach called PS-ABC. Its algorithm includes two parts. The first combines the artificial bee colony (ABC) idea with uniform random initialization, binary search, and a Boltzmann selection policy to achieve an improved ABC-based approach with better global exploration and local exploitation ability. The second uses the Bayes theorem to further optimize the improved ABC-based process so that it reaches the final optimal solution faster. As a result, the whole approach achieves a longer-term efficient optimization for power saving. The experimental results demonstrate that PS-ABC evidently reduces the total incremental power consumption and better protects the performance of VM running and migrating compared with existing research. It makes the result of live VM migration more effective and meaningful. PMID:24385877
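Of the ingredients listed in the abstract, the Boltzmann selection policy is the easiest to show in isolation. The sketch below picks a candidate host with probability proportional to exp(-cost / T), so lower-cost candidates are favoured while higher temperatures keep the choice exploratory. How PS-ABC actually defines the cost and the temperature is not stated in the abstract; the names and values here are assumptions.

```python
import math
import random

def boltzmann_probabilities(costs, temperature):
    """Selection probabilities proportional to exp(-cost / temperature).
    Costs are shifted by their minimum for numerical stability."""
    lo = min(costs)
    weights = [math.exp(-(c - lo) / temperature) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann_select(costs, temperature, rng=random):
    """Return the index of one candidate drawn from the Boltzmann distribution."""
    probs = boltzmann_probabilities(costs, temperature)
    return rng.choices(range(len(costs)), weights=probs, k=1)[0]

# Hypothetical incremental power costs (watts) of three candidate hosts.
candidate_costs = [10.0, 12.0, 20.0]
chosen = boltzmann_select(candidate_costs, temperature=2.0, rng=random.Random(0))
```

Lowering the temperature over time concentrates the probability on the cheapest candidate, which is one common way to shift a search from exploration to exploitation.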
Design and Implementation of a Cloud Computing Adoption Decision Tool: Generating a Cloud Road
Bildosola, Iñaki; Río-Belver, Rosa; Cilleruelo, Ernesto; Garechana, Gaizka
2015-01-01
Migrating to cloud computing is one of the current enterprise challenges. This technology provides a new paradigm based on “on-demand payment” for information and communication technologies. In this sense, the small and medium enterprise is supposed to be the most interested, since initial investments are avoided and the technology allows gradual implementation. However, even if the characteristics and capacities have been widely discussed, entry into the cloud is still lacking in terms of practical, real frameworks. This paper aims at filling this gap, presenting a real tool already implemented and tested, which can be used as a cloud computing adoption decision tool. This tool uses diagnosis based on specific questions to gather the required information and subsequently provide the user with valuable information to deploy the business within the cloud, specifically in the form of Software as a Service (SaaS) solutions. This information allows the decision makers to generate their particular Cloud Road. A pilot study has been carried out with enterprises at a local level with a two-fold objective: to ascertain the degree of knowledge on cloud computing and to identify the most interesting business areas and their related tools for this technology. As expected, the results show high interest and low knowledge on this subject and the tool presented aims to readdress this mismatch, insofar as possible. PMID:26230400
Fog computing job scheduling optimization based on bees swarm
NASA Astrophysics Data System (ADS)
Bitam, Salim; Zeadally, Sherali; Mellouk, Abdelhamid
2018-04-01
Fog computing is a new computing architecture, composed of a set of near-user edge devices called fog nodes, which collaborate together in order to perform computational services such as running applications, storing an important amount of data, and transmitting messages. Fog computing extends cloud computing by deploying digital resources at the premise of mobile users. In this new paradigm, management and operating functions, such as job scheduling aim at providing high-performance, cost-effective services requested by mobile users and executed by fog nodes. We propose a new bio-inspired optimization approach called Bees Life Algorithm (BLA) aimed at addressing the job scheduling problem in the fog computing environment. Our proposed approach is based on the optimized distribution of a set of tasks among all the fog computing nodes. The objective is to find an optimal tradeoff between CPU execution time and allocated memory required by fog computing services established by mobile users. Our empirical performance evaluation results demonstrate that the proposal outperforms the traditional particle swarm optimization and genetic algorithm in terms of CPU execution time and allocated memory.
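The abstract states the objective (a tradeoff between CPU execution time and allocated memory) but not its exact form. The sketch below shows one plausible way to score a task-to-node assignment as a weighted sum of the two, with an exhaustive search as a baseline for tiny instances. It is not the Bees Life Algorithm; the cost model, weights, and data layout are assumptions for illustration.

```python
from itertools import product

def schedule_cost(assignment, task_cycles, task_memory, node_speed,
                  w_time=0.5, w_mem=0.5):
    """assignment[i] is the index of the fog node that runs task i.
    task_memory[i][n] is the (assumed) memory task i needs on node n."""
    busy = [0.0] * len(node_speed)
    memory = 0.0
    for task, node in enumerate(assignment):
        busy[node] += task_cycles[task] / node_speed[node]
        memory += task_memory[task][node]
    # Weighted sum of the longest per-node execution time and total memory.
    return w_time * max(busy) + w_mem * memory

def best_assignment(task_cycles, task_memory, node_speed):
    """Brute-force baseline: only practical for a handful of tasks and nodes."""
    candidates = product(range(len(node_speed)), repeat=len(task_cycles))
    return min(candidates,
               key=lambda a: schedule_cost(a, task_cycles, task_memory, node_speed))
```

A metaheuristic such as the one proposed in the paper would replace `best_assignment` with a guided search over the same kind of cost function, since exhaustive enumeration grows exponentially with the number of tasks.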
Palmer, T. N.
2014-06-28
This paper sets out a new methodological approach to solving the equations for simulating and predicting weather and climate. In this approach, the conventionally hard boundary between the dynamical core and the sub-grid parametrizations is blurred. This approach is motivated by the relatively shallow power-law spectrum for atmospheric energy on scales of hundreds of kilometres and less. It is first argued that, because of this, the closure schemes for weather and climate simulators should be based on stochastic-dynamic systems rather than deterministic formulae. Second, as high-wavenumber elements of the dynamical core will necessarily inherit this stochasticity during time integration, it is argued that the dynamical core will be significantly over-engineered if all computations, regardless of scale, are performed completely deterministically and if all variables are represented with maximum numerical precision (in practice using double-precision floating-point numbers). As the era of exascale computing is approached, an energy- and computationally efficient approach to cloud-resolved weather and climate simulation is described where determinism and numerical precision are focused on the largest scales only. PMID:24842038
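The idea of spending numerical precision only on the largest scales can be mimicked in a few lines. The sketch below emulates a lower-precision float by rounding the significand to a chosen number of bits, then applies full precision to low wavenumbers and reduced precision to high ones. The bit widths, the cutoff, and the function names are hypothetical; the paper does not prescribe these values, and real implementations would use hardware number formats rather than software rounding.

```python
import math

def round_significand(value, bits):
    """Round value to `bits` significand bits, emulating a
    lower-precision floating-point format (53 bits = double precision)."""
    if value == 0.0 or not math.isfinite(value):
        return value
    mantissa, exponent = math.frexp(value)   # value = mantissa * 2**exponent
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

def scale_selective_precision(coeffs, cutoff, low_bits=10, high_bits=53):
    """coeffs[k] is the amplitude at wavenumber k. Wavenumbers below
    `cutoff` keep high precision; the rest are stored with `low_bits`."""
    return [round_significand(c, high_bits if k < cutoff else low_bits)
            for k, c in enumerate(coeffs)]
```

With 10 significand bits the relative rounding error is at most 2**-10, about 0.1%, which gives a feel for how much precision might be given up at small scales where the dynamics are already treated stochastically.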
"Analysis of the multi-layered cloud radiative effects at the surface using A-train data"
NASA Astrophysics Data System (ADS)
Viudez-Mora, A.; Smith, W. L., Jr.; Kato, S.
2017-12-01
Clouds cover about 74% of the planet; they are an important part of the climate system and strongly influence the surface energy budget. The cloud vertical distribution has important implications for atmospheric heating and cooling rates. Based on observations by active sensors in the A-train satellite constellation, CALIPSO [Winker et al., 2010] and CloudSat [Stephens et al., 2002], more than 1/3 of all clouds are multi-layered. Detection and retrieval of multi-layer cloud physical properties are needed to understand their effects on the surface radiation budget. This study examines the sensitivity of surface irradiances to cloud properties derived from satellite sensors. Surface irradiances were computed in two different ways, one using cloud properties solely from the MODerate resolution Imaging Spectroradiometer (MODIS), and the other using MODIS data supplemented with CALIPSO and CloudSat (hereafter CLCS) cloud vertical structure information [Kato et al., 2010]. Results reveal that incorporating more precise and realistic cloud properties from CLCS into radiative transfer calculations yields improved estimates of cloud radiative effects (CRE) at the surface (CREsfc). Compared with calculations using only MODIS cloud properties, for 2-layer (2L) overcast CERES footprints CLCS reduces the SW CRE by 1.5±26.7 W m-2, increases the LW CRE by 4.1±12.7 W m-2, and increases the net CREsfc by 0.9±46.7 W m-2. In a subsequent analysis, we classified up to 6 different combinations of multi-layered clouds depending on the cloud top height as: high-high (HH), high-middle (HM), high-low (HL), middle-middle (MM), middle-low (ML) and low-low (LL). The 3 most frequent 2L cloud systems were: HL (56.1%), HM (22.3%) and HH (12.1%). For these cases, the CREsfc estimated using CLCS data showed the most significant differences from the CREsfc estimated using only MODIS data. For example, the differences for the SW and net CRE in the HH case were 12.3±47.3 W m-2 and 16.0±48.45 W m-2, respectively. For the HM case, the LW CRE difference was -9.9±14.0 W m-2.
Kato, S., et al. (2010), J. Geophys. Res., 115. Stephens, G. L., et al. (2002), Bull. Am. Meteorol. Soc., 83. Winker, D. M., et al. (2010), Bull. Am. Meteorol. Soc., 91.
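The six two-layer classes named in the abstract follow from assigning each layer a height category and requiring the upper layer to sit at or above the lower one. The sketch below illustrates that bookkeeping only; the height boundaries `LOW_TOP_KM` and `HIGH_TOP_KM` and both function names are hypothetical, since the abstract does not state the thresholds the authors used.

```python
# Hypothetical cloud-top height boundaries in km; the abstract does not give them.
LOW_TOP_KM = 3.0
HIGH_TOP_KM = 7.0


def height_class(top_km):
    """Return 'H', 'M' or 'L' for a single layer's cloud-top height."""
    if top_km >= HIGH_TOP_KM:
        return "H"
    if top_km >= LOW_TOP_KM:
        return "M"
    return "L"


def two_layer_class(upper_top_km, lower_top_km):
    """Label a two-layer system with the upper layer first, e.g. 'HL'."""
    if upper_top_km < lower_top_km:
        raise ValueError("upper layer must not be below the lower layer")
    return height_class(upper_top_km) + height_class(lower_top_km)
```

Because the upper layer can never fall in a lower category than the lower layer, only HH, HM, HL, MM, ML and LL can occur, which matches the six combinations listed above.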
Efficient and Flexible Climate Analysis with Python in a Cloud-Based Distributed Computing Framework
NASA Astrophysics Data System (ADS)
Gannon, C.
2017-12-01
As climate models become progressively more advanced, and spatial resolution further improved through various downscaling projects, climate projections at a local level are increasingly insightful and valuable. However, the raw size of climate datasets presents numerous hurdles for analysts wishing to develop customized climate risk metrics or perform site-specific statistical analysis. Four Twenty Seven, a climate risk consultancy, has implemented a Python-based distributed framework to analyze large climate datasets in the cloud. With the freedom afforded by efficiently processing these datasets, we are able to customize and continually develop new climate risk metrics using the most up-to-date data. Here we outline our process for using Python packages such as XArray and Dask to evaluate netCDF files in a distributed framework, StarCluster to operate in a cluster-computing environment, cloud computing services to access publicly hosted datasets, and how this setup is particularly valuable for generating climate change indicators and performing localized statistical analysis.
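The split, apply, combine pattern behind chunked processing of large climate arrays can be illustrated without any third-party package. The sketch below is not the authors' code and does not use XArray or Dask; it uses the Python standard library as a stand-in, and the metric (days above a temperature threshold), the function names, and the chunk size are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_hot_days(chunk, threshold_c):
    """Count values in one chunk of daily maximum temperatures above a threshold."""
    return sum(1 for t in chunk if t > threshold_c)


def hot_days_chunked(daily_tmax_c, threshold_c, chunk_size):
    """Split a series into chunks, reduce each chunk independently, then combine."""
    chunks = [daily_tmax_c[i:i + chunk_size]
              for i in range(0, len(daily_tmax_c), chunk_size)]
    with ThreadPoolExecutor() as pool:
        # Each chunk is reduced on its own; only the small partial counts are combined.
        partial_counts = pool.map(lambda c: count_hot_days(c, threshold_c), chunks)
        return sum(partial_counts)
```

The point of the design is that each chunk is reduced to a small partial result before anything is combined, so no single worker ever needs the whole dataset in memory.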
IBM Cloud Computing Powering a Smarter Planet
NASA Astrophysics Data System (ADS)
Zhu, Jinzy; Fang, Xing; Guo, Zhe; Niu, Meng Hua; Cao, Fan; Yue, Shuang; Liu, Qin Yu
With increasing need for intelligent systems supporting the world's businesses, Cloud Computing has emerged as a dominant trend to provide a dynamic infrastructure to make such intelligence possible. The article introduced how to build a smarter planet with cloud computing technology. First, it introduced why we need cloud, and the evolution of cloud technology. Secondly, it analyzed the value of cloud computing and how to apply cloud technology. Finally, it predicted the future of cloud in the smarter planet.
How to keep the Grid full and working with ATLAS production and physics jobs
NASA Astrophysics Data System (ADS)
Pacheco Pagés, A.; Barreiro Megino, F. H.; Cameron, D.; Fassi, F.; Filipcic, A.; Di Girolamo, A.; González de la Hoz, S.; Glushkov, I.; Maeno, T.; Walker, R.; Yang, W.; ATLAS Collaboration
2017-10-01
The ATLAS production system provides the infrastructure to process millions of events collected during the LHC Run 1 and the first two years of Run 2 using grid, clouds and high performance computing. In this contribution we address the strategies and improvements that have been implemented in the production system for optimal performance and to achieve the highest efficiency of available resources from an operational perspective. We focus on the recent developments.
Cloud Computing Security Issue: Survey
NASA Astrophysics Data System (ADS)
Kamal, Shailza; Kaur, Rajpreet
2011-12-01
Cloud computing has been a growing field in the IT industry since it was proposed by IBM in 2007. Other companies such as Google, Amazon, and Microsoft provide further cloud computing products. Cloud computing is Internet-based computing that shares resources and information on demand. It provides services such as SaaS, IaaS and PaaS. The services and resources are shared through virtualization, which runs multiple applications on the cloud. This discussion surveys the security challenges in cloud computing and describes some standards and protocols that show how security can be managed.
T-Check in System-of-Systems Technologies: Cloud Computing
2010-09-01
T-Check in System-of-Systems Technologies: Cloud Computing Harrison D. Strowd Grace A. Lewis September 2010 TECHNICAL NOTE CMU/SEI-2010... Cloud Computing 1 1.2 Types of Cloud Computing 2 1.3 Drivers and Barriers to Cloud Computing Adoption 5 2 Using the T-Check Method 7 2.1 T-Check...Hypothesis 3 25 3.4.2 Deployment View of the Solution for Testing Hypothesis 3 27 3.5 Selecting Cloud Computing Providers 30 3.6 Implementing the T-Check
Genotyping in the cloud with Crossbow.
Gurtowski, James; Schatz, Michael C; Langmead, Ben
2012-09-01
Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
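The way a MapReduce job divides alignment output into per-position work can be shown with a toy example. This is not Crossbow, Bowtie, or SOAPsnp; it is a simplified sketch in which reads are assumed to be already aligned, and the function names and the `min_fraction` cutoff are hypothetical.

```python
from collections import Counter, defaultdict


def map_read(read_start, read_bases):
    """Map step: emit (position, base) pairs for one already-aligned read."""
    return [(read_start + i, base) for i, base in enumerate(read_bases)]


def reduce_position(bases, ref_base, min_fraction=0.8):
    """Reduce step: return the majority base if it differs from the reference."""
    base, count = Counter(bases).most_common(1)[0]
    if base != ref_base and count / len(bases) >= min_fraction:
        return base
    return None


def call_snps(aligned_reads, reference):
    """Group emitted bases by position (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for start, bases in aligned_reads:
        for pos, base in map_read(start, bases):
            grouped[pos].append(base)
    calls = {}
    for pos, bases in grouped.items():
        alt = reduce_position(bases, reference[pos])
        if alt is not None:
            calls[pos] = alt
    return calls
```

Each position is reduced independently of every other position, which is what lets a framework such as Hadoop spread the work over many machines.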
Lebeda, Frank J; Zalatoris, Jeffrey J; Scheerer, Julia B
2018-02-07
This position paper summarizes the development and the present status of Department of Defense (DoD) and other government policies and guidances regarding cloud computing services. Due to the heterogeneous and growing biomedical big datasets, cloud computing services offer an opportunity to mitigate the associated storage and analysis requirements. Having on-demand network access to a shared pool of flexible computing resources creates a consolidated system that should reduce potential duplications of effort in military biomedical research. Interactive, online literature searches were performed with Google, at the Defense Technical Information Center, and at two National Institutes of Health research portfolio information sites. References cited within some of the collected documents also served as literature resources. We gathered, selected, and reviewed DoD and other government cloud computing policies and guidances published from 2009 to 2017. These policies were intended to consolidate computer resources within the government and reduce costs by decreasing the number of federal data centers and by migrating electronic data to cloud systems. Initial White House Office of Management and Budget information technology guidelines were developed for cloud usage, followed by policies and other documents from the DoD, the Defense Health Agency, and the Armed Services. Security standards from the National Institute of Standards and Technology, the Government Services Administration, the DoD, and the Army were also developed. Government Services Administration and DoD Inspectors General monitored cloud usage by the DoD. A 2016 Government Accountability Office report characterized cloud computing as being economical, flexible and fast. A congressionally mandated independent study reported that the DoD was active in offering a wide selection of commercial cloud services in addition to its milCloud system. 
Our findings from the Department of Health and Human Services indicated that the security infrastructure in cloud services may be more compliant with the Health Insurance Portability and Accountability Act of 1996 regulations than traditional methods. To gauge the DoD's adoption of cloud technologies proposed metrics included cost factors, ease of use, automation, availability, accessibility, security, and policy compliance. Since 2009, plans and policies were developed for the use of cloud technology to help consolidate and reduce the number of data centers which were expected to reduce costs, improve environmental factors, enhance information technology security, and maintain mission support for service members. Cloud technologies were also expected to improve employee efficiency and productivity. Federal cloud computing policies within the last decade also offered increased opportunities to advance military healthcare. It was assumed that these opportunities would benefit consumers of healthcare and health science data by allowing more access to centralized cloud computer facilities to store, analyze, search and share relevant data, to enhance standardization, and to reduce potential duplications of effort. We recommend that cloud computing be considered by DoD biomedical researchers for increasing connectivity, presumably by facilitating communications and data sharing, among the various intra- and extramural laboratories. We also recommend that policies and other guidances be updated to include developing additional metrics that will help stakeholders evaluate the above mentioned assumptions and expectations. Published by Oxford University Press on behalf of the Association of Military Surgeons of the United States 2018. This work is written by (a) US Government employee(s) and is in the public domain in the US.
The Namibia Early Flood Warning System, A CEOS Pilot Project
NASA Technical Reports Server (NTRS)
Mandl, Daniel; Frye, Stuart; Cappelaere, Pat; Sohlberg, Robert; Handy, Matthew; Grossman, Robert
2012-01-01
Over the past few years, an international collaboration has developed a pilot project under the auspices of the Committee on Earth Observation Satellites (CEOS) Disasters team. The overall team consists of civilian satellite agencies. For this pilot effort, the development team consists of NASA, Canadian Space Agency, Univ. of Maryland, Univ. of Colorado, Univ. of Oklahoma, Ukraine Space Research Institute and the Joint Research Center (JRC) for the European Commission. This development team collaborates with regional, national and international agencies to deliver end-to-end disaster coverage. In particular, the team is collaborating on this effort with the Namibia Department of Hydrology to begin in Namibia. However, the ultimate goal is to expand the functionality to provide early warning over the South Africa region. The initial collaboration was initiated by the United Nations Office of Outer Space Affairs and the CEOS Working Group for Information Systems and Services (WGISS). The initial driver was to demonstrate international interoperability using various space agency sensors and models along with regional in-situ ground sensors. In 2010, the team created a preliminary semi-manual system to demonstrate moving and combining key data streams and delivering the data to the Namibia Department of Hydrology during their flood season, which typically is January through April. In this pilot, a variety of moderate resolution and high resolution satellite flood imagery was rapidly delivered and used in conjunction with flood predictive models in Namibia. This was collected in conjunction with ground measurements and was used to examine how to create a customized flood early warning system. During the first year, the team made use of SensorWeb technology to gather various sensor data, which were used to monitor flood waves traveling down basins originating in Angola but eventually flooding villages in Namibia.
The team's use of standardized interfaces, such as those articulated under the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) set of web services, worked well [1][2]. However, it was discovered that in order to make a system like this functional, there were many performance issues. Data sets were large, were located in a variety of locations behind firewalls, and had to be accessed across open networks, so security was an issue. Furthermore, network access acted as a bottleneck in transferring map products to where they were needed. Finally, during disasters, many users and computer processes act in parallel, and thus it was very easy to overload the single string of computers stitched together in the virtual system that was initially developed. To address some of these performance issues, the team partnered with the Open Cloud Consortium (OCC), which supplied a Computation Cloud located at the University of Illinois at Chicago and some manpower to administer this Cloud. The Flood SensorWeb [3] system was interfaced to the Cloud to provide a high performance user interface and product development engine. Figure 1 shows the functional diagram of the Flood SensorWeb. Figure 2 shows some of the functionality of the Computation Cloud that was integrated. A significant portion of the original system was ported to the Cloud, and during the past year technical issues were resolved, which included web access to the Cloud, security over the open Internet, beginning experiments on how to handle surge capacity by using the virtual machines in the cloud in parallel, using tiling techniques to render large data sets as layers on a map, interfaces to allow users to customize the data processing/product chain, and other performance enhancing techniques. The conclusion reached from the effort and this presentation is that defining the interoperability standards is a small fraction of the work.
For example, once open web service standards were defined, many users could not make use of the standards due to security restrictions. Furthermore, once an interoperable system is functional, a surge of users can render the system unusable, especially in the disaster domain.
2010-07-01
Cloud computing, an emerging form of computing in which users have access to scalable, on-demand capabilities that are provided through Internet... cloud computing, (2) the information security implications of using cloud computing services in the Federal Government, and (3) federal guidance and...efforts to address information security when using cloud computing. The complete report is titled Information Security: Federal Guidance Needed to
Development of a Cloud Resolving Model for Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.
2017-12-01
A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection, such as convective storm systems. This research describes the porting effort to enable the SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphics Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization, including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results, lessons learned, and optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes, each with 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, the ACME-MMF component of the U.S. Department of Energy (DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of a global climate model. Superparameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME, and explore its full potential to scientifically and computationally advance climate simulation and prediction.
NASA Astrophysics Data System (ADS)
van Lew, Baldur; Botha, Charl P.; Milles, Julien R.; Vrooman, Henri A.; van de Giessen, Martijn; Lelieveldt, Boudewijn P. F.
2015-03-01
The cohort size required in epidemiological imaging genetics studies often mandates the pooling of data from multiple hospitals. Patient data, however, is subject to strict privacy protection regimes, and physical data storage may be legally restricted to a hospital network. To enable biomarker discovery, fast data access and interactive data exploration must be combined with high-performance computing resources, while respecting privacy regulations. We present a system using fast and inherently secure light-paths to access distributed data, thereby obviating the need for a central data repository. A secure private cloud computing framework facilitates interactive, computationally intensive exploration of this geographically distributed, privacy sensitive data. As a proof of concept, MRI brain imaging data hosted at two remote sites were processed in response to a user command at a third site. The system was able to automatically start virtual machines, run a selected processing pipeline and write results to a user accessible database, while keeping data locally stored in the hospitals. Individual tasks took approximately 50% longer than on a locally hosted blade server, but the cloud infrastructure reduced the total elapsed time by a factor of 40 using 70 virtual machines in the cloud. We demonstrated that the combination of light-paths and a private cloud is a viable means of building an infrastructure for secure data analysis. The system requires further work in the areas of error handling, load balancing and secure support of multiple users.
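The reported figures (tasks about 50% slower, 70 virtual machines, 40-fold reduction in elapsed time) can be related with a short calculation. The sketch below assumes independent, equal-length tasks spread evenly over the machines, which the abstract does not state; the function name and the notion of "efficiency" used here are illustrative.

```python
def parallel_efficiency(n_workers, task_slowdown, observed_speedup):
    """Compare an observed speedup with the ideal for slower but parallel workers."""
    ideal_speedup = n_workers / task_slowdown
    return ideal_speedup, observed_speedup / ideal_speedup


# Figures taken from the abstract: 70 VMs, tasks ~1.5x slower, 40x faster overall.
ideal, efficiency = parallel_efficiency(70, 1.5, 40)
print(f"ideal speedup: {ideal:.1f}, fraction of ideal achieved: {efficiency:.2f}")
```

Under these assumptions the ideal speedup is roughly 47, so a measured factor of 40 corresponds to about 86% of the ideal, with the remainder plausibly attributable to overheads such as virtual machine startup and data transfer.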
Risk in the Clouds?: Security Issues Facing Government Use of Cloud Computing
NASA Astrophysics Data System (ADS)
Wyld, David C.
Cloud computing is poised to become one of the most important and fundamental shifts in how computing is consumed and used. Forecasts show that government will play a lead role in adopting cloud computing - for data storage, applications, and processing power, as IT executives seek to maximize their returns on limited procurement budgets in these challenging economic times. After an overview of the cloud computing concept, this article explores the security issues facing public sector use of cloud computing and looks to the risk and benefits of shifting to cloud-based models. It concludes with an analysis of the challenges that lie ahead for government use of cloud resources.
A Review Study on Cloud Computing Issues
NASA Astrophysics Data System (ADS)
Kanaan Kadhim, Qusay; Yusof, Robiah; Sadeq Mahdi, Hamid; Al-shami, Sayed Samer Ali; Rahayu Selamat, Siti
2018-05-01
Cloud computing is the most promising current implementation of utility computing in the business world, because it provides some key features over classic utility computing, such as elasticity that allows clients to dynamically scale resources up and down at execution time. Nevertheless, cloud computing is still at an immature stage and suffers from a lack of standardization. Security issues are the main challenges to cloud computing adoption. Thus, critical industries such as government organizations (ministries) are reluctant to trust cloud computing for fear of losing their sensitive data, as it resides on the cloud with no knowledge of the data location and a lack of transparency about the mechanisms Cloud Service Providers (CSPs) use to secure their data and applications; these concerns have created a barrier against adopting this agile computing paradigm. This study aims to review and classify the issues that surround the implementation of cloud computing, which is a hot area that needs to be addressed by future research.
Automatic Detection of Clouds and Shadows Using High Resolution Satellite Image Time Series
NASA Astrophysics Data System (ADS)
Champion, Nicolas
2016-06-01
Detecting clouds and their shadows is one of the primary steps to perform when processing satellite images because they may alter the quality of some products such as large-area orthomosaics. The main goal of this paper is to present the automatic method developed at IGN-France for detecting clouds and shadows in a sequence of satellite images. In our work, surface reflectance ortho-images are used. They were processed from initial satellite images using dedicated software. The cloud detection step consists of a region-growing algorithm. Seeds are first extracted. For that purpose, and for each input ortho-image to process, we select the other ortho-images of the sequence that intersect it. The pixels of the input ortho-image are then labelled as seeds if the difference of reflectance (in the blue channel) with overlapping ortho-images is bigger than a given threshold. Clouds are finally delineated using a region-growing method based on a radiometric and homogeneity criterion. Regarding the shadow detection, our method is based on the idea that a shadow pixel is darker than the corresponding pixel in the other images of the time series. The detection is composed of three steps. Firstly, we compute a synthetic ortho-image covering the whole study area. Its pixels have a value corresponding to the median value of all input reflectance ortho-images intersecting at that pixel location. Secondly, for each input ortho-image, a pixel is labelled as shadow if the difference of reflectance (in the NIR channel) with the synthetic ortho-image is below a given threshold. Finally, an optional region-growing step may be used to refine the results. Note that pixels labelled as clouds during the cloud detection are not used for computing the median value in the first step; additionally, the NIR input data channel is used to perform the shadow detection, because it appeared to better discriminate shadow pixels.
The method was tested on time series of Landsat 8 and Pléiades-HR images, and our first experiments show the feasibility of automating the detection of shadows and clouds in satellite image sequences.
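The seed extraction and shadow labelling steps described above can be sketched on flattened pixel lists. This is an illustration under stated assumptions, not the IGN-France implementation: images are assumed to be co-registered lists of reflectance values, a cloud seed is assumed to require the blue difference to exceed the threshold against every overlapping image (the abstract does not say whether "all" or "any" is meant), the region-growing steps are omitted, and all function names are hypothetical.

```python
from statistics import median


def cloud_seeds(target_blue, other_blue_images, threshold):
    """Flag pixels much brighter in blue than in every overlapping image."""
    return [all(t - other[i] > threshold for other in other_blue_images)
            for i, t in enumerate(target_blue)]


def median_composite(nir_images, cloud_masks):
    """Per-pixel median of NIR reflectance, skipping cloud-labelled pixels."""
    composite = []
    for i in range(len(nir_images[0])):
        clear = [img[i] for img, mask in zip(nir_images, cloud_masks)
                 if not mask[i]]
        composite.append(median(clear) if clear else None)
    return composite


def shadow_mask(target_nir, composite, threshold):
    """Flag pixels darker than the composite: difference below a negative threshold."""
    return [c is not None and (t - c) < threshold
            for t, c in zip(target_nir, composite)]
```

Excluding cloud-labelled pixels before taking the median keeps bright cloud values from biasing the synthetic image, which is the reason the abstract gives for running cloud detection first.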
Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes
Shringarpure, Suyash S.; Carroll, Andrew; De La Vega, Francisco M.; Bustamante, Carlos D.
2015-01-01
Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. PMID:26110529
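The cost figures quoted in the abstract are mutually consistent, which a few lines of arithmetic confirm. The sketch below uses only numbers stated above; the variable names and the derived overhead fraction (spending on jobs that did not complete) are illustrative.

```python
n_samples = 2535
cost_completed_usd = 18590   # total for completed jobs, from the abstract
cost_all_jobs_usd = 21805    # total for all jobs, from the abstract

# Per-sample cost for completed jobs; matches the quoted $7.33.
per_sample_completed = cost_completed_usd / n_samples

# Per-sample cost when failed or repeated jobs are included.
per_sample_all = cost_all_jobs_usd / n_samples

# Share of total spending that went to jobs other than the completed ones.
overhead_fraction = (cost_all_jobs_usd - cost_completed_usd) / cost_all_jobs_usd

print(f"{per_sample_completed:.2f} {per_sample_all:.2f} {overhead_fraction:.1%}")
```

Counting all jobs raises the effective cost to about $8.60 per sample, so roughly 15% of the total spend went to jobs that did not contribute to the final call set.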
A location selection policy of live virtual machine migration for power saving and load balancing.
Zhao, Jia; Ding, Yan; Xu, Gaochao; Hu, Liang; Dong, Yushuang; Fu, Xiaodong
2013-01-01
Green cloud data centers have become a research hotspot in virtualized cloud computing architecture, and load balancing is one of the most important goals in cloud data centers. Since live virtual machine (VM) migration technology is widely used and studied in cloud computing, we have focused on location selection (migration policy) of live VM migration for power saving and load balancing. We propose a novel approach, MOGA-LS, which is a heuristic and self-adaptive multiobjective optimization algorithm based on an improved genetic algorithm (GA). This paper presents the specific design and implementation of MOGA-LS, such as the design of the genetic operators, fitness values, and elitism. We have introduced Pareto dominance theory and the simulated annealing (SA) idea into MOGA-LS and have presented the specific process to get the final solution; thus, the whole approach achieves a long-term efficient optimization for power saving and load balancing. The experimental results demonstrate that, compared with existing research, MOGA-LS evidently reduces the total incremental power consumption, better protects the performance of VM migration, and achieves the balancing of system load. It makes the result of live VM migration more effective and meaningful. PMID:24348165
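Pareto dominance, which the abstract names as one ingredient of MOGA-LS, can be stated compactly in code. The sketch below is a generic illustration and not the MOGA-LS algorithm: each candidate placement is assumed to be a tuple of two objectives to minimize (incremental power, load imbalance), and the function names and sample values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (incremental power in W, load imbalance) pairs for four target hosts.
placements = [(10, 0.5), (12, 0.3), (11, 0.6), (15, 0.3)]
front = pareto_front(placements)
```

A multiobjective genetic algorithm typically uses such a front to rank individuals, since no single member of the front is better than another on both objectives at once.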
NASA Technical Reports Server (NTRS)
Hasler, A. F.; Strong, J.; Woodward, R. H.; Pierce, H.
1991-01-01
Results are presented on an automatic stereo analysis of cloud-top heights from nearly simultaneous satellite image pairs from the GOES and NOAA satellites, using a massively parallel processor computer. Comparisons of computer-derived height fields and manually analyzed fields show that the automatic analysis technique shows promise for performing routine stereo analysis in a real-time environment, providing a useful forecasting tool by augmenting observational data sets of severe thunderstorms and hurricanes. Simulations using synthetic stereo data show that it is possible to automatically resolve small-scale features such as 4000-m-diam clouds to about 1500 m in the vertical.
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.
Nguyen, Tung; Shi, Weisong; Ruden, Douglas
2011-06-06
Research in genetics has developed rapidly in recent years with the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not adequately provide primary functions such as bisulfite and pair-end mapping that are available in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for the majority of biologists, who are untrained in programming, to use these tools because most were developed on Linux with a command line interface. To encourage the use of Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner comes from partitioning and parallel processing of the huge reference genome as well as the reads.
The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http://mine.cs.wayne.edu:8080/CloudAligner/. Our results show that CloudAligner is faster than CloudBurst, provides more accurate results than RMAP, and supports various input as well as output formats. In addition, with the web-based interface, it is easier to use than its counterparts.
An innovative privacy preserving technique for incremental datasets on cloud computing.
Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan
2016-08-01
Cloud computing (CC) is a magnificent service-based delivery with gigantic computer processing power and data storage across connected communications channels. It imparted overwhelming technological impetus in the internet (web) mediated IT industry, where users can easily share private data for further analysis and mining. Furthermore, user affable CC services enable to deploy sundry applications economically. Meanwhile, simple data sharing impelled various phishing attacks and malware assisted security threats. Some privacy sensitive applications like health services on cloud that are built with several economic and operational benefits necessitate enhanced security. Thus, absolute cyberspace security and mitigation against phishing blitz became mandatory to protect overall data privacy. Typically, diverse applications datasets are anonymized with better privacy to owners without providing all secrecy requirements to the newly added records. Some proposed techniques emphasized this issue by re-anonymizing the datasets from the scratch. The utmost privacy protection over incremental datasets on CC is far from being achieved. Certainly, the distribution of huge datasets volume across multiple storage nodes limits the privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-09-04
...--Intersection of Cloud Computing and Mobility Forum and Workshop AGENCY: National Institute of Standards and.../intersection-of-cloud-and-mobility.cfm . SUPPLEMENTARY INFORMATION: NIST hosted six prior Cloud Computing Forum... interoperability, portability, and security, discuss the Federal Government's experience with cloud computing...
Embracing the Cloud: Six Ways to Look at the Shift to Cloud Computing
ERIC Educational Resources Information Center
Ullman, David F.; Haggerty, Blake
2010-01-01
Cloud computing is the latest paradigm shift for the delivery of IT services. Where previous paradigms (centralized, decentralized, distributed) were based on fairly straightforward approaches to technology and its management, cloud computing is radical in comparison. The literature on cloud computing, however, suffers from many divergent…
A high-resolution oxygen A-band spectrometer (HABS) and its radiation closure
NASA Astrophysics Data System (ADS)
Min, Q.; Yin, B.; Li, S.; Berndt, J.; Harrison, L.; Joseph, E.; Duan, M.; Kiedron, P.
2014-02-01
The pressure dependence of oxygen A-band absorption enables the retrieval of the vertical profiles of aerosol and cloud properties from oxygen A-band spectrometry. To improve the understanding of oxygen A-band inversions and utility, we developed a high-resolution oxygen A-band spectrometer (HABS), and deployed it at the Howard University Beltsville site during the NASA Discover Air-Quality Field Campaign in July 2011. The HABS can automatically measure solar direct-beam and zenith diffuse radiation through a telescope. It exhibits excellent performance: stable spectral response ratio, high signal-to-noise ratio (SNR), high spectral resolution (0.16 nm), and high out-of-band rejection (10⁻⁵). To evaluate the spectral performance of HABS, a HABS simulator has been developed by combining the discrete ordinates radiative transfer (DISORT) code with the High Resolution Transmission (HITRAN) database HITRAN2008. The simulator uses a double-k approach to reduce the computational cost. The HABS measured spectra are consistent with the related simulated spectra. For direct-beam spectra, the confidence intervals (95%) of relative difference between measurements and simulation are (-0.06, 0.05) and (-0.08, 0.09) for solar zenith angles of 27° and 72°, respectively. The main differences between them occur at or near the strong oxygen absorption line centers. They are mainly caused by the noise/spikes of HABS measured spectra, as a result of combined effects of weak signal, low SNR, and errors in wavelength registration and absorption line parameters. The high-resolution oxygen A-band measurements from HABS can constrain the active radar retrievals for more accurate cloud optical properties, particularly for multi-layer clouds and for mixed-phase clouds.
The Research of the Parallel Computing Development from the Angle of Cloud Computing
NASA Astrophysics Data System (ADS)
Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun
2017-10-01
Cloud computing is a development of parallel computing, distributed computing and grid computing. The development of cloud computing brings parallel computing into people's lives. First, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Second, it analyzes the principles, advantages and disadvantages of OpenMP, MPI and MapReduce respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
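To make the MapReduce model named in this abstract concrete, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It is an illustration only and is not taken from the paper; the function names are hypothetical, and a real framework would distribute the map and reduce calls across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each `map_phase` call touches only its own document and each `reduce_phase` call touches only its own key, both stages can run in parallel without shared state, which is the property that distinguishes this model from the explicit message passing of MPI or the shared-memory threading of OpenMP.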
International Symposium on Grids and Clouds (ISGC) 2014
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2014 will be held at Academia Sinica in Taipei, Taiwan from 23-28 March 2014, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). “Bringing the data scientist to global e-Infrastructures” is the theme of ISGC 2014. The last decade has seen the phenomenal growth in the production of data in all forms by all research communities to produce a deluge of data from which information and knowledge need to be extracted. Key to this success will be the data scientist - educated to use advanced algorithms, applications and infrastructures - collaborating internationally to tackle society’s challenges. ISGC 2014 will bring together researchers working in all aspects of data science from different disciplines around the world to collaborate and educate themselves in the latest achievements and techniques being used to tackle the data deluge. In addition to the regular workshops, technical presentations and plenary keynotes, ISGC this year will focus on how to grow the data science community by considering the educational foundation needed for tomorrow’s data scientist. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities & Social Sciences Applications, Virtual Research Environment (including middleware, tools, services, workflow, etc.), Data Management, Big Data, Infrastructure & Operations Management, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC).
Cloud computing basics for librarians.
Hoy, Matthew B
2012-01-01
"Cloud computing" is the name for the recent trend of moving software and computing resources to an online, shared-service model. This article briefly defines cloud computing, discusses different models, explores the advantages and disadvantages, and describes some of the ways cloud computing can be used in libraries. Examples of cloud services are included at the end of the article. Copyright © Taylor & Francis Group, LLC
Classification of large-scale fundus image data sets: a cloud-computing framework.
Roychowdhury, Sohini
2016-08-01
Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.
BioVLAB-MMIA: a cloud environment for microRNA and mRNA integrated analysis (MMIA) on Amazon EC2.
Lee, Hyungro; Yang, Youngik; Chae, Heejoon; Nam, Seungyoon; Choi, Donghoon; Tangchaisin, Patanachai; Herath, Chathura; Marru, Suresh; Nephew, Kenneth P; Kim, Sun
2012-09-01
MicroRNAs, by regulating the expression of hundreds of target genes, play critical roles in developmental biology and the etiology of numerous diseases, including cancer. As a vast amount of microRNA expression profile data are now publicly available, the integration of microRNA expression data sets with gene expression profiles is a key research problem in life science research. However, the ability to conduct genome-wide microRNA-mRNA (gene) integration currently requires sophisticated, high-end informatics tools, significant expertise in bioinformatics and computer science to carry out the complex integration analysis. In addition, increased computing infrastructure capabilities are essential in order to accommodate large data sets. In this study, we have extended the BioVLAB cloud workbench to develop an environment for the integrated analysis of microRNA and mRNA expression data, named BioVLAB-MMIA. The workbench facilitates computations on the Amazon EC2 and S3 resources orchestrated by the XBaya Workflow Suite. The advantages of BioVLAB-MMIA over the web-based MMIA system include: 1) readily expanded as new computational tools become available; 2) easily modifiable by re-configuring graphic icons in the workflow; 3) on-demand cloud computing resources can be used on an "as needed" basis; 4) distributed orchestration supports complex and long running workflows asynchronously. We believe that BioVLAB-MMIA will be an easy-to-use computing environment for researchers who plan to perform genome-wide microRNA-mRNA (gene) integrated analysis tasks.
Motion/imagery secure cloud enterprise architecture analysis
NASA Astrophysics Data System (ADS)
DeLay, John L.
2012-06-01
Cloud computing with storage virtualization and new service-oriented architectures brings a new perspective to distributed motion imagery and persistent surveillance enterprises. Our existing research is focused mainly on content management, distributed analytics, and WAN distributed cloud networking performance issues of cloud-based technologies. The potential of leveraging cloud-based technologies for hosting motion imagery, imagery and analytics workflows for DOD and security applications is relatively unexplored. This paper will examine technologies for managing, storing, processing and disseminating motion imagery and imagery within a distributed network environment. Finally, we propose areas for future research in distributed cloud content management enterprises.
An adaptive process-based cloud infrastructure for space situational awareness applications
NASA Astrophysics Data System (ADS)
Liu, Bingwei; Chen, Yu; Shen, Dan; Chen, Genshe; Pham, Khanh; Blasch, Erik; Rubin, Bruce
2014-06-01
Space situational awareness (SSA) and defense space control capabilities are top priorities for groups that own or operate man-made spacecraft. Also, with the growing amount of space debris, there is an increase in demand for contextual understanding that necessitates the capability of collecting and processing a vast amount of sensor data. Cloud computing, which features scalable and flexible storage and computing services, has been recognized as an ideal candidate that can meet the large data contextual challenges as needed by SSA. Cloud computing consists of physical service providers and middleware virtual machines together with infrastructure, platform, and software as a service (IaaS, PaaS, SaaS) models. However, the typical Virtual Machine (VM) abstraction is on a per operating system basis, which is too low-level and limits the flexibility of a mission application architecture. In response to this technical challenge, a novel adaptive process-based cloud infrastructure for SSA applications is proposed in this paper. In addition, the details of the design rationale and a prototype are further examined. The SSA Cloud (SSAC) conceptual capability will potentially support space situation monitoring and tracking, object identification, and threat assessment. Lastly, the benefits of a more granular and flexible allocation of cloud computing resources are illustrated for data processing and implementation considerations within a representative SSA system environment. We show that container-based virtualization performs better than hypervisor-based virtualization technology in an SSA scenario.
Devi, D. Chitra; Uthariaraj, V. Rhymend
2016-01-01
Cloud computing uses the concepts of scheduling and load balancing to migrate tasks to underutilized VMs for effectively sharing the resources. The scheduling of the nonpreemptive tasks in the cloud computing environment is an irrecoverable restraint and hence it has to be assigned to the most appropriate VMs at the initial placement itself. Practically, the arrived jobs consist of multiple interdependent tasks and they may execute the independent tasks in multiple VMs or in the same VM's multiple cores. Also, the jobs arrive during the run time of the server in varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources by static or dynamic scheduling to make the cloud computing more efficient and thus it improves the user satisfaction. Objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm by considering the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. Performance of the proposed algorithm is studied by comparing with the existing methods. PMID:26955656
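The abstract above names VM capability and task length as inputs to the scheduler but does not give the algorithm itself. As a rough sketch of how those two inputs can drive an initial placement of nonpreemptive tasks, the following greedy heuristic assigns the longest tasks first, each to the VM on which it would finish earliest. This is a generic longest-task-first heuristic chosen for illustration, not the authors' method; it ignores task interdependency, and the names and units (`vm_speeds` in instructions per second, `task_lengths` in instructions) are assumptions.

```python
def assign_tasks(task_lengths, vm_speeds):
    """Greedily place each task once on the VM where it completes earliest.

    task_lengths: dict mapping task name -> length (hypothetical unit: instructions)
    vm_speeds:    dict mapping VM name -> capacity (hypothetical unit: instructions/second)
    Returns (assignment, finish_times).
    """
    finish = {vm: 0.0 for vm in vm_speeds}
    assignment = {}
    # Longest tasks first; ties broken by task name for determinism.
    ordered = sorted(task_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for task, length in ordered:
        # Estimated completion time of this task on each VM, given current load.
        best = min(
            vm_speeds,
            key=lambda vm: (finish[vm] + length / vm_speeds[vm], vm),
        )
        finish[best] += length / vm_speeds[best]
        assignment[task] = best  # nonpreemptive: the placement is never revisited
    return assignment, finish
```

With two VMs of speed 200 and 100 and tasks of length 400, 200 and 200, the sketch places the long task and one short task on the faster VM and the remaining short task on the slower one, so the overall completion time is 3 time units.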
Zhu, Lingyun; Li, Lianjie; Meng, Chunyan
2014-12-01
Existing real-time monitoring systems for multiple physiological parameters suffer from problems such as insufficient server capacity for physiological data storage and analysis (so that data consistency cannot be guaranteed), poor real-time performance, and other issues caused by the growing scale of data. We therefore proposed a new solution for multiple physiological parameters that uses clustered back-end data storage and processing based on cloud computing. Through our studies, batch processing for longitudinal analysis of patients' historical data was introduced. The work covered the resource virtualization of the IaaS layer of the cloud platform, the construction of a real-time computing platform at the PaaS layer, the reception and analysis of data streams at the SaaS layer, and the bottleneck problem of multi-parameter data transmission. The result was real-time transmission, storage and analysis of a large amount of physiological data. The simulation test results showed that the remote multiple physiological parameter monitoring system based on the cloud platform had obvious advantages in processing time and load balancing over the traditional server model. This architecture solved problems that exist in traditional remote medical services, including long turnaround time, poor real-time analysis performance, and lack of extensibility. It provides technical support for moving a "wearable wireless sensor plus mobile wireless transmission plus cloud computing service" mode of multiple physiological parameter wireless monitoring towards home health monitoring.
A Novel College Network Resource Management Method using Cloud Computing
NASA Astrophysics Data System (ADS)
Lin, Chen
At present, the information construction of colleges mainly consists of building college networks and management information systems, and many problems arise in the process of informatization. Cloud computing is a development of distributed processing, parallel processing and grid computing, in which data are stored in the cloud, software and services are placed in the cloud and built on top of various standards and protocols, and users can access them from all kinds of devices. This article introduces cloud computing and its functions, then analyzes the existing problems of college network resource management, and applies cloud computing technology and methods to the construction of a college information-sharing platform.
Cloud-based Jupyter Notebooks for Water Data Analysis
NASA Astrophysics Data System (ADS)
Castronova, A. M.; Brazil, L.; Seul, M.
2017-12-01
The development and adoption of technologies by the water science community to improve our ability to openly collaborate and share workflows will have a transformative impact on how we address the challenges associated with collaborative and reproducible scientific research. Jupyter notebooks offer one solution by providing an open-source platform for creating metadata-rich toolchains for modeling and data analysis applications. Adoption of this technology within the water sciences, coupled with publicly available datasets from agencies such as USGS, NASA, and EPA enables researchers to easily prototype and execute data intensive toolchains. Moreover, implementing this software stack in a cloud-based environment extends its native functionality to provide researchers a mechanism to build and execute toolchains that are too large or computationally demanding for typical desktop computers. Additionally, this cloud-based solution enables scientists to disseminate data processing routines alongside journal publications in an effort to support reproducibility. For example, these data collection and analysis toolchains can be shared, archived, and published using the HydroShare platform or downloaded and executed locally to reproduce scientific analysis. This work presents the design and implementation of a cloud-based Jupyter environment and its application for collecting, aggregating, and munging various datasets in a transparent, sharable, and self-documented manner. The goals of this work are to establish a free and open source platform for domain scientists to (1) conduct data intensive and computationally intensive collaborative research, (2) utilize high performance libraries, models, and routines within a pre-configured cloud environment, and (3) enable dissemination of research products. 
This presentation will discuss recent efforts towards achieving these goals, and describe the architectural design of the notebook server in an effort to support collaborative and reproducible science.
Translational Biomedical Informatics in the Cloud: Present and Future
Chen, Jiajia; Qian, Fuliang; Yan, Wenying; Shen, Bairong
2013-01-01
Next generation sequencing and other high-throughput experimental techniques of recent decades have driven the exponential growth in publicly available molecular and clinical data. This information explosion has prepared the ground for the development of translational bioinformatics. The scale and dimensionality of data, however, pose obvious challenges in data mining, storage, and integration. In this paper we demonstrated the utility and promise of cloud computing for tackling the big data problems. We also outline our vision that cloud computing could be an enabling tool to facilitate translational bioinformatics research. PMID:23586054
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we will discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for Xeon Phi. In particular, getting good performance required utilizing multiple cores and the wide vector operations and making efficient use of memory. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.
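The statement that there are no interactions among horizontal grid points means each vertical column can be updated independently, so the work maps directly onto a pool of workers. The sketch below shows only that structure. The per-column update is a toy placeholder (excess cloud water converted to rain at a fixed rate) and is not the Purdue-Lin physics; the function names, `threshold` and `rate` are hypothetical. In CPython a thread pool illustrates the decomposition rather than delivering a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def step_column(column, threshold=1.0, rate=0.5):
    """Toy update of one vertical column (placeholder, not Purdue-Lin physics).

    column is a pair (cloud, rain) of per-level mixing ratios. A fixed
    fraction of the cloud water above `threshold` is moved into rain.
    """
    cloud, rain = column
    new_cloud, new_rain = [], []
    for qc, qr in zip(cloud, rain):
        excess = max(qc - threshold, 0.0) * rate
        new_cloud.append(qc - excess)
        new_rain.append(qr + excess)
    return new_cloud, new_rain


def step_grid(columns, workers=4):
    """Update every column independently; no column reads another's data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(step_column, columns))
```

Because `step_column` depends only on its own argument, the parallel result is identical to a plain sequential loop over the columns, which is what makes this kind of scheme a good fit for many-core hardware.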
NASA Astrophysics Data System (ADS)
Dong, Yumin; Xiao, Shufen; Ma, Hongyang; Chen, Libo
2016-12-01
Cloud computing and big data have become the developing engine of current information technology (IT) as a result of the rapid development of IT. However, security protection has become increasingly important for cloud computing and big data, and has become a problem that must be solved to develop cloud computing. The theft of identity authentication information remains a serious threat to the security of cloud computing. In this process, attackers intrude into cloud computing services through identity authentication information, thereby threatening the security of data from multiple perspectives. Therefore, this study proposes a model for cloud computing protection and management based on quantum authentication, introduces the principle of quantum authentication, and deduces the quantum authentication process. In theory, quantum authentication technology can be applied in cloud computing for security protection. This technology cannot be cloned; thus, it is more secure and reliable than classical methods.
Towards real-time photon Monte Carlo dose calculation in the cloud
NASA Astrophysics Data System (ADS)
Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-01
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
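The phrase "up to 1% statistical uncertainty" describes a stopping criterion: histories are simulated until the relative standard error of the estimate falls below a target. The following toy sketch shows that loop for a much simpler quantity than dose, namely the fraction of photons that interact within a slab, with free path lengths drawn from an exponential distribution. It is not the authors' software; the function name, parameters and batch size are assumptions made for illustration.

```python
import math
import random


def interaction_fraction(mu, thickness, target_rel_error=0.01,
                         batch=10_000, seed=0, max_histories=10_000_000):
    """Estimate the fraction of photons interacting within a slab.

    mu:        attenuation coefficient (1/length), hypothetical input
    thickness: slab thickness (same length unit)
    Histories are run in batches until the relative standard error of the
    binomial estimate is at most target_rel_error, or max_histories is hit.
    Returns (estimate, number_of_histories).
    """
    rng = random.Random(seed)
    hits = 0
    n = 0
    p = 0.0
    while n < max_histories:
        for _ in range(batch):
            # Free path length before the first interaction.
            if rng.expovariate(mu) < thickness:
                hits += 1
        n += batch
        p = hits / n
        if 0.0 < p < 1.0:
            rel_error = math.sqrt(p * (1.0 - p) / n) / p
            if rel_error <= target_rel_error:
                break
    return p, n
```

For `mu=0.2` and `thickness=5.0` the analytic answer is 1 - exp(-1), roughly 0.632. Since each batch is independent of the others, batches can be farmed out to as many cloud workers as are available and their hit counts summed, which is the property that makes Monte Carlo transport scale well on dynamically provisioned resources.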
Towards Dynamic Remote Data Auditing in Computational Clouds
Sookhak, Mehdi; Akhunzada, Adnan; Gani, Abdullah; Khurram Khan, Muhammad; Anuar, Nor Badrul
2014-01-01
Cloud computing is a significant shift of computational paradigm where computing as a utility and storing data remotely have a great potential. Enterprise and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not continuously trustworthy due to the lack of control and physical possession of the data owners. To better streamline this issue, researchers have now focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable for static archive data and are not subject to audit the dynamically updated outsourced data. We propose an effectual RDA technique based on algebraic signature properties for cloud storage system and also present a new data structure capable of efficiently supporting dynamic data operations like append, insert, modify, and delete. Moreover, this data structure empowers our method to be applicable for large-scale data with minimum computation cost. The comparative analysis with the state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server. PMID:25121114
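The "algebraic signature properties" that such auditing schemes rely on can be illustrated with a small sketch. Under one common definition, assumed here, the signature of a block is the sum of its symbols weighted by successive powers of a field element alpha in a Galois field. That makes the signature linear: the signature of the XOR of two blocks equals the XOR of their signatures, so an auditor can check a combination of blocks using only stored signatures. The code below uses GF(2^8) with reduction polynomial 0x11D and alpha = 2, both chosen for illustration; it does not reproduce the paper's protocol or data structure.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) modulo the reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce back into 8 bits
        b >>= 1
    return result


def signature(block, alpha=2):
    """Algebraic signature: XOR-sum of block[i] * alpha**i over GF(2^8)."""
    sig, power = 0, 1
    for byte in block:
        sig ^= gf_mul(byte, power)
        power = gf_mul(power, alpha)
    return sig


def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks (addition in GF(2^8))."""
    return bytes(x ^ y for x, y in zip(a, b))
```

For any two equal-length blocks `a` and `b`, `signature(xor_blocks(a, b)) == signature(a) ^ signature(b)`. A verifier holding only the two signatures can therefore test a server's response for the combined block without retrieving either block in full.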
Establishing a Cloud Computing Success Model for Hospitals in Taiwan
Lian, Jiunn-Woei
2017-01-01
The purpose of this study is to understand the critical quality-related factors that affect cloud computing success of hospitals in Taiwan. In this study, private cloud computing is the major research target. The chief information officers participated in a questionnaire survey. The results indicate that the integration of trust into the information systems success model will have acceptable explanatory power to understand cloud computing success in the hospital. Moreover, information quality and system quality directly affect cloud computing satisfaction, whereas service quality indirectly affects the satisfaction through trust. In other words, trust serves as the mediator between service quality and satisfaction. This cloud computing success model will help hospitals evaluate or achieve success after adopting private cloud computing health care services. PMID:28112020
Implementation of cloud computing in higher education
NASA Astrophysics Data System (ADS)
Asniar; Budiawan, R.
2016-04-01
Cloud computing research is a new trend in distributed computing, in which people have developed service-based and SOA (Service Oriented Architecture) based applications. This technology is very useful to implement, especially in higher education. This research studies the need for and feasibility of cloud computing in higher education, and then proposes a model of cloud computing services for higher education in Indonesia that can be implemented to support academic activities. A literature study is used as the research methodology to arrive at the proposed model. Finally, SaaS and IaaS are the cloud computing services proposed for implementation in higher education in Indonesia, and hybrid cloud is the model that can be recommended.
The direction of cloud computing for Malaysian education sector in 21st century
NASA Astrophysics Data System (ADS)
Jaafar, Jazurainifariza; Rahman, M. Nordin A.; Kadir, M. Fadzil A.; Shamsudin, Syadiah Nor; Saany, Syarilla Iryani A.
2017-08-01
In the 21st century, technology has transformed the learning environment, making learning systems more effective and systematic. Education institutions now face many challenges in ensuring that the teaching and learning process runs smoothly and remains manageable. Challenges in current education management include a lack of integrated systems, high maintenance costs, difficulty of configuration and deployment, and the complexity of storage provision. Digital learning is an instructional practice that uses technology to make the learning experience more effective and the education process more systematic and attractive. Digital learning can be considered one of the prominent applications implemented in a cloud computing environment. Cloud computing is a type of network resource that provides on-demand services, where users can access applications from any location at any time. It also promises to minimize maintenance costs and to provide flexible data storage capacity. The aim of this article is to review the definition and types of cloud computing for improving digital learning management as required in 21st century education. The analysis of the digital learning context focuses on primary schools in Malaysia. Types of cloud applications and services in the education sector are also discussed. Finally, a gap analysis and directions for cloud computing in the education sector to face 21st century challenges are suggested.
Research on Key Technologies of Cloud Computing
NASA Astrophysics Data System (ADS)
Zhang, Shufen; Yan, Hongcan; Chen, Xuebin
With the development of multi-core processors, virtualization, distributed storage, broadband Internet and automatic management, a new computing mode named cloud computing has emerged. It distributes computation tasks over a resource pool consisting of a massive number of computers, so that application systems can obtain computing power, storage space and software services according to their demand. It can concentrate all the computing resources and manage them automatically in software, without human intervention. This frees application providers from tedious details and lets them concentrate on their business, which favors innovation and reduces cost. The ultimate goal of cloud computing is to provide computation, services and applications as a public utility, so that people can use computing resources just as they use water, electricity, gas and telephone. Currently, the understanding of cloud computing is developing and changing constantly, and cloud computing still has no unanimous definition. This paper describes the three main service forms of cloud computing (SaaS, PaaS and IaaS), compares the definitions of cloud computing given by Google, Amazon, IBM and other companies, summarizes the basic characteristics of cloud computing, and emphasizes key technologies such as data storage, data management, virtualization and the programming model.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the computing resources typically available to most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high-throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages a hybrid computing infrastructure consisting of the AWS cloud, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and the ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms.
Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
The Many Colors and Shapes of Cloud
NASA Astrophysics Data System (ADS)
Yeh, James T.
While many enterprises and business entities are deploying and exploiting Cloud Computing, academic institutes and researchers are also busy trying to wrestle this beast and put a leash on this possibly paradigm-changing computing model. Many have argued that Cloud Computing is nothing more than a name change of Utility Computing. Others have argued that Cloud Computing is a revolutionary change of the computing architecture. So it has been difficult to put a boundary around what is in Cloud Computing and what is not. I assert that it is equally difficult to find a group of people who would agree on even the definition of Cloud Computing. In actuality, maybe all those arguments are not necessary, as Clouds have many shapes and colors. In this presentation, the speaker will attempt to illustrate that the shape and the color of the cloud depend very much on the business goals one intends to achieve. It will be a very rich territory both for businesses to take advantage of the benefits of Cloud Computing and for academia to integrate technology research and business research.
NASA Astrophysics Data System (ADS)
Panitkin, Sergey; Barreiro Megino, Fernando; Caballero Bejar, Jose; Benjamin, Doug; Di Girolamo, Alessandro; Gable, Ian; Hendrix, Val; Hover, John; Kucharczyk, Katarzyna; Medrano Llamas, Ramon; Love, Peter; Ohman, Henrik; Paterson, Michael; Sobie, Randall; Taylor, Ryan; Walker, Rodney; Zaytsev, Alexander; Atlas Collaboration
2014-06-01
The computing model of the ATLAS experiment was designed around the concept of grid computing and, since the start of data taking, this model has proven very successful. However, new cloud computing technologies bring attractive features to improve the operations and elasticity of scientific distributed computing. ATLAS sees grid and cloud computing as complementary technologies that will coexist at different levels of resource abstraction, and two years ago created an R&D working group to investigate the different integration scenarios. The ATLAS Cloud Computing R&D has been able to demonstrate the feasibility of offloading work from grid to cloud sites and, as of today, is able to integrate transparently various cloud resources into the PanDA workload management system. The ATLAS Cloud Computing R&D is operating various PanDA queues on private and public resources and has provided several hundred thousand CPU days to the experiment. As a result, the ATLAS Cloud Computing R&D group has gained a significant insight into the cloud computing landscape and has identified points that still need to be addressed in order to fully utilize this technology. This contribution will explain the cloud integration models that are being evaluated and will discuss ATLAS' learning during the collaboration with leading commercial and academic cloud providers.
NASA Technical Reports Server (NTRS)
Molthan, A. L.; Haynes, J. A.; Case, J. L.; Jedlovec, G. L.; Lapenta, W. M.
2008-01-01
As computational power increases, operational forecast models are performing simulations with higher spatial resolution, allowing for the transition from sub-grid scale cloud parameterizations to an explicit forecast of cloud characteristics and precipitation through the use of single- or multi-moment bulk water microphysics schemes. Investments in space-borne and terrestrial remote sensing have developed the NASA CloudSat Cloud Profiling Radar and the NOAA National Weather Service NEXRAD system, each providing observations related to the bulk properties of clouds and precipitation through measurements of reflectivity. CloudSat and NEXRAD system radars observed light to moderate snowfall in association with a cold-season, midlatitude cyclone traversing the Central United States in February 2007. These systems are responsible for widespread cloud cover and various types of precipitation, are of economic consequence, and pose a challenge to operational forecasters. This event is simulated with the Weather Research and Forecast (WRF) Model, utilizing the NASA Goddard Cumulus Ensemble microphysics scheme. Comparisons are made between WRF-simulated and observed reflectivity available from the CloudSat and NEXRAD systems. The application of CloudSat reflectivity is made possible through the QuickBeam radiative transfer model, with cautious application applied in light of single scattering characteristics and spherical target assumptions. Significant differences are noted within modeled and observed cloud profiles, based upon simulated reflectivity, and modifications to the single-moment scheme are tested through a supplemental WRF forecast that incorporates a temperature dependent snow crystal size distribution.
Linearized radiative transfer models for retrieval of cloud parameters from EPIC/DSCOVR measurements
NASA Astrophysics Data System (ADS)
Molina García, Víctor; Sasi, Sruthy; Efremenko, Dmitry S.; Doicu, Adrian; Loyola, Diego
2018-07-01
In this paper, we describe several linearized radiative transfer models which can be used for the retrieval of cloud parameters from EPIC (Earth Polychromatic Imaging Camera) measurements. The approaches under examination are (1) the linearized forward approach, represented in this paper by the linearized discrete ordinate and matrix operator methods with matrix exponential, and (2) the forward-adjoint approach based on the discrete ordinate method with matrix exponential. To enhance the performance of the radiative transfer computations, the correlated k-distribution method and the Principal Component Analysis (PCA) technique are used. We provide a compact description of the proposed methods, as well as a numerical analysis of their accuracy and efficiency when simulating EPIC measurements in the oxygen A-band channel at 764 nm. We found that the computation time of the forward-adjoint approach using the correlated k-distribution method in conjunction with PCA is approximately 13 s for simultaneously computing the derivatives with respect to cloud optical thickness and cloud top height.
The Education Value of Cloud Computing
ERIC Educational Resources Information Center
Katzan, Harry, Jr.
2010-01-01
Cloud computing is a technique for supplying computer facilities and providing access to software via the Internet. Cloud computing represents a contextual shift in how computers are provisioned and accessed. One of the defining characteristics of cloud software service is the transfer of control from the client domain to the service provider.…
Cloud Computing. Technology Briefing. Number 1
ERIC Educational Resources Information Center
Alberta Education, 2013
2013-01-01
Cloud computing is Internet-based computing in which shared resources, software and information are delivered as a service that computers or mobile devices can access on demand. Cloud computing is already used extensively in education. Free or low-cost cloud-based services are used daily by learners and educators to support learning, social…
Can cloud computing benefit health services? - a SWOT analysis.
Kuo, Mu-Hsing; Kushniruk, Andre; Borycki, Elizabeth
2011-01-01
In this paper, we discuss cloud computing, the current state of cloud computing in healthcare, and the challenges and opportunities of adopting cloud computing in healthcare. A Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis was used to evaluate the feasibility of adopting this computing model in healthcare. The paper concludes that cloud computing could have huge benefits for healthcare but there are a number of issues that will need to be addressed before its widespread use in healthcare.
A PACS archive architecture supported on cloud services.
Silva, Luís A Bastião; Costa, Carlos; Oliveira, José Luis
2012-05-01
Diagnostic imaging procedures have continuously increased over the last decade and this trend may continue in coming years, creating a great impact on the storage and retrieval capabilities of current PACS. Moreover, many smaller centers do not have financial resources or requirements that justify the acquisition of a traditional infrastructure. Alternative solutions, such as cloud computing, may help address this emerging need. A tremendous amount of ubiquitous computational power, such as that provided by Google and Amazon, is used every day as a normal commodity. Taking advantage of this new paradigm, an architecture for a Cloud-based PACS archive that provides data privacy, integrity, and availability is proposed. The solution is independent of the cloud provider and the core modules were successfully instantiated in examples of two cloud computing providers. Operational metrics for several medical imaging modalities were tabulated and compared for Google Storage, Amazon S3, and LAN PACS. A PACS-as-a-Service archive that provides storage of medical studies using the Cloud was developed. The results show that the solution is robust and that it is possible to store, query, and retrieve all desired studies in a similar way as in a local PACS approach. Cloud computing is an emerging solution that promises high scalability of infrastructures, software, and applications, according to a "pay-as-you-go" business model. The presented architecture uses the cloud to set up medical data repositories and can have a significant impact on healthcare institutions by reducing IT infrastructures.
Storage element performance optimization for CMS analysis jobs
NASA Astrophysics Data System (ADS)
Behrmann, G.; Dahlblom, J.; Guldmyr, J.; Happonen, K.; Lindén, T.
2012-12-01
Tier-2 computing sites in the Worldwide Large Hadron Collider Computing Grid (WLCG) host CPU resources (Compute Element, CE) and storage resources (Storage Element, SE). The vast amount of data that needs to be processed from the Large Hadron Collider (LHC) experiments requires good and efficient use of the available resources. Good CPU efficiency for the end users' analysis jobs requires that the performance of the storage system can scale with I/O requests from hundreds or even thousands of simultaneous jobs. In this presentation we report on the work on improving the SE performance at the Helsinki Institute of Physics (HIP) Tier-2 used for the Compact Muon Solenoid (CMS) experiment at the LHC. Statistics from CMS grid jobs are collected and stored in the CMS Dashboard for further analysis, which allows for easy performance monitoring by the sites and by the CMS collaboration. As part of the monitoring framework CMS uses the JobRobot, which sends 100 analysis jobs to each site every four hours. CMS also uses the HammerCloud tool for site monitoring and stress testing, and it has replaced the JobRobot. The performance of the analysis workflow submitted with JobRobot or HammerCloud can be used to track performance changes due to site configuration changes, since the analysis workflow is kept the same for all sites and for months at a time. The CPU efficiency of the JobRobot jobs at HIP was increased approximately by 50 % to more than 90 %, by tuning the SE and by improvements in the CMSSW and dCache software. The performance of the CMS analysis jobs improved significantly too. Similar work has been done on other CMS Tier-sites, since on average the CPU efficiency for CMSSW jobs has increased during 2011. Better monitoring of the SE allows faster detection of problems, so that the performance level can be kept high.
The next storage upgrade at HIP consists of SAS disk enclosures which can be stress tested on demand with HammerCloud workflows, to make sure that the I/O-performance is good.
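The CPU efficiency figure quoted in this abstract can be made concrete with a small calculation. The sketch below assumes the common definition (CPU time divided by wall-clock time, per core); the abstract does not spell out the exact formula used by the CMS Dashboard, and the function names and job tuples are hypothetical.

```python
def cpu_efficiency(cpu_seconds, wall_seconds, cores=1):
    """CPU efficiency as CPU time over wall-clock time per core (assumed definition)."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


def aggregate_efficiency(jobs):
    """Efficiency over a batch of single-core jobs given as (cpu_seconds, wall_seconds).

    Totals are summed first, so long jobs weigh more than short ones.
    """
    total_cpu = sum(cpu for cpu, _ in jobs)
    total_wall = sum(wall for _, wall in jobs)
    return cpu_efficiency(total_cpu, total_wall)
```

A job that spends most of its wall-clock time waiting on storage I/O has a low ratio, which is why tuning the storage element can raise the value.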
Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing
NASA Astrophysics Data System (ADS)
Shi, X.
2017-10-01
Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task because of limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are ideal for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same holds when using a cluster of Intel's many-integrated-core (MIC) processors or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, a Field Programmable Gate Array (FPGA) may be a better solution for energy efficiency when its computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
Private genome analysis through homomorphic encryption
2015-01-01
Background: The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data. Methods: We present evaluation algorithms for secure computation of the minor allele frequencies and χ2 statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate Edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes: the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig. Results: The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them. Conclusions: The performance numbers for BGV are better than YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or χ2 test statistic in a case-control study. PMID:26733152
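For orientation, the two statistics named in the Methods can be written out on unencrypted data. This is only a plaintext reference sketch, not the homomorphic evaluation the paper describes: it assumes genotypes coded as 0, 1 or 2 copies of the alternate allele and uses the standard 2x2 allelic chi-square test, which may differ in detail from the statistic the authors evaluate. All function names are hypothetical.

```python
def allele_counts(genotypes):
    """Return (reference, alternate) allele counts for genotypes coded 0/1/2."""
    alt = sum(genotypes)
    ref = 2 * len(genotypes) - alt
    return ref, alt


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele in the sample."""
    ref, alt = allele_counts(genotypes)
    total = ref + alt
    if total == 0:
        raise ValueError("no genotypes given")
    return min(ref, alt) / total


def allelic_chi_square(cases, controls):
    """Chi-square statistic (1 degree of freedom) for the 2x2 allele count table."""
    a, b = allele_counts(cases)     # case reference / alternate alleles
    c, d = allele_counts(controls)  # control reference / alternate alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: a row or column is empty
    return n * (a * d - b * c) ** 2 / denom
```

Both quantities need only sums and a few multiplications of counts, which is consistent with the abstract's remark that they are low-degree computations.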
High performance data transfer
NASA Astrophysics Data System (ADS)
Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.
2017-10-01
The exponentially increasing need for high speed data transfer is driven by big data and cloud computing, together with the needs of data intensive science, High Performance Computing (HPC), defense, the oil and gas industry, etc. We report on the Zettar ZX software. This has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a scalable, balanced, easy to deploy and use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the-shelf components to test the ZX scalability and ability to balance services using multiple cores and links. The PoCs are based on SSD flash storage that is managed by a parallel file system. Each cluster occupies 4 rack units. Using the PoCs, between clusters we have achieved almost 200Gbps memory to memory over two 100Gbps links, and 70Gbps parallel file to parallel file with encryption over a 5000 mile 100Gbps link.
The structure of the clouds distributed operating system
NASA Technical Reports Server (NTRS)
Dasgupta, Partha; Leblanc, Richard J., Jr.
1989-01-01
A novel system architecture, based on the object model, is the central structuring concept used in the Clouds distributed operating system. This architecture makes Clouds attractive over a wide class of machines and environments. Clouds is a native operating system, designed and implemented at Georgia Tech, and runs on a set of general purpose computers connected via a local area network. The system architecture of Clouds is composed of a system-wide global set of persistent (long-lived) virtual address spaces, called objects, that contain persistent data and code. The object concept is implemented at the operating system level, thus presenting a single level storage view to the user. Lightweight threads carry computational activity through the code stored in the objects. The persistent objects and threads give rise to a programming environment composed of shared permanent memory, dispensing with the need for hardware-derived concepts such as file systems and message systems. Though the hardware may be distributed and may have disks and networks, Clouds provides the applications with a logically centralized system, based on a shared, structured, single level store. The current design of Clouds uses a minimalist philosophy with respect to both the kernel and the operating system. That is, the kernel and the operating system support a bare minimum of functionality. Clouds also adheres to the concept of separation of policy and mechanism. Most low-level operating system services are implemented above the kernel and most high level services are implemented at the user level. From the measured performance of using the kernel mechanisms, we are able to demonstrate that efficient implementations are feasible for the object model on commercially available hardware. Clouds provides a rich environment for conducting research in distributed systems.
Some of the topics addressed in this paper include distributed programming environments, consistency of persistent data and fault-tolerance.
If It's in the Cloud, Get It on Paper: Cloud Computing Contract Issues
ERIC Educational Resources Information Center
Trappler, Thomas J.
2010-01-01
Much recent discussion has focused on the pros and cons of cloud computing. Some institutions are attracted to cloud computing benefits such as rapid deployment, flexible scalability, and low initial start-up cost, while others are concerned about cloud computing risks such as those related to data location, level of service, and security…
A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy
NASA Astrophysics Data System (ADS)
Popic, Victoria; Batzoglou, Serafim
2017-05-01
Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party.
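The locality sensitive hashing idea behind this record can be illustrated with a generic MinHash sketch over k-mer sets. This is an assumption-laden illustration, not Balaur's implementation: the k-mer length, the number of hash functions, the salted BLAKE2b hashing and all function names are hypothetical choices, and the kmer voting and privacy-preserving steps are not shown.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """64-bit hash of a k-mer; the seed selects one member of a hash family."""
    digest = hashlib.blake2b(
        kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, k=8, num_hashes=64):
    """For each hash function, keep the minimum hash value over the k-mer set."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Two reads that share most of their k-mers tend to agree in most signature positions, so candidate mapping locations can be found by comparing short signatures rather than full sequences.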
A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy
Popic, Victoria; Batzoglou, Serafim
2017-01-01
Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party. PMID:28508884
Development of a satellite-based nowcasting system for surface solar radiation
NASA Astrophysics Data System (ADS)
Limbach, Sebastian; Hungershoefer, Katja; Müller, Richard; Trentmann, Jörg; Asmus, Jörg; Schömer, Elmar; Groß, André
2014-05-01
The goal of the RadNowCast project was the development of a tool-chain for a satellite-based nowcasting of the all sky global and direct surface solar radiation. One important application of such short-term forecasts is the computation of the expected energy yield of photovoltaic systems. This information is of great importance for an efficient balancing of power generation and consumption in large, decentralized power grids. Our nowcasting approach is based on an optical-flow analysis of a series of Meteosat SEVIRI satellite images. For this, we extended and combined several existing software tools and set up a series of benchmarks for determining the optimal forecasting parameters. The first step in our processing-chain is the determination of the cloud albedo from the HRV (High Resolution Visible)-satellite images using a Heliosat-type method. The actual nowcasting is then performed by a commercial software system in two steps: First, vector fields characterizing the movement of the clouds are derived from the cloud albedo data from the previous 15 min to 2 hours. Next, these vector fields are combined with the most recent cloud albedo data in order to extrapolate the cloud albedo in the near future. In the last step of the processing, the Gnu-Magic software is used to calculate the global and direct solar radiation based on the forecasted cloud albedo data. For an evaluation of the strengths and weaknesses of our nowcasting system, we analyzed four different benchmarks, each of which covered different weather conditions. We compared the forecasted data with radiation data derived from the real satellite images of the corresponding time steps. The impact of different parameters on the cloud albedo nowcasting and the surface radiation computation has been analysed. Additionally, we could show that our cloud-albedo-based forecasts outperform forecasts based on the original HRV images.
Possible future extensions are the incorporation of additional data sources, for example NWC-SAF high resolution wind fields, in order to improve the quality of the atmospheric motion fields, and experiments with custom, optimized software components for the optical-flow estimation and the nowcasting.
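The extrapolation step described in this abstract (moving the latest cloud albedo field along a motion field) can be sketched in a heavily simplified form. The sketch assumes a single uniform integer displacement per time step instead of the dense vector fields derived by optical flow, and it is not the commercial system or the Gnu-Magic software mentioned in the abstract; `advect` and `nowcast` are hypothetical names.

```python
def advect(field, dx, dy, fill=0.0):
    """Shift a 2D grid (list of rows) by dx columns and dy rows.

    Cells that move outside the grid are dropped; cells with no source
    value are set to `fill`.
    """
    rows, cols = len(field), len(field[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = field[r][c]
    return out


def nowcast(field, dx, dy, steps):
    """Extrapolate the field by applying the same displacement `steps` times."""
    for _ in range(steps):
        field = advect(field, dx, dy)
    return field
```

With a real motion field, each cell would carry its own displacement and the shifted values would be interpolated rather than copied cell to cell.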
Introducing the Cloud in an Introductory IT Course
ERIC Educational Resources Information Center
Woods, David M.
2018-01-01
Cloud computing is a rapidly emerging topic, but should it be included in an introductory IT course? The magnitude of cloud computing use, especially cloud infrastructure, along with students' limited knowledge of the topic support adding cloud content to the IT curriculum. There are several arguments that support including cloud computing in an…
Enabling Earth Science Through Cloud Computing
NASA Technical Reports Server (NTRS)
Hardman, Sean; Riofrio, Andres; Shams, Khawaja; Freeborn, Dana; Springer, Paul; Chafin, Brian
2012-01-01
Cloud Computing holds tremendous potential for missions across the National Aeronautics and Space Administration. Several flight missions are already benefiting from an investment in cloud computing for mission critical pipelines and services through faster processing time, higher availability, and drastically lower costs available on cloud systems. However, these processes do not currently extend to general scientific algorithms relevant to earth science missions. The members of the Airborne Cloud Computing Environment task at the Jet Propulsion Laboratory have worked closely with the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to integrate cloud computing into their science data processing pipeline. This paper details the efforts involved in deploying a science data system for the CARVE mission, evaluating and integrating cloud computing solutions with the system and porting their science algorithms for execution in a cloud environment.
Enhancing Security by System-Level Virtualization in Cloud Computing Environments
NASA Astrophysics Data System (ADS)
Sun, Dawei; Chang, Guiran; Tan, Chunguang; Wang, Xingwei
Many trends are opening up the era of cloud computing, which will reshape the IT industry. Virtualization techniques have become an indispensable ingredient of almost all cloud computing systems. Through virtual environments, a cloud provider is able to run a variety of operating systems as needed by each cloud user. Virtualization can improve the reliability, security, and availability of applications through consolidation, isolation, and fault tolerance. In addition, it is possible to balance workloads by using live migration techniques. In this paper, a definition of cloud computing is given, and then the service and deployment models are introduced. Security issues and challenges in the implementation of cloud computing are analyzed. Moreover, a system-level virtualization case is established to enhance the security of cloud computing environments.
ATLAS computing on Swiss Cloud SWITCHengines
NASA Astrophysics Data System (ADS)
Haug, S.; Sciacca, F. G.; ATLAS Collaboration
2017-10-01
Consolidation towards more computing at flat budgets, beyond what pure chip technology can offer, is a requirement for the full scientific exploitation of the future data from the Large Hadron Collider at CERN in Geneva. One consolidation measure is to exploit cloud infrastructures whenever they are financially competitive. We report on the technical solutions used and the performance achieved running simulation tasks for the ATLAS experiment on SWITCHengines. SWITCHengines is a new infrastructure as a service offered to Swiss academia by the National Research and Education Network SWITCH. While solutions and performances are general, financial considerations and policies, on which we also report, are country specific.
Military clouds: utilization of cloud computing systems at the battlefield
NASA Astrophysics Data System (ADS)
Süleyman, Sarıkürk; Volkan, Karaca; İbrahim, Kocaman; Ahmet, Şirzai
2012-05-01
Cloud computing is known as a novel information technology (IT) concept, which involves facilitated and rapid access to networks, servers, data storage media, applications and services via the Internet with minimum hardware requirements. Use of information systems and technologies on the battlefield is not new. Information superiority is a force multiplier and is crucial to mission success. Recent advances in information systems and technologies provide new means for decision makers and users to gain information superiority. These developments in information technologies have led to a new term, network centric capability. Like network centric capable systems, cloud computing systems are operational today. In the near future, extensive use of military clouds on the battlefield is predicted. Integrating cloud computing logic into network centric applications will increase the flexibility, cost-effectiveness, efficiency and accessibility of network-centric capabilities. In this paper, cloud computing and network centric capability concepts are defined. Some commercial cloud computing products and applications are mentioned. Network centric capable applications are covered. Cloud computing supported battlefield applications are analyzed. The effects of cloud computing systems on network centric capability and on the information domain in future warfare are discussed. Battlefield opportunities and novelties which might be introduced to network centric capability by cloud computing systems are examined. The role of military clouds in future warfare is proposed in this paper. It is concluded that military clouds will be indispensable components of the future battlefield. Military clouds have the potential to improve network centric capabilities, increase situational awareness on the battlefield and facilitate the establishment of information superiority.