A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-08
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4_ speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing
Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui
2017-01-01
Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4× speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration. PMID:28075343
On the Modeling and Management of Cloud Data Analytics
NASA Astrophysics Data System (ADS)
Castillo, Claris; Tantawi, Asser; Steinder, Malgorzata; Pacifici, Giovanni
A new era is dawning where vast amount of data is subjected to intensive analysis in a cloud computing environment. Over the years, data about a myriad of things, ranging from user clicks to galaxies, have been accumulated, and continue to be collected, on storage media. The increasing availability of such data, along with the abundant supply of compute power and the urge to create useful knowledge, gave rise to a new data analytics paradigm in which data is subjected to intensive analysis, and additional data is created in the process. Meanwhile, a new cloud computing environment has emerged where seemingly limitless compute and storage resources are being provided to host computation and data for multiple users through virtualization technologies. Such a cloud environment is becoming the home for data analytics. Consequently, providing good performance at run-time to data analytics workload is an important issue for cloud management. In this paper, we provide an overview of the data analytics and cloud environment landscapes, and investigate the performance management issues related to running data analytics in the cloud. In particular, we focus on topics such as workload characterization, profiling analytics applications and their pattern of data usage, cloud resource allocation, placement of computation and data and their dynamic migration in the cloud, and performance prediction. In solving such management problems one relies on various run-time analytic models. We discuss approaches for modeling and optimizing the dynamic data analytics workload in the cloud environment. All along, we use the Map-Reduce paradigm as an illustration of data analytics.
COMBAT: mobile-Cloud-based cOmpute/coMmunications infrastructure for BATtlefield applications
NASA Astrophysics Data System (ADS)
Soyata, Tolga; Muraleedharan, Rajani; Langdon, Jonathan; Funai, Colin; Ames, Scott; Kwon, Minseok; Heinzelman, Wendi
2012-05-01
The amount of data processed annually over the Internet has crossed the zetabyte boundary, yet this Big Data cannot be efficiently processed or stored using today's mobile devices. Parallel to this explosive growth in data, a substantial increase in mobile compute-capability and the advances in cloud computing have brought the state-of-the- art in mobile-cloud computing to an inflection point, where the right architecture may allow mobile devices to run applications utilizing Big Data and intensive computing. In this paper, we propose the MObile Cloud-based Hybrid Architecture (MOCHA), which formulates a solution to permit mobile-cloud computing applications such as object recognition in the battlefield by introducing a mid-stage compute- and storage-layer, called the cloudlet. MOCHA is built on the key observation that many mobile-cloud applications have the following characteristics: 1) they are compute-intensive, requiring the compute-power of a supercomputer, and 2) they use Big Data, requiring a communications link to cloud-based database sources in near-real-time. In this paper, we describe the operation of MOCHA in battlefield applications, by formulating the aforementioned mobile and cloudlet to be housed within a soldier's vest and inside a military vehicle, respectively, and enabling access to the cloud through high latency satellite links. We provide simulations using the traditional mobile-cloud approach as well as utilizing MOCHA with a mid-stage cloudlet to quantify the utility of this architecture. We show that the MOCHA platform for mobile-cloud computing promises a future for critical battlefield applications that access Big Data, which is currently not possible using existing technology.
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
2013-01-28
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research--GATECloud. net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.
Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.
Cloud computing applications for biomedical science: A perspective
2018-01-01
Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations start to adopt cloud computing for better utilizing computing resources by taking advantage of its scalability, cost reduction, and easy to access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting Geosciences. This paper provides a comprehensive study of three open-source cloud solutions, including OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) no significant performance differences in central processing unit (CPU), memory and I/O of virtual machines created and managed by different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) Cloudstack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable for supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
Enabling Large-Scale Biomedical Analysis in the Cloud
Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen
2013-01-01
Recent progress in high-throughput instrumentations has led to an astonishing growth in both volume and complexity of biomedical data collected from various sources. The planet-size data brings serious challenges to the storage and computing technologies. Cloud computing is an alternative to crack the nut because it gives concurrent consideration to enable storage and high-performance computing on large-scale data. This work briefly introduces the data intensive computing system and summarizes existing cloud-based resources in bioinformatics. These developments and applications would facilitate biomedical research to make the vast amount of diversification data meaningful and usable. PMID:24288665
Yang, Shuai; Zhang, Xinlei; Diao, Lihong; Guo, Feifei; Wang, Dan; Liu, Zhongyang; Li, Honglei; Zheng, Junjie; Pan, Jingshan; Nice, Edouard C; Li, Dong; He, Fuchu
2015-09-04
The Chromosome-centric Human Proteome Project (C-HPP) aims to catalog genome-encoded proteins using a chromosome-by-chromosome strategy. As the C-HPP proceeds, the increasing requirement for data-intensive analysis of the MS/MS data poses a challenge to the proteomic community, especially small laboratories lacking computational infrastructure. To address this challenge, we have updated the previous CAPER browser into a higher version, CAPER 3.0, which is a scalable cloud-based system for data-intensive analysis of C-HPP data sets. CAPER 3.0 uses cloud computing technology to facilitate MS/MS-based peptide identification. In particular, it can use both public and private cloud, facilitating the analysis of C-HPP data sets. CAPER 3.0 provides a graphical user interface (GUI) to help users transfer data, configure jobs, track progress, and visualize the results comprehensively. These features enable users without programming expertise to easily conduct data-intensive analysis using CAPER 3.0. Here, we illustrate the usage of CAPER 3.0 with four specific mass spectral data-intensive problems: detecting novel peptides, identifying single amino acid variants (SAVs) derived from known missense mutations, identifying sample-specific SAVs, and identifying exon-skipping events. CAPER 3.0 is available at http://prodigy.bprc.ac.cn/caper3.
Genomic cloud computing: legal and ethical points to consider
Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Burton, Paul; Chisholm, Rex; Fortier, Isabel; Goodwin, Pat; Harris, Jennifer; Hveem, Kristian; Kaye, Jane; Kent, Alistair; Knoppers, Bartha Maria; Lindpaintner, Klaus; Little, Julian; Riegman, Peter; Ripatti, Samuli; Stolk, Ronald; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn; Joly, Yann; Kato, Kazuto; Knoppers, Bartha Maria; Rodriguez, Laura Lyman; McPherson, Treasa; Nicolás, Pilar; Ouellette, Francis; Romeo-Casabona, Carlos; Sarin, Rajiv; Wallace, Susan; Wiesner, Georgia; Wilson, Julia; Zeps, Nikolajs; Simkevitz, Howard; De Rienzo, Assunta; Knoppers, Bartha M
2015-01-01
The biggest challenge in twenty-first century data-intensive genomic science, is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key ‘points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These ‘points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure. PMID:25248396
Genomic cloud computing: legal and ethical points to consider.
Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Knoppers, Bartha M
2015-10-01
The biggest challenge in twenty-first century data-intensive genomic science, is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key 'points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These 'points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure.
Security Risks of Cloud Computing and Its Emergence as 5th Utility Service
NASA Astrophysics Data System (ADS)
Ahmad, Mushtaq
Cloud Computing is being projected by the major cloud services provider IT companies such as IBM, Google, Yahoo, Amazon and others as fifth utility where clients will have access for processing those applications and or software projects which need very high processing speed for compute intensive and huge data capacity for scientific, engineering research problems and also e- business and data content network applications. These services for different types of clients are provided under DASM-Direct Access Service Management based on virtualization of hardware, software and very high bandwidth Internet (Web 2.0) communication. The paper reviews these developments for Cloud Computing and Hardware/Software configuration of the cloud paradigm. The paper also examines the vital aspects of security risks projected by IT Industry experts, cloud clients. The paper also highlights the cloud provider's response to cloud security risks.
A lightweight distributed framework for computational offloading in mobile cloud computing.
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC.
A Lightweight Distributed Framework for Computational Offloading in Mobile Cloud Computing
Shiraz, Muhammad; Gani, Abdullah; Ahmad, Raja Wasim; Adeel Ali Shah, Syed; Karim, Ahmad; Rahman, Zulkanain Abdul
2014-01-01
The latest developments in mobile computing technology have enabled intensive applications on the modern Smartphones. However, such applications are still constrained by limitations in processing potentials, storage capacity and battery lifetime of the Smart Mobile Devices (SMDs). Therefore, Mobile Cloud Computing (MCC) leverages the application processing services of computational clouds for mitigating resources limitations in SMDs. Currently, a number of computational offloading frameworks are proposed for MCC wherein the intensive components of the application are outsourced to computational clouds. Nevertheless, such frameworks focus on runtime partitioning of the application for computational offloading, which is time consuming and resources intensive. The resource constraint nature of SMDs require lightweight procedures for leveraging computational clouds. Therefore, this paper presents a lightweight framework which focuses on minimizing additional resources utilization in computational offloading for MCC. The framework employs features of centralized monitoring, high availability and on demand access services of computational clouds for computational offloading. As a result, the turnaround time and execution cost of the application are reduced. The framework is evaluated by testing prototype application in the real MCC environment. The lightweight nature of the proposed framework is validated by employing computational offloading for the proposed framework and the latest existing frameworks. Analysis shows that by employing the proposed framework for computational offloading, the size of data transmission is reduced by 91%, energy consumption cost is minimized by 81% and turnaround time of the application is decreased by 83.5% as compared to the existing offloading frameworks. Hence, the proposed framework minimizes additional resources utilization and therefore offers lightweight solution for computational offloading in MCC. PMID:25127245
A Cost-Benefit Study of Doing Astrophysics On The Cloud: Production of Image Mosaics
NASA Astrophysics Data System (ADS)
Berriman, G. B.; Good, J. C. Deelman, E.; Singh, G. Livny, M.
2009-09-01
Utility grids such as the Amazon EC2 and Amazon S3 clouds offer computational and storage resources that can be used on-demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. We studied via simulation the cost performance trade-offs of different execution and resource provisioning plans by creating, under the Amazon cloud fee structure, mosaics with the Montage image mosaic engine, a widely used data- and compute-intensive application. Specifically, we studied the cost of building mosaics of 2MASS data that have sizes of 1, 2 and 4 square degrees, and a 2MASS all-sky mosaic. These are examples of mosaics commonly generated by astronomers. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archiving. Our results show that by provisioning the right amount of storage and compute resources cost can be significantly reduced with no significant impact on application performance.
Cloud Computing Boosts Business Intelligence of Telecommunication Industry
NASA Astrophysics Data System (ADS)
Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling
Business Intelligence becomes an attracting topic in today's data intensive applications, especially in telecommunication industry. Meanwhile, Cloud Computing providing IT supporting Infrastructure with excellent scalability, large scale storage, and high performance becomes an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce based parallel data mining platform developed by CMRI (China Mobile Research Institute) to fit the urgent requirements of business intelligence in telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with the experimental evaluation and case studies of its applications. The evaluation result demonstrates both the usability and the cost-effectiveness of Cloud Computing based Business Intelligence system in applications of telecommunication industry.
Streaming support for data intensive cloud-based sequence analysis.
Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
The emerging role of cloud computing in molecular modelling.
Ebejer, Jean-Paul; Fulle, Simone; Morris, Garrett M; Finn, Paul W
2013-07-01
There is a growing recognition of the importance of cloud computing for large-scale and data-intensive applications. The distinguishing features of cloud computing and their relationship to other distributed computing paradigms are described, as are the strengths and weaknesses of the approach. We review the use made to date of cloud computing for molecular modelling projects and the availability of front ends for molecular modelling applications. Although the use of cloud computing technologies for molecular modelling is still in its infancy, we demonstrate its potential by presenting several case studies. Rapid growth can be expected as more applications become available and costs continue to fall; cloud computing can make a major contribution not just in terms of the availability of on-demand computing power, but could also spur innovation in the development of novel approaches that utilize that capacity in more effective ways. Copyright © 2013 Elsevier Inc. All rights reserved.
Integrating the Apache Big Data Stack with HPC for Big Data
NASA Astrophysics Data System (ADS)
Fox, G. C.; Qiu, J.; Jha, S.
2014-12-01
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However, the same is not so true for data intensive computing, even though commercially clouds devote much more resources to data analytics than supercomputers devote to simulations. We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures. We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks and use these to identify a few key classes of hardware/software architectures. Our analysis builds on combining HPC and ABDS the Apache big data software stack that is well used in modern cloud computing. Initial results on clouds and HPC systems are encouraging. We propose the development of SPIDAL - Scalable Parallel Interoperable Data Analytics Library -- built on system aand data abstractions suggested by the HPC-ABDS architecture. We discuss how it can be used in several application areas including Polar Science.
Streaming Support for Data Intensive Cloud-Based Sequence Analysis
Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
NASA Astrophysics Data System (ADS)
Wan, Junwei; Chen, Hongyan; Zhao, Jing
2017-08-01
According to the requirements of real-time, reliability and safety for aerospace experiment, the single center cloud computing technology application verification platform is constructed. At the IAAS level, the feasibility of the cloud computing technology be applied to the field of aerospace experiment is tested and verified. Based on the analysis of the test results, a preliminary conclusion is obtained: Cloud computing platform can be applied to the aerospace experiment computing intensive business. For I/O intensive business, it is recommended to use the traditional physical machine.
Hybrid cloud and cluster computing paradigms for life science applications
2010-01-01
Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.
Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey
2010-12-21
Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem
NASA Astrophysics Data System (ADS)
Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.
2015-12-01
Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Subsequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.
Cloud-based Jupyter Notebooks for Water Data Analysis
NASA Astrophysics Data System (ADS)
Castronova, A. M.; Brazil, L.; Seul, M.
2017-12-01
The development and adoption of technologies by the water science community to improve our ability to openly collaborate and share workflows will have a transformative impact on how we address the challenges associated with collaborative and reproducible scientific research. Jupyter notebooks offer one solution by providing an open-source platform for creating metadata-rich toolchains for modeling and data analysis applications. Adoption of this technology within the water sciences, coupled with publicly available datasets from agencies such as USGS, NASA, and EPA enables researchers to easily prototype and execute data intensive toolchains. Moreover, implementing this software stack in a cloud-based environment extends its native functionality to provide researchers a mechanism to build and execute toolchains that are too large or computationally demanding for typical desktop computers. Additionally, this cloud-based solution enables scientists to disseminate data processing routines alongside journal publications in an effort to support reproducibility. For example, these data collection and analysis toolchains can be shared, archived, and published using the HydroShare platform or downloaded and executed locally to reproduce scientific analysis. This work presents the design and implementation of a cloud-based Jupyter environment and its application for collecting, aggregating, and munging various datasets in a transparent, sharable, and self-documented manner. The goals of this work are to establish a free and open source platform for domain scientists to (1) conduct data intensive and computationally intensive collaborative research, (2) utilize high performance libraries, models, and routines within a pre-configured cloud environment, and (3) enable dissemination of research products. This presentation will discuss recent efforts towards achieving these goals, and describe the architectural design of the notebook server in an effort to support collaborative and reproducible science.
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.
2015-01-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363
Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L
2015-02-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage
Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-01-01
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query. PMID:29652810
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.
Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-04-13
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query.
From cosmos to connectomes: the evolution of data-intensive science.
Burns, Randal; Vogelstein, Joshua T; Szalay, Alexander S
2014-09-17
The analysis of data requires computation: originally by hand and more recently by computers. Different models of computing are designed and optimized for different kinds of data. In data-intensive science, the scale and complexity of data exceeds the comfort zone of local data stores on scientific workstations. Thus, cloud computing emerges as the preeminent model, utilizing data centers and high-performance clusters, enabling remote users to access and query subsets of the data efficiently. We examine how data-intensive computational systems originally built for cosmology, the Sloan Digital Sky Survey (SDSS), are now being used in connectomics, at the Open Connectome Project. We list lessons learned and outline the top challenges we expect to face. Success in computational connectomics would drastically reduce the time between idea and discovery, as SDSS did in cosmology. Copyright © 2014 Elsevier Inc. All rights reserved.
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on-demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just requires a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amount of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address thesemore » challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.« less
Integration of drug dosing data with physiological data streams using a cloud computing paradigm.
Bressan, Nadja; James, Andrew; McGregor, Carolyn
2013-01-01
Many drugs are used during the provision of intensive care for the preterm newborn infant. Recommendations for drug dosing in newborns depend upon data from population based pharmacokinetic research. There is a need to be able to modify drug dosing in response to the preterm infant's response to the standard dosing recommendations. The real-time integration of physiological data with drug dosing data would facilitate individualised drug dosing for these immature infants. This paper proposes the use of a novel computational framework that employs real-time, temporal data analysis for this task. Deployment of the framework within the cloud computing paradigm will enable widespread distribution of individualized drug dosing for newborn infants.
Cloud Computing for Protein-Ligand Binding Site Comparison
2013-01-01
The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery. PMID:23762824
Cloud computing for protein-ligand binding site comparison.
Hung, Che-Lun; Hua, Guan-Jie
2013-01-01
The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery.
Evaluating virtual hosted desktops for graphics-intensive astronomy
NASA Astrophysics Data System (ADS)
Meade, B. F.; Fluke, C. J.
2018-04-01
Visualisation of data is critical to understanding astronomical phenomena. Today, many instruments produce datasets that are too big to be downloaded to a local computer, yet many of the visualisation tools used by astronomers are deployed only on desktop computers. Cloud computing is increasingly used to provide a computation and simulation platform in astronomy, but it also offers great potential as a visualisation platform. Virtual hosted desktops, with graphics processing unit (GPU) acceleration, allow interactive, graphics-intensive desktop applications to operate co-located with astronomy datasets stored in remote data centres. By combining benchmarking and user experience testing, with a cohort of 20 astronomers, we investigate the viability of replacing physical desktop computers with virtual hosted desktops. In our work, we compare two Apple MacBook computers (one old and one new, representing hardware and opposite ends of the useful lifetime) with two virtual hosted desktops: one commercial (Amazon Web Services) and one in a private research cloud (the Australian NeCTAR Research Cloud). For two-dimensional image-based tasks and graphics-intensive three-dimensional operations - typical of astronomy visualisation workflows - we found that benchmarks do not necessarily provide the best indication of performance. When compared to typical laptop computers, virtual hosted desktops can provide a better user experience, even with lower performing graphics cards. We also found that virtual hosted desktops are equally simple to use, provide greater flexibility in choice of configuration, and may actually be a more cost-effective option for typical usage profiles.
Cloud computing: a new business paradigm for biomedical information sharing.
Rosenthal, Arnon; Mork, Peter; Li, Maya Hao; Stanford, Jean; Koester, David; Reynolds, Patti
2010-04-01
We examine how the biomedical informatics (BMI) community, especially consortia that share data and applications, can take advantage of a new resource called "cloud computing". Clouds generally offer resources on demand. In most clouds, charges are pay per use, based on large farms of inexpensive, dedicated servers, sometimes supporting parallel computing. Substantial economies of scale potentially yield costs much lower than dedicated laboratory systems or even institutional data centers. Overall, even with conservative assumptions, for applications that are not I/O intensive and do not demand a fully mature environment, the numbers suggested that clouds can sometimes provide major improvements, and should be seriously considered for BMI. Methodologically, it was very advantageous to formulate analyses in terms of component technologies; focusing on these specifics enabled us to bypass the cacophony of alternative definitions (e.g., exactly what does a cloud include) and to analyze alternatives that employ some of the component technologies (e.g., an institution's data center). Relative analyses were another great simplifier. Rather than listing the absolute strengths and weaknesses of cloud-based systems (e.g., for security or data preservation), we focus on the changes from a particular starting point, e.g., individual lab systems. We often find a rough parity (in principle), but one needs to examine individual acquisitions--is a loosely managed lab moving to a well managed cloud, or a tightly managed hospital data center moving to a poorly safeguarded cloud? 2009 Elsevier Inc. All rights reserved.
Federated data storage system prototype for LHC experiments and data intensive science
NASA Astrophysics Data System (ADS)
Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Ryabinkin, E.; Zarochentsev, A.
2017-10-01
Rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) prompted physics computing community to evaluate new data handling and processing solutions. Russian grid sites and universities’ clusters scattered over a large area aim at the task of uniting their resources for future productive work, at the same time giving an opportunity to support large physics collaborations. In our project we address the fundamental problem of designing a computing architecture to integrate distributed storage resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. Studies include development and implementation of federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and University clusters within one National Cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. This project intends to implement a federated distributed storage for all kind of operations such as read/write/transfer and access via WAN from Grid centres, university clusters, supercomputers, academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests including real data processing and analysis workflows from ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present topology and architecture of the designed system, report performance and statistics for different access patterns and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and reformations of computing style, for instance how bioinformatics program running on supercomputers can read/write data from the federated storage.
Cloud Based Metalearning System for Predictive Modeling of Biomedical Data
Vukićević, Milan
2014-01-01
Rapid growth and storage of biomedical data enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other side analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. This problem is addressed by proposing a cloud based system that integrates metalearning framework for ranking and selection of best predictive algorithms for data at hand and open source big data technologies for analysis of biomedical data. PMID:24892101
Application of microarray analysis on computer cluster and cloud platforms.
Bernau, C; Boulesteix, A-L; Knaus, J
2013-01-01
Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
Neylon, J; Min, Y; Kupelian, P; Low, D A; Santhanam, A
2017-04-01
In this paper, a multi-GPU cloud-based server (MGCS) framework is presented for dose calculations, exploring the feasibility of remote computing power for parallelization and acceleration of computationally and time intensive radiotherapy tasks in moving toward online adaptive therapies. An analytical model was developed to estimate theoretical MGCS performance acceleration and intelligently determine workload distribution. Numerical studies were performed with a computing setup of 14 GPUs distributed over 4 servers interconnected by a 1 Gigabits per second (Gbps) network. Inter-process communication methods were optimized to facilitate resource distribution and minimize data transfers over the server interconnect. The analytically predicted computation time predicted matched experimentally observations within 1-5 %. MGCS performance approached a theoretical limit of acceleration proportional to the number of GPUs utilized when computational tasks far outweighed memory operations. The MGCS implementation reproduced ground-truth dose computations with negligible differences, by distributing the work among several processes and implemented optimization strategies. The results showed that a cloud-based computation engine was a feasible solution for enabling clinics to make use of fast dose calculations for advanced treatment planning and adaptive radiotherapy. The cloud-based system was able to exceed the performance of a local machine even for optimized calculations, and provided significant acceleration for computationally intensive tasks. Such a framework can provide access to advanced technology and computational methods to many clinics, providing an avenue for standardization across institutions without the requirements of purchasing, maintaining, and continually updating hardware.
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
A Semantic Based Policy Management Framework for Cloud Computing Environments
ERIC Educational Resources Information Center
Takabi, Hassan
2013-01-01
Cloud computing paradigm has gained tremendous momentum and generated intensive interest. Although security issues are delaying its fast adoption, cloud computing is an unstoppable force and we need to provide security mechanisms to ensure its secure adoption. In this dissertation, we mainly focus on issues related to policy management and access…
Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing
NASA Astrophysics Data System (ADS)
Chen, A.; Pham, L.; Kempler, S.; Theobald, M.; Esfandiari, A.; Campino, J.; Vollmer, B.; Lynnes, C.
2011-12-01
Cloud Computing technology has been used to offer high-performance and low-cost computing and storage resources for both scientific problems and business services. Several cloud computing services have been implemented in the commercial arena, e.g. Amazon's EC2 & S3, Microsoft's Azure, and Google App Engine. There are also some research and application programs being launched in academia and governments to utilize Cloud Computing. NASA launched the Nebula Cloud Computing platform in 2008, which is an Infrastructure as a Service (IaaS) to deliver on-demand distributed virtual computers. Nebula users can receive required computing resources as a fully outsourced service. NASA Goddard Earth Science Data and Information Service Center (GES DISC) migrated several GES DISC's applications to the Nebula as a proof of concept, including: a) The Simple, Scalable, Script-based Science Processor for Measurements (S4PM) for processing scientific data; b) the Atmospheric Infrared Sounder (AIRS) data process workflow for processing AIRS raw data; and c) the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) for online access to, analysis, and visualization of Earth science data. This work aims to evaluate the practicability and adaptability of the Nebula. The initial work focused on the AIRS data process workflow to evaluate the Nebula. The AIRS data process workflow consists of a series of algorithms being used to process raw AIRS level 0 data and output AIRS level 2 geophysical retrievals. Migrating the entire workflow to the Nebula platform is challenging, but practicable. After installing several supporting libraries and the processing code itself, the workflow is able to process AIRS data in a similar fashion to its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS level 0 data through level 2 using a Nebula virtual computer and a local Linux computer. The result shows that Nebula has significantly better performance than the local machine. Much of the difference was due to newer equipment in the Nebula than the legacy computer, which is suggestive of a potential economic advantage beyond elastic power, i.e., access to up-to-date hardware vs. legacy hardware that must be maintained past its prime to amortize the cost. In addition to a trade study of advantages and challenges of porting complex processing to the cloud, a tutorial was developed to enable further progress in utilizing the Nebula for Earth Science applications and understanding better the potential for Cloud Computing in further data- and computing-intensive Earth Science research. In particular, highly bursty computing such as that experienced in the user-demand-driven Giovanni system may become more tractable in a Cloud environment. Our future work will continue to focus on migrating more GES DISC's applications/instances, e.g. Giovanni instances, to the Nebula platform and making matured migrated applications to be in operation on the Nebula.
FUNCTION GENERATOR FOR ANALOGUE COMPUTERS
Skramstad, H.K.; Wright, J.H.; Taback, L.
1961-12-12
An improved analogue computer is designed which can be used to determine the final ground position of radioactive fallout particles in an atomic cloud. The computer determines the fallout pattern on the basis of known wind velocity and direction at various altitudes, and intensity of radioactivity in the mushroom cloud as a function of particle size and initial height in the cloud. The output is then displayed on a cathode-ray tube so that the average or total luminance of the tube screen at any point represents the intensity of radioactive fallout at the geographical location represented by that point. (AEC)
The Magellan Final Report on Cloud Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
,; Coghlan, Susan; Yelick, Katherine
The goal of Magellan, a project funded through the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR), was to investigate the potential role of cloud computing in addressing the computing needs for the DOE Office of Science (SC), particularly related to serving the needs of mid- range computing and future data-intensive computing workloads. A set of research questions was formed to probe various aspects of cloud computing from performance, usability, and cost. To address these questions, a distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computingmore » Center (NERSC). The testbed was designed to be flexible and capable enough to explore a variety of computing models and hardware design points in order to understand the impact for various scientific applications. During the project, the testbed also served as a valuable resource to application scientists. Applications from a diverse set of projects such as MG-RAST (a metagenomics analysis server), the Joint Genome Institute, the STAR experiment at the Relativistic Heavy Ion Collider, and the Laser Interferometer Gravitational Wave Observatory (LIGO), were used by the Magellan project for benchmarking within the cloud, but the project teams were also able to accomplish important production science utilizing the Magellan cloud resources.« less
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.
2013-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved 'clock time' speedups in fusing datasets on our own compute nodes and in the public Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed 'near' the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill. SciReduce Architecture
Uncover the Cloud for Geospatial Sciences and Applications to Adopt Cloud Computing
NASA Astrophysics Data System (ADS)
Yang, C.; Huang, Q.; Xia, J.; Liu, K.; Li, J.; Xu, C.; Sun, M.; Bambacus, M.; Xu, Y.; Fay, D.
2012-12-01
Cloud computing is emerging as the future infrastructure for providing computing resources to support and enable scientific research, engineering development, and application construction, as well as work force education. On the other hand, there is a lot of doubt about the readiness of cloud computing to support a variety of scientific research, development and educations. This research is a project funded by NASA SMD to investigate through holistic studies how ready is the cloud computing to support geosciences. Four applications with different computing characteristics including data, computing, concurrent, and spatiotemporal intensities are taken to test the readiness of cloud computing to support geosciences. Three popular and representative cloud platforms including Amazon EC2, Microsoft Azure, and NASA Nebula as well as a traditional cluster are utilized in the study. Results illustrates that cloud is ready to some degree but more research needs to be done to fully implemented the cloud benefit as advertised by many vendors and defined by NIST. Specifically, 1) most cloud platform could help stand up new computing instances, a new computer, in a few minutes as envisioned, therefore, is ready to support most computing needs in an on demand fashion; 2) the load balance and elasticity, a defining characteristic, is ready in some cloud platforms, such as Amazon EC2, to support bigger jobs, e.g., needs response in minutes, while some are not ready to support the elasticity and load balance well. All cloud platform needs further research and development to support real time application at subminute level; 3) the user interface and functionality of cloud platforms vary a lot and some of them are very professional and well supported/documented, such as Amazon EC2, some of them needs significant improvement for the general public to adopt cloud computing without professional training or knowledge about computing infrastructure; 4) the security is a big concern in cloud computing platform, with the sharing spirit of cloud computing, it is very hard to ensure higher level security, except a private cloud is built for a specific organization without public access, public cloud platform does not support FISMA medium level yet and may never be able to support FISMA high level; 5) HPC jobs needs of cloud computing is not well supported and only Amazon EC2 supports this well. The research is being taken by NASA and other agencies to consider cloud computing adoption. We hope the publication of the research would also benefit the public to adopt cloud computing.
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
Toward a Big Data Science: A challenge of "Science Cloud"
NASA Astrophysics Data System (ADS)
Murata, Ken T.; Watanabe, Hidenobu
2013-04-01
During these 50 years, along with appearance and development of high-performance computers (and super-computers), numerical simulation is considered to be a third methodology for science, following theoretical (first) and experimental and/or observational (second) approaches. The variety of data yielded by the second approaches has been getting more and more. It is due to the progress of technologies of experiments and observations. The amount of the data generated by the third methodologies has been getting larger and larger. It is because of tremendous development and programming techniques of super computers. Most of the data files created by both experiments/observations and numerical simulations are saved in digital formats and analyzed on computers. The researchers (domain experts) are interested in not only how to make experiments and/or observations or perform numerical simulations, but what information (new findings) to extract from the data. However, data does not usually tell anything about the science; sciences are implicitly hidden in the data. Researchers have to extract information to find new sciences from the data files. This is a basic concept of data intensive (data oriented) science for Big Data. As the scales of experiments and/or observations and numerical simulations get larger, new techniques and facilities are required to extract information from a large amount of data files. The technique is called as informatics as a fourth methodology for new sciences. Any methodologies must work on their facilities: for example, space environment are observed via spacecraft and numerical simulations are performed on super-computers, respectively in space science. The facility of the informatics, which deals with large-scale data, is a computational cloud system for science. This paper is to propose a cloud system for informatics, which has been developed at NICT (National Institute of Information and Communications Technology), Japan. The NICT science cloud, we named as OneSpaceNet (OSN), is the first open cloud system for scientists who are going to carry out their informatics for their own science. The science cloud is not for simple uses. Many functions are expected to the science cloud; such as data standardization, data collection and crawling, large and distributed data storage system, security and reliability, database and meta-database, data stewardship, long-term data preservation, data rescue and preservation, data mining, parallel processing, data publication and provision, semantic web, 3D and 4D visualization, out-reach and in-reach, and capacity buildings. Figure (not shown here) is a schematic picture of the NICT science cloud. Both types of data from observation and simulation are stored in the storage system in the science cloud. It should be noted that there are two types of data in observation. One is from archive site out of the cloud: this is a data to be downloaded through the Internet to the cloud. The other one is data from the equipment directly connected to the science cloud. They are often called as sensor clouds. In the present talk, we first introduce the NICT science cloud. We next demonstrate the efficiency of the science cloud, showing several scientific results which we achieved with this cloud system. Through the discussions and demonstrations, the potential performance of sciences cloud will be revealed for any research fields.
A world-wide databridge supported by a commercial cloud provider
NASA Astrophysics Data System (ADS)
Tat Cheung, Kwong; Field, Laurence; Furano, Fabrizio
2017-10-01
Volunteer computing has the potential to provide significant additional computing capacity for the LHC experiments. One of the challenges with exploiting volunteer computing is to support a global community of volunteers that provides heterogeneous resources. However, high energy physics applications require more data input and output than the CPU intensive applications that are typically used by other volunteer computing projects. While the so-called databridge has already been successfully proposed as a method to span the untrusted and trusted domains of volunteer computing and Grid computing respective, globally transferring data between potentially poor-performing residential networks and CERN could be unreliable, leading to wasted resources usage. The expectation is that by placing a storage endpoint that is part of a wider, flexible geographical databridge deployment closer to the volunteers, the transfer success rate and the overall performance can be improved. This contribution investigates the provision of a globally distributed databridge implemented upon a commercial cloud provider.
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E.
2013-05-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept/prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be performed in an efficient way, with the researcher paying only his own Cloud compute bill.; Figure 1 -- Architecture.
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud
Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew
2015-01-01
Background Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation. PMID:26501966
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.
Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew
2015-01-01
Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.
Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung
2016-05-01
Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today's data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact.
Web-based interactive visualization in a Grid-enabled neuroimaging application using HTML5.
Siewert, René; Specovius, Svenja; Wu, Jie; Krefting, Dagmar
2012-01-01
Interactive visualization and correction of intermediate results are required in many medical image analysis pipelines. To allow certain interaction in the remote execution of compute- and data-intensive applications, new features of HTML5 are used. They allow for transparent integration of user interaction into Grid- or Cloud-enabled scientific workflows. Both 2D and 3D visualization and data manipulation can be performed through a scientific gateway without the need to install specific software or web browser plugins. The possibilities of web-based visualization are presented along the FreeSurfer-pipeline, a popular compute- and data-intensive software tool for quantitative neuroimaging.
Enabling smart personalized healthcare: a hybrid mobile-cloud approach for ECG telemonitoring.
Wang, Xiaoliang; Gui, Qiong; Liu, Bingwei; Jin, Zhanpeng; Chen, Yu
2014-05-01
The severe challenges of the skyrocketing healthcare expenditure and the fast aging population highlight the needs for innovative solutions supporting more accurate, affordable, flexible, and personalized medical diagnosis and treatment. Recent advances of mobile technologies have made mobile devices a promising tool to manage patients' own health status through services like telemedicine. However, the inherent limitations of mobile devices make them less effective in computation- or data-intensive tasks such as medical monitoring. In this study, we propose a new hybrid mobile-cloud computational solution to enable more effective personalized medical monitoring. To demonstrate the efficacy and efficiency of the proposed approach, we present a case study of mobile-cloud based electrocardiograph monitoring and analysis and develop a mobile-cloud prototype. The experimental results show that the proposed approach can significantly enhance the conventional mobile-based medical monitoring in terms of diagnostic accuracy, execution efficiency, and energy efficiency, and holds the potential in addressing future large-scale data analysis in personalized healthcare.
Seismic waveform modeling over cloud
NASA Astrophysics Data System (ADS)
Luo, Cong; Friederich, Wolfgang
2016-04-01
With the fast growing computational technologies, numerical simulation of seismic wave propagation achieved huge successes. Obtaining the synthetic waveforms through numerical simulation receives an increasing amount of attention from seismologists. However, computational seismology is a data-intensive research field, and the numerical packages usually come with a steep learning curve. Users are expected to master considerable amount of computer knowledge and data processing skills. Training users to use the numerical packages, correctly access and utilize the computational resources is a troubled task. In addition to that, accessing to HPC is also a common difficulty for many users. To solve these problems, a cloud based solution dedicated on shallow seismic waveform modeling has been developed with the state-of-the-art web technologies. It is a web platform integrating both software and hardware with multilayer architecture: a well designed SQL database serves as the data layer, HPC and dedicated pipeline for it is the business layer. Through this platform, users will no longer need to compile and manipulate various packages on the local machine within local network to perform a simulation. By providing users professional access to the computational code through its interfaces and delivering our computational resources to the users over cloud, users can customize the simulation at expert-level, submit and run the job through it.
Analysis on the security of cloud computing
NASA Astrophysics Data System (ADS)
He, Zhonglin; He, Yuhua
2011-02-01
Cloud computing is a new technology, which is the fusion of computer technology and Internet development. It will lead the revolution of IT and information field. However, in cloud computing data and application software is stored at large data centers, and the management of data and service is not completely trustable, resulting in safety problems, which is the difficult point to improve the quality of cloud service. This paper briefly introduces the concept of cloud computing. Considering the characteristics of cloud computing, it constructs the security architecture of cloud computing. At the same time, with an eye toward the security threats cloud computing faces, several corresponding strategies are provided from the aspect of cloud computing users and service providers.
Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing
NASA Astrophysics Data System (ADS)
Bazhirov, Timur; Mohammadi, Mohammad; Ding, Kevin; Barabash, Sergey
Recent advances in cloud computing made it possible to access large-scale computational resources completely on-demand in a rapid and efficient manner. When combined with high fidelity simulations, they serve as an alternative pathway to enable computational discovery and design of new materials through large-scale high-throughput screening. Here, we present a case study for a cloud platform implemented at Exabyte Inc. We perform calculations to screen lightweight ternary alloys for thermodynamic stability. Due to the lack of experimental data for most such systems, we rely on theoretical approaches based on first-principle pseudopotential density functional theory. We calculate the formation energies for a set of ternary compounds approximated by special quasirandom structures. During an example run we were able to scale to 10,656 CPUs within 7 minutes from the start, and obtain results for 296 compounds within 38 hours. The results indicate that the ultimate formation enthalpy of ternary systems can be negative for some of lightweight alloys, including Li and Mg compounds. We conclude that compared to traditional capital-intensive approach that requires in on-premises hardware resources, cloud computing is agile and cost-effective, yet scalable and delivers similar performance.
Large-Scale, Multi-Sensor Atmospheric Data Fusion Using Hybrid Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, Brian; Manipon, Gerald; Hua, Hook; Fetzer, Eric
2014-05-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map-reduce-based algorithms. However, these problems are Data Intensive computing so the data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in a hybrid Cloud (private eucalyptus & public Amazon). Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Multi-year datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. We will also present a concept and prototype for staging NASA's A-Train Atmospheric datasets (Levels 2 & 3) in the Amazon Cloud so that any number of compute jobs can be executed "near" the multi-sensor data. Given such a system, multi-sensor climate studies over 10-20 years of data could be perform
NASA Astrophysics Data System (ADS)
van Lew, Baldur; Botha, Charl P.; Milles, Julien R.; Vrooman, Henri A.; van de Giessen, Martijn; Lelieveldt, Boudewijn P. F.
2015-03-01
The cohort size required in epidemiological imaging genetics studies often mandates the pooling of data from multiple hospitals. Patient data, however, is subject to strict privacy protection regimes, and physical data storage may be legally restricted to a hospital network. To enable biomarker discovery, fast data access and interactive data exploration must be combined with high-performance computing resources, while respecting privacy regulations. We present a system using fast and inherently secure light-paths to access distributed data, thereby obviating the need for a central data repository. A secure private cloud computing framework facilitates interactive, computationally intensive exploration of this geographically distributed, privacy sensitive data. As a proof of concept, MRI brain imaging data hosted at two remote sites were processed in response to a user command at a third site. The system was able to automatically start virtual machines, run a selected processing pipeline and write results to a user accessible database, while keeping data locally stored in the hospitals. Individual tasks took approximately 50% longer compared to a locally hosted blade server but the cloud infrastructure reduced the total elapsed time by a factor of 40 using 70 virtual machines in the cloud. We demonstrated that the combination light-path and private cloud is a viable means of building an analysis infrastructure for secure data analysis. The system requires further work in the areas of error handling, load balancing and secure support of multiple users.
A Hierarchical Auction-Based Mechanism for Real-Time Resource Allocation in Cloud Robotic Systems.
Wang, Lujia; Liu, Ming; Meng, Max Q-H
2017-02-01
Cloud computing enables users to share computing resources on-demand. The cloud computing framework cannot be directly mapped to cloud robotic systems with ad hoc networks since cloud robotic systems have additional constraints such as limited bandwidth and dynamic structure. However, most multirobotic applications with cooperative control adopt this decentralized approach to avoid a single point of failure. Robots need to continuously update intensive data to execute tasks in a coordinated manner, which implies real-time requirements. Thus, a resource allocation strategy is required, especially in such resource-constrained environments. This paper proposes a hierarchical auction-based mechanism, namely link quality matrix (LQM) auction, which is suitable for ad hoc networks by introducing a link quality indicator. The proposed algorithm produces a fast and robust method that is accurate and scalable. It reduces both global communication and unnecessary repeated computation. The proposed method is designed for firm real-time resource retrieval for physical multirobot systems. A joint surveillance scenario empirically validates the proposed mechanism by assessing several practical metrics. The results show that the proposed LQM auction outperforms state-of-the-art algorithms for resource allocation.
Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook
2014-01-01
Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially in remote monitoring health-services. However, during the WCE process, the large amount of captured video data demands a significant deal of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing task, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use Jeffrey-divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve computational challenges, mobile-cloud architecture is incorporated, which provides resizable computing capacities by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data. PMID:25225874
Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook
2014-09-15
Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially in remote monitoring health-services. However, during the WCE process, the large amount of captured video data demands a significant deal of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing task, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use Jeffrey-divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve computational challenges, mobile-cloud architecture is incorporated, which provides resizable computing capacities by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data.
High Resolution Nature Runs and the Big Data Challenge
NASA Technical Reports Server (NTRS)
Webster, W. Phillip; Duffy, Daniel Q.
2015-01-01
NASA's Global Modeling and Assimilation Office at Goddard Space Flight Center is undertaking a series of very computationally intensive Nature Runs and a downscaled reanalysis. The nature runs use the GEOS-5 as an Atmospheric General Circulation Model (AGCM) while the reanalysis uses the GEOS-5 in Data Assimilation mode. This paper will present computational challenges from three runs, two of which are AGCM and one is downscaled reanalysis using the full DAS. The nature runs will be completed at two surface grid resolutions, 7 and 3 kilometers and 72 vertical levels. The 7 km run spanned 2 years (2005-2006) and produced 4 PB of data while the 3 km run will span one year and generate 4 BP of data. The downscaled reanalysis (MERRA-II Modern-Era Reanalysis for Research and Applications) will cover 15 years and generate 1 PB of data. Our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of the concept of business process-as-a-service that is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. In this presentation, we will describe two projects that demonstrate this shift. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. MERRA/AS enables MapReduce analytics over MERRA reanalysis data collection by bringing together the high-performance computing, scalable data management, and a domain-specific climate data services API. NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC comprises a high speed Infinib and network, high performance file systems and object storage, and a virtual system environments specific for data intensive, science applications. These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs
SAR processing in the cloud for oil detection in the Arctic
NASA Astrophysics Data System (ADS)
Garron, J.; Stoner, C.; Meyer, F. J.
2016-12-01
A new world of opportunity is being thawed from the ice of the Arctic, driven by decreased persistent Arctic sea-ice cover, increases in shipping, tourism, natural resource development. Tools that can automatically monitor key sea ice characteristics and potential oil spills are essential for safe passage in these changing waters. Synthetic aperture radar (SAR) data can be used to discriminate sea ice types and oil on the ocean surface and also for feature tracking. Additionally, SAR can image the earth through the night and most weather conditions. SAR data is volumetrically large and requires significant computing power to manipulate. Algorithms designed to identify key environmental features, like oil spills, in SAR imagery require secondary processing, and are computationally intensive, which can functionally limit their application in a real-time setting. Cloud processing is designed to manage big data and big data processing jobs by means of small cycles of off-site computations, eliminating up-front hardware costs. Pairing SAR data with cloud processing has allowed us to create and solidify a processing pipeline for SAR data products in the cloud to compare operational algorithms efficiency and effectiveness when run using an Alaska Satellite Facility (ASF) defined Amazon Machine Image (AMI). The products created from this secondary processing, were compared to determine which algorithm was most accurate in Arctic feature identification, and what operational conditions were required to produce the results on the ASF defined AMI. Results will be used to inform a series of recommendations to oil-spill response data managers and SAR users interested in expanding their analytical computing power.
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung
2016-01-01
Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today’s data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG’s simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact. PMID:27390389
Experience in using commercial clouds in CMS
NASA Astrophysics Data System (ADS)
Bauerdick, L.; Bockelman, B.; Dykstra, D.; Fuess, S.; Garzoglio, G.; Girone, M.; Gutsche, O.; Holzman, B.; Hufnagel, D.; Kim, H.; Kennedy, R.; Mason, D.; Spentzouris, P.; Timm, S.; Tiradani, A.; Vaandering, E.; CMS Collaboration
2017-10-01
Historically high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resource provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and cost and operational efficiency comparison to our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.
Experience in using commercial clouds in CMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bauerdick, L.; Bockelman, B.; Dykstra, D.
Historically high energy physics computing has been performed on large purposebuilt computing systems. In the beginning there were single site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is amore » growing interest amongst the cloud providers to demonstrate the capability to perform large scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most IO intensive CMS workflows on a large-scale set of virtualized resource provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. Also, we will discuss the economic issues and cost and operational efficiency comparison to our dedicated resources. At the end we will consider the changes in the working model of HEP computing in a domain with the availability of large scale resources scheduled at peak times.« less
Diversity in computing technologies and strategies for dynamic resource allocation
Garzoglio, G.; Gutsche, O.
2015-12-23
Here, High Energy Physics (HEP) is a very data intensive and trivially parallelizable science discipline. HEP is probing nature at increasingly finer details requiring ever increasing computational resources to process and analyze experimental data. In this paper, we discuss how HEP provisioned resources so far using Grid technologies, how HEP is starting to include new resource providers like commercial Clouds and HPC installations, and how HEP is transparently provisioning resources at these diverse providers.
Cloud Computing and Its Applications in GIS
NASA Astrophysics Data System (ADS)
Kang, Cao
2011-12-01
Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
Comparison of the different approaches to generate holograms from data acquired with a Kinect sensor
NASA Astrophysics Data System (ADS)
Kang, Ji-Hoon; Leportier, Thibault; Ju, Byeong-Kwon; Song, Jin Dong; Lee, Kwang-Hoon; Park, Min-Chul
2017-05-01
Data of real scenes acquired in real-time with a Kinect sensor can be processed with different approaches to generate a hologram. 3D models can be generated from a point cloud or a mesh representation. The advantage of the point cloud approach is that computation process is well established since it involves only diffraction and propagation of point sources between parallel planes. On the other hand, the mesh representation enables to reduce the number of elements necessary to represent the object. Then, even though the computation time for the contribution of a single element increases compared to a simple point, the total computation time can be reduced significantly. However, the algorithm is more complex since propagation of elemental polygons between non-parallel planes should be implemented. Finally, since a depth map of the scene is acquired at the same time than the intensity image, a depth layer approach can also be adopted. This technique is appropriate for a fast computation since propagation of an optical wavefront from one plane to another can be handled efficiently with the fast Fourier transform. Fast computation with depth layer approach is convenient for real time applications, but point cloud method is more appropriate when high resolution is needed. In this study, since Kinect can be used to obtain both point cloud and depth map, we examine the different approaches that can be adopted for hologram computation and compare their performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malik, Saif Ur Rehman; Khan, Samee U.; Ewen, Sam J.
2015-03-14
As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data was created on a daily basis that originated from myriad of sources and applications including mobiledevices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such ‘Data Explosions’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicated, filter, and the like) such large amountmore » of data and identify new ways to analyze large amounts of data for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems as it leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet Every Cloud Service Provider must assure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. In the aforementioned perspective, data replication is used in the cloud for improving the performance (e.g., read and write delay) of applications that access data. Through replication a data intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that are developed by both industrial and research communities. The focus of the survey is to discuss and characterize the existing approaches of data replication and management that tackle the resource usage and QoS provisioning with different levels of efficiencies. Moreover, the breakdown of both influential expressions (data replication and management) to provide different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in the cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.« less
Openwebglobe 2: Visualization of Complex 3D-GEODATA in the (mobile) Webbrowser
NASA Astrophysics Data System (ADS)
Christen, M.
2016-06-01
Providing worldwide high resolution data for virtual globes consists of compute and storage intense tasks for processing data. Furthermore, rendering complex 3D-Geodata, such as 3D-City models with an extremely high polygon count and a vast amount of textures at interactive framerates is still a very challenging task, especially on mobile devices. This paper presents an approach for processing, caching and serving massive geospatial data in a cloud-based environment for large scale, out-of-core, highly scalable 3D scene rendering on a web based virtual globe. Cloud computing is used for processing large amounts of geospatial data and also for providing 2D and 3D map data to a large amount of (mobile) web clients. In this paper the approach for processing, rendering and caching very large datasets in the currently developed virtual globe "OpenWebGlobe 2" is shown, which displays 3D-Geodata on nearly every device.
NASA Astrophysics Data System (ADS)
Dombrovsky, Leonid A.; Reviznikov, Dmitry L.; Kryukov, Alexei P.; Levashov, Vladimir Yu
2017-10-01
An effect of shielding of an intense solar radiation towards a solar probe with the use of micron-sized SiC particles generated during ablation of a composite thermal protection material is estimated on a basis of numerical solution to a combined radiative and heat transfer problem. The radiative properties of particles are calculated using the Mie theory, and the spectral two-flux model is employed in radiative transfer calculations for non-uniform particle clouds. A computational model for generation and evolution of the cloud is based on a conjugated heat transfer problem taking into account heating and thermal destruction of the matrix of thermal protection material and sublimation of SiC particles in the generated cloud. The effect of light pressure, which is especially important for small particles, is also taken into account. The computational data for mass loss due to the particle cloud sublimation showed the low value about 1 kg/m2 per hour at the distance between the vehicle and the Sun surface of about four radii of the Sun. This indicates that embedding of silicon carbide or other particles into a thermal protection layer and the resulting generation of a particle cloud can be considered as a promising way to improve the possibilities of space missions due to a significant decrease in the vehicle working distance from the solar photosphere.
Big data mining analysis method based on cloud computing
NASA Astrophysics Data System (ADS)
Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao
2017-08-01
Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.
77 FR 74829 - Notice of Public Meeting-Cloud Computing and Big Data Forum and Workshop
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-18
...--Cloud Computing and Big Data Forum and Workshop AGENCY: National Institute of Standards and Technology... Standards and Technology (NIST) announces a Cloud Computing and Big Data Forum and Workshop to be held on... followed by a one-day hands-on workshop. The NIST Cloud Computing and Big Data Forum and Workshop will...
NASA Astrophysics Data System (ADS)
Zhou, S.; Tao, W. K.; Li, X.; Matsui, T.; Sun, X. H.; Yang, X.
2015-12-01
A cloud-resolving model (CRM) is an atmospheric numerical model that can numerically resolve clouds and cloud systems at 0.25~5km horizontal grid spacings. The main advantage of the CRM is that it can allow explicit interactive processes between microphysics, radiation, turbulence, surface, and aerosols without subgrid cloud fraction, overlapping and convective parameterization. Because of their fine resolution and complex physical processes, it is challenging for the CRM community to i) visualize/inter-compare CRM simulations, ii) diagnose key processes for cloud-precipitation formation and intensity, and iii) evaluate against NASA's field campaign data and L1/L2 satellite data products due to large data volume (~10TB) and complexity of CRM's physical processes. We have been building the Super Cloud Library (SCL) upon a Hadoop framework, capable of CRM database management, distribution, visualization, subsetting, and evaluation in a scalable way. The current SCL capability includes (1) A SCL data model enables various CRM simulation outputs in NetCDF, including the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) model, to be accessed and processed by Hadoop, (2) A parallel NetCDF-to-CSV converter supports NU-WRF and GCE model outputs, (3) A technique visualizes Hadoop-resident data with IDL, (4) A technique subsets Hadoop-resident data, compliant to the SCL data model, with HIVE or Impala via HUE's Web interface, (5) A prototype enables a Hadoop MapReduce application to dynamically access and process data residing in a parallel file system, PVFS2 or CephFS, where high performance computing (HPC) simulation outputs such as NU-WRF's and GCE's are located. We are testing Apache Spark to speed up SCL data processing and analysis.With the SCL capabilities, SCL users can conduct large-domain on-demand tasks without downloading voluminous CRM datasets and various observations from NASA Field Campaigns and Satellite data to a local computer, and inter-compare CRM output and data with GCE and NU-WRF.
ProteoCloud: a full-featured open source proteomics cloud computing pipeline.
Muth, Thilo; Peters, Julian; Blackburn, Jonathan; Rapp, Erdmann; Martens, Lennart
2013-08-02
We here present the ProteoCloud pipeline, a freely available, full-featured cloud-based platform to perform computationally intensive, exhaustive searches in a cloud environment using five different peptide identification algorithms. ProteoCloud is entirely open source, and is built around an easy to use and cross-platform software client with a rich graphical user interface. This client allows full control of the number of cloud instances to initiate and of the spectra to assign for identification. It also enables the user to track progress, and to visualize and interpret the results in detail. Source code, binaries and documentation are all available at http://proteocloud.googlecode.com. Copyright © 2012 Elsevier B.V. All rights reserved.
A Review Study on Cloud Computing Issues
NASA Astrophysics Data System (ADS)
Kanaan Kadhim, Qusay; Yusof, Robiah; Sadeq Mahdi, Hamid; Al-shami, Sayed Samer Ali; Rahayu Selamat, Siti
2018-05-01
Cloud computing is the most promising current implementation of utility computing in the business world, because it provides some key features over classic utility computing, such as elasticity to allow clients dynamically scale-up and scale-down the resources in execution time. Nevertheless, cloud computing is still in its premature stage and experiences lack of standardization. The security issues are the main challenges to cloud computing adoption. Thus, critical industries such as government organizations (ministries) are reluctant to trust cloud computing due to the fear of losing their sensitive data, as it resides on the cloud with no knowledge of data location and lack of transparency of Cloud Service Providers (CSPs) mechanisms used to secure their data and applications which have created a barrier against adopting this agile computing paradigm. This study aims to review and classify the issues that surround the implementation of cloud computing which a hot area that needs to be addressed by future research.
NASA Technical Reports Server (NTRS)
Minnis, Patrick; Young, David F.; Sassen, Kenneth; Alvarez, Joseph M.; Grund, Christian J.
1989-01-01
Cirrus cloud radiative and physical characteristics are determined using a combination of ground-based, aircraft, and satellite measurements taken as part of the First ISCCP Regional Experiment (FIRE) Cirrus Intensive Field Observations (IFO) during October and November 1986. Lidar backscatter data are used to define cloud base, center, and top heights and the corresponding temperatures. Coincident GOES 4 km visible (0.65 microns) and 8 km infrared window (11.5 microns) radiances are analyzed to determine cloud emittances and reflectances. Infrared optical depth is computed from the emittance results. Visible optical depth is derived from reflectance using a theoretical ice crystal scattering model and an empirical bidirectional reflectance mode. No clouds with visible optical depths greater than 5 or infrared optical depths less than 0.1 were used in the analysis. Average cloud thickness ranged from 0.5 km to 8 km for the 71 scenes. An average visible scattering efficiency of 2.1 was found for this data set. The results reveal a significant dependence of scattering efficiency on cloud temperature.
NASA Astrophysics Data System (ADS)
Dong, Yumin; Xiao, Shufen; Ma, Hongyang; Chen, Libo
2016-12-01
Cloud computing and big data have become the developing engine of current information technology (IT) as a result of the rapid development of IT. However, security protection has become increasingly important for cloud computing and big data, and has become a problem that must be solved to develop cloud computing. The theft of identity authentication information remains a serious threat to the security of cloud computing. In this process, attackers intrude into cloud computing services through identity authentication information, thereby threatening the security of data from multiple perspectives. Therefore, this study proposes a model for cloud computing protection and management based on quantum authentication, introduces the principle of quantum authentication, and deduces the quantum authentication process. In theory, quantum authentication technology can be applied in cloud computing for security protection. This technology cannot be cloned; thus, it is more secure and reliable than classical methods.
NASA Astrophysics Data System (ADS)
Gupta, S.; Lohani, B.
2014-05-01
Mobile augmented reality system is the next generation technology to visualise 3D real world intelligently. The technology is expanding at a fast pace to upgrade the status of a smart phone to an intelligent device. The research problem identified and presented in the current work is to view actual dimensions of various objects that are captured by a smart phone in real time. The methodology proposed first establishes correspondence between LiDAR point cloud, that are stored in a server, and the image t hat is captured by a mobile. This correspondence is established using the exterior and interior orientation parameters of the mobile camera and the coordinates of LiDAR data points which lie in the viewshed of the mobile camera. A pseudo intensity image is generated using LiDAR points and their intensity. Mobile image and pseudo intensity image are then registered using image registration method SIFT thereby generating a pipeline to locate a point in point cloud corresponding to a point (pixel) on the mobile image. The second part of the method uses point cloud data for computing dimensional information corresponding to the pairs of points selected on mobile image and fetch the dimensions on top of the image. This paper describes all steps of the proposed method. The paper uses an experimental setup to mimic the mobile phone and server system and presents some initial but encouraging results
Future of Department of Defense Cloud Computing Amid Cultural Confusion
2013-03-01
enterprise cloud - computing environment and transition to a public cloud service provider. Services have started the development of individual cloud - computing environments...endorsing cloud computing . It addresses related issues in matters of service culture changes and how strategic leaders will dictate the future of cloud ...through data center consolidation and individual Service provided cloud computing .
Scientific Services on the Cloud
NASA Astrophysics Data System (ADS)
Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong
Scientific Computing was one of the first every applications for parallel and distributed computation. To this date, scientific applications remain some of the most compute intensive, and have inspired creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administrated by a third party. Software abstraction and virtualization provide reliability, and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part of the scientific computing initiative.
Castaño-Díez, Daniel
2017-01-01
Dynamo is a package for the processing of tomographic data. As a tool for subtomogram averaging, it includes different alignment and classification strategies. Furthermore, its data-management module allows experiments to be organized in groups of tomograms, while offering specialized three-dimensional tomographic browsers that facilitate visualization, location of regions of interest, modelling and particle extraction in complex geometries. Here, a technical description of the package is presented, focusing on its diverse strategies for optimizing computing performance. Dynamo is built upon mbtools (middle layer toolbox), a general-purpose MATLAB library for object-oriented scientific programming specifically developed to underpin Dynamo but usable as an independent tool. Its structure intertwines a flexible MATLAB codebase with precompiled C++ functions that carry the burden of numerically intensive operations. The package can be delivered as a precompiled standalone ready for execution without a MATLAB license. Multicore parallelization on a single node is directly inherited from the high-level parallelization engine provided for MATLAB, automatically imparting a balanced workload among the threads in computationally intense tasks such as alignment and classification, but also in logistic-oriented tasks such as tomogram binning and particle extraction. Dynamo supports the use of graphical processing units (GPUs), yielding considerable speedup factors both for native Dynamo procedures (such as the numerically intensive subtomogram alignment) and procedures defined by the user through its MATLAB-based GPU library for three-dimensional operations. Cloud-based virtual computing environments supplied with a pre-installed version of Dynamo can be publicly accessed through the Amazon Elastic Compute Cloud (EC2), enabling users to rent GPU computing time on a pay-as-you-go basis, thus avoiding upfront investments in hardware and longterm software maintenance. PMID:28580909
Castaño-Díez, Daniel
2017-06-01
Dynamo is a package for the processing of tomographic data. As a tool for subtomogram averaging, it includes different alignment and classification strategies. Furthermore, its data-management module allows experiments to be organized in groups of tomograms, while offering specialized three-dimensional tomographic browsers that facilitate visualization, location of regions of interest, modelling and particle extraction in complex geometries. Here, a technical description of the package is presented, focusing on its diverse strategies for optimizing computing performance. Dynamo is built upon mbtools (middle layer toolbox), a general-purpose MATLAB library for object-oriented scientific programming specifically developed to underpin Dynamo but usable as an independent tool. Its structure intertwines a flexible MATLAB codebase with precompiled C++ functions that carry the burden of numerically intensive operations. The package can be delivered as a precompiled standalone ready for execution without a MATLAB license. Multicore parallelization on a single node is directly inherited from the high-level parallelization engine provided for MATLAB, automatically imparting a balanced workload among the threads in computationally intense tasks such as alignment and classification, but also in logistic-oriented tasks such as tomogram binning and particle extraction. Dynamo supports the use of graphical processing units (GPUs), yielding considerable speedup factors both for native Dynamo procedures (such as the numerically intensive subtomogram alignment) and procedures defined by the user through its MATLAB-based GPU library for three-dimensional operations. Cloud-based virtual computing environments supplied with a pre-installed version of Dynamo can be publicly accessed through the Amazon Elastic Compute Cloud (EC2), enabling users to rent GPU computing time on a pay-as-you-go basis, thus avoiding upfront investments in hardware and longterm software maintenance.
Arctic Boreal Vulnerability Experiment (ABoVE) Science Cloud
NASA Astrophysics Data System (ADS)
Duffy, D.; Schnase, J. L.; McInerney, M.; Webster, W. P.; Sinno, S.; Thompson, J. H.; Griffith, P. C.; Hoy, E.; Carroll, M.
2014-12-01
The effects of climate change are being revealed at alarming rates in the Arctic and Boreal regions of the planet. NASA's Terrestrial Ecology Program has launched a major field campaign to study these effects over the next 5 to 8 years. The Arctic Boreal Vulnerability Experiment (ABoVE) will challenge scientists to take measurements in the field, study remote observations, and even run models to better understand the impacts of a rapidly changing climate for areas of Alaska and western Canada. The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center (GSFC) has partnered with the Terrestrial Ecology Program to create a science cloud designed for this field campaign - the ABoVE Science Cloud. The cloud combines traditional high performance computing with emerging technologies to create an environment specifically designed for large-scale climate analytics. The ABoVE Science Cloud utilizes (1) virtualized high-speed InfiniBand networks, (2) a combination of high-performance file systems and object storage, and (3) virtual system environments tailored for data intensive, science applications. At the center of the architecture is a large object storage environment, much like a traditional high-performance file system, that supports data proximal processing using technologies like MapReduce on a Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high performance compute resources with many processing cores and large memory coupled to the storage through an InfiniBand network. Virtual systems can be tailored to a specific scientist and provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems. In this talk, we will present the architectural components of the science cloud and examples of how it is being used to meet the needs of the ABoVE campaign. In our experience, the science cloud approach significantly lowers the barriers and risks to organizations that require high performance computing solutions and provides the NCCS with the agility required to meet our customers' rapidly increasing and evolving requirements.
Efficient terrestrial laser scan segmentation exploiting data structure
NASA Astrophysics Data System (ADS)
Mahmoudabadi, Hamid; Olsen, Michael J.; Todorovic, Sinisa
2016-09-01
New technologies such as lidar enable the rapid collection of massive datasets to model a 3D scene as a point cloud. However, while hardware technology continues to advance, processing 3D point clouds into informative models remains complex and time consuming. A common approach to increase processing efficiently is to segment the point cloud into smaller sections. This paper proposes a novel approach for point cloud segmentation using computer vision algorithms to analyze panoramic representations of individual laser scans. These panoramas can be quickly created using an inherent neighborhood structure that is established during the scanning process, which scans at fixed angular increments in a cylindrical or spherical coordinate system. In the proposed approach, a selected image segmentation algorithm is applied on several input layers exploiting this angular structure including laser intensity, range, normal vectors, and color information. These segments are then mapped back to the 3D point cloud so that modeling can be completed more efficiently. This approach does not depend on pre-defined mathematical models and consequently setting parameters for them. Unlike common geometrical point cloud segmentation methods, the proposed method employs the colorimetric and intensity data as another source of information. The proposed algorithm is demonstrated on several datasets encompassing variety of scenes and objects. Results show a very high perceptual (visual) level of segmentation and thereby the feasibility of the proposed algorithm. The proposed method is also more efficient compared to Random Sample Consensus (RANSAC), which is a common approach for point cloud segmentation.
Bigdata Driven Cloud Security: A Survey
NASA Astrophysics Data System (ADS)
Raja, K.; Hanifa, Sabibullah Mohamed
2017-08-01
Cloud Computing (CC) is a fast-growing technology to perform massive-scale and complex computing. It eliminates the need to maintain expensive computing hardware, dedicated space, and software. Recently, it has been observed that massive growth in the scale of data or big data generated through cloud computing. CC consists of a front-end, includes the users’ computers and software required to access the cloud network, and back-end consists of various computers, servers and database systems that create the cloud. In SaaS (Software as-a-Service - end users to utilize outsourced software), PaaS (Platform as-a-Service-platform is provided) and IaaS (Infrastructure as-a-Service-physical environment is outsourced), and DaaS (Database as-a-Service-data can be housed within a cloud), where leading / traditional cloud ecosystem delivers the cloud services become a powerful and popular architecture. Many challenges and issues are in security or threats, most vital barrier for cloud computing environment. The main barrier to the adoption of CC in health care relates to Data security. When placing and transmitting data using public networks, cyber attacks in any form are anticipated in CC. Hence, cloud service users need to understand the risk of data breaches and adoption of service delivery model during deployment. This survey deeply covers the CC security issues (covering Data Security in Health care) so as to researchers can develop the robust security application models using Big Data (BD) on CC (can be created / deployed easily). Since, BD evaluation is driven by fast-growing cloud-based applications developed using virtualized technologies. In this purview, MapReduce [12] is a good example of big data processing in a cloud environment, and a model for Cloud providers.
Cognitive Approaches for Medicine in Cloud Computing.
Ogiela, Urszula; Takizawa, Makoto; Ogiela, Lidia
2018-03-03
This paper will present the application potential of the cognitive approach to data interpretation, with special reference to medical areas. The possibilities of using the meaning approach to data description and analysis will be proposed for data analysis tasks in Cloud Computing. The methods of cognitive data management in Cloud Computing are aimed to support the processes of protecting data against unauthorised takeover and they serve to enhance the data management processes. The accomplishment of the proposed tasks will be the definition of algorithms for the execution of meaning data interpretation processes in safe Cloud Computing. • We proposed a cognitive methods for data description. • Proposed a techniques for secure data in Cloud Computing. • Application of cognitive approaches for medicine was described.
An Overview of Cloud Computing in Distributed Systems
NASA Astrophysics Data System (ADS)
Divakarla, Usha; Kumari, Geetha
2010-11-01
Cloud computing is the emerging trend in the field of distributed computing. Cloud computing evolved from grid computing and distributed computing. Cloud plays an important role in huge organizations in maintaining huge data with limited resources. Cloud also helps in resource sharing through some specific virtual machines provided by the cloud service provider. This paper gives an overview of the cloud organization and some of the basic security issues pertaining to the cloud.
Local storage federation through XRootD architecture for interactive distributed analysis
NASA Astrophysics Data System (ADS)
Colamaria, F.; Colella, D.; Donvito, G.; Elia, D.; Franco, A.; Luparello, G.; Maggi, G.; Miniello, G.; Vallero, S.; Vino, G.
2015-12-01
A cloud-based Virtual Analysis Facility (VAF) for the ALICE experiment at the LHC has been deployed in Bari. Similar facilities are currently running in other Italian sites with the aim to create a federation of interoperating farms able to provide their computing resources for interactive distributed analysis. The use of cloud technology, along with elastic provisioning of computing resources as an alternative to the grid for running data intensive analyses, is the main challenge of these facilities. One of the crucial aspects of the user-driven analysis execution is the data access. A local storage facility has the disadvantage that the stored data can be accessed only locally, i.e. from within the single VAF. To overcome such a limitation a federated infrastructure, which provides full access to all the data belonging to the federation independently from the site where they are stored, has been set up. The federation architecture exploits both cloud computing and XRootD technologies, in order to provide a dynamic, easy-to-use and well performing solution for data handling. It should allow the users to store the files and efficiently retrieve the data, since it implements a dynamic distributed cache among many datacenters in Italy connected to one another through the high-bandwidth national network. Details on the preliminary architecture implementation and performance studies are discussed.
The relationships between precipitation, convective cloud and tropical cyclone intensity change
NASA Astrophysics Data System (ADS)
Ruan, Z.; Wu, Q.
2017-12-01
Using 16 years precipitation, brightness temperature (IR BT) data and tropical cyclone (TC) information, this study explores the relationship between precipitation, convective cloud and tropical cyclone (TC) intensity change in the Western North Pacific Ocean. It is found that TC intensity has positive relation with TC precipitation. TC precipitation increases with increased TC intensity. Based on the different phase of diurnal cycle, convective TC clouds were divided into very cold deep convective clouds (IR BTs<208K) and cold high clouds (208K
A new data collaboration service based on cloud computing security
NASA Astrophysics Data System (ADS)
Ying, Ren; Li, Hua-Wei; Wang, Li na
2017-09-01
With the rapid development of cloud computing, the storage and usage of data have undergone revolutionary changes. Data owners can store data in the cloud. While bringing convenience, it also brings many new challenges to cloud data security. A key issue is how to support a secure data collaboration service that supports access and updates to cloud data. This paper proposes a secure, efficient and extensible data collaboration service, which prevents data leaks in cloud storage, supports one to many encryption mechanisms, and also enables cloud data writing and fine-grained access control.
TethysCluster: A comprehensive approach for harnessing cloud resources for hydrologic modeling
NASA Astrophysics Data System (ADS)
Nelson, J.; Jones, N.; Ames, D. P.
2015-12-01
Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
Dynamic Collaboration Infrastructure for Hydrologic Science
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.
2016-12-01
Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources" which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure available that can be accessed from environments like HydroShare is increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast number of data and computing infrastructure without needing to correspondingly learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure to enable the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure." In this presentation we discuss the results of this proof-of-concept prototype which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.
Charlebois, Kathleen; Palmour, Nicole; Knoppers, Bartha Maria
2016-01-01
This study aims to understand the influence of the ethical and legal issues on cloud computing adoption in the field of genomics research. To do so, we adapted Diffusion of Innovation (DoI) theory to enable understanding of how key stakeholders manage the various ethical and legal issues they encounter when adopting cloud computing. Twenty semi-structured interviews were conducted with genomics researchers, patient advocates and cloud service providers. Thematic analysis generated five major themes: 1) Getting comfortable with cloud computing; 2) Weighing the advantages and the risks of cloud computing; 3) Reconciling cloud computing with data privacy; 4) Maintaining trust and 5) Anticipating the cloud by creating the conditions for cloud adoption. Our analysis highlights the tendency among genomics researchers to gradually adopt cloud technology. Efforts made by cloud service providers to promote cloud computing adoption are confronted by researchers’ perpetual cost and security concerns, along with a lack of familiarity with the technology. Further underlying those fears are researchers’ legal responsibility with respect to the data that is stored on the cloud. Alternative consent mechanisms aimed at increasing patients’ control over the use of their data also provide a means to circumvent various institutional and jurisdictional hurdles that restrict access by creating siloed databases. However, the risk of creating new, cloud-based silos may run counter to the goal in genomics research to increase data sharing on a global scale. PMID:27755563
Charlebois, Kathleen; Palmour, Nicole; Knoppers, Bartha Maria
2016-01-01
This study aims to understand the influence of the ethical and legal issues on cloud computing adoption in the field of genomics research. To do so, we adapted Diffusion of Innovation (DoI) theory to enable understanding of how key stakeholders manage the various ethical and legal issues they encounter when adopting cloud computing. Twenty semi-structured interviews were conducted with genomics researchers, patient advocates and cloud service providers. Thematic analysis generated five major themes: 1) Getting comfortable with cloud computing; 2) Weighing the advantages and the risks of cloud computing; 3) Reconciling cloud computing with data privacy; 4) Maintaining trust and 5) Anticipating the cloud by creating the conditions for cloud adoption. Our analysis highlights the tendency among genomics researchers to gradually adopt cloud technology. Efforts made by cloud service providers to promote cloud computing adoption are confronted by researchers' perpetual cost and security concerns, along with a lack of familiarity with the technology. Further underlying those fears are researchers' legal responsibility with respect to the data that is stored on the cloud. Alternative consent mechanisms aimed at increasing patients' control over the use of their data also provide a means to circumvent various institutional and jurisdictional hurdles that restrict access by creating siloed databases. However, the risk of creating new, cloud-based silos may run counter to the goal in genomics research to increase data sharing on a global scale.
Enabling Earth Science Through Cloud Computing
NASA Technical Reports Server (NTRS)
Hardman, Sean; Riofrio, Andres; Shams, Khawaja; Freeborn, Dana; Springer, Paul; Chafin, Brian
2012-01-01
Cloud Computing holds tremendous potential for missions across the National Aeronautics and Space Administration. Several flight missions are already benefiting from an investment in cloud computing for mission critical pipelines and services through faster processing time, higher availability, and drastically lower costs available on cloud systems. However, these processes do not currently extend to general scientific algorithms relevant to earth science missions. The members of the Airborne Cloud Computing Environment task at the Jet Propulsion Laboratory have worked closely with the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to integrate cloud computing into their science data processing pipeline. This paper details the efforts involved in deploying a science data system for the CARVE mission, evaluating and integrating cloud computing solutions with the system and porting their science algorithms for execution in a cloud environment.
Capturing and analyzing wheelchair maneuvering patterns with mobile cloud computing.
Fu, Jicheng; Hao, Wei; White, Travis; Yan, Yuqing; Jones, Maria; Jan, Yih-Kuen
2013-01-01
Power wheelchairs have been widely used to provide independent mobility to people with disabilities. Despite great advancements in power wheelchair technology, research shows that wheelchair related accidents occur frequently. To ensure safe maneuverability, capturing wheelchair maneuvering patterns is fundamental to enable other research, such as safe robotic assistance for wheelchair users. In this study, we propose to record, store, and analyze wheelchair maneuvering data by means of mobile cloud computing. Specifically, the accelerometer and gyroscope sensors in smart phones are used to record wheelchair maneuvering data in real-time. Then, the recorded data are periodically transmitted to the cloud for storage and analysis. The analyzed results are then made available to various types of users, such as mobile phone users, traditional desktop users, etc. The combination of mobile computing and cloud computing leverages the advantages of both techniques and extends the smart phone's capabilities of computing and data storage via the Internet. We performed a case study to implement the mobile cloud computing framework using Android smart phones and Google App Engine, a popular cloud computing platform. Experimental results demonstrated the feasibility of the proposed mobile cloud computing framework.
NASA Astrophysics Data System (ADS)
Timm, S.; Cooper, G.; Fuess, S.; Garzoglio, G.; Holzman, B.; Kennedy, R.; Grassano, D.; Tiradani, A.; Krishnamurthy, R.; Vinayagam, S.; Raicu, I.; Wu, H.; Ren, S.; Noh, S.-Y.
2017-10-01
The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timm, S.; Cooper, G.; Fuess, S.
The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores.more » This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.« less
Scaling predictive modeling in drug development with cloud computing.
Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola
2015-01-26
Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
A scoping review of cloud computing in healthcare.
Griebel, Lena; Prokosch, Hans-Ulrich; Köpcke, Felix; Toddenroth, Dennis; Christoph, Jan; Leb, Ines; Engel, Igor; Sedlmayr, Martin
2015-03-19
Cloud computing is a recent and fast growing area of development in healthcare. Ubiquitous, on-demand access to virtually endless resources in combination with a pay-per-use model allow for new ways of developing, delivering and using services. Cloud computing is often used in an "OMICS-context", e.g. for computing in genomics, proteomics and molecular medicine, while other field of application still seem to be underrepresented. Thus, the objective of this scoping review was to identify the current state and hot topics in research on cloud computing in healthcare beyond this traditional domain. MEDLINE was searched in July 2013 and in December 2014 for publications containing the terms "cloud computing" and "cloud-based". Each journal and conference article was categorized and summarized independently by two researchers who consolidated their findings. 102 publications have been analyzed and 6 main topics have been found: telemedicine/teleconsultation, medical imaging, public health and patient self-management, hospital management and information systems, therapy, and secondary use of data. Commonly used features are broad network access for sharing and accessing data and rapid elasticity to dynamically adapt to computing demands. Eight articles favor the pay-for-use characteristics of cloud-based services avoiding upfront investments. Nevertheless, while 22 articles present very general potentials of cloud computing in the medical domain and 66 articles describe conceptual or prototypic projects, only 14 articles report from successful implementations. Further, in many articles cloud computing is seen as an analogy to internet-/web-based data sharing and the characteristics of the particular cloud computing approach are unfortunately not really illustrated. Even though cloud computing in healthcare is of growing interest only few successful implementations yet exist and many papers just use the term "cloud" synonymously for "using virtual machines" or "web-based" with no described benefit of the cloud paradigm. The biggest threat to the adoption in the healthcare domain is caused by involving external cloud partners: many issues of data safety and security are still to be solved. Until then, cloud computing is favored more for singular, individual features such as elasticity, pay-per-use and broad network access, rather than as cloud paradigm on its own.
When cloud computing meets bioinformatics: a review.
Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong
2013-10-01
In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
The Metadata Cloud: The Last Piece of a Distributed Data System Model
NASA Astrophysics Data System (ADS)
King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.
2012-12-01
Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems have evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud based systems. Initially metadata was tightly coupled to the data either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data multiplied, data volumes have increased and services have specialized to improve efficiency; a cloud system model has emerged. In a cloud system computing and storage are provided as services with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data; a metadata cloud is formed. With a metadata cloud information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data" where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, how standards play a role and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.
Charting a Security Landscape in the Clouds: Data Protection and Collaboration in Cloud Storage
2016-07-01
cloud computing is perhaps the most revolutionary force in the information technology industry today. This field encompasses many different domains...characteristic shared by all cloud computing tasks is that they involve storing data in the cloud . In this report, we therefore aim to describe and rank the...CONCLUSION The advent of cloud computing has caused government organizations to rethink their IT architectures so that they can take advantage of the
The application of cloud computing to scientific workflows: a study of cost and performance.
Berriman, G Bruce; Deelman, Ewa; Juve, Gideon; Rynge, Mats; Vöckler, Jens-S
2013-01-28
The current model of transferring data from data centres to desktops for analysis will soon be rendered impractical by the accelerating growth in the volume of science datasets. Processing will instead often take place on high-performance servers co-located with data. Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed. Cloud computing is a new way of purchasing computing and storage resources on demand through virtualization technologies. We report here the results of investigations of the applicability of commercial cloud computing to scientific computing, with an emphasis on astronomy, including investigations of what types of applications can be run cheaply and efficiently on the cloud, and an example of an application well suited to the cloud: processing a large dataset to create a new science product.
Securing the Data Storage and Processing in Cloud Computing Environment
ERIC Educational Resources Information Center
Owens, Rodney
2013-01-01
Organizations increasingly utilize cloud computing architectures to reduce costs and energy consumption both in the data warehouse and on mobile devices by better utilizing the computing resources available. However, the security and privacy issues with publicly available cloud computing infrastructures have not been studied to a sufficient depth…
Health Informatics for Neonatal Intensive Care Units: An Analytical Modeling Perspective
Mench-Bressan, Nadja; McGregor, Carolyn; Pugh, James Edward
2015-01-01
The effective use of data within intensive care units (ICUs) has great potential to create new cloud-based health analytics solutions for disease prevention or earlier condition onset detection. The Artemis project aims to achieve the above goals in the area of neonatal ICUs (NICU). In this paper, we proposed an analytical model for the Artemis cloud project which will be deployed at McMaster Children’s Hospital in Hamilton. We collect not only physiological data but also the infusion pumps data that are attached to NICU beds. Using the proposed analytical model, we predict the amount of storage, memory, and computation power required for the system. Capacity planning and tradeoff analysis would be more accurate and systematic by applying the proposed analytical model in this paper. Numerical results are obtained using real inputs acquired from McMaster Children’s Hospital and a pilot deployment of the system at The Hospital for Sick Children (SickKids) in Toronto. PMID:27170907
Hybrid cloud: bridging of private and public cloud computing
NASA Astrophysics Data System (ADS)
Aryotejo, Guruh; Kristiyanto, Daniel Y.; Mufadhol
2018-05-01
Cloud Computing is quickly emerging as a promising paradigm in the recent years especially for the business sector. In addition, through cloud service providers, cloud computing is widely used by Information Technology (IT) based startup company to grow their business. However, the level of most businesses awareness on data security issues is low, since some Cloud Service Provider (CSP) could decrypt their data. Hybrid Cloud Deployment Model (HCDM) has characteristic as open source, which is one of secure cloud computing model, thus HCDM may solve data security issues. The objective of this study is to design, deploy and evaluate a HCDM as Infrastructure as a Service (IaaS). In the implementation process, Metal as a Service (MAAS) engine was used as a base to build an actual server and node. Followed by installing the vsftpd application, which serves as FTP server. In comparison with HCDM, public cloud was adopted through public cloud interface. As a result, the design and deployment of HCDM was conducted successfully, instead of having good security, HCDM able to transfer data faster than public cloud significantly. To the best of our knowledge, Hybrid Cloud Deployment model is one of secure cloud computing model due to its characteristic as open source. Furthermore, this study will serve as a base for future studies about Hybrid Cloud Deployment model which may relevant for solving big security issues of IT-based startup companies especially in Indonesia.
Research on Key Technologies of Cloud Computing
NASA Astrophysics Data System (ADS)
Zhang, Shufen; Yan, Hongcan; Chen, Xuebin
With the development of multi-core processors, virtualization, distributed storage, broadband Internet and automatic management, a new type of computing mode named cloud computing is produced. It distributes computation task on the resource pool which consists of massive computers, so the application systems can obtain the computing power, the storage space and software service according to its demand. It can concentrate all the computing resources and manage them automatically by the software without intervene. This makes application offers not to annoy for tedious details and more absorbed in his business. It will be advantageous to innovation and reduce cost. It's the ultimate goal of cloud computing to provide calculation, services and applications as a public facility for the public, So that people can use the computer resources just like using water, electricity, gas and telephone. Currently, the understanding of cloud computing is developing and changing constantly, cloud computing still has no unanimous definition. This paper describes three main service forms of cloud computing: SAAS, PAAS, IAAS, compared the definition of cloud computing which is given by Google, Amazon, IBM and other companies, summarized the basic characteristics of cloud computing, and emphasized on the key technologies such as data storage, data management, virtualization and programming model.
If It's in the Cloud, Get It on Paper: Cloud Computing Contract Issues
ERIC Educational Resources Information Center
Trappler, Thomas J.
2010-01-01
Much recent discussion has focused on the pros and cons of cloud computing. Some institutions are attracted to cloud computing benefits such as rapid deployment, flexible scalability, and low initial start-up cost, while others are concerned about cloud computing risks such as those related to data location, level of service, and security…
AstroCloud, a Cyber-Infrastructure for Astronomy Research: Cloud Computing Environments
NASA Astrophysics Data System (ADS)
Li, C.; Wang, J.; Cui, C.; He, B.; Fan, D.; Yang, Y.; Chen, J.; Zhang, H.; Yu, C.; Xiao, J.; Wang, C.; Cao, Z.; Fan, Y.; Hong, Z.; Li, S.; Mi, L.; Wan, W.; Wang, J.; Yin, S.
2015-09-01
AstroCloud is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on CloudStack, an open source software, we set up the cloud computing environment for AstroCloud Project. It consists of five distributed nodes across the mainland of China. Users can use and analysis data in this cloud computing environment. Based on GlusterFS, we built a scalable cloud storage system. Each user has a private space, which can be shared among different virtual machines and desktop systems. With this environments, astronomer can access to astronomical data collected by different telescopes and data centers easily, and data producers can archive their datasets safely.
Cloud Based Educational Systems and Its Challenges and Opportunities and Issues
ERIC Educational Resources Information Center
Paul, Prantosh Kr.; Lata Dangwal, Kiran
2014-01-01
Cloud Computing (CC) is actually is a set of hardware, software, networks, storage, services an interface combines to deliver aspects of computing as a service. Cloud Computing (CC) actually uses the central remote servers to maintain data and applications. Practically Cloud Computing (CC) is extension of Grid computing with independency and…
Migrating Educational Data and Services to Cloud Computing: Exploring Benefits and Challenges
ERIC Educational Resources Information Center
Lahiri, Minakshi; Moseley, James L.
2013-01-01
"Cloud computing" is currently the "buzzword" in the Information Technology field. Cloud computing facilitates convenient access to information and software resources as well as easy storage and sharing of files and data, without the end users being aware of the details of the computing technology behind the process. This…
ASSURED CLOUD COMPUTING UNIVERSITY CENTER OFEXCELLENCE (ACC UCOE)
2018-01-18
average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed...infrastructure security -Design of algorithms and techniques for real- time assuredness in cloud computing -Map-reduce task assignment with data locality...46 DESIGN OF ALGORITHMS AND TECHNIQUES FOR REAL- TIME ASSUREDNESS IN CLOUD COMPUTING
Sector and Sphere: the design and implementation of a high-performance data cloud
Gu, Yunhong; Grossman, Robert L.
2009-01-01
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. PMID:19451100
Design for Run-Time Monitor on Cloud Computing
NASA Astrophysics Data System (ADS)
Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is the type of a parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation as well as underlying hardware through performance counter optimizing its computing configuration based on the analyzed data.
Bio and health informatics meets cloud : BioVLab as an example.
Chae, Heejoon; Jung, Inuk; Lee, Hyungro; Marru, Suresh; Lee, Seong-Whan; Kim, Sun
2013-01-01
The exponential increase of genomic data brought by the advent of the next or the third generation sequencing (NGS) technologies and the dramatic drop in sequencing cost have driven biological and medical sciences to data-driven sciences. This revolutionary paradigm shift comes with challenges in terms of data transfer, storage, computation, and analysis of big bio/medical data. Cloud computing is a service model sharing a pool of configurable resources, which is a suitable workbench to address these challenges. From the medical or biological perspective, providing computing power and storage is the most attractive feature of cloud computing in handling the ever increasing biological data. As data increases in size, many research organizations start to experience the lack of computing power, which becomes a major hurdle in achieving research goals. In this paper, we review the features of publically available bio and health cloud systems in terms of graphical user interface, external data integration, security and extensibility of features. We then discuss about issues and limitations of current cloud systems and conclude with suggestion of a biological cloud environment concept, which can be defined as a total workbench environment assembling computational tools and databases for analyzing bio/medical big data in particular application domains.
Geometric Data Perturbation-Based Personal Health Record Transactions in Cloud Computing
Balasubramaniam, S.; Kavitha, V.
2015-01-01
Cloud computing is a new delivery model for information technology services and it typically involves the provision of dynamically scalable and often virtualized resources over the Internet. However, cloud computing raises concerns on how cloud service providers, user organizations, and governments should handle such information and interactions. Personal health records represent an emerging patient-centric model for health information exchange, and they are outsourced for storage by third parties, such as cloud providers. With these records, it is necessary for each patient to encrypt their own personal health data before uploading them to cloud servers. Current techniques for encryption primarily rely on conventional cryptographic approaches. However, key management issues remain largely unsolved with these cryptographic-based encryption techniques. We propose that personal health record transactions be managed using geometric data perturbation in cloud computing. In our proposed scheme, the personal health record database is perturbed using geometric data perturbation and outsourced to the Amazon EC2 cloud. PMID:25767826
Geometric data perturbation-based personal health record transactions in cloud computing.
Balasubramaniam, S; Kavitha, V
2015-01-01
Cloud computing is a new delivery model for information technology services and it typically involves the provision of dynamically scalable and often virtualized resources over the Internet. However, cloud computing raises concerns on how cloud service providers, user organizations, and governments should handle such information and interactions. Personal health records represent an emerging patient-centric model for health information exchange, and they are outsourced for storage by third parties, such as cloud providers. With these records, it is necessary for each patient to encrypt their own personal health data before uploading them to cloud servers. Current techniques for encryption primarily rely on conventional cryptographic approaches. However, key management issues remain largely unsolved with these cryptographic-based encryption techniques. We propose that personal health record transactions be managed using geometric data perturbation in cloud computing. In our proposed scheme, the personal health record database is perturbed using geometric data perturbation and outsourced to the Amazon EC2 cloud.
NASA Astrophysics Data System (ADS)
Aneri, Parikh; Sumathy, S.
2017-11-01
Cloud computing provides services over the internet and provides application resources and data to the users based on their demand. Base of the Cloud Computing is consumer provider model. Cloud provider provides resources which consumer can access using cloud computing model in order to build their application based on their demand. Cloud data center is a bulk of resources on shared pool architecture for cloud user to access. Virtualization is the heart of the Cloud computing model, it provides virtual machine as per application specific configuration and those applications are free to choose their own configuration. On one hand, there is huge number of resources and on other hand it has to serve huge number of requests effectively. Therefore, resource allocation policy and scheduling policy play very important role in allocation and managing resources in this cloud computing model. This paper proposes the load balancing policy using Hungarian algorithm. Hungarian Algorithm provides dynamic load balancing policy with a monitor component. Monitor component helps to increase cloud resource utilization by managing the Hungarian algorithm by monitoring its state and altering its state based on artificial intelligent. CloudSim used in this proposal is an extensible toolkit and it simulates cloud computing environment.
Hu, Yu-Chen
2018-01-01
The emergence of smart Internet of Things (IoT) devices has highly favored the realization of smart homes in a down-stream sector of a smart grid. The underlying objective of Demand Response (DR) schemes is to actively engage customers to modify their energy consumption on domestic appliances in response to pricing signals. Domestic appliance scheduling is widely accepted as an effective mechanism to manage domestic energy consumption intelligently. Besides, to residential customers for DR implementation, maintaining a balance between energy consumption cost and users’ comfort satisfaction is a challenge. Hence, in this paper, a constrained Particle Swarm Optimization (PSO)-based residential consumer-centric load-scheduling method is proposed. The method can be further featured with edge computing. In contrast with cloud computing, edge computing—a method of optimizing cloud computing technologies by driving computing capabilities at the IoT edge of the Internet as one of the emerging trends in engineering technology—addresses bandwidth-intensive contents and latency-sensitive applications required among sensors and central data centers through data analytics at or near the source of data. A non-intrusive load-monitoring technique proposed previously is utilized to automatic determination of physical characteristics of power-intensive home appliances from users’ life patterns. The swarm intelligence, constrained PSO, is used to minimize the energy consumption cost while considering users’ comfort satisfaction for DR implementation. The residential consumer-centric load-scheduling method proposed in this paper is evaluated under real-time pricing with inclining block rates and is demonstrated in a case study. The experimentation reported in this paper shows the proposed residential consumer-centric load-scheduling method can re-shape loads by home appliances in response to DR signals. Moreover, a phenomenal reduction in peak power consumption is achieved by 13.97%. PMID:29702607
NASA Astrophysics Data System (ADS)
Yoon, S.
2016-12-01
To define geodetic reference frame using GPS data collected by Continuously Operating Reference Stations (CORS) network, historical GPS data needs to be reprocessed regularly. Reprocessing GPS data collected by upto 2000 CORS sites for the last two decades requires a lot of computational resource. At National Geodetic Survey (NGS), there has been one completed reprocessing in 2011, and currently, the second reprocessing is undergoing. For the first reprocessing effort, in-house computing resource was utilized. In the current second reprocessing effort, outsourced cloud computing platform is being utilized. In this presentation, the outline of data processing strategy at NGS is described as well as the effort to parallelize the data processing procedure in order to maximize the benefit of the cloud computing. The time and cost savings realized by utilizing cloud computing approach will also be discussed.
Toward a web-based real-time radiation treatment planning system in a cloud computing environment.
Na, Yong Hum; Suh, Tae-Suk; Kapp, Daniel S; Xing, Lei
2013-09-21
To exploit the potential dosimetric advantages of intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT), an in-depth approach is required to provide efficient computing methods. This needs to incorporate clinically related organ specific constraints, Monte Carlo (MC) dose calculations, and large-scale plan optimization. This paper describes our first steps toward a web-based real-time radiation treatment planning system in a cloud computing environment (CCE). The Amazon Elastic Compute Cloud (EC2) with a master node (named m2.xlarge containing 17.1 GB of memory, two virtual cores with 3.25 EC2 Compute Units each, 420 GB of instance storage, 64-bit platform) is used as the backbone of cloud computing for dose calculation and plan optimization. The master node is able to scale the workers on an 'on-demand' basis. MC dose calculation is employed to generate accurate beamlet dose kernels by parallel tasks. The intensity modulation optimization uses total-variation regularization (TVR) and generates piecewise constant fluence maps for each initial beam direction in a distributed manner over the CCE. The optimized fluence maps are segmented into deliverable apertures. The shape of each aperture is iteratively rectified to be a sequence of arcs using the manufacture's constraints. The output plan file from the EC2 is sent to the simple storage service. Three de-identified clinical cancer treatment plans have been studied for evaluating the performance of the new planning platform with 6 MV flattening filter free beams (40 × 40 cm(2)) from the Varian TrueBeam(TM) STx linear accelerator. A CCE leads to speed-ups of up to 14-fold for both dose kernel calculations and plan optimizations in the head and neck, lung, and prostate cancer cases considered in this study. The proposed system relies on a CCE that is able to provide an infrastructure for parallel and distributed computing. The resultant plans from the cloud computing are identical to PC-based IMRT and VMAT plans, confirming the reliability of the cloud computing platform. This cloud computing infrastructure has been established for a radiation treatment planning. It substantially improves the speed of inverse planning and makes future on-treatment adaptive re-planning possible.
Toward a web-based real-time radiation treatment planning system in a cloud computing environment
NASA Astrophysics Data System (ADS)
Hum Na, Yong; Suh, Tae-Suk; Kapp, Daniel S.; Xing, Lei
2013-09-01
To exploit the potential dosimetric advantages of intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT), an in-depth approach is required to provide efficient computing methods. This needs to incorporate clinically related organ specific constraints, Monte Carlo (MC) dose calculations, and large-scale plan optimization. This paper describes our first steps toward a web-based real-time radiation treatment planning system in a cloud computing environment (CCE). The Amazon Elastic Compute Cloud (EC2) with a master node (named m2.xlarge containing 17.1 GB of memory, two virtual cores with 3.25 EC2 Compute Units each, 420 GB of instance storage, 64-bit platform) is used as the backbone of cloud computing for dose calculation and plan optimization. The master node is able to scale the workers on an ‘on-demand’ basis. MC dose calculation is employed to generate accurate beamlet dose kernels by parallel tasks. The intensity modulation optimization uses total-variation regularization (TVR) and generates piecewise constant fluence maps for each initial beam direction in a distributed manner over the CCE. The optimized fluence maps are segmented into deliverable apertures. The shape of each aperture is iteratively rectified to be a sequence of arcs using the manufacture’s constraints. The output plan file from the EC2 is sent to the simple storage service. Three de-identified clinical cancer treatment plans have been studied for evaluating the performance of the new planning platform with 6 MV flattening filter free beams (40 × 40 cm2) from the Varian TrueBeamTM STx linear accelerator. A CCE leads to speed-ups of up to 14-fold for both dose kernel calculations and plan optimizations in the head and neck, lung, and prostate cancer cases considered in this study. The proposed system relies on a CCE that is able to provide an infrastructure for parallel and distributed computing. The resultant plans from the cloud computing are identical to PC-based IMRT and VMAT plans, confirming the reliability of the cloud computing platform. This cloud computing infrastructure has been established for a radiation treatment planning. It substantially improves the speed of inverse planning and makes future on-treatment adaptive re-planning possible.
NASA Technical Reports Server (NTRS)
Minnis, Patrick; Young, David F.; Heck, Patrick W.; Liou, Kuo-Nan; Takano, Yoshihide
1992-01-01
The First ISCCP (International Satellite Cloud Climatology Project) Regional Experiment (FIRE) Phase II Intensive Field Observations (IFO) were taken over southeastern Kansas between November 13 and December 7,1991, to determine cirrus cloud properties. The observations include in situ microphysical data; surface, aircraft, and satellite remote sensing; and measurements of divergence over meso- and smaller-scale areas using wind profilers. Satellite remote sensing of cloud characteristics is an essential aspect for understanding and predicting the role of clouds in climate variations. The objectives of the satellite cloud analysis during FIRE are to validate cloud property retrievals, develop advanced methods for extracting cloud information from satellite-measured radiances, and provide multiscale cloud data for cloud process studies and for verification of cloud generation models. This paper presents the initial results of cloud property analyses during FIRE-II using Geostationary Operational Environmental Satellite (GOES) data and NOAA Advanced Very High Resolution Radiometer (AVHRR) radiances.
Reviews on Security Issues and Challenges in Cloud Computing
NASA Astrophysics Data System (ADS)
An, Y. Z.; Zaaba, Z. F.; Samsudin, N. F.
2016-11-01
Cloud computing is an Internet-based computing service provided by the third party allowing share of resources and data among devices. It is widely used in many organizations nowadays and becoming more popular because it changes the way of how the Information Technology (IT) of an organization is organized and managed. It provides lots of benefits such as simplicity and lower costs, almost unlimited storage, least maintenance, easy utilization, backup and recovery, continuous availability, quality of service, automated software integration, scalability, flexibility and reliability, easy access to information, elasticity, quick deployment and lower barrier to entry. While there is increasing use of cloud computing service in this new era, the security issues of the cloud computing become a challenges. Cloud computing must be safe and secure enough to ensure the privacy of the users. This paper firstly lists out the architecture of the cloud computing, then discuss the most common security issues of using cloud and some solutions to the security issues since security is one of the most critical aspect in cloud computing due to the sensitivity of user's data.
Secure data sharing in public cloud
NASA Astrophysics Data System (ADS)
Venkataramana, Kanaparti; Naveen Kumar, R.; Tatekalva, Sandhya; Padmavathamma, M.
2012-04-01
Secure multi-party protocols have been proposed for entities (organizations or individuals) that don't fully trust each other to share sensitive information. Many types of entities need to collect, analyze, and disseminate data rapidly and accurately, without exposing sensitive information to unauthorized or untrusted parties. Solutions based on secure multiparty computation guarantee privacy and correctness, at an extra communication (too costly in communication to be practical) and computation cost. The high overhead motivates us to extend this SMC to cloud environment which provides large computation and communication capacity which makes SMC to be used between multiple clouds (i.e., it may between private or public or hybrid clouds).Cloud may encompass many high capacity servers which acts as a hosts which participate in computation (IaaS and PaaS) for final result, which is controlled by Cloud Trusted Authority (CTA) for secret sharing within the cloud. The communication between two clouds is controlled by High Level Trusted Authority (HLTA) which is one of the hosts in a cloud which provides MgaaS (Management as a Service). Due to high risk for security in clouds, HLTA generates and distributes public keys and private keys by using Carmichael-R-Prime- RSA algorithm for exchange of private data in SMC between itself and clouds. In cloud, CTA creates Group key for Secure communication between the hosts in cloud based on keys sent by HLTA for exchange of Intermediate values and shares for computation of final result. Since this scheme is extended to be used in clouds( due to high availability and scalability to increase computation power) it is possible to implement SMC practically for privacy preserving in data mining at low cost for the clients.
GT-WGS: an efficient and economic tool for large-scale WGS analyses based on the AWS cloud service.
Wang, Yiqi; Li, Gen; Ma, Mark; He, Fazhong; Song, Zhuo; Zhang, Wei; Wu, Chengkun
2018-01-19
Whole-genome sequencing (WGS) plays an increasingly important role in clinical practice and public health. Due to the big data size, WGS data analysis is usually compute-intensive and IO-intensive. Currently it usually takes 30 to 40 h to finish a 50× WGS analysis task, which is far from the ideal speed required by the industry. Furthermore, the high-end infrastructure required by WGS computing is costly in terms of time and money. In this paper, we aim to improve the time efficiency of WGS analysis and minimize the cost by elastic cloud computing. We developed a distributed system, GT-WGS, for large-scale WGS analyses utilizing the Amazon Web Services (AWS). Our system won the first prize on the Wind and Cloud challenge held by Genomics and Cloud Technology Alliance conference (GCTA) committee. The system makes full use of the dynamic pricing mechanism of AWS. We evaluate the performance of GT-WGS with a 55× WGS dataset (400GB fastq) provided by the GCTA 2017 competition. In the best case, it only took 18.4 min to finish the analysis and the AWS cost of the whole process is only 16.5 US dollars. The accuracy of GT-WGS is 99.9% consistent with that of the Genome Analysis Toolkit (GATK) best practice. We also evaluated the performance of GT-WGS performance on a real-world dataset provided by the XiangYa hospital, which consists of 5× whole-genome dataset with 500 samples, and on average GT-WGS managed to finish one 5× WGS analysis task in 2.4 min at a cost of $3.6. WGS is already playing an important role in guiding therapeutic intervention. However, its application is limited by the time cost and computing cost. GT-WGS excelled as an efficient and affordable WGS analyses tool to address this problem. The demo video and supplementary materials of GT-WGS can be accessed at https://github.com/Genetalks/wgs_analysis_demo .
Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds
NASA Astrophysics Data System (ADS)
Li, Rui; Chen, Lei; Li, Wen-Syan
Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many others) requires users to configure cloud infrastructures via programs and APIs and such configuration is fixed during the runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and a workload consisting of multiple jobs running concurrently, which aims at maximum throughput using a minimum set of processors.
Provenance based data integrity checking and verification in cloud environments
Haq, Inam Ul; Jan, Bilal; Khan, Fakhri Alam; Ahmad, Awais
2017-01-01
Cloud computing is a recent tendency in IT that moves computing and data away from desktop and hand-held devices into large scale processing hubs and data centers respectively. It has been proposed as an effective solution for data outsourcing and on demand computing to control the rising cost of IT setups and management in enterprises. However, with Cloud platforms user’s data is moved into remotely located storages such that users lose control over their data. This unique feature of the Cloud is facing many security and privacy challenges which need to be clearly understood and resolved. One of the important concerns that needs to be addressed is to provide the proof of data integrity, i.e., correctness of the user’s data stored in the Cloud storage. The data in Clouds is physically not accessible to the users. Therefore, a mechanism is required where users can check if the integrity of their valuable data is maintained or compromised. For this purpose some methods are proposed like mirroring, checksumming and using third party auditors amongst others. However, these methods use extra storage space by maintaining multiple copies of data or the presence of a third party verifier is required. In this paper, we address the problem of proving data integrity in Cloud computing by proposing a scheme through which users are able to check the integrity of their data stored in Clouds. In addition, users can track the violation of data integrity if occurred. For this purpose, we utilize a relatively new concept in the Cloud computing called “Data Provenance”. Our scheme is capable to reduce the need of any third party services, additional hardware support and the replication of data items on client side for integrity checking. PMID:28545151
Provenance based data integrity checking and verification in cloud environments.
Imran, Muhammad; Hlavacs, Helmut; Haq, Inam Ul; Jan, Bilal; Khan, Fakhri Alam; Ahmad, Awais
2017-01-01
Cloud computing is a recent tendency in IT that moves computing and data away from desktop and hand-held devices into large scale processing hubs and data centers respectively. It has been proposed as an effective solution for data outsourcing and on demand computing to control the rising cost of IT setups and management in enterprises. However, with Cloud platforms user's data is moved into remotely located storages such that users lose control over their data. This unique feature of the Cloud is facing many security and privacy challenges which need to be clearly understood and resolved. One of the important concerns that needs to be addressed is to provide the proof of data integrity, i.e., correctness of the user's data stored in the Cloud storage. The data in Clouds is physically not accessible to the users. Therefore, a mechanism is required where users can check if the integrity of their valuable data is maintained or compromised. For this purpose some methods are proposed like mirroring, checksumming and using third party auditors amongst others. However, these methods use extra storage space by maintaining multiple copies of data or the presence of a third party verifier is required. In this paper, we address the problem of proving data integrity in Cloud computing by proposing a scheme through which users are able to check the integrity of their data stored in Clouds. In addition, users can track the violation of data integrity if occurred. For this purpose, we utilize a relatively new concept in the Cloud computing called "Data Provenance". Our scheme is capable to reduce the need of any third party services, additional hardware support and the replication of data items on client side for integrity checking.
Risk in the Clouds?: Security Issues Facing Government Use of Cloud Computing
NASA Astrophysics Data System (ADS)
Wyld, David C.
Cloud computing is poised to become one of the most important and fundamental shifts in how computing is consumed and used. Forecasts show that government will play a lead role in adopting cloud computing - for data storage, applications, and processing power, as IT executives seek to maximize their returns on limited procurement budgets in these challenging economic times. After an overview of the cloud computing concept, this article explores the security issues facing public sector use of cloud computing and looks to the risk and benefits of shifting to cloud-based models. It concludes with an analysis of the challenges that lie ahead for government use of cloud resources.
Translational bioinformatics in the cloud: an affordable alternative
2010-01-01
With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073
Performance, Agility and Cost of Cloud Computing Services for NASA GES DISC Giovanni Application
NASA Astrophysics Data System (ADS)
Pham, L.; Chen, A.; Wharton, S.; Winter, E. L.; Lynnes, C.
2013-12-01
The NASA Goddard Earth Science Data and Information Services Center (GES DISC) is investigating the performance, agility and cost of Cloud computing for GES DISC applications. Giovanni (Geospatial Interactive Online Visualization ANd aNalysis Infrastructure), one of the core applications at the GES DISC for online climate-related Earth science data access, subsetting, analysis, visualization, and downloading, was used to evaluate the feasibility and effort of porting an application to the Amazon Cloud Services platform. The performance and the cost of running Giovanni on the Amazon Cloud were compared to similar parameters for the GES DISC local operational system. A Giovanni Time-Series analysis of aerosol absorption optical depth (388nm) from OMI (Ozone Monitoring Instrument)/Aura was selected for these comparisons. All required data were pre-cached in both the Cloud and local system to avoid data transfer delays. The 3-, 6-, 12-, and 24-month data were used for analysis on the Cloud and local system respectively, and the processing times for the analysis were used to evaluate system performance. To investigate application agility, Giovanni was installed and tested on multiple Cloud platforms. The cost of using a Cloud computing platform mainly consists of: computing, storage, data requests, and data transfer in/out. The Cloud computing cost is calculated based on the hourly rate, and the storage cost is calculated based on the rate of Gigabytes per month. Cost for incoming data transfer is free, and for data transfer out, the cost is based on the rate in Gigabytes. The costs for a local server system consist of buying hardware/software, system maintenance/updating, and operating cost. The results showed that the Cloud platform had a 38% better performance and cost 36% less than the local system. This investigation shows the potential of cloud computing to increase system performance and lower the overall cost of system management.
ERIC Educational Resources Information Center
Buckman, Joel; Gold, Stephanie
2012-01-01
This article outlines privacy and data security compliance issues facing postsecondary education institutions when they utilize cloud computing and concludes with a practical list of do's and dont's. Cloud computing does not change an institution's privacy and data security obligations. It does involve reliance on a third party, which requires an…
A Novel College Network Resource Management Method using Cloud Computing
NASA Astrophysics Data System (ADS)
Lin, Chen
At present information construction of college mainly has construction of college networks and management information system; there are many problems during the process of information. Cloud computing is development of distributed processing, parallel processing and grid computing, which make data stored on the cloud, make software and services placed in the cloud and build on top of various standards and protocols, you can get it through all kinds of equipments. This article introduces cloud computing and function of cloud computing, then analyzes the exiting problems of college network resource management, the cloud computing technology and methods are applied in the construction of college information sharing platform.
An Interactive Web-Based Analysis Framework for Remote Sensing Cloud Computing
NASA Astrophysics Data System (ADS)
Wang, X. Z.; Zhang, H. M.; Zhao, J. H.; Lin, Q. H.; Zhou, Y. C.; Li, J. H.
2015-07-01
Spatiotemporal data, especially remote sensing data, are widely used in ecological, geographical, agriculture, and military research and applications. With the development of remote sensing technology, more and more remote sensing data are accumulated and stored in the cloud. An effective way for cloud users to access and analyse these massive spatiotemporal data in the web clients becomes an urgent issue. In this paper, we proposed a new scalable, interactive and web-based cloud computing solution for massive remote sensing data analysis. We build a spatiotemporal analysis platform to provide the end-user with a safe and convenient way to access massive remote sensing data stored in the cloud. The lightweight cloud storage system used to store public data and users' private data is constructed based on open source distributed file system. In it, massive remote sensing data are stored as public data, while the intermediate and input data are stored as private data. The elastic, scalable, and flexible cloud computing environment is built using Docker, which is a technology of open-source lightweight cloud computing container in the Linux operating system. In the Docker container, open-source software such as IPython, NumPy, GDAL, and Grass GIS etc., are deployed. Users can write scripts in the IPython Notebook web page through the web browser to process data, and the scripts will be submitted to IPython kernel to be executed. By comparing the performance of remote sensing data analysis tasks executed in Docker container, KVM virtual machines and physical machines respectively, we can conclude that the cloud computing environment built by Docker makes the greatest use of the host system resources, and can handle more concurrent spatial-temporal computing tasks. Docker technology provides resource isolation mechanism in aspects of IO, CPU, and memory etc., which offers security guarantee when processing remote sensing data in the IPython Notebook. Users can write complex data processing code on the web directly, so they can design their own data processing algorithm.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811
Design and development of a run-time monitor for multi-core architectures in cloud computing.
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
a Method for the Registration of Hemispherical Photographs and Tls Intensity Images
NASA Astrophysics Data System (ADS)
Schmidt, A.; Schilling, A.; Maas, H.-G.
2012-07-01
Terrestrial laser scanners generate dense and accurate 3D point clouds with minimal effort, which represent the geometry of real objects, while image data contains texture information of object surfaces. Based on the complementary characteristics of both data sets, a combination is very appealing for many applications, including forest-related tasks. In the scope of our research project, independent data sets of a plain birch stand have been taken by a full-spherical laser scanner and a hemispherical digital camera. Previously, both kinds of data sets have been considered separately: Individual trees were successfully extracted from large 3D point clouds, and so-called forest inventory parameters could be determined. Additionally, a simplified tree topology representation was retrieved. From hemispherical images, leaf area index (LAI) values, as a very relevant parameter for describing a stand, have been computed. The objective of our approach is to merge a 3D point cloud with image data in a way that RGB values are assigned to each 3D point. So far, segmentation and classification of TLS point clouds in forestry applications was mainly based on geometrical aspects of the data set. However, a 3D point cloud with colour information provides valuable cues exceeding simple statistical evaluation of geometrical object features and thus may facilitate the analysis of the scan data significantly.
Klonoff, David C
2017-07-01
The Internet of Things (IoT) is generating an immense volume of data. With cloud computing, medical sensor and actuator data can be stored and analyzed remotely by distributed servers. The results can then be delivered via the Internet. The number of devices in IoT includes such wireless diabetes devices as blood glucose monitors, continuous glucose monitors, insulin pens, insulin pumps, and closed-loop systems. The cloud model for data storage and analysis is increasingly unable to process the data avalanche, and processing is being pushed out to the edge of the network closer to where the data-generating devices are. Fog computing and edge computing are two architectures for data handling that can offload data from the cloud, process it nearby the patient, and transmit information machine-to-machine or machine-to-human in milliseconds or seconds. Sensor data can be processed near the sensing and actuating devices with fog computing (with local nodes) and with edge computing (within the sensing devices). Compared to cloud computing, fog computing and edge computing offer five advantages: (1) greater data transmission speed, (2) less dependence on limited bandwidths, (3) greater privacy and security, (4) greater control over data generated in foreign countries where laws may limit use or permit unwanted governmental access, and (5) lower costs because more sensor-derived data are used locally and less data are transmitted remotely. Connected diabetes devices almost all use fog computing or edge computing because diabetes patients require a very rapid response to sensor input and cannot tolerate delays for cloud computing.
Development of a Cloud Resolving Model for Heterogeneous Supercomputers
NASA Astrophysics Data System (ADS)
Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.
2017-12-01
A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection - such as convective storm systems. This research describes the porting effort to enable SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphical Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results, lessons learned as well as optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes wherein each node has 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, ACME-MMF component of the U.S. Department of Energy(DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of global climate model. Super-parameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME, and explore its full potential to scientifically and computationally advance climate simulation and prediction.
Use of cloud computing in biomedicine.
Sobeslav, Vladimir; Maresova, Petra; Krejcar, Ondrej; Franca, Tanos C C; Kuca, Kamil
2016-12-01
Nowadays, biomedicine is characterised by a growing need for processing of large amounts of data in real time. This leads to new requirements for information and communication technologies (ICT). Cloud computing offers a solution to these requirements and provides many advantages, such as cost savings, elasticity and scalability of using ICT. The aim of this paper is to explore the concept of cloud computing and the related use of this concept in the area of biomedicine. Authors offer a comprehensive analysis of the implementation of the cloud computing approach in biomedical research, decomposed into infrastructure, platform and service layer, and a recommendation for processing large amounts of data in biomedicine. Firstly, the paper describes the appropriate forms and technological solutions of cloud computing. Secondly, the high-end computing paradigm of cloud computing aspects is analysed. Finally, the potential and current use of applications in scientific research of this technology in biomedicine is discussed.
A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing.
Fan, Kai; Wang, Junxiong; Wang, Xin; Li, Hui; Yang, Yintang
2017-07-24
With the rapid development of big data and Internet of things (IOT), the number of networking devices and data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges are also arising in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. In this paper, we propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In our construction, most encryption and decryption computations are outsourced to fog devices and the computation results can be verified by using our verification method. Meanwhile, to address the revocation issue, we design an efficient user and attribute revocation method for it. Finally, analysis and simulation results show that our scheme is both secure and highly efficient.
Analysis and Research on Spatial Data Storage Model Based on Cloud Computing Platform
NASA Astrophysics Data System (ADS)
Hu, Yong
2017-12-01
In this paper, the data processing and storage characteristics of cloud computing are analyzed and studied. On this basis, a cloud computing data storage model based on BP neural network is proposed. In this data storage model, it can carry out the choice of server cluster according to the different attributes of the data, so as to complete the spatial data storage model with load balancing function, and have certain feasibility and application advantages.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-22
... explored in this series is cloud computing. The workshop on this topic will be held in Gaithersburg, MD on October 21, 2011. Assertion: ``Current implementations of cloud computing indicate a new approach to security'' Implementations of cloud computing have provided new ways of thinking about how to secure data...
CANFAR+Skytree: A Cloud Computing and Data Mining System for Astronomy
NASA Astrophysics Data System (ADS)
Ball, N. M.
2013-10-01
This is a companion Focus Demonstration article to the CANFAR+Skytree poster (Ball 2013, this volume), demonstrating the usage of the Skytree machine learning software on the Canadian Advanced Network for Astronomical Research (CANFAR) cloud computing system. CANFAR+Skytree is the world's first cloud computing system for data mining in astronomy.
Bioinformatics clouds for big data manipulation.
Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang
2012-11-28
As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.
Utilizing HDF4 File Content Maps for the Cloud
NASA Technical Reports Server (NTRS)
Lee, Hyokyung Joe
2016-01-01
We demonstrate a prototype study that HDF4 file content map can be used for efficiently organizing data in cloud object storage system to facilitate cloud computing. This approach can be extended to any binary data formats and to any existing big data analytics solution powered by cloud computing because HDF4 file content map project started as long term preservation of NASA data that doesn't require HDF4 APIs to access data.
Analysis of Polder Polarization Measurements During Astex and Eucrex Experiments
NASA Technical Reports Server (NTRS)
Chen, Hui; Han, Qingyuan; Chou, Joyce; Welch, Ronald M.
1997-01-01
Polarization is more sensitive than intensity to cloud microstructure such as the particle size and shape, and multiple scattering does not wash out features in polarization as effectively as it does in the intensity. Polarization measurements, particularly in the near IR, are potentially a valuable tool for cloud identification and for studies of the microphysics of clouds. The POLDER instrument is designed to provide wide field of view bidirectional images in polarized light. During the ASTEX-SOFIA campaign on June 12th, 1992, over the Atlantic Ocean (near the Azores Islands), images of homogeneous thick stratocumulus cloud fields were acquired. During the EUCREX'94 (April, 1994) campaign, the POLDER instrument was flying over the region of Brittany (France), taking observations of cirrus clouds. This study involves model studies and data analysis of POLDER observations. Both models and data analysis show that POLDER can be used to detect cloud thermodynamic phases. Model results show that polarized reflection in the Lamda =0.86 micron band is sensitive to cloud droplet sizes but not to cloud optical thickness. Comparison between model and data analysis reveals that cloud droplet sizes during ASTEX are about 5 microns, which agrees very well with the results of in situ measurements (4-5 microns). Knowing the retrieved cloud droplet sizes, the total reflected intensity of the POLDER measurements then can be used to retrieve cloud optical thickness. The close agreement between data analysis and model results during ASTEX also suggests the homogeneity of the cloud layer during that campaign.
Impact of office productivity cloud computing on energy consumption and greenhouse gas emissions.
Williams, Daniel R; Tang, Yinshan
2013-05-07
Cloud computing is usually regarded as being energy efficient and thus emitting less greenhouse gases (GHG) than traditional forms of computing. When the energy consumption of Microsoft's cloud computing Office 365 (O365) and traditional Office 2010 (O2010) software suites were tested and modeled, some cloud services were found to consume more energy than the traditional form. The developed model in this research took into consideration the energy consumption at the three main stages of data transmission; data center, network, and end user device. Comparable products from each suite were selected and activities were defined for each product to represent a different computing type. Microsoft provided highly confidential data for the data center stage, while the networking and user device stages were measured directly. A new measurement and software apportionment approach was defined and utilized allowing the power consumption of cloud services to be directly measured for the user device stage. Results indicated that cloud computing is more energy efficient for Excel and Outlook which consumed less energy and emitted less GHG than the standalone counterpart. The power consumption of the cloud based Outlook (8%) and Excel (17%) was lower than their traditional counterparts. However, the power consumption of the cloud version of Word was 17% higher than its traditional equivalent. A third mixed access method was also measured for Word which emitted 5% more GHG than the traditional version. It is evident that cloud computing may not provide a unified way forward to reduce energy consumption and GHG. Direct conversion from the standalone package into the cloud provision platform can now consider energy and GHG emissions at the software development and cloud service design stage using the methods described in this research.
NASA Technical Reports Server (NTRS)
Zhang, Z.; Meyer, K.; Platnick, S.; Oreopoulos, L.; Lee, D.; Yu, H.
2013-01-01
This paper describes an efficient and unique method for computing the shortwave direct radiative effect (DRE) of aerosol residing above low-level liquid-phase clouds using CALIOP and MODIS data. It accounts for the overlapping of aerosol and cloud rigorously by utilizing the joint histogram of cloud optical depth and cloud top pressure. Effects of sub-grid scale cloud and aerosol variations on DRE are accounted for. It is computationally efficient through using grid-level cloud and aerosol statistics, instead of pixel-level products, and a pre-computed look-up table in radiative transfer calculations. We verified that for smoke over the southeast Atlantic Ocean the method yields a seasonal mean instantaneous shortwave DRE that generally agrees with more rigorous pixel-level computation within 4%. We have also computed the annual mean instantaneous shortwave DRE of light-absorbing aerosols (i.e., smoke and polluted dust) over global ocean based on 4 yr of CALIOP and MODIS data. We found that the variability of the annual mean shortwave DRE of above-cloud light-absorbing aerosol is mainly driven by the optical depth of the underlying clouds.
NASA Technical Reports Server (NTRS)
Zhang, Z.; Meyer, K.; Platnick, S.; Oreopoulos, L.; Lee, D.; Yu, H.
2014-01-01
This paper describes an efficient and unique method for computing the shortwave direct radiative effect (DRE) of aerosol residing above low-level liquid-phase clouds using CALIOP and MODIS data. It accounts for the overlapping of aerosol and cloud rigorously by utilizing the joint histogram of cloud optical depth and cloud top pressure. Effects of sub-grid scale cloud and aerosol variations on DRE are accounted for. It is computationally efficient through using grid-level cloud and aerosol statistics, instead of pixel-level products, and a pre-computed look-up table in radiative transfer calculations. We verified that for smoke over the southeast Atlantic Ocean the method yields a seasonal mean instantaneous shortwave DRE that generally agrees with more rigorous pixel-level computation within 4. We have also computed the annual mean instantaneous shortwave DRE of light-absorbing aerosols (i.e., smoke and polluted dust) over global ocean based on 4 yr of CALIOP and MODIS data. We found that the variability of the annual mean shortwave DRE of above-cloud light-absorbing aerosol is mainly driven by the optical depth of the underlying clouds.
How to Cloud for Earth Scientists: An Introduction
NASA Technical Reports Server (NTRS)
Lynnes, Chris
2018-01-01
This presentation is a tutorial on getting started with cloud computing for the purposes of Earth Observation datasets. We first discuss some of the main advantages that cloud computing can provide for the Earth scientist: copious processing power, immense and affordable data storage, and rapid startup time. We also talk about some of the challenges of getting the most out of cloud computing: re-organizing the way data are analyzed, handling node failures and attending.
Security model for VM in cloud
NASA Astrophysics Data System (ADS)
Kanaparti, Venkataramana; Naveen K., R.; Rajani, S.; Padmvathamma, M.; Anitha, C.
2013-03-01
Cloud computing is a new approach emerged to meet ever-increasing demand for computing resources and to reduce operational costs and Capital Expenditure for IT services. As this new way of computation allows data and applications to be stored away from own corporate server, it brings more issues in security such as virtualization security, distributed computing, application security, identity management, access control and authentication. Even though Virtualization forms the basis for cloud computing it poses many threats in securing cloud. As most of Security threats lies at Virtualization layer in cloud we proposed this new Security Model for Virtual Machine in Cloud (SMVC) in which every process is authenticated by Trusted-Agent (TA) in Hypervisor as well as in VM. Our proposed model is designed to with-stand attacks by unauthorized process that pose threat to applications related to Data Mining, OLAP systems, Image processing which requires huge resources in cloud deployed on one or more VM's.
NASA Astrophysics Data System (ADS)
Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.
2014-12-01
We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs is best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid/easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include: provenance, publication of results, monitoring, workflow tools, etc. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC).The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At same time, we are scoping seven new VLs and in the process, identifying other common components to prioritise and focus development.
Do Clouds Compute? A Framework for Estimating the Value of Cloud Computing
NASA Astrophysics Data System (ADS)
Klems, Markus; Nimis, Jens; Tai, Stefan
On-demand provisioning of scalable and reliable compute services, along with a cost model that charges consumers based on actual service usage, has been an objective in distributed computing research and industry for a while. Cloud Computing promises to deliver on this objective: consumers are able to rent infrastructure in the Cloud as needed, deploy applications and store data, and access them via Web protocols on a pay-per-use basis. The acceptance of Cloud Computing, however, depends on the ability for Cloud Computing providers and consumers to implement a model for business value co-creation. Therefore, a systematic approach to measure costs and benefits of Cloud Computing is needed. In this paper, we discuss the need for valuation of Cloud Computing, identify key components, and structure these components in a framework. The framework assists decision makers in estimating Cloud Computing costs and to compare these costs to conventional IT solutions. We demonstrate by means of representative use cases how our framework can be applied to real world scenarios.
Retrieving and Indexing Spatial Data in the Cloud Computing Environment
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Wang, Sheng; Zhou, Daliang
In order to solve the drawbacks of spatial data storage in common Cloud Computing platform, we design and present a framework for retrieving, indexing, accessing and managing spatial data in the Cloud environment. An interoperable spatial data object model is provided based on the Simple Feature Coding Rules from the OGC such as Well Known Binary (WKB) and Well Known Text (WKT). And the classic spatial indexing algorithms like Quad-Tree and R-Tree are re-designed in the Cloud Computing environment. In the last we develop a prototype software based on Google App Engine to implement the proposed model.
76 FR 54112 - James Zadroga 9/11 Health and Compensation Act of 2010
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-31
... in the smoke cloud were most intense within and very near Ground Zero. By the time the smoke cloud... to eliminate consumption deductions when computing economic loss for injury claims. Accordingly, no... will consider such information in the context of the full claim. IV. Funding and Payment of Claims A...
A Hybrid Cloud Computing Service for Earth Sciences
NASA Astrophysics Data System (ADS)
Yang, C. P.
2016-12-01
Cloud Computing is becoming a norm for providing computing capabilities for advancing Earth sciences including big Earth data management, processing, analytics, model simulations, and many other aspects. A hybrid spatiotemporal cloud computing service is bulit at George Mason NSF spatiotemporal innovation center to meet this demands. This paper will report the service including several aspects: 1) the hardware includes 500 computing services and close to 2PB storage as well as connection to XSEDE Jetstream and Caltech experimental cloud computing environment for sharing the resource; 2) the cloud service is geographically distributed at east coast, west coast, and central region; 3) the cloud includes private clouds managed using open stack and eucalyptus, DC2 is used to bridge these and the public AWS cloud for interoperability and sharing computing resources when high demands surfing; 4) the cloud service is used to support NSF EarthCube program through the ECITE project, ESIP through the ESIP cloud computing cluster, semantics testbed cluster, and other clusters; 5) the cloud service is also available for the earth science communities to conduct geoscience. A brief introduction about how to use the cloud service will be included.
Bioinformatics clouds for big data manipulation
2012-01-01
Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475
NASA Technical Reports Server (NTRS)
Minnis, Patrick; Alvarez, Joseph M.; Young, David F.; Sassen, Kenneth; Grund, Christian J.
1990-01-01
The First ISCCP Regional Experiment (FIRE) Cirrus Intensive Field Observations (IFO) provide an opportunity to examine the relationships between the satellite observed radiances and various parameters which describe the bulk properties of clouds, such as cloud amount and cloud top height. Lidar derived cloud altitude data, radiosonde data, and satellite observed radiances are used to examine the relationships between visible reflectance, infrared emittance, and cloud top temperatures for cirrus clouds.
A Secure Alignment Algorithm for Mapping Short Reads to Human Genome.
Zhao, Yongan; Wang, Xiaofeng; Tang, Haixu
2018-05-09
The elastic and inexpensive computing resources such as clouds have been recognized as a useful solution to analyzing massive human genomic data (e.g., acquired by using next-generation sequencers) in biomedical researches. However, outsourcing human genome computation to public or commercial clouds was hindered due to privacy concerns: even a small number of human genome sequences contain sufficient information for identifying the donor of the genomic data. This issue cannot be directly addressed by existing security and cryptographic techniques (such as homomorphic encryption), because they are too heavyweight to carry out practical genome computation tasks on massive data. In this article, we present a secure algorithm to accomplish the read mapping, one of the most basic tasks in human genomic data analysis based on a hybrid cloud computing model. Comparing with the existing approaches, our algorithm delegates most computation to the public cloud, while only performing encryption and decryption on the private cloud, and thus makes the maximum use of the computing resource of the public cloud. Furthermore, our algorithm reports similar results as the nonsecure read mapping algorithms, including the alignment between reads and the reference genome, which can be directly used in the downstream analysis such as the inference of genomic variations. We implemented the algorithm in C++ and Python on a hybrid cloud system, in which the public cloud uses an Apache Spark system.
Electron cloud simulations for the main ring of J-PARC
NASA Astrophysics Data System (ADS)
Yee-Rendon, Bruce; Muto, Ryotaro; Ohmi, Kazuhito; Satou, Kenichirou; Tomizawa, Masahito; Toyama, Takeshi
2017-07-01
The simulation of beam instabilities is a helpful tool to evaluate potential threats against the machine protection of the high intensity beams. At Main Ring (MR) of J-PARC, signals related to the electron cloud have been observed during the slow beam extraction mode. Hence, several studies were conducted to investigate the mechanism that produces it, the results confirmed a strong dependence on the beam intensity and the bunch structure in the formation of the electron cloud, however, the precise explanation of its trigger conditions remains incomplete. To shed light on the problem, electron cloud simulations were done using an updated version of the computational model developed from previous works at KEK. The code employed the signals of the measurements to reproduce the events seen during the surveys.
2010-09-01
Cloud computing describes a new distributed computing paradigm for IT data and services that involves over-the-Internet provision of dynamically scalable and often virtualized resources. While cost reduction and flexibility in storage, services, and maintenance are important considerations when deciding on whether or how to migrate data and applications to the cloud, large organizations like the Department of Defense need to consider the organization and structure of data on the cloud and the operations on such data in order to reap the full benefit of cloud
Cloud computing for genomic data analysis and collaboration.
Langmead, Ben; Nellore, Abhinav
2018-04-01
Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.
Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D T; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung
2014-01-01
Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D. T.; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung
2014-01-01
Background Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. Results We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. Conclusions CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/. PMID:24897343
A Weibull distribution accrual failure detector for cloud computing.
Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Dong, Jian; Zhao, Yao; Wen, Dongxin
2017-01-01
Failure detectors are used to build high availability distributed systems as the fundamental component. To meet the requirement of a complicated large-scale distributed system, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve this problem, a new accrual failure detector based on Weibull Distribution, called the Weibull Distribution Failure Detector, has been proposed specifically for cloud computing. It can adapt to the dynamic and unexpected network conditions in cloud computing. The performance of the Weibull Distribution Failure Detector is evaluated and compared based on public classical experiment data and cloud computing experiment data. The results show that the Weibull Distribution Failure Detector has better performance in terms of speed and accuracy in unstable scenarios, especially in cloud computing.
Bootstrapping and Maintaining Trust in the Cloud
2016-12-01
simultaneous cloud nodes. 1. INTRODUCTION The proliferation and popularity of infrastructure-as-a- service (IaaS) cloud computing services such as...Amazon Web Services and Google Compute Engine means more cloud tenants are hosting sensitive, private, and business critical data and applications in the...thousands of IaaS resources as they are elastically instantiated and terminated. Prior cloud trusted computing solutions address a subset of these features
Research on cloud-based remote measurement and analysis system
NASA Astrophysics Data System (ADS)
Gao, Zhiqiang; He, Lingsong; Su, Wei; Wang, Can; Zhang, Changfan
2015-02-01
The promising potential of cloud computing and its convergence with technologies such as cloud storage, cloud push, mobile computing allows for creation and delivery of newer type of cloud service. Combined with the thought of cloud computing, this paper presents a cloud-based remote measurement and analysis system. This system mainly consists of three parts: signal acquisition client, web server deployed on the cloud service, and remote client. This system is a special website developed using asp.net and Flex RIA technology, which solves the selective contradiction between two monitoring modes, B/S and C/S. This platform supplies customer condition monitoring and data analysis service by Internet, which was deployed on the cloud server. Signal acquisition device is responsible for data (sensor data, audio, video, etc.) collection and pushes the monitoring data to the cloud storage database regularly. Data acquisition equipment in this system is only conditioned with the function of data collection and network function such as smartphone and smart sensor. This system's scale can adjust dynamically according to the amount of applications and users, so it won't cause waste of resources. As a representative case study, we developed a prototype system based on Ali cloud service using the rotor test rig as the research object. Experimental results demonstrate that the proposed system architecture is feasible.
Intelligent cloud computing security using genetic algorithm as a computational tools
NASA Astrophysics Data System (ADS)
Razuky AL-Shaikhly, Mazin H.
2018-05-01
An essential change had occurred in the field of Information Technology which represented with cloud computing, cloud giving virtual assets by means of web yet awesome difficulties in the field of information security and security assurance. Currently main problem with cloud computing is how to improve privacy and security for cloud “cloud is critical security”. This paper attempts to solve cloud security by using intelligent system with genetic algorithm as wall to provide cloud data secure, all services provided by cloud must detect who receive and register it to create list of users (trusted or un-trusted) depend on behavior. The execution of present proposal has shown great outcome.
Homomorphic encryption experiments on IBM's cloud quantum computing platform
NASA Astrophysics Data System (ADS)
Huang, He-Liang; Zhao, You-Wei; Li, Tan; Li, Feng-Guang; Du, Yu-Tao; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su
2017-02-01
Quantum computing has undergone rapid development in recent years. Owing to limitations on scalability, personal quantum computers still seem slightly unrealistic in the near future. The first practical quantum computer for ordinary users is likely to be on the cloud. However, the adoption of cloud computing is possible only if security is ensured. Homomorphic encryption is a cryptographic protocol that allows computation to be performed on encrypted data without decrypting them, so it is well suited to cloud computing. Here, we first applied homomorphic encryption on IBM's cloud quantum computer platform. In our experiments, we successfully implemented a quantum algorithm for linear equations while protecting our privacy. This demonstration opens a feasible path to the next stage of development of cloud quantum information technology.
SPARCCS - Smartphone-Assisted Readiness, Command and Control System
2012-06-01
and database needs. By doing this SPARCCS takes advantage of all the capabilities cloud computing has to offer, especially that of disbursed data...40092829/ Microsoft. (2011). Cloud Computing . Retrieved September 24, 2011, http ://www.microsoft.com/industry/government/guides/cloud_computing/2...Command, and Control System) to address these issues. We use smartphones in conjunction with cloud computing to extend the benefits of collaborative
Survey on Security Issues in File Management in Cloud Computing Environment
NASA Astrophysics Data System (ADS)
Gupta, Udit
2015-06-01
Cloud computing has pervaded through every aspect of Information technology in past decade. It has become easier to process plethora of data, generated by various devices in real time, with the advent of cloud networks. The privacy of users data is maintained by data centers around the world and hence it has become feasible to operate on that data from lightweight portable devices. But with ease of processing comes the security aspect of the data. One such security aspect is secure file transfer either internally within cloud or externally from one cloud network to another. File management is central to cloud computing and it is paramount to address the security concerns which arise out of it. This survey paper aims to elucidate the various protocols which can be used for secure file transfer and analyze the ramifications of using each protocol.
Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing
NASA Technical Reports Server (NTRS)
Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane
2012-01-01
Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. In particular, for disasters, response requires rapid access to large data volumes, substantial storage space and high performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation is of work being conducted by Applied Sciences Program Office at NASA-Stennis Space Center. A prototypical set of image manipulation and transformation processes that incorporate sample Unmanned Airborne System data were developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing of open source software developed process code on a local prototype platform, and then transitioning this code with associated environment requirements into an analogous, but memory and processor enhanced cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar type data processes. Cloud infrastructure service providers were evaluated by taking these locally tested processing functions, and then applying them to a given cloud-enabled infrastructure to assesses and compare environment setup options and enabled technologies. This project reviews findings that were observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.
Rasdaman for Big Spatial Raster Data
NASA Astrophysics Data System (ADS)
Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.
2015-12-01
Spatial raster data have grown exponentially over the past decade. Recent advancements on data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolution and domain coverage. The volume, velocity, and variety of such spatial data, along with the computational intensive nature of spatial queries, pose grand challenge to the storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data has to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Scientific Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.
A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing
Fan, Kai; Wang, Junxiong; Wang, Xin; Li, Hui; Yang, Yintang
2017-01-01
With the rapid development of big data and Internet of things (IOT), the number of networking devices and data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges are also arising in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. In this paper, we propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In our construction, most encryption and decryption computations are outsourced to fog devices and the computation results can be verified by using our verification method. Meanwhile, to address the revocation issue, we design an efficient user and attribute revocation method for it. Finally, analysis and simulation results show that our scheme is both secure and highly efficient. PMID:28737733
Adventures in Private Cloud: Balancing Cost and Capability at the CloudSat Data Processing Center
NASA Astrophysics Data System (ADS)
Partain, P.; Finley, S.; Fluke, J.; Haynes, J. M.; Cronk, H. Q.; Miller, S. D.
2016-12-01
Since the beginning of the CloudSat Mission in 2006, The CloudSat Data Processing Center (DPC) at the Cooperative Institute for Research in the Atmosphere (CIRA) has been ingesting data from the satellite and other A-Train sensors, producing data products, and distributing them to researchers around the world. The computing infrastructure was specifically designed to fulfill the requirements as specified at the beginning of what nominally was a two-year mission. The environment consisted of servers dedicated to specific processing tasks in a rigid workflow to generate the required products. To the benefit of science and with credit to the mission engineers, CloudSat has lasted well beyond its planned lifetime and is still collecting data ten years later. Over that period requirements of the data processing system have greatly expanded and opportunities for providing value-added services have presented themselves. But while demands on the system have increased, the initial design allowed for very little expansion in terms of scalability and flexibility. The design did change to include virtual machine processing nodes and distributed workflows but infrastructure management was still a time consuming task when system modification was required to run new tests or implement new processes. To address the scalability, flexibility, and manageability of the system Cloud computing methods and technologies are now being employed. The use of a public cloud like Amazon Elastic Compute Cloud or Google Compute Engine was considered but, among other issues, data transfer and storage cost becomes a problem especially when demand fluctuates as a result of reprocessing and the introduction of new products and services. Instead, the existing system was converted to an on premises private Cloud using the OpenStack computing platform and Ceph software defined storage to reap the benefits of the Cloud computing paradigm. This work details the decisions that were made, the benefits that have been realized, the difficulties that were encountered and issues that still exist.
CloudMan as a platform for tool, data, and analysis distribution.
Afgan, Enis; Chapman, Brad; Taylor, James
2012-11-27
Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures
NASA Astrophysics Data System (ADS)
Nguyen, P. T.; Chapman, D. R.; Halem, M.
2012-12-01
New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support the unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracies, and interfacing with application's existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data intensive scientific problems and address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we will present techniques to support the unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested at the Multicore Computing Center private Hadoop based Intel Nehalem cluster, as well as in a virtual mode under the Open Source Eucalyptus cloud. The 55TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at the resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation to the El Nino Southern oscillation in the Nino 4 region, as well as a 1.9 Kelvin decadal Arctic warming in the 4u and 12u spectral regions. Additionally, we will present the frequency of extreme global warming events by the use of a normalized maximum BT in a grid cell relative to its local standard deviation. A low-latency Hadoop scheduling environment maintains data integrity and fault tolerance in a MapReduce data intensive Cloud environment while improving the "time to solution" metric by 35% when compared to a more traditional parallel processing system for the same dataset. Our next step will be to improve the usability of our Hadoop task scheduling system, to enable rapid prototyping of data intensive experiments by means of processing "kernels". We will report on the performance and experience of implementing these experiments on the NEX testbed, and propose the use of a graphical directed acyclic graph (DAG) interface to help us develop on-demand scientific experiments. Our workflow system works within Hadoop infrastructure as a replacement for the FIFO or FairScheduler, thus the use of Apache "Pig" latin or other Apache tools may also be worth investigating on the NEX system to improve the usability of our workflow scheduling infrastructure for rapid experimentation.
Towards Dynamic Remote Data Auditing in Computational Clouds
Khurram Khan, Muhammad; Anuar, Nor Badrul
2014-01-01
Cloud computing is a significant shift of computational paradigm where computing as a utility and storing data remotely have a great potential. Enterprise and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not continuously trustworthy due to the lack of control and physical possession of the data owners. To better streamline this issue, researchers have now focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable for static archive data and are not subject to audit the dynamically updated outsourced data. We propose an effectual RDA technique based on algebraic signature properties for cloud storage system and also present a new data structure capable of efficiently supporting dynamic data operations like append, insert, modify, and delete. Moreover, this data structure empowers our method to be applicable for large-scale data with minimum computation cost. The comparative analysis with the state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server. PMID:25121114
Towards dynamic remote data auditing in computational clouds.
Sookhak, Mehdi; Akhunzada, Adnan; Gani, Abdullah; Khurram Khan, Muhammad; Anuar, Nor Badrul
2014-01-01
Cloud computing is a significant shift of computational paradigm where computing as a utility and storing data remotely have a great potential. Enterprise and businesses are now more interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not continuously trustworthy due to the lack of control and physical possession of the data owners. To better streamline this issue, researchers have now focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable for static archive data and are not subject to audit the dynamically updated outsourced data. We propose an effectual RDA technique based on algebraic signature properties for cloud storage system and also present a new data structure capable of efficiently supporting dynamic data operations like append, insert, modify, and delete. Moreover, this data structure empowers our method to be applicable for large-scale data with minimum computation cost. The comparative analysis with the state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.
Cole, Brian S; Moore, Jason H
2018-03-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing
Moore, Jason H.
2018-01-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
NASA Astrophysics Data System (ADS)
Ruan, Zhenxin; Wu, Qiaoyan
2018-01-01
In this paper, satellite-based precipitation, clouds with infrared (IR) brightness temperature (BT), and tropical cyclone (TC) data from 2000 to 2015 are used to explore the relationship between precipitation, convective cloud, and TC intensity change in the Western North Pacific Ocean. An IR BT of 208 K was chosen as a threshold for deep convection based on different diurnal cycles of IR BT. More precipitation and colder clouds with 208 K < IR BT < 240 K are found as storms intensify, while TC 24 h future intensity change is closely connected with very deep convective clouds with IR BT < 208 K. Intensifying TCs follow the occurrence of colder clouds with IR BT < 208 K with greater areal extents. As an indicator of very deep convective clouds, IR BT < 208 K is suggested to be a good predictor of TC intensity change. Based upon the 16 year analysis in the western North Pacific, TCs under the conditions that the mean temperature of very deep convective clouds is less than 201 K, and the coverage of this type of clouds is more than 27.4% within a radius of 300 km of the TC center, will more likely undergo rapid intensification after 24 h.
A Simple Technique for Securing Data at Rest Stored in a Computing Cloud
NASA Astrophysics Data System (ADS)
Sedayao, Jeff; Su, Steven; Ma, Xiaohao; Jiang, Minghao; Miao, Kai
"Cloud Computing" offers many potential benefits, including cost savings, the ability to deploy applications and services quickly, and the ease of scaling those application and services once they are deployed. A key barrier for enterprise adoption is the confidentiality of data stored on Cloud Computing Infrastructure. Our simple technique implemented with Open Source software solves this problem by using public key encryption to render stored data at rest unreadable by unauthorized personnel, including system administrators of the cloud computing service on which the data is stored. We validate our approach on a network measurement system implemented on PlanetLab. We then use it on a service where confidentiality is critical - a scanning application that validates external firewall implementations.
On the Large-Scaling Issues of Cloud-based Applications for Earth Science Dat
NASA Astrophysics Data System (ADS)
Hua, H.
2016-12-01
Next generation science data systems are needed to address the incoming flood of data from new missions such as NASA's SWOT and NISAR where its SAR data volumes and data throughput rates are order of magnitude larger than present day missions. Existing missions, such as OCO-2, may also require high turn-around time for processing different science scenarios where on-premise and even traditional HPC computing environments may not meet the high processing needs. Additionally, traditional means of procuring hardware on-premise are already limited due to facilities capacity constraints for these new missions. Experiences have shown that to embrace efficient cloud computing approaches for large-scale science data systems requires more than just moving existing code to cloud environments. At large cloud scales, we need to deal with scaling and cost issues. We present our experiences on deploying multiple instances of our hybrid-cloud computing science data system (HySDS) to support large-scale processing of Earth Science data products. We will explore optimization approaches to getting best performance out of hybrid-cloud computing as well as common issues that will arise when dealing with large-scale computing. Novel approaches were utilized to do processing on Amazon's spot market, which can potentially offer 75%-90% costs savings but with an unpredictable computing environment based on market forces.
ERIC Educational Resources Information Center
Islam, Muhammad Faysal
2013-01-01
Cloud computing offers the advantage of on-demand, reliable and cost efficient computing solutions without the capital investment and management resources to build and maintain in-house data centers and network infrastructures. Scalability of cloud solutions enable consumers to upgrade or downsize their services as needed. In a cloud environment,…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hasenkamp, Daren; Sim, Alexander; Wehner, Michael
Extensive computing power has been used to tackle issues such as climate changes, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently only run a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, whilemore » we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure; as long as a single VM is running, it can make progress while as soon as one MPI node fails the whole analysis job fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.« less
Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing
NASA Technical Reports Server (NTRS)
Pham, Long; Chen, Aijun; Kempler, Steven; Lynnes, Christopher; Theobald, Michael; Asghar, Esfandiari; Campino, Jane; Vollmer, Bruce
2011-01-01
Cloud Computing has been implemented in several commercial arenas. The NASA Nebula Cloud Computing platform is an Infrastructure as a Service (IaaS) built in 2008 at NASA Ames Research Center and 2010 at GSFC. Nebula is an open source Cloud platform intended to: a) Make NASA realize significant cost savings through efficient resource utilization, reduced energy consumption, and reduced labor costs. b) Provide an easier way for NASA scientists and researchers to efficiently explore and share large and complex data sets. c) Allow customers to provision, manage, and decommission computing capabilities on an as-needed bases
Design and Implement of Astronomical Cloud Computing Environment In China-VO
NASA Astrophysics Data System (ADS)
Li, Changhua; Cui, Chenzhou; Mi, Linying; He, Boliang; Fan, Dongwei; Li, Shanshan; Yang, Sisi; Xu, Yunfei; Han, Jun; Chen, Junyi; Zhang, Hailong; Yu, Ce; Xiao, Jian; Wang, Chuanjun; Cao, Zihuang; Fan, Yufeng; Liu, Liang; Chen, Xiao; Song, Wenming; Du, Kangyu
2017-06-01
Astronomy cloud computing environment is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on virtualization technology, astronomy cloud computing environment was designed and implemented by China-VO team. It consists of five distributed nodes across the mainland of China. Astronomer can get compuitng and storage resource in this cloud computing environment. Through this environments, astronomer can easily search and analyze astronomical data collected by different telescopes and data centers , and avoid the large scale dataset transportation.
A Weibull distribution accrual failure detector for cloud computing
Wu, Zhibo; Wu, Jin; Zhao, Yao; Wen, Dongxin
2017-01-01
Failure detectors are used to build high availability distributed systems as the fundamental component. To meet the requirement of a complicated large-scale distributed system, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve this problem, a new accrual failure detector based on Weibull Distribution, called the Weibull Distribution Failure Detector, has been proposed specifically for cloud computing. It can adapt to the dynamic and unexpected network conditions in cloud computing. The performance of the Weibull Distribution Failure Detector is evaluated and compared based on public classical experiment data and cloud computing experiment data. The results show that the Weibull Distribution Failure Detector has better performance in terms of speed and accuracy in unstable scenarios, especially in cloud computing. PMID:28278229
Cloud Computing for Geosciences--GeoCloud for standardized geospatial service platforms (Invited)
NASA Astrophysics Data System (ADS)
Nebert, D. D.; Huang, Q.; Yang, C.
2013-12-01
The 21st century geoscience faces challenges of Big Data, spike computing requirements (e.g., when natural disaster happens), and sharing resources through cyberinfrastructure across different organizations (Yang et al., 2011). With flexibility and cost-efficiency of computing resources a primary concern, cloud computing emerges as a promising solution to provide core capabilities to address these challenges. Many governmental and federal agencies are adopting cloud technologies to cut costs and to make federal IT operations more efficient (Huang et al., 2010). However, it is still difficult for geoscientists to take advantage of the benefits of cloud computing to facilitate the scientific research and discoveries. This presentation reports using GeoCloud to illustrate the process and strategies used in building a common platform for geoscience communities to enable the sharing, integration of geospatial data, information and knowledge across different domains. GeoCloud is an annual incubator project coordinated by the Federal Geographic Data Committee (FGDC) in collaboration with the U.S. General Services Administration (GSA) and the Department of Health and Human Services. It is designed as a staging environment to test and document the deployment of a common GeoCloud community platform that can be implemented by multiple agencies. With these standardized virtual geospatial servers, a variety of government geospatial applications can be quickly migrated to the cloud. In order to achieve this objective, multiple projects are nominated each year by federal agencies as existing public-facing geospatial data services. From the initial candidate projects, a set of common operating system and software requirements was identified as the baseline for platform as a service (PaaS) packages. Based on these developed common platform packages, each project deploys and monitors its web application, develops best practices, and documents cost and performance information. This paper presents the background, architectural design, and activities of GeoCloud in support of the Geospatial Platform Initiative. System security strategies and approval processes for migrating federal geospatial data, information, and applications into cloud, and cost estimation for cloud operations are covered. Finally, some lessons learned from the GeoCloud project are discussed as reference for geoscientists to consider in the adoption of cloud computing.
Lin, Yu-Hsiu; Hu, Yu-Chen
2018-04-27
The emergence of smart Internet of Things (IoT) devices has highly favored the realization of smart homes in a down-stream sector of a smart grid. The underlying objective of Demand Response (DR) schemes is to actively engage customers to modify their energy consumption on domestic appliances in response to pricing signals. Domestic appliance scheduling is widely accepted as an effective mechanism to manage domestic energy consumption intelligently. Besides, to residential customers for DR implementation, maintaining a balance between energy consumption cost and users’ comfort satisfaction is a challenge. Hence, in this paper, a constrained Particle Swarm Optimization (PSO)-based residential consumer-centric load-scheduling method is proposed. The method can be further featured with edge computing. In contrast with cloud computing, edge computing—a method of optimizing cloud computing technologies by driving computing capabilities at the IoT edge of the Internet as one of the emerging trends in engineering technology—addresses bandwidth-intensive contents and latency-sensitive applications required among sensors and central data centers through data analytics at or near the source of data. A non-intrusive load-monitoring technique proposed previously is utilized to automatic determination of physical characteristics of power-intensive home appliances from users’ life patterns. The swarm intelligence, constrained PSO, is used to minimize the energy consumption cost while considering users’ comfort satisfaction for DR implementation. The residential consumer-centric load-scheduling method proposed in this paper is evaluated under real-time pricing with inclining block rates and is demonstrated in a case study. The experimentation reported in this paper shows the proposed residential consumer-centric load-scheduling method can re-shape loads by home appliances in response to DR signals. Moreover, a phenomenal reduction in peak power consumption is achieved by 13.97%.
Bioinformatics and Microarray Data Analysis on the Cloud.
Calabrese, Barbara; Cannataro, Mario
2016-01-01
High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massive scalable computing and storage, data sharing, on-demand anytime and anywhere access to resources and applications, and thus, it may represent the key technology for facing those issues. In fact, in the recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in the industry. Although this, cloud computing presents several issues regarding the security and privacy of data, that are particularly important when analyzing patients data, such as in personalized medicine. This chapter reviews main academic and industrial cloud-based bioinformatics solutions; with a special focus on microarray data analysis solutions and underlines main issues and problems related to the use of such platforms for the storage and analysis of patients data.
Spatial Analysis of Great Lakes Regional Icing Cloud Liquid Water Content
NASA Technical Reports Server (NTRS)
Ryerson, Charles C.; Koenig, George G.; Melloh, Rae A.; Meese, Debra A.; Reehorst, Andrew L.; Miller, Dean R.
2003-01-01
Abstract Clustering of cloud microphysical conditions, such as liquid water content (LWC) and drop size, can affect the rate and shape of ice accretion and the airworthiness of aircraft. Clustering may also degrade the accuracy of cloud LWC measurements from radars and microwave radiometers being developed by the government for remotely mapping icing conditions ahead of aircraft in flight. This paper evaluates spatial clustering of LWC in icing clouds using measurements collected during NASA research flights in the Great Lakes region. We used graphical and analytical approaches to describe clustering. The analytical approach involves determining the average size of clusters and computing a clustering intensity parameter. We analyzed flight data composed of 1-s-frequency LWC measurements for 12 periods ranging from 17.4 minutes (73 km) to 45.3 minutes (190 km) in duration. Graphically some flight segments showed evidence of consistency with regard to clustering patterns. Cluster intensity varied from 0.06, indicating little clustering, to a high of 2.42. Cluster lengths ranged from 0.1 minutes (0.6 km) to 4.1 minutes (17.3 km). Additional analyses will allow us to determine if clustering climatologies can be developed to characterize cluster conditions by region, time period, or weather condition. Introduction
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Murphy, K. J.; Baynes, K.; Lynnes, C.
2016-12-01
With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way Earth observation data is processed, analyzed, and visualized. The cloud infrastructure provides the flexibility to scale up to large volumes of data and handle high velocity data streams efficiently. Having freely available Earth observation data collocated on a cloud infrastructure creates opportunities for innovation and value-added data re-use in ways unforeseen by the original data provider. These innovations spur new industries and applications and spawn new scientific pathways that were previously limited due to data volume and computational infrastructure issues. NASA, in collaboration with Amazon, Google, and Microsoft, have jointly developed a set of recommendations to enable efficient transfer of Earth observation data from existing data systems to a cloud computing infrastructure. The purpose of these recommendations is to provide guidelines against which all data providers can evaluate existing data systems and be used to improve any issues uncovered to enable efficient search, access, and use of large volumes of data. Additionally, these guidelines ensure that all cloud providers utilize a common methodology for bulk-downloading data from data providers thus preventing the data providers from building custom capabilities to meet the needs of individual cloud providers. The intent is to share these recommendations with other Federal agencies and organizations that serve Earth observation to enable efficient search, access, and use of large volumes of data. Additionally, the adoption of these recommendations will benefit data users interested in moving large volumes of data from data systems to any other location. These data users include the cloud providers, cloud users such as scientists, and other users working in a high performance computing environment who need to move large volumes of data.
Distributed Hydrologic Modeling Apps for Decision Support in the Cloud
NASA Astrophysics Data System (ADS)
Swain, N. R.; Latu, K.; Christiensen, S.; Jones, N.; Nelson, J.
2013-12-01
Advances in computation resources and greater availability of water resources data represent an untapped resource for addressing hydrologic uncertainties in water resources decision-making. The current practice of water authorities relies on empirical, lumped hydrologic models to estimate watershed response. These models are not capable of taking advantage of many of the spatial datasets that are now available. Physically-based, distributed hydrologic models are capable of using these data resources and providing better predictions through stochastic analysis. However, there exists a digital divide that discourages many science-minded decision makers from using distributed models. This divide can be spanned using a combination of existing web technologies. The purpose of this presentation is to present a cloud-based environment that will offer hydrologic modeling tools or 'apps' for decision support and the web technologies that have been selected to aid in its implementation. Compared to the more commonly used lumped-parameter models, distributed models, while being more intuitive, are still data intensive, computationally expensive, and difficult to modify for scenario exploration. However, web technologies such as web GIS, web services, and cloud computing have made the data more accessible, provided an inexpensive means of high-performance computing, and created an environment for developing user-friendly apps for distributed modeling. Since many water authorities are primarily interested in the scenario exploration exercises with hydrologic models, we are creating a toolkit that facilitates the development of a series of apps for manipulating existing distributed models. There are a number of hurdles that cloud-based hydrologic modeling developers face. One of these is how to work with the geospatial data inherent with this class of models in a web environment. Supporting geospatial data in a website is beyond the capabilities of standard web frameworks and it requires the use of additional software. In particular, there are at least three elements that are needed: a geospatially enabled database, a map server, and geoprocessing toolbox. We recommend a software stack for geospatial web application development comprising: MapServer, PostGIS, and 52 North with Python as the scripting language to tie them together. Another hurdle that must be cleared is managing the cloud-computing load. We are using HTCondor as a solution to this end. Finally, we are creating a scripting environment wherein developers will be able to create apps that use existing hydrologic models in our system with minimal effort. This capability will be accomplished by creating a plugin for a Python content management system called CKAN. We are currently developing cyberinfrastructure that utilizes this stack and greatly lowers the investment required to deploy cloud-based modeling apps. This material is based upon work supported by the National Science Foundation under Grant No. 1135482
Benefits of cloud computing for PACS and archiving.
Koch, Patrick
2012-01-01
The goal of cloud-based services is to provide easy, scalable access to computing resources and IT services. The healthcare industry requires a private cloud that adheres to government mandates designed to ensure privacy and security of patient data while enabling access by authorized users. Cloud-based computing in the imaging market has evolved from a service that provided cost effective disaster recovery for archived data to fully featured PACS and vendor neutral archiving services that can address the needs of healthcare providers of all sizes. Healthcare providers worldwide are now using the cloud to distribute images to remote radiologists while supporting advanced reading tools, deliver radiology reports and imaging studies to referring physicians, and provide redundant data storage. Vendor managed cloud services eliminate large capital investments in equipment and maintenance, as well as staffing for the data center--creating a reduction in total cost of ownership for the healthcare provider.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Ferreira da Silva, R.; Deelman, E.; Atkinson, M.
2016-12-01
We present the Data-Intensive workflows as a Service (DIaaS) model for enabling easy data-intensive workflow composition and deployment on clouds using containers. DIaaS model backbone is Asterism, an integrated solution for running data-intensive stream-based applications on heterogeneous systems, which combines the benefits of dispel4py with Pegasus workflow systems. The stream-based executions of an Asterism workflow are managed by dispel4py, while the data movement between different e-Infrastructures, and the coordination of the application execution are automatically managed by Pegasus. DIaaS combines Asterism framework with Docker containers to provide an integrated, complete, easy-to-use, portable approach to run data-intensive workflows on distributed platforms. Three containers integrate the DIaaS model: a Pegasus node, and an MPI and an Apache Storm clusters. Container images are described as Dockerfiles (available online at http://github.com/dispel4py/pegasus_dispel4py), linked to Docker Hub for providing continuous integration (automated image builds), and image storing and sharing. In this model, all required software (workflow systems and execution engines) for running scientific applications are packed into the containers, which significantly reduces the effort (and possible human errors) required by scientists or VRE administrators to build such systems. The most common use of DIaaS will be to act as a backend of VREs or Scientific Gateways to run data-intensive applications, deploying cloud resources upon request. We have demonstrated the feasibility of DIaaS using the data-intensive seismic ambient noise cross-correlation application (Figure 1). The application preprocesses (Phase1) and cross-correlates (Phase2) traces from several seismic stations. The application is submitted via Pegasus (Container1), and Phase1 and Phase2 are executed in the MPI (Container2) and Storm (Container3) clusters respectively. Although both phases could be executed within the same environment, this setup demonstrates the flexibility of DIaaS to run applications across e-Infrastructures. In summary, DIaaS delivers specialized software to execute data-intensive applications in a scalable, efficient, and robust manner reducing the engineering time and computational cost.
Cloud Computing with iPlant Atmosphere.
McKay, Sheldon J; Skidmore, Edwin J; LaRose, Christopher J; Mercer, Andre W; Noutsos, Christos
2013-10-15
Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Copyright © 2013 John Wiley & Sons, Inc.
Privacy authentication using key attribute-based encryption in mobile cloud computing
NASA Astrophysics Data System (ADS)
Mohan Kumar, M.; Vijayan, R.
2017-11-01
Mobile Cloud Computing is becoming more popular in nowadays were users of smartphones are getting increased. So, the security level of cloud computing as to be increased. Privacy Authentication using key-attribute based encryption helps the users for business development were the data sharing with the organization using the cloud in a secured manner. In Privacy Authentication the sender of data will have permission to add their receivers to whom the data access provided for others the access denied. In sender application, the user can choose the file which is to be sent to receivers and then that data will be encrypted using Key-attribute based encryption using AES algorithm. In which cipher created, and that stored in Amazon Cloud along with key value and the receiver list.
State of the Art of Network Security Perspectives in Cloud Computing
NASA Astrophysics Data System (ADS)
Oh, Tae Hwan; Lim, Shinyoung; Choi, Young B.; Park, Kwang-Roh; Lee, Heejo; Choi, Hyunsang
Cloud computing is now regarded as one of social phenomenon that satisfy customers' needs. It is possible that the customers' needs and the primary principle of economy - gain maximum benefits from minimum investment - reflects realization of cloud computing. We are living in the connected society with flood of information and without connected computers to the Internet, our activities and work of daily living will be impossible. Cloud computing is able to provide customers with custom-tailored features of application software and user's environment based on the customer's needs by adopting on-demand outsourcing of computing resources through the Internet. It also provides cloud computing users with high-end computing power and expensive application software package, and accordingly the users will access their data and the application software where they are located at the remote system. As the cloud computing system is connected to the Internet, network security issues of cloud computing are considered as mandatory prior to real world service. In this paper, survey and issues on the network security in cloud computing are discussed from the perspective of real world service environments.
CloudMan as a platform for tool, data, and analysis distribution
2012-01-01
Background Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions. PMID:23181507
The Ethics of Cloud Computing.
de Bruin, Boudewijn; Floridi, Luciano
2017-02-01
Cloud computing is rapidly gaining traction in business. It offers businesses online services on demand (such as Gmail, iCloud and Salesforce) and allows them to cut costs on hardware and IT support. This is the first paper in business ethics dealing with this new technology. It analyzes the informational duties of hosting companies that own and operate cloud computing datacentres (e.g., Amazon). It considers the cloud services providers leasing 'space in the cloud' from hosting companies (e.g., Dropbox, Salesforce). And it examines the business and private 'clouders' using these services. The first part of the paper argues that hosting companies, services providers and clouders have mutual informational (epistemic) obligations to provide and seek information about relevant issues such as consumer privacy, reliability of services, data mining and data ownership. The concept of interlucency is developed as an epistemic virtue governing ethically effective communication. The second part considers potential forms of government restrictions on or proscriptions against the development and use of cloud computing technology. Referring to the concept of technology neutrality, it argues that interference with hosting companies and cloud services providers is hardly ever necessary or justified. It is argued, too, however, that businesses using cloud services (e.g., banks, law firms, hospitals etc. storing client data in the cloud) will have to follow rather more stringent regulations.
NASA Astrophysics Data System (ADS)
Marinos, Alexandros; Briscoe, Gerard
Cloud Computing is rising fast, with its data centres growing at an unprecedented rate. However, this has come with concerns over privacy, efficiency at the expense of resilience, and environmental sustainability, because of the dependence on Cloud vendors such as Google, Amazon and Microsoft. Our response is an alternative model for the Cloud conceptualisation, providing a paradigm for Clouds in the community, utilising networked personal computers for liberation from the centralised vendor model. Community Cloud Computing (C3) offers an alternative architecture, created by combing the Cloud with paradigms from Grid Computing, principles from Digital Ecosystems, and sustainability from Green Computing, while remaining true to the original vision of the Internet. It is more technically challenging than Cloud Computing, having to deal with distributed computing issues, including heterogeneous nodes, varying quality of service, and additional security constraints. However, these are not insurmountable challenges, and with the need to retain control over our digital lives and the potential environmental consequences, it is a challenge we must pursue.
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
Cloud Technology May Widen Genomic Bottleneck - TCGA
Computational biologist Dr. Ilya Shmulevich suggests that renting cloud computing power might widen the bottleneck for analyzing genomic data. Learn more about his experience with the Cloud in this TCGA in Action Case Study.
Privacy-preserving public auditing for data integrity in cloud
NASA Astrophysics Data System (ADS)
Shaik Saleem, M.; Murali, M.
2018-04-01
Cloud computing which has collected extent concentration from communities of research and with industry research development, a large pool of computing resources using virtualized sharing method like storage, processing power, applications and services. The users of cloud are vend with on demand resources as they want in the cloud computing. Outsourced file of the cloud user can easily tampered as it is stored at the third party service providers databases, so there is no integrity of cloud users data as it has no control on their data, therefore providing security assurance to the users data has become one of the primary concern for the cloud service providers. Cloud servers are not responsible for any data loss as it doesn’t provide the security assurance to the cloud user data. Remote data integrity checking (RDIC) licenses an information to data storage server, to determine that it is really storing an owners data truthfully. RDIC is composed of security model and ID-based RDIC where it is responsible for the security of every server and make sure the data privacy of cloud user against the third party verifier. Generally, by running a two-party Remote data integrity checking (RDIC) protocol the clients would themselves be able to check the information trustworthiness of their cloud. Within the two party scenario the verifying result is given either from the information holder or the cloud server may be considered as one-sided. Public verifiability feature of RDIC gives the privilege to all its users to verify whether the original data is modified or not. To ensure the transparency of the publicly verifiable RDIC protocols, Let’s figure out there exists a TPA who is having knowledge and efficiency to verify the work to provide the condition clearly by publicly verifiable RDIC protocols.
HammerCloud: A Stress Testing System for Distributed Analysis
NASA Astrophysics Data System (ADS)
van der Ster, Daniel C.; Elmsheuser, Johannes; Úbeda García, Mario; Paladin, Massimo
2011-12-01
Distributed analysis of LHC data is an I/O-intensive activity which places large demands on the internal network, storage, and local disks at remote computing facilities. Commissioning and maintaining a site to provide an efficient distributed analysis service is therefore a challenge which can be aided by tools to help evaluate a variety of infrastructure designs and configurations. HammerCloud is one such tool; it is a stress testing service which is used by central operations teams, regional coordinators, and local site admins to (a) submit arbitrary number of analysis jobs to a number of sites, (b) maintain at a steady-state a predefined number of jobs running at the sites under test, (c) produce web-based reports summarizing the efficiency and performance of the sites under test, and (d) present a web-interface for historical test results to both evaluate progress and compare sites. HammerCloud was built around the distributed analysis framework Ganga, exploiting its API for grid job management. HammerCloud has been employed by the ATLAS experiment for continuous testing of many sites worldwide, and also during large scale computing challenges such as STEP'09 and UAT'09, where the scale of the tests exceeded 10,000 concurrently running and 1,000,000 total jobs over multi-day periods. In addition, HammerCloud is being adopted by the CMS experiment; the plugin structure of HammerCloud allows the execution of CMS jobs using their official tool (CRAB).
Unidata Cyberinfrastructure in the Cloud
NASA Astrophysics Data System (ADS)
Ramamurthy, M. K.; Young, J. W.
2016-12-01
Data services, software, and user support are critical components of geosciences cyber-infrastructure to help researchers to advance science. With the maturity of and significant advances in cloud computing, it has recently emerged as an alternative new paradigm for developing and delivering a broad array of services over the Internet. Cloud computing is now mature enough in usability in many areas of science and education, bringing the benefits of virtualized and elastic remote services to infrastructure, software, computation, and data. Cloud environments reduce the amount of time and money spent to procure, install, and maintain new hardware and software, and reduce costs through resource pooling and shared infrastructure. Given the enormous potential of cloud-based services, Unidata has been moving to augment its software, services, data delivery mechanisms to align with the cloud-computing paradigm. To realize the above vision, Unidata has worked toward: * Providing access to many types of data from a cloud (e.g., via the THREDDS Data Server, RAMADDA and EDEX servers); * Deploying data-proximate tools to easily process, analyze, and visualize those data in a cloud environment cloud for consumption by any one, by any device, from anywhere, at any time; * Developing and providing a range of pre-configured and well-integrated tools and services that can be deployed by any university in their own private or public cloud settings. Specifically, Unidata has developed Docker for "containerized applications", making them easy to deploy. Docker helps to create "disposable" installs and eliminates many configuration challenges. Containerized applications include tools for data transport, access, analysis, and visualization: THREDDS Data Server, Integrated Data Viewer, GEMPAK, Local Data Manager, RAMADDA Data Server, and Python tools; * Leveraging Jupyter as a central platform and hub with its powerful set of interlinking tools to connect interactively data servers, Python scientific libraries, scripts, and workflows; * Exploring end-to-end modeling and prediction capabilities in the cloud; * Partnering with NOAA and public cloud vendors (e.g., Amazon and OCC) on the NOAA Big Data Project to harness their capabilities and resources for the benefit of the academic community.
Infrared Data for Storm Analysis
NASA Technical Reports Server (NTRS)
Adler, R.
1982-01-01
The papers in this section include: 1) 'Thunderstorm Top Structure Observed by Aircraft Overflights with an Infrared Radiometer'; 2) 'Thunderstorm Intensity as Determined from Satellite Data'; 3) 'Relation of Satellite-Based Thunderstorm Intensity to Radar-Estimated Rainfall'; 4) 'A Simple Physical Basis for Relating Geosynchronous Satellite Infrared Observations to Thunderstorm Rainfall'; 5) 'Satellite-Observed Cloud-Top Height Changes in Tornadic Thunderstorms'; 6) 'Predicting Tropical Cyclone Intensity Using Satellite-Measured Equivalent Blackbody Temperatures of Cloud Tops'.
Visual Analysis of Cloud Computing Performance Using Behavioral Lines.
Muelder, Chris; Zhu, Biao; Chen, Wei; Zhang, Hongxin; Ma, Kwan-Liu
2016-02-29
Cloud computing is an essential technology to Big Data analytics and services. A cloud computing system is often comprised of a large number of parallel computing and storage devices. Monitoring the usage and performance of such a system is important for efficient operations, maintenance, and security. Tracing every application on a large cloud system is untenable due to scale and privacy issues. But profile data can be collected relatively efficiently by regularly sampling the state of the system, including properties such as CPU load, memory usage, network usage, and others, creating a set of multivariate time series for each system. Adequate tools for studying such large-scale, multidimensional data are lacking. In this paper, we present a visual based analysis approach to understanding and analyzing the performance and behavior of cloud computing systems. Our design is based on similarity measures and a layout method to portray the behavior of each compute node over time. When visualizing a large number of behavioral lines together, distinct patterns often appear suggesting particular types of performance bottleneck. The resulting system provides multiple linked views, which allow the user to interactively explore the data by examining the data or a selected subset at different levels of detail. Our case studies, which use datasets collected from two different cloud systems, show that this visual based approach is effective in identifying trends and anomalies of the systems.
A Secure and Efficient Audit Mechanism for Dynamic Shared Data in Cloud Storage
2014-01-01
With popularization of cloud services, multiple users easily share and update their data through cloud storage. For data integrity and consistency in the cloud storage, the audit mechanisms were proposed. However, existing approaches have some security vulnerabilities and require a lot of computational overheads. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove the resistance against some attacks and show less computation cost and shorter time for auditing when compared with conventional approaches. The results present that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data. PMID:24959630
A secure and efficient audit mechanism for dynamic shared data in cloud storage.
Kwon, Ohmin; Koo, Dongyoung; Shin, Yongjoo; Yoon, Hyunsoo
2014-01-01
With popularization of cloud services, multiple users easily share and update their data through cloud storage. For data integrity and consistency in the cloud storage, the audit mechanisms were proposed. However, existing approaches have some security vulnerabilities and require a lot of computational overheads. This paper proposes a secure and efficient audit mechanism for dynamic shared data in cloud storage. The proposed scheme prevents a malicious cloud service provider from deceiving an auditor. Moreover, it devises a new index table management method and reduces the auditing cost by employing less complex operations. We prove the resistance against some attacks and show less computation cost and shorter time for auditing when compared with conventional approaches. The results present that the proposed scheme is secure and efficient for cloud storage services managing dynamic shared data.
Cloud computing can simplify HIT infrastructure management.
Glaser, John
2011-08-01
Software as a Service (SaaS), built on cloud computing technology, is emerging as the forerunner in IT infrastructure because it helps healthcare providers reduce capital investments. Cloud computing leads to predictable, monthly, fixed operating expenses for hospital IT staff. Outsourced cloud computing facilities are state-of-the-art data centers boasting some of the most sophisticated networking equipment on the market. The SaaS model helps hospitals safeguard against technology obsolescence, minimizes maintenance requirements, and simplifies management.
NASA Astrophysics Data System (ADS)
Panitkin, Sergey; Barreiro Megino, Fernando; Caballero Bejar, Jose; Benjamin, Doug; Di Girolamo, Alessandro; Gable, Ian; Hendrix, Val; Hover, John; Kucharczyk, Katarzyna; Medrano Llamas, Ramon; Love, Peter; Ohman, Henrik; Paterson, Michael; Sobie, Randall; Taylor, Ryan; Walker, Rodney; Zaytsev, Alexander; Atlas Collaboration
2014-06-01
The computing model of the ATLAS experiment was designed around the concept of grid computing and, since the start of data taking, this model has proven very successful. However, new cloud computing technologies bring attractive features to improve the operations and elasticity of scientific distributed computing. ATLAS sees grid and cloud computing as complementary technologies that will coexist at different levels of resource abstraction, and two years ago created an R&D working group to investigate the different integration scenarios. The ATLAS Cloud Computing R&D has been able to demonstrate the feasibility of offloading work from grid to cloud sites and, as of today, is able to integrate transparently various cloud resources into the PanDA workload management system. The ATLAS Cloud Computing R&D is operating various PanDA queues on private and public resources and has provided several hundred thousand CPU days to the experiment. As a result, the ATLAS Cloud Computing R&D group has gained a significant insight into the cloud computing landscape and has identified points that still need to be addressed in order to fully utilize this technology. This contribution will explain the cloud integration models that are being evaluated and will discuss ATLAS' learning during the collaboration with leading commercial and academic cloud providers.
Dalpé, Gratien; Joly, Yann
2014-09-01
Healthcare-related bioinformatics databases are increasingly offering the possibility to maintain, organize, and distribute DNA sequencing data. Different national and international institutions are currently hosting such databases that offer researchers website platforms where they can obtain sequencing data on which they can perform different types of analysis. Until recently, this process remained mostly one-dimensional, with most analysis concentrated on a limited amount of data. However, newer genome sequencing technology is producing a huge amount of data that current computer facilities are unable to handle. An alternative approach has been to start adopting cloud computing services for combining the information embedded in genomic and model system biology data, patient healthcare records, and clinical trials' data. In this new technological paradigm, researchers use virtual space and computing power from existing commercial or not-for-profit cloud service providers to access, store, and analyze data via different application programming interfaces. Cloud services are an alternative to the need of larger data storage; however, they raise different ethical, legal, and social issues. The purpose of this Commentary is to summarize how cloud computing can contribute to bioinformatics-based drug discovery and to highlight some of the outstanding legal, ethical, and social issues that are inherent in the use of cloud services. © 2014 Wiley Periodicals, Inc.
Fusion of light-field and photogrammetric surface form data
NASA Astrophysics Data System (ADS)
Sims-Waterhouse, Danny; Piano, Samanta; Leach, Richard K.
2017-08-01
Photogrammetry based systems are able to produce 3D reconstructions of an object given a set of images taken from different orientations. In this paper, we implement a light-field camera within a photogrammetry system in order to capture additional depth information, as well as the photogrammetric point cloud. Compared to a traditional camera that only captures the intensity of the incident light, a light-field camera also provides angular information for each pixel. In principle, this additional information allows 2D images to be reconstructed at a given focal plane, and hence a depth map can be computed. Through the fusion of light-field and photogrammetric data, we show that it is possible to improve the measurement uncertainty of a millimetre scale 3D object, compared to that from the individual systems. By imaging a series of test artefacts from various positions, individual point clouds were produced from depth-map information and triangulation of corresponding features between images. Using both measurements, data fusion methods were implemented in order to provide a single point cloud with reduced measurement uncertainty.
Managing Laboratory Data Using Cloud Computing as an Organizational Tool
ERIC Educational Resources Information Center
Bennett, Jacqueline; Pence, Harry E.
2011-01-01
One of the most significant difficulties encountered when directing undergraduate research and developing new laboratory experiments is how to efficiently manage the data generated by a number of students. Cloud computing, where both software and computer files reside online, offers a solution to this data-management problem and allows researchers…
Cloud access to interoperable IVOA-compliant VOSpace storage
NASA Astrophysics Data System (ADS)
Bertocco, S.; Dowler, P.; Gaudet, S.; Major, B.; Pasian, F.; Taffoni, G.
2018-07-01
Handling, processing and archiving the huge amount of data produced by the new generation of experiments and instruments in Astronomy and Astrophysics are among the more exciting challenges to address in designing the future data management infrastructures and computing services. We investigated the feasibility of a data management and computation infrastructure, available world-wide, with the aim of merging the FAIR data management provided by IVOA standards with the efficiency and reliability of a cloud approach. Our work involved the Canadian Advanced Network for Astronomy Research (CANFAR) infrastructure and the European EGI federated cloud (EFC). We designed and deployed a pilot data management and computation infrastructure that provides IVOA-compliant VOSpace storage resources and wide access to interoperable federated clouds. In this paper, we detail the main user requirements covered, the technical choices and the implemented solutions and we describe the resulting Hybrid cloud Worldwide infrastructure, its benefits and limitations.
Unidata's Vision for Transforming Geoscience by Moving Data Services and Software to the Cloud
NASA Astrophysics Data System (ADS)
Ramamurthy, M. K.; Fisher, W.; Yoksas, T.
2014-12-01
Universities are facing many challenges: shrinking budgets, rapidly evolving information technologies, exploding data volumes, multidisciplinary science requirements, and high student expectations. These changes are upending traditional approaches to accessing and using data and software. It is clear that Unidata's products and services must evolve to support new approaches to research and education. After years of hype and ambiguity, cloud computing is maturing in usability in many areas of science and education, bringing the benefits of virtualized and elastic remote services to infrastructure, software, computation, and data. Cloud environments reduce the amount of time and money spent to procure, install, and maintain new hardware and software, and reduce costs through resource pooling and shared infrastructure. Cloud services aimed at providing any resource, at any time, from any place, using any device are increasingly being embraced by all types of organizations. Given this trend and the enormous potential of cloud-based services, Unidata is taking moving to augment its products, services, data delivery mechanisms and applications to align with the cloud-computing paradigm. Specifically, Unidata is working toward establishing a community-based development environment that supports the creation and use of software services to build end-to-end data workflows. The design encourages the creation of services that can be broken into small, independent chunks that provide simple capabilities. Chunks could be used individually to perform a task, or chained into simple or elaborate workflows. The services will also be portable, allowing their use in researchers' own cloud-based computing environments. In this talk, we present a vision for Unidata's future in a cloud-enabled data services and discuss our initial efforts to deploy a subset of Unidata data services and tools in the Amazon EC2 and Microsoft Azure cloud environments, including the transfer of real-time meteorological data into its cloud instances, product generation using those data, and the deployment of TDS, McIDAS ADDE and AWIPS II data servers and the Integrated Data Server visualization tool.
Energy Consumption Management of Virtual Cloud Computing Platform
NASA Astrophysics Data System (ADS)
Li, Lin
2017-11-01
For energy consumption management research on virtual cloud computing platforms, energy consumption management of virtual computers and cloud computing platform should be understood deeper. Only in this way can problems faced by energy consumption management be solved. In solving problems, the key to solutions points to data centers with high energy consumption, so people are in great need to use a new scientific technique. Virtualization technology and cloud computing have become powerful tools in people’s real life, work and production because they have strong strength and many advantages. Virtualization technology and cloud computing now is in a rapid developing trend. It has very high resource utilization rate. In this way, the presence of virtualization and cloud computing technologies is very necessary in the constantly developing information age. This paper has summarized, explained and further analyzed energy consumption management questions of the virtual cloud computing platform. It eventually gives people a clearer understanding of energy consumption management of virtual cloud computing platform and brings more help to various aspects of people’s live, work and son on.
Signal and image processing algorithm performance in a virtual and elastic computing environment
NASA Astrophysics Data System (ADS)
Bennett, Kelly W.; Robertson, James
2013-05-01
The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and their associated high-performance computing needs, increases and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion on using cloud computing with government data will discuss best security practices that exist within cloud services, such as AWS.
Practical implementation of tetrahedral mesh reconstruction in emission tomography
Boutchko, R.; Sitek, A.; Gullberg, G. T.
2014-01-01
This paper presents a practical implementation of image reconstruction on tetrahedral meshes optimized for emission computed tomography with parallel beam geometry. Tetrahedral mesh built on a point cloud is a convenient image representation method, intrinsically three-dimensional and with a multi-level resolution property. Image intensities are defined at the mesh nodes and linearly interpolated inside each tetrahedron. For the given mesh geometry, the intensities can be computed directly from tomographic projections using iterative reconstruction algorithms with a system matrix calculated using an exact analytical formula. The mesh geometry is optimized for a specific patient using a two stage process. First, a noisy image is reconstructed on a finely-spaced uniform cloud. Then, the geometry of the representation is adaptively transformed through boundary-preserving node motion and elimination. Nodes are removed in constant intensity regions, merged along the boundaries, and moved in the direction of the mean local intensity gradient in order to provide higher node density in the boundary regions. Attenuation correction and detector geometric response are included in the system matrix. Once the mesh geometry is optimized, it is used to generate the final system matrix for ML-EM reconstruction of node intensities and for visualization of the reconstructed images. In dynamic PET or SPECT imaging, the system matrix generation procedure is performed using a quasi-static sinogram, generated by summing projection data from multiple time frames. This system matrix is then used to reconstruct the individual time frame projections. Performance of the new method is evaluated by reconstructing simulated projections of the NCAT phantom and the method is then applied to dynamic SPECT phantom and patient studies and to a dynamic microPET rat study. Tetrahedral mesh-based images are compared to the standard voxel-based reconstruction for both high and low signal-to-noise ratio projection datasets. The results demonstrate that the reconstructed images represented as tetrahedral meshes based on point clouds offer image quality comparable to that achievable using a standard voxel grid while allowing substantial reduction in the number of unknown intensities to be reconstructed and reducing the noise. PMID:23588373
Practical implementation of tetrahedral mesh reconstruction in emission tomography
NASA Astrophysics Data System (ADS)
Boutchko, R.; Sitek, A.; Gullberg, G. T.
2013-05-01
This paper presents a practical implementation of image reconstruction on tetrahedral meshes optimized for emission computed tomography with parallel beam geometry. Tetrahedral mesh built on a point cloud is a convenient image representation method, intrinsically three-dimensional and with a multi-level resolution property. Image intensities are defined at the mesh nodes and linearly interpolated inside each tetrahedron. For the given mesh geometry, the intensities can be computed directly from tomographic projections using iterative reconstruction algorithms with a system matrix calculated using an exact analytical formula. The mesh geometry is optimized for a specific patient using a two stage process. First, a noisy image is reconstructed on a finely-spaced uniform cloud. Then, the geometry of the representation is adaptively transformed through boundary-preserving node motion and elimination. Nodes are removed in constant intensity regions, merged along the boundaries, and moved in the direction of the mean local intensity gradient in order to provide higher node density in the boundary regions. Attenuation correction and detector geometric response are included in the system matrix. Once the mesh geometry is optimized, it is used to generate the final system matrix for ML-EM reconstruction of node intensities and for visualization of the reconstructed images. In dynamic PET or SPECT imaging, the system matrix generation procedure is performed using a quasi-static sinogram, generated by summing projection data from multiple time frames. This system matrix is then used to reconstruct the individual time frame projections. Performance of the new method is evaluated by reconstructing simulated projections of the NCAT phantom and the method is then applied to dynamic SPECT phantom and patient studies and to a dynamic microPET rat study. Tetrahedral mesh-based images are compared to the standard voxel-based reconstruction for both high and low signal-to-noise ratio projection datasets. The results demonstrate that the reconstructed images represented as tetrahedral meshes based on point clouds offer image quality comparable to that achievable using a standard voxel grid while allowing substantial reduction in the number of unknown intensities to be reconstructed and reducing the noise.
Terrestrial laser scanning in monitoring of anthropogenic objects
NASA Astrophysics Data System (ADS)
Zaczek-Peplinska, Janina; Kowalska, Maria
2017-12-01
The registered xyz coordinates in the form of a point cloud captured by terrestrial laser scanner and the intensity values (I) assigned to them make it possible to perform geometric and spectral analyses. Comparison of point clouds registered in different time periods requires conversion of the data to a common coordinate system and proper data selection is necessary. Factors like point distribution dependant on the distance between the scanner and the surveyed surface, angle of incidence, tasked scan's density and intensity value have to be taken into consideration. A prerequisite for running a correct analysis of the obtained point clouds registered during periodic measurements using a laser scanner is the ability to determine the quality and accuracy of the analysed data. The article presents a concept of spectral data adjustment based on geometric analysis of a surface as well as examples of geometric analyses integrating geometric and physical data in one cloud of points: cloud point coordinates, recorded intensity values, and thermal images of an object. The experiments described here show multiple possibilities of usage of terrestrial laser scanning data and display the necessity of using multi-aspect and multi-source analyses in anthropogenic object monitoring. The article presents examples of multisource data analyses with regard to Intensity value correction due to the beam's incidence angle. The measurements were performed using a Leica Nova MS50 scanning total station, Z+F Imager 5010 scanner and the integrated Z+F T-Cam thermal camera.
USGEO DMWG Cloud Computing Recommendations
NASA Astrophysics Data System (ADS)
de la Beaujardiere, J.; McInerney, M.; Frame, M. T.; Summers, C.
2017-12-01
The US Group on Earth Observations (USGEO) Data Management Working Group (DMWG) has been developing Cloud Computing Recommendations for Earth Observations. This inter-agency report is currently in draft form; DMWG hopes to have released the report as a public Request for Information (RFI) by the time of AGU. The recommendations are geared toward organizations that have already decided to use the Cloud for some of their activities (i.e., the focus is not on "why you should use the Cloud," but rather "If you plan to use the Cloud, consider these suggestions.") The report comprises Introductory Material, including Definitions, Potential Cloud Benefits, and Potential Cloud Disadvantages, followed by Recommendations in several areas: Assessing When to Use the Cloud, Transferring Data to the Cloud, Data and Metadata Contents, Developing Applications in the Cloud, Cost Minimization, Security Considerations, Monitoring and Metrics, Agency Support, and Earth Observations-specific recommendations. This talk will summarize the recommendations and invite comment on the RFI.
Survey on Security Issues in Cloud Computing and Associated Mitigation Techniques
NASA Astrophysics Data System (ADS)
Bhadauria, Rohit; Sanyal, Sugata
2012-06-01
Cloud Computing holds the potential to eliminate the requirements for setting up of high-cost computing infrastructure for IT-based solutions and services that the industry uses. It promises to provide a flexible IT architecture, accessible through internet for lightweight portable devices. This would allow multi-fold increase in the capacity or capabilities of the existing and new software. In a cloud computing environment, the entire data reside over a set of networked resources, enabling the data to be accessed through virtual machines. Since these data-centers may lie in any corner of the world beyond the reach and control of users, there are multifarious security and privacy challenges that need to be understood and taken care of. Also, one can never deny the possibility of a server breakdown that has been witnessed, rather quite often in the recent times. There are various issues that need to be dealt with respect to security and privacy in a cloud computing scenario. This extensive survey paper aims to elaborate and analyze the numerous unresolved issues threatening the cloud computing adoption and diffusion affecting the various stake-holders linked to it.
Formation of a knudsen layer in electronically induced desorption
NASA Astrophysics Data System (ADS)
Sibold, D.; Urbassek, H. M.
1992-10-01
For intense desorption fluxes, particles desorbed by electronic transitions (DIET) from a surface into a vacuum may thermalize in the gas cloud forming above the surface. In immediate vicinity to the surface, however, a non-equilibrium layer (the Knudsen layer) exists which separates the recently desorbed, non-thermal particles from the thermalized gas cloud. We investigate by Monte Carlo computer simulation the time it takes to form a Knudsen layer, and its properties. It is found that a Knudsen layer, and thus also a thermalized gas cloud, is formed after around 200 mean free flight times of the desorbing particles, corresponding to a desorption of 20 monolayers. At the end of the Knudsen layer, the gas density will be higher, and the flow velocity and temperature smaller, than literature values indicate for thermal desorption. These data are of fundamental interest for the modeling of gas-kinetic and gas-dynamic effects in DIET.
Archive Management of NASA Earth Observation Data to Support Cloud Analysis
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Baynes, Kathleen; McInerney, Mark A.
2017-01-01
NASA collects, processes and distributes petabytes of Earth Observation (EO) data from satellites, aircraft, in situ instruments and model output, with an order of magnitude increase expected by 2024. Cloud-based web object storage (WOS) of these data can simplify the execution of such an increase. More importantly, it can also facilitate user analysis of those volumes by making the data available to the massively parallel computing power in the cloud. However, storing EO data in cloud WOS has a ripple effect throughout the NASA archive system with unexpected challenges and opportunities. One challenge is modifying data servicing software (such as Web Coverage Service servers) to access and subset data that are no longer on a directly accessible file system, but rather in cloud WOS. Opportunities include refactoring of the archive software to a cloud-native architecture; virtualizing data products by computing on demand; and reorganizing data to be more analysis-friendly.
Human face recognition using eigenface in cloud computing environment
NASA Astrophysics Data System (ADS)
Siregar, S. T. M.; Syahputra, M. F.; Rahmat, R. F.
2018-02-01
Doing a face recognition for one single face does not take a long time to process, but if we implement attendance system or security system on companies that have many faces to be recognized, it will take a long time. Cloud computing is a computing service that is done not on a local device, but on an internet connected to a data center infrastructure. The system of cloud computing also provides a scalability solution where cloud computing can increase the resources needed when doing larger data processing. This research is done by applying eigenface while collecting data as training data is also done by using REST concept to provide resource, then server can process the data according to existing stages. After doing research and development of this application, it can be concluded by implementing Eigenface, recognizing face by applying REST concept as endpoint in giving or receiving related information to be used as a resource in doing model formation to do face recognition.
The Development of an Educational Cloud for IS Curriculum through a Student-Run Data Center
ERIC Educational Resources Information Center
Hwang, Drew; Pike, Ron; Manson, Dan
2016-01-01
The industry-wide emphasis on cloud computing has created a new focus in Information Systems (IS) education. As the demand for graduates with adequate knowledge and skills in cloud computing is on the rise, IS educators are facing a challenge to integrate cloud technology into their curricula. Although public cloud tools and services are available…
WE-B-BRD-01: Innovation in Radiation Therapy Planning II: Cloud Computing in RT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moore, K; Kagadis, G; Xing, L
As defined by the National Institute of Standards and Technology, cloud computing is “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” Despite the omnipresent role of computers in radiotherapy, cloud computing has yet to achieve widespread adoption in clinical or research applications, though the transition to such “on-demand” access is underway. As this transition proceeds, new opportunities for aggregate studies and efficient use of computational resources are set againstmore » new challenges in patient privacy protection, data integrity, and management of clinical informatics systems. In this Session, current and future applications of cloud computing and distributed computational resources will be discussed in the context of medical imaging, radiotherapy research, and clinical radiation oncology applications. Learning Objectives: Understand basic concepts of cloud computing. Understand how cloud computing could be used for medical imaging applications. Understand how cloud computing could be employed for radiotherapy research.4. Understand how clinical radiotherapy software applications would function in the cloud.« less
Lebeda, Frank J; Zalatoris, Jeffrey J; Scheerer, Julia B
2018-02-07
This position paper summarizes the development and the present status of Department of Defense (DoD) and other government policies and guidances regarding cloud computing services. Due to the heterogeneous and growing biomedical big datasets, cloud computing services offer an opportunity to mitigate the associated storage and analysis requirements. Having on-demand network access to a shared pool of flexible computing resources creates a consolidated system that should reduce potential duplications of effort in military biomedical research. Interactive, online literature searches were performed with Google, at the Defense Technical Information Center, and at two National Institutes of Health research portfolio information sites. References cited within some of the collected documents also served as literature resources. We gathered, selected, and reviewed DoD and other government cloud computing policies and guidances published from 2009 to 2017. These policies were intended to consolidate computer resources within the government and reduce costs by decreasing the number of federal data centers and by migrating electronic data to cloud systems. Initial White House Office of Management and Budget information technology guidelines were developed for cloud usage, followed by policies and other documents from the DoD, the Defense Health Agency, and the Armed Services. Security standards from the National Institute of Standards and Technology, the Government Services Administration, the DoD, and the Army were also developed. Government Services Administration and DoD Inspectors General monitored cloud usage by the DoD. A 2016 Government Accountability Office report characterized cloud computing as being economical, flexible and fast. A congressionally mandated independent study reported that the DoD was active in offering a wide selection of commercial cloud services in addition to its milCloud system. Our findings from the Department of Health and Human Services indicated that the security infrastructure in cloud services may be more compliant with the Health Insurance Portability and Accountability Act of 1996 regulations than traditional methods. To gauge the DoD's adoption of cloud technologies proposed metrics included cost factors, ease of use, automation, availability, accessibility, security, and policy compliance. Since 2009, plans and policies were developed for the use of cloud technology to help consolidate and reduce the number of data centers which were expected to reduce costs, improve environmental factors, enhance information technology security, and maintain mission support for service members. Cloud technologies were also expected to improve employee efficiency and productivity. Federal cloud computing policies within the last decade also offered increased opportunities to advance military healthcare. It was assumed that these opportunities would benefit consumers of healthcare and health science data by allowing more access to centralized cloud computer facilities to store, analyze, search and share relevant data, to enhance standardization, and to reduce potential duplications of effort. We recommend that cloud computing be considered by DoD biomedical researchers for increasing connectivity, presumably by facilitating communications and data sharing, among the various intra- and extramural laboratories. We also recommend that policies and other guidances be updated to include developing additional metrics that will help stakeholders evaluate the above mentioned assumptions and expectations. Published by Oxford University Press on behalf of the Association of Military Surgeons of the United States 2018. This work is written by (a) US Government employee(s) and is in the public domain in the US.
The Potentials of Using Cloud Computing in Schools: A Systematic Literature Review
ERIC Educational Resources Information Center
Hartmann, Simon Birk; Braae, Lotte Qulleq Nygaard; Pedersen, Sine; Khalid, Md. Saifuddin
2017-01-01
Cloud Computing (CC) refers to the physical structure of a communications network, where data is stored in large data centers and can be accessed anywhere, at any time, and from different devices. This systematic literature review identifies and categorizes the potential and barriers of cloud-based teaching in schools from an international…
'Cloud computing' and clinical trials: report from an ECRIN workshop.
Ohmann, Christian; Canham, Steve; Danielyan, Edgar; Robertshaw, Steve; Legré, Yannick; Clivio, Luca; Demotes, Jacques
2015-07-29
Growing use of cloud computing in clinical trials prompted the European Clinical Research Infrastructures Network, a European non-profit organisation established to support multinational clinical research, to organise a one-day workshop on the topic to clarify potential benefits and risks. The issues that arose in that workshop are summarised and include the following: the nature of cloud computing and the cloud computing industry; the risks in using cloud computing services now; the lack of explicit guidance on this subject, both generally and with reference to clinical trials; and some possible ways of reducing risks. There was particular interest in developing and using a European 'community cloud' specifically for academic clinical trial data. It was recognised that the day-long workshop was only the start of an ongoing process. Future discussion needs to include clarification of trial-specific regulatory requirements for cloud computing and involve representatives from the relevant regulatory bodies.
Cloud Computing - A Unified Approach for Surveillance Issues
NASA Astrophysics Data System (ADS)
Rachana, C. R.; Banu, Reshma, Dr.; Ahammed, G. F. Ali, Dr.; Parameshachari, B. D., Dr.
2017-08-01
Cloud computing describes highly scalable resources provided as an external service via the Internet on a basis of pay-per-use. From the economic point of view, the main attractiveness of cloud computing is that users only use what they need, and only pay for what they actually use. Resources are available for access from the cloud at any time, and from any location through networks. Cloud computing is gradually replacing the traditional Information Technology Infrastructure. Securing data is one of the leading concerns and biggest issue for cloud computing. Privacy of information is always a crucial pointespecially when an individual’s personalinformation or sensitive information is beingstored in the organization. It is indeed true that today; cloud authorization systems are notrobust enough. This paper presents a unified approach for analyzing the various security issues and techniques to overcome the challenges in the cloud environment.
Virtual Geophysics Laboratory: Exploiting the Cloud and Empowering Geophysicsts
NASA Astrophysics Data System (ADS)
Fraser, Ryan; Vote, Josh; Goh, Richard; Cox, Simon
2013-04-01
Over the last five decades geoscientists from Australian state and federal agencies have collected and assembled around 3 Petabytes of geoscience data sets under public funding. As a consequence of technological progress, data is now being acquired at exponential rates and in higher resolution than ever before. Effective use of these big data sets challenges the storage and computational infrastructure of most organizations. The Virtual Geophysics Laboratory (VGL) is a scientific workflow portal addresses some of the resulting issues by providing Australian geophysicists with access to a Web 2.0 or Rich Internet Application (RIA) based integrated environment that exploits eResearch tools and Cloud computing technology, and promotes collaboration between the user community. VGL simplifies and automates large portions of what were previously manually intensive scientific workflow processes, allowing scientists to focus on the natural science problems, rather than computer science and IT. A number of geophysical processing codes are incorporated to support multiple workflows. For example a gravity inversion can be performed by combining the Escript/Finley codes (from the University of Queensland) with the gravity data registered in VGL. Likewise, tectonic processes can also be modeled by combining the Underworld code (from Monash University) with one of the various 3D models available to VGL. Cloud services provide scalable and cost effective compute resources. VGL is built on top of mature standards-compliant information services, many deployed using the Spatial Information Services Stack (SISS), which provides direct access to geophysical data. A large number of data sets from Geoscience Australia assist users in data discovery. GeoNetwork provides a metadata catalog to store workflow results for future use, discovery and provenance tracking. VGL has been developed in collaboration with the research community using incremental software development practices and open source tools. While developed to provide the geophysics research community with a sustainable platform and scalable infrastructure; VGL has also developed a number of concepts, patterns and generic components of which have been reused for cases beyond geophysics, including natural hazards, satellite processing and other areas requiring spatial data discovery and processing. Future plans for VGL include a number of improvements in both functional and non-functional areas in response to its user community needs and advancement in information technologies. In particular, research is underway in the following areas (a) distributed and parallel workflow processing in the cloud, (b) seamless integration with various cloud providers, and (c) integration with virtual laboratories representing other science domains. Acknowledgements: VGL was developed by CSIRO in collaboration with Geoscience Australia, National Computational Infrastructure, Australia National University, Monash University and University of Queensland, and has been supported by the Australian Government's Education Investment Funds through NeCTAR.
2011-01-01
Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105
Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.
Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P
2010-01-15
A tetramer quadruplex structure is formed by four parallel strands of DNA/ RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to only pay a cloud computing provider for what is used. Moreover, as well as financial efficiency, cloud computing is an ecologically-friendly technology, it enables efficient data-sharing and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
SenSyF Experience on Integration of EO Services in a Generic, Cloud-Based EO Exploitation Platform
NASA Astrophysics Data System (ADS)
Almeida, Nuno; Catarino, Nuno; Gutierrez, Antonio; Grosso, Nuno; Andrade, Joao; Caumont, Herve; Goncalves, Pedro; Villa, Guillermo; Mangin, Antoine; Serra, Romain; Johnsen, Harald; Grydeland, Tom; Emsley, Stephen; Jauch, Eduardo; Moreno, Jose; Ruiz, Antonio
2016-08-01
SenSyF is a cloud-based data processing framework for EO- based services. It has been pioneer in addressing Big Data issues from the Earth Observation point of view, and is a precursor of several of the technologies and methodologies that will be deployed in ESA's Thematic Exploitation Platforms and other related systems.The SenSyF system focuses on developing fully automated data management, together with access to a processing and exploitation framework, including Earth Observation specific tools. SenSyF is both a development and validation platform for data intensive applications using Earth Observation data. With SenSyF, scientific, institutional or commercial institutions developing EO- based applications and services can take advantage of distributed computational and storage resources, tailored for applications dependent on big Earth Observation data, and without resorting to deep infrastructure and technological investments.This paper describes the integration process and the experience gathered from different EO Service providers during the project.
NASA Astrophysics Data System (ADS)
Li, J.; Zhang, T.; Huang, Q.; Liu, Q.
2014-12-01
Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
Mobile healthcare information management utilizing Cloud Computing and Android OS.
Doukas, Charalampos; Pliakas, Thomas; Maglogiannis, Ilias
2010-01-01
Cloud Computing provides functionality for managing information data in a distributed, ubiquitous and pervasive manner supporting several platforms, systems and applications. This work presents the implementation of a mobile system that enables electronic healthcare data storage, update and retrieval using Cloud Computing. The mobile application is developed using Google's Android operating system and provides management of patient health records and medical images (supporting DICOM format and JPEG2000 coding). The developed system has been evaluated using the Amazon's S3 cloud service. This article summarizes the implementation details and presents initial results of the system in practice.
NASA Technical Reports Server (NTRS)
Minnis, Patrick; Young, David F.; Sassen, Kenneth; Alvarez, Joseph M.; Grund, Christian J.
1996-01-01
Cirrus cloud radiative and physical characteristics are determined using a combination of ground based, aircraft, and satellite measurements taken as part of the First ISCCP Region Experiment (FIRE) cirrus intensive field observations (IFO) during October and November 1986. Lidar backscatter data are used with rawinsonde data to define cloud base, center and top heights and the corresponding temperatures. Coincident GOES-4 4-km visible (0.65 micrometer) and 8-km infrared window (11.5 micrometer) radiances are analyzed to determine cloud emittances and reflectances. Infrared optical depth is computed from the emittance results. Visible optical depth is derived from reflectance using a theoretical ice crystal scattering model and an empirical bidirectional reflectance model. No clouds with visible optical depths greater than 5 or infrared optical depths less than 0.1 were used in the analysis. Average cloud thickness ranged from 0.5 km to 8.0 km for the 71 scenes. Mean vertical beam emittances derived from cloud-center temperatures were 062 for all scenes compared to 0.33 for the case study (27-28 October) reflecting the thinner clouds observed for the latter scenes. Relationships between cloud emittance , extinction coefficients, and temperature for the case study are very similar to those derived from earlier surface-based studies. The thicker clouds seen during the other IFO days yield different results. Emittances derived using cloud-top temperature wer ratioed to those determined from cloud-center temperature. A nearly linear relationship between these ratios and cloud-center temperature holds promise for determining actual cloud-top temperature and cloud thickness from visible and infrared radiance pairs. The mean ratio of the visible scattering optical depth to the infrared absorption optical depth was 2.13 for these data. This scattering efficiency ratio shows a significant dependence on cloud temperature. Values of mean scattering efficiency as high as 2.6 suggest the presence of small ice particles at temperatures below 230 K. the parameterization of visible reflectance in terms of cloud optical depth and clear sky reflectance shows promise as a simplified method for interpreting visible satellite data reflected from cirrus clouds. Large uncertainties in the optical parameters due to cloud reflectance anisotropy and shading were found by analyzing data for various solar zenith angles and for simultaneous advanced very high resolution radiometer (AVHRR) data. Inhomogeneities in the cloud fields result in uneven cloud shading that apparently causes the occurrence of anomalously dark, cloud pixels in the GOES data. These shading effects complicate the interpretation of the satellite data. The results highlight the need for additional study or cirrus cloud scattering processes and remote sensing techniques.
An Efficient Virtual Machine Consolidation Scheme for Multimedia Cloud Computing.
Han, Guangjie; Que, Wenhui; Jia, Gangyong; Shu, Lei
2016-02-18
Cloud computing has innovated the IT industry in recent years, as it can delivery subscription-based services to users in the pay-as-you-go model. Meanwhile, multimedia cloud computing is emerging based on cloud computing to provide a variety of media services on the Internet. However, with the growing popularity of multimedia cloud computing, its large energy consumption cannot only contribute to greenhouse gas emissions, but also result in the rising of cloud users' costs. Therefore, the multimedia cloud providers should try to minimize its energy consumption as much as possible while satisfying the consumers' resource requirements and guaranteeing quality of service (QoS). In this paper, we have proposed a remaining utilization-aware (RUA) algorithm for virtual machine (VM) placement, and a power-aware algorithm (PA) is proposed to find proper hosts to shut down for energy saving. These two algorithms have been combined and applied to cloud data centers for completing the process of VM consolidation. Simulation results have shown that there exists a trade-off between the cloud data center's energy consumption and service-level agreement (SLA) violations. Besides, the RUA algorithm is able to deal with variable workload to prevent hosts from overloading after VM placement and to reduce the SLA violations dramatically.
An Efficient Virtual Machine Consolidation Scheme for Multimedia Cloud Computing
Han, Guangjie; Que, Wenhui; Jia, Gangyong; Shu, Lei
2016-01-01
Cloud computing has innovated the IT industry in recent years, as it can delivery subscription-based services to users in the pay-as-you-go model. Meanwhile, multimedia cloud computing is emerging based on cloud computing to provide a variety of media services on the Internet. However, with the growing popularity of multimedia cloud computing, its large energy consumption cannot only contribute to greenhouse gas emissions, but also result in the rising of cloud users’ costs. Therefore, the multimedia cloud providers should try to minimize its energy consumption as much as possible while satisfying the consumers’ resource requirements and guaranteeing quality of service (QoS). In this paper, we have proposed a remaining utilization-aware (RUA) algorithm for virtual machine (VM) placement, and a power-aware algorithm (PA) is proposed to find proper hosts to shut down for energy saving. These two algorithms have been combined and applied to cloud data centers for completing the process of VM consolidation. Simulation results have shown that there exists a trade-off between the cloud data center’s energy consumption and service-level agreement (SLA) violations. Besides, the RUA algorithm is able to deal with variable workload to prevent hosts from overloading after VM placement and to reduce the SLA violations dramatically. PMID:26901201
BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.
Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong
2013-12-01
The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and to provide easy reproducibility by making the datasets and computational methods easily available.
Electrostatic plasma lens for focusing negatively charged particle beams.
Goncharov, A A; Dobrovolskiy, A M; Dunets, S M; Litovko, I V; Gushenets, V I; Oks, E M
2012-02-01
We describe the current status of ongoing research and development of the electrostatic plasma lens for focusing and manipulating intense negatively charged particle beams, electrons, and negative ions. The physical principle of this kind of plasma lens is based on magnetic isolation electrons providing creation of a dynamical positive space charge cloud in shortly restricted volume propagating beam. Here, the new results of experimental investigations and computer simulations of wide-aperture, intense electron beam focusing by plasma lens with positive space charge cloud produced due to the cylindrical anode layer accelerator creating a positive ion stream towards an axis system is presented.
Challenges in Securing the Interface Between the Cloud and Pervasive Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lagesse, Brent J
2011-01-01
Cloud computing presents an opportunity for pervasive systems to leverage computational and storage resources to accomplish tasks that would not normally be possible on such resource-constrained devices. Cloud computing can enable hardware designers to build lighter systems that last longer and are more mobile. Despite the advantages cloud computing offers to the designers of pervasive systems, there are some limitations of leveraging cloud computing that must be addressed. We take the position that cloud-based pervasive system must be secured holistically and discuss ways this might be accomplished. In this paper, we discuss a pervasive system utilizing cloud computing resources andmore » issues that must be addressed in such a system. In this system, the user's mobile device cannot always have network access to leverage resources from the cloud, so it must make intelligent decisions about what data should be stored locally and what processes should be run locally. As a result of these decisions, the user becomes vulnerable to attacks while interfacing with the pervasive system.« less
NASA Technical Reports Server (NTRS)
Minnis, Patrick; Young, David F.; Sassen, Kenneth; Alvarez, Joseph M.; Grund, Christian J.
1990-01-01
Cirrus cloud radiative and physical characteristics are determined using a combination of ground-based, aircraft, and satellite measurements taken as part of the FIRE Cirrus Intensive Field Observations (IFO) during October and November 1986. Lidar backscatter data are used with rawinsonde data to define cloud base, center, and top heights and the corresponding temperatures. Coincident GOES 4-km visible (0.65 micro-m) and 8-km infrared window (11.5 micro-m) radiances are analyzed to determine cloud emittances and reflectances. Infrared optical depth is computed from the emittance results. Visible optical depth is derived from reflectance using a theoretical ice crystal scattering model and an empirical bidirectional reflectance model. No clouds with visible optical depths greater than 5 or infrared optical depths less than 0.1 were used in the analysis. Average cloud thickness ranged from 0.5 km to 8.0 km for the 71 scenes. Mean vertical beam emittances derived from cloud-center temperatures were 0.62 for all scenes compared to 0.33 for the case study (27-28 October) reflecting the thinner clouds observed for the latter scenes. Relationships between cloud emittance, extinction coefficients, and temperature for the case study are very similar to those derived from earlier surface- based studies. The thicker clouds seen during the other IFO days yield different results. Emittances derived using cloud-top temperature were ratioed to those determined from cloud-center temperature. A nearly linear relationship between these ratios and cloud-center temperature holds promise for determining actual cloud-top temperatures and cloud thicknesses from visible and infrared radiance pairs. The mean ratio of the visible scattering optical depth to the infrared absorption optical depth was 2.13 for these data. This scattering efficiency ratio shows a significant dependence on cloud temperature. Values of mean scattering efficiency as high as 2.6 suggest the presence of small ice particles at temperatures below 230 K. The parameterization of visible reflectance in terms of cloud optical depth and clear-sky reflectance shows promise as a simplified method for interpreting visible satellite data reflected from cirrus clouds. Large uncertainties in the optical parameters due to cloud reflectance anisotropy and shading were found by analyzing data for various solar zenith angles and for simultaneous AVHRR data. Inhomogeneities in the cloud fields result in uneven cloud shading that apparently causes the occurrence of anomalously dark, cloudy pixels in the GOES data. These shading effects complicate the interpretation of the satellite data. The results highlight the need for additional study of cirrus cloud scattering processes and remote sensing techniques.
Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.
Trudgian, David C; Mirzaei, Hamid
2012-12-07
We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.
A resource management architecture based on complex network theory in cloud computing federation
NASA Astrophysics Data System (ADS)
Zhang, Zehua; Zhang, Xuejie
2011-10-01
Cloud Computing Federation is a main trend of Cloud Computing. Resource Management has significant effect on the design, realization, and efficiency of Cloud Computing Federation. Cloud Computing Federation has the typical characteristic of the Complex System, therefore, we propose a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC) in this paper, with the detailed design of the resource discovery and resource announcement mechanisms. Compare with the existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use the historical information and current state data get from other Task Managers for the evolution of the complex network which is composed of Task Managers, thus has the advantages in resource discovery speed, fault tolerance and adaptive ability. The result of the model experiment confirmed the advantage of RMABC in resource discovery performance.
Feasibility of Homomorphic Encryption for Sharing I2B2 Aggregate-Level Data in the Cloud
Raisaro, Jean Louis; Klann, Jeffrey G; Wagholikar, Kavishwar B; Estiri, Hossein; Hubaux, Jean-Pierre; Murphy, Shawn N
2018-01-01
The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is to facilitate sharing of clinical data in the cloud by minimizing the risk of unintended leakage of sensitive clinical information. In particular, we focus on homomorphic encryption, a specific type of encryption that offers the ability to run computation on the data while the data remains encrypted. This paper demonstrates that homomorphic encryption can be used efficiently to compute aggregating queries on the ciphertexts, along with providing end-to-end confidentiality of aggregate-level data from the i2b2 data model. PMID:29888067
Feasibility of Homomorphic Encryption for Sharing I2B2 Aggregate-Level Data in the Cloud.
Raisaro, Jean Louis; Klann, Jeffrey G; Wagholikar, Kavishwar B; Estiri, Hossein; Hubaux, Jean-Pierre; Murphy, Shawn N
2018-01-01
The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is to facilitate sharing of clinical data in the cloud by minimizing the risk of unintended leakage of sensitive clinical information. In particular, we focus on homomorphic encryption, a specific type of encryption that offers the ability to run computation on the data while the data remains encrypted. This paper demonstrates that homomorphic encryption can be used efficiently to compute aggregating queries on the ciphertexts, along with providing end-to-end confidentiality of aggregate-level data from the i2b2 data model.
Cloud4Psi: cloud computing for 3D protein structure similarity searching.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-10-01
Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-01-01
Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl PMID:24930141
Military clouds: utilization of cloud computing systems at the battlefield
NASA Astrophysics Data System (ADS)
Süleyman, Sarıkürk; Volkan, Karaca; İbrahim, Kocaman; Ahmet, Şirzai
2012-05-01
Cloud computing is known as a novel information technology (IT) concept, which involves facilitated and rapid access to networks, servers, data saving media, applications and services via Internet with minimum hardware requirements. Use of information systems and technologies at the battlefield is not new. Information superiority is a force multiplier and is crucial to mission success. Recent advances in information systems and technologies provide new means to decision makers and users in order to gain information superiority. These developments in information technologies lead to a new term, which is known as network centric capability. Similar to network centric capable systems, cloud computing systems are operational today. In the near future extensive use of military clouds at the battlefield is predicted. Integrating cloud computing logic to network centric applications will increase the flexibility, cost-effectiveness, efficiency and accessibility of network-centric capabilities. In this paper, cloud computing and network centric capability concepts are defined. Some commercial cloud computing products and applications are mentioned. Network centric capable applications are covered. Cloud computing supported battlefield applications are analyzed. The effects of cloud computing systems on network centric capability and on the information domain in future warfare are discussed. Battlefield opportunities and novelties which might be introduced to network centric capability by cloud computing systems are researched. The role of military clouds in future warfare is proposed in this paper. It was concluded that military clouds will be indispensible components of the future battlefield. Military clouds have the potential of improving network centric capabilities, increasing situational awareness at the battlefield and facilitating the settlement of information superiority.
NASA Astrophysics Data System (ADS)
Evans, J. D.; Hao, W.; Chettri, S.
2013-12-01
The cloud is proving to be a uniquely promising platform for scientific computing. Our experience with processing satellite data using Amazon Web Services highlights several opportunities for enhanced performance, flexibility, and cost effectiveness in the cloud relative to traditional computing -- for example: - Direct readout from a polar-orbiting satellite such as the Suomi National Polar-Orbiting Partnership (S-NPP) requires bursts of processing a few times a day, separated by quiet periods when the satellite is out of receiving range. In the cloud, by starting and stopping virtual machines in minutes, we can marshal significant computing resources quickly when needed, but not pay for them when not needed. To take advantage of this capability, we are automating a data-driven approach to the management of cloud computing resources, in which new data availability triggers the creation of new virtual machines (of variable size and processing power) which last only until the processing workflow is complete. - 'Spot instances' are virtual machines that run as long as one's asking price is higher than the provider's variable spot price. Spot instances can greatly reduce the cost of computing -- for software systems that are engineered to withstand unpredictable interruptions in service (as occurs when a spot price exceeds the asking price). We are implementing an approach to workflow management that allows data processing workflows to resume with minimal delays after temporary spot price spikes. This will allow systems to take full advantage of variably-priced 'utility computing.' - Thanks to virtual machine images, we can easily launch multiple, identical machines differentiated only by 'user data' containing individualized instructions (e.g., to fetch particular datasets or to perform certain workflows or algorithms) This is particularly useful when (as is the case with S-NPP data) we need to launch many very similar machines to process an unpredictable number of data files concurrently. Our experience shows the viability and flexibility of this approach to workflow management for scientific data processing. - Finally, cloud computing is a promising platform for distributed volunteer ('interstitial') computing, via mechanisms such as the Berkeley Open Infrastructure for Network Computing (BOINC) popularized with the SETI@Home project and others such as ClimatePrediction.net and NASA's Climate@Home. Interstitial computing faces significant challenges as commodity computing shifts from (always on) desktop computers towards smartphones and tablets (untethered and running on scarce battery power); but cloud computing offers significant slack capacity. This capacity includes virtual machines with unused RAM or underused CPUs; virtual storage volumes allocated (& paid for) but not full; and virtual machines that are paid up for the current hour but whose work is complete. We are devising ways to facilitate the reuse of these resources (i.e., cloud-based interstitial computing) for satellite data processing and related analyses. We will present our findings and research directions on these and related topics.
NASA Technical Reports Server (NTRS)
Poehler, H. A.
1977-01-01
For a summer thunderstorm, for which simultaneous, airborne electric field measurements and Lightning Detection and Ranging (LDAR) System data was available, measurements were coordinated to present a picture of the electric field intensity near cloud electrical discharges detected by the LDAR System. Radar precipitation echos from NOAA's 10 cm weather radar and measured airborne electric field intensities were superimposed on LDAR PPI plots to present a coordinated data picture of thunderstorm activity.
The Cloud Area Padovana: from pilot to production
NASA Astrophysics Data System (ADS)
Andreetto, P.; Costa, F.; Crescente, A.; Dorigo, A.; Fantinel, S.; Fanzago, F.; Sgaravatto, M.; Traldi, S.; Verlato, M.; Zangrando, L.
2017-10-01
The Cloud Area Padovana has been running for almost two years. This is an OpenStack-based scientific cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs. The hardware resources have been scaled horizontally and vertically, by upgrading some hypervisors and by adding new ones: currently it provides about 1100 cores. Some in-house developments were also integrated in the OpenStack dashboard, such as a tool for user and project registrations with direct support for the INFN-AAI Identity Provider as a new option for the user authentication. In collaboration with the EU-funded Indigo DataCloud project, the integration with Docker-based containers has been experimented with and will be available in production soon. This computing facility now satisfies the computational and storage demands of more than 70 users affiliated with about 20 research projects. We present here the architecture of this Cloud infrastructure, the tools and procedures used to operate it. We also focus on the lessons learnt in these two years, describing the problems that were found and the corrective actions that had to be applied. We also discuss about the chosen strategy for upgrades, which combines the need to promptly integrate the OpenStack new developments, the demand to reduce the downtimes of the infrastructure, and the need to limit the effort requested for such updates. We also discuss how this Cloud infrastructure is being used. In particular we focus on two big physics experiments which are intensively exploiting this computing facility: CMS and SPES. CMS deployed on the cloud a complex computational infrastructure, composed of several user interfaces for job submission in the Grid environment/local batch queues or for interactive processes; this is fully integrated with the local Tier-2 facility. To avoid a static allocation of the resources, an elastic cluster, based on cernVM, has been configured: it allows to automatically create and delete virtual machines according to the user needs. SPES, using a client-server system called TraceWin, exploits INFN’s virtual resources performing a very large number of simulations on about a thousand nodes elastically managed.
Cloud computing approaches to accelerate drug discovery value chain.
Garg, Vibhav; Arora, Suchir; Gupta, Chitra
2011-12-01
Continued advancements in the area of technology have helped high throughput screening (HTS) evolve from a linear to parallel approach by performing system level screening. Advanced experimental methods used for HTS at various steps of drug discovery (i.e. target identification, target validation, lead identification and lead validation) can generate data of the order of terabytes. As a consequence, there is pressing need to store, manage, mine and analyze this data to identify informational tags. This need is again posing challenges to computer scientists to offer the matching hardware and software infrastructure, while managing the varying degree of desired computational power. Therefore, the potential of "On-Demand Hardware" and "Software as a Service (SAAS)" delivery mechanisms cannot be denied. This on-demand computing, largely referred to as Cloud Computing, is now transforming the drug discovery research. Also, integration of Cloud computing with parallel computing is certainly expanding its footprint in the life sciences community. The speed, efficiency and cost effectiveness have made cloud computing a 'good to have tool' for researchers, providing them significant flexibility, allowing them to focus on the 'what' of science and not the 'how'. Once reached to its maturity, Discovery-Cloud would fit best to manage drug discovery and clinical development data, generated using advanced HTS techniques, hence supporting the vision of personalized medicine.
NASA Astrophysics Data System (ADS)
Nguyen, L.; Chee, T.; Minnis, P.; Palikonda, R.; Smith, W. L., Jr.; Spangenberg, D.
2016-12-01
The NASA LaRC Satellite ClOud and Radiative Property retrieval System (SatCORPS) processes and derives near real-time (NRT) global cloud products from operational geostationary satellite imager datasets. These products are being used in NRT to improve forecast model, aircraft icing warnings, and support aircraft field campaigns. Next generation satellites, such as the Japanese Himawari-8 and the upcoming NOAA GOES-R, present challenges for NRT data processing and product dissemination due to the increase in temporal and spatial resolution. The volume of data is expected to increase to approximately 10 folds. This increase in data volume will require additional IT resources to keep up with the processing demands to satisfy NRT requirements. In addition, these resources are not readily available due to cost and other technical limitations. To anticipate and meet these computing resource requirements, we have employed a hybrid cloud computing environment to augment the generation of SatCORPS products. This paper will describe the workflow to ingest, process, and distribute SatCORPS products and the technologies used. Lessons learn from working on both AWS Clouds and GovCloud will be discussed: benefits, similarities, and differences that could impact decision to use cloud computing and storage. A detail cost analysis will be presented. In addition, future cloud utilization, parallelization, and architecture layout will be discussed for GOES-R.
A Parallel and Incremental Approach for Data-Intensive Learning of Bayesian Networks.
Yue, Kun; Fang, Qiyu; Wang, Xiaoling; Li, Jin; Liu, Weiyi
2015-12-01
Bayesian network (BN) has been adopted as the underlying model for representing and inferring uncertain knowledge. As the basis of realistic applications centered on probabilistic inferences, learning a BN from data is a critical subject of machine learning, artificial intelligence, and big data paradigms. Currently, it is necessary to extend the classical methods for learning BNs with respect to data-intensive computing or in cloud environments. In this paper, we propose a parallel and incremental approach for data-intensive learning of BNs from massive, distributed, and dynamically changing data by extending the classical scoring and search algorithm and using MapReduce. First, we adopt the minimum description length as the scoring metric and give the two-pass MapReduce-based algorithms for computing the required marginal probabilities and scoring the candidate graphical model from sample data. Then, we give the corresponding strategy for extending the classical hill-climbing algorithm to obtain the optimal structure, as well as that for storing a BN by
NASA Astrophysics Data System (ADS)
Zhou, Jianfeng; Xu, Benda; Peng, Chuan; Yang, Yang; Huo, Zhuoxi
2015-08-01
AIRE-Linux is a dedicated Linux system for astronomers. Modern astronomy faces two big challenges: massive observed raw data which covers the whole electromagnetic spectrum, and overmuch professional data processing skill which exceeds personal or even a small team's abilities. AIRE-Linux, which is a specially designed Linux and will be distributed to users by Virtual Machine (VM) images in Open Virtualization Format (OVF), is to help astronomers confront the challenges. Most astronomical software packages, such as IRAF, MIDAS, CASA, Heasoft etc., will be integrated into AIRE-Linux. It is easy for astronomers to configure and customize the system and use what they just need. When incorporated into cloud computing platforms, AIRE-Linux will be able to handle data intensive and computing consuming tasks for astronomers. Currently, a Beta version of AIRE-Linux is ready for download and testing.
Atlas2 Cloud: a framework for personal genome analysis in the cloud
2012-01-01
Background Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663
Atlas2 Cloud: a framework for personal genome analysis in the cloud.
Evani, Uday S; Challis, Danny; Yu, Jin; Jackson, Andrew R; Paithankar, Sameer; Bainbridge, Matthew N; Jakkamsetti, Adinarayana; Pham, Peter; Coarfa, Cristian; Milosavljevic, Aleksandar; Yu, Fuli
2012-01-01
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Cotes-Ruiz, Iván Tomás; Prado, Rocío P.; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás
2017-01-01
Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the consumed power of their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center and of special relevance is the adaptation of its performance to workload. Intensive computing applications in diverse areas of science generate complex workload called workflows, whose successful management in terms of energy saving is still at its beginning. WorkflowSim is currently one of the most advanced simulators for research on workflows processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new management strategies in energy saving considering computing, reconfiguration and networks costs as well as quality of service, and it incorporates the preeminent strategy for on host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resources utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as overlapped mechanism to the DVFS intra-host technique. PMID:28085932
Cotes-Ruiz, Iván Tomás; Prado, Rocío P; García-Galán, Sebastián; Muñoz-Expósito, José Enrique; Ruiz-Reyes, Nicolás
2017-01-01
Nowadays, the growing computational capabilities of Cloud systems rely on the reduction of the consumed power of their data centers to make them sustainable and economically profitable. The efficient management of computing resources is at the heart of any energy-aware data center and of special relevance is the adaptation of its performance to workload. Intensive computing applications in diverse areas of science generate complex workload called workflows, whose successful management in terms of energy saving is still at its beginning. WorkflowSim is currently one of the most advanced simulators for research on workflows processing, offering advanced features such as task clustering and failure policies. In this work, an expected power-aware extension of WorkflowSim is presented. This new tool integrates a power model based on a computing-plus-communication design to allow the optimization of new management strategies in energy saving considering computing, reconfiguration and networks costs as well as quality of service, and it incorporates the preeminent strategy for on host energy saving: Dynamic Voltage Frequency Scaling (DVFS). The simulator is designed to be consistent in different real scenarios and to include a wide repertory of DVFS governors. Results showing the validity of the simulator in terms of resources utilization, frequency and voltage scaling, power, energy and time saving are presented. Also, results achieved by the intra-host DVFS strategy with different governors are compared to those of the data center using a recent and successful DVFS-based inter-host scheduling strategy as overlapped mechanism to the DVFS intra-host technique.
A Systematic Literature Mapping of Risk Analysis of Big Data in Cloud Computing Environment
NASA Astrophysics Data System (ADS)
Bee Yusof Ali, Hazirah; Marziana Abdullah, Lili; Kartiwi, Mira; Nordin, Azlin; Salleh, Norsaremah; Sham Awang Abu Bakar, Normi
2018-05-01
This paper investigates previous literature that focusses on the three elements: risk assessment, big data and cloud. We use a systematic literature mapping method to search for journals and proceedings. The systematic literature mapping process is utilized to get a properly screened and focused literature. With the help of inclusion and exclusion criteria, the search of literature is further narrowed. Classification helps us in grouping the literature into categories. At the end of the mapping, gaps can be seen. The gap is where our focus should be in analysing risk of big data in cloud computing environment. Thus, a framework of how to assess the risk of security, privacy and trust associated with big data and cloud computing environment is highly needed.
Cloud Computing for Mission Design and Operations
NASA Technical Reports Server (NTRS)
Arrieta, Juan; Attiyah, Amy; Beswick, Robert; Gerasimantos, Dimitrios
2012-01-01
The space mission design and operations community already recognizes the value of cloud computing and virtualization. However, natural and valid concerns, like security, privacy, up-time, and vendor lock-in, have prevented a more widespread and expedited adoption into official workflows. In the interest of alleviating these concerns, we propose a series of guidelines for internally deploying a resource-oriented hub of data and algorithms. These guidelines provide a roadmap for implementing an architecture inspired in the cloud computing model: associative, elastic, semantical, interconnected, and adaptive. The architecture can be summarized as exposing data and algorithms as resource-oriented Web services, coordinated via messaging, and running on virtual machines; it is simple, and based on widely adopted standards, protocols, and tools. The architecture may help reduce common sources of complexity intrinsic to data-driven, collaborative interactions and, most importantly, it may provide the means for teams and agencies to evaluate the cloud computing model in their specific context, with minimal infrastructure changes, and before committing to a specific cloud services provider.
Behavior Life Style Analysis for Mobile Sensory Data in Cloud Computing through MapReduce
Hussain, Shujaat; Bang, Jae Hun; Han, Manhyung; Ahmed, Muhammad Idris; Amin, Muhammad Bilal; Lee, Sungyoung; Nugent, Chris; McClean, Sally; Scotney, Bryan; Parr, Gerard
2014-01-01
Cloud computing has revolutionized healthcare in today's world as it can be seamlessly integrated into a mobile application and sensor devices. The sensory data is then transferred from these devices to the public and private clouds. In this paper, a hybrid and distributed environment is built which is capable of collecting data from the mobile phone application and store it in the cloud. We developed an activity recognition application and transfer the data to the cloud for further processing. Big data technology Hadoop MapReduce is employed to analyze the data and create user timeline of user's activities. These activities are visualized to find useful health analytics and trends. In this paper a big data solution is proposed to analyze the sensory data and give insights into user behavior and lifestyle trends. PMID:25420151
Behavior life style analysis for mobile sensory data in cloud computing through MapReduce.
Hussain, Shujaat; Bang, Jae Hun; Han, Manhyung; Ahmed, Muhammad Idris; Amin, Muhammad Bilal; Lee, Sungyoung; Nugent, Chris; McClean, Sally; Scotney, Bryan; Parr, Gerard
2014-11-20
Cloud computing has revolutionized healthcare in today's world as it can be seamlessly integrated into a mobile application and sensor devices. The sensory data is then transferred from these devices to the public and private clouds. In this paper, a hybrid and distributed environment is built which is capable of collecting data from the mobile phone application and store it in the cloud. We developed an activity recognition application and transfer the data to the cloud for further processing. Big data technology Hadoop MapReduce is employed to analyze the data and create user timeline of user's activities. These activities are visualized to find useful health analytics and trends. In this paper a big data solution is proposed to analyze the sensory data and give insights into user behavior and lifestyle trends.
Evolution of the ATLAS PanDA workload management system for exascale computational science
NASA Astrophysics Data System (ADS)
Maeno, T.; De, K.; Klimentov, A.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.; Yu, D.; Atlas Collaboration
2014-06-01
An important foundation underlying the impressive success of data processing and analysis in the ATLAS experiment [1] at the LHC [2] is the Production and Distributed Analysis (PanDA) workload management system [3]. PanDA was designed specifically for ATLAS and proved to be highly successful in meeting all the distributed computing needs of the experiment. However, the core design of PanDA is not experiment specific. The PanDA workload management system is capable of meeting the needs of other data intensive scientific applications. Alpha-Magnetic Spectrometer [4], an astro-particle experiment on the International Space Station, and the Compact Muon Solenoid [5], an LHC experiment, have successfully evaluated PanDA and are pursuing its adoption. In this paper, a description of the new program of work to develop a generic version of PanDA will be given, as well as the progress in extending PanDA's capabilities to support supercomputers and clouds and to leverage intelligent networking. PanDA has demonstrated at a very large scale the value of automated dynamic brokering of diverse workloads across distributed computing resources. The next generation of PanDA will allow other data-intensive sciences and a wider exascale community employing a variety of computing platforms to benefit from ATLAS' experience and proven tools.
Arnold, Robert W; Jacob, Jack; Matrix, Zinnia
2012-01-01
Screening by neonatologists and staging by ophthalmologists is a cost-effective intervention, but inadvertent missed examinations create a high liability. Paper tracking, bedside schedule reminders, and a computer scheduling and reminder program were compared for speed of input and retrospective missed examination rate. A neonatal intensive care unit (NICU) process was then programmed for cloud-based distribution for inpatient and outpatient retinopathy of prematurity monitoring. Over 11 years, 367 premature infants in one NICU were prospectively monitored. The initial paper system missed 11% of potential examinations, the Windows server-based system missed 2%, and the current cloud-based system missed 0% of potential inpatient and outpatient examinations. Computer input of examinations took the same or less time than paper recording. A computer application with a deliberate NICU process improved the proportion of eligible neonates getting their scheduled eye examinations in a timely manner. Copyright 2012, SLACK Incorporated.
Are Cloud Environments Ready for Scientific Applications?
NASA Astrophysics Data System (ADS)
Mehrotra, P.; Shackleford, K.
2011-12-01
Cloud computing environments are becoming widely available both in the commercial and government sectors. They provide flexibility to rapidly provision resources in order to meet dynamic and changing computational needs without the customers incurring capital expenses and/or requiring technical expertise. Clouds also provide reliable access to resources even though the end-user may not have in-house expertise for acquiring or operating such resources. Consolidation and pooling in a cloud environment allow organizations to achieve economies of scale in provisioning or procuring computing resources and services. Because of these and other benefits, many businesses and organizations are migrating their business applications (e.g., websites, social media, and business processes) to cloud environments-evidenced by the commercial success of offerings such as the Amazon EC2. In this paper, we focus on the feasibility of utilizing cloud environments for scientific workloads and workflows particularly of interest to NASA scientists and engineers. There is a wide spectrum of such technical computations. These applications range from small workstation-level computations to mid-range computing requiring small clusters to high-performance simulations requiring supercomputing systems with high bandwidth/low latency interconnects. Data-centric applications manage and manipulate large data sets such as satellite observational data and/or data previously produced by high-fidelity modeling and simulation computations. Most of the applications are run in batch mode with static resource requirements. However, there do exist situations that have dynamic demands, particularly ones with public-facing interfaces providing information to the general public, collaborators and partners, as well as to internal NASA users. In the last few months we have been studying the suitability of cloud environments for NASA's technical and scientific workloads. We have ported several applications to multiple cloud environments including NASA's Nebula environment, Amazon's EC2, Magellan at NERSC, and SGI's Cyclone system. We critically examined the performance of the applications on these systems. We also collected information on the usability of these cloud environments. In this talk we will present the results of our study focusing on the efficacy of using clouds for NASA's scientific applications.
Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing.
Shatil, Anwar S; Younas, Sohail; Pourreza, Hossein; Figley, Chase R
2015-01-01
With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications.
Now and next-generation sequencing techniques: future of sequence analysis using cloud computing.
Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav
2012-01-01
Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed "cloud computing") has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows.
Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing
Shatil, Anwar S.; Younas, Sohail; Pourreza, Hossein; Figley, Chase R.
2015-01-01
With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications. PMID:27279746
Infrastructures for Distributed Computing: the case of BESIII
NASA Astrophysics Data System (ADS)
Pellegrino, J.
2018-05-01
The BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed to investigate Tau-Charm physics. Now BESIII has been running for several years and gathered more than 1PB raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based up on DIRAC and it is in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites from BESIII Collaboration from all over the world joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantages of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover new computing tools and upcoming infrastructures will be addressed.
NASA Astrophysics Data System (ADS)
Nguyen, L.; Chee, T.; Palikonda, R.; Smith, W. L., Jr.; Bedka, K. M.; Spangenberg, D.; Vakhnin, A.; Lutz, N. E.; Walter, J.; Kusterer, J.
2017-12-01
Cloud Computing offers new opportunities for large-scale scientific data producers to utilize Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) IT resources to process and deliver data products in an operational environment where timely delivery, reliability, and availability are critical. The NASA Langley Research Center Atmospheric Science Data Center (ASDC) is building and testing a private and public facing cloud for users in the Science Directorate to utilize as an everyday production environment. The NASA SatCORPS (Satellite ClOud and Radiation Property Retrieval System) team processes and derives near real-time (NRT) global cloud products from operational geostationary (GEO) satellite imager datasets. To deliver these products, we will utilize the public facing cloud and OpenShift to deploy a load-balanced webserver for data storage, access, and dissemination. The OpenStack private cloud will host data ingest and computational capabilities for SatCORPS processing. This paper will discuss the SatCORPS migration towards, and usage of, the ASDC Cloud Services in an operational environment. Detailed lessons learned from use of prior cloud providers, specifically the Amazon Web Services (AWS) GovCloud and the Government Cloud administered by the Langley Managed Cloud Environment (LMCE) will also be discussed.
Applications integration in a hybrid cloud computing environment: modelling and platform
NASA Astrophysics Data System (ADS)
Li, Qing; Wang, Ze-yuan; Li, Wei-hua; Li, Jun; Wang, Cheng; Du, Rui-yang
2013-08-01
With the development of application services providers and cloud computing, more and more small- and medium-sized business enterprises use software services and even infrastructure services provided by professional information service companies to replace all or part of their information systems (ISs). These information service companies provide applications, such as data storage, computing processes, document sharing and even management information system services as public resources to support the business process management of their customers. However, no cloud computing service vendor can satisfy the full functional IS requirements of an enterprise. As a result, enterprises often have to simultaneously use systems distributed in different clouds and their intra enterprise ISs. Thus, this article presents a framework to integrate applications deployed in public clouds and intra ISs. A run-time platform is developed and a cross-computing environment process modelling technique is also developed to improve the feasibility of ISs under hybrid cloud computing environments.
Unidata's Vision for Transforming Geoscience by Moving Data Services and Software to the Cloud
NASA Astrophysics Data System (ADS)
Ramamurthy, Mohan; Fisher, Ward; Yoksas, Tom
2015-04-01
Universities are facing many challenges: shrinking budgets, rapidly evolving information technologies, exploding data volumes, multidisciplinary science requirements, and high expectations from students who have grown up with smartphones and tablets. These changes are upending traditional approaches to accessing and using data and software. Unidata recognizes that its products and services must evolve to support new approaches to research and education. After years of hype and ambiguity, cloud computing is maturing in usability in many areas of science and education, bringing the benefits of virtualized and elastic remote services to infrastructure, software, computation, and data. Cloud environments reduce the amount of time and money spent to procure, install, and maintain new hardware and software, and reduce costs through resource pooling and shared infrastructure. Cloud services aimed at providing any resource, at any time, from any place, using any device are increasingly being embraced by all types of organizations. Given this trend and the enormous potential of cloud-based services, Unidata is taking moving to augment its products, services, data delivery mechanisms and applications to align with the cloud-computing paradigm. Specifically, Unidata is working toward establishing a community-based development environment that supports the creation and use of software services to build end-to-end data workflows. The design encourages the creation of services that can be broken into small, independent chunks that provide simple capabilities. Chunks could be used individually to perform a task, or chained into simple or elaborate workflows. The services will also be portable in the form of downloadable Unidata-in-a-box virtual images, allowing their use in researchers' own cloud-based computing environments. In this talk, we present a vision for Unidata's future in a cloud-enabled data services and discuss our ongoing efforts to deploy a suite of Unidata data services and tools in the Amazon EC2 and Microsoft Azure cloud environments, including the transfer of real-time meteorological data into its cloud instances, product generation using those data, and the deployment of TDS, McIDAS ADDE and AWIPS II data servers and the Integrated Data Server visualization tool.
NASA Technical Reports Server (NTRS)
Berendes, Todd; Sengupta, Sailes K.; Welch, Ron M.; Wielicki, Bruce A.; Navar, Murgesh
1992-01-01
A semiautomated methodology is developed for estimating cumulus cloud base heights on the basis of high spatial resolution Landsat MSS data, using various image-processing techniques to match cloud edges with their corresponding shadow edges. The cloud base height is then estimated by computing the separation distance between the corresponding generalized Hough transform reference points. The differences between the cloud base heights computed by these means and a manual verification technique are of the order of 100 m or less; accuracies of 50-70 m may soon be possible via EOS instruments.
Archive Management of NASA Earth Observation Data to Support Cloud Analysis
NASA Technical Reports Server (NTRS)
Lynnes, Christopher; Baynes, Kathleen; McInerney, Mark
2017-01-01
NASA collects, processes and distributes petabytes of Earth Observation (EO) data from satellites, aircraft, in situ instruments and model output, with an order of magnitude increase expected by 2024. Cloud-based web object storage (WOS) of these data can simplify the execution of such an increase. More importantly, it can also facilitate user analysis of those volumes by making the data available to the massively parallel computing power in the cloud. However, storing EO data in cloud WOS has a ripple effect throughout the NASA archive system with unexpected challenges and opportunities. One challenge is modifying data servicing software (such as Web Coverage Service servers) to access and subset data that are no longer on a directly accessible file system, but rather in cloud WOS. Opportunities include refactoring of the archive software to a cloud-native architecture; virtualizing data products by computing on demand; and reorganizing data to be more analysis-friendly. Reviewed by Mark McInerney ESDIS Deputy Project Manager.
Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology-a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.
Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology–a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507
Operating Dedicated Data Centers - Is It Cost-Effective?
NASA Astrophysics Data System (ADS)
Ernst, M.; Hogue, R.; Hollowell, C.; Strecker-Kellog, W.; Wong, A.; Zaytsev, A.
2014-06-01
The advent of cloud computing centres such as Amazon's EC2 and Google's Computing Engine has elicited comparisons with dedicated computing clusters. Discussions on appropriate usage of cloud resources (both academic and commercial) and costs have ensued. This presentation discusses a detailed analysis of the costs of operating and maintaining the RACF (RHIC and ATLAS Computing Facility) compute cluster at Brookhaven National Lab and compares them with the cost of cloud computing resources under various usage scenarios. An extrapolation of likely future cost effectiveness of dedicated computing resources is also presented.
Intensity-corrected Herschel Observations of Nearby Isolated Low-mass Clouds
NASA Astrophysics Data System (ADS)
Sadavoy, Sarah I.; Keto, Eric; Bourke, Tyler L.; Dunham, Michael M.; Myers, Philip C.; Stephens, Ian W.; Di Francesco, James; Webb, Kristi; Stutz, Amelia M.; Launhardt, Ralf; Tobin, John J.
2018-01-01
We present intensity-corrected Herschel maps at 100, 160, 250, 350, and 500 μm for 56 isolated low-mass clouds. We determine the zero-point corrections for Herschel Photodetector Array Camera and Spectrometer (PACS) and Spectral Photometric Imaging Receiver (SPIRE) maps from the Herschel Science Archive (HSA) using Planck data. Since these HSA maps are small, we cannot correct them using typical methods. Here we introduce a technique to measure the zero-point corrections for small Herschel maps. We use radial profiles to identify offsets between the observed HSA intensities and the expected intensities from Planck. Most clouds have reliable offset measurements with this technique. In addition, we find that roughly half of the clouds have underestimated HSA-SPIRE intensities in their outer envelopes relative to Planck, even though the HSA-SPIRE maps were previously zero-point corrected. Using our technique, we produce corrected Herschel intensity maps for all 56 clouds and determine their line-of-sight average dust temperatures and optical depths from modified blackbody fits. The clouds have typical temperatures of ∼14–20 K and optical depths of ∼10‑5–10‑3. Across the whole sample, we find an anticorrelation between temperature and optical depth. We also find lower temperatures than what was measured in previous Herschel studies, which subtracted out a background level from their intensity maps to circumvent the zero-point correction. Accurate Herschel observations of clouds are key to obtaining accurate density and temperature profiles. To make such future analyses possible, intensity-corrected maps for all 56 clouds are publicly available in the electronic version. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
Flexible services for the support of research.
Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John
2013-01-28
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.
Computational provenance in hydrologic science: a snow mapping example.
Dozier, Jeff; Frew, James
2009-03-13
Computational provenance--a record of the antecedents and processing history of digital information--is key to properly documenting computer-based scientific research. To support investigations in hydrologic science, we produce the daily fractional snow-covered area from NASA's moderate-resolution imaging spectroradiometer (MODIS). From the MODIS reflectance data in seven wavelengths, we estimate the fraction of each 500 m pixel that snow covers. The daily products have data gaps and errors because of cloud cover and sensor viewing geometry, so we interpolate and smooth to produce our best estimate of the daily snow cover. To manage the data, we have developed the Earth System Science Server (ES3), a software environment for data-intensive Earth science, with unique capabilities for automatically and transparently capturing and managing the provenance of arbitrary computations. Transparent acquisition avoids the scientists having to express their computations in specific languages or schemas in order for provenance to be acquired and maintained. ES3 models provenance as relationships between processes and their input and output files. It is particularly suited to capturing the provenance of an evolving algorithm whose components span multiple languages and execution environments.
Automatic Extraction of Road Markings from Mobile Laser-Point Cloud Using Intensity Data
NASA Astrophysics Data System (ADS)
Yao, L.; Chen, Q.; Qin, C.; Wu, H.; Zhang, S.
2018-04-01
With the development of intelligent transportation, road's high precision information data has been widely applied in many fields. This paper proposes a concise and practical way to extract road marking information from point cloud data collected by mobile mapping system (MMS). The method contains three steps. Firstly, road surface is segmented through edge detection from scan lines. Then the intensity image is generated by inverse distance weighted (IDW) interpolation and the road marking is extracted by using adaptive threshold segmentation based on integral image without intensity calibration. Moreover, the noise is reduced by removing a small number of plaque pixels from binary image. Finally, point cloud mapped from binary image is clustered into marking objects according to Euclidean distance, and using a series of algorithms including template matching and feature attribute filtering for the classification of linear markings, arrow markings and guidelines. Through processing the point cloud data collected by RIEGL VUX-1 in case area, the results show that the F-score of marking extraction is 0.83, and the average classification rate is 0.9.
Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds
NASA Astrophysics Data System (ADS)
Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni
2012-09-01
Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request particularly requires the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of a day require an efficient computing resource management. We propose a hierarchical approach, where the data center is divided in clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools for evaluating a different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
Federated data storage and management infrastructure
NASA Astrophysics Data System (ADS)
Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.
2016-10-01
The Large Hadron Collider (LHC)’ operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs of at least orders of magnitude; it will require new approaches in data storage organization and data handling. In our project we address the fundamental problem of designing of architecture to integrate a distributed heterogeneous disk resources for LHC experiments and other data- intensive science applications and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St.-Petersburg and Gatchina, as well as Russian / CERN federation. We have conducted extensive tests of underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including the ones running on supercomputing platform, cloud computing and Grid for ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally to demonstrate our ability to efficiently use federated data storage experiment wide within National Academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bio-informatics.
OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.
2016-12-01
The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets.Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: Parallel generation (Spark on a compute cluster) of 15 to 30-year Ocean Climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; Parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; Parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; Shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; Scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
Large-Scale, Multi-Sensor Atmospheric Data Fusion Using Hybrid Cloud Computing
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.
2015-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, MODIS, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. HySDS is a Hybrid-Cloud Science Data System that has been developed and applied under NASA AIST, MEaSUREs, and ACCESS grants. HySDS uses the SciFlow workflow engine to partition analysis workflows into parallel tasks (e.g. segmenting by time or space) that are pushed into a durable job queue. The tasks are "pulled" from the queue by worker Virtual Machines (VM's) and executed in an on-premise Cloud (Eucalyptus or OpenStack) or at Amazon in the public Cloud or govCloud. In this way, years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the transferred data. We are using HySDS to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a MEASURES grant. We will present the architecture of HySDS, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. Our system demonstrates how one can pull A-Train variables (Levels 2 & 3) on-demand into the Amazon Cloud, and cache only those variables that are heavily used, so that any number of compute jobs can be executed "near" the multi-sensor data. Decade-long, multi-sensor studies can be performed without pre-staging data, with the researcher paying only his own Cloud compute bill.
Cloud-based Web Services for Near-Real-Time Web access to NPP Satellite Imagery and other Data
NASA Astrophysics Data System (ADS)
Evans, J. D.; Valente, E. G.
2010-12-01
We are building a scalable, cloud computing-based infrastructure for Web access to near-real-time data products synthesized from the U.S. National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP) and other geospatial and meteorological data. Given recent and ongoing changes in the the NPP and NPOESS programs (now Joint Polar Satellite System), the need for timely delivery of NPP data is urgent. We propose an alternative to a traditional, centralized ground segment, using distributed Direct Broadcast facilities linked to industry-standard Web services by a streamlined processing chain running in a scalable cloud computing environment. Our processing chain, currently implemented on Amazon.com's Elastic Compute Cloud (EC2), retrieves raw data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) and synthesizes data products such as Sea-Surface Temperature, Vegetation Indices, etc. The cloud computing approach lets us grow and shrink computing resources to meet large and rapid fluctuations (twice daily) in both end-user demand and data availability from polar-orbiting sensors. Early prototypes have delivered various data products to end-users with latencies between 6 and 32 minutes. We have begun to replicate machine instances in the cloud, so as to reduce latency and maintain near-real time data access regardless of increased data input rates or user demand -- all at quite moderate monthly costs. Our service-based approach (in which users invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored and composite (e.g., false-color multiband) products on demand. To facilitate broad impact and adoption of our technology, we have emphasized open, industry-standard software interfaces and open source software. Through our work, we envision the widespread establishment of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sensors. A scalable architecture based on cloud computing ensures cost-effective, real-time processing and delivery of NPP and other data. Access via standard Web services maximizes its interoperability and usefulness.
An Analysis of Cloud Computing with Amazon Web Services for the Atmospheric Science Data Center
NASA Astrophysics Data System (ADS)
Gleason, J. L.; Little, M. M.
2013-12-01
NASA science and engineering efforts rely heavily on compute and data handling systems. The nature of NASA science data is such that it is not restricted to NASA users, instead it is widely shared across a globally distributed user community including scientists, educators, policy decision makers, and the public. Therefore NASA science computing is a candidate use case for cloud computing where compute resources are outsourced to an external vendor. Amazon Web Services (AWS) is a commercial cloud computing service developed to use excess computing capacity at Amazon, and potentially provides an alternative to costly and potentially underutilized dedicated acquisitions whenever NASA scientists or engineers require additional data processing. AWS desires to provide a simplified avenue for NASA scientists and researchers to share large, complex data sets with external partners and the public. AWS has been extensively used by JPL for a wide range of computing needs and was previously tested on a NASA Agency basis during the Nebula testing program. Its ability to support the Langley Science Directorate needs to be evaluated by integrating it with real world operational needs across NASA and the associated maturity that would come with that. The strengths and weaknesses of this architecture and its ability to support general science and engineering applications has been demonstrated during the previous testing. The Langley Office of the Chief Information Officer in partnership with the Atmospheric Sciences Data Center (ASDC) has established a pilot business interface to utilize AWS cloud computing resources on a organization and project level pay per use model. This poster discusses an effort to evaluate the feasibility of the pilot business interface from a project level perspective by specifically using a processing scenario involving the Clouds and Earth's Radiant Energy System (CERES) project.
Data distribution method of workflow in the cloud environment
NASA Astrophysics Data System (ADS)
Wang, Yong; Wu, Junjuan; Wang, Ying
2017-08-01
Cloud computing for workflow applications provides the required high efficiency calculation and large storage capacity and it also brings challenges to the protection of trade secrets and other privacy data. Because of privacy data will cause the increase of the data transmission time, this paper presents a new data allocation algorithm based on data collaborative damage degree, to improve the existing data allocation strategy? Safety and public cloud computer algorithm depends on the private cloud; the static allocation method in the initial stage only to the non-confidential data division to improve the original data, in the operational phase will continue to generate data to dynamically adjust the data distribution scheme. The experimental results show that the improved method is effective in reducing the data transmission time.
Performance Comparison of Big Data Analytics With NEXUS and Giovanni
NASA Astrophysics Data System (ADS)
Jacob, J. C.; Huang, T.; Lynnes, C.
2016-12-01
NEXUS is an emerging data-intensive analysis framework developed with a new approach for handling science data that enables large-scale data analysis. It is available through open source. We compare performance of NEXUS and Giovanni for 3 statistics algorithms applied to NASA datasets. Giovanni is a statistics web service at NASA Distributed Active Archive Centers (DAACs). NEXUS is a cloud-computing environment developed at JPL and built on Apache Solr, Cassandra, and Spark. We compute global time-averaged map, correlation map, and area-averaged time series. The first two algorithms average over time to produce a value for each pixel in a 2-D map. The third algorithm averages spatially to produce a single value for each time step. This talk is our report on benchmark comparison findings that indicate 15x speedup with NEXUS over Giovanni to compute area-averaged time series of daily precipitation rate for the Tropical Rainfall Measuring Mission (TRMM with 0.25 degree spatial resolution) for the Continental United States over 14 years (2000-2014) with 64-way parallelism and 545 tiles per granule. 16-way parallelism with 16 tiles per granule worked best with NEXUS for computing an 18-year (1998-2015) TRMM daily precipitation global time averaged map (2.5 times speedup) and 18-year global map of correlation between TRMM daily precipitation and TRMM real time daily precipitation (7x speedup). These and other benchmark results will be presented along with key lessons learned in applying the NEXUS tiling approach to big data analytics in the cloud.
Privacy-Preserving Location-Based Service Scheme for Mobile Sensing Data.
Xie, Qingqing; Wang, Liangmin
2016-11-25
With the wide use of mobile sensing application, more and more location-embedded data are collected and stored in mobile clouds, such as iCloud, Samsung cloud, etc. Using these data, the cloud service provider (CSP) can provide location-based service (LBS) for users. However, the mobile cloud is untrustworthy. The privacy concerns force the sensitive locations to be stored on the mobile cloud in an encrypted form. However, this brings a great challenge to utilize these data to provide efficient LBS. To solve this problem, we propose a privacy-preserving LBS scheme for mobile sensing data, based on the RSA (for Rivest, Shamir and Adleman) algorithm and ciphertext policy attribute-based encryption (CP-ABE) scheme. The mobile cloud can perform location distance computing and comparison efficiently for authorized users, without location privacy leakage. In the end, theoretical security analysis and experimental evaluation demonstrate that our scheme is secure against the chosen plaintext attack (CPA) and efficient enough for practical applications in terms of user side computation overhead.
Privacy-Preserving Location-Based Service Scheme for Mobile Sensing Data †
Xie, Qingqing; Wang, Liangmin
2016-01-01
With the wide use of mobile sensing application, more and more location-embedded data are collected and stored in mobile clouds, such as iCloud, Samsung cloud, etc. Using these data, the cloud service provider (CSP) can provide location-based service (LBS) for users. However, the mobile cloud is untrustworthy. The privacy concerns force the sensitive locations to be stored on the mobile cloud in an encrypted form. However, this brings a great challenge to utilize these data to provide efficient LBS. To solve this problem, we propose a privacy-preserving LBS scheme for mobile sensing data, based on the RSA (for Rivest, Shamir and Adleman) algorithm and ciphertext policy attribute-based encryption (CP-ABE) scheme. The mobile cloud can perform location distance computing and comparison efficiently for authorized users, without location privacy leakage. In the end, theoretical security analysis and experimental evaluation demonstrate that our scheme is secure against the chosen plaintext attack (CPA) and efficient enough for practical applications in terms of user side computation overhead. PMID:27897984
Quantitative analysis of night skyglow amplification under cloudy conditions
NASA Astrophysics Data System (ADS)
Kocifaj, Miroslav; Solano Lamphar, Héctor Antonio
2014-10-01
The radiance produced by artificial light is a major source of nighttime over-illumination. It can, however, be treated experimentally using ground-based and satellite data. These two types of data complement each other and together have a high information content. For instance, the satellite data enable upward light emissions to be normalized, and this in turn allows skyglow levels at the ground to be modelled under cloudy or overcast conditions. Excessive night lighting imposes an unacceptable burden on nature, humans and professional astronomy. For this reason, there is a pressing need to determine the total amount of downwelling diffuse radiation. Undoubtedly, cloudy periods can cause a significant increase in skyglow as a result of amplification owing to diffuse reflection from clouds. While it is recognized that the amplification factor (AF) varies with cloud cover, the effects of different types of clouds, of atmospheric turbidity and of the geometrical relationships between the positions of an individual observer, the cloud layer, and the light source are in general poorly known. In this paper the AF is quantitatively analysed considering different aerosol optical depths (AODs), urban layout sizes and cloud types with specific albedos and altitudes. The computational results show that the AF peaks near the edges of a city rather than at its centre. In addition, the AF appears to be a decreasing function of AOD, which is particularly important when modelling the skyglow in regions with apparent temporal or seasonal variability of atmospheric turbidity. The findings in this paper will be useful to those designing engineering applications or modelling light pollution, as well as to astronomers and environmental scientists who aim to predict the amplification of skyglow caused by clouds. In addition, the semi-analytical formulae can be used to estimate the AF levels, especially in densely populated metropolitan regions for which detailed computations may be CPU-intensive. These new results are of theoretical and experimental significance as they will motivate experimentalists to collect data from various regions to build an overall picture of the AF, and will encourage modellers to test the consistency with theoretical predictions.
Consolidation of cloud computing in ATLAS
NASA Astrophysics Data System (ADS)
Taylor, Ryan P.; Domingues Cordeiro, Cristovao Jose; Giordano, Domenico; Hover, John; Kouba, Tomas; Love, Peter; McNab, Andrew; Schovancova, Jaroslava; Sobie, Randall; ATLAS Collaboration
2017-10-01
Throughout the first half of LHC Run 2, ATLAS cloud computing has undergone a period of consolidation, characterized by building upon previously established systems, with the aim of reducing operational effort, improving robustness, and reaching higher scale. This paper describes the current state of ATLAS cloud computing. Cloud activities are converging on a common contextualization approach for virtual machines, and cloud resources are sharing monitoring and service discovery components. We describe the integration of Vacuum resources, streamlined usage of the Simulation at Point 1 cloud for offline processing, extreme scaling on Amazon compute resources, and procurement of commercial cloud capacity in Europe. Finally, building on the previously established monitoring infrastructure, we have deployed a real-time monitoring and alerting platform which coalesces data from multiple sources, provides flexible visualization via customizable dashboards, and issues alerts and carries out corrective actions in response to problems.
CloudNeo: a cloud pipeline for identifying patient-specific tumor neoantigens.
Bais, Preeti; Namburi, Sandeep; Gatti, Daniel M; Zhang, Xinyu; Chuang, Jeffrey H
2017-10-01
We present CloudNeo, a cloud-based computational workflow for identifying patient-specific tumor neoantigens from next generation sequencing data. Tumor-specific mutant peptides can be detected by the immune system through their interactions with the human leukocyte antigen complex, and neoantigen presence has recently been shown to correlate with anti T-cell immunity and efficacy of checkpoint inhibitor therapy. However computing capabilities to identify neoantigens from genomic sequencing data are a limiting factor for understanding their role. This challenge has grown as cancer datasets become increasingly abundant, making them cumbersome to store and analyze on local servers. Our cloud-based pipeline provides scalable computation capabilities for neoantigen identification while eliminating the need to invest in local infrastructure for data transfer, storage or compute. The pipeline is a Common Workflow Language (CWL) implementation of human leukocyte antigen (HLA) typing using Polysolver or HLAminer combined with custom scripts for mutant peptide identification and NetMHCpan for neoantigen prediction. We have demonstrated the efficacy of these pipelines on Amazon cloud instances through the Seven Bridges Genomics implementation of the NCI Cancer Genomics Cloud, which provides graphical interfaces for running and editing, infrastructure for workflow sharing and version tracking, and access to TCGA data. The CWL implementation is at: https://github.com/TheJacksonLaboratory/CloudNeo. For users who have obtained licenses for all internal software, integrated versions in CWL and on the Seven Bridges Cancer Genomics Cloud platform (https://cgc.sbgenomics.com/, recommended version) can be obtained by contacting the authors. jeff.chuang@jax.org. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
The design of an m-Health monitoring system based on a cloud computing platform
NASA Astrophysics Data System (ADS)
Xu, Boyi; Xu, Lida; Cai, Hongming; Jiang, Lihong; Luo, Yang; Gu, Yizhi
2017-01-01
Compared to traditional medical services provided within hospitals, m-Health monitoring systems (MHMSs) face more challenges in personalised health data processing. To achieve personalised and high-quality health monitoring by means of new technologies, such as mobile network and cloud computing, in this paper, a framework of an m-Health monitoring system based on a cloud computing platform (Cloud-MHMS) is designed to implement pervasive health monitoring. Furthermore, the modules of the framework, which are Cloud Storage and Multiple Tenants Access Control Layer, Healthcare Data Annotation Layer, and Healthcare Data Analysis Layer, are discussed. In the data storage layer, a multiple tenant access method is designed to protect patient privacy. In the data annotation layer, linked open data are adopted to augment health data interoperability semantically. In the data analysis layer, the process mining algorithm and similarity calculating method are implemented to support personalised treatment plan selection. These three modules cooperate to implement the core functions in the process of health monitoring, which are data storage, data processing, and data analysis. Finally, we study the application of our architecture in the monitoring of antimicrobial drug usage to demonstrate the usability of our method in personal healthcare analysis.
Design and deployment of an elastic network test-bed in IHEP data center based on SDN
NASA Astrophysics Data System (ADS)
Zeng, Shan; Qi, Fazhi; Chen, Gang
2017-10-01
High energy physics experiments produce huge amounts of raw data, while because of the sharing characteristics of the network resources, there is no guarantee of the available bandwidth for each experiment which may cause link congestion problems. On the other side, with the development of cloud computing technologies, IHEP have established a cloud platform based on OpenStack which can ensure the flexibility of the computing and storage resources, and more and more computing applications have been deployed on virtual machines established by OpenStack. However, under the traditional network architecture, network capability can’t be required elastically, which becomes the bottleneck of restricting the flexible application of cloud computing. In order to solve the above problems, we propose an elastic cloud data center network architecture based on SDN, and we also design a high performance controller cluster based on OpenDaylight. In the end, we present our current test results.
Now and Next-Generation Sequencing Techniques: Future of Sequence Analysis Using Cloud Computing
Thakur, Radhe Shyam; Bandopadhyay, Rajib; Chaudhary, Bratati; Chatterjee, Sourav
2012-01-01
Advances in the field of sequencing techniques have resulted in the greatly accelerated production of huge sequence datasets. This presents immediate challenges in database maintenance at datacenters. It provides additional computational challenges in data mining and sequence analysis. Together these represent a significant overburden on traditional stand-alone computer resources, and to reach effective conclusions quickly and efficiently, the virtualization of the resources and computation on a pay-as-you-go concept (together termed “cloud computing”) has recently appeared. The collective resources of the datacenter, including both hardware and software, can be available publicly, being then termed a public cloud, the resources being provided in a virtual mode to the clients who pay according to the resources they employ. Examples of public companies providing these resources include Amazon, Google, and Joyent. The computational workload is shifted to the provider, which also implements required hardware and software upgrades over time. A virtual environment is created in the cloud corresponding to the computational and data storage needs of the user via the internet. The task is then performed, the results transmitted to the user, and the environment finally deleted after all tasks are completed. In this discussion, we focus on the basics of cloud computing, and go on to analyze the prerequisites and overall working of clouds. Finally, the applications of cloud computing in biological systems, particularly in comparative genomics, genome informatics, and SNP detection are discussed with reference to traditional workflows. PMID:23248640
Speeding Up Geophysical Research Using Docker Containers Within Multi-Cloud Environment.
NASA Astrophysics Data System (ADS)
Synytsky, R.; Henadiy, S.; Lobzakov, V.; Kolesnikov, L.; Starovoit, Y. O.
2016-12-01
How useful are the geophysical observations in a scope of minimizing losses from natural disasters today? Does it help to decrease number of human victims during tsunami and earthquake? Unfortunately it's still at early stage these days. It's a big goal and achievement to make such observations more useful by improving early warning and prediction systems with the help of cloud computing. Cloud computing technologies have proved the ability to speed up application development in many areas for 10 years already. Cloud unlocks new opportunities for geoscientists by providing access to modern data processing tools and algorithms including real-time high-performance computing, big data processing, artificial intelligence and others. Emerging lightweight cloud technologies, such as Docker containers, are gaining wide traction in IT due to the fact of faster and more efficient deployment of different applications in a cloud environment. It allows to deploy and manage geophysical applications and systems in minutes across multiple clouds and data centers that becomes of utmost importance for the next generation applications. In this session we'll demonstrate how Docker containers technology within multi-cloud can accelerate the development of applications specifically designed for geophysical researches.
The role of dedicated data computing centers in the age of cloud computing
NASA Astrophysics Data System (ADS)
Caramarcu, Costin; Hollowell, Christopher; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr
2017-10-01
Brookhaven National Laboratory (BNL) anticipates significant growth in scientific programs with large computing and data storage needs in the near future and has recently reorganized support for scientific computing to meet these needs. A key component is the enhanced role of the RHIC-ATLAS Computing Facility (RACF) in support of high-throughput and high-performance computing (HTC and HPC) at BNL. This presentation discusses the evolving role of the RACF at BNL, in light of its growing portfolio of responsibilities and its increasing integration with cloud (academic and for-profit) computing activities. We also discuss BNL’s plan to build a new computing center to support the new responsibilities of the RACF and present a summary of the cost benefit analysis done, including the types of computing activities that benefit most from a local data center vs. cloud computing. This analysis is partly based on an updated cost comparison of Amazon EC2 computing services and the RACF, which was originally conducted in 2012.
Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A
2017-11-01
Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.
Protecting genomic data analytics in the cloud: state of the art and opportunities.
Tang, Haixu; Jiang, Xiaoqian; Wang, Xiaofeng; Wang, Shuang; Sofia, Heidi; Fox, Dov; Lauter, Kristin; Malin, Bradley; Telenti, Amalio; Xiong, Li; Ohno-Machado, Lucila
2016-10-13
The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond agreed research scope and being prcoessed in untrused computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, 'anonymization' and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) are needed before they are sufficiently practical for real world environments.
NASA Astrophysics Data System (ADS)
Schnase, J. L.; Duffy, D.; Tamkin, G. S.; Nadeau, D.; Thompson, J. H.; Grieg, C. M.; McInerney, M.; Webster, W. P.
2013-12-01
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produce societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS built on this principle. MERRA/AS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRA/AS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications. In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilies is perhaps the most important link between Cloud Computing and Big Data.
NASA Technical Reports Server (NTRS)
Schnase, John L.; Duffy, Daniel Quinn; Tamkin, Glenn S.; Nadeau, Denis; Thompson, John H.; Grieg, Christina M.; McInerney, Mark A.; Webster, William P.
2014-01-01
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produce societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we it see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRAAS) is an example of cloud-enabled CAaaS built on this principle. MERRAAS enables MapReduce analytics over NASAs Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRAAS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRAAS has been demonstrated in several applications. In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilies is perhaps the most important link between Cloud Computing and Big Data.
WPS mediation: An approach to process geospatial data on different computing backends
NASA Astrophysics Data System (ADS)
Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas
2012-10-01
The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities and various problems emerge when trying to use them in data and computing-intensive domains such as environmental sciences. These problems are usually not or only partially solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) related methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept by mediating different geospatial and Grid software packages, and by proposing an extension of WPS specification through two optional parameters. The applicability of this approach will be demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits, and issues that need to be further investigated to improve performances.
Convective Cloud Towers and Precipitation Initiation, Frequency and Intensity
NASA Astrophysics Data System (ADS)
Vant-hull, B.; Mahani, S. E.; Autones, F.; Rabin, R.; Mecikalski, J. R.; Khanbilvardi, R.
2012-12-01
: Geosynchronous satellite retrieval of precipitation is desirable because it would provide continuous observation throughout most of the globe in regions where radar data is not available. In the current work the distribution of precipitation rates is examined as a function of cloud tower area and cloud top temperature. A thunderstorm tracking algorithm developed at Meteo-France is used to track cumulus towers that are matched up with radar data at 5 minute 1 km resolution. It is found that roughly half of the precipitation occurs in the cloud mass that surrounds the towers, and when a tower is first detected the precipitation is already in progress 50% of the time. The average density of precipitation per area is greater as the towers become smaller and colder, yet the averaged shape of the precipitation intensity distribution is remarkably constant in all convective situations with cloud tops warmer than 220 K. This suggests that on average all convective precipitation events look the same, unaffected by the higher frequency of occurrence per area inside the convective towers. Only once the cloud tops are colder than 220 K does the precipitation intensity distribution become weighted towards higher instantaneous intensities. Radar precipitation shown in shades of green to blue, lightning in orange; black diamonds are coldest points in each tower. Ratio of number of pixels of given precipitation inside versus outside the convective towers, for various average cloud top temperatures. A flat plot indicates the distribution of rainfall inside and outside the towers has the same shape.
NASA Astrophysics Data System (ADS)
Blodgett, D. L.; Booth, N.; Walker, J.; Kunicki, T.
2012-12-01
The U.S. Geological Survey Center for Integrated Data Analytics (CIDA), in holding with the President's Digital Government Strategy and the Department of Interior's IT Transformation initiative, has evolved its data center and application architecture toward the "cloud" paradigm. In this case, "cloud" refers to a goal of developing services that may be distributed to infrastructure anywhere on the Internet. This transition has taken place across the entire data management spectrum from data center location to physical hardware configuration to software design and implementation. In CIDA's case, physical hardware resides in Madison at the Wisconsin Water Science Center, in South Dakota at the Earth Resources Observation and Science Center (EROS), and in the near future at a DOI approved commercial vendor. Tasks normally conducted on desktop-based GIS software with local copies of data in proprietary formats are now done using browser-based interfaces to web processing services drawing on a network of standard data-source web services. Organizations are gaining economies of scale through data center consolidation and the creation of private cloud services as well as taking advantage of the commoditization of data processing services. Leveraging open standards for data and data management take advantage of this commoditization and provide the means to reliably build distributed service based systems. This presentation will use CIDA's experience as an illustration of the benefits and hurdles of moving to the cloud. Replicating, reformatting, and processing large data sets, such as downscaled climate projections, traditionally present a substantial challenge to environmental science researchers who need access to data subsets and derived products. The USGS Geo Data Portal (GDP) project uses cloud concepts to help earth system scientists' access subsets, spatial summaries, and derivatives of commonly needed very large data. The GDP project has developed a reusable architecture and advanced processing services that currently accesses archives hosted at Lawrence Livermore National Lab, Oregon State University, the University Corporation for Atmospheric Research, and the U.S. Geological Survey, among others. Several examples of how the GDP project uses cloud concepts will be highlighted in this presentation: 1) The high bandwidth network connectivity of large data centers reduces the need for data replication and storage local to processing services. 2) Standard data serving web services, like OPeNDAP, Web Coverage Services, and Web Feature Services allow GDP services to remotely access custom subsets of data in a variety of formats, further reducing the need for data replication and reformatting. 3) The GDP services use standard web service APIs to allow browser-based user interfaces to run complex and compute-intensive processes for users from any computer with an Internet connection. The combination of physical infrastructure and application architecture implemented for the Geo Data Portal project offer an operational example of how distributed data and processing on the cloud can be used to aid earth system science.
NASA Technical Reports Server (NTRS)
Smith, William L., Jr.; Minnis, Patrick; Alvarez, Joseph M.; Uttal, Taneil; Intrieri, Janet M.; Ackerman, Thomas P.; Clothiaux, Eugene
1993-01-01
Cloud-top height is a major factor determining the outgoing longwave flux at the top of the atmosphere. The downwelling radiation from the cloud strongly affects the cooling rate within the atmosphere and the longwave radiation incident at the surface. Thus, determination of cloud-base temperature is important for proper calculation of fluxes below the cloud. Cloud-base altitude is also an important factor in aircraft operations. Cloud-top height or temperature can be derived in a straightforward manner using satellite-based infrared data. Cloud-base temperature, however, is not observable from the satellite, but is related to the height, phase, and optical depth of the cloud in addition to other variables. This study uses surface and satellite data taken during the First ISCCP Regional Experiment (FIRE) Phase-2 Intensive Field Observation (IFO) period (13 Nov. - 7 Dec. 1991, to improve techniques for deriving cloud-base height from conventional satellite data.
Modeling Lidar Multiple Scattering
NASA Astrophysics Data System (ADS)
Sato, Kaori; Okamoto, Hajime; Ishimoto, Hiroshi
2016-06-01
A practical model to simulate multiply scattered lidar returns from inhomogeneous cloud layers are developed based on Backward Monte Carlo (BMC) simulations. The estimated time delay of the backscattered intensities returning from different vertical grids by the developed model agreed well with that directly obtained from BMC calculations. The method was applied to the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) satellite data to improve the synergetic retrieval of cloud microphysics with CloudSat radar data at optically thick cloud grids. Preliminary results for retrieving mass fraction of co-existing cloud particles and drizzle size particles within lowlevel clouds are demonstrated.
NASA Astrophysics Data System (ADS)
Wang, Zhe; Wang, Zhenhui; Cao, Xiaozhong; Tao, Fa
2018-01-01
Clouds are currently observed by both ground-based and satellite remote sensing techniques. Each technique has its own strengths and weaknesses depending on the observation method, instrument performance and the methods used for retrieval. It is important to study synergistic cloud measurements to improve the reliability of the observations and to verify the different techniques. The FY-2 geostationary orbiting meteorological satellites continuously observe the sky over China. Their cloud top temperature product can be processed to retrieve the cloud top height (CTH). The ground-based millimeter wavelength cloud radar can acquire information about the vertical structure of clouds-such as the cloud base height (CBH), CTH and the cloud thickness-and can continuously monitor changes in the vertical profiles of clouds. The CTHs were retrieved using both cloud top temperature data from the FY-2 satellites and the cloud radar reflectivity data for the same time period (June 2015 to May 2016) and the resulting datasets were compared in order to evaluate the accuracy of CTH retrievals using FY-2 satellites. The results show that the concordance rate of cloud detection between the two datasets was 78.1%. Higher consistencies were obtained for thicker clouds with larger echo intensity and for more continuous clouds. The average difference in the CTH between the two techniques was 1.46 km. The difference in CTH between low- and mid-level clouds was less than that for high-level clouds. An attenuation threshold of the cloud radar for rainfall was 0.2 mm/min; a rainfall intensity below this threshold had no effect on the CTH. The satellite CTH can be used to compensate for the attenuation error in the cloud radar data.
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter
Loganathan, Shyamala; Mukherjee, Saswati
2015-01-01
Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations for forming their own private cloud. Since the resources are limited in these private clouds maximizing the utilization of resources and giving the guaranteed service for the user are the ultimate goal. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms. PMID:26473166
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter.
Loganathan, Shyamala; Mukherjee, Saswati
2015-01-01
Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations for forming their own private cloud. Since the resources are limited in these private clouds maximizing the utilization of resources and giving the guaranteed service for the user are the ultimate goal. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms.
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
NASA Astrophysics Data System (ADS)
Klimentov, A.; Buncic, P.; De, K.; Jha, S.; Maeno, T.; Mount, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Porter, R. J.; Read, K. F.; Vaniachine, A.; Wells, J. C.; Wenaus, T.
2015-05-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(102) sites, O(105) cores, O(108) jobs per year, O(103) users, and ATLAS data volume is O(1017) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled ‘Next Generation Workload Management and Analysis System for Big Data’ (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
A cloud computing based platform for sleep behavior and chronic diseases collaborative research.
Kuo, Mu-Hsing; Borycki, Elizabeth; Kushniruk, Andre; Huang, Yueh-Min; Hung, Shu-Hui
2014-01-01
The objective of this study is to propose a Cloud Computing based platform for sleep behavior and chronic disease collaborative research. The platform consists of two main components: (1) a sensing bed sheet with textile sensors to automatically record patient's sleep behaviors and vital signs, and (2) a service-oriented cloud computing architecture (SOCCA) that provides a data repository and allows for sharing and analysis of collected data. Also, we describe our systematic approach to implementing the SOCCA. We believe that the new cloud-based platform can provide nurse and other health professional researchers located in differing geographic locations with a cost effective, flexible, secure and privacy-preserved research environment.
Cloud based intelligent system for delivering health care as a service.
Kaur, Pankaj Deep; Chana, Inderveer
2014-01-01
The promising potential of cloud computing and its convergence with technologies such as mobile computing, wireless networks, sensor technologies allows for creation and delivery of newer type of cloud services. In this paper, we advocate the use of cloud computing for the creation and management of cloud based health care services. As a representative case study, we design a Cloud Based Intelligent Health Care Service (CBIHCS) that performs real time monitoring of user health data for diagnosis of chronic illness such as diabetes. Advance body sensor components are utilized to gather user specific health data and store in cloud based storage repositories for subsequent analysis and classification. In addition, infrastructure level mechanisms are proposed to provide dynamic resource elasticity for CBIHCS. Experimental results demonstrate that classification accuracy of 92.59% is achieved with our prototype system and the predicted patterns of CPU usage offer better opportunities for adaptive resource elasticity. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Real-time WAMI streaming target tracking in fog
NASA Astrophysics Data System (ADS)
Chen, Yu; Blasch, Erik; Chen, Ning; Deng, Anna; Ling, Haibin; Chen, Genshe
2016-05-01
Real-time information fusion based on WAMI (Wide-Area Motion Imagery), FMV (Full Motion Video), and Text data is highly desired for many mission critical emergency or security applications. Cloud Computing has been considered promising to achieve big data integration from multi-modal sources. In many mission critical tasks, however, powerful Cloud technology cannot satisfy the tight latency tolerance as the servers are allocated far from the sensing platform, actually there is no guaranteed connection in the emergency situations. Therefore, data processing, information fusion, and decision making are required to be executed on-site (i.e., near the data collection). Fog Computing, a recently proposed extension and complement for Cloud Computing, enables computing on-site without outsourcing jobs to a remote Cloud. In this work, we have investigated the feasibility of processing streaming WAMI in the Fog for real-time, online, uninterrupted target tracking. Using a single target tracking algorithm, we studied the performance of a Fog Computing prototype. The experimental results are very encouraging that validated the effectiveness of our Fog approach to achieve real-time frame rates.
77 FR 65177 - Swap Data Repositories: Interpretative Statement Regarding the Confidentiality and...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-25
... participation in standard-setting bodies to develop international standards relevant to the swap markets. Cloud Strategix, LLC (``Cloud Strategix''), representing the data hosting and cloud computing industry, in... Roundtable, June 6, 2012; (iii) Cloud Strategix, LLC, June 5, 2012; and (iv) the Depository Trust & Clearing...
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-07
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations
NASA Technical Reports Server (NTRS)
Lynnes, Chris; Little, Mike; Huang, Thomas; Jacob, Joseph; Yang, Phil; Kuo, Kwo-Sen
2016-01-01
Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based file systems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations
NASA Astrophysics Data System (ADS)
Lynnes, C.; Little, M. M.; Huang, T.; Jacob, J. C.; Yang, C. P.; Kuo, K. S.
2016-12-01
Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based filesystems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy
NASA Astrophysics Data System (ADS)
Popic, Victoria; Batzoglou, Serafim
2017-05-01
Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party.
A hybrid cloud read aligner based on MinHash and kmer voting that preserves privacy
Popic, Victoria; Batzoglou, Serafim
2017-01-01
Low-cost clouds can alleviate the compute and storage burden of the genome sequencing data explosion. However, moving personal genome data analysis to the cloud can raise serious privacy concerns. Here, we devise a method named Balaur, a privacy preserving read mapper for hybrid clouds based on locality sensitive hashing and kmer voting. Balaur can securely outsource a substantial fraction of the computation to the public cloud, while being highly competitive in accuracy and speed with non-private state-of-the-art read aligners on short read data. We also show that the method is significantly faster than the state of the art in long read mapping. Therefore, Balaur can enable institutions handling massive genomic data sets to shift part of their analysis to the cloud without sacrificing accuracy or exposing sensitive information to an untrusted third party. PMID:28508884
An Automatic Procedure for Combining Digital Images and Laser Scanner Data
NASA Astrophysics Data System (ADS)
Moussa, W.; Abdel-Wahab, M.; Fritsch, D.
2012-07-01
Besides improving both the geometry and the visual quality of the model, the integration of close-range photogrammetry and terrestrial laser scanning techniques directs at filling gaps in laser scanner point clouds to avoid modeling errors, reconstructing more details in higher resolution and recovering simple structures with less geometric details. Thus, within this paper a flexible approach for the automatic combination of digital images and laser scanner data is presented. Our approach comprises two methods for data fusion. The first method starts by a marker-free registration of digital images based on a point-based environment model (PEM) of a scene which stores the 3D laser scanner point clouds associated with intensity and RGB values. The PEM allows the extraction of accurate control information for the direct computation of absolute camera orientations with redundant information by means of accurate space resection methods. In order to use the computed relations between the digital images and the laser scanner data, an extended Helmert (seven-parameter) transformation is introduced and its parameters are estimated. Precedent to that, in the second method, the local relative orientation parameters of the camera images are calculated by means of an optimized Structure and Motion (SaM) reconstruction method. Then, using the determined transformation parameters results in having absolute oriented images in relation to the laser scanner data. With the resulting absolute orientations we have employed robust dense image reconstruction algorithms to create oriented dense image point clouds, which are automatically combined with the laser scanner data to form a complete detailed representation of a scene. Examples of different data sets are shown and experimental results demonstrate the effectiveness of the presented procedures.
Snore related signals processing in a private cloud computing system.
Qian, Kun; Guo, Jian; Xu, Huijie; Zhu, Zhaomeng; Zhang, Gongxuan
2014-09-01
Snore related signals (SRS) have been demonstrated to carry important information about the obstruction site and degree in the upper airway of Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) patients in recent years. To make this acoustic signal analysis method more accurate and robust, big SRS data processing is inevitable. As an emerging concept and technology, cloud computing has motivated numerous researchers and engineers to exploit applications both in academic and industry field, which could have an ability to implement a huge blue print in biomedical engineering. Considering the security and transferring requirement of biomedical data, we designed a system based on private cloud computing to process SRS. Then we set the comparable experiments of processing a 5-hour audio recording of an OSAHS patient by a personal computer, a server and a private cloud computing system to demonstrate the efficiency of the infrastructure we proposed.
Prototyping manufacturing in the cloud
NASA Astrophysics Data System (ADS)
Ciortea, E. M.
2017-08-01
This paper attempts a theoretical approach to cloud systems with impacts on production systems. I call systems as cloud computing because form a relatively new concept in the field of informatics, representing an overall distributed computing services, applications, access to information and data storage without the user to know the physical location and configuration of systems. The advantages of this approach are especially computing speed and storage capacity without investment in additional configurations, synchronizing user data, data processing using web applications. The disadvantage is that it wants to identify a solution for data security, leading to mistrust users. The case study is applied to a module of the system of production, because the system is complex.
Translational Biomedical Informatics in the Cloud: Present and Future
Chen, Jiajia; Qian, Fuliang; Yan, Wenying; Shen, Bairong
2013-01-01
Next generation sequencing and other high-throughput experimental techniques of recent decades have driven the exponential growth in publicly available molecular and clinical data. This information explosion has prepared the ground for the development of translational bioinformatics. The scale and dimensionality of data, however, pose obvious challenges in data mining, storage, and integration. In this paper we demonstrated the utility and promise of cloud computing for tackling the big data problems. We also outline our vision that cloud computing could be an enabling tool to facilitate translational bioinformatics research. PMID:23586054
Statistical analysis and use of the VAS radiance data
NASA Technical Reports Server (NTRS)
Fuelberg, H. E.
1984-01-01
Special radiosonde soundings at 75 km spacings and 3 hour intervals provided an opportunity to learn more about mesoscale data and storm-environment interactions. Relatively small areas of intense convection produce major changes in surrounding fields of thermodynamic, kinematic, and energy variables. The Red River Valley tornado outbreak was studied. Satellite imagery and surface data were used to specify cloud information needed in the radiative heating/cooling calculations. A feasibility study for computing boundary layer winds from satellite-derived thermal data was completed. Winds obtained from TIROS-N retrievals compared very favorably with corresponding values from concurrent rawisonde thermal data, and both sets of thermally-derived winds showed good agreements with observed values.
Relating Convective and Stratiform Rain to Latent Heating
NASA Technical Reports Server (NTRS)
Tao, Wei-Kuo; Lang, Stephen; Zeng, Xiping; Shige, Shoichi; Takayabu, Yukari
2010-01-01
The relationship among surface rainfall, its intensity, and its associated stratiform amount is established by examining observed precipitation data from the Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR). The results show that for moderate-high stratiform fractions, rain probabilities are strongly skewed toward light rain intensities. For convective-type rain, the peak probability of occurrence shifts to higher intensities but is still significantly skewed toward weaker rain rates. The main differences between the distributions for oceanic and continental rain are for heavily convective rain. The peak occurrence, as well as the tail of the distribution containing the extreme events, is shifted to higher intensities for continental rain. For rainy areas sampled at 0.58 horizontal resolution, the occurrence of conditional rain rates over 100 mm/day is significantly higher over land. Distributions of rain intensity versus stratiform fraction for simulated precipitation data obtained from cloud-resolving model (CRM) simulations are quite similar to those from the satellite, providing a basis for mapping simulated cloud quantities to the satellite observations. An improved convective-stratiform heating (CSH) algorithm is developed based on two sources of information: gridded rainfall quantities (i.e., the conditional intensity and the stratiform fraction) observed from the TRMM PR and synthetic cloud process data (i.e., latent heating, eddy heat flux convergence, and radiative heating/cooling) obtained from CRM simulations of convective cloud systems. The new CSH algorithm-derived heating has a noticeably different heating structure over both ocean and land regions compared to the previous CSH algorithm. Major differences between the new and old algorithms include a significant increase in the amount of low- and midlevel heating, a downward emphasis in the level of maximum cloud heating by about 1 km, and a larger variance between land and ocean in the new CSH algorithm.
Symmetrical compression distance for arrhythmia discrimination in cloud-based big-data services.
Lillo-Castellano, J M; Mora-Jiménez, I; Santiago-Mozos, R; Chavarría-Asso, F; Cano-González, A; García-Alberola, A; Rojo-Álvarez, J L
2015-07-01
The current development of cloud computing is completely changing the paradigm of data knowledge extraction in huge databases. An example of this technology in the cardiac arrhythmia field is the SCOOP platform, a national-level scientific cloud-based big data service for implantable cardioverter defibrillators. In this scenario, we here propose a new methodology for automatic classification of intracardiac electrograms (EGMs) in a cloud computing system, designed for minimal signal preprocessing. A new compression-based similarity measure (CSM) is created for low computational burden, so-called weighted fast compression distance, which provides better performance when compared with other CSMs in the literature. Using simple machine learning techniques, a set of 6848 EGMs extracted from SCOOP platform were classified into seven cardiac arrhythmia classes and one noise class, reaching near to 90% accuracy when previous patient arrhythmia information was available and 63% otherwise, hence overcoming in all cases the classification provided by the majority class. Results show that this methodology can be used as a high-quality service of cloud computing, providing support to physicians for improving the knowledge on patient diagnosis.
Parameterization of cloud lidar backscattering profiles by means of asymmetrical Gaussians
NASA Astrophysics Data System (ADS)
del Guasta, Massimo; Morandi, Marco; Stefanutti, Leopoldo
1995-06-01
A fitting procedure for cloud lidar data processing is shown that is based on the computation of the first three moments of the vertical-backscattering (or -extinction) profile. Single-peak clouds or single cloud layers are approximated to asymmetrical Gaussians. The algorithm is particularly stable with respect to noise and processing errors, and it is much faster than the equivalent least-squares approach. Multilayer clouds can easily be treated as a sum of single asymmetrical Gaussian peaks. The method is suitable for cloud-shape parametrization in noisy lidar signatures (like those expected from satellite lidars). It also permits an improvement of cloud radiative-property computations that are based on huge lidar data sets for which storage and careful examination of single lidar profiles can't be carried out.
Further developments in cloud statistics for computer simulations
NASA Technical Reports Server (NTRS)
Chang, D. T.; Willand, J. H.
1972-01-01
This study is a part of NASA's continued program to provide global statistics of cloud parameters for computer simulation. The primary emphasis was on the development of the data bank of the global statistical distributions of cloud types and cloud layers and their applications in the simulation of the vertical distributions of in-cloud parameters such as liquid water content. These statistics were compiled from actual surface observations as recorded in Standard WBAN forms. Data for a total of 19 stations were obtained and reduced. These stations were selected to be representative of the 19 primary cloud climatological regions defined in previous studies of cloud statistics. Using the data compiled in this study, a limited study was conducted of the hemogeneity of cloud regions, the latitudinal dependence of cloud-type distributions, the dependence of these statistics on sample size, and other factors in the statistics which are of significance to the problem of simulation. The application of the statistics in cloud simulation was investigated. In particular, the inclusion of the new statistics in an expanded multi-step Monte Carlo simulation scheme is suggested and briefly outlined.
NASA Technical Reports Server (NTRS)
Bencic, Timothy J.; Fagan, Amy; Van Zante, Judith F.; Kirkegaard, Jonathan P.; Rohler, David P.; Maniyedath, Arjun; Izen, Steven H.
2013-01-01
A light extinction tomography technique has been developed to monitor ice water clouds upstream of a direct connected engine in the Propulsion Systems Laboratory (PSL) at NASA Glenn Research Center (GRC). The system consists of 60 laser diodes with sheet generating optics and 120 detectors mounted around a 36-inch diameter ring. The sources are pulsed sequentially while the detectors acquire line-of-sight extinction data for each laser pulse. Using computed tomography algorithms, the extinction data are analyzed to produce a plot of the relative water content in the measurement plane. To target the low-spatial-frequency nature of ice water clouds, unique tomography algorithms were developed using filtered back-projection methods and direct inversion methods that use Gaussian basis functions. With the availability of a priori knowledge of the mean droplet size and the total water content at some point in the measurement plane, the tomography system can provide near real-time in-situ quantitative full-field total water content data at a measurement plane approximately 5 feet upstream of the engine inlet. Results from ice crystal clouds in the PSL are presented. In addition to the optical tomography technique, laser sheet imaging has also been applied in the PSL to provide planar ice cloud uniformity and relative water content data during facility calibration before the tomography system was available and also as validation data for the tomography system. A comparison between the laser sheet system and light extinction tomography resulting data are also presented. Very good agreement of imaged intensity and water content is demonstrated for both techniques. Also, comparative studies between the two techniques show excellent agreement in calculation of bulk total water content averaged over the center of the pipe.
NASA Astrophysics Data System (ADS)
Tsuboki, Kazuhisa
2017-04-01
Typhoons are the most devastating weather system occurring in the western North Pacific and the South China Sea. Violent wind and heavy rainfall associated with a typhoon cause huge disaster in East Asia including Japan. In 2013, Supertyphoon Haiyan struck the Philippines caused a very high storm surge and more than 7000 people were killed. In 2015, two typhoons approached the main islands of Japan and severe flood occurred in the northern Kanto region. Typhoons are still the largest cause of natural disaster in East Asia. Moreover, many researches have projected increase of typhoon intensity with the climate change. This suggests that a typhoon risk is increasing in East Asia. However, the historical data of typhoon include large uncertainty. In particular, intensity data of the most intense typhoon category have larger error after the US aircraft reconnaissance of typhoon was terminated in 1987.The main objective of the present study is improvements of typhoon intensity estimations and of forecasts of intensity and track. We will perform aircraft observation of typhoon and the observed data are assimilated to numerical models to improve intensity estimation. Using radars and balloons, observations of thermodynamical and cloud-microphysical processes of typhoons will be also performed to improve physical processes of numerical model. In typhoon seasons (mostly in August and September), we will perform aircraft observations of typhoons. Using dropsondes from the aircraft, temperature, humidity, pressure, and wind are measured in surroundings of the typhoon inner core region. The dropsonde data are assimilated to a cloud-resolving model which has been developed in Nagoya University and named the Cloud Resolving Storm Simulator (CReSS). Then, more accurate estimations and forecasts of the typhoon intensity will be made as well as typhoon tracks. Furthermore, we will utilize a ground-based balloon with microscope camera, X-band precipitation radar, Ka-band cloud radar, aerosol sonde, and a drone to observe typhoon-associated clouds and precipitation. After a test flight in March 2017, typhoon observations will be made for next 4 years; 2017-2020. The main target area of observation is the south of Okinawa where a typhoon reaches the maximum intensity and often changes its moving direction. This research will advance aircraft observation technique of typhoon in Japan. The aircraft observation will be a breakthrough to improve typhoon intensity estimations. Assimilation of the aircraft observation data to the cloud-resolving model will improve intensity estimations and forecasts of typhoons. This is the first step for the future advanced aircraft observation and will contribute to prevention or reduction of typhoon disasters.
NASA Astrophysics Data System (ADS)
Morikawa, Y.; Murata, K. T.; Watari, S.; Kato, H.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Shimojo, S.
2010-12-01
Main methodologies of Solar-Terrestrial Physics (STP) so far are theoretical, experimental and observational, and computer simulation approaches. Recently "informatics" is expected as a new (fourth) approach to the STP studies. Informatics is a methodology to analyze large-scale data (observation data and computer simulation data) to obtain new findings using a variety of data processing techniques. At NICT (National Institute of Information and Communications Technology, Japan) we are now developing a new research environment named "OneSpaceNet". The OneSpaceNet is a cloud-computing environment specialized for science works, which connects many researchers with high-speed network (JGN: Japan Gigabit Network). The JGN is a wide-area back-born network operated by NICT; it provides 10G network and many access points (AP) over Japan. The OneSpaceNet also provides with rich computer resources for research studies, such as super-computers, large-scale data storage area, licensed applications, visualization devices (like tiled display wall: TDW), database/DBMS, cluster computers (4-8 nodes) for data processing and communication devices. What is amazing in use of the science cloud is that a user simply prepares a terminal (low-cost PC). Once connecting the PC to JGN2plus, the user can make full use of the rich resources of the science cloud. Using communication devices, such as video-conference system, streaming and reflector servers, and media-players, the users on the OneSpaceNet can make research communications as if they belong to a same (one) laboratory: they are members of a virtual laboratory. The specification of the computer resources on the OneSpaceNet is as follows: The size of data storage we have developed so far is almost 1PB. The number of the data files managed on the cloud storage is getting larger and now more than 40,000,000. What is notable is that the disks forming the large-scale storage are distributed to 5 data centers over Japan (but the storage system performs as one disk). There are three supercomputers allocated on the cloud, one from Tokyo, one from Osaka and the other from Nagoya. One's simulation job data on any supercomputers are saved on the cloud data storage (same directory); it is a kind of virtual computing environment. The tiled display wall has 36 panels acting as one display; the pixel (resolution) size of it is as large as 18000x4300. This size is enough to preview or analyze the large-scale computer simulation data. It also allows us to take a look of multiple (e.g., 100 pictures) on one screen together with many researchers. In our talk we also present a brief report of the initial results using the OneSpaceNet for Global MHD simulations as an example of successful use of our science cloud; (i) Ultra-high time resolution visualization of Global MHD simulations on the large-scale storage and parallel processing system on the cloud, (ii) Database of real-time Global MHD simulation and statistic analyses of the data, and (iii) 3D Web service of Global MHD simulations.
1984-07-01
aerosols and sub pixel-sized clouds all tend to increase Channel 1 with respect to Channel 2 and reduce the computed VIN. Further, the Guide states that... computation of the VIN. Large scale cloud contamination of pixels, while diffi- cult to correct for, can at least be monitored and affected pixels...techniques have been developed for computer cloud screening. See, for example, Horvath et al. (1982), Gray and McCrary (1981a) and Nixon et al. (1983
Searching for SNPs with cloud computing
2009-01-01
As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/. PMID:19930550
Cloud computing in medical imaging.
Kagadis, George C; Kloukinas, Christos; Moore, Kevin; Philbin, Jim; Papadimitroulas, Panagiotis; Alexakos, Christos; Nagy, Paul G; Visvikis, Dimitris; Hendee, William R
2013-07-01
Over the past century technology has played a decisive role in defining, driving, and reinventing procedures, devices, and pharmaceuticals in healthcare. Cloud computing has been introduced only recently but is already one of the major topics of discussion in research and clinical settings. The provision of extensive, easily accessible, and reconfigurable resources such as virtual systems, platforms, and applications with low service cost has caught the attention of many researchers and clinicians. Healthcare researchers are moving their efforts to the cloud, because they need adequate resources to process, store, exchange, and use large quantities of medical data. This Vision 20/20 paper addresses major questions related to the applicability of advanced cloud computing in medical imaging. The paper also considers security and ethical issues that accompany cloud computing.
Extended outlook: description, utilization, and daily applications of cloud technology in radiology.
Gerard, Perry; Kapadia, Neil; Chang, Patricia T; Acharya, Jay; Seiler, Michael; Lefkovitz, Zvi
2013-12-01
The purpose of this article is to discuss the concept of cloud technology, its role in medical applications and radiology, the role of the radiologist in using and accessing these vast resources of information, and privacy concerns and HIPAA compliance strategies. Cloud computing is the delivery of shared resources, software, and information to computers and other devices as a metered service. This technology has a promising role in the sharing of patient medical information and appears to be particularly suited for application in radiology, given the field's inherent need for storage and access to large amounts of data. The radiology cloud has significant strengths, such as providing centralized storage and access, reducing unnecessary repeat radiologic studies, and potentially allowing radiologic second opinions more easily. There are significant cost advantages to cloud computing because of a decreased need for infrastructure and equipment by the institution. Private clouds may be used to ensure secure storage of data and compliance with HIPAA. In choosing a cloud service, there are important aspects, such as disaster recovery plans, uptime, and security audits, that must be considered. Given that the field of radiology has become almost exclusively digital in recent years, the future of secure storage and easy access to imaging studies lies within cloud computing technology.
Application-oriented offloading in heterogeneous networks for mobile cloud computing
NASA Astrophysics Data System (ADS)
Tseng, Fan-Hsun; Cho, Hsin-Hung; Chang, Kai-Di; Li, Jheng-Cong; Shih, Timothy K.
2018-04-01
Nowadays Internet applications have become more complicated that mobile device needs more computing resources for shorter execution time but it is restricted to limited battery capacity. Mobile cloud computing (MCC) is emerged to tackle the finite resource problem of mobile device. MCC offloads the tasks and jobs of mobile devices to cloud and fog environments by using offloading scheme. It is vital to MCC that which task should be offloaded and how to offload efficiently. In the paper, we formulate the offloading problem between mobile device and cloud data center and propose two algorithms based on application-oriented for minimum execution time, i.e. the Minimum Offloading Time for Mobile device (MOTM) algorithm and the Minimum Execution Time for Cloud data center (METC) algorithm. The MOTM algorithm minimizes offloading time by selecting appropriate offloading links based on application categories. The METC algorithm minimizes execution time in cloud data center by selecting virtual and physical machines with corresponding resource requirements of applications. Simulation results show that the proposed mechanism not only minimizes total execution time for mobile devices but also decreases their energy consumption.
Galaxy CloudMan: delivering cloud compute clusters.
Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James
2010-12-21
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
75 FR 80042 - Information Privacy and Innovation in the Internet Economy
Federal Register 2010, 2011, 2012, 2013, 2014
2010-12-21
... statistics that provide evidence of concern--or comments explaining why concerns are unwarranted--about cloud computing data privacy and security in the commercial context. We also seek data that links any such concerns to decisions to adopt, or refrain from adopting, cloud computing services. (41) The Task Force...
Integration of end-user Cloud storage for CMS analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Riahi, Hassen; Aimar, Alberto; Ayllon, Alejandro Alvarez
End-user Cloud storage is increasing rapidly in popularity in research communities thanks to the collaboration capabilities it offers, namely synchronisation and sharing. CERN IT has implemented a model of such storage named, CERNBox, integrated with the CERN AuthN and AuthZ services. To exploit the use of the end-user Cloud storage for the distributed data analysis activity, the CMS experiment has started the integration of CERNBox as a Grid resource. This will allow CMS users to make use of their own storage in the Cloud for their analysis activities as well as to benefit from synchronisation and sharing capabilities to achievemore » results faster and more effectively. It will provide an integration model of Cloud storages in the Grid, which is implemented and commissioned over the world’s largest computing Grid infrastructure, Worldwide LHC Computing Grid (WLCG). In this paper, we present the integration strategy and infrastructure changes needed in order to transparently integrate end-user Cloud storage with the CMS distributed computing model. We describe the new challenges faced in data management between Grid and Cloud and how they were addressed, along with details of the support for Cloud storage recently introduced into the WLCG data movement middleware, FTS3. Finally, the commissioning experience of CERNBox for the distributed data analysis activity is also presented.« less
Integration of end-user Cloud storage for CMS analysis
Riahi, Hassen; Aimar, Alberto; Ayllon, Alejandro Alvarez; ...
2017-05-19
End-user Cloud storage is increasing rapidly in popularity in research communities thanks to the collaboration capabilities it offers, namely synchronisation and sharing. CERN IT has implemented a model of such storage named, CERNBox, integrated with the CERN AuthN and AuthZ services. To exploit the use of the end-user Cloud storage for the distributed data analysis activity, the CMS experiment has started the integration of CERNBox as a Grid resource. This will allow CMS users to make use of their own storage in the Cloud for their analysis activities as well as to benefit from synchronisation and sharing capabilities to achievemore » results faster and more effectively. It will provide an integration model of Cloud storages in the Grid, which is implemented and commissioned over the world’s largest computing Grid infrastructure, Worldwide LHC Computing Grid (WLCG). In this paper, we present the integration strategy and infrastructure changes needed in order to transparently integrate end-user Cloud storage with the CMS distributed computing model. We describe the new challenges faced in data management between Grid and Cloud and how they were addressed, along with details of the support for Cloud storage recently introduced into the WLCG data movement middleware, FTS3. Finally, the commissioning experience of CERNBox for the distributed data analysis activity is also presented.« less
Interoperating Cloud-based Virtual Farms
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Colamaria, F.; Colella, D.; Casula, E.; Elia, D.; Franco, A.; Lusso, S.; Luparello, G.; Masera, M.; Miniello, G.; Mura, D.; Piano, S.; Vallero, S.; Venaruzzo, M.; Vino, G.
2015-12-01
The present work aims at optimizing the use of computing resources available at the grid Italian Tier-2 sites of the ALICE experiment at CERN LHC by making them accessible to interactive distributed analysis, thanks to modern solutions based on cloud computing. The scalability and elasticity of the computing resources via dynamic (“on-demand”) provisioning is essentially limited by the size of the computing site, reaching the theoretical optimum only in the asymptotic case of infinite resources. The main challenge of the project is to overcome this limitation by federating different sites through a distributed cloud facility. Storage capacities of the participating sites are seen as a single federated storage area, preventing the need of mirroring data across them: high data access efficiency is guaranteed by location-aware analysis software and storage interfaces, in a transparent way from an end-user perspective. Moreover, the interactive analysis on the federated cloud reduces the execution time with respect to grid batch jobs. The tests of the investigated solutions for both cloud computing and distributed storage on wide area network will be presented.
Security and Cloud Outsourcing Framework for Economic Dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consist of inhouse high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible for potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for themore » Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program, that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperforms the in-house infrastructure.« less
Security and Cloud Outsourcing Framework for Economic Dispatch
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi; ...
2017-04-24
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consist of inhouse high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible for potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for themore » Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program, that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperforms the in-house infrastructure.« less
NASA Technical Reports Server (NTRS)
Farrugia, C. J.; Burlaga, L. F.; Osherovich, V. A.; Richardson, I. G.; Freeman, M. P.; Lepping, R. P.; Lazarus, A. J.
1993-01-01
High time resolution interplanetary magnetic field and plasma measurements of an interplanetary magnetic cloud and its interaction with the earth's magnetosphere on January 14/15, 1988 are interpreted and discussed. It is argued that the data are consistent with the theoretical model of magnetic clouds as flux ropes of local straight cylindrical geometry. The data also suggest that this cloud is aligned with its axis in the ecliptic plane and pointing in the east-west direction. Evidence consisting of the intensity and directional distribution of energetic particle in the magnetic cloud argues in favor of the connectedness of the magnetic field lines to the sun's surface. The intensities of about 0.5 MeV ions is rapidly enhanced and the particles stream in a collimated beam along the magnetic field preferentially from the west of the sun. The particles travel form a flare site along the cloud magnetic field lines, which are thus presumably still attached to the sun.
Lai, Chin-Feng; Chen, Min; Pan, Jeng-Shyang; Youn, Chan-Hyun; Chao, Han-Chieh
2014-03-01
As cloud computing and wireless body sensor network technologies become gradually developed, ubiquitous healthcare services prevent accidents instantly and effectively, as well as provides relevant information to reduce related processing time and cost. This study proposes a co-processing intermediary framework integrated cloud and wireless body sensor networks, which is mainly applied to fall detection and 3-D motion reconstruction. In this study, the main focuses includes distributed computing and resource allocation of processing sensing data over the computing architecture, network conditions and performance evaluation. Through this framework, the transmissions and computing time of sensing data are reduced to enhance overall performance for the services of fall events detection and 3-D motion reconstruction.
A European Federated Cloud: Innovative distributed computing solutions by EGI
NASA Astrophysics Data System (ADS)
Sipos, Gergely; Turilli, Matteo; Newhouse, Steven; Kacsuk, Peter
2013-04-01
The European Grid Infrastructure (EGI) is the result of pioneering work that has, over the last decade, built a collaborative production infrastructure of uniform services through the federation of national resource providers that supports multi-disciplinary science across Europe and around the world. This presentation will provide an overview of the recently established 'federated cloud computing services' that the National Grid Initiatives (NGIs), operators of EGI, offer to scientific communities. The presentation will explain the technical capabilities of the 'EGI Federated Cloud' and the processes whereby earth and space science researchers can engage with it. EGI's resource centres have been providing services for collaborative, compute- and data-intensive applications for over a decade. Besides the well-established 'grid services', several NGIs already offer privately run cloud services to their national researchers. Many of these researchers recently expressed the need to share these cloud capabilities within their international research collaborations - a model similar to the way the grid emerged through the federation of institutional batch computing and file storage servers. To facilitate the setup of a pan-European cloud service from the NGIs' resources, the EGI-InSPIRE project established a Federated Cloud Task Force in September 2011. The Task Force has a mandate to identify and test technologies for a multinational federated cloud that could be provisioned within EGI by the NGIs. A guiding principle for the EGI Federated Cloud is to remain technology neutral and flexible for both resource providers and users: • Resource providers are allowed to use any cloud hypervisor and management technology to join virtualised resources into the EGI Federated Cloud as long as the site is subscribed to the user-facing interfaces selected by the EGI community. • Users can integrate high level services - such as brokers, portals and customised Virtual Research Environments - with the EGI Federated Cloud as long as these services access cloud resources through the user-facing interfaces selected by the EGI community. The Task Force will be closed in May 2013. It already • Identified key enabling technologies by which a multinational, federated 'Infrastructure as a Service' (IaaS) type cloud can be built from the NGIs' resources; • Deployed a test bed to evaluate the integration of virtualised resources within EGI and to engage with early adopter use cases from different scientific domains; • Integrated cloud resources into the EGI production infrastructure through cloud specific bindings of the EGI information system, monitoring system, authentication system, etc.; • Collected and catalogued requirements concerning the federated cloud services from the feedback of early adopter use cases; • Provided feedback and requirements to relevant technology providers on their implementations and worked with these providers to address those requirements; • Identified issues that need to be addressed by other areas of EGI (such as portal solutions, resource allocation policies, marketing and user support) to reach a production system. The Task Force will publish a blueprint in April 2013. The blueprint will drive the establishment of a production level EGI Federated Cloud service after May 2013.
Feeney, James M; Montgomery, Stephanie C; Wolf, Laura; Jayaraman, Vijay; Twohig, Michael
2016-09-01
Among transferred trauma patients, challenges with the transfer of radiographic studies include problems loading or viewing the studies at the receiving hospitals, and problems manipulating, reconstructing, or evalu- ating the transferred images. Cloud-based image transfer systems may address some ofthese problems. We reviewed the charts of patients trans- ferred during one year surrounding the adoption of a cloud computing data transfer system. We compared the rates of repeat imaging before (precloud) and af- ter (postcloud) the adoption of the cloud-based data transfer system. During the precloud period, 28 out of 100 patients required 90 repeat studies. With the cloud computing transfer system in place, three out of 134 patients required seven repeat films. There was a statistically significant decrease in the proportion of patients requiring repeat films (28% to 2.2%, P < .0001). Based on an annualized volume of 200 trauma patient transfers, the cost savings estimated using three methods of cost analysis, is between $30,272 and $192,453.
Legal issues in clouds: towards a risk inventory.
Djemame, Karim; Barnitzke, Benno; Corrales, Marcelo; Kiran, Mariam; Jiang, Ming; Armstrong, Django; Forgó, Nikolaus; Nwankwo, Iheanyi
2013-01-28
Cloud computing technologies have reached a high level of development, yet a number of obstacles still exist that must be overcome before widespread commercial adoption can become a reality. In a cloud environment, end users requesting services and cloud providers negotiate service-level agreements (SLAs) that provide explicit statements of all expectations and obligations of the participants. If cloud computing is to experience widespread commercial adoption, then incorporating risk assessment techniques is essential during SLA negotiation and service operation. This article focuses on the legal issues surrounding risk assessment in cloud computing. Specifically, it analyses risk regarding data protection and security, and presents the requirements of an inherent risk inventory. The usefulness of such a risk inventory is described in the context of the OPTIMIS project.
Use of cloud computing technology in natural hazard assessment and emergency management
NASA Astrophysics Data System (ADS)
Webley, P. W.; Dehn, J.
2015-12-01
During a natural hazard event, the most up-to-date data needs to be in the hands of those on the front line. Decision support system tools can be developed to provide access to pre-made outputs to quickly assess the hazard and potential risk. However, with the ever growing availability of new satellite data as well as ground and airborne data generated in real-time there is a need to analyze the large volumes of data in an easy-to-access and effective environment. With the growth in the use of cloud computing, where the analysis and visualization system can grow with the needs of the user, then these facilities can used to provide this real-time analysis. Think of a central command center uploading the data to the cloud compute system and then those researchers in-the-field connecting to a web-based tool to view the newly acquired data. New data can be added by any user and then viewed instantly by anyone else in the organization through the cloud computing interface. This provides the ideal tool for collaborative data analysis, hazard assessment and decision making. We present the rationale for developing a cloud computing systems and illustrate how this tool can be developed for use in real-time environments. Users would have access to an interactive online image analysis tool without the need for specific remote sensing software on their local system therefore increasing their understanding of the ongoing hazard and mitigate its impact on the surrounding region.
The JASMIN Cloud: specialised and hybrid to meet the needs of the Environmental Sciences Community
NASA Astrophysics Data System (ADS)
Kershaw, Philip; Lawrence, Bryan; Churchill, Jonathan; Pritchard, Matt
2014-05-01
Cloud computing provides enormous opportunities for the research community. The large public cloud providers provide near-limitless scaling capability. However, adapting Cloud to scientific workloads is not without its problems. The commodity nature of the public cloud infrastructure can be at odds with the specialist requirements of the research community. Issues such as trust, ownership of data, WAN bandwidth and costing models make additional barriers to more widespread adoption. Alongside the application of public cloud for scientific applications, a number of private cloud initiatives are underway in the research community of which the JASMIN Cloud is one example. Here, cloud service models are being effectively super-imposed over more established services such as data centres, compute cluster facilities and Grids. These have the potential to deliver the specialist infrastructure needed for the science community coupled with the benefits of a Cloud service model. The JASMIN facility based at the Rutherford Appleton Laboratory was established in 2012 to support the data analysis requirements of the climate and Earth Observation community. In its first year of operation, the 5PB of available storage capacity was filled and the hosted compute capability used extensively. JASMIN has modelled the concept of a centralised large-volume data analysis facility. Key characteristics have enabled success: peta-scale fast disk connected via low latency networks to compute resources and the use of virtualisation for effective management of the resources for a range of users. A second phase is now underway funded through NERC's (Natural Environment Research Council) Big Data initiative. This will see significant expansion to the resources available with a doubling of disk-based storage to 12PB and an increase of compute capacity by a factor of ten to over 3000 processing cores. This expansion is accompanied by a broadening in the scope for JASMIN, as a service available to the entire UK environmental science community. Experience with the first phase demonstrated the range of user needs. A trade-off is needed between access privileges to resources, flexibility of use and security. This has influenced the form and types of service under development for the new phase. JASMIN will deploy a specialised private cloud organised into "Managed" and "Unmanaged" components. In the Managed Cloud, users have direct access to the storage and compute resources for optimal performance but for reasons of security, via a more restrictive PaaS (Platform-as-a-Service) interface. The Unmanaged Cloud is deployed in an isolated part of the network but co-located with the rest of the infrastructure. This enables greater liberty to tenants - full IaaS (Infrastructure-as-a-Service) capability to provision customised infrastructure - whilst at the same time protecting more sensitive parts of the system from direct access using these elevated privileges. The private cloud will be augmented with cloud-bursting capability so that it can exploit the resources available from public clouds, making it effectively a hybrid solution. A single interface will overlay the functionality of both the private cloud and external interfaces to public cloud providers giving users the flexibility to migrate resources between infrastructures as requirements dictate.
A Cloud-Based Infrastructure for Near-Real-Time Processing and Dissemination of NPP Data
NASA Astrophysics Data System (ADS)
Evans, J. D.; Valente, E. G.; Chettri, S. S.
2011-12-01
We are building a scalable cloud-based infrastructure for generating and disseminating near-real-time data products from a variety of geospatial and meteorological data sources, including the new National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP). Our approach relies on linking Direct Broadcast and other data streams to a suite of scientific algorithms coordinated by NASA's International Polar-Orbiter Processing Package (IPOPP). The resulting data products are directly accessible to a wide variety of end-user applications, via industry-standard protocols such as OGC Web Services, Unidata Local Data Manager, or OPeNDAP, using open source software components. The processing chain employs on-demand computing resources from Amazon.com's Elastic Compute Cloud and NASA's Nebula cloud services. Our current prototype targets short-term weather forecasting, in collaboration with NASA's Short-term Prediction Research and Transition (SPoRT) program and the National Weather Service. Direct Broadcast is especially crucial for NPP, whose current ground segment is unlikely to deliver data quickly enough for short-term weather forecasters and other near-real-time users. Direct Broadcast also allows full local control over data handling, from the receiving antenna to end-user applications: this provides opportunities to streamline processes for data ingest, processing, and dissemination, and thus to make interpreted data products (Environmental Data Records) available to practitioners within minutes of data capture at the sensor. Cloud computing lets us grow and shrink computing resources to meet large and rapid fluctuations in data availability (twice daily for polar orbiters) - and similarly large fluctuations in demand from our target (near-real-time) users. This offers a compelling business case for cloud computing: the processing or dissemination systems can grow arbitrarily large to sustain near-real time data access despite surges in data volumes or user demand, but that computing capacity (and hourly costs) can be dropped almost instantly once the surge passes. Cloud computing also allows low-risk experimentation with a variety of machine architectures (processor types; bandwidth, memory, and storage capacities, etc.) and of system configurations (including massively parallel computing patterns). Finally, our service-based approach (in which user applications invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored products on demand. To maximize the usefulness and impact of our technology, we have emphasized open, industry-standard software interfaces. We are also using and developing open source software to facilitate the widespread adoption of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sources.
Analysis of the frontier technology of agricultural IoT and its predication research
NASA Astrophysics Data System (ADS)
Han, Shuqing; Zhang, Jianhua; Zhu, Mengshuai; Wu, Jianzhai; Shen, Chen; Kong, Fantao
2017-09-01
Agricultural IoT (Internet of Things) develops rapidly. Nanotechnology, biotechnology and optoelectronic technology are successfully integrated into the agricultural sensor technology. Big data, cloud computing and artificial intelligence technology have also been successfully used in IoT. This paper carries out the research on integration of agricultural sensor technology, nanotechnology, biotechnology and optoelectronic technology and the application of big data, cloud computing and artificial intelligence technology in agricultural IoT. The advantages and development of the integration of nanotechnology, biotechnology and optoelectronic technology with agricultural sensor technology were discussed. The application of big data, cloud computing and artificial intelligence technology in IoT and their development trend were analysed.
Unidata cyberinfrastructure in the cloud: A progress report
NASA Astrophysics Data System (ADS)
Ramamurthy, Mohan
2016-04-01
Data services, software, and committed support are critical components of geosciences cyber-infrastructure that can help scientists address problems of unprecedented complexity, scale, and scope. Unidata is currently working on innovative ideas, new paradigms, and novel techniques to complement and extend its offerings. Our goal is to empower users so that they can tackle major, heretofore difficult problems. Unidata recognizes that its products and services must evolve to support new approaches to research and education. After years of hype and ambiguity, cloud computing is maturing in usability in many areas of science and education, bringing the benefits of virtualized and elastic remote services to infrastructure, software, computation, and data. Cloud environments reduce the amount of time and money spent to procure, install, and maintain new hardware and software, and reduce costs through resource pooling and shared infrastructure. Cloud services aimed at providing any resource, at any time, from any place, using any device are increasingly being embraced by all types of organizations. Given this trend and the enormous potential of cloud-based services, Unidata is moving to augment its products, services, data delivery mechanisms and applications to align with the cloud-computing paradigm. To realize the above vision, Unidata is working toward: * Providing access to many types of data from a cloud (e.g., TDS, RAMADDA and EDEX); * Deploying data-proximate tools to easily process, analyze and visualize those data in a cloud environment cloud for consumption by any one, by any device, from anywhere, at any time; * Developing and providing a range of pre-configured and well-integrated tools and services that can be deployed by any university in their own private or public cloud settings. Specifically, Unidata has developed Docker for "containerized applications", making them easy to deploy. Docker helps to create "disposable" installs and eliminates many configuration challenges. Containerized applications include tools for data transport, access, analysis, and visualization: THREDDS Data Server, Integrated Data Viewer, GEMPAK, Local Data Manager, RAMADDA Data Server, and Python tools; * Fostering partnerships with NOAA and public cloud vendors (e.g., Amazon) to harness their capabilities and resources for the benefit of the academic community.
Towards Wearable Cognitive Assistance
2013-12-01
ABSTRACT unclassified c. THIS PAGE unclassified Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 Keywords: mobile computing, cloud...It presents a muli-tiered mobile system architecture that offers tight end-to-end latency bounds on compute-intensive cognitive assistance...to an entire neighborhood or an entire city is extremely expensive and time-consuming. Physical infrastructure in public spaces tends to evolve very
Galaxy CloudMan: delivering cloud compute clusters
2010-01-01
Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983
NASA Technical Reports Server (NTRS)
Suomi, V. E.
1975-01-01
The complete output of the Synchronous Meteorological Satellite was recorded on one inch magnetic tape. A quality control subsystem tests cloud track vectors against four sets of criteria: (1) rejection if best match occurs on correlation boundary; (2) rejection if major correlation peak is not distinct and significantly greater than secondary peak; (3) rejection if correlation is not persistent; and (4) rejection if acceleration is too great. A cloud height program determines cloud optical thickness from visible data and computer infrared emissivity. From infrared data and temperature profile, cloud height is determined. A functional description and electronic schematics of equipment are given.
Cloudbursting - Solving the 3-body problem
NASA Astrophysics Data System (ADS)
Chang, G.; Heistand, S.; Vakhnin, A.; Huang, T.; Zimdars, P.; Hua, H.; Hood, R.; Koenig, J.; Mehrotra, P.; Little, M. M.; Law, E.
2014-12-01
Many science projects in the future will be accomplished through collaboration among 2 or more NASA centers along with, potentially, external scientists. Science teams will be composed of more geographically dispersed individuals and groups. However, the current computing environment does not make this easy and seamless. By being able to share computing resources among members of a multi-center team working on a science/ engineering project, limited pre-competition funds could be more efficiently applied and technical work could be conducted more effectively with less time spent moving data or waiting for computing resources to free up. Based on the work from an NASA CIO IT Labs task, this presentation will highlight our prototype work in identifying the feasibility and identify the obstacles, both technical and management, to perform "Cloudbursting" among private clouds located at three different centers. We will demonstrate the use of private cloud computing infrastructure at the Jet Propulsion Laboratory, Langley Research Center, and Ames Research Center to provide elastic computation to each other to perform parallel Earth Science data imaging. We leverage elastic load balancing and auto-scaling features at each data center so that each location can independently define how many resources to allocate to a particular job that was "bursted" from another data center and demonstrate that compute capacity scales up and down with the job. We will also discuss future work in the area, which could include the use of cloud infrastructure from different cloud framework providers as well as other cloud service providers.
Lightweight Data Systems in the Cloud: Costs, Benefits and Best Practices
NASA Astrophysics Data System (ADS)
Fatland, R.; Arendt, A. A.; Howe, B.; Hess, N. J.; Futrelle, J.
2015-12-01
We present here a simple analysis of both the cost and the benefit of using the cloud in environmental science circa 2016. We present this set of ideas to enable the potential 'cloud adopter' research scientist to explore and understand the tradeoffs in moving some aspect of their compute work to the cloud. We present examples, design patterns and best practices as an evolving body of knowledge that help optimize benefit to the research team. Thematically this generally means not starting from a blank page but rather learning how to find 90% of the solution to a problem pre-built. We will touch on four topics of interest. (1) Existing cloud data resources (NASA, WHOI BCO DMO, etc) and how they can be discovered, used and improved. (2) How to explore, compare and evaluate cost and compute power from many cloud options, particularly in relation to data scale (size/complexity). (3) What are simple / fast 'Lightweight Data System' procedures that take from 20 minutes to one day to implement and that have a clear immediate payoff in environmental data-driven research. Examples include publishing a SQL Share URL at (EarthCube's) CINERGI as a registered data resource and creating executable papers on a cloud-hosted Jupyter instance, particularly iPython notebooks. (4) Translating the computational terminology landscape ('cloud', 'HPC cluster', 'hadoop', 'spark', 'machine learning') into examples from the community of practice to help the geoscientist build or expand their mental map. In the course of this discussion -- which is about resource discovery, adoption and mastery -- we provide direction to online resources in support of these themes.
An Eight-Month Sample of Marine Stratocumulus Cloud Fraction, Albedo, and Integrated Liquid Water.
NASA Astrophysics Data System (ADS)
Fairall, C. W.; Hare, J. E.; Snider, J. B.
1990-08-01
As part of the First International Satellite Cloud Climatology Regional Experiment (FIRE), a surface meteorology and shortwave/longwave irradiance station was operated in a marine stratocumulus regime on the northwest tip of San Nicolas island off the coast of Southern California. Measurements were taken from March through October 1987, including a FIRE Intensive Field Operation (IFO) held in July. Algorithms were developed to use the longwave irradiance data to estimate fractional cloudiness and to use the shortwave irradiance to estimate cloud albedo and integrated cloud liquid water content. Cloud base height is estimated from computations of the lifting condensation level. The algorithms are tested against direct measurements made during the IFO; a 30% adjustment was made to the liquid water parameterization. The algorithms are then applied to the entire database. The stratocumulus clouds over the island are found to have a cloud base height of about 400 m, an integrated liquid water content of 75 gm2, a fractional cloudiness of 0.95, and an albedo of 0.55. Integrated liquid water content rarely exceeds 350 g m2 and albedo rarely exceeds 0.90 for stratocumulus clouds. Over the summer months, the average cloud fraction shows a maximum at sunrise of 0.74 and a minimum at sunset of 0.41. Over the same period, the average cloud albedo shows a maximum of 0.61 at sunrise and a minimum of 0.31 a few hours after local noon (although the estimate is more uncertain because of the extreme solar zenith angle). The use of joint frequency distributions of fractional cloudiness with solar transmittance or cloud base height to classify cloud types appears to be useful.
NASA Astrophysics Data System (ADS)
Kancírová, M.; Kudela, K.; Erlykin, A. D.; Wolfendale, A. W.
2016-10-01
A detailed analysis has been made based on annual meteorological and cosmic ray data from the Lomnicky stit mountain observatory (LS, 2634 masl; 49.40°N, 20.22°E; vertical cut-off rigidity 3.85 GV), from the standpoint of looking for possible solar cycle (including cosmic ray) manifestations. A comparison of the mountain data with the Global average for the cloud cover in general shows no correlation but there is a possible small correlation for low clouds (LCC in the Global satellite data). However, whereas it cannot be claimed that cloud cover observed at Lomnicky stit (LSCC) can be used directly as a proxy for the Global LCC, its examination has value because it is an independent estimate of cloud cover and one that has a different altitude weighting to that adopted in the satellite-derived LCC. This statement is derived from satellite data (http://isccp.giss.nasa.gov/climanal7.html) which shows the time series for the period 1983-2010 for 9 cloud regimes. There is a significant correlation only between cosmic ray (CR) intensity (and sunspot number (SSN)) and the cloud cover of the types cirrus and stratus. This effect is mainly confined to the CR intensity minimum during the epoch around 1990, when the SSN was at its maximum. This fact, together with the present study of the correlation of LSCC with our measured CR intensity, shows that there is no firm evidence for a significant contribution of CR induced ionization to the local (or, indeed, Global) cloud cover. Pressure effects are the preferred cause of the cloud cover changes. A consequence is that there is no evidence favouring a contribution of CR to the Global Warming problem. Our analysis shows that the LS data are consistent with the Gas Laws for a stable mass of atmosphere.
NASA Technical Reports Server (NTRS)
Zhang, Zhibo; Meyer, Kerry G.; Platnick, Steven; Oreopoulos, Lazaros; Lee, Dongmin; Yu, Hongbin
2014-01-01
This paper describes an efficient and unique method for computing the shortwave direct radiative effect (DRE) of aerosol residing above low-level liquid-phase clouds using CALIOP and MODIS data. It addresses the overlap of aerosol and cloud rigorously by utilizing the joint histogram of cloud optical depth and cloud top pressure while also accounting for subgrid-scale variations of aerosols. The method is computationally efficient because of its use of grid-level cloud and aerosol statistics, instead of pixel-level products, and a pre-computed look-up table based on radiative transfer calculations. We verify that for smoke over the southeast Atlantic Ocean the method yields a seasonal mean instantaneous (approximately 1:30PM local time) shortwave DRE of above cloud aerosol (ACA) that generally agrees with more rigorous pixel-level computation within 4 percent. We also estimate the impact of potential CALIOP aerosol optical depth (AOD) retrieval bias of ACA on DRE. We find that the regional and seasonal mean instantaneous DRE of ACA over southeast Atlantic Ocean would increase, from the original value of 6.4 W m(-2) based on operational CALIOP AOD to 9.6 W m(-2) if CALIOP AOD retrieval are biased low by a factor of 1.5 (Meyer et al., 2013) and further to 30.9 W m(-2) if CALIOP AOD retrieval are biased low by a factor of 5 as suggested in (Jethva et al., 2014). In contrast, the instantaneous ACA radiative forcing efficiency (RFE) remains relatively invariant in all cases at about 53 W m(-2) AOD(-1), suggesting a near linear relation between the instantaneous RFE and AOD. We also compute the annual mean instantaneous shortwave DRE of light-absorbing aerosols (i.e., smoke and polluted dust) over global oceans based on 4 years of CALIOP and MODIS data. We find that the variability of the annual mean shortwave DRE of above-cloud light-absorbing aerosol is mainly driven by the optical depth of the underlying clouds. While we demonstrate our method using CALIOP and MODIS data, it can also be extended to other satellite data sets, as well as climate model outputs.
Quantitative Investigation of the Technologies That Support Cloud Computing
ERIC Educational Resources Information Center
Hu, Wenjin
2014-01-01
Cloud computing is dramatically shaping modern IT infrastructure. It virtualizes computing resources, provides elastic scalability, serves as a pay-as-you-use utility, simplifies the IT administrators' daily tasks, enhances the mobility and collaboration of data, and increases user productivity. We focus on providing generalized black-box…
Data Center Consolidation: A Step towards Infrastructure Clouds
NASA Astrophysics Data System (ADS)
Winter, Markus
Application service providers face enormous challenges and rising costs in managing and operating a growing number of heterogeneous system and computing landscapes. Limitations of traditional computing environments force IT decision-makers to reorganize computing resources within the data center, as continuous growth leads to an inefficient utilization of the underlying hardware infrastructure. This paper discusses a way for infrastructure providers to improve data center operations based on the findings of a case study on resource utilization of very large business applications and presents an outlook beyond server consolidation endeavors, transforming corporate data centers into compute clouds.
NASA Astrophysics Data System (ADS)
Furht, Borko
In the introductory chapter we define the concept of cloud computing and cloud services, and we introduce layers and types of cloud computing. We discuss the differences between cloud computing and cloud services. New technologies that enabled cloud computing are presented next. We also discuss cloud computing features, standards, and security issues. We introduce the key cloud computing platforms, their vendors, and their offerings. We discuss cloud computing challenges and the future of cloud computing.
ERIC Educational Resources Information Center
Anshari, Muhammad; Alas, Yabit; Guan, Lim Sei
2016-01-01
Utilizing online learning resources (OLR) from multi channels in learning activities promise extended benefits from traditional based learning-centred to a collaborative based learning-centred that emphasises pervasive learning anywhere and anytime. While compiling big data, cloud computing, and semantic web into OLR offer a broader spectrum of…
Biomedical cloud computing with Amazon Web Services.
Fusaro, Vincent A; Patil, Prasad; Gafni, Erik; Wall, Dennis P; Tonellato, Peter J
2011-08-01
In this overview to biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that there is substantial up-front effort required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how those apply to a typical application pipeline for biomedical informatics, but also general enough for extrapolation to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project and not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in references.
The Awareness and Challenges of Cloud Computing Adoption on Tertiary Education in Malaysia
NASA Astrophysics Data System (ADS)
Hazreeni Hamzah, Nor; Mahmud, Maziah; Zukri, Shamsunarnie Mohamed; Yaacob, Wan Fairos Wan; Yacob, Jusoh
2017-09-01
This preliminary study aims to investigate the awareness of the adoption of cloud computing among the academicians in tertiary education in Malaysia. Besides, this study also want to explore the possible challenges faced by the academician while adopting this new technology. The pilot study was done on 40 lecturers in Universiti Teknologi MARA Kampus Kota Bharu (UiTMKB) by using self administered questionnaire. The results found that almost half (40 percent) were not aware on the existing of cloud computing in teaching and learning (T&L) process. The challenges confronting the adoption of cloud computing are data insecurity, data insecurity, unsolicited advertisement, lock-in, reluctance to eliminate staff positions, privacy concerns, reliability challenge, regulatory compliance concerns/user control and institutional culture/resistance to change in technology. This possible challenges can be factorized in two major factors which were security and dependency factor and user control and mentality factor.
Enabling a Scientific Cloud Marketplace: VGL (Invited)
NASA Astrophysics Data System (ADS)
Fraser, R.; Woodcock, R.; Wyborn, L. A.; Vote, J.; Rankine, T.; Cox, S. J.
2013-12-01
The Virtual Geophysics Laboratory (VGL) provides a flexible, web based environment where researchers can browse data and use a variety of scientific software packaged into tool kits that run in the Cloud. Both data and tool kits are published by multiple researchers and registered with the VGL infrastructure forming a data and application marketplace. The VGL provides the basic work flow of Discovery and Access to the disparate data sources and a Library for tool kits and scripting to drive the scientific codes. Computation is then performed on the Research or Commercial Clouds. Provenance information is collected throughout the work flow and can be published alongside the results allowing for experiment comparison and sharing with other researchers. VGL's "mix and match" approach to data, computational resources and scientific codes, enables a dynamic approach to scientific collaboration. VGL allows scientists to publish their specific contribution, be it data, code, compute or work flow, knowing the VGL framework will provide other components needed for a complete application. Other scientists can choose the pieces that suit them best to assemble an experiment. The coarse grain workflow of the VGL framework combined with the flexibility of the scripting library and computational toolkits allows for significant customisation and sharing amongst the community. The VGL utilises the cloud computational and storage resources from the Australian academic research cloud provided by the NeCTAR initiative and a large variety of data accessible from national and state agencies via the Spatial Information Services Stack (SISS - http://siss.auscope.org). VGL v1.2 screenshot - http://vgl.auscope.org
Center for Technology for Advanced Scientific Componet Software (TASCS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Govindaraju, Madhusudhan
Advanced Scientific Computing Research Computer Science FY 2010Report Center for Technology for Advanced Scientific Component Software: Distributed CCA State University of New York, Binghamton, NY, 13902 Summary The overall objective of Binghamton's involvement is to work on enhancements of the CCA environment, motivated by the applications and research initiatives discussed in the proposal. This year we are working on re-focusing our design and development efforts to develop proof-of-concept implementations that have the potential to significantly impact scientific components. We worked on developing parallel implementations for non-hydrostatic code and worked on a model coupling interface for biogeochemical computations coded in MATLAB.more » We also worked on the design and implementation modules that will be required for the emerging MapReduce model to be effective for scientific applications. Finally, we focused on optimizing the processing of scientific datasets on multi-core processors. Research Details We worked on the following research projects that we are working on applying to CCA-based scientific applications. 1. Non-Hydrostatic Hydrodynamics: Non-static hydrodynamics are significantly more accurate at modeling internal waves that may be important in lake ecosystems. Non-hydrostatic codes, however, are significantly more computationally expensive, often prohibitively so. We have worked with Chin Wu at the University of Wisconsin to parallelize non-hydrostatic code. We have obtained a speed up of about 26 times maximum. Although this is significant progress, we hope to improve the performance further, such that it becomes a practical alternative to hydrostatic codes. 2. Model-coupling for water-based ecosystems: To answer pressing questions about water resources requires that physical models (hydrodynamics) be coupled with biological and chemical models. Most hydrodynamics codes are written in Fortran, however, while most ecologists work in MATLAB. This disconnect creates a great barrier. To address this, we are working on a model coupling interface that will allow biogeochemical computations written in MATLAB to couple with Fortran codes. This will greatly improve the productivity of ecosystem scientists. 2. Low overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications: Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm however extends far beyond its use with data intensive applications and diskbased systems, and can also be brought to bear in processing small but CPU intensive distributed applications. MapReduce however carries its own burdens. Through experiments using Hadoop in the context of diverse applications, we uncovered latencies and delay conditions potentially inhibiting the expected performance of a parallel execution in CPU-intensive applications. Furthermore, as it currently stands, MapReduce is favored for data-centric applications, and as such tends to be solely applied to disk-based applications. The paradigm, falls short in bringing its novelty to diskless systems dedicated to in-memory applications, and compute intensive programs processing much smaller data, but requiring intensive computations. In this project, we focused both on the performance of processing large-scale hierarchical data in distributed scientific applications, as well as the processing of smaller but demanding input sizes primarily used in diskless, and memory resident I/O systems. We designed LEMO-MR [1], a Low overhead, elastic, configurable for in- memory applications, and on-demand fault tolerance, an optimized implementation of MapReduce, for both on disk and in memory applications. We conducted experiments to identify not only the necessary components of this model, but also trade offs and factors to be considered. We have initial results to show the efficacy of our implementation in terms of potential speedup that can be achieved for representative data sets used by cloud applications. We have quantified the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute intensive environment. 3. Cache Performance Optimization for Processing XML and HDF-based Application Data on Multi-core Processors: It is important to design and develop scientific middleware libraries to harness the opportunities presented by emerging multi-core processors. Implementations of scientific middleware and applications that do not adapt to the programming paradigm when executing on emerging processors can severely impact the overall performance. In this project, we focused on the utilization of the L2 cache, which is a critical shared resource on chip multiprocessors (CMP). The access pattern of the shared L2 cache, which is dependent on how the application schedules and assigns processing work to each thread, can either enhance or hurt the ability to hide memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization, to inform scheduling decisions in multi-threaded programming. In this project, using the TAU toolkit for performance feedback from dual- and quad-core machines, we conducted performance analysis and recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of scientific applications that requires processing of HDF5 data. In particular, we quantified the gains associated with the use of the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time [2]. References: 1. Zacharia Fadika, Madhusudhan Govindaraju, ``MapReduce Implementation for Memory-Based and Processing Intensive Applications'', accepted in 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, USA, Nov 30 - Dec 3, 2010. 2. Rajdeep Bhowmik, Madhusudhan Govindaraju, ``Cache Performance Optimization for Processing XML-based Application Data on Multi-core Processors'', in proceedings of The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 17-20, 2010, Melbourne, Victoria, Australia. Contact Information: Madhusudhan Govindaraju Binghamton University State University of New York (SUNY) mgovinda@cs.binghamton.edu Phone: 607-777-4904« less
What CFOs should know before venturing into the cloud.
Rajendran, Janakan
2013-05-01
There are three major trends in the use of cloud-based services for healthcare IT: Cloud computing involves the hosting of health IT applications in a service provider cloud. Cloud storage is a data storage service that can involve, for example, long-term storage and archival of information such as clinical data, medical images, and scanned documents. Data center colocation involves rental of secure space in the cloud from a vendor, an approach that allows a hospital to share power capacity and proven security protocols, reducing costs.
On Study of Application of Big Data and Cloud Computing Technology in Smart Campus
NASA Astrophysics Data System (ADS)
Tang, Zijiao
2017-12-01
We live in an era of network and information, which means we produce and face a lot of data every day, however it is not easy for database in the traditional meaning to better store, process and analyze the mass data, therefore the big data was born at the right moment. Meanwhile, the development and operation of big data rest with cloud computing which provides sufficient space and resources available to process and analyze data of big data technology. Nowadays, the proposal of smart campus construction aims at improving the process of building information in colleges and universities, therefore it is necessary to consider combining big data technology and cloud computing technology into construction of smart campus to make campus database system and campus management system mutually combined rather than isolated, and to serve smart campus construction through integrating, storing, processing and analyzing mass data.
Secure Genomic Computation through Site-Wise Encryption
Zhao, Yongan; Wang, XiaoFeng; Tang, Haixu
2015-01-01
Commercial clouds provide on-demand IT services for big-data analysis, which have become an attractive option for users who have no access to comparable infrastructure. However, utilizing these services for human genome analysis is highly risky, as human genomic data contains identifiable information of human individuals and their disease susceptibility. Therefore, currently, no computation on personal human genomic data is conducted on public clouds. To address this issue, here we present a site-wise encryption approach to encrypt whole human genome sequences, which can be subject to secure searching of genomic signatures on public clouds. We implemented this method within the Hadoop framework, and tested it on the case of searching disease markers retrieved from the ClinVar database against patients’ genomic sequences. The secure search runs only one order of magnitude slower than the simple search without encryption, indicating our method is ready to be used for secure genomic computation on public clouds. PMID:26306278
Secure Genomic Computation through Site-Wise Encryption.
Zhao, Yongan; Wang, XiaoFeng; Tang, Haixu
2015-01-01
Commercial clouds provide on-demand IT services for big-data analysis, which have become an attractive option for users who have no access to comparable infrastructure. However, utilizing these services for human genome analysis is highly risky, as human genomic data contains identifiable information of human individuals and their disease susceptibility. Therefore, currently, no computation on personal human genomic data is conducted on public clouds. To address this issue, here we present a site-wise encryption approach to encrypt whole human genome sequences, which can be subject to secure searching of genomic signatures on public clouds. We implemented this method within the Hadoop framework, and tested it on the case of searching disease markers retrieved from the ClinVar database against patients' genomic sequences. The secure search runs only one order of magnitude slower than the simple search without encryption, indicating our method is ready to be used for secure genomic computation on public clouds.
Midekisa, Alemayehu; Holl, Felix; Savory, David J; Andrade-Pacheco, Ricardo; Gething, Peter W; Bennett, Adam; Sturrock, Hugh J W
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth's land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources.
Holl, Felix; Savory, David J.; Andrade-Pacheco, Ricardo; Gething, Peter W.; Bennett, Adam; Sturrock, Hugh J. W.
2017-01-01
Quantifying and monitoring the spatial and temporal dynamics of the global land cover is critical for better understanding many of the Earth’s land surface processes. However, the lack of regularly updated, continental-scale, and high spatial resolution (30 m) land cover data limit our ability to better understand the spatial extent and the temporal dynamics of land surface changes. Despite the free availability of high spatial resolution Landsat satellite data, continental-scale land cover mapping using high resolution Landsat satellite data was not feasible until now due to the need for high-performance computing to store, process, and analyze this large volume of high resolution satellite data. In this study, we present an approach to quantify continental land cover and impervious surface changes over a long period of time (15 years) using high resolution Landsat satellite observations and Google Earth Engine cloud computing platform. The approach applied here to overcome the computational challenges of handling big earth observation data by using cloud computing can help scientists and practitioners who lack high-performance computational resources. PMID:28953943
Advanced cloud fault tolerance system
NASA Astrophysics Data System (ADS)
Sumangali, K.; Benny, Niketa
2017-11-01
Cloud computing has become a prevalent on-demand service on the internet to store, manage and process data. A pitfall that accompanies cloud computing is the failures that can be encountered in the cloud. To overcome these failures, we require a fault tolerance mechanism to abstract faults from users. We have proposed a fault tolerant architecture, which is a combination of proactive and reactive fault tolerance. This architecture essentially increases the reliability and the availability of the cloud. In the future, we would like to compare evaluations of our proposed architecture with existing architectures and further improve it.
76 FR 17158 - Assumption Buster Workshop: Distributed Data Schemes Provide Security
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-28
... Schemes Provide Security''. Distributed data architectures, such as cloud computing, offer very attractive... locating your data in the cloud, and by breaking it up and replicating different segments throughout the...
High-Resiliency and Auto-Scaling of Large-Scale Cloud Computing for OCO-2 L2 Full Physics Processing
NASA Astrophysics Data System (ADS)
Hua, H.; Manipon, G.; Starch, M.; Dang, L. B.; Southam, P.; Wilson, B. D.; Avis, C.; Chang, A.; Cheng, C.; Smyth, M.; McDuffie, J. L.; Ramirez, P.
2015-12-01
Next generation science data systems are needed to address the incoming flood of data from new missions such as SWOT and NISAR where data volumes and data throughput rates are order of magnitude larger than present day missions. Additionally, traditional means of procuring hardware on-premise are already limited due to facilities capacity constraints for these new missions. Existing missions, such as OCO-2, may also require high turn-around time for processing different science scenarios where on-premise and even traditional HPC computing environments may not meet the high processing needs. We present our experiences on deploying a hybrid-cloud computing science data system (HySDS) for the OCO-2 Science Computing Facility to support large-scale processing of their Level-2 full physics data products. We will explore optimization approaches to getting best performance out of hybrid-cloud computing as well as common issues that will arise when dealing with large-scale computing. Novel approaches were utilized to do processing on Amazon's spot market, which can potentially offer ~10X costs savings but with an unpredictable computing environment based on market forces. We will present how we enabled high-tolerance computing in order to achieve large-scale computing as well as operational cost savings.
Construction and application of Red5 cluster based on OpenStack
NASA Astrophysics Data System (ADS)
Wang, Jiaqing; Song, Jianxin
2017-08-01
With the application and development of cloud computing technology in various fields, the resource utilization rate of the data center has been improved obviously, and the system based on cloud computing platform has also improved the expansibility and stability. In the traditional way, Red5 cluster resource utilization is low and the system stability is poor. This paper uses cloud computing to efficiently calculate the resource allocation ability, and builds a Red5 server cluster based on OpenStack. Multimedia applications can be published to the Red5 cloud server cluster. The system achieves the flexible construction of computing resources, but also greatly improves the stability of the cluster and service efficiency.
Secure Skyline Queries on Cloud Platform.
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-04-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
Federated and Cloud Enabled Resources for Data Management and Utilization
NASA Astrophysics Data System (ADS)
Rankin, R.; Gordon, M.; Potter, R. G.; Satchwill, B.
2011-12-01
The emergence of cloud computing over the past three years has led to a paradigm shift in how data can be managed, processed and made accessible. Building on the federated data management system offered through the Canadian Space Science Data Portal (www.cssdp.ca), we demonstrate how heterogeneous and geographically distributed data sets and modeling tools have been integrated to form a virtual data center and computational modeling platform that has services for data processing and visualization embedded within it. We also discuss positive and negative experiences in utilizing Eucalyptus and OpenStack cloud applications, and job scheduling facilitated by Condor and Star Cluster. We summarize our findings by demonstrating use of these technologies in the Cloud Enabled Space Weather Data Assimilation and Modeling Platform CESWP (www.ceswp.ca), which is funded through Canarie's (canarie.ca) Network Enabled Platforms program in Canada.
Exploring the Universe with WISE and Cloud Computing
NASA Technical Reports Server (NTRS)
Benford, Dominic J.
2011-01-01
WISE is a recently-completed astronomical survey mission that has imaged the entire sky in four infrared wavelength bands. The large quantity of science images returned consists of 2,776,922 individual snapshots in various locations in each band which, along with ancillary data, totals around 110TB of raw, uncompressed data. Making the most use of this data requires advanced computing resources. I will discuss some initial attempts in the use of cloud computing to make this large problem tractable.
Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure
NASA Astrophysics Data System (ADS)
Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei
2011-09-01
Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
Using CloudSat and the A-Train to Estimate Tropical Cyclone Intensity in the Western North Pacific
2014-09-01
CloudSat System Data Flow (from Cooperative Institute for Research in the Atmosphere 2008...radar Department of Defense Data Processing Center European Centre for Medium-Range Weather Forecasts Earth observing system Earth observing... system data and information system Earth sciences systems pathfinder hierarchical data format moderate resolution imaging spectroradiometer moist
A Cloud-Computing Service for Environmental Geophysics and Seismic Data Processing
NASA Astrophysics Data System (ADS)
Heilmann, B. Z.; Maggi, P.; Piras, A.; Satta, G.; Deidda, G. P.; Bonomi, E.
2012-04-01
Cloud computing is establishing worldwide as a new high performance computing paradigm that offers formidable possibilities to industry and science. The presented cloud-computing portal, part of the Grida3 project, provides an innovative approach to seismic data processing by combining open-source state-of-the-art processing software and cloud-computing technology, making possible the effective use of distributed computation and data management with administratively distant resources. We substituted the user-side demanding hardware and software requirements by remote access to high-performance grid-computing facilities. As a result, data processing can be done quasi in real-time being ubiquitously controlled via Internet by a user-friendly web-browser interface. Besides the obvious advantages over locally installed seismic-processing packages, the presented cloud-computing solution creates completely new possibilities for scientific education, collaboration, and presentation of reproducible results. The web-browser interface of our portal is based on the commercially supported grid portal EnginFrame, an open framework based on Java, XML, and Web Services. We selected the hosted applications with the objective to allow the construction of typical 2D time-domain seismic-imaging workflows as used for environmental studies and, originally, for hydrocarbon exploration. For data visualization and pre-processing, we chose the free software package Seismic Un*x. We ported tools for trace balancing, amplitude gaining, muting, frequency filtering, dip filtering, deconvolution and rendering, with a customized choice of options as services onto the cloud-computing portal. For structural imaging and velocity-model building, we developed a grid version of the Common-Reflection-Surface stack, a data-driven imaging method that requires no user interaction at run time such as manual picking in prestack volumes or velocity spectra. Due to its high level of automation, CRS stacking can benefit largely from the hardware parallelism provided by the cloud deployment. The resulting output, post-stack section, coherence, and NMO-velocity panels are used to generate a smooth migration-velocity model. Residual static corrections are calculated as a by-product of the stack and can be applied iteratively. As a final step, a time migrated subsurface image is obtained by a parallelized Kirchhoff time migration scheme. Processing can be done step-by-step or using a graphical workflow editor that can launch a series of pipelined tasks. The status of the submitted jobs is monitored by a dedicated service. All results are stored in project directories, where they can be downloaded of viewed directly in the browser. Currently, the portal has access to three research clusters having a total number of 70 nodes with 4 cores each. They are shared with four other cloud-computing applications bundled within the GRIDA3 project. To demonstrate the functionality of our "seismic cloud lab", we will present results obtained for three different types of data, all taken from hydrogeophysical studies: (1) a seismic reflection data set, made of compressional waves from explosive sources, recorded in Muravera, Sardinia; (2) a shear-wave data set from, Sardinia; (3) a multi-offset Ground-Penetrating-Radar data set from Larreule, France. The presented work was funded by the government of the Autonomous Region of Sardinia and by the Italian Ministry of Research and Education.
Assessing the Need for Supercomputing Resources Within the Pacific Area of Responsibility
2015-05-26
portion of today’s research and development dollars are going toward developing machines that will be better suited for addressing big data applications...2009; Radu Sion, “To Cloud or Not to? Musings on Clouds, Security and Big Data ,” in Secure Data Management, Vol. 8425, May 2014, pp. 3–5; Yao Chen...Applied Parallel and Scientific Computing, Vol. 7134, 2010. Sion, Radu, “To Cloud or Not to? Musings on Clouds, Security and Big Data ,” in Secure Data
NASA Technical Reports Server (NTRS)
Vaughan, M. A.; Winker, D. M.
1994-01-01
Intensive cloud lidar observations have been made by NASA Langley Research Center during the two observation phases of the ECLIPS project. Less intensive but longer term observations have been conducted as part of the FIRE extended time observation (ETO) program since 1987. We present a preliminary analysis of the vertical distribution of clouds based on these observations. A mean cirrus thickness of just under 1 km has been observed with a mean altitude of about 80 percent of the tropopause height. Based on the lidar data, cirrus coverage was estimated to be just under 20 percent, representing roughly 50 percent of all clouds studied. Cirrus was observed to have less seasonal variation than lower clouds. Mid-level clouds are found to occur primarily in association with frontal activity.
NASA Astrophysics Data System (ADS)
Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.
2017-12-01
Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role on SAR data availability and dissemination all over the World. Indeed, the free and open access data policy adopted by the European Copernicus program together with the global coverage acquisition strategy, make the Sentinel constellation as a game changer in the Earth Observation scenario. Being the SAR data become ubiquitous, the technological and scientific challenge is focused on maximizing the exploitation of such huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as the Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in efficient, automatic and systematic way. Such a DInSAR chain ingests Sentinel 1 SLC images and carries out several processing steps, to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform and a thorough analysis of the attained parallel performances has been performed to identify and overcome the major bottlenecks to the scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits. Such an experiment confirms the big advantage of exploiting large computational and storage resources of Cloud Computing platforms for large scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a precious tool in the perspective of developing operational services disposable for the EO scientific community related to hazard monitoring and risk prevention and mitigation.
Qu, Wei-ping; Liu, Wen-qing; Liu, Jian-guo; Lu, Yi-huai; Zhu, Jun; Qin, Min; Liu, Cheng
2006-11-01
In satellite remote-sensing detection, cloud as an interference plays a negative role in data retrieval. How to discern the cloud fields with high fidelity thus comes as a need to the following research. A new method rooting in atmospheric radiation characteristics of cloud layer, in the present paper, presents a sort of solution where single-band brightness variance ratio is used to detect the relative intensity of cloud clutter so as to delineate cloud field rapidly and exactly, and the formulae of brightness variance ratio of satellite image, image reflectance variance ratio, and brightness temperature variance ratio of thermal infrared image are also given to enable cloud elimination to produce data free from cloud interference. According to the variance of the penetrating capability for different spectra bands, an objective evaluation is done on cloud penetration of them with the factors that influence penetration effect. Finally, a multi-band data fusion task is completed using the image data of infrared penetration from cirrus nothus. Image data reconstruction is of good quality and exactitude to show the real data of visible band covered by cloud fields. Statistics indicates the consistency of waveband relativity with image data after the data fusion.
Serving ocean model data on the cloud
Meisinger, Michael; Farcas, Claudiu; Farcas, Emilia; Alexander, Charles; Arrott, Matthew; de La Beaujardiere, Jeff; Hubbard, Paul; Mendelssohn, Roy; Signell, Richard P.
2010-01-01
The NOAA-led Integrated Ocean Observing System (IOOS) and the NSF-funded Ocean Observatories Initiative Cyberinfrastructure Project (OOI-CI) are collaborating on a prototype data delivery system for numerical model output and other gridded data using cloud computing. The strategy is to take an existing distributed system for delivering gridded data and redeploy on the cloud, making modifications to the system that allow it to harness the scalability of the cloud as well as adding functionality that the scalability affords.
A Mobility Management Using Follow-Me Cloud-Cloudlet in Fog-Computing-Based RANs for Smart Cities.
Chen, Yuh-Shyan; Tsai, Yi-Ting
2018-02-06
Mobility management for supporting the location tracking and location-based service (LBS) is an important issue of smart city by providing the means for the smooth transportation of people and goods. The mobility is useful to contribute the innovation in both public and private transportation infrastructures for smart cities. With the assistance of edge/fog computing, this paper presents a fully new mobility management using the proposed follow-me cloud-cloudlet (FMCL) approach in fog-computing-based radio access networks (Fog-RANs) for smart cities. The proposed follow-me cloud-cloudlet approach is an integration strategy of follow-me cloud (FMC) and follow-me edge (FME) (or called cloudlet). A user equipment (UE) receives the data, transmitted from original cloud, into the original edge cloud before the handover operation. After the handover operation, an UE searches for a new cloud, called as a migrated cloud, and a new edge cloud, called as a migrated edge cloud near to UE, where the remaining data is migrated from the original cloud to the migrated cloud and all the remaining data are received in the new edge cloud. Existing FMC results do not have the property of the VM migration between cloudlets for the purpose of reducing the transmission latency, and existing FME results do not keep the property of the service migration between data centers for reducing the transmission latency. Our proposed FMCL approach can simultaneously keep the VM migration between cloudlets and service migration between data centers to significantly reduce the transmission latency. The new proposed mobility management using FMCL approach aims to reduce the total transmission time if some data packets are pre-scheduled and pre-stored into the cache of cloudlet if UE is switching from the previous Fog-RAN to the serving Fog-RAN. To illustrate the performance achievement, the mathematical analysis and simulation results are examined in terms of the total transmission time, the throughput, the probability of packet loss, and the number of control messages.
A Mobility Management Using Follow-Me Cloud-Cloudlet in Fog-Computing-Based RANs for Smart Cities
Tsai, Yi-Ting
2018-01-01
Mobility management for supporting the location tracking and location-based service (LBS) is an important issue of smart city by providing the means for the smooth transportation of people and goods. The mobility is useful to contribute the innovation in both public and private transportation infrastructures for smart cities. With the assistance of edge/fog computing, this paper presents a fully new mobility management using the proposed follow-me cloud-cloudlet (FMCL) approach in fog-computing-based radio access networks (Fog-RANs) for smart cities. The proposed follow-me cloud-cloudlet approach is an integration strategy of follow-me cloud (FMC) and follow-me edge (FME) (or called cloudlet). A user equipment (UE) receives the data, transmitted from original cloud, into the original edge cloud before the handover operation. After the handover operation, an UE searches for a new cloud, called as a migrated cloud, and a new edge cloud, called as a migrated edge cloud near to UE, where the remaining data is migrated from the original cloud to the migrated cloud and all the remaining data are received in the new edge cloud. Existing FMC results do not have the property of the VM migration between cloudlets for the purpose of reducing the transmission latency, and existing FME results do not keep the property of the service migration between data centers for reducing the transmission latency. Our proposed FMCL approach can simultaneously keep the VM migration between cloudlets and service migration between data centers to significantly reduce the transmission latency. The new proposed mobility management using FMCL approach aims to reduce the total transmission time if some data packets are pre-scheduled and pre-stored into the cache of cloudlet if UE is switching from the previous Fog-RAN to the serving Fog-RAN. To illustrate the performance achievement, the mathematical analysis and simulation results are examined in terms of the total transmission time, the throughput, the probability of packet loss, and the number of control messages. PMID:29415510
Influence of Ice Cloud Microphysics on Imager-Based Estimates of Earth's Radiation Budget
NASA Astrophysics Data System (ADS)
Loeb, N. G.; Kato, S.; Minnis, P.; Yang, P.; Sun-Mack, S.; Rose, F. G.; Hong, G.; Ham, S. H.
2016-12-01
A central objective of the Clouds and the Earth's Radiant Energy System (CERES) is to produce a long-term global climate data record of Earth's radiation budget from the TOA down to the surface along with the associated atmospheric and surface properties that influence it. CERES relies on a number of data sources, including broadband radiometers measuring incoming and reflected solar radiation and OLR, high-resolution spectral imagers, meteorological, aerosol and ozone assimilation data, and snow/sea-ice maps based on microwave radiometer data. While the TOA radiation budget is largely determined directly from accurate broadband radiometer measurements, the surface radiation budget is derived indirectly through radiative transfer model calculations initialized using imager-based cloud and aerosol retrievals and meteorological assimilation data. Because ice cloud particles exhibit a wide range of shapes, sizes and habits that cannot be independently retrieved a priori from passive visible/infrared imager measurements, assumptions about the scattering properties of ice clouds are necessary in order to retrieve ice cloud optical properties (e.g., optical depth) from imager radiances and to compute broadband radiative fluxes. This presentation will examine how the choice of an ice cloud particle model impacts computed shortwave (SW) radiative fluxes at the top-of-atmosphere (TOA) and surface. The ice cloud particle models considered correspond to those from prior, current and future CERES data product versions. During the CERES Edition2 (and Edition3) processing, ice cloud particles were assumed to be smooth hexagonal columns. In the Edition4, roughened hexagonal columns are assumed. The CERES team is now working on implementing in a future version an ice cloud particle model comprised of a two-habit ice cloud model consisting of roughened hexagonal columns and aggregates of roughened columnar elements. In each case, we use the same ice particle model in both the imager-based cloud retrievals (inverse problem) and the computed radiative fluxes (forward calculation). In addition to comparing radiative fluxes using the different ice cloud particle models, we also compare instantaneous TOA flux calculations with those observed by the CERES instrument.
A Brief Analysis of Development Situations and Trend of Cloud Computing
NASA Astrophysics Data System (ADS)
Yang, Wenyan
2017-12-01
in recent years, the rapid development of Internet technology has radically changed people's work, learning and lifestyles. More and more activities are completed by virtue of computers and networks. The amount of information and data generated is bigger day by day, and people rely more on computer, which makes computing power of computer fail to meet demands of accuracy and rapidity from people. The cloud computing technology has experienced fast development, which is widely applied in the computer industry as a result of advantages of high precision, fast computing and easy usage. Moreover, it has become a focus in information research at present. In this paper, the development situations and trend of cloud computing shall be analyzed and researched.
NASA Astrophysics Data System (ADS)
Sánchez-Martínez, V.; Borges, G.; Borrego, C.; del Peso, J.; Delfino, M.; Gomes, J.; González de la Hoz, S.; Pacheco Pages, A.; Salt, J.; Sedov, A.; Villaplana, M.; Wolters, H.
2014-06-01
In this contribution we describe the performance of the Iberian (Spain and Portugal) ATLAS cloud during the first LHC running period (March 2010-January 2013) in the context of the GRID Computing and Data Distribution Model. The evolution of the resources for CPU, disk and tape in the Iberian Tier-1 and Tier-2s is summarized. The data distribution over all ATLAS destinations is shown, focusing on the number of files transferred and the size of the data. The status and distribution of simulation and analysis jobs within the cloud are discussed. The Distributed Analysis tools used to perform physics analysis are explained as well. Cloud performance in terms of the availability and reliability of its sites is discussed. The effect of the changes in the ATLAS Computing Model on the cloud is analyzed. Finally, the readiness of the Iberian Cloud towards the first Long Shutdown (LS1) is evaluated and an outline of the foreseen actions to take in the coming years is given. The shutdown will be a good opportunity to improve and evolve the ATLAS Distributed Computing system to prepare for the future challenges of the LHC operation.
2012-01-01
Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941
El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter
2012-01-01
Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.
Integration of High-Performance Computing into Cloud Computing Services
NASA Astrophysics Data System (ADS)
Vouk, Mladen A.; Sills, Eric; Dreher, Patrick
High-Performance Computing (HPC) projects span a spectrum of computer hardware implementations ranging from peta-flop supercomputers, high-end tera-flop facilities running a variety of operating systems and applications, to mid-range and smaller computational clusters used for HPC application development, pilot runs and prototype staging clusters. What they all have in common is that they operate as a stand-alone system rather than a scalable and shared user re-configurable resource. The advent of cloud computing has changed the traditional HPC implementation. In this article, we will discuss a very successful production-level architecture and policy framework for supporting HPC services within a more general cloud computing infrastructure. This integrated environment, called Virtual Computing Lab (VCL), has been operating at NC State since fall 2004. Nearly 8,500,000 HPC CPU-Hrs were delivered by this environment to NC State faculty and students during 2009. In addition, we present and discuss operational data that show that integration of HPC and non-HPC (or general VCL) services in a cloud can substantially reduce the cost of delivering cloud services (down to cents per CPU hour).
Cloud Computing for the Grid: GridControl: A Software Platform to Support the Smart Grid
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
GENI Project: Cornell University is creating a new software platform for grid operators called GridControl that will utilize cloud computing to more efficiently control the grid. In a cloud computing system, there are minimal hardware and software demands on users. The user can tap into a network of computers that is housed elsewhere (the cloud) and the network runs computer applications for the user. The user only needs interface software to access all of the cloud’s data resources, which can be as simple as a web browser. Cloud computing can reduce costs, facilitate innovation through sharing, empower users, and improvemore » the overall reliability of a dispersed system. Cornell’s GridControl will focus on 4 elements: delivering the state of the grid to users quickly and reliably; building networked, scalable grid-control software; tailoring services to emerging smart grid uses; and simulating smart grid behavior under various conditions.« less
An Elliptic Curve Based Schnorr Cloud Security Model in Distributed Environment
Muthurajan, Vinothkumar; Narayanasamy, Balaji
2016-01-01
Cloud computing requires the security upgrade in data transmission approaches. In general, key-based encryption/decryption (symmetric and asymmetric) mechanisms ensure the secure data transfer between the devices. The symmetric key mechanisms (pseudorandom function) provide minimum protection level compared to asymmetric key (RSA, AES, and ECC) schemes. The presence of expired content and the irrelevant resources cause unauthorized data access adversely. This paper investigates how the integrity and secure data transfer are improved based on the Elliptic Curve based Schnorr scheme. This paper proposes a virtual machine based cloud model with Hybrid Cloud Security Algorithm (HCSA) to remove the expired content. The HCSA-based auditing improves the malicious activity prediction during the data transfer. The duplication in the cloud server degrades the performance of EC-Schnorr based encryption schemes. This paper utilizes the blooming filter concept to avoid the cloud server duplication. The combination of EC-Schnorr and blooming filter efficiently improves the security performance. The comparative analysis between proposed HCSA and the existing Distributed Hash Table (DHT) regarding execution time, computational overhead, and auditing time with auditing requests and servers confirms the effectiveness of HCSA in the cloud security model creation. PMID:26981584
An Elliptic Curve Based Schnorr Cloud Security Model in Distributed Environment.
Muthurajan, Vinothkumar; Narayanasamy, Balaji
2016-01-01
Cloud computing requires the security upgrade in data transmission approaches. In general, key-based encryption/decryption (symmetric and asymmetric) mechanisms ensure the secure data transfer between the devices. The symmetric key mechanisms (pseudorandom function) provide minimum protection level compared to asymmetric key (RSA, AES, and ECC) schemes. The presence of expired content and the irrelevant resources cause unauthorized data access adversely. This paper investigates how the integrity and secure data transfer are improved based on the Elliptic Curve based Schnorr scheme. This paper proposes a virtual machine based cloud model with Hybrid Cloud Security Algorithm (HCSA) to remove the expired content. The HCSA-based auditing improves the malicious activity prediction during the data transfer. The duplication in the cloud server degrades the performance of EC-Schnorr based encryption schemes. This paper utilizes the blooming filter concept to avoid the cloud server duplication. The combination of EC-Schnorr and blooming filter efficiently improves the security performance. The comparative analysis between proposed HCSA and the existing Distributed Hash Table (DHT) regarding execution time, computational overhead, and auditing time with auditing requests and servers confirms the effectiveness of HCSA in the cloud security model creation.
NASA Technical Reports Server (NTRS)
Nguyen, Louis; Minnis, Patrick; Spangenberg, Douglas A.; Nordeen, Michele L.; Palikonda, Rabindra; Khaiyer, Mandana M.; Gultepe, Ismail; Reehorst, Andrew L.
2004-01-01
Satellites are ideal for continuous monitoring of aircraft icing conditions in many situations over extensive areas. The satellite imager data are used to diagnose a number of cloud properties that can be used to develop icing intensity indices. Developing and validating these indices requires comparison with objective "cloud truth" data in addition to conventional pilot reports (PIREPS) of icing conditions. Minnis et al. examined the relationships between PIREPS icing and satellite-derived cloud properties. The Atlantic-THORPEX Regional Campaign (ATReC) and the second Alliance Icing Research Study (AIRS-II) field programs were conducted over the northeastern USA and southeastern Canada during late 2003 and early 2004. The aircraft and surface measurements are concerned primarily with the icing characteristics of clouds and, thus, are ideal for providing some validation information for the satellite remote sensing product. This paper starts the process of comparing cloud properties and icing indices derived from the Geostationary Operational Environmental Satellite (GOES) with the aircraft in situ measurements of several cloud properties during campaigns and some of the The comparisons include cloud phase, particle size, icing intensity, base and top altitudes, temperatures, and liquid water path. The results of this study are crucial for developing a more reliable and objective icing product from satellite data. This icing product, currently being derived from GOES data over the USA, is an important complement to more conventional products based on forecasts, and PIREPS.
NASA Technical Reports Server (NTRS)
Ahamed, Aakash; Bolten, John; Doyle, Colin; Fayne, Jessica
2016-01-01
Floods are the costliest natural disaster, causing approximately 6.8 million deaths in the twentieth century alone. Worldwide economic flood damage estimates in 2012 exceed $19 Billion USD. Extended duration floods also pose longer term threats to food security, water, sanitation, hygiene, and community livelihoods, particularly in developing countries. Projections by the Intergovernmental Panel on Climate Change (IPCC) suggest that precipitation extremes, rainfall intensity, storm intensity, and variability are increasing due to climate change. Increasing hydrologic uncertainty will likely lead to unprecedented extreme flood events. As such, there is a vital need to enhance and further develop traditional techniques used to rapidly assess flooding and extend analytical methods to estimate impacted population and infrastructure. Measuring flood extent in situ is generally impractical, time consuming, and can be inaccurate. Remotely sensed imagery acquired from space-borne and airborne sensors provides a viable platform for consistent and rapid wall-to-wall monitoring of large flood events through time. Terabytes of freely available satellite imagery are made available online each day by NASA, ESA, and other international space research institutions. Advances in cloud computing and data storage technologies allow researchers to leverage these satellite data and apply analytical methods at scale. Repeat-survey earth observations help provide insight about how natural phenomena change through time, including the progression and recession of floodwaters. In recent years, cloud-penetrating radar remote sensing techniques (e.g., Synthetic Aperture Radar) and high temporal resolution imagery platforms (e.g., MODIS and its 1-day return period), along with high performance computing infrastructure, have enabled significant advances in software systems that provide flood warning, assessments, and hazard reduction potential. By incorporating social and economic data, researchers can develop systems that automatically quantify the socioeconomic impacts resulting from flood disaster events.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele
2014-11-11
has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization towards minimizing the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux’s Hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56more » virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as w a MPI-intensive application (the HPL Linpack benchmark).« less
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
Klimentov, A.; Buncic, P.; De, K.; ...
2015-05-22
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10 2) sites, O(10 5) cores, O(10 8) jobs per year, O(10 3) users, and ATLAS data volume is O(10 17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.« less
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klimentov, A.; Buncic, P.; De, K.
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10 2) sites, O(10 5) cores, O(10 8) jobs per year, O(10 3) users, and ATLAS data volume is O(10 17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.« less
An adaptive process-based cloud infrastructure for space situational awareness applications
NASA Astrophysics Data System (ADS)
Liu, Bingwei; Chen, Yu; Shen, Dan; Chen, Genshe; Pham, Khanh; Blasch, Erik; Rubin, Bruce
2014-06-01
Space situational awareness (SSA) and defense space control capabilities are top priorities for groups that own or operate man-made spacecraft. Also, with the growing amount of space debris, there is an increase in demand for contextual understanding that necessitates the capability of collecting and processing a vast amount sensor data. Cloud computing, which features scalable and flexible storage and computing services, has been recognized as an ideal candidate that can meet the large data contextual challenges as needed by SSA. Cloud computing consists of physical service providers and middleware virtual machines together with infrastructure, platform, and software as service (IaaS, PaaS, SaaS) models. However, the typical Virtual Machine (VM) abstraction is on a per operating systems basis, which is at too low-level and limits the flexibility of a mission application architecture. In responding to this technical challenge, a novel adaptive process based cloud infrastructure for SSA applications is proposed in this paper. In addition, the details for the design rationale and a prototype is further examined. The SSA Cloud (SSAC) conceptual capability will potentially support space situation monitoring and tracking, object identification, and threat assessment. Lastly, the benefits of a more granular and flexible cloud computing resources allocation are illustrated for data processing and implementation considerations within a representative SSA system environment. We show that the container-based virtualization performs better than hypervisor-based virtualization technology in an SSA scenario.
A Novel Cost Based Model for Energy Consumption in Cloud Computing
Horri, A.; Dastghaibyfard, Gh.
2015-01-01
Cloud data centers consume enormous amounts of electrical energy. To support green cloud computing, providers also need to minimize cloud infrastructure energy consumption while conducting the QoS. In this study, for cloud environments an energy consumption model is proposed for time-shared policy in virtualization layer. The cost and energy usage of time-shared policy were modeled in the CloudSim simulator based upon the results obtained from the real system and then proposed model was evaluated by different scenarios. In the proposed model, the cache interference costs were considered. These costs were based upon the size of data. The proposed model was implemented in the CloudSim simulator and the related simulation results indicate that the energy consumption may be considerable and that it can vary with different parameters such as the quantum parameter, data size, and the number of VMs on a host. Measured results validate the model and demonstrate that there is a tradeoff between energy consumption and QoS in the cloud environment. Also, measured results validate the model and demonstrate that there is a tradeoff between energy consumption and QoS in the cloud environment. PMID:25705716
A novel cost based model for energy consumption in cloud computing.
Horri, A; Dastghaibyfard, Gh
2015-01-01
Cloud data centers consume enormous amounts of electrical energy. To support green cloud computing, providers also need to minimize cloud infrastructure energy consumption while conducting the QoS. In this study, for cloud environments an energy consumption model is proposed for time-shared policy in virtualization layer. The cost and energy usage of time-shared policy were modeled in the CloudSim simulator based upon the results obtained from the real system and then proposed model was evaluated by different scenarios. In the proposed model, the cache interference costs were considered. These costs were based upon the size of data. The proposed model was implemented in the CloudSim simulator and the related simulation results indicate that the energy consumption may be considerable and that it can vary with different parameters such as the quantum parameter, data size, and the number of VMs on a host. Measured results validate the model and demonstrate that there is a tradeoff between energy consumption and QoS in the cloud environment. Also, measured results validate the model and demonstrate that there is a tradeoff between energy consumption and QoS in the cloud environment.
Jade: using on-demand cloud analysis to give scientists back their flow
NASA Astrophysics Data System (ADS)
Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.
2017-12-01
The UK's Met Office generates 400 TB weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap.Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: An under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts Hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores Clusters which autoscale up/down in response to calculation load using Kubernetes, and balances the cluster across providers based on the current price of compute Lazy data access from cloud storage via containerised OpenDAP This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger scales. The use of ephemeral compute resources also makes this implementation cost efficient.
Towards a Multi-Mission, Airborne Science Data System Environment
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Hardman, S.; Law, E.; Freeborn, D.; Kay-Im, E.; Lau, G.; Oswald, J.
2011-12-01
NASA earth science instruments are increasingly relying on airborne missions. However, traditionally, there has been limited common infrastructure support available to principal investigators in the area of science data systems. As a result, each investigator has been required to develop their own computing infrastructures for the science data system. Typically there is little software reuse and many projects lack sufficient resources to provide a robust infrastructure to capture, process, distribute and archive the observations acquired from airborne flights. At NASA's Jet Propulsion Laboratory (JPL), we have been developing a multi-mission data system infrastructure for airborne instruments called the Airborne Cloud Computing Environment (ACCE). ACCE encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis in order to improve the quality, cost effectiveness, and capabilities to enable new scientific discovery and research in earth observation. This includes improving data system interoperability across each instrument. A principal characteristic is being able to provide an agile infrastructure that is architected to allow for a variety of configurations of the infrastructure from locally installed compute and storage services to provisioning those services via the "cloud" from cloud computer vendors such as Amazon.com. Investigators often have different needs that require a flexible configuration. The data system infrastructure is built on the Apache's Object Oriented Data Technology (OODT) suite of components which has been used for a number of spaceborne missions and provides a rich set of open source software components and services for constructing science processing and data management systems. In 2010, a partnership was formed between the ACCE team and the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to support the data processing and data management needs. A principal goal is to provide support for the Fourier Transform Spectrometer (FTS) instrument which will produce over 700,000 soundings over the life of their three-year mission. The cost to purchase and operate a cluster-based system in order to generate Level 2 Full Physics products from this data was prohibitive. Through an evaluation of cloud computing solutions, Amazon's Elastic Compute Cloud (EC2) was selected for the CARVE deployment. As the ACCE infrastructure is developed and extended to form an infrastructure for airborne missions, the experience of working with CARVE has provided a number of lessons learned and has proven to be important in reinforcing the unique aspects of airborne missions and the importance of the ACCE infrastructure in developing a cost effective, flexible multi-mission capability that leverages emerging capabilities in cloud computing, workflow management, and distributed computing.
NASA Astrophysics Data System (ADS)
Aldeen Yousra, S.; Mazleena, Salleh
2018-05-01
Recent advancement in Information and Communication Technologies (ICT) demanded much of cloud services to sharing users’ private data. Data from various organizations are the vital information source for analysis and research. Generally, this sensitive or private data information involves medical, census, voter registration, social network, and customer services. Primary concern of cloud service providers in data publishing is to hide the sensitive information of individuals. One of the cloud services that fulfill the confidentiality concerns is Privacy Preserving Data Mining (PPDM). The PPDM service in Cloud Computing (CC) enables data publishing with minimized distortion and absolute privacy. In this method, datasets are anonymized via generalization to accomplish the privacy requirements. However, the well-known privacy preserving data mining technique called K-anonymity suffers from several limitations. To surmount those shortcomings, I propose a new heuristic anonymization framework for preserving the privacy of sensitive datasets when publishing on cloud. The advantages of K-anonymity, L-diversity and (α, k)-anonymity methods for efficient information utilization and privacy protection are emphasized. Experimental results revealed the superiority and outperformance of the developed technique than K-anonymity, L-diversity, and (α, k)-anonymity measure.
GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data
NASA Astrophysics Data System (ADS)
Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.
2016-12-01
Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.
A resource-sharing model based on a repeated game in fog computing.
Sun, Yan; Zhang, Nan
2017-03-01
With the rapid development of cloud computing techniques, the number of users is undergoing exponential growth. It is difficult for traditional data centers to perform many tasks in real time because of the limited bandwidth of resources. The concept of fog computing is proposed to support traditional cloud computing and to provide cloud services. In fog computing, the resource pool is composed of sporadic distributed resources that are more flexible and movable than a traditional data center. In this paper, we propose a fog computing structure and present a crowd-funding algorithm to integrate spare resources in the network. Furthermore, to encourage more resource owners to share their resources with the resource pool and to supervise the resource supporters as they actively perform their tasks, we propose an incentive mechanism in our algorithm. Simulation results show that our proposed incentive mechanism can effectively reduce the SLA violation rate and accelerate the completion of tasks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nicolae, Bogdan; Riteau, Pierre; Keahey, Kate
Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. This paper provides a transparent solution that automatically boosts I/O bandwidth during peaks for underlying virtual disks, effectively avoiding over-provisioning without performance loss. The authors' proposal relies on the idea of leveraging short-lived virtual disks of better performance characteristics (and thus more expensive) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored. Furthermore, they introduce a performance and cost prediction methodology that can be used both independently tomore » estimate in advance what trade-off between performance and cost is possible, as well as an optimization technique that enables better cache size selection to meet the desired performance level with minimal cost. The authors demonstrate the benefits of their proposal both for microbenchmarks and for two real-life applications using large-scale experiments.« less
GES DISC Data Recipes in Jupyter Notebooks
NASA Astrophysics Data System (ADS)
Li, A.; Banavige, B.; Garimella, K.; Rice, J.; Shen, S.; Liu, Z.
2017-12-01
The Earth Science Data and Information System (ESDIS) Project manages twelve Distributed Active Archive Centers (DAACs) which are geographically dispersed across the United States. The DAACs are responsible for ingesting, processing, archiving, and distributing Earth science data produced from various sources (satellites, aircraft, field measurements, etc.). In response to projections of an exponential increase in data production, there has been a recent effort to prototype various DAAC activities in the cloud computing environment. This, in turn, led to the creation of an initiative, called the Cloud Analysis Toolkit to Enable Earth Science (CATEES), to develop a Python software package in order to transition Earth science data processing to the cloud. This project, in particular, supports CATEES and has two primary goals. One, transition data recipes created by the Goddard Earth Science Data and Information Service Center (GES DISC) DAAC into an interactive and educational environment using Jupyter Notebooks. Two, acclimate Earth scientists to cloud computing. To accomplish these goals, we create Jupyter Notebooks to compartmentalize the different steps of data analysis and help users obtain and parse data from the command line. We also develop a Docker container, comprised of Jupyter Notebooks, Python library dependencies, and command line tools, and configure it into an easy to deploy package. The end result is an end-to-end product that simulates the use case of end users working in the cloud computing environment.
Where Next for Marine Cloud Brightening Research?
NASA Astrophysics Data System (ADS)
Jenkins, A. K. L.; Forster, P.
2014-12-01
Realistic estimates of geoengineering effectiveness will be central to informed decision-making on its possible role in addressing climate change. Over the last decade, global-scale computer climate modelling of geoengineering has been developing. While these developments have allowed quantitative estimates of geoengineering effectiveness to be produced, the relative coarseness of the grid of these models (tens of kilometres) means that key practical details of the proposed geoengineering is not always realistically captured. This is particularly true for marine cloud brightening (MCB), where both the clouds, as well as the tens-of-meters scale sea-going implementation vessels cannot be captured in detail. Previous research using cloud resolving modelling has shown that neglecting such details may lead to MCB effectiveness being overestimated by up to half. Realism of MCB effectiveness will likely improve from ongoing developments in the understanding and modelling of clouds. We also propose that realism can be increased via more specific improvements (see figure). A readily achievable example would be the reframing of previous MCB effectiveness estimates in light of the cloud resolving scale findings. Incorporation of implementation details could also be made - via parameterisation - into future global-scale modelling of MCB. However, as significant unknowns regarding the design of the MCB aerosol production technique remain, resource-intensive cloud resolving computer modelling of MCB may be premature unless of broader benefit to the wider understanding of clouds. One of the most essential recommendations is for enhanced communication between climate scientists and MCB designers. This would facilitate the identification of potentially important design aspects necessary for realistic computer simulations. Such relationships could be mutually beneficial, with computer modelling potentially informing more efficient designs of the MCB implementation technique. (Acknowledgment) This work is part of the Integrated Assessment of Geoengineering Proposals (IAGP) project, funded by the Engineering and Physical Sciences Research Council and the Natural Environment Research Council (EP/I014721/1).
NASA Astrophysics Data System (ADS)
Delipetrev, Blagoj
2016-04-01
Presently, most of the existing software is desktop-based, designed to work on a single computer, which represents a major limitation in many ways, starting from limited computer processing, storage power, accessibility, availability, etc. The only feasible solution lies in the web and cloud. This abstract presents research and development of a cloud computing geospatial application for water resources based on free and open source software and open standards using hybrid deployment model of public - private cloud, running on two separate virtual machines (VMs). The first one (VM1) is running on Amazon web services (AWS) and the second one (VM2) is running on a Xen cloud platform. The presented cloud application is developed using free and open source software, open standards and prototype code. The cloud application presents a framework how to develop specialized cloud geospatial application that needs only a web browser to be used. This cloud application is the ultimate collaboration geospatial platform because multiple users across the globe with internet connection and browser can jointly model geospatial objects, enter attribute data and information, execute algorithms, and visualize results. The presented cloud application is: available all the time, accessible from everywhere, it is scalable, works in a distributed computer environment, it creates a real-time multiuser collaboration platform, the programing languages code and components are interoperable, and it is flexible in including additional components. The cloud geospatial application is implemented as a specialized water resources application with three web services for 1) data infrastructure (DI), 2) support for water resources modelling (WRM), 3) user management. The web services are running on two VMs that are communicating over the internet providing services to users. The application was tested on the Zletovica river basin case study with concurrent multiple users. The application is a state-of-the-art cloud geospatial collaboration platform. The presented solution is a prototype and can be used as a foundation for developing of any specialized cloud geospatial applications. Further research will be focused on distributing the cloud application on additional VMs, testing the scalability and availability of services.
Calibration of radio-astronomical data on the cloud. LOFAR, the pathway to SKA
NASA Astrophysics Data System (ADS)
Sabater, J.; Sánchez-Expósito, S.; Garrido, J.; Ruiz, J. E.; Best, P. N.; Verdes-Montenegro, L.
2015-05-01
The radio interferometer LOFAR (LOw Frequency ARray) is fully operational now. This Square Kilometre Array (SKA) pathfinder allows the observation of the sky at frequencies between 10 and 240 MHz, a relatively unexplored region of the spectrum. LOFAR is a software defined telescope: the data is mainly processed using specialized software running in common computing facilities. That means that the capabilities of the telescope are virtually defined by software and mainly limited by the available computing power. However, the quantity of data produced can quickly reach huge volumes (several Petabytes per day). After the correlation and pre-processing of the data in a dedicated cluster, the final dataset is handled to the user (typically several Terabytes). The calibration of these data requires a powerful computing facility in which the specific state of the art software under heavy continuous development can be easily installed and updated. That makes this case a perfect candidate for a cloud infrastructure which adds the advantages of an on demand, flexible solution. We present our approach to the calibration of LOFAR data using Ibercloud, the cloud infrastructure provided by Ibergrid. With the calibration work-flow adapted to the cloud, we can explore calibration strategies for the SKA and show how private or commercial cloud infrastructures (Ibercloud, Amazon EC2, Google Compute Engine, etc.) can help to solve the problems with big datasets that will be prevalent in the future of astronomy.
PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis
2014-01-01
Background High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. Results We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during ‘spliced alignment’ and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. Conclusions PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system. PMID:24894600
PVT: an efficient computational procedure to speed up next-generation sequence analysis.
Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur
2014-06-04
High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during 'spliced alignment' and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system.
Bent, John M.; Faibish, Sorin; Grider, Gary
2015-06-30
Cloud object storage is enabled for archived data, such as checkpoints and results, of high performance computing applications using a middleware process. A plurality of archived files, such as checkpoint files and results, generated by a plurality of processes in a parallel computing system are stored by obtaining the plurality of archived files from the parallel computing system; converting the plurality of archived files to objects using a log structured file system middleware process; and providing the objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
An Adaptive Multilevel Security Framework for the Data Stored in Cloud Environment
Dorairaj, Sudha Devi; Kaliannan, Thilagavathy
2015-01-01
Cloud computing is renowned for delivering information technology services based on internet. Nowadays, organizations are interested in moving their massive data and computations into cloud to reap their significant benefits of on demand service, resource pooling, and rapid elasticity that helps to satisfy the dynamically changing infrastructure demand without the burden of owning, managing, and maintaining it. Since the data needs to be secured throughout its life cycle, security of the data in cloud is a major challenge to be concentrated on because the data is in third party's premises. Any uniform simple or high level security method for all the data either compromises the sensitive data or proves to be too costly with increased overhead. Any common multiple method for all data becomes vulnerable when the common security pattern is identified at the event of successful attack on any information and also encourages more attacks on all other data. This paper suggests an adaptive multilevel security framework based on cryptography techniques that provide adequate security for the classified data stored in cloud. The proposed security system acclimates well for cloud environment and is also customizable and more reliant to meet the required level of security of data with different sensitivity that changes with business needs and commercial conditions. PMID:26258165
An Adaptive Multilevel Security Framework for the Data Stored in Cloud Environment.
Dorairaj, Sudha Devi; Kaliannan, Thilagavathy
2015-01-01
Cloud computing is renowned for delivering information technology services based on internet. Nowadays, organizations are interested in moving their massive data and computations into cloud to reap their significant benefits of on demand service, resource pooling, and rapid elasticity that helps to satisfy the dynamically changing infrastructure demand without the burden of owning, managing, and maintaining it. Since the data needs to be secured throughout its life cycle, security of the data in cloud is a major challenge to be concentrated on because the data is in third party's premises. Any uniform simple or high level security method for all the data either compromises the sensitive data or proves to be too costly with increased overhead. Any common multiple method for all data becomes vulnerable when the common security pattern is identified at the event of successful attack on any information and also encourages more attacks on all other data. This paper suggests an adaptive multilevel security framework based on cryptography techniques that provide adequate security for the classified data stored in cloud. The proposed security system acclimates well for cloud environment and is also customizable and more reliant to meet the required level of security of data with different sensitivity that changes with business needs and commercial conditions.
Content-based histopathology image retrieval using CometCloud.
Qi, Xin; Wang, Daihou; Rodero, Ivan; Diaz-Montes, Javier; Gensure, Rebekah H; Xing, Fuyong; Zhong, Hua; Goodell, Lauri; Parashar, Manish; Foran, David J; Yang, Lin
2014-08-26
The development of digital imaging technology is creating extraordinary levels of accuracy that provide support for improved reliability in different aspects of the image analysis, such as content-based image retrieval, image segmentation, and classification. This has dramatically increased the volume and rate at which data are generated. Together these facts make querying and sharing non-trivial and render centralized solutions unfeasible. Moreover, in many cases this data is often distributed and must be shared across multiple institutions requiring decentralized solutions. In this context, a new generation of data/information driven applications must be developed to take advantage of the national advanced cyber-infrastructure (ACI) which enable investigators to seamlessly and securely interact with information/data which is distributed across geographically disparate resources. This paper presents the development and evaluation of a novel content-based image retrieval (CBIR) framework. The methods were tested extensively using both peripheral blood smears and renal glomeruli specimens. The datasets and performance were evaluated by two pathologists to determine the concordance. The CBIR algorithms that were developed can reliably retrieve the candidate image patches exhibiting intensity and morphological characteristics that are most similar to a given query image. The methods described in this paper are able to reliably discriminate among subtle staining differences and spatial pattern distributions. By integrating a newly developed dual-similarity relevance feedback module into the CBIR framework, the CBIR results were improved substantially. By aggregating the computational power of high performance computing (HPC) and cloud resources, we demonstrated that the method can be successfully executed in minutes on the Cloud compared to weeks using standard computers. In this paper, we present a set of newly developed CBIR algorithms and validate them using two different pathology applications, which are regularly evaluated in the practice of pathology. Comparative experimental results demonstrate excellent performance throughout the course of a set of systematic studies. Additionally, we present and evaluate a framework to enable the execution of these algorithms across distributed resources. We show how parallel searching of content-wise similar images in the dataset significantly reduces the overall computational time to ensure the practical utility of the proposed CBIR algorithms.
NASA Astrophysics Data System (ADS)
Jones, Bob; Casu, Francesco
2013-04-01
The feasibility of using commercial cloud services for scientific research is of great interest to research organisations such as CERN, ESA and EMBL, to the suppliers of cloud-based services and to the national and European funding agencies. Through the Helix Nebula - the Science Cloud [1] initiative and with the support of the European Commission, these stakeholders are driving a two year pilot-phase during which procurement processes and governance issues for a framework of public/private partnership will be appraised. Three initial flagship use cases from high energy physics, molecular biology and earth-observation are being used to validate the approach, enable a cost-benefit analysis to be undertaken and prepare the next stage of the Science Cloud Strategic Plan [2] to be developed and approved. The power of Helix Nebula lies in a shared set of services for initially 3 very different sciences each supporting a global community and thus building a common e-Science platform. Of particular relevance is the ESA sponsored flagship application SuperSites Exploitation Platform (SSEP [3]) that offers the global geo-hazard community a common platform for the correlation and processing of observation data for supersites monitoring. The US-NSF Earth Cube [4] and Ocean Observatory Initiative [5] (OOI) are taking a similar approach for data intensive science. The work of Helix Nebula and its recent architecture model [6] has shown that is it technically feasible to allow publicly funded infrastructures, such as EGI [7] and GEANT [8], to interoperate with commercial cloud services. Such hybrid systems are in the interest of the existing users of publicly funded infrastructures and funding agencies because they will provide "freedom of choice" over the type of computing resources to be consumed and the manner in which they can be obtained. But to offer such freedom-of choice across a spectrum of suppliers, various issues such as intellectual property, legal responsibility, service quality agreements and related issues need to be addressed. Investigating these issues is one of the goals of the Helix Nebula initiative. The next generation of researchers will put aside the historical categorisation of research as a neatly defined set of disciplines and integrate the data from different sources and instruments into complex models that are as applicable to earth observation or biomedicine as they are to high-energy physics. This aggregation of datasets and development of new models will accelerate scientific development but will only be possible if the issues of data intensive science described above are addressed. The culture of science has the possibility to develop with the availability of Helix Nebula as a "Science Cloud" because: • Large scale datasets from many disciplines will be accessible • Scientists and others will be able to develop and contribute open source tools to expand the set of services available • Collaboration of scientists will take place around the on-demand availability of data, tools and services • Cross-domain research will advance at a faster pace due to the availability of a common platform. References: 1 http://www.helix-nebula.eu/ 2 http://cdsweb.cern.ch/record/1374172/files/CERN-OPEN-2011-036.pdf 3 http://www.helix-nebula.eu/index.php/helix-nebula-use-cases/uc3.html 4 http://www.nsf.gov/geo/earthcube/ 5 http://www.oceanobservatories.org/ 6 http://cdsweb.cern.ch/record/1478364/files/HelixNebula-NOTE-2012-001.pdf 7 http://www.nsf.gov/geo/earthcube/ 8 http://www.geant.net/
Volunteered Cloud Computing for Disaster Management
NASA Astrophysics Data System (ADS)
Evans, J. D.; Hao, W.; Chettri, S. R.
2014-12-01
Disaster management relies increasingly on interpreting earth observations and running numerical models; which require significant computing capacity - usually on short notice and at irregular intervals. Peak computing demand during event detection, hazard assessment, or incident response may exceed agency budgets; however some of it can be met through volunteered computing, which distributes subtasks to participating computers via the Internet. This approach has enabled large projects in mathematics, basic science, and climate research to harness the slack computing capacity of thousands of desktop computers. This capacity is likely to diminish as desktops give way to battery-powered mobile devices (laptops, smartphones, tablets) in the consumer market; but as cloud computing becomes commonplace, it may offer significant slack capacity -- if its users are given an easy, trustworthy mechanism for participating. Such a "volunteered cloud computing" mechanism would also offer several advantages over traditional volunteered computing: tasks distributed within a cloud have fewer bandwidth limitations; granular billing mechanisms allow small slices of "interstitial" computing at no marginal cost; and virtual storage volumes allow in-depth, reversible machine reconfiguration. Volunteered cloud computing is especially suitable for "embarrassingly parallel" tasks, including ones requiring large data volumes: examples in disaster management include near-real-time image interpretation, pattern / trend detection, or large model ensembles. In the context of a major disaster, we estimate that cloud users (if suitably informed) might volunteer hundreds to thousands of CPU cores across a large provider such as Amazon Web Services. To explore this potential, we are building a volunteered cloud computing platform and targeting it to a disaster management context. Using a lightweight, fault-tolerant network protocol, this platform helps cloud users join parallel computing projects; automates reconfiguration of their virtual machines; ensures accountability for donated computing; and optimizes the use of "interstitial" computing. Initial applications include fire detection from multispectral satellite imagery and flood risk mapping through hydrological simulations.
Zhu, Lingyun; Li, Lianjie; Meng, Chunyan
2014-12-01
There have been problems in the existing multiple physiological parameter real-time monitoring system, such as insufficient server capacity for physiological data storage and analysis so that data consistency can not be guaranteed, poor performance in real-time, and other issues caused by the growing scale of data. We therefore pro posed a new solution which was with multiple physiological parameters and could calculate clustered background data storage and processing based on cloud computing. Through our studies, a batch processing for longitudinal analysis of patients' historical data was introduced. The process included the resource virtualization of IaaS layer for cloud platform, the construction of real-time computing platform of PaaS layer, the reception and analysis of data stream of SaaS layer, and the bottleneck problem of multi-parameter data transmission, etc. The results were to achieve in real-time physiological information transmission, storage and analysis of a large amount of data. The simulation test results showed that the remote multiple physiological parameter monitoring system based on cloud platform had obvious advantages in processing time and load balancing over the traditional server model. This architecture solved the problems including long turnaround time, poor performance of real-time analysis, lack of extensibility and other issues, which exist in the traditional remote medical services. Technical support was provided in order to facilitate a "wearable wireless sensor plus mobile wireless transmission plus cloud computing service" mode moving towards home health monitoring for multiple physiological parameter wireless monitoring.
2011-08-01
5 Figure 4 Architetural diagram of running Blender on Amazon EC2 through Nimbis...classification of streaming data. Example input images (top left). All digit prototypes (cluster centers) found, with size proportional to frequency (top...Figure 4 Architetural diagram of running Blender on Amazon EC2 through Nimbis 1 http
GTZ: a fast compression and cloud transmission tool optimized for FASTQ files.
Xing, Yuting; Li, Gen; Wang, Zhenguo; Feng, Bolun; Song, Zhuo; Wu, Chengkun
2017-12-28
The dramatic development of DNA sequencing technology is generating real big data, craving for more storage and bandwidth. To speed up data sharing and bring data to computing resource faster and cheaper, it is necessary to develop a compression tool than can support efficient compression and transmission of sequencing data onto the cloud storage. This paper presents GTZ, a compression and transmission tool, optimized for FASTQ files. As a reference-free lossless FASTQ compressor, GTZ treats different lines of FASTQ separately, utilizes adaptive context modelling to estimate their characteristic probabilities, and compresses data blocks with arithmetic coding. GTZ can also be used to compress multiple files or directories at once. Furthermore, as a tool to be used in the cloud computing era, it is capable of saving compressed data locally or transmitting data directly into cloud by choice. We evaluated the performance of GTZ on some diverse FASTQ benchmarks. Results show that in most cases, it outperforms many other tools in terms of the compression ratio, speed and stability. GTZ is a tool that enables efficient lossless FASTQ data compression and simultaneous data transmission onto to cloud. It emerges as a useful tool for NGS data storage and transmission in the cloud environment. GTZ is freely available online at: https://github.com/Genetalks/gtz .
Towards real-time photon Monte Carlo dose calculation in the cloud
NASA Astrophysics Data System (ADS)
Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-01
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
Towards real-time photon Monte Carlo dose calculation in the cloud.
Ziegenhein, Peter; Kozin, Igor N; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-07
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
GOT C+ Survey of [CII] 158 μm Emission: Atomic to Molecular Cloud Transitions in the Inner Galaxy
NASA Astrophysics Data System (ADS)
Velusamy, T.; Langer, W. D.; Willacy, K.; Pineda, J. L.; Goldsmith, P. F.
2013-03-01
We present the results of the distribution of CO-dark H2 gas in a sample of 2223 interstellar clouds in the inner Galaxy (l=-90° to +57°) detected in the velocity resolved [CII] spectra observed in the GOT C+ survey using the Herschel HIFI. We analyze the [CII] intensities along with the ancillary HI, 12CO and 13CO data for each cloud to determine their evolutionary state and to derive the H2 column densities in the C+ and C+/CO transition layers in the cloud. We discuss the overall Galactic distribution of the [CII] clouds and their properties as a function Galactic radius. GOT C+ results on the global distribution of [CII] clouds and CO-dark H2 gas traces the FUV intensity and star formation rate in the Galactic disk.
cryoem-cloud-tools: A software platform to deploy and manage cryo-EM jobs in the cloud.
Cianfrocco, Michael A; Lahiri, Indrajit; DiMaio, Frank; Leschziner, Andres E
2018-06-01
Access to streamlined computational resources remains a significant bottleneck for new users of cryo-electron microscopy (cryo-EM). To address this, we have developed tools that will submit cryo-EM analysis routines and atomic model building jobs directly to Amazon Web Services (AWS) from a local computer or laptop. These new software tools ("cryoem-cloud-tools") have incorporated optimal data movement, security, and cost-saving strategies, giving novice users access to complex cryo-EM data processing pipelines. Integrating these tools into the RELION processing pipeline and graphical user interface we determined a 2.2 Å structure of ß-galactosidase in ∼55 hours on AWS. We implemented a similar strategy to submit Rosetta atomic model building and refinement to AWS. These software tools dramatically reduce the barrier for entry of new users to cloud computing for cryo-EM and are freely available at cryoem-tools.cloud. Copyright © 2018. Published by Elsevier Inc.
Science in the cloud (SIC): A use case in MRI connectomics.
Kiar, Gregory; Gorgolewski, Krzysztof J; Kleissas, Dean; Roncal, William Gray; Litt, Brian; Wandell, Brian; Poldrack, Russel A; Wiener, Martin; Vogelstein, R Jacob; Burns, Randal; Vogelstein, Joshua T
2017-05-01
Modern technologies are enabling scientists to collect extraordinary amounts of complex and sophisticated data across a huge range of scales like never before. With this onslaught of data, we can allow the focal point to shift from data collection to data analysis. Unfortunately, lack of standardized sharing mechanisms and practices often make reproducing or extending scientific results very difficult. With the creation of data organization structures and tools that drastically improve code portability, we now have the opportunity to design such a framework for communicating extensible scientific discoveries. Our proposed solution leverages these existing technologies and standards, and provides an accessible and extensible model for reproducible research, called 'science in the cloud' (SIC). Exploiting scientific containers, cloud computing, and cloud data services, we show the capability to compute in the cloud and run a web service that enables intimate interaction with the tools and data presented. We hope this model will inspire the community to produce reproducible and, importantly, extensible results that will enable us to collectively accelerate the rate at which scientific breakthroughs are discovered, replicated, and extended. © The Author 2017. Published by Oxford University Press.
OpenTopography: Addressing Big Data Challenges Using Cloud Computing, HPC, and Data Analytics
NASA Astrophysics Data System (ADS)
Crosby, C. J.; Nandigam, V.; Phan, M.; Youn, C.; Baru, C.; Arrowsmith, R.
2014-12-01
OpenTopography (OT) is a geoinformatics-based data facility initiated in 2009 for democratizing access to high-resolution topographic data, derived products, and tools. Hosted at the San Diego Supercomputer Center (SDSC), OT utilizes cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures to provide efficient Web based access to large, high-resolution topographic datasets. OT collocates data with processing tools to enable users to quickly access custom data and derived products for their application. OT's ongoing R&D efforts aim to solve emerging technical challenges associated with exponential growth in data, higher order data products, as well as user base. Optimization of data management strategies can be informed by a comprehensive set of OT user access metrics that allows us to better understand usage patterns with respect to the data. By analyzing the spatiotemporal access patterns within the datasets, we can map areas of the data archive that are highly active (hot) versus the ones that are rarely accessed (cold). This enables us to architect a tiered storage environment consisting of high performance disk storage (SSD) for the hot areas and less expensive slower disk for the cold ones, thereby optimizing price to performance. From a compute perspective, OT is looking at cloud based solutions such as the Microsoft Azure platform to handle sudden increases in load. An OT virtual machine image in Microsoft's VM Depot can be invoked and deployed quickly in response to increased system demand. OT has also integrated SDSC HPC systems like the Gordon supercomputer into our infrastructure tier to enable compute intensive workloads like parallel computation of hydrologic routing on high resolution topography. This capability also allows OT to scale to HPC resources during high loads to meet user demand and provide more efficient processing. With a growing user base and maturing scientific user community comes new requests for algorithms and processing capabilities. To address this demand, OT is developing an extensible service based architecture for integrating community-developed software. This "plugable" approach to Web service deployment will enable new processing and analysis tools to run collocated with OT hosted data.
A PACS archive architecture supported on cloud services.
Silva, Luís A Bastião; Costa, Carlos; Oliveira, José Luis
2012-05-01
Diagnostic imaging procedures have continuously increased over the last decade and this trend may continue in coming years, creating a great impact on storage and retrieval capabilities of current PACS. Moreover, many smaller centers do not have financial resources or requirements that justify the acquisition of a traditional infrastructure. Alternative solutions, such as cloud computing, may help address this emerging need. A tremendous amount of ubiquitous computational power, such as that provided by Google and Amazon, are used every day as a normal commodity. Taking advantage of this new paradigm, an architecture for a Cloud-based PACS archive that provides data privacy, integrity, and availability is proposed. The solution is independent from the cloud provider and the core modules were successfully instantiated in examples of two cloud computing providers. Operational metrics for several medical imaging modalities were tabulated and compared for Google Storage, Amazon S3, and LAN PACS. A PACS-as-a-Service archive that provides storage of medical studies using the Cloud was developed. The results show that the solution is robust and that it is possible to store, query, and retrieve all desired studies in a similar way as in a local PACS approach. Cloud computing is an emerging solution that promises high scalability of infrastructures, software, and applications, according to a "pay-as-you-go" business model. The presented architecture uses the cloud to setup medical data repositories and can have a significant impact on healthcare institutions by reducing IT infrastructures.
A System Architecture for Efficient Transmission of Massive DNA Sequencing Data.
Sağiroğlu, Mahmut Şamİl; Külekcİ, M Oğuzhan
2017-11-01
The DNA sequencing data analysis pipelines require significant computational resources. In that sense, cloud computing infrastructures appear as a natural choice for this processing. However, the first practical difficulty in reaching the cloud computing services is the transmission of the massive DNA sequencing data from where they are produced to where they will be processed. The daily practice here begins with compressing the data in FASTQ file format, and then sending these data via fast data transmission protocols. In this study, we address the weaknesses in that daily practice and present a new system architecture that incorporates the computational resources available on the client side while dynamically adapting itself to the available bandwidth. Our proposal considers the real-life scenarios, where the bandwidth of the connection between the parties may fluctuate, and also the computing power on the client side may be of any size ranging from moderate personal computers to powerful workstations. The proposed architecture aims at utilizing both the communication bandwidth and the computing resources for satisfying the ultimate goal of reaching the results as early as possible. We present a prototype implementation of the proposed architecture, and analyze several real-life cases, which provide useful insights for the sequencing centers, especially on deciding when to use a cloud service and in what conditions.
High performance data transfer
NASA Astrophysics Data System (ADS)
Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.
2017-10-01
The exponentially increasing need for high speed data transfer is driven by big data, and cloud computing together with the needs of data intensive science, High Performance Computing (HPC), defense, the oil and gas industry etc. We report on the Zettar ZX software. This has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a scalable, balanced, easy to deploy and use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the- shelf components to test the ZX scalability and ability to balance services using multiple cores, and links. The PoCs are based on SSD flash storage that is managed by a parallel file system. Each cluster occupies 4 rack units. Using the PoCs, between clusters we have achieved almost 200Gbps memory to memory over two 100Gbps links, and 70Gbps parallel file to parallel file with encryption over a 5000 mile 100Gbps link.
Cloud Compute for Global Climate Station Summaries
NASA Astrophysics Data System (ADS)
Baldwin, R.; May, B.; Cogbill, P.
2017-12-01
Global Climate Station Summaries are simple indicators of observational normals which include climatic data summarizations and frequency distributions. These typically are statistical analyses of station data over 5-, 10-, 20-, 30-year or longer time periods. The summaries are computed from the global surface hourly dataset. This dataset totaling over 500 gigabytes is comprised of 40 different types of weather observations with 20,000 stations worldwide. NCEI and the U.S. Navy developed these value added products in the form of hourly summaries from many of these observations. Enabling this compute functionality in the cloud is the focus of the project. An overview of approach and challenges associated with application transition to the cloud will be presented.
Automatic extraction of pavement markings on streets from point cloud data of mobile LiDAR
NASA Astrophysics Data System (ADS)
Gao, Yang; Zhong, Ruofei; Tang, Tao; Wang, Liuzhao; Liu, Xianlin
2017-08-01
Pavement markings provide an important foundation as they help to keep roads users safe. Accurate and comprehensive information about pavement markings assists the road regulators and is useful in developing driverless technology. Mobile light detection and ranging (LiDAR) systems offer new opportunities to collect and process accurate pavement markings’ information. Mobile LiDAR systems can directly obtain the three-dimensional (3D) coordinates of an object, thus defining spatial data and the intensity of (3D) objects in a fast and efficient way. The RGB attribute information of data points can be obtained based on the panoramic camera in the system. In this paper, we present a novel method process to automatically extract pavement markings using multiple attribute information of the laser scanning point cloud from the mobile LiDAR data. This method process utilizes a differential grayscale of RGB color, laser pulse reflection intensity, and the differential intensity to identify and extract pavement markings. We utilized point cloud density to remove the noise and used morphological operations to eliminate the errors. In the application, we tested our method process on different sections of roads in Beijing, China, and Buffalo, NY, USA. The results indicated that both correctness (p) and completeness (r) were higher than 90%. The method process of this research can be applied to extract pavement markings from huge point cloud data produced by mobile LiDAR.
Automatic registration of optical imagery with 3d lidar data using local combined mutual information
NASA Astrophysics Data System (ADS)
Parmehr, E. G.; Fraser, C. S.; Zhang, C.; Leach, J.
2013-10-01
Automatic registration of multi-sensor data is a basic step in data fusion for photogrammetric and remote sensing applications. The effectiveness of intensity-based methods such as Mutual Information (MI) for automated registration of multi-sensor image has been previously reported for medical and remote sensing applications. In this paper, a new multivariable MI approach that exploits complementary information of inherently registered LiDAR DSM and intensity data to improve the robustness of registering optical imagery and LiDAR point cloud, is presented. LiDAR DSM and intensity information has been utilised in measuring the similarity of LiDAR and optical imagery via the Combined MI. An effective histogramming technique is adopted to facilitate estimation of a 3D probability density function (pdf). In addition, a local similarity measure is introduced to decrease the complexity of optimisation at higher dimensions and computation cost. Therefore, the reliability of registration is improved due to the use of redundant observations of similarity. The performance of the proposed method for registration of satellite and aerial images with LiDAR data in urban and rural areas is experimentally evaluated and the results obtained are discussed.
Earthdata Cloud Analytics Project
NASA Technical Reports Server (NTRS)
Ramachandran, Rahul; Lynnes, Chris
2018-01-01
This presentation describes a nascent project in NASA to develop a framework to support end-user analytics of NASA's Earth science data in the cloud. The chief benefit of migrating EOSDIS (Earth Observation System Data and Information Systems) data to the cloud is to position the data next to enormous computing capacity to allow end users to process data at scale. The Earthdata Cloud Analytics project will user a service-based approach to facilitate the infusion of evolving analytics technology and the integration with non-NASA analytics or other complementary functionality at other agencies and in other nations.
The EPOS Vision for the Open Science Cloud
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Harrison, Matt; Cocco, Massimo
2016-04-01
Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time and so CLOUD computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of Cloud Computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications to allow CLOUD middleware to optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use. Already discussions are underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS.. Provision of access to ICS-Ds from ICS-C concerns several aspects: (a) Technical : it may be more or less difficult to connect and pass from ICS-C to the ICS-d/ CES the 'package' (probably a virtual machine) of data and software; (b) Security/privacy : including passing personal information e.g. related to AAAI (Authentication, authorization, accounting Infrastructure); (c) financial and legal : such as payment, licence conditions; Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of federated approach to HPC(High Performance Computing) -HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.
Sideloading - Ingestion of Large Point Clouds Into the Apache Spark Big Data Engine
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.; Alis, C.
2016-06-01
In the geospatial domain we have now reached the point where data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally lucrative to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not naturally supported by the existing big data frameworks. Instead such file formats are supported by software libraries that are restricted to single CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster and we discuss the implications on scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
Cloud-based crowd sensing: a framework for location-based crowd analyzer and advisor
NASA Astrophysics Data System (ADS)
Aishwarya, K. C.; Nambi, A.; Hudson, S.; Nadesh, R. K.
2017-11-01
Cloud computing is an emerging field of computer science to integrate and explore large and powerful computing systems and storages for personal and also for enterprise requirements. Mobile Cloud Computing is the inheritance of this concept towards mobile hand-held devices. Crowdsensing, or to be precise, Mobile Crowdsensing is the process of sharing resources from an available group of mobile handheld devices that support sharing of different resources such as data, memory and bandwidth to perform a single task for collective reasons. In this paper, we propose a framework to use Crowdsensing and perform a crowd analyzer and advisor whether the user can go to the place or not. This is an ongoing research and is a new concept to which the direction of cloud computing has shifted and is viable for more expansion in the near future.
Secure Skyline Queries on Cloud Platform
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-01-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
Binary-selectable detector holdoff circuit
NASA Technical Reports Server (NTRS)
Kadrmas, K. A.
1974-01-01
High-speed switching circuit protects detectors from sudden, extremely-intense backscattered radiation that results from short-range atmospheric dust layers, or low-level clouds, entering laser/radar field of view. Function of circuit is to provide computer-controlled switching of photodiode detector, preamplifier power-supply voltages, in approximately 10 nanoseconds.
Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola
2015-01-01
New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record.
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to review systematically the studies conducted in the field of cloud computing. The present study was a systematic review conducted in 2017. Search was performed in the Scopus, Web of Sciences, IEEE, Pub Med and Google Scholar databases by combination keywords. From the 431 article that selected at the first, after applying the inclusion and exclusion criteria, 27 articles were selected for surveyed. Data gathering was done by a self-made check list and was analyzed by content analysis method. The finding of this study showed that cloud computing is a very widespread technology. It includes domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of Cloud Computing, ability to search and exploration, reducing errors and improving the quality, structure, flexibility and sharing ability. It will be effective for electronic health record. According to the findings of the present study, higher capabilities of cloud computing are useful in implementing EHR in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of HER, it is recommended to use this technology.
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
Background: With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to review systematically the studies conducted in the field of cloud computing. Methods: The present study was a systematic review conducted in 2017. Search was performed in the Scopus, Web of Sciences, IEEE, Pub Med and Google Scholar databases by combination keywords. From the 431 article that selected at the first, after applying the inclusion and exclusion criteria, 27 articles were selected for surveyed. Data gathering was done by a self-made check list and was analyzed by content analysis method. Results: The finding of this study showed that cloud computing is a very widespread technology. It includes domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of Cloud Computing, ability to search and exploration, reducing errors and improving the quality, structure, flexibility and sharing ability. It will be effective for electronic health record. Conclusion: According to the findings of the present study, higher capabilities of cloud computing are useful in implementing EHR in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of HER, it is recommended to use this technology. PMID:29719309
Towards Efficient Scientific Data Management Using Cloud Storage
NASA Technical Reports Server (NTRS)
He, Qiming
2013-01-01
A software prototype allows users to backup and restore data to/from both public and private cloud storage such as Amazon's S3 and NASA's Nebula. Unlike other off-the-shelf tools, this software ensures user data security in the cloud (through encryption), and minimizes users operating costs by using space- and bandwidth-efficient compression and incremental backup. Parallel data processing utilities have also been developed by using massively scalable cloud computing in conjunction with cloud storage. One of the innovations in this software is using modified open source components to work with a private cloud like NASA Nebula. Another innovation is porting the complex backup to- cloud software to embedded Linux, running on the home networking devices, in order to benefit more users.
NASA Astrophysics Data System (ADS)
Lin, Guofen; Hong, Hanshu; Xia, Yunhao; Sun, Zhixin
2017-10-01
Attribute-based encryption (ABE) is an interesting cryptographic technique for flexible cloud data sharing access control. However, some open challenges hinder its practical application. In previous schemes, all attributes are considered as in the same status while they are not in most of practical scenarios. Meanwhile, the size of access policy increases dramatically with the raise of its expressiveness complexity. In addition, current research hardly notices that mobile front-end devices, such as smartphones, are poor in computational performance while too much bilinear pairing computation is needed for ABE. In this paper, we propose a key-policy weighted attribute-based encryption without bilinear pairing computation (KP-WABE-WB) for secure cloud data sharing access control. A simple weighted mechanism is presented to describe different importance of each attribute. We introduce a novel construction of ABE without executing any bilinear pairing computation. Compared to previous schemes, our scheme has a better performance in expressiveness of access policy and computational efficiency.
SenseMyHeart: A cloud service and API for wearable heart monitors.
Pinto Silva, P M; Silva Cunha, J P
2015-01-01
In the era of ubiquitous computing, the growing adoption of wearable systems and body sensor networks is trailing the path for new research and software for cardiovascular intensity, energy expenditure and stress and fatigue detection through cardiovascular monitoring. Several systems have received clinical-certification and provide huge amounts of reliable heart-related data in a continuous basis. PhysioNet provides equally reliable open-source software tools for ECG processing and analysis that can be combined with these devices. However, this software remains difficult to use in a mobile environment and for researchers unfamiliar with Linux-based systems. In the present paper we present an approach that aims at tackling these limitations by developing a cloud service that provides an API for a PhysioNet-based pipeline for ECG processing and Heart Rate Variability measurement. We describe the proposed solution, along with its advantages and tradeoffs. We also present some client tools (windows and Android) and several projects where the developed cloud service has been used successfully as a standard for Heart Rate and Heart Rate Variability studies in different scenarios.
Two-Cloud-Servers-Assisted Secure Outsourcing Multiparty Computation
Wen, Qiaoyan; Zhang, Hua; Jin, Zhengping; Li, Wenmin
2014-01-01
We focus on how to securely outsource computation task to the cloud and propose a secure outsourcing multiparty computation protocol on lattice-based encrypted data in two-cloud-servers scenario. Our main idea is to transform the outsourced data respectively encrypted by different users' public keys to the ones that are encrypted by the same two private keys of the two assisted servers so that it is feasible to operate on the transformed ciphertexts to compute an encrypted result following the function to be computed. In order to keep the privacy of the result, the two servers cooperatively produce a custom-made result for each user that is authorized to get the result so that all authorized users can recover the desired result while other unauthorized ones including the two servers cannot. Compared with previous research, our protocol is completely noninteractive between any users, and both of the computation and the communication complexities of each user in our solution are independent of the computing function. PMID:24982949
Two-cloud-servers-assisted secure outsourcing multiparty computation.
Sun, Yi; Wen, Qiaoyan; Zhang, Yudong; Zhang, Hua; Jin, Zhengping; Li, Wenmin
2014-01-01
We focus on how to securely outsource computation task to the cloud and propose a secure outsourcing multiparty computation protocol on lattice-based encrypted data in two-cloud-servers scenario. Our main idea is to transform the outsourced data respectively encrypted by different users' public keys to the ones that are encrypted by the same two private keys of the two assisted servers so that it is feasible to operate on the transformed ciphertexts to compute an encrypted result following the function to be computed. In order to keep the privacy of the result, the two servers cooperatively produce a custom-made result for each user that is authorized to get the result so that all authorized users can recover the desired result while other unauthorized ones including the two servers cannot. Compared with previous research, our protocol is completely noninteractive between any users, and both of the computation and the communication complexities of each user in our solution are independent of the computing function.
Cloud GIS Based Watershed Management
NASA Astrophysics Data System (ADS)
Bediroğlu, G.; Colak, H. E.
2017-11-01
In this study, we generated a Cloud GIS based watershed management system with using Cloud Computing architecture. Cloud GIS is used as SAAS (Software as a Service) and DAAS (Data as a Service). We applied GIS analysis on cloud in terms of testing SAAS and deployed GIS datasets on cloud in terms of DAAS. We used Hybrid cloud computing model in manner of using ready web based mapping services hosted on cloud (World Topology, Satellite Imageries). We uploaded to system after creating geodatabases including Hydrology (Rivers, Lakes), Soil Maps, Climate Maps, Rain Maps, Geology and Land Use. Watershed of study area has been determined on cloud using ready-hosted topology maps. After uploading all the datasets to systems, we have applied various GIS analysis and queries. Results shown that Cloud GIS technology brings velocity and efficiency for watershed management studies. Besides this, system can be easily implemented for similar land analysis and management studies.
NASA Astrophysics Data System (ADS)
Vilotte, J.-P.; Atkinson, M.; Michelini, A.; Igel, H.; van Eck, T.
2012-04-01
Increasingly dense seismic and geodetic networks are continuously transmitting a growing wealth of data from around the world. The multi-use of these data leaded the seismological community to pioneer globally distributed open-access data infrastructures, standard services and formats, e.g., the Federation of Digital Seismic Networks (FDSN) and the European Integrated Data Archives (EIDA). Our ability to acquire observational data outpaces our ability to manage, analyze and model them. Research in seismology is today facing a fundamental paradigm shift. Enabling advanced data-intensive analysis and modeling applications challenges conventional storage, computation and communication models and requires a new holistic approach. It is instrumental to exploit the cornucopia of data, and to guarantee optimal operation and design of the high-cost monitoring facilities. The strategy of VERCE is driven by the needs of the seismological data-intensive applications in data analysis and modeling. It aims to provide a comprehensive architecture and framework adapted to the scale and the diversity of those applications, and integrating the data infrastructures with Grid, Cloud and HPC infrastructures. It will allow prototyping solutions for new use cases as they emerge within the European Plate Observatory Systems (EPOS), the ESFRI initiative of the solid Earth community. Computational seismology, and information management, is increasingly revolving around massive amounts of data that stem from: (1) the flood of data from the observational systems; (2) the flood of data from large-scale simulations and inversions; (3) the ability to economically store petabytes of data online; (4) the evolving Internet and Data-aware computing capabilities. As data-intensive applications are rapidly increasing in scale and complexity, they require additional services-oriented architectures offering a virtualization-based flexibility for complex and re-usable workflows. Scientific information management poses computer science challenges: acquisition, organization, query and visualization tasks scale almost linearly with the data volumes. Commonly used FTP-GREP metaphor allows today to scan gigabyte-sized datasets but will not work for scanning terabyte-sized continuous waveform datasets. New data analysis and modeling methods, exploiting the signal coherence within dense network arrays, are nonlinear. Pair-algorithms on N points scale as N2. Wave form inversion and stochastic simulations raise computing and data handling challenges These applications are unfeasible for tera-scale datasets without new parallel algorithms that use near-linear processing, storage and bandwidth, and that can exploit new computing paradigms enabled by the intersection of several technologies (HPC, parallel scalable database crawler, data-aware HPC). This issues will be discussed based on a number of core pilot data-intensive applications and use cases retained in VERCE. This core applications are related to: (1) data processing and data analysis methods based on correlation techniques; (2) cpu-intensive applications such as large-scale simulation of synthetic waveforms in complex earth systems, and full waveform inversion and tomography. We shall analyze their workflow and data flow, and their requirements for a new service-oriented architecture and a data-aware platform with services and tools. Finally, we will outline the importance of a new collaborative environment between seismology and computer science, together with the need for the emergence and the recognition of 'research technologists' mastering the evolving data-aware technologies and the data-intensive research goals in seismology.
Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.
Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu
2015-01-01
The power of cloud computing and distributed computing has been harnessed to handle vast and heterogeneous data required to be processed in any virtual screening protocol. A cloud computing platorm ChemInfoCloud was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It has also been provided with some of the bioinformatics functionalities including sequence alignment, active site pose prediction and protein ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users using container based virtualization, OpenVz.
Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.
Tran, Ngoc Tam L; Huang, Chun-Hsi
2017-05-01
We developed the cloud-based MOTIFSIM on Amazon Web Services (AWS) cloud. The tool is an extended version from our web-based tool version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from the next-generation sequencing technology. The tool is highly scalable with expandable AWS.
Molnár-Gábor, Fruzsina; Lueck, Rupert; Yakneen, Sergei; Korbel, Jan O
2017-06-20
Biomedical research is becoming increasingly large-scale and international. Cloud computing enables the comprehensive integration of genomic and clinical data, and the global sharing and collaborative processing of these data within a flexibly scalable infrastructure. Clouds offer novel research opportunities in genomics, as they facilitate cohort studies to be carried out at unprecedented scale, and they enable computer processing with superior pace and throughput, allowing researchers to address questions that could not be addressed by studies using limited cohorts. A well-developed example of such research is the Pan-Cancer Analysis of Whole Genomes project, which involves the analysis of petabyte-scale genomic datasets from research centers in different locations or countries and different jurisdictions. Aside from the tremendous opportunities, there are also concerns regarding the utilization of clouds; these concerns pertain to perceived limitations in data security and protection, and the need for due consideration of the rights of patient donors and research participants. Furthermore, the increased outsourcing of information technology impedes the ability of researchers to act within the realm of existing local regulations owing to fundamental differences in the understanding of the right to data protection in various legal systems. In this Opinion article, we address the current opportunities and limitations of cloud computing and highlight the responsible use of federated and hybrid clouds that are set up between public and private partners as an adequate solution for genetics and genomics research in Europe, and under certain conditions between Europe and international partners. This approach could represent a sensible middle ground between fragmented individual solutions and a "one-size-fits-all" approach.
NASA Astrophysics Data System (ADS)
Chu, C.; Sun-Mack, S.; Chen, Y.; Heckert, E.; Doelling, D. R.
2017-12-01
In Langley NASA, Clouds and the Earth's Radiant Energy System (CERES) and Moderate Resolution Imaging Spectroradiometer (MODIS) are merged with Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) on the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) and CloudSat Cloud Profiling Radar (CPR). The CERES merged product (C3M) matches up to three CALIPSO footprints with each MODIS pixel along its ground track. It then assigns the nearest CloudSat footprint to each of those MODIS pixels. The cloud properties from MODIS, retrieved using the CERES algorithms, are included in C3M with the matched CALIPSO and CloudSat products along with radiances from 18 MODIS channels. The dataset is used to validate the CERES retrieved MODIS cloud properties and the computed TOA and surface flux difference using MODIS or CALIOP/CloudSAT retrieved clouds. This information is then used to tune the computed fluxes to match the CERES observed TOA flux. A visualization tool will be invaluable to determine the cause of these large cloud and flux differences in order to improve the methodology. This effort is part of larger effort to allow users to order the CERES C3M product sub-setted by time and parameter as well as the previously mentioned visualization capabilities. This presentation will show a new graphical 3D-interface, 3D-CERESVis, that allows users to view both passive remote sensing satellites (MODIS and CERES) and active satellites (CALIPSO and CloudSat), such that the detailed vertical structures of cloud properties from CALIPSO and CloudSat are displayed side by side with horizontally retrieved cloud properties from MODIS and CERES. Similarly, the CERES computed profile fluxes whether using MODIS or CALIPSO and CloudSat clouds can also be compared. 3D-CERESVis is a browser-based visualization tool that makes uses of techniques such as multiple synchronized cursors, COLLADA format data and Cesium.
Sahoo, Satya S; Jayapandian, Catherine; Garg, Gaurav; Kaffashi, Farhad; Chung, Stephanie; Bozorgi, Alireza; Chen, Chien-Hun; Loparo, Kenneth; Lhatoo, Samden D; Zhang, Guo-Qiang
2014-01-01
The rapidly growing volume of multimodal electrophysiological signal data is playing a critical role in patient care and clinical research across multiple disease domains, such as epilepsy and sleep medicine. To facilitate secondary use of these data, there is an urgent need to develop novel algorithms and informatics approaches using new cloud computing technologies as well as ontologies for collaborative multicenter studies. We present the Cloudwave platform, which (a) defines parallelized algorithms for computing cardiac measures using the MapReduce parallel programming framework, (b) supports real-time interaction with large volumes of electrophysiological signals, and (c) features signal visualization and querying functionalities using an ontology-driven web-based interface. Cloudwave is currently used in the multicenter National Institute of Neurological Diseases and Stroke (NINDS)-funded Prevention and Risk Identification of SUDEP (sudden unexplained death in epilepsy) Mortality (PRISM) project to identify risk factors for sudden death in epilepsy. Comparative evaluations of Cloudwave with traditional desktop approaches to compute cardiac measures (eg, QRS complexes, RR intervals, and instantaneous heart rate) on epilepsy patient data show one order of magnitude improvement for single-channel ECG data and 20 times improvement for four-channel ECG data. This enables Cloudwave to support real-time user interaction with signal data, which is semantically annotated with a novel epilepsy and seizure ontology. Data privacy is a critical issue in using cloud infrastructure, and cloud platforms, such as Amazon Web Services, offer features to support Health Insurance Portability and Accountability Act standards. The Cloudwave platform is a new approach to leverage of large-scale electrophysiological data for advancing multicenter clinical research.
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
A computational- And storage-cloud for integration of biodiversity collections
Matsunaga, A.; Thompson, A.; Figueiredo, R. J.; Germain-Aubrey, C.C; Collins, M.; Beeman, R.S; Macfadden, B.J.; Riccardi, G.; Soltis, P.S; Page, L. M.; Fortes, J.A.B
2013-01-01
A core mission of the Integrated Digitized Biocollections (iDigBio) project is the building and deployment of a cloud computing environment customized to support the digitization workflow and integration of data from all U.S. nonfederal biocollections. iDigBio chose to use cloud computing technologies to deliver a cyberinfrastructure that is flexible, agile, resilient, and scalable to meet the needs of the biodiversity community. In this context, this paper describes the integration of open source cloud middleware, applications, and third party services using standard formats, protocols, and services. In addition, this paper demonstrates the value of the digitized information from collections in a broader scenario involving multiple disciplines.
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
Investigation of Storage Options for Scientific Computing on Grid and Cloud Facilities
NASA Astrophysics Data System (ADS)
Garzoglio, Gabriele
2012-12-01
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storage server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on “bare metal” nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.
Crowd-Funding: A New Resource Cooperation Mode for Mobile Cloud Computing.
Zhang, Nan; Yang, Xiaolong; Zhang, Min; Sun, Yan
2016-01-01
Mobile cloud computing, which integrates the cloud computing techniques into the mobile environment, is regarded as one of the enabler technologies for 5G mobile wireless networks. There are many sporadic spare resources distributed within various devices in the networks, which can be used to support mobile cloud applications. However, these devices, with only a few spare resources, cannot support some resource-intensive mobile applications alone. If some of them cooperate with each other and share their resources, then they can support many applications. In this paper, we propose a resource cooperative provision mode referred to as "Crowd-funding", which is designed to aggregate the distributed devices together as the resource provider of mobile applications. Moreover, to facilitate high-efficiency resource management via dynamic resource allocation, different resource providers should be selected to form a stable resource coalition for different requirements. Thus, considering different requirements, we propose two different resource aggregation models for coalition formation. Finally, we may allocate the revenues based on their attributions according to the concept of the "Shapley value" to enable a more impartial revenue share among the cooperators. It is shown that a dynamic and flexible resource-management method can be developed based on the proposed Crowd-funding model, relying on the spare resources in the network.
Crowd-Funding: A New Resource Cooperation Mode for Mobile Cloud Computing
Zhang, Min; Sun, Yan
2016-01-01
Mobile cloud computing, which integrates the cloud computing techniques into the mobile environment, is regarded as one of the enabler technologies for 5G mobile wireless networks. There are many sporadic spare resources distributed within various devices in the networks, which can be used to support mobile cloud applications. However, these devices, with only a few spare resources, cannot support some resource-intensive mobile applications alone. If some of them cooperate with each other and share their resources, then they can support many applications. In this paper, we propose a resource cooperative provision mode referred to as "Crowd-funding", which is designed to aggregate the distributed devices together as the resource provider of mobile applications. Moreover, to facilitate high-efficiency resource management via dynamic resource allocation, different resource providers should be selected to form a stable resource coalition for different requirements. Thus, considering different requirements, we propose two different resource aggregation models for coalition formation. Finally, we may allocate the revenues based on their attributions according to the concept of the "Shapley value" to enable a more impartial revenue share among the cooperators. It is shown that a dynamic and flexible resource-management method can be developed based on the proposed Crowd-funding model, relying on the spare resources in the network. PMID:28030553
Investigation of storage options for scientific computing on Grid and Cloud facilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garzoglio, Gabriele
In recent years, several new storage technologies, such as Lustre, Hadoop, OrangeFS, and BlueArc, have emerged. While several groups have run benchmarks to characterize them under a variety of configurations, more work is needed to evaluate these technologies for the use cases of scientific computing on Grid clusters and Cloud facilities. This paper discusses our evaluation of the technologies as deployed on a test bed at FermiCloud, one of the Fermilab infrastructure-as-a-service Cloud facilities. The test bed consists of 4 server-class nodes with 40 TB of disk space and up to 50 virtual machine clients, some running on the storagemore » server nodes themselves. With this configuration, the evaluation compares the performance of some of these technologies when deployed on virtual machines and on bare metal nodes. In addition to running standard benchmarks such as IOZone to check the sanity of our installation, we have run I/O intensive tests using physics-analysis applications. This paper presents how the storage solutions perform in a variety of realistic use cases of scientific computing. One interesting difference among the storage systems tested is found in a decrease in total read throughput with increasing number of client processes, which occurs in some implementations but not others.« less
Improving Individual Acceptance of Health Clouds through Confidentiality Assurance.
Ermakova, Tatiana; Fabian, Benjamin; Zarnekow, Rüdiger
2016-10-26
Cloud computing promises to essentially improve healthcare delivery performance. However, shifting sensitive medical records to third-party cloud providers could create an adoption hurdle because of security and privacy concerns. This study examines the effect of confidentiality assurance in a cloud-computing environment on individuals' willingness to accept the infrastructure for inter-organizational sharing of medical data. We empirically investigate our research question by a survey with over 260 full responses. For the setting with a high confidentiality assurance, we base on a recent multi-cloud architecture which provides very high confidentiality assurance through a secret-sharing mechanism: Health information is cryptographically encoded and distributed in a way that no single and no small group of cloud providers is able to decode it. Our results indicate the importance of confidentiality assurance in individuals' acceptance of health clouds for sensitive medical data. Specifically, this finding holds for a variety of practically relevant circumstances, i.e., in the absence and despite the presence of conventional offline alternatives and along with pseudonymization. On the other hand, we do not find support for the effect of confidentiality assurance in individuals' acceptance of health clouds for non-sensitive medical data. These results could support the process of privacy engineering for health-cloud solutions.
Improving Individual Acceptance of Health Clouds through Confidentiality Assurance
Fabian, Benjamin; Zarnekow, Rüdiger
2016-01-01
Summary Background Cloud computing promises to essentially improve healthcare delivery performance. However, shifting sensitive medical records to third-party cloud providers could create an adoption hurdle because of security and privacy concerns. Objectives This study examines the effect of confidentiality assurance in a cloud-computing environment on individuals’ willingness to accept the infrastructure for inter-organizational sharing of medical data. Methods We empirically investigate our research question by a survey with over 260 full responses. For the setting with a high confidentiality assurance, we base on a recent multi-cloud architecture which provides very high confidentiality assurance through a secret-sharing mechanism: Health information is cryptographically encoded and distributed in a way that no single and no small group of cloud providers is able to decode it. Results Our results indicate the importance of confidentiality assurance in individuals’ acceptance of health clouds for sensitive medical data. Specifically, this finding holds for a variety of practically relevant circumstances, i.e., in the absence and despite the presence of conventional offline alternatives and along with pseudonymization. On the other hand, we do not find support for the effect of confidentiality assurance in individuals’ acceptance of health clouds for non-sensitive medical data. These results could support the process of privacy engineering for health-cloud solutions. PMID:27781238
NASA Astrophysics Data System (ADS)
Esmaili, Rebekah Bradley
Global climate models, numerical weather prediction, and flood models rely on accurate satellite precipitation products, which are the only datasets that are continuous in time and space across the globe. While there are more earth observing satellites than ever before, gaps in precipitation retrievals exist due to sensor and orbital limitations of low-earth (LEO) satellites, which are overcome by merging data from different sensors in satellite precipitation products (SPPs). Using cloud tracking at higher resolutions than the spatio-temporal scales of LEO satellites, this thesis examines how clouds typically form in the atmosphere, the rate that cloud size and temperature evolve over the life cycle, and the time of day that cloud development take place. This thesis found that cloud evolution was non-linear, which disagrees with the linear interpolation schemes used in SPPs. Longer lasting clouds tended to achieve their temperature and size maturity milestones at different times, while these stages often occurred simultaneously in shorter lasting clouds. Over the ocean, longer lasting clouds were found to occur more frequently at night, while shorter lasting clouds were more common during the daytime. This thesis also examines whether large-scale Saharan dust outbreaks can impact the trajectories and intensity of cloud clusters in the tropical Atlantic, which is predicted by modeling studies. The presented results show that proximity to Saharan dust outbreaks shifts Atlantic cloud development northward and intense storms becoming more common, whereas on days with low dust loading small-scale, warmer clouds are more common. A simplified view of cloud evolution in merged rainfall retrievals is a possible source of errors, which can propagate into higher level analysis. This thesis investigates the difference in the intensity, duration, and frequency of precipitation in IMERG, a next-generation satellite precipitation product with ground radar observations over the contiguous United States. There was agreement on seasonal totals, but closer examination shows that the average intensity and duration of events is too high, and too infrequent compared to events detected on the ground. Awareness of the strengths and limitations, particularly in context of high-resolution cloud development, can enhance SPPs and can complement climate model simulations.
2010-04-29
Cloud Computing The answer, my friend, is blowing in the wind. The answer is blowing in the wind. 1Bingue ‐ Cook Cloud Computing STSC 2010... Cloud Computing STSC 2010 Objectives • Define the cloud • Risks of cloud computing f l d i• Essence o c ou comput ng • Deployed clouds in DoD 3Bingue...Cook Cloud Computing STSC 2010 Definitions of Cloud Computing Cloud computing is a model for enabling b d d ku
MaMR: High-performance MapReduce programming model for material cloud applications
NASA Astrophysics Data System (ADS)
Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng
2017-02-01
With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related data, and the processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications that supports multiple different Map and Reduce functions running concurrently based on hybrid share-memory BSP called MaMR. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.
Airborne Cloud Computing Environment (ACCE)
NASA Technical Reports Server (NTRS)
Hardman, Sean; Freeborn, Dana; Crichton, Dan; Law, Emily; Kay-Im, Liz
2011-01-01
Airborne Cloud Computing Environment (ACCE) is JPL's internal investment to improve the return on airborne missions. Improve development performance of the data system. Improve return on the captured science data. The investment is to develop a common science data system capability for airborne instruments that encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis in order to improve the quality, cost effectiveness, and capabilities to enable new scientific discovery and research in earth observation.
Enabling Analytics in the Cloud for Earth Science Data
NASA Technical Reports Server (NTRS)
Ramachandran, Rahul; Lynnes, Christopher; Bingham, Andrew W.; Quam, Brandi M.
2018-01-01
The purpose of this workshop was to hold interactive discussions where providers, users, and other stakeholders could explore the convergence of three main elements in the rapidly developing world of technology: Big Data, Cloud Computing, and Analytics, [for earth science data].
Lightning data study in conjunction with geostationary satellite data
NASA Technical Reports Server (NTRS)
Auvine, Brian; Martin, David W.
1987-01-01
During the summer of 1985, cloud-to-ground stroke lightning were collected. Thirty minute samples of lightning were compared with GOES IR fractional cold cloud coverage computed for three temperature thresholds (213, 243, and 273 K) twice daily (morning and evening). It was found that satellite measurements of cold cloud have a relationship to the flashrate and, in a more limited way, to the polarity and numbers of return strokes. Results varied little by location. Lightning, especially positive strokes, was found to be correlated with fractional cloud coverage, especially for clouds at or below 213 K. Other data and correlations are discussed.