Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.
Cloud computing applications for biomedical science: A perspective
2018-01-01
Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176
If It's in the Cloud, Get It on Paper: Cloud Computing Contract Issues
ERIC Educational Resources Information Center
Trappler, Thomas J.
2010-01-01
Much recent discussion has focused on the pros and cons of cloud computing. Some institutions are attracted to cloud computing benefits such as rapid deployment, flexible scalability, and low initial start-up cost, while others are concerned about cloud computing risks such as those related to data location, level of service, and security…
2010-07-01
Cloud computing , an emerging form of computing in which users have access to scalable, on-demand capabilities that are provided through Internet... cloud computing , (2) the information security implications of using cloud computing services in the Federal Government, and (3) federal guidance and...efforts to address information security when using cloud computing . The complete report is titled Information Security: Federal Guidance Needed to
CloudMan as a platform for tool, data, and analysis distribution.
Afgan, Enis; Chapman, Brad; Taylor, James
2012-11-27
Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
Cloud Computing in Support of Synchronized Disaster Response Operations
2010-09-01
scalable, Web application based on cloud computing technologies to facilitate communication between a broad range of public and private entities without...requiring them to compromise security or competitive advantage. The proposed design applies the unique benefits of cloud computing architectures such as
CloudMan as a platform for tool, data, and analysis distribution
2012-01-01
Background Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions. PMID:23181507
A Strategic Approach to Network Defense: Framing the Cloud
2011-03-10
accepted network defensive principles, to reduce risks associated with emerging virtualization capabilities and scalability of cloud computing . This expanded...defensive framework can assist enterprise networking and cloud computing architects to better design more secure systems.
ERIC Educational Resources Information Center
Islam, Muhammad Faysal
2013-01-01
Cloud computing offers the advantage of on-demand, reliable and cost efficient computing solutions without the capital investment and management resources to build and maintain in-house data centers and network infrastructures. Scalability of cloud solutions enable consumers to upgrade or downsize their services as needed. In a cloud environment,…
Flexible services for the support of research.
Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John
2013-01-28
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
NASA Astrophysics Data System (ADS)
Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.
2014-06-01
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
Scalable cloud without dedicated storage
NASA Astrophysics Data System (ADS)
Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.
2015-05-01
We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.
Serving ocean model data on the cloud
Meisinger, Michael; Farcas, Claudiu; Farcas, Emilia; Alexander, Charles; Arrott, Matthew; de La Beaujardiere, Jeff; Hubbard, Paul; Mendelssohn, Roy; Signell, Richard P.
2010-01-01
The NOAA-led Integrated Ocean Observing System (IOOS) and the NSF-funded Ocean Observatories Initiative Cyberinfrastructure Project (OOI-CI) are collaborating on a prototype data delivery system for numerical model output and other gridded data using cloud computing. The strategy is to take an existing distributed system for delivering gridded data and redeploy on the cloud, making modifications to the system that allow it to harness the scalability of the cloud as well as adding functionality that the scalability affords.
Security and Cloud Outsourcing Framework for Economic Dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consist of inhouse high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible for potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for themore » Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program, that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperforms the in-house infrastructure.« less
Security and Cloud Outsourcing Framework for Economic Dispatch
Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi; ...
2017-04-24
The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consist of inhouse high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible for potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for themore » Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program, that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperforms the in-house infrastructure.« less
Quantitative Investigation of the Technologies That Support Cloud Computing
ERIC Educational Resources Information Center
Hu, Wenjin
2014-01-01
Cloud computing is dramatically shaping modern IT infrastructure. It virtualizes computing resources, provides elastic scalability, serves as a pay-as-you-use utility, simplifies the IT administrators' daily tasks, enhances the mobility and collaboration of data, and increases user productivity. We focus on providing generalized black-box…
BlueSky Cloud Framework: An E-Learning Framework Embracing Cloud Computing
NASA Astrophysics Data System (ADS)
Dong, Bo; Zheng, Qinghua; Qiao, Mu; Shu, Jian; Yang, Jie
Currently, E-Learning has grown into a widely accepted way of learning. With the huge growth of users, services, education contents and resources, E-Learning systems are facing challenges of optimizing resource allocations, dealing with dynamic concurrency demands, handling rapid storage growth requirements and cost controlling. In this paper, an E-Learning framework based on cloud computing is presented, namely BlueSky cloud framework. Particularly, the architecture and core components of BlueSky cloud framework are introduced. In BlueSky cloud framework, physical machines are virtualized, and allocated on demand for E-Learning systems. Moreover, BlueSky cloud framework combines with traditional middleware functions (such as load balancing and data caching) to serve for E-Learning systems as a general architecture. It delivers reliable, scalable and cost-efficient services to E-Learning systems, and E-Learning organizations can establish systems through these services in a simple way. BlueSky cloud framework solves the challenges faced by E-Learning, and improves the performance, availability and scalability of E-Learning systems.
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
Scalable and responsive event processing in the cloud
Suresh, Visalakshmi; Ezhilchelvan, Paul; Watson, Paul
2013-01-01
Event processing involves continuous evaluation of queries over streams of events. Response-time optimization is traditionally done over a fixed set of nodes and/or by using metrics measured at query-operator levels. Cloud computing makes it easy to acquire and release computing nodes as required. Leveraging this flexibility, we propose a novel, queueing-theory-based approach for meeting specified response-time targets against fluctuating event arrival rates by drawing only the necessary amount of computing resources from a cloud platform. In the proposed approach, the entire processing engine of a distinct query is modelled as an atomic unit for predicting response times. Several such units hosted on a single node are modelled as a multiple class M/G/1 system. These aspects eliminate intrusive, low-level performance measurements at run-time, and also offer portability and scalability. Using model-based predictions, cloud resources are efficiently used to meet response-time targets. The efficacy of the approach is demonstrated through cloud-based experiments. PMID:23230164
Homomorphic encryption experiments on IBM's cloud quantum computing platform
NASA Astrophysics Data System (ADS)
Huang, He-Liang; Zhao, You-Wei; Li, Tan; Li, Feng-Guang; Du, Yu-Tao; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su
2017-02-01
Quantum computing has undergone rapid development in recent years. Owing to limitations on scalability, personal quantum computers still seem slightly unrealistic in the near future. The first practical quantum computer for ordinary users is likely to be on the cloud. However, the adoption of cloud computing is possible only if security is ensured. Homomorphic encryption is a cryptographic protocol that allows computation to be performed on encrypted data without decrypting them, so it is well suited to cloud computing. Here, we first applied homomorphic encryption on IBM's cloud quantum computer platform. In our experiments, we successfully implemented a quantum algorithm for linear equations while protecting our privacy. This demonstration opens a feasible path to the next stage of development of cloud quantum information technology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, Dongwan; Claycomb, William R.; Urias, Vincent E.
Cloud computing is a paradigm rapidly being embraced by government and industry as a solution for cost-savings, scalability, and collaboration. While a multitude of applications and services are available commercially for cloud-based solutions, research in this area has yet to fully embrace the full spectrum of potential challenges facing cloud computing. This tutorial aims to provide researchers with a fundamental understanding of cloud computing, with the goals of identifying a broad range of potential research topics, and inspiring a new surge in research to address current issues. We will also discuss real implementations of research-oriented cloud computing systems for bothmore » academia and government, including configuration options, hardware issues, challenges, and solutions.« less
Identity-Based Authentication for Cloud Computing
NASA Astrophysics Data System (ADS)
Li, Hongwei; Dai, Yuanshun; Tian, Ling; Yang, Haomiao
Cloud computing is a recently developed new technology for complex systems with massive-scale services sharing among numerous users. Therefore, authentication of both users and services is a significant issue for the trust and security of the cloud computing. SSL Authentication Protocol (SAP), once applied in cloud computing, will become so complicated that users will undergo a heavily loaded point both in computation and communication. This paper, based on the identity-based hierarchical model for cloud computing (IBHMCC) and its corresponding encryption and signature schemes, presented a new identity-based authentication protocol for cloud computing and services. Through simulation testing, it is shown that the authentication protocol is more lightweight and efficient than SAP, specially the more lightweight user side. Such merit of our model with great scalability is very suited to the massive-scale cloud.
ERIC Educational Resources Information Center
Panoutsopoulos, Hercules; Donert, Karl; Papoutsis, Panos; Kotsanis, Ioannis
2015-01-01
During the last few years, ongoing developments in the technological field of Cloud computing have initiated discourse on the potential of the Cloud to be systematically exploited in educational contexts. Research interest has been stimulated by a range of advantages of Cloud technologies (e.g. adaptability, flexibility, scalability,…
Do Clouds Compute? A Framework for Estimating the Value of Cloud Computing
NASA Astrophysics Data System (ADS)
Klems, Markus; Nimis, Jens; Tai, Stefan
On-demand provisioning of scalable and reliable compute services, along with a cost model that charges consumers based on actual service usage, has been an objective in distributed computing research and industry for a while. Cloud Computing promises to deliver on this objective: consumers are able to rent infrastructure in the Cloud as needed, deploy applications and store data, and access them via Web protocols on a pay-per-use basis. The acceptance of Cloud Computing, however, depends on the ability for Cloud Computing providers and consumers to implement a model for business value co-creation. Therefore, a systematic approach to measure costs and benefits of Cloud Computing is needed. In this paper, we discuss the need for valuation of Cloud Computing, identify key components, and structure these components in a framework. The framework assists decision makers in estimating Cloud Computing costs and to compare these costs to conventional IT solutions. We demonstrate by means of representative use cases how our framework can be applied to real world scenarios.
USDA-ARS?s Scientific Manuscript database
Infrastructure-as-a-service (IaaS) clouds provide a new medium for deployment of environmental modeling applications. Harnessing advancements in virtualization, IaaS clouds can provide dynamic scalable infrastructure to better support scientific modeling computational demands. Providing scientific m...
A Framework for Collaborative and Convenient Learning on Cloud Computing Platforms
ERIC Educational Resources Information Center
Sharma, Deepika; Kumar, Vikas
2017-01-01
The depth of learning resides in collaborative work with more engagement and fun. Technology can enhance collaboration with a higher level of convenience and cloud computing can facilitate this in a cost effective and scalable manner. However, to deploy a successful online learning environment, elementary components of learning pedagogy must be…
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2016-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.
Cole, Brian S; Moore, Jason H
2018-03-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing
Moore, Jason H.
2018-01-01
Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
Cloud Quantum Computing of an Atomic Nucleus
NASA Astrophysics Data System (ADS)
Dumitrescu, E. F.; McCaskey, A. J.; Hagen, G.; Jansen, G. R.; Morris, T. D.; Papenbrock, T.; Pooser, R. C.; Dean, D. J.; Lougovski, P.
2018-05-01
We report a quantum simulation of the deuteron binding energy on quantum processors accessed via cloud servers. We use a Hamiltonian from pionless effective field theory at leading order. We design a low-depth version of the unitary coupled-cluster ansatz, use the variational quantum eigensolver algorithm, and compute the binding energy to within a few percent. Our work is the first step towards scalable nuclear structure computations on a quantum processor via the cloud, and it sheds light on how to map scientific computing applications onto nascent quantum devices.
Cloud Quantum Computing of an Atomic Nucleus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumitrescu, Eugene F.; McCaskey, Alex J.; Hagen, Gaute
Here, we report a quantum simulation of the deuteron binding energy on quantum processors accessed via cloud servers. We use a Hamiltonian from pionless effective field theory at leading order. We design a low-depth version of the unitary coupled-cluster ansatz, use the variational quantum eigensolver algorithm, and compute the binding energy to within a few percent. Our work is the first step towards scalable nuclear structure computations on a quantum processor via the cloud, and it sheds light on how to map scientific computing applications onto nascent quantum devices.
Cloud Quantum Computing of an Atomic Nucleus.
Dumitrescu, E F; McCaskey, A J; Hagen, G; Jansen, G R; Morris, T D; Papenbrock, T; Pooser, R C; Dean, D J; Lougovski, P
2018-05-25
We report a quantum simulation of the deuteron binding energy on quantum processors accessed via cloud servers. We use a Hamiltonian from pionless effective field theory at leading order. We design a low-depth version of the unitary coupled-cluster ansatz, use the variational quantum eigensolver algorithm, and compute the binding energy to within a few percent. Our work is the first step towards scalable nuclear structure computations on a quantum processor via the cloud, and it sheds light on how to map scientific computing applications onto nascent quantum devices.
Cloud Quantum Computing of an Atomic Nucleus
Dumitrescu, Eugene F.; McCaskey, Alex J.; Hagen, Gaute; ...
2018-05-23
Here, we report a quantum simulation of the deuteron binding energy on quantum processors accessed via cloud servers. We use a Hamiltonian from pionless effective field theory at leading order. We design a low-depth version of the unitary coupled-cluster ansatz, use the variational quantum eigensolver algorithm, and compute the binding energy to within a few percent. Our work is the first step towards scalable nuclear structure computations on a quantum processor via the cloud, and it sheds light on how to map scientific computing applications onto nascent quantum devices.
2010-09-01
Cloud computing describes a new distributed computing paradigm for IT data and services that involves over-the-Internet provision of dynamically scalable and often virtualized resources. While cost reduction and flexibility in storage, services, and maintenance are important considerations when deciding on whether or how to migrate data and applications to the cloud, large organizations like the Department of Defense need to consider the organization and structure of data on the cloud and the operations on such data in order to reap the full benefit of cloud
Cloud Computing and Its Applications in GIS
NASA Astrophysics Data System (ADS)
Kang, Cao
2011-12-01
Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
Cloud Computing - A Unified Approach for Surveillance Issues
NASA Astrophysics Data System (ADS)
Rachana, C. R.; Banu, Reshma, Dr.; Ahammed, G. F. Ali, Dr.; Parameshachari, B. D., Dr.
2017-08-01
Cloud computing describes highly scalable resources provided as an external service via the Internet on a basis of pay-per-use. From the economic point of view, the main attractiveness of cloud computing is that users only use what they need, and only pay for what they actually use. Resources are available for access from the cloud at any time, and from any location through networks. Cloud computing is gradually replacing the traditional Information Technology Infrastructure. Securing data is one of the leading concerns and biggest issue for cloud computing. Privacy of information is always a crucial pointespecially when an individual’s personalinformation or sensitive information is beingstored in the organization. It is indeed true that today; cloud authorization systems are notrobust enough. This paper presents a unified approach for analyzing the various security issues and techniques to overcome the challenges in the cloud environment.
Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services
NASA Astrophysics Data System (ADS)
Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.
Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely process measures. The SSA mission continually evolves to near real-time assessment and analysis demanding the need for higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that will track satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers and various other aspects of SSA can be created and deleted on demand. Migrating SpaceNav tools and algorithms into the Google Cloud Platform will be discussed and the trials and tribulations involved. Information will be shared on how and why certain cloud products were used as well as integration techniques that were implemented. Key items to be presented are: 1.Scientific algorithms and SpaceNav tools integrated into a scalable architecture a) Maneuver Planning b) Parallel Processing c) Monte Carlo Simulations d) Optimization Algorithms e) SW Application Development/Integration into the Google Cloud Platform 2. Compute Engine Processing a) Application Engine Automated Processing b) Performance testing and Performance Scalability c) Cloud MySQL databases and Database Scalability d) Cloud Data Storage e) Redundancy and Availability
Use of cloud computing in biomedicine.
Sobeslav, Vladimir; Maresova, Petra; Krejcar, Ondrej; Franca, Tanos C C; Kuca, Kamil
2016-12-01
Nowadays, biomedicine is characterised by a growing need for processing of large amounts of data in real time. This leads to new requirements for information and communication technologies (ICT). Cloud computing offers a solution to these requirements and provides many advantages, such as cost savings, elasticity and scalability of using ICT. The aim of this paper is to explore the concept of cloud computing and the related use of this concept in the area of biomedicine. Authors offer a comprehensive analysis of the implementation of the cloud computing approach in biomedical research, decomposed into infrastructure, platform and service layer, and a recommendation for processing large amounts of data in biomedicine. Firstly, the paper describes the appropriate forms and technological solutions of cloud computing. Secondly, the high-end computing paradigm of cloud computing aspects is analysed. Finally, the potential and current use of applications in scientific research of this technology in biomedicine is discussed.
Cloud-based Web Services for Near-Real-Time Web access to NPP Satellite Imagery and other Data
NASA Astrophysics Data System (ADS)
Evans, J. D.; Valente, E. G.
2010-12-01
We are building a scalable, cloud computing-based infrastructure for Web access to near-real-time data products synthesized from the U.S. National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP) and other geospatial and meteorological data. Given recent and ongoing changes in the the NPP and NPOESS programs (now Joint Polar Satellite System), the need for timely delivery of NPP data is urgent. We propose an alternative to a traditional, centralized ground segment, using distributed Direct Broadcast facilities linked to industry-standard Web services by a streamlined processing chain running in a scalable cloud computing environment. Our processing chain, currently implemented on Amazon.com's Elastic Compute Cloud (EC2), retrieves raw data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) and synthesizes data products such as Sea-Surface Temperature, Vegetation Indices, etc. The cloud computing approach lets us grow and shrink computing resources to meet large and rapid fluctuations (twice daily) in both end-user demand and data availability from polar-orbiting sensors. Early prototypes have delivered various data products to end-users with latencies between 6 and 32 minutes. We have begun to replicate machine instances in the cloud, so as to reduce latency and maintain near-real time data access regardless of increased data input rates or user demand -- all at quite moderate monthly costs. Our service-based approach (in which users invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored and composite (e.g., false-color multiband) products on demand. To facilitate broad impact and adoption of our technology, we have emphasized open, industry-standard software interfaces and open source software. Through our work, we envision the widespread establishment of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sensors. A scalable architecture based on cloud computing ensures cost-effective, real-time processing and delivery of NPP and other data. Access via standard Web services maximizes its interoperability and usefulness.
AstroCloud, a Cyber-Infrastructure for Astronomy Research: Cloud Computing Environments
NASA Astrophysics Data System (ADS)
Li, C.; Wang, J.; Cui, C.; He, B.; Fan, D.; Yang, Y.; Chen, J.; Zhang, H.; Yu, C.; Xiao, J.; Wang, C.; Cao, Z.; Fan, Y.; Hong, Z.; Li, S.; Mi, L.; Wan, W.; Wang, J.; Yin, S.
2015-09-01
AstroCloud is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on CloudStack, an open source software, we set up the cloud computing environment for AstroCloud Project. It consists of five distributed nodes across the mainland of China. Users can use and analysis data in this cloud computing environment. Based on GlusterFS, we built a scalable cloud storage system. Each user has a private space, which can be shared among different virtual machines and desktop systems. With this environments, astronomer can access to astronomical data collected by different telescopes and data centers easily, and data producers can archive their datasets safely.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-12-01
Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment
Meng, Bowen; Pratx, Guillem; Xing, Lei
2011-01-01
Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas
2017-01-01
Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments. PMID:28190948
Wiewiórka, Marek S; Messina, Antonio; Pacholewska, Alicja; Maffioletti, Sergio; Gawrysiak, Piotr; Okoniewski, Michał J
2014-09-15
Many time-consuming analyses of next -: generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics BECAUSE OF: their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker nodes. Available under open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Adventures in Private Cloud: Balancing Cost and Capability at the CloudSat Data Processing Center
NASA Astrophysics Data System (ADS)
Partain, P.; Finley, S.; Fluke, J.; Haynes, J. M.; Cronk, H. Q.; Miller, S. D.
2016-12-01
Since the beginning of the CloudSat Mission in 2006, The CloudSat Data Processing Center (DPC) at the Cooperative Institute for Research in the Atmosphere (CIRA) has been ingesting data from the satellite and other A-Train sensors, producing data products, and distributing them to researchers around the world. The computing infrastructure was specifically designed to fulfill the requirements as specified at the beginning of what nominally was a two-year mission. The environment consisted of servers dedicated to specific processing tasks in a rigid workflow to generate the required products. To the benefit of science and with credit to the mission engineers, CloudSat has lasted well beyond its planned lifetime and is still collecting data ten years later. Over that period requirements of the data processing system have greatly expanded and opportunities for providing value-added services have presented themselves. But while demands on the system have increased, the initial design allowed for very little expansion in terms of scalability and flexibility. The design did change to include virtual machine processing nodes and distributed workflows but infrastructure management was still a time consuming task when system modification was required to run new tests or implement new processes. To address the scalability, flexibility, and manageability of the system Cloud computing methods and technologies are now being employed. The use of a public cloud like Amazon Elastic Compute Cloud or Google Compute Engine was considered but, among other issues, data transfer and storage cost becomes a problem especially when demand fluctuates as a result of reprocessing and the introduction of new products and services. Instead, the existing system was converted to an on premises private Cloud using the OpenStack computing platform and Ceph software defined storage to reap the benefits of the Cloud computing paradigm. This work details the decisions that were made, the benefits that have been realized, the difficulties that were encountered and issues that still exist.
Reviews on Security Issues and Challenges in Cloud Computing
NASA Astrophysics Data System (ADS)
An, Y. Z.; Zaaba, Z. F.; Samsudin, N. F.
2016-11-01
Cloud computing is an Internet-based computing service provided by the third party allowing share of resources and data among devices. It is widely used in many organizations nowadays and becoming more popular because it changes the way of how the Information Technology (IT) of an organization is organized and managed. It provides lots of benefits such as simplicity and lower costs, almost unlimited storage, least maintenance, easy utilization, backup and recovery, continuous availability, quality of service, automated software integration, scalability, flexibility and reliability, easy access to information, elasticity, quick deployment and lower barrier to entry. While there is increasing use of cloud computing service in this new era, the security issues of the cloud computing become a challenges. Cloud computing must be safe and secure enough to ensure the privacy of the users. This paper firstly lists out the architecture of the cloud computing, then discuss the most common security issues of using cloud and some solutions to the security issues since security is one of the most critical aspect in cloud computing due to the sensitivity of user's data.
Cloud-Based Virtual Laboratory for Network Security Education
ERIC Educational Resources Information Center
Xu, Le; Huang, Dijiang; Tsai, Wei-Tek
2014-01-01
Hands-on experiments are essential for computer network security education. Existing laboratory solutions usually require significant effort to build, configure, and maintain and often do not support reconfigurability, flexibility, and scalability. This paper presents a cloud-based virtual laboratory education platform called V-Lab that provides a…
An Interactive Web-Based Analysis Framework for Remote Sensing Cloud Computing
NASA Astrophysics Data System (ADS)
Wang, X. Z.; Zhang, H. M.; Zhao, J. H.; Lin, Q. H.; Zhou, Y. C.; Li, J. H.
2015-07-01
Spatiotemporal data, especially remote sensing data, are widely used in ecological, geographical, agriculture, and military research and applications. With the development of remote sensing technology, more and more remote sensing data are accumulated and stored in the cloud. An effective way for cloud users to access and analyse these massive spatiotemporal data in the web clients becomes an urgent issue. In this paper, we proposed a new scalable, interactive and web-based cloud computing solution for massive remote sensing data analysis. We build a spatiotemporal analysis platform to provide the end-user with a safe and convenient way to access massive remote sensing data stored in the cloud. The lightweight cloud storage system used to store public data and users' private data is constructed based on open source distributed file system. In it, massive remote sensing data are stored as public data, while the intermediate and input data are stored as private data. The elastic, scalable, and flexible cloud computing environment is built using Docker, which is a technology of open-source lightweight cloud computing container in the Linux operating system. In the Docker container, open-source software such as IPython, NumPy, GDAL, and Grass GIS etc., are deployed. Users can write scripts in the IPython Notebook web page through the web browser to process data, and the scripts will be submitted to IPython kernel to be executed. By comparing the performance of remote sensing data analysis tasks executed in Docker container, KVM virtual machines and physical machines respectively, we can conclude that the cloud computing environment built by Docker makes the greatest use of the host system resources, and can handle more concurrent spatial-temporal computing tasks. Docker technology provides resource isolation mechanism in aspects of IO, CPU, and memory etc., which offers security guarantee when processing remote sensing data in the IPython Notebook. Users can write complex data processing code on the web directly, so they can design their own data processing algorithm.
Exploration of cloud computing late start LDRD #149630 : Raincoat. v. 2.1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Echeverria, Victor T.; Metral, Michael David; Leger, Michelle A.
This report contains documentation from an interoperability study conducted under the Late Start LDRD 149630, Exploration of Cloud Computing. A small late-start LDRD from last year resulted in a study (Raincoat) on using Virtual Private Networks (VPNs) to enhance security in a hybrid cloud environment. Raincoat initially explored the use of OpenVPN on IPv4 and demonstrates that it is possible to secure the communication channel between two small 'test' clouds (a few nodes each) at New Mexico Tech and Sandia. We extended the Raincoat study to add IPSec support via Vyatta routers, to interface with a public cloud (Amazon Elasticmore » Compute Cloud (EC2)), and to be significantly more scalable than the previous iteration. The study contributed to our understanding of interoperability in a hybrid cloud.« less
Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.
2009-01-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N
2009-06-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).
Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds
NASA Astrophysics Data System (ADS)
Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.
In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.
A computational- And storage-cloud for integration of biodiversity collections
Matsunaga, A.; Thompson, A.; Figueiredo, R. J.; Germain-Aubrey, C.C; Collins, M.; Beeman, R.S; Macfadden, B.J.; Riccardi, G.; Soltis, P.S; Page, L. M.; Fortes, J.A.B
2013-01-01
A core mission of the Integrated Digitized Biocollections (iDigBio) project is the building and deployment of a cloud computing environment customized to support the digitization workflow and integration of data from all U.S. nonfederal biocollections. iDigBio chose to use cloud computing technologies to deliver a cyberinfrastructure that is flexible, agile, resilient, and scalable to meet the needs of the biodiversity community. In this context, this paper describes the integration of open source cloud middleware, applications, and third party services using standard formats, protocols, and services. In addition, this paper demonstrates the value of the digitized information from collections in a broader scenario involving multiple disciplines.
NASA Astrophysics Data System (ADS)
Delipetrev, Blagoj
2016-04-01
Presently, most of the existing software is desktop-based, designed to work on a single computer, which represents a major limitation in many ways, starting from limited computer processing, storage power, accessibility, availability, etc. The only feasible solution lies in the web and cloud. This abstract presents research and development of a cloud computing geospatial application for water resources based on free and open source software and open standards using hybrid deployment model of public - private cloud, running on two separate virtual machines (VMs). The first one (VM1) is running on Amazon web services (AWS) and the second one (VM2) is running on a Xen cloud platform. The presented cloud application is developed using free and open source software, open standards and prototype code. The cloud application presents a framework how to develop specialized cloud geospatial application that needs only a web browser to be used. This cloud application is the ultimate collaboration geospatial platform because multiple users across the globe with internet connection and browser can jointly model geospatial objects, enter attribute data and information, execute algorithms, and visualize results. The presented cloud application is: available all the time, accessible from everywhere, it is scalable, works in a distributed computer environment, it creates a real-time multiuser collaboration platform, the programing languages code and components are interoperable, and it is flexible in including additional components. The cloud geospatial application is implemented as a specialized water resources application with three web services for 1) data infrastructure (DI), 2) support for water resources modelling (WRM), 3) user management. The web services are running on two VMs that are communicating over the internet providing services to users. The application was tested on the Zletovica river basin case study with concurrent multiple users. The application is a state-of-the-art cloud geospatial collaboration platform. The presented solution is a prototype and can be used as a foundation for developing of any specialized cloud geospatial applications. Further research will be focused on distributing the cloud application on additional VMs, testing the scalability and availability of services.
Geometric Data Perturbation-Based Personal Health Record Transactions in Cloud Computing
Balasubramaniam, S.; Kavitha, V.
2015-01-01
Cloud computing is a new delivery model for information technology services and it typically involves the provision of dynamically scalable and often virtualized resources over the Internet. However, cloud computing raises concerns on how cloud service providers, user organizations, and governments should handle such information and interactions. Personal health records represent an emerging patient-centric model for health information exchange, and they are outsourced for storage by third parties, such as cloud providers. With these records, it is necessary for each patient to encrypt their own personal health data before uploading them to cloud servers. Current techniques for encryption primarily rely on conventional cryptographic approaches. However, key management issues remain largely unsolved with these cryptographic-based encryption techniques. We propose that personal health record transactions be managed using geometric data perturbation in cloud computing. In our proposed scheme, the personal health record database is perturbed using geometric data perturbation and outsourced to the Amazon EC2 cloud. PMID:25767826
Geometric data perturbation-based personal health record transactions in cloud computing.
Balasubramaniam, S; Kavitha, V
2015-01-01
Cloud computing is a new delivery model for information technology services and it typically involves the provision of dynamically scalable and often virtualized resources over the Internet. However, cloud computing raises concerns on how cloud service providers, user organizations, and governments should handle such information and interactions. Personal health records represent an emerging patient-centric model for health information exchange, and they are outsourced for storage by third parties, such as cloud providers. With these records, it is necessary for each patient to encrypt their own personal health data before uploading them to cloud servers. Current techniques for encryption primarily rely on conventional cryptographic approaches. However, key management issues remain largely unsolved with these cryptographic-based encryption techniques. We propose that personal health record transactions be managed using geometric data perturbation in cloud computing. In our proposed scheme, the personal health record database is perturbed using geometric data perturbation and outsourced to the Amazon EC2 cloud.
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
2013-01-28
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research--GATECloud. net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
Three-Dimensional Space to Assess Cloud Interoperability
2013-03-01
12 1. Portability and Mobility ...collection of network-enabled services that guarantees to provide a scalable, easy accessible, reliable, and personalized computing infrastructure , based on...are used in research to describe cloud models, such as SaaS (Software as a Service), PaaS (Platform as a service), IaaS ( Infrastructure as a Service
Cloud Computing for Protein-Ligand Binding Site Comparison
2013-01-01
The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery. PMID:23762824
Cloud computing for protein-ligand binding site comparison.
Hung, Che-Lun; Hua, Guan-Jie
2013-01-01
The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery.
The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences
USDA-ARS?s Scientific Manuscript database
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identify management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...
Cloud Computing Boosts Business Intelligence of Telecommunication Industry
NASA Astrophysics Data System (ADS)
Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling
Business Intelligence becomes an attracting topic in today's data intensive applications, especially in telecommunication industry. Meanwhile, Cloud Computing providing IT supporting Infrastructure with excellent scalability, large scale storage, and high performance becomes an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce based parallel data mining platform developed by CMRI (China Mobile Research Institute) to fit the urgent requirements of business intelligence in telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with the experimental evaluation and case studies of its applications. The evaluation result demonstrates both the usability and the cost-effectiveness of Cloud Computing based Business Intelligence system in applications of telecommunication industry.
Towards real-time photon Monte Carlo dose calculation in the cloud
NASA Astrophysics Data System (ADS)
Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-01
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
Towards real-time photon Monte Carlo dose calculation in the cloud.
Ziegenhein, Peter; Kozin, Igor N; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-07
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology-a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.
Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo
2016-01-01
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology–a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507
Cloud Computing for the Grid: GridControl: A Software Platform to Support the Smart Grid
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
GENI Project: Cornell University is creating a new software platform for grid operators called GridControl that will utilize cloud computing to more efficiently control the grid. In a cloud computing system, there are minimal hardware and software demands on users. The user can tap into a network of computers that is housed elsewhere (the cloud) and the network runs computer applications for the user. The user only needs interface software to access all of the cloud’s data resources, which can be as simple as a web browser. Cloud computing can reduce costs, facilitate innovation through sharing, empower users, and improvemore » the overall reliability of a dispersed system. Cornell’s GridControl will focus on 4 elements: delivering the state of the grid to users quickly and reliably; building networked, scalable grid-control software; tailoring services to emerging smart grid uses; and simulating smart grid behavior under various conditions.« less
A Platform for Scalable Satellite and Geospatial Data Analysis
NASA Astrophysics Data System (ADS)
Beneke, C. M.; Skillman, S.; Warren, M. S.; Kelton, T.; Brumby, S. P.; Chartrand, R.; Mathis, M.
2017-12-01
At Descartes Labs, we use the commercial cloud to run global-scale machine learning applications over satellite imagery. We have processed over 5 Petabytes of public and commercial satellite imagery, including the full Landsat and Sentinel archives. By combining open-source tools with a FUSE-based filesystem for cloud storage, we have enabled a scalable compute platform that has demonstrated reading over 200 GB/s of satellite imagery into cloud compute nodes. In one application, we generated global 15m Landsat-8, 20m Sentinel-1, and 10m Sentinel-2 composites from 15 trillion pixels, using over 10,000 CPUs. We recently created a public open-source Python client library that can be used to query and access preprocessed public satellite imagery from within our platform, and made this platform available to researchers for non-commercial projects. In this session, we will describe how you can use the Descartes Labs Platform for rapid prototyping and scaling of geospatial analyses and demonstrate examples in land cover classification.
Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.
Tran, Ngoc Tam L; Huang, Chun-Hsi
2017-05-01
We developed the cloud-based MOTIFSIM on Amazon Web Services (AWS) cloud. The tool is an extended version from our web-based tool version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from the next-generation sequencing technology. The tool is highly scalable with expandable AWS.
Benefits of cloud computing for PACS and archiving.
Koch, Patrick
2012-01-01
The goal of cloud-based services is to provide easy, scalable access to computing resources and IT services. The healthcare industry requires a private cloud that adheres to government mandates designed to ensure privacy and security of patient data while enabling access by authorized users. Cloud-based computing in the imaging market has evolved from a service that provided cost effective disaster recovery for archived data to fully featured PACS and vendor neutral archiving services that can address the needs of healthcare providers of all sizes. Healthcare providers worldwide are now using the cloud to distribute images to remote radiologists while supporting advanced reading tools, deliver radiology reports and imaging studies to referring physicians, and provide redundant data storage. Vendor managed cloud services eliminate large capital investments in equipment and maintenance, as well as staffing for the data center--creating a reduction in total cost of ownership for the healthcare provider.
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
ERIC Educational Resources Information Center
Younge, Andrew J.
2016-01-01
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…
Exploiting NASA's Cumulus Earth Science Cloud Archive with Services and Computation
NASA Astrophysics Data System (ADS)
Pilone, D.; Quinn, P.; Jazayeri, A.; Schuler, I.; Plofchan, P.; Baynes, K.; Ramachandran, R.
2017-12-01
NASA's Earth Observing System Data and Information System (EOSDIS) houses nearly 30PBs of critical Earth Science data and with upcoming missions is expected to balloon to between 200PBs-300PBs over the next seven years. In addition to the massive increase in data collected, researchers and application developers want more and faster access - enabling complex visualizations, long time-series analysis, and cross dataset research without needing to copy and manage massive amounts of data locally. NASA has started prototyping with commercial cloud providers to make this data available in elastic cloud compute environments, allowing application developers direct access to the massive EOSDIS holdings. In this talk we'll explain the principles behind the archive architecture and share our experience of dealing with large amounts of data with serverless architectures including AWS Lambda, the Elastic Container Service (ECS) for long running jobs, and why we dropped thousands of lines of code for AWS Step Functions. We'll discuss best practices and patterns for accessing and using data available in a shared object store (S3) and leveraging events and message passing for sophisticated and highly scalable processing and analysis workflows. Finally we'll share capabilities NASA and cloud services are making available on the archives to enable massively scalable analysis and computation in a variety of formats and tools.
CLON: Overlay Networks and Gossip Protocols for Cloud Environments
NASA Astrophysics Data System (ADS)
Matos, Miguel; Sousa, António; Pereira, José; Oliveira, Rui; Deliot, Eric; Murray, Paul
Although epidemic or gossip-based multicast is a robust and scalable approach to reliable data dissemination, its inherent redundancy results in high resource consumption on both links and nodes. This problem is aggravated in settings that have costlier or resource constrained links as happens in Cloud Computing infrastructures composed by several interconnected data centers across the globe.
Toward Scalable Trustworthy Computing Using the Human-Physiology-Immunity Metaphor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hively, Lee M; Sheldon, Frederick T
The cybersecurity landscape consists of an ad hoc patchwork of solutions. Optimal cybersecurity is difficult for various reasons: complexity, immense data and processing requirements, resource-agnostic cloud computing, practical time-space-energy constraints, inherent flaws in 'Maginot Line' defenses, and the growing number and sophistication of cyberattacks. This article defines the high-priority problems and examines the potential solution space. In that space, achieving scalable trustworthy computing and communications is possible through real-time knowledge-based decisions about cyber trust. This vision is based on the human-physiology-immunity metaphor and the human brain's ability to extract knowledge from data and information. The article outlines future steps towardmore » scalable trustworthy systems requiring a long-term commitment to solve the well-known challenges.« less
Diaz, Javier; Arrizabalaga, Saioa; Bustamante, Paul; Mesa, Iker; Añorga, Javier; Goya, Jon
2013-01-01
Portable systems and global communications open a broad spectrum for new health applications. In the framework of electrophysiological applications, several challenges are faced when developing portable systems embedded in Cloud computing services. In order to facilitate new developers in this area based on our experience, five areas of interest are presented in this paper where strategies can be applied for improving the performance of portable systems: transducer and conditioning, processing, wireless communications, battery and power management. Likewise, for Cloud services, scalability, portability, privacy and security guidelines have been highlighted.
Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects
Clematis, Andrea; Quarati, Alfonso; Cesini, Daniele; Milanesi, Luciano; Merelli, Ivan
2013-01-01
Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements. PMID:24106693
Genotyping in the cloud with Crossbow.
Gurtowski, James; Schatz, Michael C; Langmead, Ben
2012-09-01
Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
Integration of High-Performance Computing into Cloud Computing Services
NASA Astrophysics Data System (ADS)
Vouk, Mladen A.; Sills, Eric; Dreher, Patrick
High-Performance Computing (HPC) projects span a spectrum of computer hardware implementations ranging from peta-flop supercomputers, high-end tera-flop facilities running a variety of operating systems and applications, to mid-range and smaller computational clusters used for HPC application development, pilot runs and prototype staging clusters. What they all have in common is that they operate as a stand-alone system rather than a scalable and shared user re-configurable resource. The advent of cloud computing has changed the traditional HPC implementation. In this article, we will discuss a very successful production-level architecture and policy framework for supporting HPC services within a more general cloud computing infrastructure. This integrated environment, called Virtual Computing Lab (VCL), has been operating at NC State since fall 2004. Nearly 8,500,000 HPC CPU-Hrs were delivered by this environment to NC State faculty and students during 2009. In addition, we present and discuss operational data that show that integration of HPC and non-HPC (or general VCL) services in a cloud can substantially reduce the cost of delivering cloud services (down to cents per CPU hour).
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
Machine learning patterns for neuroimaging-genetic studies in the cloud.
Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand
2014-01-01
Brain imaging is a natural intermediate phenotype to understand the link between genetic information and behavior or brain pathologies risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a 2 weeks deployment on hundreds of virtual machines.
Bioinformatics and Microarray Data Analysis on the Cloud.
Calabrese, Barbara; Cannataro, Mario
2016-01-01
High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massive scalable computing and storage, data sharing, on-demand anytime and anywhere access to resources and applications, and thus, it may represent the key technology for facing those issues. In fact, in the recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in the industry. Although this, cloud computing presents several issues regarding the security and privacy of data, that are particularly important when analyzing patients data, such as in personalized medicine. This chapter reviews main academic and industrial cloud-based bioinformatics solutions; with a special focus on microarray data analysis solutions and underlines main issues and problems related to the use of such platforms for the storage and analysis of patients data.
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network
NASA Astrophysics Data System (ADS)
Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan
2017-01-01
The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). The specialists in the field of parallel and distributed computing give the special attention to a scalability of applications for problem solving. An effective management of the scalable application in the heterogeneous distributed computing environment is still a non-trivial issue. Control systems that operate in networks, especially relate to this issue. We propose a new approach to the multi-agent management for the scalable applications in the heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for an automation of the problem solving. Advantages of the proposed approach are demonstrated on the parametric synthesis example of the static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automation of the multi-agent control for the systems in a parallel mode with various degrees of its detailed elaboration.
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.
2015-01-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363
Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.
Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L
2015-02-01
Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
TethysCluster: A comprehensive approach for harnessing cloud resources for hydrologic modeling
NASA Astrophysics Data System (ADS)
Nelson, J.; Jones, N.; Ames, D. P.
2015-12-01
Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
Streaming support for data intensive cloud-based sequence analysis.
Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systemsmore » demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar
2014-01-01
In this paper, an architecture for building Scalable And Mobile Environment For High-Performance Computing with spatial capabilities called SAME4HPC is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high-performance and rich user experience. A cloud infrastructure consisting of OpenStack withmore » Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry standard practices that has been adopted in this architecture.« less
Evaluating open-source cloud computing solutions for geosciences
NASA Astrophysics Data System (ADS)
Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong
2013-09-01
Many organizations start to adopt cloud computing for better utilizing computing resources by taking advantage of its scalability, cost reduction, and easy to access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting Geosciences. This paper provides a comprehensive study of three open-source cloud solutions, including OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) no significant performance differences in central processing unit (CPU), memory and I/O of virtual machines created and managed by different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) Cloudstack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable for supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
2011-01-01
Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105
Human face recognition using eigenface in cloud computing environment
NASA Astrophysics Data System (ADS)
Siregar, S. T. M.; Syahputra, M. F.; Rahmat, R. F.
2018-02-01
Doing a face recognition for one single face does not take a long time to process, but if we implement attendance system or security system on companies that have many faces to be recognized, it will take a long time. Cloud computing is a computing service that is done not on a local device, but on an internet connected to a data center infrastructure. The system of cloud computing also provides a scalability solution where cloud computing can increase the resources needed when doing larger data processing. This research is done by applying eigenface while collecting data as training data is also done by using REST concept to provide resource, then server can process the data according to existing stages. After doing research and development of this application, it can be concluded by implementing Eigenface, recognizing face by applying REST concept as endpoint in giving or receiving related information to be used as a resource in doing model formation to do face recognition.
Signal and image processing algorithm performance in a virtual and elastic computing environment
NASA Astrophysics Data System (ADS)
Bennett, Kelly W.; Robertson, James
2013-05-01
The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and their associated high-performance computing needs, increases and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion on using cloud computing with government data will discuss best security practices that exist within cloud services, such as AWS.
Interoperating Cloud-based Virtual Farms
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Colamaria, F.; Colella, D.; Casula, E.; Elia, D.; Franco, A.; Lusso, S.; Luparello, G.; Masera, M.; Miniello, G.; Mura, D.; Piano, S.; Vallero, S.; Venaruzzo, M.; Vino, G.
2015-12-01
The present work aims at optimizing the use of computing resources available at the grid Italian Tier-2 sites of the ALICE experiment at CERN LHC by making them accessible to interactive distributed analysis, thanks to modern solutions based on cloud computing. The scalability and elasticity of the computing resources via dynamic (“on-demand”) provisioning is essentially limited by the size of the computing site, reaching the theoretical optimum only in the asymptotic case of infinite resources. The main challenge of the project is to overcome this limitation by federating different sites through a distributed cloud facility. Storage capacities of the participating sites are seen as a single federated storage area, preventing the need of mirroring data across them: high data access efficiency is guaranteed by location-aware analysis software and storage interfaces, in a transparent way from an end-user perspective. Moreover, the interactive analysis on the federated cloud reduces the execution time with respect to grid batch jobs. The tests of the investigated solutions for both cloud computing and distributed storage on wide area network will be presented.
Infrastructures for Distributed Computing: the case of BESIII
NASA Astrophysics Data System (ADS)
Pellegrino, J.
2018-05-01
The BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed to investigate Tau-Charm physics. Now BESIII has been running for several years and gathered more than 1PB raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based up on DIRAC and it is in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites from BESIII Collaboration from all over the world joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantages of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover new computing tools and upcoming infrastructures will be addressed.
Towards Scalable Graph Computation on Mobile Devices.
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2014-10-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.
Towards Scalable Graph Computation on Mobile Devices
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2015-01-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach. PMID:25859564
cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design
Pan, Yuchao; Dong, Yuxi; Zhou, Jingtian; Hallen, Mark; Donald, Bruce R.; Xu, Wei
2016-01-01
Abstract Finding the global minimum energy conformation (GMEC) of a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to a widely used protein design software OSPREY, to allow the original design framework to scale to the commercial cloud infrastructures. We propose several novel designs to integrate both algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches. PMID:27154509
Large-scale virtual screening on public cloud resources with Apache Spark.
Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola
2017-01-01
Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive, however it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against [Formula: see text]2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).Graphical abstract.
An approximate dynamic programming approach to resource management in multi-cloud scenarios
NASA Astrophysics Data System (ADS)
Pietrabissa, Antonio; Priscoli, Francesco Delli; Di Giorgio, Alessandro; Giuseppi, Alessandro; Panfili, Martina; Suraci, Vincenzo
2017-03-01
The programmability and the virtualisation of network resources are crucial to deploy scalable Information and Communications Technology (ICT) services. The increasing demand of cloud services, mainly devoted to the storage and computing, requires a new functional element, the Cloud Management Broker (CMB), aimed at managing multiple cloud resources to meet the customers' requirements and, simultaneously, to optimise their usage. This paper proposes a multi-cloud resource allocation algorithm that manages the resource requests with the aim of maximising the CMB revenue over time. The algorithm is based on Markov decision process modelling and relies on reinforcement learning techniques to find online an approximate solution.
Polyphony: A Workflow Orchestration Framework for Cloud Computing
NASA Technical Reports Server (NTRS)
Shams, Khawaja S.; Powell, Mark W.; Crockett, Tom M.; Norris, Jeffrey S.; Rossi, Ryan; Soderstrom, Tom
2010-01-01
Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, or machines for short durations without making any hardware procurements. In this paper, we describe Polyphony, a resilient, scalable, and modular framework that efficiently leverages a large set of computing resources to perform parallel computations. Polyphony can employ resources on the cloud, excess capacity on local machines, as well as spare resources on the supercomputing center, and it enables these resources to work in concert to accomplish a common goal. Polyphony is resilient to node failures, even if they occur in the middle of a transaction. We will conclude with an evaluation of a production-ready application built on top of Polyphony to perform image-processing operations of images from around the solar system, including Mars, Saturn, and Titan.
A Cloud-Based Simulation Architecture for Pandemic Influenza Simulation
Eriksson, Henrik; Raciti, Massimiliano; Basile, Maurizio; Cunsolo, Alessandro; Fröberg, Anders; Leifler, Ola; Ekberg, Joakim; Timpka, Toomas
2011-01-01
High-fidelity simulations of pandemic outbreaks are resource consuming. Cluster-based solutions have been suggested for executing such complex computations. We present a cloud-based simulation architecture that utilizes computing resources both locally available and dynamically rented online. The approach uses the Condor framework for job distribution and management of the Amazon Elastic Computing Cloud (EC2) as well as local resources. The architecture has a web-based user interface that allows users to monitor and control simulation execution. In a benchmark test, the best cost-adjusted performance was recorded for the EC2 H-CPU Medium instance, while a field trial showed that the job configuration had significant influence on the execution time and that the network capacity of the master node could become a bottleneck. We conclude that it is possible to develop a scalable simulation environment that uses cloud-based solutions, while providing an easy-to-use graphical user interface. PMID:22195089
Processing NASA Earth Science Data on Nebula Cloud
NASA Technical Reports Server (NTRS)
Chen, Aijun; Pham, Long; Kempler, Steven
2012-01-01
Three applications were successfully migrated to Nebula, including S4PM, AIRS L1/L2 algorithms, and Giovanni MAPSS. Nebula has some advantages compared with local machines (e.g. performance, cost, scalability, bundling, etc.). Nebula still faces some challenges (e.g. stability, object storage, networking, etc.). Migrating applications to Nebula is feasible but time consuming. Lessons learned from our Nebula experience will benefit future Cloud Computing efforts at GES DISC.
Yang, Shuai; Zhang, Xinlei; Diao, Lihong; Guo, Feifei; Wang, Dan; Liu, Zhongyang; Li, Honglei; Zheng, Junjie; Pan, Jingshan; Nice, Edouard C; Li, Dong; He, Fuchu
2015-09-04
The Chromosome-centric Human Proteome Project (C-HPP) aims to catalog genome-encoded proteins using a chromosome-by-chromosome strategy. As the C-HPP proceeds, the increasing requirement for data-intensive analysis of the MS/MS data poses a challenge to the proteomic community, especially small laboratories lacking computational infrastructure. To address this challenge, we have updated the previous CAPER browser into a higher version, CAPER 3.0, which is a scalable cloud-based system for data-intensive analysis of C-HPP data sets. CAPER 3.0 uses cloud computing technology to facilitate MS/MS-based peptide identification. In particular, it can use both public and private cloud, facilitating the analysis of C-HPP data sets. CAPER 3.0 provides a graphical user interface (GUI) to help users transfer data, configure jobs, track progress, and visualize the results comprehensively. These features enable users without programming expertise to easily conduct data-intensive analysis using CAPER 3.0. Here, we illustrate the usage of CAPER 3.0 with four specific mass spectral data-intensive problems: detecting novel peptides, identifying single amino acid variants (SAVs) derived from known missense mutations, identifying sample-specific SAVs, and identifying exon-skipping events. CAPER 3.0 is available at http://prodigy.bprc.ac.cn/caper3.
CloudMC: a cloud computing application for Monte Carlo simulation.
Miras, H; Jiménez, R; Miras, C; Gomà, C
2013-04-21
This work presents CloudMC, a cloud computing application-developed in Windows Azure®, the platform of the Microsoft® cloud-for the parallelization of Monte Carlo simulations in a dynamic virtual cluster. CloudMC is a web application designed to be independent of the Monte Carlo code in which the simulations are based-the simulations just need to be of the form: input files → executable → output files. To study the performance of CloudMC in Windows Azure®, Monte Carlo simulations with penelope were performed on different instance (virtual machine) sizes, and for different number of instances. The instance size was found to have no effect on the simulation runtime. It was also found that the decrease in time with the number of instances followed Amdahl's law, with a slight deviation due to the increase in the fraction of non-parallelizable time with increasing number of instances. A simulation that would have required 30 h of CPU on a single instance was completed in 48.6 min when executed on 64 instances in parallel (speedup of 37 ×). Furthermore, the use of cloud computing for parallel computing offers some advantages over conventional clusters: high accessibility, scalability and pay per usage. Therefore, it is strongly believed that cloud computing will play an important role in making Monte Carlo dose calculation a reality in future clinical practice.
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record.
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to review systematically the studies conducted in the field of cloud computing. The present study was a systematic review conducted in 2017. Search was performed in the Scopus, Web of Sciences, IEEE, Pub Med and Google Scholar databases by combination keywords. From the 431 article that selected at the first, after applying the inclusion and exclusion criteria, 27 articles were selected for surveyed. Data gathering was done by a self-made check list and was analyzed by content analysis method. The finding of this study showed that cloud computing is a very widespread technology. It includes domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of Cloud Computing, ability to search and exploration, reducing errors and improving the quality, structure, flexibility and sharing ability. It will be effective for electronic health record. According to the findings of the present study, higher capabilities of cloud computing are useful in implementing EHR in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of HER, it is recommended to use this technology.
Capabilities and Advantages of Cloud Computing in the Implementation of Electronic Health Record
Ahmadi, Maryam; Aslani, Nasim
2018-01-01
Background: With regard to the high cost of the Electronic Health Record (EHR), in recent years the use of new technologies, in particular cloud computing, has increased. The purpose of this study was to review systematically the studies conducted in the field of cloud computing. Methods: The present study was a systematic review conducted in 2017. Search was performed in the Scopus, Web of Sciences, IEEE, Pub Med and Google Scholar databases by combination keywords. From the 431 article that selected at the first, after applying the inclusion and exclusion criteria, 27 articles were selected for surveyed. Data gathering was done by a self-made check list and was analyzed by content analysis method. Results: The finding of this study showed that cloud computing is a very widespread technology. It includes domains such as cost, security and privacy, scalability, mutual performance and interoperability, implementation platform and independence of Cloud Computing, ability to search and exploration, reducing errors and improving the quality, structure, flexibility and sharing ability. It will be effective for electronic health record. Conclusion: According to the findings of the present study, higher capabilities of cloud computing are useful in implementing EHR in a variety of contexts. It also provides wide opportunities for managers, analysts and providers of health information systems. Considering the advantages and domains of cloud computing in the establishment of HER, it is recommended to use this technology. PMID:29719309
The AIST Managed Cloud Environment
NASA Astrophysics Data System (ADS)
Cook, S.
2016-12-01
ESTO is currently in the process of developing and implementing the AIST Managed Cloud Environment (AMCE) to offer cloud computing services to ESTO-funded PIs to conduct their project research. AIST will provide projects access to a cloud computing framework that incorporates NASA security, technical, and financial standards, on which project can freely store, run, and process data. Currently, many projects led by research groups outside of NASA do not have the awareness of requirements or the resources to implement NASA standards into their research, which limits the likelihood of infusing the work into NASA applications. Offering this environment to PIs will allow them to conduct their project research using the many benefits of cloud computing. In addition to the well-known cost and time savings that it allows, it also provides scalability and flexibility. The AMCE will facilitate infusion and end user access by ensuring standardization and security. This approach will ultimately benefit ESTO, the science community, and the research, allowing the technology developments to have quicker and broader applications.
The AMCE (AIST Managed Cloud Environment)
NASA Astrophysics Data System (ADS)
Cook, S.
2017-12-01
ESTO has developed and implemented the AIST Managed Cloud Environment (AMCE) to offer cloud computing services to SMD-funded PIs to conduct their project research. AIST will provide projects access to a cloud computing framework that incorporates NASA security, technical, and financial standards, on which project can freely store, run, and process data. Currently, many projects led by research groups outside of NASA do not have the awareness of requirements or the resources to implement NASA standards into their research, which limits the likelihood of infusing the work into NASA applications. Offering this environment to PIs allows them to conduct their project research using the many benefits of cloud computing. In addition to the well-known cost and time savings that it allows, it also provides scalability and flexibility. The AMCE facilitates infusion and end user access by ensuring standardization and security. This approach will ultimately benefit ESTO, the science community, and the research, allowing the technology developments to have quicker and broader applications.
Identification of Program Signatures from Cloud Computing System Telemetry Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nichols, Nicole M.; Greaves, Mark T.; Smith, William P.
Malicious cloud computing activity can take many forms, including running unauthorized programs in a virtual environment. Detection of these malicious activities while preserving the privacy of the user is an important research challenge. Prior work has shown the potential viability of using cloud service billing metrics as a mechanism for proxy identification of malicious programs. Previously this novel detection method has been evaluated in a synthetic and isolated computational environment. In this paper we demonstrate the ability of billing metrics to identify programs, in an active cloud computing environment, including multiple virtual machines running on the same hypervisor. The openmore » source cloud computing platform OpenStack, is used for private cloud management at Pacific Northwest National Laboratory. OpenStack provides a billing tool (Ceilometer) to collect system telemetry measurements. We identify four different programs running on four virtual machines under the same cloud user account. Programs were identified with up to 95% accuracy. This accuracy is dependent on the distinctiveness of telemetry measurements for the specific programs we tested. Future work will examine the scalability of this approach for a larger selection of programs to better understand the uniqueness needed to identify a program. Additionally, future work should address the separation of signatures when multiple programs are running on the same virtual machine.« less
Towards Efficient Scientific Data Management Using Cloud Storage
NASA Technical Reports Server (NTRS)
He, Qiming
2013-01-01
A software prototype allows users to backup and restore data to/from both public and private cloud storage such as Amazon's S3 and NASA's Nebula. Unlike other off-the-shelf tools, this software ensures user data security in the cloud (through encryption), and minimizes users operating costs by using space- and bandwidth-efficient compression and incremental backup. Parallel data processing utilities have also been developed by using massively scalable cloud computing in conjunction with cloud storage. One of the innovations in this software is using modified open source components to work with a private cloud like NASA Nebula. Another innovation is porting the complex backup to- cloud software to embedded Linux, running on the home networking devices, in order to benefit more users.
Streaming Support for Data Intensive Cloud-Based Sequence Analysis
Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
NASA Astrophysics Data System (ADS)
Dowling, J. A.; Burdett, N.; Greer, P. B.; Sun, J.; Parker, J.; Pichler, P.; Stanwell, P.; Chandra, S.; Rivest-Hénault, D.; Ghose, S.; Salvado, O.; Fripp, J.
2014-03-01
Our group have been developing methods for MRI-alone prostate cancer radiation therapy treatment planning. To assist with clinical validation of the workflow we are investigating a cloud platform solution for research purposes. Benefits of cloud computing can include increased scalability, performance and extensibility while reducing total cost of ownership. In this paper we demonstrate the generation of DICOM-RT directories containing an automatic average atlas based electron density image and fast pelvic organ contouring from whole pelvis MR scans.
Secure data sharing in public cloud
NASA Astrophysics Data System (ADS)
Venkataramana, Kanaparti; Naveen Kumar, R.; Tatekalva, Sandhya; Padmavathamma, M.
2012-04-01
Secure multi-party protocols have been proposed for entities (organizations or individuals) that don't fully trust each other to share sensitive information. Many types of entities need to collect, analyze, and disseminate data rapidly and accurately, without exposing sensitive information to unauthorized or untrusted parties. Solutions based on secure multiparty computation guarantee privacy and correctness, at an extra communication (too costly in communication to be practical) and computation cost. The high overhead motivates us to extend this SMC to cloud environment which provides large computation and communication capacity which makes SMC to be used between multiple clouds (i.e., it may between private or public or hybrid clouds).Cloud may encompass many high capacity servers which acts as a hosts which participate in computation (IaaS and PaaS) for final result, which is controlled by Cloud Trusted Authority (CTA) for secret sharing within the cloud. The communication between two clouds is controlled by High Level Trusted Authority (HLTA) which is one of the hosts in a cloud which provides MgaaS (Management as a Service). Due to high risk for security in clouds, HLTA generates and distributes public keys and private keys by using Carmichael-R-Prime- RSA algorithm for exchange of private data in SMC between itself and clouds. In cloud, CTA creates Group key for Secure communication between the hosts in cloud based on keys sent by HLTA for exchange of Intermediate values and shares for computation of final result. Since this scheme is extended to be used in clouds( due to high availability and scalability to increase computation power) it is possible to implement SMC practically for privacy preserving in data mining at low cost for the clients.
Making Spatial Statistics Service Accessible On Cloud Platform
NASA Astrophysics Data System (ADS)
Mu, X.; Wu, J.; Li, T.; Zhong, Y.; Gao, X.
2014-04-01
Web service can bring together applications running on diverse platforms, users can access and share various data, information and models more effectively and conveniently from certain web service platform. Cloud computing emerges as a paradigm of Internet computing in which dynamical, scalable and often virtualized resources are provided as services. With the rampant growth of massive data and restriction of net, traditional web services platforms have some prominent problems existing in development such as calculation efficiency, maintenance cost and data security. In this paper, we offer a spatial statistics service based on Microsoft cloud. An experiment was carried out to evaluate the availability and efficiency of this service. The results show that this spatial statistics service is accessible for the public conveniently with high processing efficiency.
The cloud paradigm applied to e-Health.
Vilaplana, Jordi; Solsona, Francesc; Abella; Filgueira, Rosa; Rius, Josep
2013-03-14
Cloud computing is a new paradigm that is changing how enterprises, institutions and people understand, perceive and use current software systems. With this paradigm, the organizations have no need to maintain their own servers, nor host their own software. Instead, everything is moved to the cloud and provided on demand, saving energy, physical space and technical staff. Cloud-based system architectures provide many advantages in terms of scalability, maintainability and massive data processing. We present the design of an e-health cloud system, modelled by an M/M/m queue with QoS capabilities, i.e. maximum waiting time of requests. Detailed results for the model formed by a Jackson network of two M/M/m queues from the queueing theory perspective are presented. These results show a significant performance improvement when the number of servers increases. Platform scalability becomes a critical issue since we aim to provide the system with high Quality of Service (QoS). In this paper we define an architecture capable of adapting itself to different diseases and growing numbers of patients. This platform could be applied to the medical field to greatly enhance the results of those therapies that have an important psychological component, such as addictions and chronic diseases.
Duro, Francisco Rodrigo; Blas, Javier Garcia; Isaila, Florin; ...
2016-10-06
The increasing volume of scientific data and the limited scalability and performance of storage systems are currently presenting a significant limitation for the productivity of the scientific workflows running on both high-performance computing (HPC) and cloud platforms. Clearly needed is better integration of storage systems and workflow engines to address this problem. This paper presents and evaluates a novel solution that leverages codesign principles for integrating Hercules—an in-memory data store—with a workflow management system. We consider four main aspects: workflow representation, task scheduling, task placement, and task termination. As a result, the experimental evaluation on both cloud and HPC systemsmore » demonstrates significant performance and scalability improvements over existing state-of-the-art approaches.« less
Distributed MRI reconstruction using Gadgetron-based cloud computing.
Xue, Hui; Inati, Souheil; Sørensen, Thomas Sangild; Kellman, Peter; Hansen, Michael S
2015-03-01
To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency. The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (eg, cardiac and neuroimaging applications). The proposed setup was able to handle acquisition and 11 -SPIRiT reconstruction of nine high temporal resolution real-time, cardiac short axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm(3) isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min. A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Wang, Xi Vincent; Wang, Lihui
2017-08-01
Cloud computing is the new enabling technology that offers centralised computing, flexible data storage and scalable services. In the manufacturing context, it is possible to utilise the Cloud technology to integrate and provide industrial resources and capabilities in terms of Cloud services. In this paper, a function block-based integration mechanism is developed to connect various types of production resources. A Cloud-based architecture is also deployed to offer a service pool which maintains these resources as production services. The proposed system provides a flexible and integrated information environment for the Cloud-based production system. As a specific type of manufacturing, Waste Electrical and Electronic Equipment (WEEE) remanufacturing experiences difficulties in system integration, information exchange and resource management. In this research, WEEE is selected as the example of Internet of Things to demonstrate how the obstacles and bottlenecks are overcome with the help of Cloud-based informatics approach. In the case studies, the WEEE recycle/recovery capabilities are also integrated and deployed as flexible Cloud services. Supporting mechanisms and technologies are presented and evaluated towards the end of the paper.
MarFS, a Near-POSIX Interface to Cloud Objects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson
The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have alsomore » developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.« less
MarFS, a Near-POSIX Interface to Cloud Objects
Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson; ...
2017-01-01
The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have alsomore » developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.« less
Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing
NASA Astrophysics Data System (ADS)
Bazhirov, Timur; Mohammadi, Mohammad; Ding, Kevin; Barabash, Sergey
Recent advances in cloud computing made it possible to access large-scale computational resources completely on-demand in a rapid and efficient manner. When combined with high fidelity simulations, they serve as an alternative pathway to enable computational discovery and design of new materials through large-scale high-throughput screening. Here, we present a case study for a cloud platform implemented at Exabyte Inc. We perform calculations to screen lightweight ternary alloys for thermodynamic stability. Due to the lack of experimental data for most such systems, we rely on theoretical approaches based on first-principle pseudopotential density functional theory. We calculate the formation energies for a set of ternary compounds approximated by special quasirandom structures. During an example run we were able to scale to 10,656 CPUs within 7 minutes from the start, and obtain results for 296 compounds within 38 hours. The results indicate that the ultimate formation enthalpy of ternary systems can be negative for some of lightweight alloys, including Li and Mg compounds. We conclude that compared to traditional capital-intensive approach that requires in on-premises hardware resources, cloud computing is agile and cost-effective, yet scalable and delivers similar performance.
SCIMITAR: Scalable Stream-Processing for Sensor Information Brokering
2013-11-01
IaaS) cloud frameworks including Amazon Web Services and Eucalyptus . For load testing, we used The Grinder [9], a Java load testing framework that...internal Eucalyptus cluster which we could not scale as large as the Amazon environment due to a lack of computation resources. We recreated our
A Hierarchical Auction-Based Mechanism for Real-Time Resource Allocation in Cloud Robotic Systems.
Wang, Lujia; Liu, Ming; Meng, Max Q-H
2017-02-01
Cloud computing enables users to share computing resources on-demand. The cloud computing framework cannot be directly mapped to cloud robotic systems with ad hoc networks since cloud robotic systems have additional constraints such as limited bandwidth and dynamic structure. However, most multirobotic applications with cooperative control adopt this decentralized approach to avoid a single point of failure. Robots need to continuously update intensive data to execute tasks in a coordinated manner, which implies real-time requirements. Thus, a resource allocation strategy is required, especially in such resource-constrained environments. This paper proposes a hierarchical auction-based mechanism, namely link quality matrix (LQM) auction, which is suitable for ad hoc networks by introducing a link quality indicator. The proposed algorithm produces a fast and robust method that is accurate and scalable. It reduces both global communication and unnecessary repeated computation. The proposed method is designed for firm real-time resource retrieval for physical multirobot systems. A joint surveillance scenario empirically validates the proposed mechanism by assessing several practical metrics. The results show that the proposed LQM auction outperforms state-of-the-art algorithms for resource allocation.
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-01-01
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric
2014-01-29
Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.
Open Reading Frame Phylogenetic Analysis on the Cloud
2013-01-01
Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
NASA Astrophysics Data System (ADS)
Eilers, J.
2013-09-01
The interface analysis from an observer of space objects makes a standard necessary. This standardized dataset serves as input for a cloud based service, which aimed for a near real-time Space Situational Awareness (SSA) system. The system contains all advantages of a cloud based solution, like redundancy, scalability and an easy way to distribute information. For the standard based on the interface analysis of the observer, the information can be separated in three parts. One part is the information about the observer e.g. a ground station. The next part is the information about the sensors that are used by the observer. And the last part is the data from the detected object. Backbone of the SSA System is the cloud based service which includes the consistency check for the observed objects, a database for the objects, the algorithms and analysis as well as the visualization of the results. This paper also provides an approximation of the needed computational power, data storage and a financial approach to deliver this service to a broad community. In this context cloud means, neither the user nor the observer has to think about the infrastructure of the calculation environment. The decision if the IT-infrastructure will be built by a conglomerate of different nations or rented on the marked should be based on an efficiency analysis. Also combinations are possible like starting on a rented cloud and then go to a private cloud owned by the government. One of the advantages of a cloud solution is the scalability. There are about 3000 satellites in space, 900 of them are active, and in total there are about ~17.000 detected space objects orbiting earth. But for the computation it is not a N(active) to N problem it is more N(active) to N(apo peri) quantity of N(all). Instead of 15.3 million possible collisions to calculate a computation of only approx. 2.3 million possible collisions must be done. In general, this Space Situational Awareness System can be used as a tool for satellite system owner for collision avoidance.
Implementation of Virtualization Oriented Architecture: A Healthcare Industry Case Study
NASA Astrophysics Data System (ADS)
Rao, G. Subrahmanya Vrk; Parthasarathi, Jinka; Karthik, Sundararaman; Rao, Gvn Appa; Ganesan, Suresh
This paper presents a Virtualization Oriented Architecture (VOA) and an implementation of VOA for Hridaya - a Telemedicine initiative. Hadoop Compute cloud was established at our labs and jobs which require a massive computing capability such as ECG signal analysis were submitted and the study is presented in this current paper. VOA takes advantage of inexpensive community PCs and provides added advantages such as Fault Tolerance, Scalability, Performance, High Availability.
A Cloud-based Infrastructure and Architecture for Environmental System Research
NASA Astrophysics Data System (ADS)
Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.
2016-12-01
The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics service and utilities to the environmental system research community, and will provide unique capabilities that allows terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflow in a straightforward fashion. The infrastructure will minimize large disruptions from current project-based data submission workflows for better acceptances from existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by provide unified data services and infrastructure. The Infrastructure consists of two key components (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from environmental system research community, and (2) scalable data management services and system, originated and development by ORNL data centers.
Christoph, J; Griebel, L; Leb, I; Engel, I; Köpcke, F; Toddenroth, D; Prokosch, H-U; Laufer, J; Marquardt, K; Sedlmayr, M
2015-01-01
The secondary use of clinical data provides large opportunities for clinical and translational research as well as quality assurance projects. For such purposes, it is necessary to provide a flexible and scalable infrastructure that is compliant with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements and to evaluate it with three use cases. The architecture provides components for multiple data provider sites such as hospitals to extract free text as well as structured data from local sources and de-identify such data for further anonymous or pseudonymous processing. Free text documentation is analyzed and transformed into structured information by text-mining services, which are provided within a cloud-computing environment. Thus, newly gained annotations can be integrated along with the already available structured data items and the resulting data sets can be uploaded to a central study portal for further analysis. Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients provided by a University Hospital and a private hospital chain have already been processed. Cloud4health has shown how existing components for secondary use of structured data can be complemented with text-mining in a privacy compliant manner. The cloud-computing paradigm allows a flexible and dynamically adaptable service provision that facilitates the adoption of services by data providers without own investments in respective hardware resources and software tools.
Enabling BOINC in infrastructure as a service cloud system
NASA Astrophysics Data System (ADS)
Montes, Diego; Añel, Juan A.; Pena, Tomás F.; Uhe, Peter; Wallom, David C. H.
2017-02-01
Volunteer or crowd computing is becoming increasingly popular for solving complex research problems from an increasingly diverse range of areas. The majority of these have been built using the Berkeley Open Infrastructure for Network Computing (BOINC) platform, which provides a range of different services to manage all computation aspects of a project. The BOINC system is ideal in those cases where not only does the research community involved need low-cost access to massive computing resources but also where there is a significant public interest in the research being done.We discuss the way in which cloud services can help BOINC-based projects to deliver results in a fast, on demand manner. This is difficult to achieve using volunteers, and at the same time, using scalable cloud resources for short on demand projects can optimize the use of the available resources. We show how this design can be used as an efficient distributed computing platform within the cloud, and outline new approaches that could open up new possibilities in this field, using Climateprediction.net (http://www.climateprediction.net/) as a case study.
Scalability and Validation of Big Data Bioinformatics Software.
Yang, Andrian; Troup, Michael; Ho, Joshua W K
2017-01-01
This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
The cloud paradigm applied to e-Health
2013-01-01
Background Cloud computing is a new paradigm that is changing how enterprises, institutions and people understand, perceive and use current software systems. With this paradigm, the organizations have no need to maintain their own servers, nor host their own software. Instead, everything is moved to the cloud and provided on demand, saving energy, physical space and technical staff. Cloud-based system architectures provide many advantages in terms of scalability, maintainability and massive data processing. Methods We present the design of an e-health cloud system, modelled by an M/M/m queue with QoS capabilities, i.e. maximum waiting time of requests. Results Detailed results for the model formed by a Jackson network of two M/M/m queues from the queueing theory perspective are presented. These results show a significant performance improvement when the number of servers increases. Conclusions Platform scalability becomes a critical issue since we aim to provide the system with high Quality of Service (QoS). In this paper we define an architecture capable of adapting itself to different diseases and growing numbers of patients. This platform could be applied to the medical field to greatly enhance the results of those therapies that have an important psychological component, such as addictions and chronic diseases. PMID:23496912
Cloud Service Selection Using Multicriteria Decision Analysis
Anuar, Nor Badrul; Shiraz, Muhammad; Haque, Israat Tanzeena
2014-01-01
Cloud computing (CC) has recently been receiving tremendous attention from the IT industry and academic researchers. CC leverages its unique services to cloud customers in a pay-as-you-go, anytime, anywhere manner. Cloud services provide dynamically scalable services through the Internet on demand. Therefore, service provisioning plays a key role in CC. The cloud customer must be able to select appropriate services according to his or her needs. Several approaches have been proposed to solve the service selection problem, including multicriteria decision analysis (MCDA). MCDA enables the user to choose from among a number of available choices. In this paper, we analyze the application of MCDA to service selection in CC. We identify and synthesize several MCDA techniques and provide a comprehensive analysis of this technology for general readers. In addition, we present a taxonomy derived from a survey of the current literature. Finally, we highlight several state-of-the-art practical aspects of MCDA implementation in cloud computing service selection. The contributions of this study are four-fold: (a) focusing on the state-of-the-art MCDA techniques, (b) highlighting the comparative analysis and suitability of several MCDA methods, (c) presenting a taxonomy through extensive literature review, and (d) analyzing and summarizing the cloud computing service selections in different scenarios. PMID:24696645
Cloud service selection using multicriteria decision analysis.
Whaiduzzaman, Md; Gani, Abdullah; Anuar, Nor Badrul; Shiraz, Muhammad; Haque, Mohammad Nazmul; Haque, Israat Tanzeena
2014-01-01
Cloud computing (CC) has recently been receiving tremendous attention from the IT industry and academic researchers. CC leverages its unique services to cloud customers in a pay-as-you-go, anytime, anywhere manner. Cloud services provide dynamically scalable services through the Internet on demand. Therefore, service provisioning plays a key role in CC. The cloud customer must be able to select appropriate services according to his or her needs. Several approaches have been proposed to solve the service selection problem, including multicriteria decision analysis (MCDA). MCDA enables the user to choose from among a number of available choices. In this paper, we analyze the application of MCDA to service selection in CC. We identify and synthesize several MCDA techniques and provide a comprehensive analysis of this technology for general readers. In addition, we present a taxonomy derived from a survey of the current literature. Finally, we highlight several state-of-the-art practical aspects of MCDA implementation in cloud computing service selection. The contributions of this study are four-fold: (a) focusing on the state-of-the-art MCDA techniques, (b) highlighting the comparative analysis and suitability of several MCDA methods, (c) presenting a taxonomy through extensive literature review, and (d) analyzing and summarizing the cloud computing service selections in different scenarios.
NASA Astrophysics Data System (ADS)
Zhou, S.; Tao, W. K.; Li, X.; Matsui, T.; Sun, X. H.; Yang, X.
2015-12-01
A cloud-resolving model (CRM) is an atmospheric numerical model that can numerically resolve clouds and cloud systems at 0.25~5km horizontal grid spacings. The main advantage of the CRM is that it can allow explicit interactive processes between microphysics, radiation, turbulence, surface, and aerosols without subgrid cloud fraction, overlapping and convective parameterization. Because of their fine resolution and complex physical processes, it is challenging for the CRM community to i) visualize/inter-compare CRM simulations, ii) diagnose key processes for cloud-precipitation formation and intensity, and iii) evaluate against NASA's field campaign data and L1/L2 satellite data products due to large data volume (~10TB) and complexity of CRM's physical processes. We have been building the Super Cloud Library (SCL) upon a Hadoop framework, capable of CRM database management, distribution, visualization, subsetting, and evaluation in a scalable way. The current SCL capability includes (1) A SCL data model enables various CRM simulation outputs in NetCDF, including the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) model, to be accessed and processed by Hadoop, (2) A parallel NetCDF-to-CSV converter supports NU-WRF and GCE model outputs, (3) A technique visualizes Hadoop-resident data with IDL, (4) A technique subsets Hadoop-resident data, compliant to the SCL data model, with HIVE or Impala via HUE's Web interface, (5) A prototype enables a Hadoop MapReduce application to dynamically access and process data residing in a parallel file system, PVFS2 or CephFS, where high performance computing (HPC) simulation outputs such as NU-WRF's and GCE's are located. We are testing Apache Spark to speed up SCL data processing and analysis.With the SCL capabilities, SCL users can conduct large-domain on-demand tasks without downloading voluminous CRM datasets and various observations from NASA Field Campaigns and Satellite data to a local computer, and inter-compare CRM output and data with GCE and NU-WRF.
Scalable and cost-effective NGS genotyping in the cloud.
Souilmi, Yassine; Lancaster, Alex K; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared B; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter J; Wall, Dennis P
2015-10-15
While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at scales of economy in the 10's of dollars. We take a step towards addressing this challenge, by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows making optimal use of high-performance compute clusters. Here we show that the Amazon Web Service (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Our systematic benchmarking reveals important new insights and considerations to produce clinical turn-around of whole genome analysis optimization and workflow management including strategic batching of individual genomes and efficient cluster resource configuration.
Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter
2015-01-20
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
A PACS archive architecture supported on cloud services.
Silva, Luís A Bastião; Costa, Carlos; Oliveira, José Luis
2012-05-01
Diagnostic imaging procedures have continuously increased over the last decade and this trend may continue in coming years, creating a great impact on storage and retrieval capabilities of current PACS. Moreover, many smaller centers do not have financial resources or requirements that justify the acquisition of a traditional infrastructure. Alternative solutions, such as cloud computing, may help address this emerging need. A tremendous amount of ubiquitous computational power, such as that provided by Google and Amazon, are used every day as a normal commodity. Taking advantage of this new paradigm, an architecture for a Cloud-based PACS archive that provides data privacy, integrity, and availability is proposed. The solution is independent from the cloud provider and the core modules were successfully instantiated in examples of two cloud computing providers. Operational metrics for several medical imaging modalities were tabulated and compared for Google Storage, Amazon S3, and LAN PACS. A PACS-as-a-Service archive that provides storage of medical studies using the Cloud was developed. The results show that the solution is robust and that it is possible to store, query, and retrieve all desired studies in a similar way as in a local PACS approach. Cloud computing is an emerging solution that promises high scalability of infrastructures, software, and applications, according to a "pay-as-you-go" business model. The presented architecture uses the cloud to setup medical data repositories and can have a significant impact on healthcare institutions by reducing IT infrastructures.
Sentinel-1 Interferometry from the Cloud to the Scientist
NASA Astrophysics Data System (ADS)
Garron, J.; Stoner, C.; Johnston, A.; Arko, S. A.
2017-12-01
Big data problems and solutions are growing in the technological and scientific sectors daily. Cloud computing is a vertically and horizontally scalable solution available now for archiving and processing large volumes of data quickly, without significant on-site computing hardware costs. Be that as it may, the conversion of scientific data processors to these powerful platforms requires not only the proof of concept, but the demonstration of credibility in an operational setting. The Alaska Satellite Facility (ASF) Distributed Active Archive Center (DAAC), in partnership with NASA's Jet Propulsion Laboratory, is exploring the functional architecture of Amazon Web Services cloud computing environment for the processing, distribution and archival of Synthetic Aperture Radar data in preparation for the NASA-ISRO Synthetic Aperture Radar (NISAR) Mission. Leveraging built-in AWS services for logging, monitoring and dashboarding, the GRFN (Getting Ready for NISAR) team has built a scalable processing, distribution and archival system of Sentinel-1 L2 interferograms produced using the ISCE algorithm. This cloud-based functional prototype provides interferograms over selected global land deformation features (volcanoes, land subsidence, seismic zones) and are accessible to scientists via NASA's EarthData Search client and the ASF DAACs primary SAR interface, Vertex, for direct download. The interferograms are produced using nearest-neighbor logic for identifying pairs of granules for interferometric processing, creating deep stacks of BETA products from almost every satellite orbit for scientists to explore. This presentation highlights the functional lessons learned to date from this exercise, including the cost analysis of various data lifecycle policies as implemented through AWS. While demonstrating the architecture choices in support of efficient big science data management, we invite feedback and questions about the process and products from the InSAR community.
Jaschob, Daniel; Riffle, Michael
2012-07-30
Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. JobCenter is a client-server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls or "in the cloud") and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, which provides tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multistep workflows. JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/.
Marathe, Aniruddha P.; Harris, Rachel A.; Lowenthal, David K.; ...
2015-12-17
The use of clouds to execute high-performance computing (HPC) applications has greatly increased recently. Clouds provide several potential advantages over traditional supercomputers and in-house clusters. The most popular cloud is currently Amazon EC2, which provides fixed-cost and variable-cost, auction-based options. The auction market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete. Specifically, we design and implement multiple techniques to reduce expected cost bymore » exploiting redundancy in the EC2 auction market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs up to seven times cheaper than using the on-demand market and up to 44 percent cheaper than the best non-redundant, auction-market algorithm. We extend our adaptive algorithm to incorporate application scalability characteristics for further cost savings. In conclusion, we show that the adaptive algorithm informed with scalability characteristics of applications achieves up to 56 percent cost savings compared to the expected cost for the base adaptive algorithm run at a fixed, user-defined scale.« less
An adaptive process-based cloud infrastructure for space situational awareness applications
NASA Astrophysics Data System (ADS)
Liu, Bingwei; Chen, Yu; Shen, Dan; Chen, Genshe; Pham, Khanh; Blasch, Erik; Rubin, Bruce
2014-06-01
Space situational awareness (SSA) and defense space control capabilities are top priorities for groups that own or operate man-made spacecraft. Also, with the growing amount of space debris, there is an increase in demand for contextual understanding that necessitates the capability of collecting and processing a vast amount sensor data. Cloud computing, which features scalable and flexible storage and computing services, has been recognized as an ideal candidate that can meet the large data contextual challenges as needed by SSA. Cloud computing consists of physical service providers and middleware virtual machines together with infrastructure, platform, and software as service (IaaS, PaaS, SaaS) models. However, the typical Virtual Machine (VM) abstraction is on a per operating systems basis, which is at too low-level and limits the flexibility of a mission application architecture. In responding to this technical challenge, a novel adaptive process based cloud infrastructure for SSA applications is proposed in this paper. In addition, the details for the design rationale and a prototype is further examined. The SSA Cloud (SSAC) conceptual capability will potentially support space situation monitoring and tracking, object identification, and threat assessment. Lastly, the benefits of a more granular and flexible cloud computing resources allocation are illustrated for data processing and implementation considerations within a representative SSA system environment. We show that the container-based virtualization performs better than hypervisor-based virtualization technology in an SSA scenario.
Scalable and Resilient Middleware to Handle Information Exchange during Environment Crisis
NASA Astrophysics Data System (ADS)
Tao, R.; Poslad, S.; Moßgraber, J.; Middleton, S.; Hammitzsch, M.
2012-04-01
The EU FP7 TRIDEC project focuses on enabling real-time, intelligent, information management of collaborative, complex, critical decision processes for earth management. A key challenge is to promote a communication infrastructure to facilitate interoperable environment information services during environment events and crises such as tsunamis and drilling, during which increasing volumes and dimensionality of disparate information sources, including sensor-based and human-based ones, can result, and need to be managed. Such a system needs to support: scalable, distributed messaging; asynchronous messaging; open messaging to handling changing clients such as new and retired automated system and human information sources becoming online or offline; flexible data filtering, and heterogeneous access networks (e.g., GSM, WLAN and LAN). In addition, the system needs to be resilient to handle the ICT system failures, e.g. failure, degradation and overloads, during environment events. There are several system middleware choices for TRIDEC based upon a Service-oriented-architecture (SOA), Event-driven-Architecture (EDA), Cloud Computing, and Enterprise Service Bus (ESB). In an SOA, everything is a service (e.g. data access, processing and exchange); clients can request on demand or subscribe to services registered by providers; more often interaction is synchronous. In an EDA system, events that represent significant changes in state can be processed simply, or as streams or more complexly. Cloud computing is a virtualization, interoperable and elastic resource allocation model. An ESB, a fundamental component for enterprise messaging, supports synchronous and asynchronous message exchange models and has inbuilt resilience against ICT failure. Our middleware proposal is an ESB based hybrid architecture model: an SOA extension supports more synchronous workflows; EDA assists the ESB to handle more complex event processing; Cloud computing can be used to increase and decrease the ESB resources on demand. To reify this hybrid ESB centric architecture, we will adopt two complementary approaches: an open source one for scalability and resilience improvement while a commercial one can be used for ultra-speed messaging, whilst we can bridge between these two to support interoperability. In TRIDEC, to manage such a hybrid messaging system, overlay and underlay management techniques will be adopted. The managers (both global and local) will collect, store and update status information (e.g. CPU utilization, free space, number of clients) and balance the usage, throughput, and delays to improve resilience and scalability. The expected resilience improvement includes dynamic failover, self-healing, pre-emptive load balancing, and bottleneck prediction while the expected improvement for scalability includes capacity estimation, Http Bridge, and automatic configuration and reconfiguration (e.g. add or delete clients and servers).
An Application-Based Performance Evaluation of NASAs Nebula Cloud Computing Platform
NASA Technical Reports Server (NTRS)
Saini, Subhash; Heistand, Steve; Jin, Haoqiang; Chang, Johnny; Hood, Robert T.; Mehrotra, Piyush; Biswas, Rupak
2012-01-01
The high performance computing (HPC) community has shown tremendous interest in exploring cloud computing as it promises high potential. In this paper, we examine the feasibility, performance, and scalability of production quality scientific and engineering applications of interest to NASA on NASA's cloud computing platform, called Nebula, hosted at Ames Research Center. This work represents the comprehensive evaluation of Nebula using NUTTCP, HPCC, NPB, I/O, and MPI function benchmarks as well as four applications representative of the NASA HPC workload. Specifically, we compare Nebula performance on some of these benchmarks and applications to that of NASA s Pleiades supercomputer, a traditional HPC system. We also investigate the impact of virtIO and jumbo frames on interconnect performance. Overall results indicate that on Nebula (i) virtIO and jumbo frames improve network bandwidth by a factor of 5x, (ii) there is a significant virtualization layer overhead of about 10% to 25%, (iii) write performance is lower by a factor of 25x, (iv) latency for short MPI messages is very high, and (v) overall performance is 15% to 48% lower than that on Pleiades for NASA HPC applications. We also comment on the usability of the cloud platform.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as the multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of a layered parallel software libraries that allow image processing applications to share the common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on the Amdahl's law for symmetric multicore chips showed the potential of a high performance scalability of the HPC 3DMIP platform when a larger number of cores is available.
Biomedical cloud computing with Amazon Web Services.
Fusaro, Vincent A; Patil, Prasad; Gafni, Erik; Wall, Dennis P; Tonellato, Peter J
2011-08-01
In this overview to biomedical computing in the cloud, we discussed two primary ways to use the cloud (a single instance or cluster), provided a detailed example using NGS mapping, and highlighted the associated costs. While many users new to the cloud may assume that entry is as straightforward as uploading an application and selecting an instance type and storage options, we illustrated that there is substantial up-front effort required before an application can make full use of the cloud's vast resources. Our intention was to provide a set of best practices and to illustrate how those apply to a typical application pipeline for biomedical informatics, but also general enough for extrapolation to other types of computational problems. Our mapping example was intended to illustrate how to develop a scalable project and not to compare and contrast alignment algorithms for read mapping and genome assembly. Indeed, with a newer aligner such as Bowtie, it is possible to map the entire African genome using one m2.2xlarge instance in 48 hours for a total cost of approximately $48 in computation time. In our example, we were not concerned with data transfer rates, which are heavily influenced by the amount of available bandwidth, connection latency, and network availability. When transferring large amounts of data to the cloud, bandwidth limitations can be a major bottleneck, and in some cases it is more efficient to simply mail a storage device containing the data to AWS (http://aws.amazon.com/importexport/). More information about cloud computing, detailed cost analysis, and security can be found in references.
2014-01-01
Background Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. Results To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. Conclusions By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples. PMID:24475911
MaMR: High-performance MapReduce programming model for material cloud applications
NASA Astrophysics Data System (ADS)
Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng
2017-02-01
With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related data, and the processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications that supports multiple different Map and Reduce functions running concurrently based on hybrid share-memory BSP called MaMR. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.
Benchmarking gate-based quantum computers
NASA Astrophysics Data System (ADS)
Michielsen, Kristel; Nocon, Madita; Willsch, Dennis; Jin, Fengping; Lippert, Thomas; De Raedt, Hans
2017-11-01
With the advent of public access to small gate-based quantum processors, it becomes necessary to develop a benchmarking methodology such that independent researchers can validate the operation of these processors. We explore the usefulness of a number of simple quantum circuits as benchmarks for gate-based quantum computing devices and show that circuits performing identity operations are very simple, scalable and sensitive to gate errors and are therefore very well suited for this task. We illustrate the procedure by presenting benchmark results for the IBM Quantum Experience, a cloud-based platform for gate-based quantum computing.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.
Secure Skyline Queries on Cloud Platform.
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-04-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
TOSCA-based orchestration of complex clusters at the IaaS level
NASA Astrophysics Data System (ADS)
Caballer, M.; Donvito, G.; Moltó, G.; Rocha, R.; Velten, M.
2017-10-01
This paper describes the adoption and extension of the TOSCA standard by the INDIGO-DataCloud project for the definition and deployment of complex computing clusters together with the required support in both OpenStack and OpenNebula, carried out in close collaboration with industry partners such as IBM. Two examples of these clusters are described in this paper, the definition of an elastic computing cluster to support the Galaxy bioinformatics application where the nodes are dynamically added and removed from the cluster to adapt to the workload, and the definition of an scalable Apache Mesos cluster for the execution of batch jobs and support for long-running services. The coupling of TOSCA with Ansible Roles to perform automated installation has resulted in the definition of high-level, deterministic templates to provision complex computing clusters across different Cloud sites.
BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data.
Ausmees, Kristiina; John, Aji; Toor, Salman Z; Hellander, Andreas; Nettelblad, Carl
2018-06-26
The advent of next-generation sequencing (NGS) has made whole-genome sequencing of cohorts of individuals a reality. Primary datasets of raw or aligned reads of this sort can get very large. For scientific questions where curated called variants are not sufficient, the sheer size of the datasets makes analysis prohibitively expensive. In order to make re-analysis of such data feasible without the need to have access to a large-scale computing facility, we have developed a highly scalable, storage-agnostic framework, an associated API and an easy-to-use web user interface to execute custom filters on large genomic datasets. We present BAMSI, a Software as-a Service (SaaS) solution for filtering of the 1000 Genomes phase 3 set of aligned reads, with the possibility of extension and customization to other sets of files. Unique to our solution is the capability of simultaneously utilizing many different mirrors of the data to increase the speed of the analysis. In particular, if the data is available in private or public clouds - an increasingly common scenario for both academic and commercial cloud providers - our framework allows for seamless deployment of filtering workers close to data. We show results indicating that such a setup improves the horizontal scalability of the system, and present a possible use case of the framework by performing an analysis of structural variation in the 1000 Genomes data set. BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in the use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size, and can easily be downloaded or routed into e.g. a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools. In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible by offering the possibility for organizations to share the cost of hosting data on hot storage, without compromising the scalability of downstream analysis.
CloudNeo: a cloud pipeline for identifying patient-specific tumor neoantigens.
Bais, Preeti; Namburi, Sandeep; Gatti, Daniel M; Zhang, Xinyu; Chuang, Jeffrey H
2017-10-01
We present CloudNeo, a cloud-based computational workflow for identifying patient-specific tumor neoantigens from next generation sequencing data. Tumor-specific mutant peptides can be detected by the immune system through their interactions with the human leukocyte antigen complex, and neoantigen presence has recently been shown to correlate with anti T-cell immunity and efficacy of checkpoint inhibitor therapy. However computing capabilities to identify neoantigens from genomic sequencing data are a limiting factor for understanding their role. This challenge has grown as cancer datasets become increasingly abundant, making them cumbersome to store and analyze on local servers. Our cloud-based pipeline provides scalable computation capabilities for neoantigen identification while eliminating the need to invest in local infrastructure for data transfer, storage or compute. The pipeline is a Common Workflow Language (CWL) implementation of human leukocyte antigen (HLA) typing using Polysolver or HLAminer combined with custom scripts for mutant peptide identification and NetMHCpan for neoantigen prediction. We have demonstrated the efficacy of these pipelines on Amazon cloud instances through the Seven Bridges Genomics implementation of the NCI Cancer Genomics Cloud, which provides graphical interfaces for running and editing, infrastructure for workflow sharing and version tracking, and access to TCGA data. The CWL implementation is at: https://github.com/TheJacksonLaboratory/CloudNeo. For users who have obtained licenses for all internal software, integrated versions in CWL and on the Seven Bridges Cancer Genomics Cloud platform (https://cgc.sbgenomics.com/, recommended version) can be obtained by contacting the authors. jeff.chuang@jax.org. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D T; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung
2014-01-01
Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D. T.; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung
2014-01-01
Background Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. Results We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. Conclusions CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/. PMID:24897343
RBioCloud: A Light-Weight Framework for Bioconductor and R-based Jobs on the Cloud.
Varghese, Blesson; Patel, Ishan; Barker, Adam
2015-01-01
Large-scale ad hoc analytics of genomic data is popular using the R-programming language supported by over 700 software packages provided by Bioconductor. More recently, analytical jobs are benefitting from on-demand computing and storage, their scalability and their low maintenance cost, all of which are offered by the cloud. While biologists and bioinformaticists can take an analytical job and execute it on their personal workstations, it remains challenging to seamlessly execute the job on the cloud infrastructure without extensive knowledge of the cloud dashboard. How analytical jobs can not only with minimum effort be executed on the cloud, but also how both the resources and data required by the job can be managed is explored in this paper. An open-source light-weight framework for executing R-scripts using Bioconductor packages, referred to as `RBioCloud', is designed and developed. RBioCloud offers a set of simple command-line tools for managing the cloud resources, the data and the execution of the job. Three biological test cases validate the feasibility of RBioCloud. The framework is available from http://www.rbiocloud.com.
Scientific Data Storage for Cloud Computing
NASA Astrophysics Data System (ADS)
Readey, J.
2014-12-01
Traditionally data storage used for geophysical software systems has centered on file-based systems and libraries such as NetCDF and HDF5. In contrast cloud based infrastructure providers such as Amazon AWS, Microsoft Azure, and the Google Cloud Platform generally provide storage technologies based on an object based storage service (for large binary objects) complemented by a database service (for small objects that can be represented as key-value pairs). These systems have been shown to be highly scalable, reliable, and cost effective. We will discuss a proposed system that leverages these cloud-based storage technologies to provide an API-compatible library for traditional NetCDF and HDF5 applications. This system will enable cloud storage suitable for geophysical applications that can scale up to petabytes of data and thousands of users. We'll also cover other advantages of this system such as enhanced metadata search.
NASA Astrophysics Data System (ADS)
Martin, William G. K.; Hasekamp, Otto P.
2018-01-01
In previous work, we derived the adjoint method as a computationally efficient path to three-dimensional (3D) retrievals of clouds and aerosols. In this paper we will demonstrate the use of adjoint methods for retrieving two-dimensional (2D) fields of cloud extinction. The demonstration uses a new 2D radiative transfer solver (FSDOM). This radiation code was augmented with adjoint methods to allow efficient derivative calculations needed to retrieve cloud and surface properties from multi-angle reflectance measurements. The code was then used in three synthetic retrieval studies. Our retrieval algorithm adjusts the cloud extinction field and surface albedo to minimize the measurement misfit function with a gradient-based, quasi-Newton approach. At each step we compute the value of the misfit function and its gradient with two calls to the solver FSDOM. First we solve the forward radiative transfer equation to compute the residual misfit with measurements, and second we solve the adjoint radiative transfer equation to compute the gradient of the misfit function with respect to all unknowns. The synthetic retrieval studies verify that adjoint methods are scalable to retrieval problems with many measurements and unknowns. We can retrieve the vertically-integrated optical depth of moderately thick clouds as a function of the horizontal coordinate. It is also possible to retrieve the vertical profile of clouds that are separated by clear regions. The vertical profile retrievals improve for smaller cloud fractions. This leads to the conclusion that cloud edges actually increase the amount of information that is available for retrieving the vertical profile of clouds. However, to exploit this information one must retrieve the horizontally heterogeneous cloud properties with a 2D (or 3D) model. This prototype shows that adjoint methods can efficiently compute the gradient of the misfit function. This work paves the way for the application of similar methods to 3D remote sensing problems.
Cost Considerations in Cloud Computing
2014-01-01
investments. 2. Database Options The potential promise that “ big data ” analytics holds for many enterprise mission areas makes relevant the question of the...development of a range of new distributed file systems and data - bases that have better scalability properties than traditional SQL databases. Hadoop ... data . Many systems exist that extend or supplement Hadoop —such as Apache Accumulo, which provides a highly granular mechanism for managing security
Aether: leveraging linear programming for optimal cloud computing in genomics.
Luber, Jacob M; Tierney, Braden T; Cofer, Evan M; Patel, Chirag J; Kostic, Aleksandar D
2018-05-01
Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users' existing HPC pipelines. Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at (https://github.com/kosticlab/aether). Examples, documentation and a tutorial are available at http://aether.kosticlab.org. chirag_patel@hms.harvard.edu or aleksandar.kostic@joslin.harvard.edu. Supplementary data are available at Bioinformatics online.
A novel clinical decision support algorithm for constructing complete medication histories.
Long, Ju; Yuan, Michael Juntao
2017-07-01
A patient's complete medication history is a crucial element for physicians to develop a full understanding of the patient's medical conditions and treatment options. However, due to the fragmented nature of medical data, this process can be very time-consuming and often impossible for physicians to construct a complete medication history for complex patients. In this paper, we describe an accurate, computationally efficient and scalable algorithm to construct a medication history timeline. The algorithm is developed and validated based on 1 million random prescription records from a large national prescription data aggregator. Our evaluation shows that the algorithm can be scaled horizontally on-demand, making it suitable for future delivery in a cloud-computing environment. We also propose that this cloud-based medication history computation algorithm could be integrated into Electronic Medical Records, enabling informed clinical decision-making at the point of care. Copyright © 2017 Elsevier B.V. All rights reserved.
The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences
Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker
2016-01-01
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627
Astronomy In The Cloud: Using Mapreduce For Image Coaddition
NASA Astrophysics Data System (ADS)
Wiley, Keith; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.
2011-01-01
In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection, classification, and moving object tracking. Since such studies require the highest quality data, methods such as image coaddition, i.e., registration, stacking, and mosaicing, will be critical to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources, e.g., asteroids, or transient objects, e.g., supernovae, these datastreams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, i.e., platforms where Hadoop is offered as a service. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results compring their performance. This work is funded by the NSF and by NASA.
Astronomy in the Cloud: Using MapReduce for Image Co-Addition
NASA Astrophysics Data System (ADS)
Wiley, K.; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.
2011-03-01
In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computation challenges such as anomaly detection and classification and moving-object tracking. Since such studies benefit from the highest-quality data, methods such as image co-addition, i.e., astrometric registration followed by per-pixel summation, will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources such as potentially hazardous asteroids or transient objects such as supernovae, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this article we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data are partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources: i.e., platforms where Hadoop is offered as a service. We report on our experience of implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multiterabyte imaging data set provides a good testbed for algorithm development, since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image co-addition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
Integrating the Apache Big Data Stack with HPC for Big Data
NASA Astrophysics Data System (ADS)
Fox, G. C.; Qiu, J.; Jha, S.
2014-12-01
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development. However, the same is not so true for data intensive computing, even though commercially clouds devote much more resources to data analytics than supercomputers devote to simulations. We look at a sample of over 50 big data applications to identify characteristics of data intensive applications and to deduce needed runtime and architectures. We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks and use these to identify a few key classes of hardware/software architectures. Our analysis builds on combining HPC and ABDS the Apache big data software stack that is well used in modern cloud computing. Initial results on clouds and HPC systems are encouraging. We propose the development of SPIDAL - Scalable Parallel Interoperable Data Analytics Library -- built on system aand data abstractions suggested by the HPC-ABDS architecture. We discuss how it can be used in several application areas including Polar Science.
Cloud flexibility using DIRAC interware
NASA Astrophysics Data System (ADS)
Fernandez Albor, Víctor; Seco Miguelez, Marcos; Fernandez Pena, Tomas; Mendez Muñoz, Victor; Saborido Silva, Juan Jose; Graciani Diaz, Ricardo
2014-06-01
Communities of different locations are running their computing jobs on dedicated infrastructures without the need to worry about software, hardware or even the site where their programs are going to be executed. Nevertheless, this usually implies that they are restricted to use certain types or versions of an Operating System because either their software needs an definite version of a system library or a specific platform is required by the collaboration to which they belong. On this scenario, if a data center wants to service software to incompatible communities, it has to split its physical resources among those communities. This splitting will inevitably lead to an underuse of resources because the data centers are bound to have periods where one or more of its subclusters are idle. It is, in this situation, where Cloud Computing provides the flexibility and reduction in computational cost that data centers are searching for. This paper describes a set of realistic tests that we ran on one of such implementations. The test comprise software from three different HEP communities (Auger, LHCb and QCD phenomelogists) and the Parsec Benchmark Suite running on one or more of three Linux flavors (SL5, Ubuntu 10.04 and Fedora 13). The implemented infrastructure has, at the cloud level, CloudStack that manages the virtual machines (VM) and the hosts on which they run, and, at the user level, the DIRAC framework along with a VM extension that will submit, monitorize and keep track of the user jobs and also requests CloudStack to start or stop the necessary VM's. In this infrastructure, the community software is distributed via the CernVM-FS, which has been proven to be a reliable and scalable software distribution system. With the resulting infrastructure, users are allowed to send their jobs transparently to the Data Center. The main purpose of this system is the creation of flexible cluster, multiplatform with an scalable method for software distribution for several VOs. Users from different communities do not need to care about the installation of the standard software that is available at the nodes, nor the operating system of the host machine, which is transparent to the user.
NASA Technical Reports Server (NTRS)
Shen, Bo-Wen; Tao, Wei-Kuo; Chern, Jiun-Dar
2007-01-01
Improving our understanding of hurricane inter-annual variability and the impact of climate change (e.g., doubling CO2 and/or global warming) on hurricanes brings both scientific and computational challenges to researchers. As hurricane dynamics involves multiscale interactions among synoptic-scale flows, mesoscale vortices, and small-scale cloud motions, an ideal numerical model suitable for hurricane studies should demonstrate its capabilities in simulating these interactions. The newly-developed multiscale modeling framework (MMF, Tao et al., 2007) and the substantial computing power by the NASA Columbia supercomputer show promise in pursuing the related studies, as the MMF inherits the advantages of two NASA state-of-the-art modeling components: the GEOS4/fvGCM and 2D GCEs. This article focuses on the computational issues and proposes a revised methodology to improve the MMF's performance and scalability. It is shown that this prototype implementation enables 12-fold performance improvements with 364 CPUs, thereby making it more feasible to study hurricane climate.
An Overview of Cloud Implementation in the Manufacturing Process Life Cycle
NASA Astrophysics Data System (ADS)
Kassim, Noordiana; Yusof, Yusri; Hakim Mohamad, Mahmod Abd; Omar, Abdul Halim; Roslan, Rosfuzah; Aryanie Bahrudin, Ida; Ali, Mohd Hatta Mohamed
2017-08-01
The advancement of information and communication technology (ICT) has changed the structure and functions of various sectors and it has also started to play a significant role in modern manufacturing in terms of computerized machining and cloud manufacturing. It is important for industries to keep up with the current trend of ICT for them to be able survive and be competitive. Cloud manufacturing is an approach that wanted to realize a real-world manufacturing processes that will apply the basic concept from the field of Cloud computing to the manufacturing domain called Cloud-based manufacturing (CBM) or cloud manufacturing (CM). Cloud manufacturing has been recognized as a new paradigm for manufacturing businesses. In cloud manufacturing, manufacturing companies need to support flexible and scalable business processes in the shop floor as well as the software itself. This paper provides an insight or overview on the implementation of cloud manufacturing in the modern manufacturing processes and at the same times analyses the requirements needed regarding process enactment for Cloud manufacturing and at the same time proposing a STEP-NC concept that can function as a tool to support the cloud manufacturing concept.
Secure Skyline Queries on Cloud Platform
Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian
2017-01-01
Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions. PMID:28883710
Bao, Shunxing; Damon, Stephen M; Landman, Bennett A; Gokhale, Aniruddha
2016-02-27
Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical-Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for-use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline.
NASA Astrophysics Data System (ADS)
Bao, Shunxing; Damon, Stephen M.; Landman, Bennett A.; Gokhale, Aniruddha
2016-03-01
Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical- Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for- use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline.
Bao, Shunxing; Damon, Stephen M.; Landman, Bennett A.; Gokhale, Aniruddha
2016-01-01
Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical-Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for-use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline. PMID:27127335
NASA Astrophysics Data System (ADS)
Shamugam, Veeramani; Murray, I.; Leong, J. A.; Sidhu, Amandeep S.
2016-03-01
Cloud computing provides services on demand instantly, such as access to network infrastructure consisting of computing hardware, operating systems, network storage, database and applications. Network usage and demands are growing at a very fast rate and to meet the current requirements, there is a need for automatic infrastructure scaling. Traditional networks are difficult to automate because of the distributed nature of their decision making process for switching or routing which are collocated on the same device. Managing complex environments using traditional networks is time-consuming and expensive, especially in the case of generating virtual machines, migration and network configuration. To mitigate the challenges, network operations require efficient, flexible, agile and scalable software defined networks (SDN). This paper discuss various issues in SDN and suggests how to mitigate the network management related issues. A private cloud prototype test bed was setup to implement the SDN on the OpenStack platform to test and evaluate the various network performances provided by the various configurations.
Folding Proteins at 500 ns/hour with Work Queue.
Abdul-Wahid, Badi'; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2012-10-01
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants called Accelerated Weighted Ensemble Dynamics (AWE) and for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, grids, on multiple architectures (CPU/GPU, 32/64bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour.
Folding Proteins at 500 ns/hour with Work Queue
Abdul-Wahid, Badi’; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A.
2014-01-01
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants called Accelerated Weighted Ensemble Dynamics (AWE) and for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, grids, on multiple architectures (CPU/GPU, 32/64bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour. PMID:25540799
Moving image analysis to the cloud: A case study with a genome-scale tomographic study
NASA Astrophysics Data System (ADS)
Mader, Kevin; Stampanoni, Marco
2016-01-01
Over the last decade, the time required to measure a terabyte of microscopic imaging data has gone from years to minutes. This shift has moved many of the challenges away from experimental design and measurement to scalable storage, organization, and analysis. As many scientists and scientific institutions lack training and competencies in these areas, major bottlenecks have arisen and led to substantial delays and gaps between measurement, understanding, and dissemination. We present in this paper a framework for analyzing large 3D datasets using cloud-based computational and storage resources. We demonstrate its applicability by showing the setup and costs associated with the analysis of a genome-scale study of bone microstructure. We then evaluate the relative advantages and disadvantages associated with local versus cloud infrastructures.
Security on Cloud Revocation Authority using Identity Based Encryption
NASA Astrophysics Data System (ADS)
Rajaprabha, M. N.
2017-11-01
As due to the era of cloud computing most of the people are saving there documents, files and other things on cloud spaces. Due to this security over the cloud is also important because all the confidential things are there on the cloud. So to overcome private key infrastructure (PKI) issues some revocable Identity Based Encryption (IBE) techniques are introduced which eliminates the demand of PKI. The technique introduced is key update cloud service provider which is having two issues in it and they are computation and communication cost is high and second one is scalability issue. So to overcome this problem we come along with the system in which the Cloud Revocation Authority (CRA) is there for the security which will only hold the secret key for each user. And the secret key was send with the help of advanced encryption standard security. The key is encrypted and send to the CRA for giving the authentication to the person who wants to share the data or files or for the communication purpose. Through that key only the other user will able to access that file and if the user apply some invalid key on the particular file than the information of that user and file is send to the administrator and administrator is having rights to block that person of black list that person to use the system services.
Sideloading - Ingestion of Large Point Clouds Into the Apache Spark Big Data Engine
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.; Alis, C.
2016-06-01
In the geospatial domain we have now reached the point where data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally lucrative to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not naturally supported by the existing big data frameworks. Instead such file formats are supported by software libraries that are restricted to single CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method to load billions of points into a commodity hardware compute cluster and we discuss the implications on scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
IAServ: an intelligent home care web services platform in a cloud for aging-in-place.
Su, Chuan-Jun; Chiang, Chang-Yu
2013-11-12
As the elderly population has been rapidly expanding and the core tax-paying population has been shrinking, the need for adequate elderly health and housing services continues to grow while the resources to provide such services are becoming increasingly scarce. Thus, increasing the efficiency of the delivery of healthcare services through the use of modern technology is a pressing issue. The seamless integration of such enabling technologies as ontology, intelligent agents, web services, and cloud computing is transforming healthcare from hospital-based treatments to home-based self-care and preventive care. A ubiquitous healthcare platform based on this technological integration, which synergizes service providers with patients' needs to be developed to provide personalized healthcare services at the right time, in the right place, and the right manner. This paper presents the development and overall architecture of IAServ (the Intelligent Aging-in-place Home care Web Services Platform) to provide personalized healthcare service ubiquitously in a cloud computing setting to support the most desirable and cost-efficient method of care for the aged-aging in place. The IAServ is expected to offer intelligent, pervasive, accurate and contextually-aware personal care services. Architecturally the implemented IAServ leverages web services and cloud computing to provide economic, scalable, and robust healthcare services over the Internet.
IAServ: An Intelligent Home Care Web Services Platform in a Cloud for Aging-in-Place
Su, Chuan-Jun; Chiang, Chang-Yu
2013-01-01
As the elderly population has been rapidly expanding and the core tax-paying population has been shrinking, the need for adequate elderly health and housing services continues to grow while the resources to provide such services are becoming increasingly scarce. Thus, increasing the efficiency of the delivery of healthcare services through the use of modern technology is a pressing issue. The seamless integration of such enabling technologies as ontology, intelligent agents, web services, and cloud computing is transforming healthcare from hospital-based treatments to home-based self-care and preventive care. A ubiquitous healthcare platform based on this technological integration, which synergizes service providers with patients’ needs to be developed to provide personalized healthcare services at the right time, in the right place, and the right manner. This paper presents the development and overall architecture of IAServ (the Intelligent Aging-in-place Home care Web Services Platform) to provide personalized healthcare service ubiquitously in a cloud computing setting to support the most desirable and cost-efficient method of care for the aged-aging in place. The IAServ is expected to offer intelligent, pervasive, accurate and contextually-aware personal care services. Architecturally the implemented IAServ leverages web services and cloud computing to provide economic, scalable, and robust healthcare services over the Internet. PMID:24225647
Costs of cloud computing for a biometry department. A case study.
Knaus, J; Hieke, S; Binder, H; Schwarzer, G
2013-01-01
"Cloud" computing providers, such as the Amazon Web Services (AWS), offer stable and scalable computational resources based on hardware virtualization, with short, usually hourly, billing periods. The idea of pay-as-you-use seems appealing for biometry research units which have only limited access to university or corporate data center resources or grids. This case study compares the costs of an existing heterogeneous on-site hardware pool in a Medical Biometry and Statistics department to a comparable AWS offer. The "total cost of ownership", including all direct costs, is determined for the on-site hardware, and hourly prices are derived, based on actual system utilization during the year 2011. Indirect costs, which are difficult to quantify are not included in this comparison, but nevertheless some rough guidance from our experience is given. To indicate the scale of costs for a methodological research project, a simulation study of a permutation-based statistical approach is performed using AWS and on-site hardware. In the presented case, with a system utilization of 25-30 percent and 3-5-year amortization, on-site hardware can result in smaller costs, compared to hourly rental in the cloud dependent on the instance chosen. Renting cloud instances with sufficient main memory is a deciding factor in this comparison. Costs for on-site hardware may vary, depending on the specific infrastructure at a research unit, but have only moderate impact on the overall comparison and subsequent decision for obtaining affordable scientific computing resources. Overall utilization has a much stronger impact as it determines the actual computing hours needed per year. Taking this into ac count, cloud computing might still be a viable option for projects with limited maturity, or as a supplement for short peaks in demand.
STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.
Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T
2014-01-01
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.
Aether: leveraging linear programming for optimal cloud computing in genomics
Luber, Jacob M; Tierney, Braden T; Cofer, Evan M; Patel, Chirag J
2018-01-01
Abstract Motivation Across biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. Results Here, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines. Availability and implementation Data utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at (https://github.com/kosticlab/aether). Examples, documentation and a tutorial are available at http://aether.kosticlab.org. Contact chirag_patel@hms.harvard.edu or aleksandar.kostic@joslin.harvard.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:29228186
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marathe, Aniruddha P.; Harris, Rachel A.; Lowenthal, David K.
The use of clouds to execute high-performance computing (HPC) applications has greatly increased recently. Clouds provide several potential advantages over traditional supercomputers and in-house clusters. The most popular cloud is currently Amazon EC2, which provides fixed-cost and variable-cost, auction-based options. The auction market trades lower cost for potential interruptions that necessitate checkpointing; if the market price exceeds the bid price, a node is taken away from the user without warning. We explore techniques to maximize performance per dollar given a time constraint within which an application must complete. Specifically, we design and implement multiple techniques to reduce expected cost bymore » exploiting redundancy in the EC2 auction market. We then design an adaptive algorithm that selects a scheduling algorithm and determines the bid price. We show that our adaptive algorithm executes programs up to seven times cheaper than using the on-demand market and up to 44 percent cheaper than the best non-redundant, auction-market algorithm. We extend our adaptive algorithm to incorporate application scalability characteristics for further cost savings. In conclusion, we show that the adaptive algorithm informed with scalability characteristics of applications achieves up to 56 percent cost savings compared to the expected cost for the base adaptive algorithm run at a fixed, user-defined scale.« less
OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.
2016-12-01
The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets.Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: Parallel generation (Spark on a compute cluster) of 15 to 30-year Ocean Climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; Parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; Parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; Shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; Scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
Attribute based encryption for secure sharing of E-health data
NASA Astrophysics Data System (ADS)
Charanya, R.; Nithya, S.; Manikandan, N.
2017-11-01
Distributed computing is one of the developing innovations in IT part and information security assumes a real part. It includes sending gathering of remote server and programming that permit the unified information and online access to PC administrations. Distributed computing depends on offering of asset among different clients are additionally progressively reallocated on interest. Cloud computing is a revolutionary computing paradigm which enables flexible, on-demand and low-cost usage of computing resources. The reasons for security and protection issues, which rise on the grounds that the health information possessed by distinctive clients are put away in some cloud servers rather than under their own particular control”z. To deal with security problems, various schemes based on the Attribute-Based Encryption have been proposed. In this paper, in order to make ehealth data’s more secure we use multi party in cloud computing system. Where the health data is encrypted using attributes and key policy. And the user with a particular attribute and key policy alone will be able to decrypt the health data after it is verified by “key distribution centre” and the “secure data distributor”. This technique can be used in medical field for secure storage of patient details and limiting to particular doctor access. To make data’s scalable secure we need to encrypt the health data before outsourcing.
Molnár-Gábor, Fruzsina; Lueck, Rupert; Yakneen, Sergei; Korbel, Jan O
2017-06-20
Biomedical research is becoming increasingly large-scale and international. Cloud computing enables the comprehensive integration of genomic and clinical data, and the global sharing and collaborative processing of these data within a flexibly scalable infrastructure. Clouds offer novel research opportunities in genomics, as they facilitate cohort studies to be carried out at unprecedented scale, and they enable computer processing with superior pace and throughput, allowing researchers to address questions that could not be addressed by studies using limited cohorts. A well-developed example of such research is the Pan-Cancer Analysis of Whole Genomes project, which involves the analysis of petabyte-scale genomic datasets from research centers in different locations or countries and different jurisdictions. Aside from the tremendous opportunities, there are also concerns regarding the utilization of clouds; these concerns pertain to perceived limitations in data security and protection, and the need for due consideration of the rights of patient donors and research participants. Furthermore, the increased outsourcing of information technology impedes the ability of researchers to act within the realm of existing local regulations owing to fundamental differences in the understanding of the right to data protection in various legal systems. In this Opinion article, we address the current opportunities and limitations of cloud computing and highlight the responsible use of federated and hybrid clouds that are set up between public and private partners as an adequate solution for genetics and genomics research in Europe, and under certain conditions between Europe and international partners. This approach could represent a sensible middle ground between fragmented individual solutions and a "one-size-fits-all" approach.
NASA Technical Reports Server (NTRS)
Mohr, Karen Irene; Tao, Wei-Kuo; Chern, Jiun-Dar; Kumar, Sujay V.; Peters-Lidard, Christa D.
2013-01-01
The present generation of general circulation models (GCM) use parameterized cumulus schemes and run at hydrostatic grid resolutions. To improve the representation of cloud-scale moist processes and landeatmosphere interactions, a global, Multi-scale Modeling Framework (MMF) coupled to the Land Information System (LIS) has been developed at NASA-Goddard Space Flight Center. The MMFeLIS has three components, a finite-volume (fv) GCM (Goddard Earth Observing System Ver. 4, GEOS-4), a 2D cloud-resolving model (Goddard Cumulus Ensemble, GCE), and the LIS, representing the large-scale atmospheric circulation, cloud processes, and land surface processes, respectively. The non-hydrostatic GCE model replaces the single-column cumulus parameterization of fvGCM. The model grid is composed of an array of fvGCM gridcells each with a series of embedded GCE models. A horizontal coupling strategy, GCE4fvGCM4Coupler4LIS, offered significant computational efficiency, with the scalability and I/O capabilities of LIS permitting landeatmosphere interactions at cloud-scale. Global simulations of 2007e2008 and comparisons to observations and reanalysis products were conducted. Using two different versions of the same land surface model but the same initial conditions, divergence in regional, synoptic-scale surface pressure patterns emerged within two weeks. The sensitivity of largescale circulations to land surface model physics revealed significant functional value to using a scalable, multi-model land surface modeling system in global weather and climate prediction.
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud
Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew
2015-01-01
Background Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation. PMID:26501966
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.
Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew
2015-01-01
Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
Energy-Aware Computation Offloading of IoT Sensors in Cloudlet-Based Mobile Edge Computing.
Ma, Xiao; Lin, Chuang; Zhang, Han; Liu, Jianwei
2018-06-15
Mobile edge computing is proposed as a promising computing paradigm to relieve the excessive burden of data centers and mobile networks, which is induced by the rapid growth of Internet of Things (IoT). This work introduces the cloud-assisted multi-cloudlet framework to provision scalable services in cloudlet-based mobile edge computing. Due to the constrained computation resources of cloudlets and limited communication resources of wireless access points (APs), IoT sensors with identical computation offloading decisions interact with each other. To optimize the processing delay and energy consumption of computation tasks, theoretic analysis of the computation offloading decision problem of IoT sensors is presented in this paper. In more detail, the computation offloading decision problem of IoT sensors is formulated as a computation offloading game and the condition of Nash equilibrium is derived by introducing the tool of a potential game. By exploiting the finite improvement property of the game, the Computation Offloading Decision (COD) algorithm is designed to provide decentralized computation offloading strategies for IoT sensors. Simulation results demonstrate that the COD algorithm can significantly reduce the system cost compared with the random-selection algorithm and the cloud-first algorithm. Furthermore, the COD algorithm can scale well with increasing IoT sensors.
Chen, Shang-Liang; Chen, Yun-Yao; Hsu, Chiang
2014-01-01
Cloud computing is changing the ways software is developed and managed in enterprises, which is changing the way of doing business in that dynamically scalable and virtualized resources are regarded as services over the Internet. Traditional manufacturing systems such as supply chain management (SCM), customer relationship management (CRM), and enterprise resource planning (ERP) are often developed case by case. However, effective collaboration between different systems, platforms, programming languages, and interfaces has been suggested by researchers. In cloud-computing-based systems, distributed resources are encapsulated into cloud services and centrally managed, which allows high automation, flexibility, fast provision, and ease of integration at low cost. The integration between physical resources and cloud services can be improved by combining Internet of things (IoT) technology and Software-as-a-Service (SaaS) technology. This study proposes a new approach for developing cloud-based manufacturing systems based on a four-layer SaaS model. There are three main contributions of this paper: (1) enterprises can develop their own cloud-based logistic management information systems based on the approach proposed in this paper; (2) a case study based on literature reviews with experimental results is proposed to verify that the system performance is remarkable; (3) challenges encountered and feedback collected from T Company in the case study are discussed in this paper for the purpose of enterprise deployment. PMID:24686728
Chen, Shang-Liang; Chen, Yun-Yao; Hsu, Chiang
2014-03-28
Cloud computing is changing the ways software is developed and managed in enterprises, which is changing the way of doing business in that dynamically scalable and virtualized resources are regarded as services over the Internet. Traditional manufacturing systems such as supply chain management (SCM), customer relationship management (CRM), and enterprise resource planning (ERP) are often developed case by case. However, effective collaboration between different systems, platforms, programming languages, and interfaces has been suggested by researchers. In cloud-computing-based systems, distributed resources are encapsulated into cloud services and centrally managed, which allows high automation, flexibility, fast provision, and ease of integration at low cost. The integration between physical resources and cloud services can be improved by combining Internet of things (IoT) technology and Software-as-a-Service (SaaS) technology. This study proposes a new approach for developing cloud-based manufacturing systems based on a four-layer SaaS model. There are three main contributions of this paper: (1) enterprises can develop their own cloud-based logistic management information systems based on the approach proposed in this paper; (2) a case study based on literature reviews with experimental results is proposed to verify that the system performance is remarkable; (3) challenges encountered and feedback collected from T Company in the case study are discussed in this paper for the purpose of enterprise deployment.
Yu, Jia; Blom, Jochen; Sczyrba, Alexander; Goesmann, Alexander
2017-09-10
The introduction of next generation sequencing has caused a steady increase in the amounts of data that have to be processed in modern life science. Sequence alignment plays a key role in the analysis of sequencing data e.g. within whole genome sequencing or metagenome projects. BLAST is a commonly used alignment tool that was the standard approach for more than two decades, but in the last years faster alternatives have been proposed including RapSearch, GHOSTX, and DIAMOND. Here we introduce HAMOND, an application that uses Apache Hadoop to parallelize DIAMOND computation in order to scale-out the calculation of alignments. HAMOND is fault tolerant and scalable by utilizing large cloud computing infrastructures like Amazon Web Services. HAMOND has been tested in comparative genomics analyses and showed promising results both in efficiency and accuracy. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.
STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud
Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.
2014-01-01
The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756
Openwebglobe 2: Visualization of Complex 3D-GEODATA in the (mobile) Webbrowser
NASA Astrophysics Data System (ADS)
Christen, M.
2016-06-01
Providing worldwide high resolution data for virtual globes consists of compute and storage intense tasks for processing data. Furthermore, rendering complex 3D-Geodata, such as 3D-City models with an extremely high polygon count and a vast amount of textures at interactive framerates is still a very challenging task, especially on mobile devices. This paper presents an approach for processing, caching and serving massive geospatial data in a cloud-based environment for large scale, out-of-core, highly scalable 3D scene rendering on a web based virtual globe. Cloud computing is used for processing large amounts of geospatial data and also for providing 2D and 3D map data to a large amount of (mobile) web clients. In this paper the approach for processing, rendering and caching very large datasets in the currently developed virtual globe "OpenWebGlobe 2" is shown, which displays 3D-Geodata on nearly every device.
Universal blind quantum computation for hybrid system
NASA Astrophysics Data System (ADS)
Huang, He-Liang; Bao, Wan-Su; Li, Tan; Li, Feng-Guang; Fu, Xiang-Qun; Zhang, Shuo; Zhang, Hai-Long; Wang, Xiang
2017-08-01
As progress on the development of building quantum computer continues to advance, first-generation practical quantum computers will be available for ordinary users in the cloud style similar to IBM's Quantum Experience nowadays. Clients can remotely access the quantum servers using some simple devices. In such a situation, it is of prime importance to keep the security of the client's information. Blind quantum computation protocols enable a client with limited quantum technology to delegate her quantum computation to a quantum server without leaking any privacy. To date, blind quantum computation has been considered only for an individual quantum system. However, practical universal quantum computer is likely to be a hybrid system. Here, we take the first step to construct a framework of blind quantum computation for the hybrid system, which provides a more feasible way for scalable blind quantum computation.
Exploiting Parallel R in the Cloud with SPRINT
Piotrowski, M.; McGilvary, G.A.; Sloan, T. M.; Mewissen, M.; Lloyd, A.D.; Forster, T.; Mitchell, L.; Ghazal, P.; Hill, J.
2012-01-01
Background Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Objectives Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon’s Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world and, if resource underutilization can improve application performance. Methods The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various size on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. Results It is possible to obtain good, scalable performance but the level of improvement is dependent upon the nature of algorithm. Resource underutilization can further improve the time to result. End-user’s location impacts on costs due to factors such as local taxation. Conclusions: Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provides an interesting alternative and provides new possibilities for smaller organisations with limited funds. PMID:23223611
Exploiting parallel R in the cloud with SPRINT.
Piotrowski, M; McGilvary, G A; Sloan, T M; Mewissen, M; Lloyd, A D; Forster, T; Mitchell, L; Ghazal, P; Hill, J
2013-01-01
Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world and, if resource underutilization can improve application performance. The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various size on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. It is possible to obtain good, scalable performance but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. End-user's location impacts on costs due to factors such as local taxation. Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provides an interesting alternative and provides new possibilities for smaller organisations with limited funds.
SeqWare Query Engine: storing and searching sequence data in the cloud.
O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F
2010-12-21
Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud
2010-01-01
Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark.
Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung
2016-05-01
Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today's data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact.
Mission critical cloud computing in a week
NASA Astrophysics Data System (ADS)
George, B.; Shams, K.; Knight, D.; Kinney, J.
NASA's vision is to “ reach for new heights and reveal the unknown so that what we do and learn will benefit all humankind.” While our missions provide large volumes of unique and invaluable data to the scientific community, they also serve to inspire and educate the next generation of engineers and scientists. One critical aspect of “ benefiting all humankind” is to make our missions as visible and accessible as possible to facilitate the transfer of scientific knowledge to the public. The recent successful landing of the Curiosity rover on Mars exemplified this vision: we shared the landing event via live video streaming and web experiences with millions of people around the world. The video stream on Curiosity's website was delivered by a highly scalable stack of computing resources in the cloud to cache and distribute the video stream to our viewers. While this work was done in the context of public outreach, it has extensive implications for the development of mission critical, highly available, and elastic applications in the cloud for a diverse set of use cases across NASA.
High performance data transfer
NASA Astrophysics Data System (ADS)
Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.
2017-10-01
The exponentially increasing need for high speed data transfer is driven by big data, and cloud computing together with the needs of data intensive science, High Performance Computing (HPC), defense, the oil and gas industry etc. We report on the Zettar ZX software. This has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a scalable, balanced, easy to deploy and use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the- shelf components to test the ZX scalability and ability to balance services using multiple cores, and links. The PoCs are based on SSD flash storage that is managed by a parallel file system. Each cluster occupies 4 rack units. Using the PoCs, between clusters we have achieved almost 200Gbps memory to memory over two 100Gbps links, and 70Gbps parallel file to parallel file with encryption over a 5000 mile 100Gbps link.
Scalable Machine Learning for Massive Astronomical Datasets
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Gray, A.
2014-04-01
We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
Scalable Machine Learning for Massive Astronomical Datasets
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.; Astronomy Data Centre, Canadian
2014-01-01
We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors, and the local outlier factor. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex datasets that wishes to extract the full scientific value from its data.
MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.
Expósito, Roberto R; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan
2017-09-01
This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool. Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es . rreye@udc.es. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Blind topological measurement-based quantum computation.
Morimae, Tomoyuki; Fujii, Keisuke
2012-01-01
Blind quantum computation is a novel secure quantum-computing protocol that enables Alice, who does not have sufficient quantum technology at her disposal, to delegate her quantum computation to Bob, who has a fully fledged quantum computer, in such a way that Bob cannot learn anything about Alice's input, output and algorithm. A recent proof-of-principle experiment demonstrating blind quantum computation in an optical system has raised new challenges regarding the scalability of blind quantum computation in realistic noisy conditions. Here we show that fault-tolerant blind quantum computation is possible in a topologically protected manner using the Raussendorf-Harrington-Goyal scheme. The error threshold of our scheme is 4.3 × 10(-3), which is comparable to that (7.5 × 10(-3)) of non-blind topological quantum computation. As the error per gate of the order 10(-3) was already achieved in some experimental systems, our result implies that secure cloud quantum computation is within reach.
Blind topological measurement-based quantum computation
NASA Astrophysics Data System (ADS)
Morimae, Tomoyuki; Fujii, Keisuke
2012-09-01
Blind quantum computation is a novel secure quantum-computing protocol that enables Alice, who does not have sufficient quantum technology at her disposal, to delegate her quantum computation to Bob, who has a fully fledged quantum computer, in such a way that Bob cannot learn anything about Alice's input, output and algorithm. A recent proof-of-principle experiment demonstrating blind quantum computation in an optical system has raised new challenges regarding the scalability of blind quantum computation in realistic noisy conditions. Here we show that fault-tolerant blind quantum computation is possible in a topologically protected manner using the Raussendorf-Harrington-Goyal scheme. The error threshold of our scheme is 4.3×10-3, which is comparable to that (7.5×10-3) of non-blind topological quantum computation. As the error per gate of the order 10-3 was already achieved in some experimental systems, our result implies that secure cloud quantum computation is within reach.
Virtual Geophysics Laboratory: Exploiting the Cloud and Empowering Geophysicsts
NASA Astrophysics Data System (ADS)
Fraser, Ryan; Vote, Josh; Goh, Richard; Cox, Simon
2013-04-01
Over the last five decades geoscientists from Australian state and federal agencies have collected and assembled around 3 Petabytes of geoscience data sets under public funding. As a consequence of technological progress, data is now being acquired at exponential rates and in higher resolution than ever before. Effective use of these big data sets challenges the storage and computational infrastructure of most organizations. The Virtual Geophysics Laboratory (VGL) is a scientific workflow portal addresses some of the resulting issues by providing Australian geophysicists with access to a Web 2.0 or Rich Internet Application (RIA) based integrated environment that exploits eResearch tools and Cloud computing technology, and promotes collaboration between the user community. VGL simplifies and automates large portions of what were previously manually intensive scientific workflow processes, allowing scientists to focus on the natural science problems, rather than computer science and IT. A number of geophysical processing codes are incorporated to support multiple workflows. For example a gravity inversion can be performed by combining the Escript/Finley codes (from the University of Queensland) with the gravity data registered in VGL. Likewise, tectonic processes can also be modeled by combining the Underworld code (from Monash University) with one of the various 3D models available to VGL. Cloud services provide scalable and cost effective compute resources. VGL is built on top of mature standards-compliant information services, many deployed using the Spatial Information Services Stack (SISS), which provides direct access to geophysical data. A large number of data sets from Geoscience Australia assist users in data discovery. GeoNetwork provides a metadata catalog to store workflow results for future use, discovery and provenance tracking. VGL has been developed in collaboration with the research community using incremental software development practices and open source tools. While developed to provide the geophysics research community with a sustainable platform and scalable infrastructure; VGL has also developed a number of concepts, patterns and generic components of which have been reused for cases beyond geophysics, including natural hazards, satellite processing and other areas requiring spatial data discovery and processing. Future plans for VGL include a number of improvements in both functional and non-functional areas in response to its user community needs and advancement in information technologies. In particular, research is underway in the following areas (a) distributed and parallel workflow processing in the cloud, (b) seamless integration with various cloud providers, and (c) integration with virtual laboratories representing other science domains. Acknowledgements: VGL was developed by CSIRO in collaboration with Geoscience Australia, National Computational Infrastructure, Australia National University, Monash University and University of Queensland, and has been supported by the Australian Government's Education Investment Funds through NeCTAR.
Pinheiro, Alexandre; Dias Canedo, Edna; de Sousa Junior, Rafael Timoteo; de Oliveira Albuquerque, Robson; García Villalba, Luis Javier; Kim, Tai-Hoon
2018-03-02
Cloud computing is considered an interesting paradigm due to its scalability, availability and virtually unlimited storage capacity. However, it is challenging to organize a cloud storage service (CSS) that is safe from the client point-of-view and to implement this CSS in public clouds since it is not advisable to blindly consider this configuration as fully trustworthy. Ideally, owners of large amounts of data should trust their data to be in the cloud for a long period of time, without the burden of keeping copies of the original data, nor of accessing the whole content for verifications regarding data preservation. Due to these requirements, integrity, availability, privacy and trust are still challenging issues for the adoption of cloud storage services, especially when losing or leaking information can bring significant damage, be it legal or business-related. With such concerns in mind, this paper proposes an architecture for periodically monitoring both the information stored in the cloud and the service provider behavior. The architecture operates with a proposed protocol based on trust and encryption concepts to ensure cloud data integrity without compromising confidentiality and without overloading storage services. Extensive tests and simulations of the proposed architecture and protocol validate their functional behavior and performance.
2018-01-01
Cloud computing is considered an interesting paradigm due to its scalability, availability and virtually unlimited storage capacity. However, it is challenging to organize a cloud storage service (CSS) that is safe from the client point-of-view and to implement this CSS in public clouds since it is not advisable to blindly consider this configuration as fully trustworthy. Ideally, owners of large amounts of data should trust their data to be in the cloud for a long period of time, without the burden of keeping copies of the original data, nor of accessing the whole content for verifications regarding data preservation. Due to these requirements, integrity, availability, privacy and trust are still challenging issues for the adoption of cloud storage services, especially when losing or leaking information can bring significant damage, be it legal or business-related. With such concerns in mind, this paper proposes an architecture for periodically monitoring both the information stored in the cloud and the service provider behavior. The architecture operates with a proposed protocol based on trust and encryption concepts to ensure cloud data integrity without compromising confidentiality and without overloading storage services. Extensive tests and simulations of the proposed architecture and protocol validate their functional behavior and performance. PMID:29498641
Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes
Shringarpure, Suyash S.; Carroll, Andrew; De La Vega, Francisco M.; Bustamante, Carlos D.
2015-01-01
Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. PMID:26110529
Applying a cloud computing approach to storage architectures for spacecraft
NASA Astrophysics Data System (ADS)
Baldor, Sue A.; Quiroz, Carlos; Wood, Paul
As sensor technologies, processor speeds, and memory densities increase, spacecraft command, control, processing, and data storage systems have grown in complexity to take advantage of these improvements and expand the possible missions of spacecraft. Spacecraft systems engineers are increasingly looking for novel ways to address this growth in complexity and mitigate associated risks. Looking to conventional computing, many solutions have been executed to solve both the problem of complexity and heterogeneity in systems. In particular, the cloud-based paradigm provides a solution for distributing applications and storage capabilities across multiple platforms. In this paper, we propose utilizing a cloud-like architecture to provide a scalable mechanism for providing mass storage in spacecraft networks that can be reused on multiple spacecraft systems. By presenting a consistent interface to applications and devices that request data to be stored, complex systems designed by multiple organizations may be more readily integrated. Behind the abstraction, the cloud storage capability would manage wear-leveling, power consumption, and other attributes related to the physical memory devices, critical components in any mass storage solution for spacecraft. Our approach employs SpaceWire networks and SpaceWire-capable devices, although the concept could easily be extended to non-heterogeneous networks consisting of multiple spacecraft and potentially the ground segment.
Reducing Time to Science: Unidata and JupyterHub Technology Using the Jetstream Cloud
NASA Astrophysics Data System (ADS)
Chastang, J.; Signell, R. P.; Fischer, J. L.
2017-12-01
Cloud computing can accelerate scientific workflows, discovery, and collaborations by reducing research and data friction. We describe the deployment of Unidata and JupyterHub technologies on the NSF-funded XSEDE Jetstream cloud. With the aid of virtual machines and Docker technology, we deploy a Unidata JupyterHub server co-located with a Local Data Manager (LDM), THREDDS data server (TDS), and RAMADDA geoscience content management system. We provide Jupyter Notebooks and the pre-built Python environments needed to run them. The notebooks can be used for instruction and as templates for scientific experimentation and discovery. We also supply a large quantity of NCEP forecast model results to allow data-proximate analysis and visualization. In addition, users can transfer data using Globus command line tools, and perform their own data-proximate analysis and visualization with Notebook technology. These data can be shared with others via a dedicated TDS server for scientific distribution and collaboration. There are many benefits of this approach. Not only is the cloud computing environment fast, reliable and scalable, but scientists can analyze, visualize, and share data using only their web browser. No local specialized desktop software or a fast internet connection is required. This environment will enable scientists to spend less time managing their software and more time doing science.
NASA Astrophysics Data System (ADS)
Perez, G. L.; Larour, E. Y.; Halkides, D. J.; Cheng, D. L. C.
2015-12-01
The Virtual Ice Sheet Laboratory(VISL) is a Cryosphere outreach effort byscientists at the Jet Propulsion Laboratory(JPL) in Pasadena, CA, Earth and SpaceResearch(ESR) in Seattle, WA, and the University of California at Irvine (UCI), with the goal of providing interactive lessons for K-12 and college level students,while conforming to STEM guidelines. At the core of VISL is the Ice Sheet System Model(ISSM), an open-source project developed jointlyat JPL and UCI whose main purpose is to model the evolution of the polar ice caps in Greenland and Antarctica. By using ISSM, VISL students have access tostate-of-the-art modeling software that is being used to conduct scientificresearch by users all over the world. However, providing this functionality isby no means simple. The modeling of ice sheets in response to sea and atmospheric temperatures, among many other possible parameters, requiressignificant computational resources. Furthermore, this service needs to beresponsive and capable of handling burst requests produced by classrooms ofstudents. Cloud computing providers represent a burgeoning industry. With majorinvestments by tech giants like Amazon, Google and Microsoft, it has never beeneasier or more affordable to deploy computational elements on-demand. This isexactly what VISL needs and ISSM is capable of. Moreover, this is a promisingalternative to investing in expensive and rapidly devaluing hardware.
NASA Astrophysics Data System (ADS)
Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.
2017-12-01
Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role on SAR data availability and dissemination all over the World. Indeed, the free and open access data policy adopted by the European Copernicus program together with the global coverage acquisition strategy, make the Sentinel constellation as a game changer in the Earth Observation scenario. Being the SAR data become ubiquitous, the technological and scientific challenge is focused on maximizing the exploitation of such huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as the Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in efficient, automatic and systematic way. Such a DInSAR chain ingests Sentinel 1 SLC images and carries out several processing steps, to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform and a thorough analysis of the attained parallel performances has been performed to identify and overcome the major bottlenecks to the scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits. Such an experiment confirms the big advantage of exploiting large computational and storage resources of Cloud Computing platforms for large scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a precious tool in the perspective of developing operational services disposable for the EO scientific community related to hazard monitoring and risk prevention and mitigation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malik, Saif Ur Rehman; Khan, Samee U.; Ewen, Sam J.
2015-03-14
As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data was created on a daily basis that originated from myriad of sources and applications including mobiledevices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such ‘Data Explosions’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicated, filter, and the like) such large amountmore » of data and identify new ways to analyze large amounts of data for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems as it leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet Every Cloud Service Provider must assure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. In the aforementioned perspective, data replication is used in the cloud for improving the performance (e.g., read and write delay) of applications that access data. Through replication a data intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that are developed by both industrial and research communities. The focus of the survey is to discuss and characterize the existing approaches of data replication and management that tackle the resource usage and QoS provisioning with different levels of efficiencies. Moreover, the breakdown of both influential expressions (data replication and management) to provide different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in the cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.« less
Evaluation of the Huawei UDS cloud storage system for CERN specific data
NASA Astrophysics Data System (ADS)
Zotes Resines, M.; Heikkila, S. S.; Duellmann, D.; Adde, G.; Toebbicke, R.; Hughes, J.; Wang, L.
2014-06-01
Cloud storage is an emerging architecture aiming to provide increased scalability and access performance, compared to more traditional solutions. CERN is evaluating this promise using Huawei UDS and OpenStack SWIFT storage deployments, focusing on the needs of high-energy physics. Both deployed setups implement S3, one of the protocols that are emerging as a standard in the cloud storage market. A set of client machines is used to generate I/O load patterns to evaluate the storage system performance. The presented read and write test results indicate scalability both in metadata and data perspectives. Futher the Huawei UDS cloud storage is shown to be able to recover from a major failure of losing 16 disks. Both cloud storages are finally demonstrated to function as back-end storage systems to a filesystem, which is used to deliver high energy physics software.
Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing
NASA Astrophysics Data System (ADS)
Chen, A.; Pham, L.; Kempler, S.; Theobald, M.; Esfandiari, A.; Campino, J.; Vollmer, B.; Lynnes, C.
2011-12-01
Cloud Computing technology has been used to offer high-performance and low-cost computing and storage resources for both scientific problems and business services. Several cloud computing services have been implemented in the commercial arena, e.g. Amazon's EC2 & S3, Microsoft's Azure, and Google App Engine. There are also some research and application programs being launched in academia and governments to utilize Cloud Computing. NASA launched the Nebula Cloud Computing platform in 2008, which is an Infrastructure as a Service (IaaS) to deliver on-demand distributed virtual computers. Nebula users can receive required computing resources as a fully outsourced service. NASA Goddard Earth Science Data and Information Service Center (GES DISC) migrated several GES DISC's applications to the Nebula as a proof of concept, including: a) The Simple, Scalable, Script-based Science Processor for Measurements (S4PM) for processing scientific data; b) the Atmospheric Infrared Sounder (AIRS) data process workflow for processing AIRS raw data; and c) the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) for online access to, analysis, and visualization of Earth science data. This work aims to evaluate the practicability and adaptability of the Nebula. The initial work focused on the AIRS data process workflow to evaluate the Nebula. The AIRS data process workflow consists of a series of algorithms being used to process raw AIRS level 0 data and output AIRS level 2 geophysical retrievals. Migrating the entire workflow to the Nebula platform is challenging, but practicable. After installing several supporting libraries and the processing code itself, the workflow is able to process AIRS data in a similar fashion to its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS level 0 data through level 2 using a Nebula virtual computer and a local Linux computer. The result shows that Nebula has significantly better performance than the local machine. Much of the difference was due to newer equipment in the Nebula than the legacy computer, which is suggestive of a potential economic advantage beyond elastic power, i.e., access to up-to-date hardware vs. legacy hardware that must be maintained past its prime to amortize the cost. In addition to a trade study of advantages and challenges of porting complex processing to the cloud, a tutorial was developed to enable further progress in utilizing the Nebula for Earth Science applications and understanding better the potential for Cloud Computing in further data- and computing-intensive Earth Science research. In particular, highly bursty computing such as that experienced in the user-demand-driven Giovanni system may become more tractable in a Cloud environment. Our future work will continue to focus on migrating more GES DISC's applications/instances, e.g. Giovanni instances, to the Nebula platform and making matured migrated applications to be in operation on the Nebula.
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
Gulzar, Muhammad Ali; Interlandi, Matteo; Yoo, Seunghyun; Tetali, Sai Deep; Condie, Tyson; Millstein, Todd; Kim, Miryung
2016-01-01
Developers use cloud computing platforms to process a large quantity of data in parallel when developing big data analytics. Debugging the massive parallel computations that run in today’s data-centers is time consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next generation data-intensive scalable cloud computing platform. This requires re-thinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay and naively inspecting millions of records using a watchpoint is too time consuming for an end user. First, BIGDEBUG’s simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can also pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BIGDEBUG scales to terabytes and its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time saving compared to the baseline replay debugger. The results show that BIGDEBUG supports debugging at interactive speeds with minimal performance impact. PMID:27390389
Survey of MapReduce frame operation in bioinformatics.
Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke
2014-07-01
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.
Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
2012-03-15
Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
Bioinformatics on the cloud computing platform Azure.
Shanahan, Hugh P; Owen, Anne M; Harrison, Andrew P
2014-01-01
We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development.
Bioinformatics on the Cloud Computing Platform Azure
Shanahan, Hugh P.; Owen, Anne M.; Harrison, Andrew P.
2014-01-01
We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811
Dynamic federation of grid and cloud storage
NASA Astrophysics Data System (ADS)
Furano, Fabrizio; Keeble, Oliver; Field, Laurence
2016-09-01
The Dynamic Federations project ("Dynafed") enables the deployment of scalable, distributed storage systems composed of independent storage endpoints. While the Uniform Generic Redirector at the heart of the project is protocol-agnostic, we have focused our effort on HTTP-based protocols, including S3 and WebDAV. The system has been deployed on testbeds covering the majority of the ATLAS and LHCb data, and supports geography-aware replica selection. The work done exploits the federation potential of HTTP to build systems that offer uniform, scalable, catalogue-less access to the storage and metadata ensemble and the possibility of seamless integration of other compatible resources such as those from cloud providers. Dynafed can exploit the potential of the S3 delegation scheme, effectively federating on the fly any number of S3 buckets from different providers and applying a uniform authorization to them. This feature has been used to deploy in production the BOINC Data Bridge, which uses the Uniform Generic Redirector with S3 buckets to harmonize the BOINC authorization scheme with the Grid/X509. The Data Bridge has been deployed in production with good results. We believe that the features of a loosely coupled federation of open-protocolbased storage elements open many possibilities of smoothly evolving the current computing models and of supporting new scientific computing projects that rely on massive distribution of data and that would appreciate systems that can more easily be interfaced with commercial providers and can work natively with Web browsers and clients.
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of a layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
TomoMiner and TomoMinerCloud: A software platform for large-scale subtomogram structural analysis
Frazier, Zachary; Xu, Min; Alber, Frank
2017-01-01
SUMMARY Cryo-electron tomography (cryoET) captures the 3D electron density distribution of macromolecular complexes in close to native state. With the rapid advance of cryoET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample due to variations in structure and composition of a complex in situ form or because particles are a mixture of different complexes. In this case subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. Additionally, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiners high-performance features. PMID:28552576
Using Clouds for MapReduce Measurement Assignments
ERIC Educational Resources Information Center
Rabkin, Ariel; Reiss, Charles; Katz, Randy; Patterson, David
2013-01-01
We describe our experiences teaching MapReduce in a large undergraduate lecture course using public cloud services and the standard Hadoop API. Using the standard API, students directly experienced the quality of industrial big-data tools. Using the cloud, every student could carry out scalability benchmarking assignments on realistic hardware,…
NASA Astrophysics Data System (ADS)
Schnase, J. L.; Duffy, D.; Tamkin, G. S.; Nadeau, D.; Thompson, J. H.; Grieg, C. M.; McInerney, M.; Webster, W. P.
2013-12-01
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produce societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS built on this principle. MERRA/AS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRA/AS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications. In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilies is perhaps the most important link between Cloud Computing and Big Data.
NASA Technical Reports Server (NTRS)
Schnase, John L.; Duffy, Daniel Quinn; Tamkin, Glenn S.; Nadeau, Denis; Thompson, John H.; Grieg, Christina M.; McInerney, Mark A.; Webster, William P.
2014-01-01
Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics, because it is the knowledge gained from our interactions with Big Data that ultimately produce societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we it see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because in the aggregate they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain. MERRA Analytic Services (MERRAAS) is an example of cloud-enabled CAaaS built on this principle. MERRAAS enables MapReduce analytics over NASAs Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and a wide range of decision support applications. MERRAAS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRAAS has been demonstrated in several applications. In our experience, Cloud Computing lowers the barriers and risk to organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers' increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. For climate science, Cloud Computing's capacity to engage communities in the construction of new capabilies is perhaps the most important link between Cloud Computing and Big Data.
The EPOS Vision for the Open Science Cloud
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Harrison, Matt; Cocco, Massimo
2016-04-01
Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time and so CLOUD computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of Cloud Computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications to allow CLOUD middleware to optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use. Already discussions are underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS.. Provision of access to ICS-Ds from ICS-C concerns several aspects: (a) Technical : it may be more or less difficult to connect and pass from ICS-C to the ICS-d/ CES the 'package' (probably a virtual machine) of data and software; (b) Security/privacy : including passing personal information e.g. related to AAAI (Authentication, authorization, accounting Infrastructure); (c) financial and legal : such as payment, licence conditions; Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of federated approach to HPC(High Performance Computing) -HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.
A distributed cloud-based cyberinfrastructure framework for integrated bridge monitoring
NASA Astrophysics Data System (ADS)
Jeong, Seongwoon; Hou, Rui; Lynch, Jerome P.; Sohn, Hoon; Law, Kincho H.
2017-04-01
This paper describes a cloud-based cyberinfrastructure framework for the management of the diverse data involved in bridge monitoring. Bridge monitoring involves various hardware systems, software tools and laborious activities that include, for examples, a structural health monitoring (SHM), sensor network, engineering analysis programs and visual inspection. Very often, these monitoring systems, tools and activities are not coordinated, and the collected information are not shared. A well-designed integrated data management framework can support the effective use of the data and, thereby, enhance bridge management and maintenance operations. The cloud-based cyberinfrastructure framework presented herein is designed to manage not only sensor measurement data acquired from the SHM system, but also other relevant information, such as bridge engineering model and traffic videos, in an integrated manner. For the scalability and flexibility, cloud computing services and distributed database systems are employed. The information stored can be accessed through standard web interfaces. For demonstration, the cyberinfrastructure system is implemented for the monitoring of the bridges located along the I-275 Corridor in the state of Michigan.
Liebeskind, David S
2016-01-01
Crowdsourcing, an unorthodox approach in medicine, creates an unusual paradigm to study precision cerebrovascular health, eliminating the relative isolation and non-standardized nature of current imaging data infrastructure, while shifting emphasis to the astounding capacity of big data in the cloud. This perspective envisions the use of imaging data of the brain and vessels to orient and seed A Million Brains Initiative™ that may leapfrog incremental advances in stroke and rapidly provide useful data to the sizable population around the globe prone to the devastating effects of stroke and vascular substrates of dementia. Despite such variability in the type of data available and other limitations, the data hierarchy logically starts with imaging and can be enriched with almost endless types and amounts of other clinical and biological data. Crowdsourcing allows an individual to contribute to aggregated data on a population, while preserving their right to specific information about their own brain health. The cloud now offers endless storage, computing prowess, and neuroimaging applications for postprocessing that is searchable and scalable. Collective expertise is a windfall of the crowd in the cloud and particularly valuable in an area such as cerebrovascular health. The rise of precision medicine, rapidly evolving technological capabilities of cloud computing and the global imperative to limit the public health impact of cerebrovascular disease converge in the imaging of A Million Brains Initiative™. Crowdsourcing secure data on brain health may provide ultimate generalizability, enable focused analyses, facilitate clinical practice, and accelerate research efforts.
Design Patterns to Achieve 300x Speedup for Oceanographic Analytics in the Cloud
NASA Astrophysics Data System (ADS)
Jacob, J. C.; Greguska, F. R., III; Huang, T.; Quach, N.; Wilson, B. D.
2017-12-01
We describe how we achieve super-linear speedup over standard approaches for oceanographic analytics on a cluster computer and the Amazon Web Services (AWS) cloud. NEXUS is an open source platform for big data analytics in the cloud that enables this performance through a combination of horizontally scalable data parallelism with Apache Spark and rapid data search, subset, and retrieval with tiled array storage in cloud-aware NoSQL databases like Solr and Cassandra. NEXUS is the engine behind several public portals at NASA and OceanWorks is a newly funded project for the ocean community that will mature and extend this capability for improved data discovery, subset, quality screening, analysis, matchup of satellite and in situ measurements, and visualization. We review the Python language API for Spark and how to use it to quickly convert existing programs to use Spark to run with cloud-scale parallelism, and discuss strategies to improve performance. We explain how partitioning the data over space, time, or both leads to algorithmic design patterns for Spark analytics that can be applied to many different algorithms. We use NEXUS analytics as examples, including area-averaged time series, time averaged map, and correlation map.
Millstone: software for multiplex microbial genome analysis and engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering.
Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.; ...
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
NASA Astrophysics Data System (ADS)
Arabas, S.; Jaruga, A.; Pawlowska, H.; Grabowski, W. W.
2012-12-01
Clouds may influence aerosol characteristics of their environment. The relevant processes include wet deposition (rainout or washout) and cloud condensation nuclei (CCN) recycling through evaporation of cloud droplets and drizzle drops. Recycled CCN physicochemical properties may be altered if the evaporated droplets go through collisional growth or irreversible chemical reactions (e.g. SO2 oxidation). The key challenge of representing these processes in a numerical cloud model stems from the need to track properties of activated CCN throughout the cloud lifecycle. Lack of such "memory" characterises the so-called bulk, multi-moment as well as bin representations of cloud microphysics. In this study we apply the particle-based scheme of Shima et al. 2009. Each modelled particle (aka super-droplet) is a numerical proxy for a multiplicity of real-world CCN, cloud, drizzle or rain particles of the same size, nucleus type,and position. Tracking cloud nucleus properties is an inherent feature of the particle-based frameworks, making them suitable for studying aerosol-cloud-aerosol interactions. The super-droplet scheme is furthermore characterized by linear scalability in the number of computational particles, and no numerical diffusion in the condensational and in the Monte-Carlo type collisional growth schemes. The presentation will focus on processing of aerosol by a drizzling stratocumulus deck. The simulations are carried out using a 2D kinematic framework and a VOCALS experiment inspired set-up (see http://www.rap.ucar.edu/~gthompsn/workshop2012/case1/).
Blind topological measurement-based quantum computation
Morimae, Tomoyuki; Fujii, Keisuke
2012-01-01
Blind quantum computation is a novel secure quantum-computing protocol that enables Alice, who does not have sufficient quantum technology at her disposal, to delegate her quantum computation to Bob, who has a fully fledged quantum computer, in such a way that Bob cannot learn anything about Alice's input, output and algorithm. A recent proof-of-principle experiment demonstrating blind quantum computation in an optical system has raised new challenges regarding the scalability of blind quantum computation in realistic noisy conditions. Here we show that fault-tolerant blind quantum computation is possible in a topologically protected manner using the Raussendorf–Harrington–Goyal scheme. The error threshold of our scheme is 4.3×10−3, which is comparable to that (7.5×10−3) of non-blind topological quantum computation. As the error per gate of the order 10−3 was already achieved in some experimental systems, our result implies that secure cloud quantum computation is within reach. PMID:22948818
NASA Astrophysics Data System (ADS)
Adedayo, Bada; Wang, Qi; Alcaraz Calero, Jose M.; Grecos, Christos
2015-02-01
The recent explosion in video-related Internet traffic has been driven by the widespread use of smart mobile devices, particularly smartphones with advanced cameras that are able to record high-quality videos. Although many of these devices offer the facility to record videos at different spatial and temporal resolutions, primarily with local storage considerations in mind, most users only ever use the highest quality settings. The vast majority of these devices are optimised for compressing the acquired video using a single built-in codec and have neither the computational resources nor battery reserves to transcode the video to alternative formats. This paper proposes a new low-complexity dynamic resource allocation engine for cloud-based video transcoding services that are both scalable and capable of being delivered in real-time. Firstly, through extensive experimentation, we establish resource requirement benchmarks for a wide range of transcoding tasks. The set of tasks investigated covers the most widely used input formats (encoder type, resolution, amount of motion and frame rate) associated with mobile devices and the most popular output formats derived from a comprehensive set of use cases, e.g. a mobile news reporter directly transmitting videos to the TV audience of various video format requirements, with minimal usage of resources both at the reporter's end and at the cloud infrastructure end for transcoding services.
Cloud archiving and data mining of High-Resolution Rapid Refresh forecast model output
NASA Astrophysics Data System (ADS)
Blaylock, Brian K.; Horel, John D.; Liston, Samuel T.
2017-12-01
Weather-related research often requires synthesizing vast amounts of data that need archival solutions that are both economical and viable during and past the lifetime of the project. Public cloud computing services (e.g., from Amazon, Microsoft, or Google) or private clouds managed by research institutions are providing object data storage systems potentially appropriate for long-term archives of such large geophysical data sets. We illustrate the use of a private cloud object store developed by the Center for High Performance Computing (CHPC) at the University of Utah. Since early 2015, we have been archiving thousands of two-dimensional gridded fields (each one containing over 1.9 million values over the contiguous United States) from the High-Resolution Rapid Refresh (HRRR) data assimilation and forecast modeling system. The archive is being used for retrospective analyses of meteorological conditions during high-impact weather events, assessing the accuracy of the HRRR forecasts, and providing initial and boundary conditions for research simulations. The archive is accessible interactively and through automated download procedures for researchers at other institutions that can be tailored by the user to extract individual two-dimensional grids from within the highly compressed files. Characteristics of the CHPC object storage system are summarized relative to network file system storage or tape storage solutions. The CHPC storage system is proving to be a scalable, reliable, extensible, affordable, and usable archive solution for our research.
xQTL workbench: a scalable web environment for multi-level QTL analysis.
Arends, Danny; van der Velde, K Joeri; Prins, Pjotr; Broman, Karl W; Möller, Steffen; Jansen, Ritsert C; Swertz, Morris A
2012-04-01
xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. m.a.swertz@rug.nl.
xQTL workbench: a scalable web environment for multi-level QTL analysis
Arends, Danny; van der Velde, K. Joeri; Prins, Pjotr; Broman, Karl W.; Möller, Steffen; Jansen, Ritsert C.; Swertz, Morris A.
2012-01-01
Summary: xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. Availability: xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. Contact: m.a.swertz@rug.nl PMID:22308096
A Cloud-based Approach to Medical NLP
Chard, Kyle; Russell, Michael; Lussier, Yves A.; Mendonça, Eneida A; Silverstein, Jonathan C.
2011-01-01
Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN. PMID:22195072
A cloud-based approach to medical NLP.
Chard, Kyle; Russell, Michael; Lussier, Yves A; Mendonça, Eneida A; Silverstein, Jonathan C
2011-01-01
Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN.
Bao, Riyue; Hernandez, Kyle; Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge
2015-01-01
Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.
NASA Astrophysics Data System (ADS)
Sisodia, Mitali; Shukla, Abhishek; Pathak, Anirban
2017-12-01
A scheme for distributed quantum measurement that allows nondestructive or indirect Bell measurement was proposed by Gupta et al [1]. In the present work, Gupta et al.'s scheme is experimentally realized using the five-qubit super-conductivity-based quantum computer, which has been recently placed in cloud by IBM Corporation. The experiment confirmed that the Bell state can be constructed and measured in a nondestructive manner with a reasonably high fidelity. A comparison of the outcomes of this study and the results obtained earlier in an NMR-based experiment (Samal et al. (2010) [10]) has also been performed. The study indicates that to make a scalable SQUID-based quantum computer, errors introduced by the gates (in the present technology) have to be reduced considerably.
NASA Astrophysics Data System (ADS)
Chee, T.; Nguyen, L.; Smith, W. L., Jr.; Spangenberg, D.; Palikonda, R.; Bedka, K. M.; Minnis, P.; Thieman, M. M.; Nordeen, M.
2017-12-01
Providing public access to research products including cloud macro and microphysical properties and satellite imagery are a key concern for the NASA Langley Research Center Cloud and Radiation Group. This work describes a web based visualization tool and API that allows end users to easily create customized cloud product and satellite imagery, ground site data and satellite ground track information that is generated dynamically. The tool has two uses, one to visualize the dynamically created imagery and the other to provide access to the dynamically generated imagery directly at a later time. Internally, we leverage our practical experience with large, scalable application practices to develop a system that has the largest potential for scalability as well as the ability to be deployed on the cloud to accommodate scalability issues. We build upon NASA Langley Cloud and Radiation Group's experience with making real-time and historical satellite cloud product information, satellite imagery, ground site data and satellite track information accessible and easily searchable. This tool is the culmination of our prior experience with dynamic imagery generation and provides a way to build a "mash-up" of dynamically generated imagery and related kinds of information that are visualized together to add value to disparate but related information. In support of NASA strategic goals, our group aims to make as much scientific knowledge, observations and products available to the citizen science, research and interested communities as well as for automated systems to acquire the same information for data mining or other analytic purposes. This tool and the underlying API's provide a valuable research tool to a wide audience both as a standalone research tool and also as an easily accessed data source that can easily be mined or used with existing tools.
Rai, Rashmi; Sahoo, Gadadhar; Mehfuz, Shabana
2015-01-01
Today, most of the organizations trust on their age old legacy applications, to support their business-critical systems. However, there are several critical concerns, as maintainability and scalability issues, associated with the legacy system. In this background, cloud services offer a more agile and cost effective platform, to support business applications and IT infrastructure. As the adoption of cloud services has been increasing recently and so has been the academic research in cloud migration. However, there is a genuine need of secondary study to further strengthen this research. The primary objective of this paper is to scientifically and systematically identify, categorize and compare the existing research work in the area of legacy to cloud migration. The paper has also endeavored to consolidate the research on Security issues, which is prime factor hindering the adoption of cloud through classifying the studies on secure cloud migration. SLR (Systematic Literature Review) of thirty selected papers, published from 2009 to 2014 was conducted to properly understand the nuances of the security framework. To categorize the selected studies, authors have proposed a conceptual model for cloud migration which has resulted in a resource base of existing solutions for cloud migration. This study concludes that cloud migration research is in seminal stage but simultaneously it is also evolving and maturing, with increasing participation from academics and industry alike. The paper also identifies the need for a secure migration model, which can fortify organization's trust into cloud migration and facilitate necessary tool support to automate the migration process.
Towards a Cloud Based Smart Traffic Management Framework
NASA Astrophysics Data System (ADS)
Rahimi, M. M.; Hakimpour, F.
2017-09-01
Traffic big data has brought many opportunities for traffic management applications. However several challenges like heterogeneity, storage, management, processing and analysis of traffic big data may hinder their efficient and real-time applications. All these challenges call for well-adapted distributed framework for smart traffic management that can efficiently handle big traffic data integration, indexing, query processing, mining and analysis. In this paper, we present a novel, distributed, scalable and efficient framework for traffic management applications. The proposed cloud computing based framework can answer technical challenges for efficient and real-time storage, management, process and analyse of traffic big data. For evaluation of the framework, we have used OpenStreetMap (OSM) real trajectories and road network on a distributed environment. Our evaluation results indicate that speed of data importing to this framework exceeds 8000 records per second when the size of datasets is near to 5 million. We also evaluate performance of data retrieval in our proposed framework. The data retrieval speed exceeds 15000 records per second when the size of datasets is near to 5 million. We have also evaluated scalability and performance of our proposed framework using parallelisation of a critical pre-analysis in transportation applications. The results show that proposed framework achieves considerable performance and efficiency in traffic management applications.
Cloudgene: A graphical execution platform for MapReduce programs on private and public clouds
2012-01-01
Background The MapReduce framework enables a scalable processing and analyzing of large datasets by distributing the computational load on connected computer nodes, referred to as a cluster. In Bioinformatics, MapReduce has already been adopted to various case scenarios such as mapping next generation sequencing data to a reference genome, finding SNPs from short read data or matching strings in genotype files. Nevertheless, tasks like installing and maintaining MapReduce on a cluster system, importing data into its distributed file system or executing MapReduce programs require advanced knowledge in computer science and could thus prevent scientists from usage of currently available and useful software solutions. Results Here we present Cloudgene, a freely available platform to improve the usability of MapReduce programs in Bioinformatics by providing a graphical user interface for the execution, the import and export of data and the reproducibility of workflows on in-house (private clouds) and rented clusters (public clouds). The aim of Cloudgene is to build a standardized graphical execution environment for currently available and future MapReduce programs, which can all be integrated by using its plug-in interface. Since Cloudgene can be executed on private clusters, sensitive datasets can be kept in house at all time and data transfer times are therefore minimized. Conclusions Our results show that MapReduce programs can be integrated into Cloudgene with little effort and without adding any computational overhead to existing programs. This platform gives developers the opportunity to focus on the actual implementation task and provides scientists a platform with the aim to hide the complexity of MapReduce. In addition to MapReduce programs, Cloudgene can also be used to launch predefined systems (e.g. Cloud BioLinux, RStudio) in public clouds. Currently, five different bioinformatic programs using MapReduce and two systems are integrated and have been successfully deployed. Cloudgene is freely available at http://cloudgene.uibk.ac.at. PMID:22888776
The ATLAS Event Service: A new approach to event processing
NASA Astrophysics Data System (ADS)
Calafiura, P.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.
2015-12-01
The ATLAS Event Service (ES) implements a new fine grained approach to HEP event processing, designed to be agile and efficient in exploiting transient, short-lived resources such as HPC hole-filling, spot market commercial clouds, and volunteer computing. Input and output control and data flows, bookkeeping, monitoring, and data storage are all managed at the event level in an implementation capable of supporting ATLAS-scale distributed processing throughputs (about 4M CPU-hours/day). Input data flows utilize remote data repositories with no data locality or pre-staging requirements, minimizing the use of costly storage in favor of strongly leveraging powerful networks. Object stores provide a highly scalable means of remotely storing the quasi-continuous, fine grained outputs that give ES based applications a very light data footprint on a processing resource, and ensure negligible losses should the resource suddenly vanish. We will describe the motivations for the ES system, its unique features and capabilities, its architecture and the highly scalable tools and technologies employed in its implementation, and its applications in ATLAS processing on HPCs, commercial cloud resources, volunteer computing, and grid resources. Notice: This manuscript has been authored by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
A Highly Scalable Data Service (HSDS) using Cloud-based Storage Technologies for Earth Science Data
NASA Astrophysics Data System (ADS)
Michaelis, A.; Readey, J.; Votava, P.; Henderson, J.; Willmore, F.
2017-12-01
Cloud based infrastructure may offer several key benefits of scalability, built in redundancy, security mechanisms and reduced total cost of ownership as compared with a traditional data center approach. However, most of the tools and legacy software systems developed for online data repositories within the federal government were not developed with a cloud based infrastructure in mind and do not fully take advantage of commonly available cloud-based technologies. Moreover, services bases on object storage are well established and provided through all the leading cloud service providers (Amazon Web Service, Microsoft Azure, Google Cloud, etc…) of which can often provide unmatched "scale-out" capabilities and data availability to a large and growing consumer base at a price point unachievable from in-house solutions. We describe a system that utilizes object storage rather than traditional file system based storage to vend earth science data. The system described is not only cost effective, but shows a performance advantage for running many different analytics tasks in the cloud. To enable compatibility with existing tools and applications, we outline client libraries that are API compatible with existing libraries for HDF5 and NetCDF4. Performance of the system is demonstrated using clouds services running on Amazon Web Services.
NASA Astrophysics Data System (ADS)
Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.
2012-12-01
Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
Waggle: A Framework for Intelligent Attentive Sensing and Actuation
NASA Astrophysics Data System (ADS)
Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.
2014-12-01
Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both insitu and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports insitu computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.
AWE-WQ: fast-forwarding molecular dynamics using the accelerated weighted ensemble.
Abdul-Wahid, Badi'; Feng, Haoyun; Rajan, Dinesh; Costaouec, Ronan; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A
2014-10-27
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods have emerged which can flexibly and efficiently sample conformational space without being trapped and allow calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPU) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and support for arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34 residue protein which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolded rates were of comparable accuracy.
AWE-WQ: Fast-Forwarding Molecular Dynamics Using the Accelerated Weighted Ensemble
2015-01-01
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states since high energy barriers trap the system in these states. Recently the weighted ensemble (WE) family of methods have emerged which can flexibly and efficiently sample conformational space without being trapped and allow calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting biomolecular systems is not available. We provide here a GPLv2 implementation called AWE-WQ of a WE algorithm using the master/worker distributed computing WorkQueue (WQ) framework. AWE-WQ is scalable to thousands of nodes and supports dynamic allocation of computer resources, heterogeneous resource usage (such as central processing units (CPU) and graphical processing units (GPUs) concurrently), seamless heterogeneous cluster usage (i.e., campus grids and cloud providers), and support for arbitrary MD codes such as GROMACS, while ensuring that all statistics are unbiased. We applied AWE-WQ to a 34 residue protein which simulated 1.5 ms over 8 months with peak aggregate performance of 1000 ns/h. Comparison was done with a 200 μs simulation collected on a GPU over a similar timespan. The folding and unfolded rates were of comparable accuracy. PMID:25207854
Developing cloud applications using the e-Science Central platform.
Hiden, Hugo; Woodman, Simon; Watson, Paul; Cala, Jacek
2013-01-28
This paper describes the e-Science Central (e-SC) cloud data processing system and its application to a number of e-Science projects. e-SC provides both software as a service (SaaS) and platform as a service for scientific data management, analysis and collaboration. It is a portable system and can be deployed on both private (e.g. Eucalyptus) and public clouds (Amazon AWS and Microsoft Windows Azure). The SaaS application allows scientists to upload data, edit and run workflows and share results in the cloud, using only a Web browser. It is underpinned by a scalable cloud platform consisting of a set of components designed to support the needs of scientists. The platform is exposed to developers so that they can easily upload their own analysis services into the system and make these available to other users. A representational state transfer-based application programming interface (API) is also provided so that external applications can leverage the platform's functionality, making it easier to build scalable, secure cloud-based applications. This paper describes the design of e-SC, its API and its use in three different case studies: spectral data visualization, medical data capture and analysis, and chemical property prediction.
Developing cloud applications using the e-Science Central platform
Hiden, Hugo; Woodman, Simon; Watson, Paul; Cala, Jacek
2013-01-01
This paper describes the e-Science Central (e-SC) cloud data processing system and its application to a number of e-Science projects. e-SC provides both software as a service (SaaS) and platform as a service for scientific data management, analysis and collaboration. It is a portable system and can be deployed on both private (e.g. Eucalyptus) and public clouds (Amazon AWS and Microsoft Windows Azure). The SaaS application allows scientists to upload data, edit and run workflows and share results in the cloud, using only a Web browser. It is underpinned by a scalable cloud platform consisting of a set of components designed to support the needs of scientists. The platform is exposed to developers so that they can easily upload their own analysis services into the system and make these available to other users. A representational state transfer-based application programming interface (API) is also provided so that external applications can leverage the platform's functionality, making it easier to build scalable, secure cloud-based applications. This paper describes the design of e-SC, its API and its use in three different case studies: spectral data visualization, medical data capture and analysis, and chemical property prediction. PMID:23230161
Monte Carlo verification of radiotherapy treatments with CloudMC.
Miras, Hector; Jiménez, Rubén; Perales, Álvaro; Terrón, José Antonio; Bertolet, Alejandro; Ortiz, Antonio; Macías, José
2018-06-27
A new implementation has been made on CloudMC, a cloud-based platform presented in a previous work, in order to provide services for radiotherapy treatment verification by means of Monte Carlo in a fast, easy and economical way. A description of the architecture of the application and the new developments implemented is presented together with the results of the tests carried out to validate its performance. CloudMC has been developed over Microsoft Azure cloud. It is based on a map/reduce implementation for Monte Carlo calculations distribution over a dynamic cluster of virtual machines in order to reduce calculation time. CloudMC has been updated with new methods to read and process the information related to radiotherapy treatment verification: CT image set, treatment plan, structures and dose distribution files in DICOM format. Some tests have been designed in order to determine, for the different tasks, the most suitable type of virtual machines from those available in Azure. Finally, the performance of Monte Carlo verification in CloudMC is studied through three real cases that involve different treatment techniques, linac models and Monte Carlo codes. Considering computational and economic factors, D1_v2 and G1 virtual machines were selected as the default type for the Worker Roles and the Reducer Role respectively. Calculation times up to 33 min and costs of 16 € were achieved for the verification cases presented when a statistical uncertainty below 2% (2σ) was required. The costs were reduced to 3-6 € when uncertainty requirements are relaxed to 4%. Advantages like high computational power, scalability, easy access and pay-per-usage model, make Monte Carlo cloud-based solutions, like the one presented in this work, an important step forward to solve the long-lived problem of truly introducing the Monte Carlo algorithms in the daily routine of the radiotherapy planning process.
Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min
2015-06-01
The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-01-01
Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-06-01
Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Limaye, Ashutosh S.; Molthan, Andrew L.; Srikishen, Jayanthi
2010-01-01
The development of the Nebula Cloud Computing Platform at NASA Ames Research Center provides an open-source solution for the deployment of scalable computing and storage capabilities relevant to the execution of real-time weather forecasts and the distribution of high resolution satellite data to the operational weather community. Two projects at Marshall Space Flight Center may benefit from use of the Nebula system. The NASA Short-term Prediction Research and Transition (SPoRT) Center facilitates the use of unique NASA satellite data and research capabilities in the operational weather community by providing datasets relevant to numerical weather prediction, and satellite data sets useful in weather analysis. SERVIR provides satellite data products for decision support, emphasizing environmental threats such as wildfires, floods, landslides, and other hazards, with interests in numerical weather prediction in support of disaster response. The Weather Research and Forecast (WRF) model Environmental Modeling System (WRF-EMS) has been configured for Nebula cloud computing use via the creation of a disk image and deployment of repeated instances. Given the available infrastructure within Nebula and the "infrastructure as a service" concept, the system appears well-suited for the rapid deployment of additional forecast models over different domains, in response to real-time research applications or disaster response. Future investigations into Nebula capabilities will focus on the development of a web mapping server and load balancing configuration to support the distribution of high resolution satellite data sets to users within the National Weather Service and international partners of SERVIR.
CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping.
Nguyen, Tung; Shi, Weisong; Ruden, Douglas
2011-06-06
Research in genetics has developed rapidly recently due to the aid of next generation sequencing (NGS). However, massively-parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The Cloud Computing and MapReduce framework, which utilizes hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution for these issues. Consequently, it has been adopted by many organizations recently, and the initial results are very promising. However, since these are only initial steps toward this trend, the developed software does not provide adequate primary functions like bisulfite, pair-end mapping, etc., in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second-generation and third-generation NGS instruments and, therefore, are inefficient. Last, it is difficult for a majority of biologists untrained in programming skills to use these tools because most were developed on Linux with a command line interface. To urge the trend of using Cloud technologies in genomics and prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to be able to deal with long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) mainly comes from the omission of the reduce phase. In comparison to local-based approaches, the performance gain of CloudAligner is from the partition and parallel processing of the huge reference genome as well as the reads. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http://mine.cs.wayne.edu:8080/CloudAligner/. Our results show that CloudAligner is faster than CloudBurst, provides more accurate results than RMAP, and supports various input as well as output formats. In addition, with the web-based interface, it is easier to use than its counterparts.
A Cloud-Based Infrastructure for Near-Real-Time Processing and Dissemination of NPP Data
NASA Astrophysics Data System (ADS)
Evans, J. D.; Valente, E. G.; Chettri, S. S.
2011-12-01
We are building a scalable cloud-based infrastructure for generating and disseminating near-real-time data products from a variety of geospatial and meteorological data sources, including the new National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP). Our approach relies on linking Direct Broadcast and other data streams to a suite of scientific algorithms coordinated by NASA's International Polar-Orbiter Processing Package (IPOPP). The resulting data products are directly accessible to a wide variety of end-user applications, via industry-standard protocols such as OGC Web Services, Unidata Local Data Manager, or OPeNDAP, using open source software components. The processing chain employs on-demand computing resources from Amazon.com's Elastic Compute Cloud and NASA's Nebula cloud services. Our current prototype targets short-term weather forecasting, in collaboration with NASA's Short-term Prediction Research and Transition (SPoRT) program and the National Weather Service. Direct Broadcast is especially crucial for NPP, whose current ground segment is unlikely to deliver data quickly enough for short-term weather forecasters and other near-real-time users. Direct Broadcast also allows full local control over data handling, from the receiving antenna to end-user applications: this provides opportunities to streamline processes for data ingest, processing, and dissemination, and thus to make interpreted data products (Environmental Data Records) available to practitioners within minutes of data capture at the sensor. Cloud computing lets us grow and shrink computing resources to meet large and rapid fluctuations in data availability (twice daily for polar orbiters) - and similarly large fluctuations in demand from our target (near-real-time) users. This offers a compelling business case for cloud computing: the processing or dissemination systems can grow arbitrarily large to sustain near-real time data access despite surges in data volumes or user demand, but that computing capacity (and hourly costs) can be dropped almost instantly once the surge passes. Cloud computing also allows low-risk experimentation with a variety of machine architectures (processor types; bandwidth, memory, and storage capacities, etc.) and of system configurations (including massively parallel computing patterns). Finally, our service-based approach (in which user applications invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored products on demand. To maximize the usefulness and impact of our technology, we have emphasized open, industry-standard software interfaces. We are also using and developing open source software to facilitate the widespread adoption of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sources.
Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge
2015-01-01
Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud. PMID:26271043
Considerations for Software Defined Networking (SDN): Approaches and use cases
NASA Astrophysics Data System (ADS)
Bakshi, K.
Software Defined Networking (SDN) is an evolutionary approach to network design and functionality based on the ability to programmatically modify the behavior of network devices. SDN uses user-customizable and configurable software that's independent of hardware to enable networked systems to expand data flow control. SDN is in large part about understanding and managing a network as a unified abstraction. It will make networks more flexible, dynamic, and cost-efficient, while greatly simplifying operational complexity. And this advanced solution provides several benefits including network and service customizability, configurability, improved operations, and increased performance. There are several approaches to SDN and its practical implementation. Among them, two have risen to prominence with differences in pedigree and implementation. This paper's main focus will be to define, review, and evaluate salient approaches and use cases of the OpenFlow and Virtual Network Overlay approaches to SDN. OpenFlow is a communication protocol that gives access to the forwarding plane of a network's switches and routers. The Virtual Network Overlay relies on a completely virtualized network infrastructure and services to abstract the underlying physical network, which allows the overlay to be mobile to other physical networks. This is an important requirement for cloud computing, where applications and associated network services are migrated to cloud service providers and remote data centers on the fly as resource demands dictate. The paper will discuss how and where SDN can be applied and implemented, including research and academia, virtual multitenant data center, and cloud computing applications. Specific attention will be given to the cloud computing use case, where automated provisioning and programmable overlay for scalable multi-tenancy is leveraged via the SDN approach.
Software-defined optical network for metro-scale geographically distributed data centers.
Samadi, Payman; Wen, Ke; Xu, Junjie; Bergman, Keren
2016-05-30
The emergence of cloud computing and big data has rapidly increased the deployment of small and mid-sized data centers. Enterprises and cloud providers require an agile network among these data centers to empower application reliability and flexible scalability. We present a software-defined inter data center network to enable on-demand scale out of data centers on a metro-scale optical network. The architecture consists of a combined space/wavelength switching platform and a Software-Defined Networking (SDN) control plane equipped with a wavelength and routing assignment module. It enables establishing transparent and bandwidth-selective connections from L2/L3 switches, on-demand. The architecture is evaluated in a testbed consisting of 3 data centers, 5-25 km apart. We successfully demonstrated end-to-end bulk data transfer and Virtual Machine (VM) migrations across data centers with less than 100 ms connection setup time and close to full link capacity utilization.
Cloud-based large-scale air traffic flow optimization
NASA Astrophysics Data System (ADS)
Cao, Yi
The ever-increasing traffic demand makes the efficient use of airspace an imperative mission, and this paper presents an effort in response to this call. Firstly, a new aggregate model, called Link Transmission Model (LTM), is proposed, which models the nationwide traffic as a network of flight routes identified by origin-destination pairs. The traversal time of a flight route is assumed to be the mode of distribution of historical flight records, and the mode is estimated by using Kernel Density Estimation. As this simplification abstracts away physical trajectory details, the complexity of modeling is drastically decreased, resulting in efficient traffic forecasting. The predicative capability of LTM is validated against recorded traffic data. Secondly, a nationwide traffic flow optimization problem with airport and en route capacity constraints is formulated based on LTM. The optimization problem aims at alleviating traffic congestions with minimal global delays. This problem is intractable due to millions of variables. A dual decomposition method is applied to decompose the large-scale problem such that the subproblems are solvable. However, the whole problem is still computational expensive to solve since each subproblem is an smaller integer programming problem that pursues integer solutions. Solving an integer programing problem is known to be far more time-consuming than solving its linear relaxation. In addition, sequential execution on a standalone computer leads to linear runtime increase when the problem size increases. To address the computational efficiency problem, a parallel computing framework is designed which accommodates concurrent executions via multithreading programming. The multithreaded version is compared with its monolithic version to show decreased runtime. Finally, an open-source cloud computing framework, Hadoop MapReduce, is employed for better scalability and reliability. This framework is an "off-the-shelf" parallel computing model that can be used for both offline historical traffic data analysis and online traffic flow optimization. It provides an efficient and robust platform for easy deployment and implementation. A small cloud consisting of five workstations was configured and used to demonstrate the advantages of cloud computing in dealing with large-scale parallelizable traffic problems.
Hydrodynamics and Water Quality forecasting over a Cloud Computing environment: INDIGO-DataCloud
NASA Astrophysics Data System (ADS)
Aguilar Gómez, Fernando; de Lucas, Jesús Marco; García, Daniel; Monteoliva, Agustín
2017-04-01
Algae Bloom due to eutrophication is an extended problem for water reservoirs and lakes that impacts directly in water quality. It can create a dead zone that lacks enough oxygen to support life and it can also be human harmful, so it must be controlled in water masses for supplying, bathing or other uses. Hydrodynamic and Water Quality modelling can contribute to forecast the status of the water system in order to alert authorities before an algae bloom event occurs. It can be used to predict scenarios and find solutions to reduce the harmful impact of the blooms. High resolution models need to process a big amount of data using a robust enough computing infrastructure. INDIGO-DataCloud (https://www.indigo-datacloud.eu/) is an European Commission funded project that aims at developing a data and computing platform targeting scientific communities, deployable on multiple hardware and provisioned over hybrid (private or public) e-infrastructures. The project addresses the development of solutions for different Case Studies using different Cloud-based alternatives. In the first INDIGO software release, a set of components are ready to manage the deployment of services to perform N number of Delft3D simulations (for calibrating or scenario definition) over a Cloud Computing environment, using the Docker technology: TOSCA requirement description, Docker repository, Orchestrator, AAI (Authorization, Authentication) and OneData (Distributed Storage System). Moreover, the Future Gateway portal based on Liferay, provides an user-friendly interface where the user can configure the simulations. Due to the data approach of INDIGO, the developed solutions can contribute to manage the full data life cycle of a project, thanks to different tools to manage datasets or even metadata. Furthermore, the cloud environment contributes to provide a dynamic, scalable and easy-to-use framework for non-IT experts users. This framework is potentially capable to automatize the processing of forecasting applying periodic tasks. For instance, a user can forecast every month the hydrodynamics and water quality status of a reservoir starting from a base model and supplying new data gathered from the instrumentation or observations. This interactive presentation aims to show the use of INDIGO solutions in a particular forecasting use case and to inspire others in the use of a Cloud framework for their applications.
NASA Astrophysics Data System (ADS)
Biercamp, Joachim; Adamidis, Panagiotis; Neumann, Philipp
2017-04-01
With the exa-scale era approaching, length and time scales used for climate research on one hand and numerical weather prediction on the other hand blend into each other. The Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE) represents a European consortium comprising partners from climate, weather and HPC in their effort to address key scientific challenges that both communities have in common. A particular challenge is to reach global models with spatial resolutions that allow simulating convective clouds and small-scale ocean eddies. These simulations would produce better predictions of trends and provide much more fidelity in the representation of high-impact regional events. However, running such models in operational mode, i.e with sufficient throughput in ensemble mode clearly will require exa-scale computing and data handling capability. We will discuss the ESiWACE initiative and relate it to work-in-progress on high-resolution simulations in Europe. We present recent strong scalability measurements from ESiWACE to demonstrate current computability in weather and climate simulation. A special focus in this particular talk is on the Icosahedal Nonhydrostatic (ICON) model used for a comparison of high resolution regional and global simulations with high quality observation data. We demonstrate that close-to-optimal parallel efficiency can be achieved in strong scaling global resolution experiments on Mistral/DKRZ, e.g. 94% for 5km resolution simulations using 36k cores on Mistral/DKRZ. Based on our scalability and high-resolution experiments, we deduce and extrapolate future capabilities for ICON that are expected for weather and climate research at exascale.
Closha: bioinformatics workflow system for the analysis of massive sequencing data.
Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook
2018-02-19
While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/ .
The Cancer Analysis Virtual Machine (CAVM) project will leverage cloud technology, the UCSC Cancer Genomics Browser, and the Galaxy analysis workflow system to provide investigators with a flexible, scalable platform for hosting, visualizing and analyzing their own genomic data.
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud
Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
2012-01-01
Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps. Availability: Available under the open-source MIT license at http://sourceforge.net/projects/hadoop-bam/ Contact: matti.niemenmaa@aalto.fi Supplementary information: Supplementary material is available at Bioinformatics online. PMID:22302568
Rucio, the next-generation Data Management system in ATLAS
NASA Astrophysics Data System (ADS)
Serfon, C.; Barisits, M.; Beermann, T.; Garonne, V.; Goossens, L.; Lassnig, M.; Nairz, A.; Vigne, R.; ATLAS Collaboration
2016-04-01
Rucio is the next-generation of Distributed Data Management (DDM) system benefiting from recent advances in cloud and ;Big Data; computing to address HEP experiments scaling requirements. Rucio is an evolution of the ATLAS DDM system Don Quixote 2 (DQ2), which has demonstrated very large scale data management capabilities with more than 160 petabytes spread worldwide across 130 sites, and accesses from 1,000 active users. However, DQ2 is reaching its limits in terms of scalability, requiring a large number of support staff to operate and being hard to extend with new technologies. Rucio addresses these issues by relying on new technologies to ensure system scalability, cover new user requirements and employ new automation framework to reduce operational overheads. This paper shows the key concepts of Rucio, details the Rucio design, and the technology it employs, the tests that were conducted to validate it and finally describes the migration steps that were conducted to move from DQ2 to Rucio.
Dashti, Ali; Komarov, Ivan; D'Souza, Roshan M
2013-01-01
This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data cloud. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multi-levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs and bring a hitherto impossible [Formula: see text]-NNG generation for a dataset of twenty million images with 15 k dimensionality into the realm of practical possibility.
System and Method for Providing a Climate Data Persistence Service
NASA Technical Reports Server (NTRS)
Schnase, John L. (Inventor); Ripley, III, William David (Inventor); Duffy, Daniel Q. (Inventor); Thompson, John H. (Inventor); Strong, Savannah L. (Inventor); McInerney, Mark (Inventor); Sinno, Scott (Inventor); Tamkin, Glenn S. (Inventor); Nadeau, Denis (Inventor)
2018-01-01
A system, method and computer-readable storage devices for providing a climate data persistence service. A system configured to provide the service can include a climate data server that performs data and metadata storage and management functions for climate data objects, a compute-storage platform that provides the resources needed to support a climate data server, provisioning software that allows climate data server instances to be deployed as virtual climate data servers in a cloud computing environment, and a service interface, wherein persistence service capabilities are invoked by software applications running on a client device. The climate data objects can be in various formats, such as International Organization for Standards (ISO) Open Archival Information System (OAIS) Reference Model Submission Information Packages, Archive Information Packages, and Dissemination Information Packages. The climate data server can enable scalable, federated storage, management, discovery, and access, and can be tailored for particular use cases.
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn
2014-11-14
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbormore » points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.« less
GSKY: A scalable distributed geospatial data server on the cloud
NASA Astrophysics Data System (ADS)
Rozas Larraondo, Pablo; Pringle, Sean; Antony, Joseph; Evans, Ben
2017-04-01
Earth systems, environmental and geophysical datasets are an extremely valuable sources of information about the state and evolution of the Earth. Being able to combine information coming from different geospatial collections is in increasing demand by the scientific community, and requires managing and manipulating data with different formats and performing operations such as map reprojections, resampling and other transformations. Due to the large data volume inherent in these collections, storing multiple copies of them is unfeasible and so such data manipulation must be performed on-the-fly using efficient, high performance techniques. Ideally this should be performed using a trusted data service and common system libraries to ensure wide use and reproducibility. Recent developments in distributed computing based on dynamic access to significant cloud infrastructure opens the door for such new ways of processing geospatial data on demand. The National Computational Infrastructure (NCI), hosted at the Australian National University (ANU), has over 10 Petabytes of nationally significant research data collections. Some of these collections, which comprise a variety of observed and modelled geospatial data, are now made available via a highly distributed geospatial data server, called GSKY (pronounced [jee-skee]). GSKY supports on demand processing of large geospatial data products such as satellite earth observation data as well as numerical weather products, allowing interactive exploration and analysis of the data. It dynamically and efficiently distributes the required computations among cloud nodes providing a scalable analysis framework that can adapt to serve large number of concurrent users. Typical geospatial workflows handling different file formats and data types, or blending data in different coordinate projections and spatio-temporal resolutions, is handled transparently by GSKY. This is achieved by decoupling the data ingestion and indexing process as an independent service. An indexing service crawls data collections either locally or remotely by extracting, storing and indexing all spatio-temporal metadata associated with each individual record. GSKY provides the user with the ability of specifying how ingested data should be aggregated, transformed and presented. It presents an OGC standards-compliant interface, allowing ready accessibility for users of the data via Web Map Services (WMS), Web Processing Services (WPS) or raw data arrays using Web Coverage Services (WCS). The presentation will show some cases where we have used this new capability to provide a significant improvement over previous approaches.
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.
Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian
2017-04-27
The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
fastBMA: scalable network inference and transitive reduction.
Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee
2017-10-01
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the 1 based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/). © The Authors 2017. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Jiang, Guodong; Fan, Ming; Li, Lihua
2016-03-01
Mammography is the gold standard for breast cancer screening, reducing mortality by about 30%. The application of a computer-aided detection (CAD) system to assist a single radiologist is important to further improve mammographic sensitivity for breast cancer detection. In this study, a design and realization of the prototype for remote diagnosis system in mammography based on cloud platform were proposed. To build this system, technologies were utilized including medical image information construction, cloud infrastructure and human-machine diagnosis model. Specifically, on one hand, web platform for remote diagnosis was established by J2EE web technology. Moreover, background design was realized through Hadoop open-source framework. On the other hand, storage system was built up with Hadoop distributed file system (HDFS) technology which enables users to easily develop and run on massive data application, and give full play to the advantages of cloud computing which is characterized by high efficiency, scalability and low cost. In addition, the CAD system was realized through MapReduce frame. The diagnosis module in this system implemented the algorithms of fusion of machine and human intelligence. Specifically, we combined results of diagnoses from doctors' experience and traditional CAD by using the man-machine intelligent fusion model based on Alpha-Integration and multi-agent algorithm. Finally, the applications on different levels of this system in the platform were also discussed. This diagnosis system will have great importance for the balanced health resource, lower medical expense and improvement of accuracy of diagnosis in basic medical institutes.
Leveraging the Cloud for Integrated Network Experimentation
2014-03-01
kernel settings, or any of the low-level subcomponents. 3. Scalable Solutions: Businesses can build scalable solutions for their clients , ranging from...values. These values 13 can assume several distributions that include normal, Pareto , uniform, exponential and Poisson, among others [21]. Additionally, D...communication, the web client establishes a connection to the server before traffic begins to flow. Web servers do not initiate connections to clients in
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
Efficient universal quantum channel simulation in IBM's cloud quantum computer
NASA Astrophysics Data System (ADS)
Wei, Shi-Jie; Xin, Tao; Long, Gui-Lu
2018-07-01
The study of quantum channels is an important field and promises a wide range of applications, because any physical process can be represented as a quantum channel that transforms an initial state into a final state. Inspired by the method of performing non-unitary operators by the linear combination of unitary operations, we proposed a quantum algorithm for the simulation of the universal single-qubit channel, described by a convex combination of "quasi-extreme" channels corresponding to four Kraus operators, and is scalable to arbitrary higher dimension. We demonstrated the whole algorithm experimentally using the universal IBM cloud-based quantum computer and studied the properties of different qubit quantum channels. We illustrated the quantum capacity of the general qubit quantum channels, which quantifies the amount of quantum information that can be protected. The behavior of quantum capacity in different channels revealed which types of noise processes can support information transmission, and which types are too destructive to protect information. There was a general agreement between the theoretical predictions and the experiments, which strongly supports our method. By realizing the arbitrary qubit channel, this work provides a universally- accepted way to explore various properties of quantum channels and novel prospect for quantum communication.
NASA Technical Reports Server (NTRS)
Schnase, John L.; Tamkin, Glenn S.; Ripley, W. David III; Stong, Savannah; Gill, Roger; Duffy, Daniel Q.
2012-01-01
Scientific data services are becoming an important part of the NASA Center for Climate Simulation's mission. Our technological response to this expanding role is built around the concept of a Virtual Climate Data Server (vCDS), repetitive provisioning, image-based deployment and distribution, and virtualization-as-a-service. The vCDS is an iRODS-based data server specialized to the needs of a particular data-centric application. We use RPM scripts to build vCDS images in our local computing environment, our local Virtual Machine Environment, NASA s Nebula Cloud Services, and Amazon's Elastic Compute Cloud. Once provisioned into one or more of these virtualized resource classes, vCDSs can use iRODS s federation capabilities to create an integrated ecosystem of managed collections that is scalable and adaptable to changing resource requirements. This approach enables platform- or software-asa- service deployment of vCDS and allows the NCCS to offer virtualization-as-a-service: a capacity to respond in an agile way to new customer requests for data services.
omniClassifier: a Desktop Grid Computing System for Big Data Prediction Modeling
Phan, John H.; Kothari, Sonal; Wang, May D.
2016-01-01
Robust prediction models are important for numerous science, engineering, and biomedical applications. However, best-practice procedures for optimizing prediction models can be computationally complex, especially when choosing models from among hundreds or thousands of parameter choices. Computational complexity has further increased with the growth of data in these fields, concurrent with the era of “Big Data”. Grid computing is a potential solution to the computational challenges of Big Data. Desktop grid computing, which uses idle CPU cycles of commodity desktop machines, coupled with commercial cloud computing resources can enable research labs to gain easier and more cost effective access to vast computing resources. We have developed omniClassifier, a multi-purpose prediction modeling application that provides researchers with a tool for conducting machine learning research within the guidelines of recommended best-practices. omniClassifier is implemented as a desktop grid computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) middleware. In addition to describing implementation details, we use various gene expression datasets to demonstrate the potential scalability of omniClassifier for efficient and robust Big Data prediction modeling. A prototype of omniClassifier can be accessed at http://omniclassifier.bme.gatech.edu/. PMID:27532062
DInSAR time series generation within a cloud computing environment: from ERS to Sentinel-1 scenario
NASA Astrophysics Data System (ADS)
Casu, Francesco; Elefante, Stefano; Imperatore, Pasquale; Lanari, Riccardo; Manunta, Michele; Zinno, Ivana; Mathot, Emmanuel; Brito, Fabrice; Farres, Jordi; Lengert, Wolfgang
2013-04-01
One of the techniques that will strongly benefit from the advent of the Sentinel-1 system is Differential SAR Interferometry (DInSAR), which has successfully demonstrated to be an effective tool to detect and monitor ground displacements with centimetre accuracy. The geoscience communities (volcanology, seismicity, …), as well as those related to hazard monitoring and risk mitigation, make extensively use of the DInSAR technique and they will take advantage from the huge amount of SAR data acquired by Sentinel-1. Indeed, such an information will successfully permit the generation of Earth's surface displacement maps and time series both over large areas and long time span. However, the issue of managing, processing and analysing the large Sentinel data stream is envisaged by the scientific community to be a major bottleneck, particularly during crisis phases. The emerging need of creating a common ecosystem in which data, results and processing tools are shared, is envisaged to be a successful way to address such a problem and to contribute to the information and knowledge spreading. The Supersites initiative as well as the ESA SuperSites Exploitation Platform (SSEP) and the ESA Cloud Computing Operational Pilot (CIOP) projects provide effective answers to this need and they are pushing towards the development of such an ecosystem. It is clear that all the current and existent tools for querying, processing and analysing SAR data are required to be not only updated for managing the large data stream of Sentinel-1 satellite, but also reorganized for quickly replying to the simultaneous and highly demanding user requests, mainly during emergency situations. This translates into the automatic and unsupervised processing of large amount of data as well as the availability of scalable, widely accessible and high performance computing capabilities. The cloud computing environment permits to achieve all of these objectives, particularly in case of spike and peak requests of processing resources linked to disaster events. This work aims at presenting a parallel computational model for the widely used DInSAR algorithm named as Small BAseline Subset (SBAS), which has been implemented within the cloud computing environment provided by the ESA-CIOP platform. This activity has resulted in developing a scalable, unsupervised, portable, and widely accessible (through a web portal) parallel DInSAR computational tool. The activity has rewritten and developed the SBAS application algorithm within a parallel system environment, i.e., in a form that allows us to benefit from multiple processing units. This requires the devising a parallel version of the SBAS algorithm and its subsequent implementation, implying additional complexity in algorithm designing and an efficient multi processor programming, with the final aim of a parallel performance optimization. Although the presented algorithm has been designed to work with Sentinel-1 data, it can also process other satellite SAR data (ERS, ENVISAT, CSK, TSX, ALOS). Indeed, the performance analysis of the implemented SBAS parallel version has been tested on the full ASAR archive (64 acquisitions) acquired over the Napoli Bay, a volcanic and densely urbanized area in Southern Italy. The full processing - from the raw data download to the generation of DInSAR time series - has been carried out by engaging 4 nodes, each one with 2 cores and 16 GB of RAM, and has taken about 36 hours, with respect to about 135 hours of the sequential version. Extensive analysis on other test areas significant from DInSAR and geophysical viewpoint will be presented. Finally, preliminary performance evaluation of the presented approach within the Sentinel-1 scenario will be provided.
Cloud-based interactive analytics for terabytes of genomic variants data.
Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S
2017-12-01
Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. cuiping@stanford.edu or ptsao@stanford.edu. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and are in the public domain in the US.
Zhao, Shanrong; Prenger, Kurt; Smith, Lance
2013-01-01
RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets. PMID:25937948
NASA Astrophysics Data System (ADS)
Papageorgas, Panagiotis G.; Agavanakis, Kyriakos; Dogas, Ioannis; Piromalis, Dimitrios D.
2018-05-01
A cloud-based architecture is presented for the internetworking of sensors and actuators through a universal gateway, network server and application user interface design. The proposed approach targets to Energy Efficiency and sustainability in a holistic way, by integrating an open-source test bed prototype based on long-range low-bandwidth wireless networking technology for sensing and actuation as the elementary block of a viable, cost-effective and reliable solution. The prototype presented is capable of supporting both sensors and actuators, processing data locally and transmitting the results of the imposed computations to a higher level node. Additionally, it is combined with a service-oriented architecture and involves publish/subscribe middleware protocols and cloud technology to confront with the system needs in terms of data volume and processing power. In this context, the integration of instant message (chat) services is demonstrated so that they can be part of an emerging global-scope eco-system of Cyber-Physical Systems to support a wide variety of IoT applications, with strong advantages such as usability, scalability and security, while adopting a unified gateway design and a simple - yet powerful - user interface.
Zhao, Shanrong; Prenger, Kurt; Smith, Lance
2013-01-01
RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets.
Cloud-based interactive analytics for terabytes of genomic variants data
Pan, Cuiping; McInnes, Gregory; Deflaux, Nicole; Snyder, Michael; Bingham, Jonathan; Datta, Somalee; Tsao, Philip S
2017-01-01
Abstract Motivation Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. Results We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate such Big Data computing paradigms can provide orders of magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distribution, variant density across the genome, and pharmacogenomic information. Availability and implementation Our analysis framework is implemented in Google Cloud Platform and BigQuery. Codes are available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs. Contact cuiping@stanford.edu or ptsao@stanford.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28961771
Leveraging Cloud Computing to Improve Storage Durability, Availability, and Cost for MER Maestro
NASA Technical Reports Server (NTRS)
Chang, George W.; Powell, Mark W.; Callas, John L.; Torres, Recaredo J.; Shams, Khawaja S.
2012-01-01
The Maestro for MER (Mars Exploration Rover) software is the premiere operation and activity planning software for the Mars rovers, and it is required to deliver all of the processed image products to scientists on demand. These data span multiple storage arrays sized at 2 TB, and a backup scheme ensures data is not lost. In a catastrophe, these data would currently recover at 20 GB/hour, taking several days for a restoration. A seamless solution provides access to highly durable, highly available, scalable, and cost-effective storage capabilities. This approach also employs a novel technique that enables storage of the majority of data on the cloud and some data locally. This feature is used to store the most recent data locally in order to guarantee utmost reliability in case of an outage or disconnect from the Internet. This also obviates any changes to the software that generates the most recent data set as it still has the same interface to the file system as it did before updates
ASDC Advances in the Utilization of Microservices and Hybrid Cloud Environments
NASA Astrophysics Data System (ADS)
Baskin, W. E.; Herbert, A.; Mazaika, A.; Walter, J.
2017-12-01
The Atmospheric Science Data Center (ASDC) is transitioning many of its software tools and applications to standalone microservices deployable in a hybrid cloud, offering benefits such as scalability and efficient environment management. This presentation features several projects the ASDC staff have implemented leveraging the OpenShift Container Application Platform and OpenStack Hybrid Cloud Environment focusing on key tools and techniques applied to: Earth Science data processing Spatial-Temporal metadata generation, validation, repair, and curation Archived Data discovery, visualization, and access
A Kenyan Cloud School. Massive Open Online & Ongoing Courses for Blended and Lifelong Learning
ERIC Educational Resources Information Center
Jobe, William
2013-01-01
This research describes the predicted outcomes of a Kenyan Cloud School (KCS), which is a MOOC that contains all courses taught at the secondary school level in Kenya. This MOOC will consist of online, ongoing subjects in both English and Kiswahili. The KCS subjects offer self-testing and peer assessment to maximize scalability, and digital badges…
Secure and Lightweight Cloud-Assisted Video Reporting Protocol over 5G-Enabled Vehicular Networks
2017-01-01
In the vehicular networks, the real-time video reporting service is used to send the recorded videos in the vehicle to the cloud. However, when facilitating the real-time video reporting service in the vehicular networks, the usage of the fourth generation (4G) long term evolution (LTE) was proved to suffer from latency while the IEEE 802.11p standard does not offer sufficient scalability for a such congested environment. To overcome those drawbacks, the fifth-generation (5G)-enabled vehicular network is considered as a promising technology for empowering the real-time video reporting service. In this paper, we note that security and privacy related issues should also be carefully addressed to boost the early adoption of 5G-enabled vehicular networks. There exist a few research works for secure video reporting service in 5G-enabled vehicular networks. However, their usage is limited because of public key certificates and expensive pairing operations. Thus, we propose a secure and lightweight protocol for cloud-assisted video reporting service in 5G-enabled vehicular networks. Compared to the conventional public key certificates, the proposed protocol achieves entities’ authorization through anonymous credential. Also, by using lightweight security primitives instead of expensive bilinear pairing operations, the proposed protocol minimizes the computational overhead. From the evaluation results, we show that the proposed protocol takes the smaller computation and communication time for the cryptographic primitives than that of the well-known Eiza-Ni-Shi protocol. PMID:28946633
Secure and Lightweight Cloud-Assisted Video Reporting Protocol over 5G-Enabled Vehicular Networks.
Nkenyereye, Lewis; Kwon, Joonho; Choi, Yoon-Ho
2017-09-23
In the vehicular networks, the real-time video reporting service is used to send the recorded videos in the vehicle to the cloud. However, when facilitating the real-time video reporting service in the vehicular networks, the usage of the fourth generation (4G) long term evolution (LTE) was proved to suffer from latency while the IEEE 802.11p standard does not offer sufficient scalability for a such congested environment. To overcome those drawbacks, the fifth-generation (5G)-enabled vehicular network is considered as a promising technology for empowering the real-time video reporting service. In this paper, we note that security and privacy related issues should also be carefully addressed to boost the early adoption of 5G-enabled vehicular networks. There exist a few research works for secure video reporting service in 5G-enabled vehicular networks. However, their usage is limited because of public key certificates and expensive pairing operations. Thus, we propose a secure and lightweight protocol for cloud-assisted video reporting service in 5G-enabled vehicular networks. Compared to the conventional public key certificates, the proposed protocol achieves entities' authorization through anonymous credential. Also, by using lightweight security primitives instead of expensive bilinear pairing operations, the proposed protocol minimizes the computational overhead. From the evaluation results, we show that the proposed protocol takes the smaller computation and communication time for the cryptographic primitives than that of the well-known Eiza-Ni-Shi protocol.
Cloud-Based Computational Tools for Earth Science Applications
NASA Astrophysics Data System (ADS)
Arendt, A. A.; Fatland, R.; Howe, B.
2015-12-01
Earth scientists are increasingly required to think across disciplines and utilize a wide range of datasets in order to solve complex environmental challenges. Although significant progress has been made in distributing data, researchers must still invest heavily in developing computational tools to accommodate their specific domain. Here we document our development of lightweight computational data systems aimed at enabling rapid data distribution, analytics and problem solving tools for Earth science applications. Our goal is for these systems to be easily deployable, scalable and flexible to accommodate new research directions. As an example we describe "Ice2Ocean", a software system aimed at predicting runoff from snow and ice in the Gulf of Alaska region. Our backend components include relational database software to handle tabular and vector datasets, Python tools (NumPy, pandas and xray) for rapid querying of gridded climate data, and an energy and mass balance hydrological simulation model (SnowModel). These components are hosted in a cloud environment for direct access across research teams, and can also be accessed via API web services using a REST interface. This API is a vital component of our system architecture, as it enables quick integration of our analytical tools across disciplines, and can be accessed by any existing data distribution centers. We will showcase several data integration and visualization examples to illustrate how our system has expanded our ability to conduct cross-disciplinary research.
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.
Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan
2013-06-27
Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing
2013-01-01
Background Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html. PMID:23802613
NASA Astrophysics Data System (ADS)
Watari, S.; Morikawa, Y.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Kato, H.; Shimojo, S.; Murata, K. T.
2010-12-01
In the Solar-Terrestrial Physics (STP) field, spatio-temporal resolution of computer simulations is getting higher and higher because of tremendous advancement of supercomputers. A more advanced technology is Grid Computing that integrates distributed computational resources to provide scalable computing resources. In the simulation research, it is effective that a researcher oneself designs his physical model, performs calculations with a supercomputer, and analyzes and visualizes for consideration by a familiar method. A supercomputer is far from an analysis and visualization environment. In general, a researcher analyzes and visualizes in the workstation (WS) managed at hand because the installation and the operation of software in the WS are easy. Therefore, it is necessary to copy the data from the supercomputer to WS manually. Time necessary for the data transfer through long delay network disturbs high-accuracy simulations actually. In terms of usefulness, integrating a supercomputer and an analysis and visualization environment seamlessly with a researcher's familiar method is important. NICT has been developing a cloud computing environment (NICT Space Weather Cloud). In the NICT Space Weather Cloud, disk servers are located near its supercomputer and WSs for data analysis and visualization. They are connected to JGN2plus that is high-speed network for research and development. Distributed virtual high-capacity storage is also constructed by Grid Datafarm (Gfarm v2). Huge-size data output from the supercomputer is transferred to the virtual storage through JGN2plus. A researcher can concentrate on the research by a familiar method without regard to distance between a supercomputer and an analysis and visualization environment. Now, total 16 disk servers are setup in NICT headquarters (at Koganei, Tokyo), JGN2plus NOC (at Otemachi, Tokyo), Okinawa Subtropical Environment Remote-Sensing Center, and Cybermedia Center, Osaka University. They are connected on JGN2plus, and they constitute 1PB (physical size) virtual storage by Gfarm v2. These disk servers are connected with supercomputers of NICT and Osaka University. A system that data output from the supercomputers are automatically transferred to the virtual storage had been built up. Transfer rate is about 50 GB/hrs by actual measurement. It is estimated that the performance is reasonable for a certain simulation and analysis for reconstruction of coronal magnetic field. This research is assumed an experiment of the system, and the verification of practicality is advanced at the same time. Herein we introduce an overview of the space weather cloud system so far we have developed. We also demonstrate several scientific results using the space weather cloud system. We also introduce several web applications of the cloud as a service of the space weather cloud, which is named as "e-SpaceWeather" (e-SW). The e-SW provides with a variety of space weather online services from many aspects.
Castedo, Luis
2017-01-01
Fog computing extends cloud computing to the edge of a network enabling new Internet of Things (IoT) applications and services, which may involve critical data that require privacy and security. In an IoT fog computing system, three elements can be distinguished: IoT nodes that collect data, the cloud, and interconnected IoT gateways that exchange messages with the IoT nodes and with the cloud. This article focuses on securing IoT gateways, which are assumed to be constrained in terms of computational resources, but that are able to offload some processing from the cloud and to reduce the latency in the responses to the IoT nodes. However, it is usually taken for granted that IoT gateways have direct access to the electrical grid, which is not always the case: in mission-critical applications like natural disaster relief or environmental monitoring, it is common to deploy IoT nodes and gateways in large areas where electricity comes from solar or wind energy that charge the batteries that power every device. In this article, how to secure IoT gateway communications while minimizing power consumption is analyzed. The throughput and power consumption of Rivest–Shamir–Adleman (RSA) and Elliptic Curve Cryptography (ECC) are considered, since they are really popular, but have not been thoroughly analyzed when applied to IoT scenarios. Moreover, the most widespread Transport Layer Security (TLS) cipher suites use RSA as the main public key-exchange algorithm, but the key sizes needed are not practical for most IoT devices and cannot be scaled to high security levels. In contrast, ECC represents a much lighter and scalable alternative. Thus, RSA and ECC are compared for equivalent security levels, and power consumption and data throughput are measured using a testbed of IoT gateways. The measurements obtained indicate that, in the specific fog computing scenario proposed, ECC is clearly a much better alternative than RSA, obtaining energy consumption reductions of up to 50% and a data throughput that doubles RSA in most scenarios. These conclusions are then corroborated by a frame temporal analysis of Ethernet packets. In addition, current data compression algorithms are evaluated, concluding that, when dealing with the small payloads related to IoT applications, they do not pay off in terms of real data throughput and power consumption. PMID:28850104
Suárez-Albela, Manuel; Fernández-Caramés, Tiago M; Fraga-Lamas, Paula; Castedo, Luis
2017-08-29
Fog computing extends cloud computing to the edge of a network enabling new Internet of Things (IoT) applications and services, which may involve critical data that require privacy and security. In an IoT fog computing system, three elements can be distinguished: IoT nodes that collect data, the cloud, and interconnected IoT gateways that exchange messages with the IoT nodes and with the cloud. This article focuses on securing IoT gateways, which are assumed to be constrained in terms of computational resources, but that are able to offload some processing from the cloud and to reduce the latency in the responses to the IoT nodes. However, it is usually taken for granted that IoT gateways have direct access to the electrical grid, which is not always the case: in mission-critical applications like natural disaster relief or environmental monitoring, it is common to deploy IoT nodes and gateways in large areas where electricity comes from solar or wind energy that charge the batteries that power every device. In this article, how to secure IoT gateway communications while minimizing power consumption is analyzed. The throughput and power consumption of Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC) are considered, since they are really popular, but have not been thoroughly analyzed when applied to IoT scenarios. Moreover, the most widespread Transport Layer Security (TLS) cipher suites use RSA as the main public key-exchange algorithm, but the key sizes needed are not practical for most IoT devices and cannot be scaled to high security levels. In contrast, ECC represents a much lighter and scalable alternative. Thus, RSA and ECC are compared for equivalent security levels, and power consumption and data throughput are measured using a testbed of IoT gateways. The measurements obtained indicate that, in the specific fog computing scenario proposed, ECC is clearly a much better alternative than RSA, obtaining energy consumption reductions of up to 50% and a data throughput that doubles RSA in most scenarios. These conclusions are then corroborated by a frame temporal analysis of Ethernet packets. In addition, current data compression algorithms are evaluated, concluding that, when dealing with the small payloads related to IoT applications, they do not pay off in terms of real data throughput and power consumption.
NASA Astrophysics Data System (ADS)
Zhang, Yunju; Chen, Zhongyi; Guo, Ming; Lin, Shunsheng; Yan, Yinyang
2018-01-01
With the large capacity of the power system, the development trend of the large unit and the high voltage, the scheduling operation is becoming more frequent and complicated, and the probability of operation error increases. This paper aims at the problem of the lack of anti-error function, single scheduling function and low working efficiency for technical support system in regional regulation and integration, the integrated construction of the error prevention of the integrated architecture of the system of dispatching anti - error of dispatching anti - error of power network based on cloud computing has been proposed. Integrated system of error prevention of Energy Management System, EMS, and Operation Management System, OMS have been constructed either. The system architecture has good scalability and adaptability, which can improve the computational efficiency, reduce the cost of system operation and maintenance, enhance the ability of regional regulation and anti-error checking with broad development prospects.
NASA Astrophysics Data System (ADS)
Furht, Borko
In the introductory chapter we define the concept of cloud computing and cloud services, and we introduce layers and types of cloud computing. We discuss the differences between cloud computing and cloud services. New technologies that enabled cloud computing are presented next. We also discuss cloud computing features, standards, and security issues. We introduce the key cloud computing platforms, their vendors, and their offerings. We discuss cloud computing challenges and the future of cloud computing.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects are far beyond the capacity of typical computing resources available with most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
NASA Astrophysics Data System (ADS)
Hua, H.; Owen, S. E.; Yun, S. H.; Agram, P. S.; Manipon, G.; Starch, M.; Sacco, G. F.; Bue, B. D.; Dang, L. B.; Linick, J. P.; Malarout, N.; Rosen, P. A.; Fielding, E. J.; Lundgren, P.; Moore, A. W.; Liu, Z.; Farr, T.; Webb, F.; Simons, M.; Gurrola, E. M.
2017-12-01
With the increased availability of open SAR data (e.g. Sentinel-1 A/B), new challenges are being faced with processing and analyzing the voluminous SAR datasets to make geodetic measurements. Upcoming SAR missions such as NISAR are expected to generate close to 100TB per day. The Advanced Rapid Imaging and Analysis (ARIA) project can now generate geocoded unwrapped phase and coherence products from Sentinel-1 TOPS mode data in an automated fashion, using the ISCE software. This capability is currently being exercised on various study sites across the United States and around the globe, including Hawaii, Central California, Iceland and South America. The automated and large-scale SAR data processing and analysis capabilities use cloud computing techniques to speed the computations and provide scalable processing power and storage. Aspects such as how to processing these voluminous SLCs and interferograms at global scales, keeping up with the large daily SAR data volumes, and how to handle the voluminous data rates are being explored. Scene-partitioning approaches in the processing pipeline help in handling global-scale processing up to unwrapped interferograms with stitching done at a late stage. We have built an advanced science data system with rapid search functions to enable access to the derived data products. Rapid image processing of Sentinel-1 data to interferograms and time series is already being applied to natural hazards including earthquakes, floods, volcanic eruptions, and land subsidence due to fluid withdrawal. We will present the status of the ARIA science data system for generating science-ready data products and challenges that arise from being able to process SAR datasets to derived time series data products at large scales. For example, how do we perform large-scale data quality screening on interferograms? What approaches can be used to minimize compute, storage, and data movement costs for time series analysis in the cloud? We will also present some of our findings from applying machine learning and data analytics on the processed SAR data streams. We will also present lessons learned on how to ease the SAR community onto interfacing with these cloud-based SAR science data systems.
Modular Universal Scalable Ion-trap Quantum Computer
2016-06-02
SECURITY CLASSIFICATION OF: The main goal of the original MUSIQC proposal was to construct and demonstrate a modular and universally- expandable ion...Distribution Unlimited UU UU UU UU 02-06-2016 1-Aug-2010 31-Jan-2016 Final Report: Modular Universal Scalable Ion-trap Quantum Computer The views...P.O. Box 12211 Research Triangle Park, NC 27709-2211 Ion trap quantum computation, scalable modular architectures REPORT DOCUMENTATION PAGE 11
Advancing global marine biogeography research with open-source GIS software and cloud-computing
Fujioka, Ei; Vanden Berghe, Edward; Donnelly, Ben; Castillo, Julio; Cleary, Jesse; Holmes, Chris; McKnight, Sean; Halpin, patrick
2012-01-01
Across many scientific domains, the ability to aggregate disparate datasets enables more meaningful global analyses. Within marine biology, the Census of Marine Life served as the catalyst for such a global data aggregation effort. Under the Census framework, the Ocean Biogeographic Information System was established to coordinate an unprecedented aggregation of global marine biogeography data. The OBIS data system now contains 31.3 million observations, freely accessible through a geospatial portal. The challenges of storing, querying, disseminating, and mapping a global data collection of this complexity and magnitude are significant. In the face of declining performance and expanding feature requests, a redevelopment of the OBIS data system was undertaken. Following an Open Source philosophy, the OBIS technology stack was rebuilt using PostgreSQL, PostGIS, GeoServer and OpenLayers. This approach has markedly improved the performance and online user experience while maintaining a standards-compliant and interoperable framework. Due to the distributed nature of the project and increasing needs for storage, scalability and deployment flexibility, the entire hardware and software stack was built on a Cloud Computing environment. The flexibility of the platform, combined with the power of the application stack, enabled rapid re-development of the OBIS infrastructure, and ensured complete standards-compliance.
Developing a Hadoop-based Middleware for Handling Multi-dimensional NetCDF
NASA Astrophysics Data System (ADS)
Li, Z.; Yang, C. P.; Schnase, J. L.; Duffy, D.; Lee, T. J.
2014-12-01
Climate observations and model simulations are collecting and generating vast amounts of climate data, and these data are ever-increasing and being accumulated in a rapid speed. Effectively managing and analyzing these data are essential for climate change studies. Hadoop, a distributed storage and processing framework for large data sets, has attracted increasing attentions in dealing with the Big Data challenge. The maturity of Infrastructure as a Service (IaaS) of cloud computing further accelerates the adoption of Hadoop in solving Big Data problems. However, Hadoop is designed to process unstructured data such as texts, documents and web pages, and cannot effectively handle the scientific data format such as array-based NetCDF files and other binary data format. In this paper, we propose to build a Hadoop-based middleware for transparently handling big NetCDF data by 1) designing a distributed climate data storage mechanism based on POSIX-enabled parallel file system to enable parallel big data processing with MapReduce, as well as support data access by other systems; 2) modifying the Hadoop framework to transparently processing NetCDF data in parallel without sequencing or converting the data into other file formats, or loading them to HDFS; and 3) seamlessly integrating Hadoop, cloud computing and climate data in a highly scalable and fault-tolerance framework.
Progress in fast, accurate multi-scale climate simulations
Collins, W. D.; Johansen, H.; Evans, K. J.; ...
2015-06-01
We present a survey of physical and computational techniques that have the potential to contribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth with these computational improvements include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enablingmore » improved accuracy and fidelity in simulation of dynamics and allowing more complete representations of climate features at the global scale. At the same time, partnerships with computer science teams have focused on taking advantage of evolving computer architectures such as many-core processors and GPUs. As a result, approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.« less
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
NASA Astrophysics Data System (ADS)
Swetnam, T. L.; Pelletier, J. D.; Merchant, N.; Callahan, N.; Lyons, E.
2015-12-01
Earth science is making rapid advances through effective utilization of large-scale data repositories such as aerial LiDAR and access to NSF-funded cyberinfrastructures (e.g. the OpenTopography.org data portal, iPlant Collaborative, and XSEDE). Scaling analysis tasks that are traditionally developed using desktops, laptops or computing clusters to effectively leverage national and regional scale cyberinfrastructure pose unique challenges and barriers to adoption. To address some of these challenges in Fall 2014 an 'Applied Cyberinfrastructure Concepts' a project-based learning course (ISTA 420/520) at the University of Arizona focused on developing scalable models of 'Effective Energy and Mass Transfer' (EEMT, MJ m-2 yr-1) for use by the NSF Critical Zone Observatories (CZO) project. EEMT is a quantitative measure of the flux of available energy to the critical zone, and its computation involves inputs that have broad applicability (e.g. solar insolation). The course comprised of 25 students with varying level of computational skills and with no prior domain background in the geosciences, collaborated with domain experts to develop the scalable workflow. The original workflow relying on open-source QGIS platform on a laptop was scaled to effectively utilize cloud environments (Openstack), UA Campus HPC systems, iRODS, and other XSEDE and OSG resources. The project utilizes public data, e.g. DEMs produced by OpenTopography.org and climate data from Daymet, which are processed using GDAL, GRASS and SAGA and the Makeflow and Work-queue task management software packages. Students were placed into collaborative groups to develop the separate aspects of the project. They were allowed to change teams, alter workflows, and design and develop novel code. The students were able to identify all necessary dependencies, recompile source onto the target execution platforms, and demonstrate a functional workflow, which was further improved upon by one of the group leaders over Spring 2015. All of the code, documentation and workflow description are currently available on GitHub and a public data portal is in development. We present a case study of how students reacted to the challenge of a real science problem, their interactions with end-users, what went right, and what could be done better in the future.
Integrating Data Distribution and Data Assimilation Between the OOI CI and the NOAA DIF
NASA Astrophysics Data System (ADS)
Meisinger, M.; Arrott, M.; Clemesha, A.; Farcas, C.; Farcas, E.; Im, T.; Schofield, O.; Krueger, I.; Klacansky, I.; Orcutt, J.; Peach, C.; Chave, A.; Raymer, D.; Vernon, F.
2008-12-01
The Ocean Observatories Initiative (OOI) is an NSF funded program to establish the ocean observing infrastructure of the 21st century benefiting research and education. It is currently approaching final design and promises to deliver cyber and physical observatory infrastructure components as well as substantial core instrumentation to study environmental processes of the ocean at various scales, from coastal shelf-slope exchange processes to the deep ocean. The OOI's data distribution network lies at the heart of its cyber- infrastructure, which enables a multitude of science and education applications, ranging from data analysis, to processing, visualization and ontology supported query and mediation. In addition, it fundamentally supports a class of applications exploiting the knowledge gained from analyzing observational data for objective-driven ocean observing applications, such as automatically triggered response to episodic environmental events and interactive instrument tasking and control. The U.S. Department of Commerce through NOAA operates the Integrated Ocean Observing System (IOOS) providing continuous data in various formats, rates and scales on open oceans and coastal waters to scientists, managers, businesses, governments, and the public to support research and inform decision-making. The NOAA IOOS program initiated development of the Data Integration Framework (DIF) to improve management and delivery of an initial subset of ocean observations with the expectation of achieving improvements in a select set of NOAA's decision-support tools. Both OOI and NOAA through DIF collaborate on an effort to integrate the data distribution, access and analysis needs of both programs. We present details and early findings from this collaboration; one part of it is the development of a demonstrator combining web-based user access to oceanographic data through ERDDAP, efficient science data distribution, and scalable, self-healing deployment in a cloud computing environment. ERDDAP is a web-based front-end application integrating oceanographic data sources of various formats, for instance CDF data files as aggregated through NcML or presented using a THREDDS server. The OOI-designed data distribution network provides global traffic management and computational load balancing for observatory resources; it makes use of the OpenDAP Data Access Protocol (DAP) for efficient canonical science data distribution over the network. A cloud computing strategy is the basis for scalable, self-healing organization of an observatory's computing and storage resources, independent of the physical location and technical implementation of these resources.
NASA Astrophysics Data System (ADS)
Casu, F.; de Luca, C.; Lanari, R.; Manunta, M.; Zinno, I.
2016-12-01
A methodology for computing surface deformation time series and mean velocity maps of large areas is presented. Our approach relies on the availability of a multi-temporal set of Synthetic Aperture Radar (SAR) data collected from ascending and descending orbits over an area of interest, and also permits to estimate the vertical and horizontal (East-West) displacement components of the Earth's surface. The adopted methodology is based on an advanced Cloud Computing implementation of the Differential SAR Interferometry (DInSAR) Parallel Small Baseline Subset (P-SBAS) processing chain which allows the unsupervised processing of large SAR data volumes, from the raw data (level-0) imagery up to the generation of DInSAR time series and maps. The presented solution, which is highly scalable, has been tested on the ascending and descending ENVISAT SAR archives, which have been acquired over a large area of Southern California (US) that extends for about 90.000 km2. Such an input dataset has been processed in parallel by exploiting 280 computing nodes of the Amazon Web Services Cloud environment. Moreover, to produce the final mean deformation velocity maps of the vertical and East-West displacement components of the whole investigated area, we took also advantage of the information available from external GPS measurements that permit to account for possible regional trends not easily detectable by DInSAR and to refer the P-SBAS measurements to an external geodetic datum. The presented results clearly demonstrate the effectiveness of the proposed approach that paves the way to the extensive use of the available ERS and ENVISAT SAR data archives. Furthermore, the proposed methodology can be particularly suitable to deal with the very huge data flow provided by the Sentinel-1 constellation, thus permitting to extend the DInSAR analyses at a nearly global scale. This work is partially supported by: the DPC-CNR agreement, the EPOS-IP project and the ESA GEP project.
High Resolution Nature Runs and the Big Data Challenge
NASA Technical Reports Server (NTRS)
Webster, W. Phillip; Duffy, Daniel Q.
2015-01-01
NASA's Global Modeling and Assimilation Office at Goddard Space Flight Center is undertaking a series of very computationally intensive Nature Runs and a downscaled reanalysis. The nature runs use the GEOS-5 as an Atmospheric General Circulation Model (AGCM) while the reanalysis uses the GEOS-5 in Data Assimilation mode. This paper will present computational challenges from three runs, two of which are AGCM and one is downscaled reanalysis using the full DAS. The nature runs will be completed at two surface grid resolutions, 7 and 3 kilometers and 72 vertical levels. The 7 km run spanned 2 years (2005-2006) and produced 4 PB of data while the 3 km run will span one year and generate 4 BP of data. The downscaled reanalysis (MERRA-II Modern-Era Reanalysis for Research and Applications) will cover 15 years and generate 1 PB of data. Our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS), a specialization of the concept of business process-as-a-service that is an evolving extension of IaaS, PaaS, and SaaS enabled by cloud computing. In this presentation, we will describe two projects that demonstrate this shift. MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS. MERRA/AS enables MapReduce analytics over MERRA reanalysis data collection by bringing together the high-performance computing, scalable data management, and a domain-specific climate data services API. NASA's High-Performance Science Cloud (HPSC) is an example of the type of compute-storage fabric required to support CAaaS. The HPSC comprises a high speed Infinib and network, high performance file systems and object storage, and a virtual system environments specific for data intensive, science applications. These technologies are providing a new tier in the data and analytic services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and new mobility-driven applications and modes of work. In our experience, CAaaS lowers the barriers and risk to organizational change, fosters innovation and experimentation, and provides the agility required to meet our customers' increasing and changing needs
Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.
Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas
2017-07-24
Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them eventually are available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed Nearest Neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and optionally, 16S rRNA copy number correction. The result of a query microbiome sample is the contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny based distance metric, while providing numerous advantages over the current state of the art tool.
A cloud-based information repository for bridge monitoring applications
NASA Astrophysics Data System (ADS)
Jeong, Seongwoon; Zhang, Yilan; Hou, Rui; Lynch, Jerome P.; Sohn, Hoon; Law, Kincho H.
2016-04-01
This paper describes an information repository to support bridge monitoring applications on a cloud computing platform. Bridge monitoring, with instrumentation of sensors in particular, collects significant amount of data. In addition to sensor data, a wide variety of information such as bridge geometry, analysis model and sensor description need to be stored. Data management plays an important role to facilitate data utilization and data sharing. While bridge information modeling (BrIM) technologies and standards have been proposed and they provide a means to enable integration and facilitate interoperability, current BrIM standards support mostly the information about bridge geometry. In this study, we extend the BrIM schema to include analysis models and sensor information. Specifically, using the OpenBrIM standards as the base, we draw on CSI Bridge, a commercial software widely used for bridge analysis and design, and SensorML, a standard schema for sensor definition, to define the data entities necessary for bridge monitoring applications. NoSQL database systems are employed for data repository. Cloud service infrastructure is deployed to enhance scalability, flexibility and accessibility of the data management system. The data model and systems are tested using the bridge model and the sensor data collected at the Telegraph Road Bridge, Monroe, Michigan.
Seqcrawler: biological data indexing and browsing platform.
Sallou, Olivier; Bretaudeau, Anthony; Roult, Aurelien
2012-07-24
Seqcrawler takes its roots in software like SRS or Lucegene. It provides an indexing platform to ease the search of data and meta-data in biological banks and it can scale to face the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search their data, there is a lack of free and open source solutions to browse one's own set of data with a flexible query system and able to scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians to search their meta-data but also to build a larger information system with custom subsets of data. The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of index) hosting ~400 millions of sequence information (whole GenBank, UniProt, PDB and others) for a total size of 600 GB in a fault tolerant architecture (high-availability). It has also been successfully integrated with software to add extra meta-data from blast results to enhance users' result analysis. Seqcrawler provides a complete open source search and store solution for labs or platforms needing to manage large amount of data/meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried with a simple HTTP interface. The solution scales easily and can also provide a high availability infrastructure.
Hinkson, Izumi V.; Davidsen, Tanja M.; Klemm, Juli D.; Chandramouliswaran, Ishwar; Kerlavage, Anthony R.; Kibbe, Warren A.
2017-01-01
Advancements in next-generation sequencing and other -omics technologies are accelerating the detailed molecular characterization of individual patient tumors, and driving the evolution of precision medicine. Cancer is no longer considered a single disease, but rather, a diverse array of diseases wherein each patient has a unique collection of germline variants and somatic mutations. Molecular profiling of patient-derived samples has led to a data explosion that could help us understand the contributions of environment and germline to risk, therapeutic response, and outcome. To maximize the value of these data, an interdisciplinary approach is paramount. The National Cancer Institute (NCI) has initiated multiple projects to characterize tumor samples using multi-omic approaches. These projects harness the expertise of clinicians, biologists, computer scientists, and software engineers to investigate cancer biology and therapeutic response in multidisciplinary teams. Petabytes of cancer genomic, transcriptomic, epigenomic, proteomic, and imaging data have been generated by these projects. To address the data analysis challenges associated with these large datasets, the NCI has sponsored the development of the Genomic Data Commons (GDC) and three Cloud Resources. The GDC ensures data and metadata quality, ingests and harmonizes genomic data, and securely redistributes the data. During its pilot phase, the Cloud Resources tested multiple cloud-based approaches for enhancing data access, collaboration, computational scalability, resource democratization, and reproducibility. These NCI-led efforts are continuously being refined to better support open data practices and precision oncology, and to serve as building blocks of the NCI Cancer Research Data Commons. PMID:28983483
Toward Interactive Scenario Analysis and Exploration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gayle, Thomas R.; Summers, Kenneth Lee; Jungels, John
2015-01-01
As Modeling and Simulation (M&S) tools have matured, their applicability and importance have increased across many national security challenges. In particular, they provide a way to test how something may behave without the need to do real world testing. However, current and future changes across several factors including capabilities, policy, and funding are driving a need for rapid response or evaluation in ways that many M&S tools cannot address. Issues around large data, computational requirements, delivery mechanisms, and analyst involvement already exist and pose significant challenges. Furthermore, rising expectations, rising input complexity, and increasing depth of analysis will only increasemore » the difficulty of these challenges. In this study we examine whether innovations in M&S software coupled with advances in ''cloud'' computing and ''big-data'' methodologies can overcome many of these challenges. In particular, we propose a simple, horizontally-scalable distributed computing environment that could provide the foundation (i.e. ''cloud'') for next-generation M&S-based applications based on the notion of ''parallel multi-simulation''. In our context, the goal of parallel multi- simulation is to consider as many simultaneous paths of execution as possible. Therefore, with sufficient resources, the complexity is dominated by the cost of single scenario runs as opposed to the number of runs required. We show the feasibility of this architecture through a stable prototype implementation coupled with the Umbra Simulation Framework [6]. Finally, we highlight the utility through multiple novel analysis tools and by showing the performance improvement compared to existing tools.« less
Ocean Sciences meets Big Data Analytics
NASA Astrophysics Data System (ADS)
Hurwitz, B. L.; Choi, I.; Hartman, J.
2016-02-01
Hundreds of researchers worldwide have joined forces in the Tara Oceans Expedition to create an unprecedented planetary-scale dataset comprised of state-of-the-art next generation sequencing, microscopy, and physical/chemical metadata to explore ocean biodiversity. This summer the complete collection of data from the 2009-2013 Tara voyage was released. Yet, despite herculean efforts by the Tara Oceans Consortium to make raw data and computationally derived assemblies and gene catalogs available, most researchers are stymied by the sheer volume of the data. Specifically, the most tantalizing research questions lie in understanding the unifying principles that guide the distribution of organisms across the sea and affect climate and ecosystem function. To use the data in this capacity researchers must download, integrate, and analyze more than 7.2 trillion bases of metagenomic data and associated metadata from viruses, bacteria, archaea and small eukaryotes at their own data centers ( 9 TB of raw data). Accessing large-scale data sets in this way impedes scientists' from replicating and building on prior work. To this end, we are developing a data platform called the Ocean Cloud Commons (OCC) as part of the iMicrobe project. The OCC is built using an algorithm we developed to pre-compute massive comparative metagenomic analyses in a Hadoop big data framework. By maintaining data in a cloud commons researchers have access to scalable computation and real-time analytics to promote the integrated and broad use of planetary-scale datasets, such as Tara.
Scalable Quantum Networks for Distributed Computing and Sensing
2016-04-01
probabilistic measurement , so we developed quantum memories and guided-wave implementations of same, demonstrating controlled delay of a heralded single...Second, fundamental scalability requires a method to synchronize protocols based on quantum measurements , which are inherently probabilistic. To meet...AFRL-AFOSR-UK-TR-2016-0007 Scalable Quantum Networks for Distributed Computing and Sensing Ian Walmsley THE UNIVERSITY OF OXFORD Final Report 04/01
Opportunity and Challenges for Migrating Big Data Analytics in Cloud
NASA Astrophysics Data System (ADS)
Amitkumar Manekar, S.; Pradeepini, G., Dr.
2017-08-01
Big Data Analytics is a big word now days. As per demanding and more scalable process data generation capabilities, data acquisition and storage become a crucial issue. Cloud storage is a majorly usable platform; the technology will become crucial to executives handling data powered by analytics. Now a day’s trend towards “big data-as-a-service” is talked everywhere. On one hand, cloud-based big data analytics exactly tackle in progress issues of scale, speed, and cost. But researchers working to solve security and other real-time problem of big data migration on cloud based platform. This article specially focused on finding possible ways to migrate big data to cloud. Technology which support coherent data migration and possibility of doing big data analytics on cloud platform is demanding in natute for new era of growth. This article also gives information about available technology and techniques for migration of big data in cloud.
NASA Astrophysics Data System (ADS)
Kershaw, Philip; Lawrence, Bryan; Gomez-Dans, Jose; Holt, John
2015-04-01
We explore how the popular IPython Notebook computing system can be hosted on a cloud platform to provide a flexible virtual research hosting environment for Earth Observation data processing and analysis and how this approach can be expanded more broadly into a generic SaaS (Software as a Service) offering for the environmental sciences. OPTIRAD (OPTImisation environment for joint retrieval of multi-sensor RADiances) is a project funded by the European Space Agency to develop a collaborative research environment for Data Assimilation of Earth Observation products for land surface applications. Data Assimilation provides a powerful means to combine multiple sources of data and derive new products for this application domain. To be most effective, it requires close collaboration between specialists in this field, land surface modellers and end users of data generated. A goal of OPTIRAD then is to develop a collaborative research environment to engender shared working. Another significant challenge is that of data volume and complexity. Study of land surface requires high spatial and temporal resolutions, a relatively large number of variables and the application of algorithms which are computationally expensive. These problems can be addressed with the application of parallel processing techniques on specialist compute clusters. However, scientific users are often deterred by the time investment required to port their codes to these environments. Even when successfully achieved, it may be difficult to readily change or update. This runs counter to the scientific process of continuous experimentation, analysis and validation. The IPython Notebook provides users with a web-based interface to multiple interactive shells for the Python programming language. Code, documentation and graphical content can be saved and shared making it directly applicable to OPTIRAD's requirements for a shared working environment. Given the web interface it can be readily made into a hosted service with Wakari and Microsoft Azure being notable examples. Cloud-hosting of the Notebook allows the same familiar Python interface to be retained but backed by Cloud Computing attributes of scalability, elasticity and resource pooling. This combination makes it a powerful solution to address the needs of long-tail science users of Big Data: an intuitive interactive interface with which to access powerful compute resources. IPython Notebook can be hosted as a single user desktop environment but the recent development by the IPython community of JupyterHub enables it to be run as a multi-user hosting environment. In addition, IPython.parallel allows the exposition of parallel compute infrastructure through a Python interface. Applying these technologies in combination, a collaborative research environment has been developed for OPTIRAD on the UK JASMIN/CEMS facility's private cloud (http://jasmin.ac.uk). Based on this experience, a generic virtualised solution is under development suitable for use by the wider environmental science community - on both JASMIN and portable to third party cloud platforms.
Efficient feature extraction from wide-area motion imagery by MapReduce in Hadoop
NASA Astrophysics Data System (ADS)
Cheng, Erkang; Ma, Liya; Blaisse, Adam; Blasch, Erik; Sheaff, Carolyn; Chen, Genshe; Wu, Jie; Ling, Haibin
2014-06-01
Wide-Area Motion Imagery (WAMI) feature extraction is important for applications such as target tracking, traffic management and accident discovery. With the increasing amount of WAMI collections and feature extraction from the data, a scalable framework is needed to handle the large amount of information. Cloud computing is one of the approaches recently applied in large scale or big data. In this paper, MapReduce in Hadoop is investigated for large scale feature extraction tasks for WAMI. Specifically, a large dataset of WAMI images is divided into several splits. Each split has a small subset of WAMI images. The feature extractions of WAMI images in each split are distributed to slave nodes in the Hadoop system. Feature extraction of each image is performed individually in the assigned slave node. Finally, the feature extraction results are sent to the Hadoop File System (HDFS) to aggregate the feature information over the collected imagery. Experiments of feature extraction with and without MapReduce are conducted to illustrate the effectiveness of our proposed Cloud-Enabled WAMI Exploitation (CAWE) approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Gail-Joon
The project seeks an innovative framework to enable users to access and selectively share resources in distributed environments, enhancing the scalability of information sharing. We have investigated secure sharing & assurance approaches for ad-hoc collaboration, focused on Grids, Clouds, and ad-hoc network environments.
Multi-service small-cell cloud wired/wireless access network based on tunable optical frequency comb
NASA Astrophysics Data System (ADS)
Xiang, Yu; Zhou, Kun; Yang, Liu; Pan, Lei; Liao, Zhen-wan; Zhang, Qiang
2015-11-01
In this paper, we demonstrate a novel multi-service wired/wireless integrated access architecture of cloud radio access network (C-RAN) based on radio-over-fiber passive optical network (RoF-PON) system, which utilizes scalable multiple- frequency millimeter-wave (MF-MMW) generation based on tunable optical frequency comb (TOFC). In the baseband unit (BBU) pool, the generated optical comb lines are modulated into wired, RoF and WiFi/WiMAX signals, respectively. The multi-frequency RoF signals are generated by beating the optical comb line pairs in the small cell. The WiFi/WiMAX signals are demodulated after passing through the band pass filter (BPF) and band stop filter (BSF), respectively, whereas the wired signal can be received directly. The feasibility and scalability of the proposed multi-service wired/wireless integrated C-RAN are confirmed by the simulations.
NASA Astrophysics Data System (ADS)
Xu, Boyi; Xu, Li Da; Fei, Xiang; Jiang, Lihong; Cai, Hongming; Wang, Shuai
2017-08-01
Facing the rapidly changing business environments, implementation of flexible business process is crucial, but difficult especially in data-intensive application areas. This study aims to provide scalable and easily accessible information resources to leverage business process management. In this article, with a resource-oriented approach, enterprise data resources are represented as data-centric Web services, grouped on-demand of business requirement and configured dynamically to adapt to changing business processes. First, a configurable architecture CIRPA involving information resource pool is proposed to act as a scalable and dynamic platform to virtualise enterprise information resources as data-centric Web services. By exposing data-centric resources as REST services in larger granularities, tenant-isolated information resources could be accessed in business process execution. Second, dynamic information resource pool is designed to fulfil configurable and on-demand data accessing in business process execution. CIRPA also isolates transaction data from business process while supporting diverse business processes composition. Finally, a case study of using our method in logistics application shows that CIRPA provides an enhanced performance both in static service encapsulation and dynamic service execution in cloud computing environment.
Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin
2015-10-19
The feasibility of software-defined optical networking (SDON) for a practical application critically depends on scalability of centralized control performance. The paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for proof-of-concept demonstration. Efficient RWA algorithms are proposed to achieve high performance in achieving network capacity with reduced computation cost, which is a significant attribute in a scalable centralized-control SDON. The proposed heuristic RWA algorithms differ in the orders of request processes and in the procedures of routing table updates. Combined in a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest throughput of networks and acceptable computation scalability. We further investigate trade-off relationship between network throughput and computation complexity in routing table update procedure by a simulation study.
2010-04-29
Cloud Computing The answer, my friend, is blowing in the wind. The answer is blowing in the wind. 1Bingue ‐ Cook Cloud Computing STSC 2010... Cloud Computing STSC 2010 Objectives • Define the cloud • Risks of cloud computing f l d i• Essence o c ou comput ng • Deployed clouds in DoD 3Bingue...Cook Cloud Computing STSC 2010 Definitions of Cloud Computing Cloud computing is a model for enabling b d d ku
NASA Astrophysics Data System (ADS)
Moody, D.; Brumby, S. P.; Chartrand, R.; Franco, E.; Keisler, R.; Kelton, T.; Kontgis, C.; Mathis, M.; Raleigh, D.; Rudelis, X.; Skillman, S.; Warren, M. S.; Longbotham, N.
2016-12-01
The recent computing performance revolution has driven improvements in sensor, communication, and storage technology. Historical, multi-decadal remote sensing datasets at the petabyte scale are now available in commercial clouds, with new satellite constellations generating petabytes per year of high-resolution imagery with daily global coverage. Cloud computing and storage, combined with recent advances in machine learning and open software, are enabling understanding of the world at an unprecedented scale and detail. We have assembled all available satellite imagery from the USGS Landsat, NASA MODIS, and ESA Sentinel programs, as well as commercial PlanetScope and RapidEye imagery, and have analyzed over 2.8 quadrillion multispectral pixels. We leveraged the commercial cloud to generate a tiled, spatio-temporal mosaic of the Earth for fast iteration and development of new algorithms combining analysis techniques from remote sensing, machine learning, and scalable compute infrastructure. Our data platform enables processing at petabytes per day rates using multi-source data to produce calibrated, georeferenced imagery stacks at desired points in time and space that can be used for pixel level or global scale analysis. We demonstrate our data platform capability by using the European Space Agency's (ESA) published 2006 and 2009 GlobCover 20+ category label maps to train and test a Land Cover Land Use (LCLU) classifier, and generate current self-consistent LCLU maps in Brazil. We train a standard classifier on 2006 GlobCover categories using temporal imagery stacks, and we validate our results on co-registered 2009 Globcover LCLU maps and 2009 imagery. We then extend the derived LCLU model to current imagery stacks to generate an updated, in-season label map. Changes in LCLU labels can now be seamlessly monitored for a given location across the years in order to track, for example, cropland expansion, forest growth, and urban developments. An example of change monitoring is illustrated in the included figure showing rainfed cropland change in the Mato Grosso region of Brazil between 2006 and 2009.
Computer-Aided Discovery Tools for Volcano Deformation Studies with InSAR and GPS
NASA Astrophysics Data System (ADS)
Pankratius, V.; Pilewskie, J.; Rude, C. M.; Li, J. D.; Gowanlock, M.; Bechor, N.; Herring, T.; Wauthier, C.
2016-12-01
We present a Computer-Aided Discovery approach that facilitates the cloud-scalable fusion of different data sources, such as GPS time series and Interferometric Synthetic Aperture Radar (InSAR), for the purpose of identifying the expansion centers and deformation styles of volcanoes. The tools currently developed at MIT allow the definition of alternatives for data processing pipelines that use various analysis algorithms. The Computer-Aided Discovery system automatically generates algorithmic and parameter variants to help researchers explore multidimensional data processing search spaces efficiently. We present first application examples of this technique using GPS data on volcanoes on the Aleutian Islands and work in progress on combined GPS and InSAR data in Hawaii. In the model search context, we also illustrate work in progress combining time series Principal Component Analysis with InSAR augmentation to constrain the space of possible model explanations on current empirical data sets and achieve a better identification of deformation patterns. This work is supported by NASA AIST-NNX15AG84G and NSF ACI-1442997 (PI: V. Pankratius).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, William D; Johansen, Hans; Evans, Katherine J
We present a survey of physical and computational techniques that have the potential to con- tribute to the next generation of high-fidelity, multi-scale climate simulations. Examples of the climate science problems that can be investigated with more depth include the capture of remote forcings of localized hydrological extreme events, an accurate representation of cloud features over a range of spatial and temporal scales, and parallel, large ensembles of simulations to more effectively explore model sensitivities and uncertainties. Numerical techniques, such as adaptive mesh refinement, implicit time integration, and separate treatment of fast physical time scales are enabling improved accuracy andmore » fidelity in simulation of dynamics and allow more complete representations of climate features at the global scale. At the same time, part- nerships with computer science teams have focused on taking advantage of evolving computer architectures, such as many-core processors and GPUs, so that these approaches which were previously considered prohibitively costly have become both more efficient and scalable. In combination, progress in these three critical areas is poised to transform climate modeling in the coming decades.« less
An Overview of Cloud Computing in Distributed Systems
NASA Astrophysics Data System (ADS)
Divakarla, Usha; Kumari, Geetha
2010-11-01
Cloud computing is the emerging trend in the field of distributed computing. Cloud computing evolved from grid computing and distributed computing. Cloud plays an important role in huge organizations in maintaining huge data with limited resources. Cloud also helps in resource sharing through some specific virtual machines provided by the cloud service provider. This paper gives an overview of the cloud organization and some of the basic security issues pertaining to the cloud.
High-Speed On-Board Data Processing for Science Instruments
NASA Technical Reports Server (NTRS)
Beyon, Jeffrey Y.; Ng, Tak-Kwong; Lin, Bing; Hu, Yongxiang; Harrison, Wallace
2014-01-01
A new development of on-board data processing platform has been in progress at NASA Langley Research Center since April, 2012, and the overall review of such work is presented in this paper. The project is called High-Speed On-Board Data Processing for Science Instruments (HOPS) and focuses on a high-speed scalable data processing platform for three particular National Research Council's Decadal Survey missions such as Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS), Aerosol-Cloud-Ecosystems (ACE), and Doppler Aerosol Wind Lidar (DAWN) 3-D Winds. HOPS utilizes advanced general purpose computing with Field Programmable Gate Array (FPGA) based algorithm implementation techniques. The significance of HOPS is to enable high speed on-board data processing for current and future science missions with its reconfigurable and scalable data processing platform. A single HOPS processing board is expected to provide approximately 66 times faster data processing speed for ASCENDS, more than 70% reduction in both power and weight, and about two orders of cost reduction compared to the state-of-the-art (SOA) on-board data processing system. Such benchmark predictions are based on the data when HOPS was originally proposed in August, 2011. The details of these improvement measures are also presented. The two facets of HOPS development are identifying the most computationally intensive algorithm segments of each mission and implementing them in a FPGA-based data processing board. A general introduction of such facets is also the purpose of this paper.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slattery, Stuart R.
In this study we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using a spline interpolation and one using a moving least square reconstruction. Through the comparison we assess both the conservation and accuracy of the data transfer obtained from each of the methods. We do so for a varying set of geometries with and without curvature and sharp features and for functions with and without smoothnessmore » and with varying gradients. Our results show that the mesh-based and mesh-free algorithms are complementary with cases where each was demonstrated to perform better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and restructuring the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership class computing facility using a set of basic scaling studies. Finally, these scaling studies show that for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction demonstrate both strong and weak scalability using more than 100,000 MPI processes with billions of degrees of freedom in the data transfer operation.« less
Analysis on the security of cloud computing
NASA Astrophysics Data System (ADS)
He, Zhonglin; He, Yuhua
2011-02-01
Cloud computing is a new technology, which is the fusion of computer technology and Internet development. It will lead the revolution of IT and information field. However, in cloud computing data and application software is stored at large data centers, and the management of data and service is not completely trustable, resulting in safety problems, which is the difficult point to improve the quality of cloud service. This paper briefly introduces the concept of cloud computing. Considering the characteristics of cloud computing, it constructs the security architecture of cloud computing. At the same time, with an eye toward the security threats cloud computing faces, several corresponding strategies are provided from the aspect of cloud computing users and service providers.
Modeling Cardiac Electrophysiology at the Organ Level in the Peta FLOPS Computing Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitchell, Lawrence; Bishop, Martin; Hoetzl, Elena
2010-09-30
Despite a steep increase in available compute power, in-silico experimentation with highly detailed models of the heart remains to be challenging due to the high computational cost involved. It is hoped that next generation high performance computing (HPC) resources lead to significant reductions in execution times to leverage a new class of in-silico applications. However, performance gains with these new platforms can only be achieved by engaging a much larger number of compute cores, necessitating strongly scalable numerical techniques. So far strong scalability has been demonstrated only for a moderate number of cores, orders of magnitude below the range requiredmore » to achieve the desired performance boost.In this study, strong scalability of currently used techniques to solve the bidomain equations is investigated. Benchmark results suggest that scalability is limited to 512-4096 cores within the range of relevant problem sizes even when systems are carefully load-balanced and advanced IO strategies are employed.« less
Bussery, Justin; Denis, Leslie-Alexandre; Guillon, Benjamin; Liu, Pengfeï; Marchetti, Gino; Rahal, Ghita
2018-04-01
We describe the genesis, design and evolution of a computing platform designed and built to improve the success rate of biomedical translational research. The eTRIKS project platform was developed with the aim of building a platform that can securely host heterogeneous types of data and provide an optimal environment to run tranSMART analytical applications. Many types of data can now be hosted, including multi-OMICS data, preclinical laboratory data and clinical information, including longitudinal data sets. During the last two years, the platform has matured into a robust translational research knowledge management system that is able to host other data mining applications and support the development of new analytical tools. Copyright © 2018 Elsevier Ltd. All rights reserved.
Authentication and Authorization of End User in Microservice Architecture
NASA Astrophysics Data System (ADS)
He, Xiuyu; Yang, Xudong
2017-10-01
As the market and business continues to expand; the traditional single monolithic architecture is facing more and more challenges. The development of cloud computing and container technology promote microservice architecture became more popular. While the low coupling, fine granularity, scalability, flexibility and independence of the microservice architecture bring convenience, the inherent complexity of the distributed system make the security of microservice architecture important and difficult. This paper aims to study the authentication and authorization of the end user under the microservice architecture. By comparing with the traditional measures and researching on existing technology, this paper put forward a set of authentication and authorization strategies suitable for microservice architecture, such as distributed session, SSO solutions, client-side JSON web token and JWT + API Gateway, and summarize the advantages and disadvantages of each method.
Future of Department of Defense Cloud Computing Amid Cultural Confusion
2013-03-01
enterprise cloud - computing environment and transition to a public cloud service provider. Services have started the development of individual cloud - computing environments...endorsing cloud computing . It addresses related issues in matters of service culture changes and how strategic leaders will dictate the future of cloud ...through data center consolidation and individual Service provided cloud computing .
A Scalable Infrastructure for Lidar Topography Data Distribution, Processing, and Discovery
NASA Astrophysics Data System (ADS)
Crosby, C. J.; Nandigam, V.; Krishnan, S.; Phan, M.; Cowart, C. A.; Arrowsmith, R.; Baru, C.
2010-12-01
High-resolution topography data acquired with lidar (light detection and ranging) technology have emerged as a fundamental tool in the Earth sciences, and are also being widely utilized for ecological, planning, engineering, and environmental applications. Collected from airborne, terrestrial, and space-based platforms, these data are revolutionary because they permit analysis of geologic and biologic processes at resolutions essential for their appropriate representation. Public domain lidar data collection by federal, state, and local agencies are a valuable resource to the scientific community, however the data pose significant distribution challenges because of the volume and complexity of data that must be stored, managed, and processed. Lidar data acquisition may generate terabytes of data in the form of point clouds, digital elevation models (DEMs), and derivative products. This massive volume of data is often challenging to host for resource-limited agencies. Furthermore, these data can be technically challenging for users who lack appropriate software, computing resources, and expertise. The National Science Foundation-funded OpenTopography Facility (www.opentopography.org) has developed a cyberinfrastructure-based solution to enable online access to Earth science-oriented high-resolution lidar topography data, online processing tools, and derivative products. OpenTopography provides access to terabytes of point cloud data, standard DEMs, and Google Earth image data, all co-located with computational resources for on-demand data processing. The OpenTopography portal is built upon a cyberinfrastructure platform that utilizes a Services Oriented Architecture (SOA) to provide a modular system that is highly scalable and flexible enough to support the growing needs of the Earth science lidar community. OpenTopography strives to host and provide access to datasets as soon as they become available, and also to expose greater application level functionalities to our end-users (such as generation of custom DEMs via various gridding algorithms, and hydrological modeling algorithms). In the future, the SOA will enable direct authenticated access to back-end functionality through simple Web service Application Programming Interfaces (APIs), so that users may access our data and compute resources via clients other than Web browsers. In addition to an overview of the OpenTopography SOA, this presentation will discuss our recently developed lidar data ingestion and management system for point cloud data delivered in the binary LAS standard. This system compliments our existing partitioned database approach for data delivered in ASCII format, and permits rapid ingestion of data. The system has significantly reduced data ingestion times and has implications for data distribution in emergency response situations. We will also address on ongoing work to develop a community lidar metadata catalog based on the OGC Catalogue Service for Web (CSW) standard, which will help to centralize discovery of public domain lidar data.
Climate Analytics as a Service
NASA Technical Reports Server (NTRS)
Schnase, John L.; Duffy, Daniel Q.; McInerney, Mark A.; Webster, W. Phillip; Lee, Tsengdar J.
2014-01-01
Climate science is a big data domain that is experiencing unprecedented growth. In our efforts to address the big data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). CAaaS combines high-performance computing and data-proximal analytics with scalable data management, cloud computing virtualization, the notion of adaptive analytics, and a domain-harmonized API to improve the accessibility and usability of large collections of climate data. MERRA Analytic Services (MERRA/AS) provides an example of CAaaS. MERRA/AS enables MapReduce analytics over NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global temporally and spatially consistent synthesis of key climate variables. The effectiveness of MERRA/AS has been demonstrated in several applications. In our experience, CAaaS is providing the agility required to meet our customers' increasing and changing data management and data analysis needs.
Production experience with the ATLAS Event Service
NASA Astrophysics Data System (ADS)
Benjamin, D.; Calafiura, P.; Childers, T.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The ATLAS Event Service (AES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real time delivery of fine grained workloads to running payload applications which process dispatched events or event ranges and immediately stream the outputs to highly scalable Object Stores. Thanks to its agile and flexible architecture the AES is currently being used by grid sites for assigning low priority workloads to otherwise idle computing resources; similarly harvesting HPC resources in an efficient back-fill mode; and massively scaling out to the 50-100k concurrent core level on the Amazon spot market to efficiently utilize those transient resources for peak production needs. Platform ports in development include ATLAS@Home (BOINC) and the Google Compute Engine, and a growing number of HPC platforms. After briefly reviewing the concept and the architecture of the Event Service, we will report the status and experience gained in AES commissioning and production operations on supercomputers, and our plans for extending ES application beyond Geant4 simulation to other workflows, such as reconstruction and data analysis.
NASA Astrophysics Data System (ADS)
Timm, S.; Cooper, G.; Fuess, S.; Garzoglio, G.; Holzman, B.; Kennedy, R.; Grassano, D.; Tiradani, A.; Krishnamurthy, R.; Vinayagam, S.; Raicu, I.; Wu, H.; Ren, S.; Noh, S.-Y.
2017-10-01
The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timm, S.; Cooper, G.; Fuess, S.
The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores.more » This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.« less
Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S
2018-06-01
Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng
2018-01-01
Abstract Background Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. Findings In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)–based high-performance computing (HPC) implementation, and the popular VCFTools. Conclusions Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems. PMID:29762754
myBlackBox: Blackbox Mobile Cloud Systems for Personalized Unusual Event Detection.
Ahn, Junho; Han, Richard
2016-05-23
We demonstrate the feasibility of constructing a novel and practical real-world mobile cloud system, called myBlackBox, that efficiently fuses multimodal smartphone sensor data to identify and log unusual personal events in mobile users' daily lives. The system incorporates a hybrid architectural design that combines unsupervised classification of audio, accelerometer and location data with supervised joint fusion classification to achieve high accuracy, customization, convenience and scalability. We show the feasibility of myBlackBox by implementing and evaluating this end-to-end system that combines Android smartphones with cloud servers, deployed for 15 users over a one-month period.
myBlackBox: Blackbox Mobile Cloud Systems for Personalized Unusual Event Detection
Ahn, Junho; Han, Richard
2016-01-01
We demonstrate the feasibility of constructing a novel and practical real-world mobile cloud system, called myBlackBox, that efficiently fuses multimodal smartphone sensor data to identify and log unusual personal events in mobile users’ daily lives. The system incorporates a hybrid architectural design that combines unsupervised classification of audio, accelerometer and location data with supervised joint fusion classification to achieve high accuracy, customization, convenience and scalability. We show the feasibility of myBlackBox by implementing and evaluating this end-to-end system that combines Android smartphones with cloud servers, deployed for 15 users over a one-month period. PMID:27223292
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dinda, Peter August
2015-03-17
This report describes the activities, findings, and products of the Northwestern University component of the "Enabling Exascale Hardware and Software Design through Scalable System Virtualization" project. The purpose of this project has been to extend the state of the art of systems software for high-end computing (HEC) platforms, and to use systems software to better enable the evaluation of potential future HEC platforms, for example exascale platforms. Such platforms, and their systems software, have the goal of providing scientific computation at new scales, thus enabling new research in the physical sciences and engineering. Over time, the innovations in systems softwaremore » for such platforms also become applicable to more widely used computing clusters, data centers, and clouds. This was a five-institution project, centered on the Palacios virtual machine monitor (VMM) systems software, a project begun at Northwestern, and originally developed in a previous collaboration between Northwestern University and the University of New Mexico. In this project, Northwestern (including via our subcontract to the University of Pittsburgh) contributed to the continued development of Palacios, along with other team members. We took the leadership role in (1) continued extension of support for emerging Intel and AMD hardware, (2) integration and performance enhancement of overlay networking, (3) connectivity with architectural simulation, (4) binary translation, and (5) support for modern Non-Uniform Memory Access (NUMA) hosts and guests. We also took a supporting role in support for specialized hardware for I/O virtualization, profiling, configurability, and integration with configuration tools. The efforts we led (1-5) were largely successful and executed as expected, with code and papers resulting from them. The project demonstrated the feasibility of a virtualization layer for HEC computing, similar to such layers for cloud or datacenter computing. For effort (3), although a prototype connecting Palacios with the GEM5 architectural simulator was demonstrated, our conclusion was that such a platform was less useful for design space exploration than anticipated due to inherent complexity of the connection between the instruction set architecture level and the microarchitectural level. For effort (4), we found that a code injection approach proved to be more fruitful. The results of our efforts are publicly available in the open source Palacios codebase and published papers, all of which are available from the project web site, v3vee.org. Palacios is currently one of the two codebases (the other being Sandia’s Kitten lightweight kernel) that underlies the node operating system for the DOE Hobbes Project, one of two projects tasked with building a systems software prototype for the national exascale computing effort.« less
Karim, Md Rezaul; Michel, Audrey; Zappa, Achille; Baranov, Pavel; Sahay, Ratnesh; Rebholz-Schuhmann, Dietrich
2017-04-16
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community. © The Author 2017. Published by Oxford University Press.
Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud.
Yang, Chao-Tung; Liu, Jung-Chun; Chen, Shuo-Tsung; Lu, Hsin-Wen
2017-08-18
Big Data analysis has become a key factor of being innovative and competitive. Along with population growth worldwide and the trend aging of population in developed countries, the rate of the national medical care usage has been increasing. Due to the fact that individual medical data are usually scattered in different institutions and their data formats are varied, to integrate those data that continue increasing is challenging. In order to have scalable load capacity for these data platforms, we must build them in good platform architecture. Some issues must be considered in order to use the cloud computing to quickly integrate big medical data into database for easy analyzing, searching, and filtering big data to obtain valuable information.This work builds a cloud storage system with HBase of Hadoop for storing and analyzing big data of medical records and improves the performance of importing data into database. The data of medical records are stored in HBase database platform for big data analysis. This system performs distributed computing on medical records data processing through Hadoop MapReduce programming, and to provide functions, including keyword search, data filtering, and basic statistics for HBase database. This system uses the Put with the single-threaded method and the CompleteBulkload mechanism to import medical data. From the experimental results, we find that when the file size is less than 300MB, the Put with single-threaded method is used and when the file size is larger than 300MB, the CompleteBulkload mechanism is used to improve the performance of data import into database. This system provides a web interface that allows users to search data, filter out meaningful information through the web, and analyze and convert data in suitable forms that will be helpful for medical staff and institutions.
NASA Astrophysics Data System (ADS)
Sangaline, E.; Lauret, J.
2014-06-01
The quantity of information produced in Nuclear and Particle Physics (NPP) experiments necessitates the transmission and storage of data across diverse collections of computing resources. Robust solutions such as XRootD have been used in NPP, but as the usage of cloud resources grows, the difficulties in the dynamic configuration of these systems become a concern. Hadoop File System (HDFS) exists as a possible cloud storage solution with a proven track record in dynamic environments. Though currently not extensively used in NPP, HDFS is an attractive solution offering both elastic storage and rapid deployment. We will present the performance of HDFS in both canonical I/O tests and for a typical data analysis pattern within the RHIC/STAR experimental framework. These tests explore the scaling with different levels of redundancy and numbers of clients. Additionally, the performance of FUSE and NFS interfaces to HDFS were evaluated as a way to allow existing software to function without modification. Unfortunately, the complicated data structures in NPP are non-trivial to integrate with Hadoop and so many of the benefits of the MapReduce paradigm could not be directly realized. Despite this, our results indicate that using HDFS as a distributed filesystem offers reasonable performance and scalability and that it excels in its ease of configuration and deployment in a cloud environment.
Accessing Cloud Properties and Satellite Imagery: A tool for visualization and data mining
NASA Astrophysics Data System (ADS)
Chee, T.; Nguyen, L.; Minnis, P.; Spangenberg, D.; Palikonda, R.
2016-12-01
Providing public access to imagery of cloud macro and microphysical properties and the underlying satellite imagery is a key concern for the NASA Langley Research Center Cloud and Radiation Group. This work describes a tool and system that allows end users to easily browse cloud information and satellite imagery that is otherwise difficult to acquire and manipulate. The tool has two uses, one to visualize the data and the other to access the data directly. It uses a widely used access protocol, the Open Geospatial Consortium's Web Map and Processing Services, to encourage user to access the data we produce. Internally, we leverage our practical experience with large, scalable application practices to develop a system that has the largest potential for scalability as well as the ability to be deployed on the cloud. One goal of the tool is to provide a demonstration of the back end capability to end users so that they can use the dynamically generated imagery and data as an input to their own work flows or to set up data mining constraints. We build upon NASA Langley Cloud and Radiation Group's experience with making real-time and historical satellite cloud product information and satellite imagery accessible and easily searchable. Increasingly, information is used in a "mash-up" form where multiple sources of information are combined to add value to disparate but related information. In support of NASA strategic goals, our group aims to make as much cutting edge scientific knowledge, observations and products available to the citizen science, research and interested communities for these kinds of "mash-ups" as well as provide a means for automated systems to data mine our information. This tool and access method provides a valuable research tool to a wide audience both as a standalone research tool and also as an easily accessed data source that can easily be mined or used with existing tools.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud[OPEN
Merchant, Nirav
2016-01-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.
Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P
2016-04-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
Cloud Computing for radiologists.
Kharat, Amit T; Safvi, Amjad; Thind, Ss; Singh, Amarjit
2012-07-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future.
Cloud Computing for radiologists
Kharat, Amit T; Safvi, Amjad; Thind, SS; Singh, Amarjit
2012-01-01
Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future. PMID:23599560
NASA Astrophysics Data System (ADS)
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied for the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method being applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available for a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming type computer.
Uncover the Cloud for Geospatial Sciences and Applications to Adopt Cloud Computing
NASA Astrophysics Data System (ADS)
Yang, C.; Huang, Q.; Xia, J.; Liu, K.; Li, J.; Xu, C.; Sun, M.; Bambacus, M.; Xu, Y.; Fay, D.
2012-12-01
Cloud computing is emerging as the future infrastructure for providing computing resources to support and enable scientific research, engineering development, and application construction, as well as work force education. On the other hand, there is a lot of doubt about the readiness of cloud computing to support a variety of scientific research, development and educations. This research is a project funded by NASA SMD to investigate through holistic studies how ready is the cloud computing to support geosciences. Four applications with different computing characteristics including data, computing, concurrent, and spatiotemporal intensities are taken to test the readiness of cloud computing to support geosciences. Three popular and representative cloud platforms including Amazon EC2, Microsoft Azure, and NASA Nebula as well as a traditional cluster are utilized in the study. Results illustrates that cloud is ready to some degree but more research needs to be done to fully implemented the cloud benefit as advertised by many vendors and defined by NIST. Specifically, 1) most cloud platform could help stand up new computing instances, a new computer, in a few minutes as envisioned, therefore, is ready to support most computing needs in an on demand fashion; 2) the load balance and elasticity, a defining characteristic, is ready in some cloud platforms, such as Amazon EC2, to support bigger jobs, e.g., needs response in minutes, while some are not ready to support the elasticity and load balance well. All cloud platform needs further research and development to support real time application at subminute level; 3) the user interface and functionality of cloud platforms vary a lot and some of them are very professional and well supported/documented, such as Amazon EC2, some of them needs significant improvement for the general public to adopt cloud computing without professional training or knowledge about computing infrastructure; 4) the security is a big concern in cloud computing platform, with the sharing spirit of cloud computing, it is very hard to ensure higher level security, except a private cloud is built for a specific organization without public access, public cloud platform does not support FISMA medium level yet and may never be able to support FISMA high level; 5) HPC jobs needs of cloud computing is not well supported and only Amazon EC2 supports this well. The research is being taken by NASA and other agencies to consider cloud computing adoption. We hope the publication of the research would also benefit the public to adopt cloud computing.
Using S3 cloud storage with ROOT and CvmFS
NASA Astrophysics Data System (ADS)
Arsuaga-Ríos, María; Heikkilä, Seppo S.; Duellmann, Dirk; Meusel, René; Blomer, Jakob; Couturier, Ben
2015-12-01
Amazon S3 is a widely adopted web API for scalable cloud storage that could also fulfill storage requirements of the high-energy physics community. CERN has been evaluating this option using some key HEP applications such as ROOT and the CernVM filesystem (CvmFS) with S3 back-ends. In this contribution we present an evaluation of two versions of the Huawei UDS storage system stressed with a large number of clients executing HEP software applications. The performance of concurrently storing individual objects is presented alongside with more complex data access patterns as produced by the ROOT data analysis framework. Both Huawei UDS generations show a successful scalability by supporting multiple byte-range requests in contrast with Amazon S3 or Ceph which do not support these commonly used HEP operations. We further report the S3 integration with recent CvmFS versions and summarize the experience with CvmFS/S3 for publishing daily releases of the full LHCb experiment software stack.
CANFAR + Skytree: Mining Massive Datasets as an Essential Part of the Future of Astronomy
NASA Astrophysics Data System (ADS)
Ball, Nicholas M.
2013-01-01
The future study of large astronomical datasets, consisting of hundreds of millions to billions of objects, will be dominated by large computing resources, and by analysis tools of the necessary scalability and sophistication to extract useful information. Significant effort will be required to fulfil their potential as a provider of the next generation of science results. To-date, computing systems have allowed either sophisticated analysis of small datasets, e.g., most astronomy software, or simple analysis of large datasets, e.g., database queries. At the Canadian Astronomy Data Centre, we have combined our cloud computing system, the Canadian Advanced Network for Astronomical Research (CANFAR), with the world's most advanced machine learning software, Skytree, to create the world's first cloud computing system for data mining in astronomy. This allows the full sophistication of the huge fields of data mining and machine learning to be applied to the hundreds of millions of objects that make up current large datasets. CANFAR works by utilizing virtual machines, which appear to the user as equivalent to a desktop. Each machine is replicated as desired to perform large-scale parallel processing. Such an arrangement carries far more flexibility than other cloud systems, because it enables the user to immediately install and run the same code that they already utilize for science on their desktop. We demonstrate the utility of the CANFAR + Skytree system by showing science results obtained, including assigning photometric redshifts with full probability density functions (PDFs) to a catalog of approximately 133 million galaxies from the MegaPipe reductions of the Canada-France-Hawaii Telescope Legacy Wide and Deep surveys. Each PDF is produced nonparametrically from 100 instances of the photometric parameters for each galaxy, generated by perturbing within the errors on the measurements. Hence, we produce, store, and assign redshifts to, a catalog of over 13 billion object instances. This catalog is comparable in size to those expected from next-generation surveys, such as Large Synoptic Survey Telescope. The CANFAR+Skytree system is open for use by any interested member of the astronomical community.
2012-05-01
cloud computing 17 NASA Nebula Platform • Cloud computing pilot program at NASA Ames • Integrates open-source components into seamless, self...Mission support • Education and public outreach (NASA Nebula , 2010) 18 NSF Supported Cloud Research • Support for Cloud Computing in...Mell, P. & Grance, T. (2011). The NIST Definition of Cloud Computing. NIST Special Publication 800-145 • NASA Nebula (2010). Retrieved from
A Hybrid Cloud Computing Service for Earth Sciences
NASA Astrophysics Data System (ADS)
Yang, C. P.
2016-12-01
Cloud Computing is becoming a norm for providing computing capabilities for advancing Earth sciences including big Earth data management, processing, analytics, model simulations, and many other aspects. A hybrid spatiotemporal cloud computing service is bulit at George Mason NSF spatiotemporal innovation center to meet this demands. This paper will report the service including several aspects: 1) the hardware includes 500 computing services and close to 2PB storage as well as connection to XSEDE Jetstream and Caltech experimental cloud computing environment for sharing the resource; 2) the cloud service is geographically distributed at east coast, west coast, and central region; 3) the cloud includes private clouds managed using open stack and eucalyptus, DC2 is used to bridge these and the public AWS cloud for interoperability and sharing computing resources when high demands surfing; 4) the cloud service is used to support NSF EarthCube program through the ECITE project, ESIP through the ESIP cloud computing cluster, semantics testbed cluster, and other clusters; 5) the cloud service is also available for the earth science communities to conduct geoscience. A brief introduction about how to use the cloud service will be included.
Using Cloud-based Storage Technologies for Earth Science Data
NASA Astrophysics Data System (ADS)
Michaelis, A.; Readey, J.; Votava, P.
2016-12-01
Cloud based infrastructure may offer several key benefits of scalability, built in redundancy and reduced total cost of ownership as compared with a traditional data center approach. However, most of the tools and software systems developed for NASA data repositories were not developed with a cloud based infrastructure in mind and do not fully take advantage of commonly available cloud-based technologies. Object storage services are provided through all the leading public (Amazon Web Service, Microsoft Azure, Google Cloud, etc.) and private (Open Stack) clouds, and may provide a more cost-effective means of storing large data collections online. We describe a system that utilizes object storage rather than traditional file system based storage to vend earth science data. The system described is not only cost effective, but shows superior performance for running many different analytics tasks in the cloud. To enable compatibility with existing tools and applications, we outline client libraries that are API compatible with existing libraries for HDF5 and NetCDF4. Performance of the system is demonstrated using clouds services running on Amazon Web Services.
Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM
2009-03-17
A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
Jiang, Shunrong; Zhu, Xiaoyan; Wang, Liangmin
2015-01-01
Mobile healthcare social networks (MHSNs) have emerged as a promising next-generation healthcare system, which will significantly improve the quality of life. However, there are many security and privacy concerns before personal health information (PHI) is shared with other parities. To ensure patients’ full control over their PHI, we propose a fine-grained and scalable data access control scheme based on attribute-based encryption (ABE). Besides, policies themselves for PHI sharing may be sensitive and may reveal information about underlying PHI or about data owners or recipients. In our scheme, we let each attribute contain an attribute name and its value and adopt the Bloom filter to efficiently check attributes before decryption. Thus, the data privacy and policy privacy can be preserved in our proposed scheme. Moreover, considering the fact that the computational cost grows with the complexity of the access policy and the limitation of the resource and energy in a smart phone, we outsource ABE decryption to the cloud while preventing the cloud from learning anything about the content and access policy. The security and performance analysis is carried out to demonstrate that our proposed scheme can achieve fine-grained access policies for PHI sharing in MHSNs. PMID:26404300
Jiang, Shunrong; Zhu, Xiaoyan; Wang, Liangmin
2015-09-03
Mobile healthcare social networks (MHSNs) have emerged as a promising next-generation healthcare system, which will significantly improve the quality of life. However, there are many security and privacy concerns before personal health information (PHI) is shared with other parities. To ensure patients' full control over their PHI, we propose a fine-grained and scalable data access control scheme based on attribute-based encryption (ABE). Besides, policies themselves for PHI sharing may be sensitive and may reveal information about underlying PHI or about data owners or recipients. In our scheme, we let each attribute contain an attribute name and its value and adopt the Bloom filter to efficiently check attributes before decryption. Thus, the data privacy and policy privacy can be preserved in our proposed scheme. Moreover, considering the fact that the computational cost grows with the complexity of the access policy and the limitation of the resource and energy in a smart phone, we outsource ABE decryption to the cloud while preventing the cloud from learning anything about the content and access policy. The security and performance analysis is carried out to demonstrate that our proposed scheme can achieve fine-grained access policies for PHI sharing in MHSNs.
Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud.
Yang, Andrian; Troup, Michael; Lin, Peijie; Ho, Joshua W K
2017-03-01
Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable paralellization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6-145.4 times faster using Falco than running on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in cost of analysis. Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/. j.ho@victorchang.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
IBM Cloud Computing Powering a Smarter Planet
NASA Astrophysics Data System (ADS)
Zhu, Jinzy; Fang, Xing; Guo, Zhe; Niu, Meng Hua; Cao, Fan; Yue, Shuang; Liu, Qin Yu
With increasing need for intelligent systems supporting the world's businesses, Cloud Computing has emerged as a dominant trend to provide a dynamic infrastructure to make such intelligence possible. The article introduced how to build a smarter planet with cloud computing technology. First, it introduced why we need cloud, and the evolution of cloud technology. Secondly, it analyzed the value of cloud computing and how to apply cloud technology. Finally, it predicted the future of cloud in the smarter planet.
Cloud Computing Security Issue: Survey
NASA Astrophysics Data System (ADS)
Kamal, Shailza; Kaur, Rajpreet
2011-12-01
Cloud computing is the growing field in IT industry since 2007 proposed by IBM. Another company like Google, Amazon, and Microsoft provides further products to cloud computing. The cloud computing is the internet based computing that shared recourses, information on demand. It provides the services like SaaS, IaaS and PaaS. The services and recourses are shared by virtualization that run multiple operation applications on cloud computing. This discussion gives the survey on the challenges on security issues during cloud computing and describes some standards and protocols that presents how security can be managed.
Unified Geophysical Cloud Platform (UGCP) for Seismic Monitoring and other Geophysical Applications.
NASA Astrophysics Data System (ADS)
Synytsky, R.; Starovoit, Y. O.; Henadiy, S.; Lobzakov, V.; Kolesnikov, L.
2016-12-01
We present Unified Geophysical Cloud Platform (UGCP) or UniGeoCloud as an innovative approach for geophysical data processing in the Cloud environment with the ability to run any type of data processing software in isolated environment within the single Cloud platform. We've developed a simple and quick method of several open-source widely known software seismic packages (SeisComp3, Earthworm, Geotool, MSNoise) installation which does not require knowledge of system administration, configuration, OS compatibility issues etc. and other often annoying details preventing time wasting for system configuration work. Installation process is simplified as "mouse click" on selected software package from the Cloud market place. The main objective of the developed capability was the software tools conception with which users are able to design and install quickly their own highly reliable and highly available virtual IT-infrastructure for the organization of seismic (and in future other geophysical) data processing for either research or monitoring purposes. These tools provide access to any seismic station data available in open IP configuration from the different networks affiliated with different Institutions and Organizations. It allows also setting up your own network as you desire by selecting either regionally deployed stations or the worldwide global network based on stations selection form the global map. The processing software and products and research results could be easily monitored from everywhere using variety of user's devices form desk top computers to IT gadgets. Currents efforts of the development team are directed to achieve Scalability, Reliability and Sustainability (SRS) of proposed solutions allowing any user to run their applications with the confidence of no data loss and no failure of the monitoring or research software components. The system is suitable for quick rollout of NDC-in-Box software package developed for State Signatories and aimed for promotion of data processing collected by the IMS Network.
T-Check in System-of-Systems Technologies: Cloud Computing
2010-09-01
T-Check in System-of-Systems Technologies: Cloud Computing Harrison D. Strowd Grace A. Lewis September 2010 TECHNICAL NOTE CMU/SEI-2010... Cloud Computing 1 1.2 Types of Cloud Computing 2 1.3 Drivers and Barriers to Cloud Computing Adoption 5 2 Using the T-Check Method 7 2.1 T-Check...Hypothesis 3 25 3.4.2 Deployment View of the Solution for Testing Hypothesis 3 27 3.5 Selecting Cloud Computing Providers 30 3.6 Implementing the T-Check
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
A novel processing platform for post tape out flows
NASA Astrophysics Data System (ADS)
Vu, Hien T.; Kim, Soohong; Word, James; Cai, Lynn Y.
2018-03-01
As the computational requirements for post tape out (PTO) flows increase at the 7nm and below technology nodes, there is a need to increase the scalability of the computational tools in order to reduce the turn-around time (TAT) of the flows. Utilization of design hierarchy has been one proven method to provide sufficient partitioning to enable PTO processing. However, as the data is processed through the PTO flow, its effective hierarchy is reduced. The reduction is necessary to achieve the desired accuracy. Also, the sequential nature of the PTO flow is inherently non-scalable. To address these limitations, we are proposing a quasi-hierarchical solution that combines multiple levels of parallelism to increase the scalability of the entire PTO flow. In this paper, we describe the system and present experimental results demonstrating the runtime reduction through scalable processing with thousands of computational cores.
Risk in the Clouds?: Security Issues Facing Government Use of Cloud Computing
NASA Astrophysics Data System (ADS)
Wyld, David C.
Cloud computing is poised to become one of the most important and fundamental shifts in how computing is consumed and used. Forecasts show that government will play a lead role in adopting cloud computing - for data storage, applications, and processing power, as IT executives seek to maximize their returns on limited procurement budgets in these challenging economic times. After an overview of the cloud computing concept, this article explores the security issues facing public sector use of cloud computing and looks to the risk and benefits of shifting to cloud-based models. It concludes with an analysis of the challenges that lie ahead for government use of cloud resources.
A Review Study on Cloud Computing Issues
NASA Astrophysics Data System (ADS)
Kanaan Kadhim, Qusay; Yusof, Robiah; Sadeq Mahdi, Hamid; Al-shami, Sayed Samer Ali; Rahayu Selamat, Siti
2018-05-01
Cloud computing is the most promising current implementation of utility computing in the business world, because it provides some key features over classic utility computing, such as elasticity to allow clients dynamically scale-up and scale-down the resources in execution time. Nevertheless, cloud computing is still in its premature stage and experiences lack of standardization. The security issues are the main challenges to cloud computing adoption. Thus, critical industries such as government organizations (ministries) are reluctant to trust cloud computing due to the fear of losing their sensitive data, as it resides on the cloud with no knowledge of data location and lack of transparency of Cloud Service Providers (CSPs) mechanisms used to secure their data and applications which have created a barrier against adopting this agile computing paradigm. This study aims to review and classify the issues that surround the implementation of cloud computing which a hot area that needs to be addressed by future research.
Slattery, Stuart R.
2015-12-02
In this study we analyze and extend mesh-free algorithms for three-dimensional data transfer problems in partitioned multiphysics simulations. We first provide a direct comparison between a mesh-based weighted residual method using the common-refinement scheme and two mesh-free algorithms leveraging compactly supported radial basis functions: one using a spline interpolation and one using a moving least square reconstruction. Through the comparison we assess both the conservation and accuracy of the data transfer obtained from each of the methods. We do so for a varying set of geometries with and without curvature and sharp features and for functions with and without smoothnessmore » and with varying gradients. Our results show that the mesh-based and mesh-free algorithms are complementary with cases where each was demonstrated to perform better than the other. We then focus on the mesh-free methods by developing a set of algorithms to parallelize them based on sparse linear algebra techniques. This includes a discussion of fast parallel radius searching in point clouds and restructuring the interpolation algorithms to leverage data structures and linear algebra services designed for large distributed computing environments. The scalability of our new algorithms is demonstrated on a leadership class computing facility using a set of basic scaling studies. Finally, these scaling studies show that for problems with reasonable load balance, our new algorithms for both spline interpolation and moving least square reconstruction demonstrate both strong and weak scalability using more than 100,000 MPI processes with billions of degrees of freedom in the data transfer operation.« less
Scalable Robust Principal Component Analysis Using Grassmann Averages.
Hauberg, Sren; Feragen, Aasa; Enficiaud, Raffi; Black, Michael J
2016-11-01
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average ( GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average ( TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-09-04
...--Intersection of Cloud Computing and Mobility Forum and Workshop AGENCY: National Institute of Standards and.../intersection-of-cloud-and-mobility.cfm . SUPPLEMENTARY INFORMATION: NIST hosted six prior Cloud Computing Forum... interoperability, portability, and security, discuss the Federal Government's experience with cloud computing...
Embracing the Cloud: Six Ways to Look at the Shift to Cloud Computing
ERIC Educational Resources Information Center
Ullman, David F.; Haggerty, Blake
2010-01-01
Cloud computing is the latest paradigm shift for the delivery of IT services. Where previous paradigms (centralized, decentralized, distributed) were based on fairly straightforward approaches to technology and its management, cloud computing is radical in comparison. The literature on cloud computing, however, suffers from many divergent…
The Research of the Parallel Computing Development from the Angle of Cloud Computing
NASA Astrophysics Data System (ADS)
Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun
2017-10-01
Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
Cloud computing basics for librarians.
Hoy, Matthew B
2012-01-01
"Cloud computing" is the name for the recent trend of moving software and computing resources to an online, shared-service model. This article briefly defines cloud computing, discusses different models, explores the advantages and disadvantages, and describes some of the ways cloud computing can be used in libraries. Examples of cloud services are included at the end of the article. Copyright © Taylor & Francis Group, LLC
A Novel College Network Resource Management Method using Cloud Computing
NASA Astrophysics Data System (ADS)
Lin, Chen
At present information construction of college mainly has construction of college networks and management information system; there are many problems during the process of information. Cloud computing is development of distributed processing, parallel processing and grid computing, which make data stored on the cloud, make software and services placed in the cloud and build on top of various standards and protocols, you can get it through all kinds of equipments. This article introduces cloud computing and function of cloud computing, then analyzes the exiting problems of college network resource management, the cloud computing technology and methods are applied in the construction of college information sharing platform.
An Efficient Interactive Model for On-Demand Sensing-As-A-Servicesof Sensor-Cloud
Dinh, Thanh; Kim, Younghan
2016-01-01
This paper proposes an efficient interactive model for the sensor-cloud to enable the sensor-cloud to efficiently provide on-demand sensing services for multiple applications with different requirements at the same time. The interactive model is designed for both the cloud and sensor nodes to optimize the resource consumption of physical sensors, as well as the bandwidth consumption of sensing traffic. In the model, the sensor-cloud plays a key role in aggregating application requests to minimize the workloads required for constrained physical nodes while guaranteeing that the requirements of all applications are satisfied. Physical sensor nodes perform their sensing under the guidance of the sensor-cloud. Based on the interactions with the sensor-cloud, physical sensor nodes adapt their scheduling accordingly to minimize their energy consumption. Comprehensive experimental results show that our proposed system achieves a significant improvement in terms of the energy consumption of physical sensors, the bandwidth consumption from the sink node to the sensor-cloud, the packet delivery latency, reliability and scalability, compared to current approaches. Based on the obtained results, we discuss the economical benefits and how the proposed system enables a win-win model in the sensor-cloud. PMID:27367689
An Efficient Interactive Model for On-Demand Sensing-As-A-Servicesof Sensor-Cloud.
Dinh, Thanh; Kim, Younghan
2016-06-28
This paper proposes an efficient interactive model for the sensor-cloud to enable the sensor-cloud to efficiently provide on-demand sensing services for multiple applications with different requirements at the same time. The interactive model is designed for both the cloud and sensor nodes to optimize the resource consumption of physical sensors, as well as the bandwidth consumption of sensing traffic. In the model, the sensor-cloud plays a key role in aggregating application requests to minimize the workloads required for constrained physical nodes while guaranteeing that the requirements of all applications are satisfied. Physical sensor nodes perform their sensing under the guidance of the sensor-cloud. Based on the interactions with the sensor-cloud, physical sensor nodes adapt their scheduling accordingly to minimize their energy consumption. Comprehensive experimental results show that our proposed system achieves a significant improvement in terms of the energy consumption of physical sensors, the bandwidth consumption from the sink node to the sensor-cloud, the packet delivery latency, reliability and scalability, compared to current approaches. Based on the obtained results, we discuss the economical benefits and how the proposed system enables a win-win model in the sensor-cloud.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rudkevich, Aleksandr; Goldis, Evgeniy
This research conducted by the Newton Energy Group, LLC (NEG) is dedicated to the development of pCloud: a Cloud-based Power Market Simulation Environment. pCloud is offering power industry stakeholders the capability to model electricity markets and is organized around the Software as a Service (SaaS) concept -- a software application delivery model in which software is centrally hosted and provided to many users via the internet. During the Phase I of this project NEG developed a prototype design for pCloud as a SaaS-based commercial service offering, system architecture supporting that design, ensured feasibility of key architecture's elements, formed technological partnershipsmore » and negotiated commercial agreements with partners, conducted market research and other related activities and secured funding for continue development of pCloud between the end of Phase I and beginning of Phase II, if awarded. Based on the results of Phase I activities, NEG has established that the development of a cloud-based power market simulation environment within the Windows Azure platform is technologically feasible, can be accomplished within the budget and timeframe available through the Phase II SBIR award with additional external funding. NEG believes that pCloud has the potential to become a game-changing technology for the modeling and analysis of electricity markets. This potential is due to the following critical advantages of pCloud over its competition: - Standardized access to advanced and proven power market simulators offered by third parties. - Automated parallelization of simulations and dynamic provisioning of computing resources on the cloud. This combination of automation and scalability dramatically reduces turn-around time while offering the capability to increase the number of analyzed scenarios by a factor of 10, 100 or even 1000. - Access to ready-to-use data and to cloud-based resources leading to a reduction in software, hardware, and IT costs. - Competitive pricing structure, which will make high-volume usage of simulation services affordable. - Availability and affordability of high quality power simulators, which presently only large corporate clients can afford, will level the playing field in developing regional energy policies, determining prudent cost recovery mechanisms and assuring just and reasonable rates to consumers. - Users that presently do not have the resources to internally maintain modeling capabilities will now be able to run simulations. This will invite more players into the industry, ultimately leading to more transparent and liquid power markets.« less
NASA Astrophysics Data System (ADS)
Dong, Yumin; Xiao, Shufen; Ma, Hongyang; Chen, Libo
2016-12-01
Cloud computing and big data have become the developing engine of current information technology (IT) as a result of the rapid development of IT. However, security protection has become increasingly important for cloud computing and big data, and has become a problem that must be solved to develop cloud computing. The theft of identity authentication information remains a serious threat to the security of cloud computing. In this process, attackers intrude into cloud computing services through identity authentication information, thereby threatening the security of data from multiple perspectives. Therefore, this study proposes a model for cloud computing protection and management based on quantum authentication, introduces the principle of quantum authentication, and deduces the quantum authentication process. In theory, quantum authentication technology can be applied in cloud computing for security protection. This technology cannot be cloned; thus, it is more secure and reliable than classical methods.
Sirepo for Synchrotron Radiation Workshop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nagler, Robert; Moeller, Paul; Rakitin, Maksim
Sirepo is an open source framework for cloud computing. The graphical user interface (GUI) for Sirepo, also known as the client, executes in any HTML5 compliant web browser on any computing platform, including tablets. The client is built in JavaScript, making use of the following open source libraries: Bootstrap, which is fundamental for cross-platform web applications; AngularJS, which provides a model–view–controller (MVC) architecture and GUI components; and D3.js, which provides interactive plots and data-driven transformations. The Sirepo server is built on the following Python technologies: Flask, which is a lightweight framework for web development; Jinja, which is a secure andmore » widely used templating language; and Werkzeug, a utility library that is compliant with the WSGI standard. We use Nginx as the HTTP server and proxy, which provides a scalable event-driven architecture. The physics codes supported by Sirepo execute inside a Docker container. One of the codes supported by Sirepo is the Synchrotron Radiation Workshop (SRW). SRW computes synchrotron radiation from relativistic electrons in arbitrary magnetic fields and propagates the radiation wavefronts through optical beamlines. SRW is open source and is primarily supported by Dr. Oleg Chubar of NSLS-II at Brookhaven National Laboratory.« less
Using IKAROS as a data transfer and management utility within the KM3NeT computing model
NASA Astrophysics Data System (ADS)
Filippidis, Christos; Cotronis, Yiannis; Markou, Christos
2016-04-01
KM3NeT is a future European deep-sea research infrastructure hosting a new generation neutrino detectors that - located at the bottom of the Mediterranean Sea - will open a new window on the universe and answer fundamental questions both in particle physics and astrophysics. IKAROS is a framework that enables creating scalable storage formations on-demand and helps addressing several limitations that the current file systems face when dealing with very large scale infrastructures. It enables creating ad-hoc nearby storage formations and can use a huge number of I/O nodes in order to increase the available bandwidth (I/O and network). IKAROS unifies remote and local access in the overall data flow, by permitting direct access to each I/O node. In this way we can handle the overall data flow at the network layer, limiting the interaction with the operating system. This approach allows virtually connecting, at the users level, the several different computing facilities used (Grids, Clouds, HPCs, Data Centers, Local computing Clusters and personal storage devices), on-demand, based on the needs, by using well known standards and protocols, like HTTP.
Secure count query on encrypted genomic data.
Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman
2018-05-01
Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also develop more effective prevention strategies [1]. To reduce the sampling error and to increase the statistical accuracy of this type of research projects, data from different sources need to be brought together since a single organization does not necessarily possess required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third party cloud service provider, equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation of the participants, whose genomic sequence or clinical profile is used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotype and phenotype. Secondly, our proposed index tree scheme reduces the computational overhead significantly for executing secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while making the entire computation process efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records is approximately 5 s, where each record contains 500 SNPs. And, it requires 69.7 s to execute the query on the same database that also includes phenotypes. Copyright © 2018 Elsevier Inc. All rights reserved.
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Stocker, John C.; Golomb, Andrew M.
2011-01-01
Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand using a probability distribution for each type of service request as well as enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
Establishing a Cloud Computing Success Model for Hospitals in Taiwan.
Lian, Jiunn-Woei
2017-01-01
The purpose of this study is to understand the critical quality-related factors that affect cloud computing success of hospitals in Taiwan. In this study, private cloud computing is the major research target. The chief information officers participated in a questionnaire survey. The results indicate that the integration of trust into the information systems success model will have acceptable explanatory power to understand cloud computing success in the hospital. Moreover, information quality and system quality directly affect cloud computing satisfaction, whereas service quality indirectly affects the satisfaction through trust. In other words, trust serves as the mediator between service quality and satisfaction. This cloud computing success model will help hospitals evaluate or achieve success after adopting private cloud computing health care services.
Establishing a Cloud Computing Success Model for Hospitals in Taiwan
Lian, Jiunn-Woei
2017-01-01
The purpose of this study is to understand the critical quality-related factors that affect cloud computing success of hospitals in Taiwan. In this study, private cloud computing is the major research target. The chief information officers participated in a questionnaire survey. The results indicate that the integration of trust into the information systems success model will have acceptable explanatory power to understand cloud computing success in the hospital. Moreover, information quality and system quality directly affect cloud computing satisfaction, whereas service quality indirectly affects the satisfaction through trust. In other words, trust serves as the mediator between service quality and satisfaction. This cloud computing success model will help hospitals evaluate or achieve success after adopting private cloud computing health care services. PMID:28112020
Implementation of cloud computing in higher education
NASA Astrophysics Data System (ADS)
Asniar; Budiawan, R.
2016-04-01
Cloud computing research is a new trend in distributed computing, where people have developed service and SOA (Service Oriented Architecture) based application. This technology is very useful to be implemented, especially for higher education. This research is studied the need and feasibility for the suitability of cloud computing in higher education then propose the model of cloud computing service in higher education in Indonesia that can be implemented in order to support academic activities. Literature study is used as the research methodology to get a proposed model of cloud computing in higher education. Finally, SaaS and IaaS are cloud computing service that proposed to be implemented in higher education in Indonesia and cloud hybrid is the service model that can be recommended.
Research on Key Technologies of Cloud Computing
NASA Astrophysics Data System (ADS)
Zhang, Shufen; Yan, Hongcan; Chen, Xuebin
With the development of multi-core processors, virtualization, distributed storage, broadband Internet and automatic management, a new type of computing mode named cloud computing is produced. It distributes computation task on the resource pool which consists of massive computers, so the application systems can obtain the computing power, the storage space and software service according to its demand. It can concentrate all the computing resources and manage them automatically by the software without intervene. This makes application offers not to annoy for tedious details and more absorbed in his business. It will be advantageous to innovation and reduce cost. It's the ultimate goal of cloud computing to provide calculation, services and applications as a public facility for the public, So that people can use the computer resources just like using water, electricity, gas and telephone. Currently, the understanding of cloud computing is developing and changing constantly, cloud computing still has no unanimous definition. This paper describes three main service forms of cloud computing: SAAS, PAAS, IAAS, compared the definition of cloud computing which is given by Google, Amazon, IBM and other companies, summarized the basic characteristics of cloud computing, and emphasized on the key technologies such as data storage, data management, virtualization and programming model.
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
The Many Colors and Shapes of Cloud
NASA Astrophysics Data System (ADS)
Yeh, James T.
While many enterprises and business entities are deploying and exploiting Cloud Computing, the academic institutes and researchers are also busy trying to wrestle this beast and put a leash on this possible paradigm changing computing model. Many have argued that Cloud Computing is nothing more than a name change of Utility Computing. Others have argued that Cloud Computing is a revolutionary change of the computing architecture. So it has been difficult to put a boundary of what is in Cloud Computing, and what is not. I assert that it is equally difficult to find a group of people who would agree on even the definition of Cloud Computing. In actuality, may be all that arguments are not necessary, as Clouds have many shapes and colors. In this presentation, the speaker will attempt to illustrate that the shape and the color of the cloud depend very much on the business goals one intends to achieve. It will be a very rich territory for both the businesses to take the advantage of the benefits of Cloud Computing and the academia to integrate the technology research and business research.
NASA Astrophysics Data System (ADS)
Panitkin, Sergey; Barreiro Megino, Fernando; Caballero Bejar, Jose; Benjamin, Doug; Di Girolamo, Alessandro; Gable, Ian; Hendrix, Val; Hover, John; Kucharczyk, Katarzyna; Medrano Llamas, Ramon; Love, Peter; Ohman, Henrik; Paterson, Michael; Sobie, Randall; Taylor, Ryan; Walker, Rodney; Zaytsev, Alexander; Atlas Collaboration
2014-06-01
The computing model of the ATLAS experiment was designed around the concept of grid computing and, since the start of data taking, this model has proven very successful. However, new cloud computing technologies bring attractive features to improve the operations and elasticity of scientific distributed computing. ATLAS sees grid and cloud computing as complementary technologies that will coexist at different levels of resource abstraction, and two years ago created an R&D working group to investigate the different integration scenarios. The ATLAS Cloud Computing R&D has been able to demonstrate the feasibility of offloading work from grid to cloud sites and, as of today, is able to integrate transparently various cloud resources into the PanDA workload management system. The ATLAS Cloud Computing R&D is operating various PanDA queues on private and public resources and has provided several hundred thousand CPU days to the experiment. As a result, the ATLAS Cloud Computing R&D group has gained a significant insight into the cloud computing landscape and has identified points that still need to be addressed in order to fully utilize this technology. This contribution will explain the cloud integration models that are being evaluated and will discuss ATLAS' learning during the collaboration with leading commercial and academic cloud providers.
The Education Value of Cloud Computing
ERIC Educational Resources Information Center
Katzan, Harry, Jr.
2010-01-01
Cloud computing is a technique for supplying computer facilities and providing access to software via the Internet. Cloud computing represents a contextual shift in how computers are provisioned and accessed. One of the defining characteristics of cloud software service is the transfer of control from the client domain to the service provider.…
Cloud Computing. Technology Briefing. Number 1
ERIC Educational Resources Information Center
Alberta Education, 2013
2013-01-01
Cloud computing is Internet-based computing in which shared resources, software and information are delivered as a service that computers or mobile devices can access on demand. Cloud computing is already used extensively in education. Free or low-cost cloud-based services are used daily by learners and educators to support learning, social…
Can cloud computing benefit health services? - a SWOT analysis.
Kuo, Mu-Hsing; Kushniruk, Andre; Borycki, Elizabeth
2011-01-01
In this paper, we discuss cloud computing, the current state of cloud computing in healthcare, and the challenges and opportunities of adopting cloud computing in healthcare. A Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis was used to evaluate the feasibility of adopting this computing model in healthcare. The paper concludes that cloud computing could have huge benefits for healthcare but there are a number of issues that will need to be addressed before its widespread use in healthcare.
State of the Art of Network Security Perspectives in Cloud Computing
NASA Astrophysics Data System (ADS)
Oh, Tae Hwan; Lim, Shinyoung; Choi, Young B.; Park, Kwang-Roh; Lee, Heejo; Choi, Hyunsang
Cloud computing is now regarded as one of social phenomenon that satisfy customers' needs. It is possible that the customers' needs and the primary principle of economy - gain maximum benefits from minimum investment - reflects realization of cloud computing. We are living in the connected society with flood of information and without connected computers to the Internet, our activities and work of daily living will be impossible. Cloud computing is able to provide customers with custom-tailored features of application software and user's environment based on the customer's needs by adopting on-demand outsourcing of computing resources through the Internet. It also provides cloud computing users with high-end computing power and expensive application software package, and accordingly the users will access their data and the application software where they are located at the remote system. As the cloud computing system is connected to the Internet, network security issues of cloud computing are considered as mandatory prior to real world service. In this paper, survey and issues on the network security in cloud computing are discussed from the perspective of real world service environments.
Trident: scalable compute archives: workflows, visualization, and analysis
NASA Astrophysics Data System (ADS)
Gopu, Arvind; Hayashi, Soichi; Young, Michael D.; Kotulla, Ralf; Henschel, Robert; Harbeck, Daniel
2016-08-01
The Astronomy scientific community has embraced Big Data processing challenges, e.g. associated with time-domain astronomy, and come up with a variety of novel and efficient data processing solutions. However, data processing is only a small part of the Big Data challenge. Efficient knowledge discovery and scientific advancement in the Big Data era requires new and equally efficient tools: modern user interfaces for searching, identifying and viewing data online without direct access to the data; tracking of data provenance; searching, plotting and analyzing metadata; interactive visual analysis, especially of (time-dependent) image data; and the ability to execute pipelines on supercomputing and cloud resources with minimal user overhead or expertise even to novice computing users. The Trident project at Indiana University offers a comprehensive web and cloud-based microservice software suite that enables the straight forward deployment of highly customized Scalable Compute Archive (SCA) systems; including extensive visualization and analysis capabilities, with minimal amount of additional coding. Trident seamlessly scales up or down in terms of data volumes and computational needs, and allows feature sets within a web user interface to be quickly adapted to meet individual project requirements. Domain experts only have to provide code or business logic about handling/visualizing their domain's data products and about executing their pipelines and application work flows. Trident's microservices architecture is made up of light-weight services connected by a REST API and/or a message bus; a web interface elements are built using NodeJS, AngularJS, and HighCharts JavaScript libraries among others while backend services are written in NodeJS, PHP/Zend, and Python. The software suite currently consists of (1) a simple work flow execution framework to integrate, deploy, and execute pipelines and applications (2) a progress service to monitor work flows and sub-work flows (3) ImageX, an interactive image visualization service (3) an authentication and authorization service (4) a data service that handles archival, staging and serving of data products, and (5) a notification service that serves statistical collation and reporting needs of various projects. Several other additional components are under development. Trident is an umbrella project, that evolved from the One Degree Imager, Portal, Pipeline, and Archive (ODI-PPA) project which we had initially refactored toward (1) a powerful analysis/visualization portal for Globular Cluster System (GCS) survey data collected by IU researchers, 2) a data search and download portal for the IU Electron Microscopy Center's data (EMC-SCA), 3) a prototype archive for the Ludwig Maximilian University's Wide Field Imager. The new Trident software has been used to deploy (1) a metadata quality control and analytics portal (RADY-SCA) for DICOM formatted medical imaging data produced by the IU Radiology Center, 2) Several prototype work flows for different domains, 3) a snapshot tool within IU's Karst Desktop environment, 4) a limited component-set to serve GIS data within the IU GIS web portal. Trident SCA systems leverage supercomputing and storage resources at Indiana University but can be configured to make use of any cloud/grid resource, from local workstations/servers to (inter)national supercomputing facilities such as XSEDE.
Introducing the Cloud in an Introductory IT Course
ERIC Educational Resources Information Center
Woods, David M.
2018-01-01
Cloud computing is a rapidly emerging topic, but should it be included in an introductory IT course? The magnitude of cloud computing use, especially cloud infrastructure, along with students' limited knowledge of the topic support adding cloud content to the IT curriculum. There are several arguments that support including cloud computing in an…
Enabling Earth Science Through Cloud Computing
NASA Technical Reports Server (NTRS)
Hardman, Sean; Riofrio, Andres; Shams, Khawaja; Freeborn, Dana; Springer, Paul; Chafin, Brian
2012-01-01
Cloud Computing holds tremendous potential for missions across the National Aeronautics and Space Administration. Several flight missions are already benefiting from an investment in cloud computing for mission critical pipelines and services through faster processing time, higher availability, and drastically lower costs available on cloud systems. However, these processes do not currently extend to general scientific algorithms relevant to earth science missions. The members of the Airborne Cloud Computing Environment task at the Jet Propulsion Laboratory have worked closely with the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to integrate cloud computing into their science data processing pipeline. This paper details the efforts involved in deploying a science data system for the CARVE mission, evaluating and integrating cloud computing solutions with the system and porting their science algorithms for execution in a cloud environment.
Enhancing Security by System-Level Virtualization in Cloud Computing Environments
NASA Astrophysics Data System (ADS)
Sun, Dawei; Chang, Guiran; Tan, Chunguang; Wang, Xingwei
Many trends are opening up the era of cloud computing, which will reshape the IT industry. Virtualization techniques have become an indispensable ingredient for almost all cloud computing system. By the virtual environments, cloud provider is able to run varieties of operating systems as needed by each cloud user. Virtualization can improve reliability, security, and availability of applications by using consolidation, isolation, and fault tolerance. In addition, it is possible to balance the workloads by using live migration techniques. In this paper, the definition of cloud computing is given; and then the service and deployment models are introduced. An analysis of security issues and challenges in implementation of cloud computing is identified. Moreover, a system-level virtualization case is established to enhance the security of cloud computing environments.
Military clouds: utilization of cloud computing systems at the battlefield
NASA Astrophysics Data System (ADS)
Süleyman, Sarıkürk; Volkan, Karaca; İbrahim, Kocaman; Ahmet, Şirzai
2012-05-01
Cloud computing is known as a novel information technology (IT) concept, which involves facilitated and rapid access to networks, servers, data saving media, applications and services via Internet with minimum hardware requirements. Use of information systems and technologies at the battlefield is not new. Information superiority is a force multiplier and is crucial to mission success. Recent advances in information systems and technologies provide new means to decision makers and users in order to gain information superiority. These developments in information technologies lead to a new term, which is known as network centric capability. Similar to network centric capable systems, cloud computing systems are operational today. In the near future extensive use of military clouds at the battlefield is predicted. Integrating cloud computing logic to network centric applications will increase the flexibility, cost-effectiveness, efficiency and accessibility of network-centric capabilities. In this paper, cloud computing and network centric capability concepts are defined. Some commercial cloud computing products and applications are mentioned. Network centric capable applications are covered. Cloud computing supported battlefield applications are analyzed. The effects of cloud computing systems on network centric capability and on the information domain in future warfare are discussed. Battlefield opportunities and novelties which might be introduced to network centric capability by cloud computing systems are researched. The role of military clouds in future warfare is proposed in this paper. It was concluded that military clouds will be indispensible components of the future battlefield. Military clouds have the potential of improving network centric capabilities, increasing situational awareness at the battlefield and facilitating the settlement of information superiority.
NASA Astrophysics Data System (ADS)
Aneri, Parikh; Sumathy, S.
2017-11-01
Cloud computing provides services over the internet and provides application resources and data to the users based on their demand. Base of the Cloud Computing is consumer provider model. Cloud provider provides resources which consumer can access using cloud computing model in order to build their application based on their demand. Cloud data center is a bulk of resources on shared pool architecture for cloud user to access. Virtualization is the heart of the Cloud computing model, it provides virtual machine as per application specific configuration and those applications are free to choose their own configuration. On one hand, there is huge number of resources and on other hand it has to serve huge number of requests effectively. Therefore, resource allocation policy and scheduling policy play very important role in allocation and managing resources in this cloud computing model. This paper proposes the load balancing policy using Hungarian algorithm. Hungarian Algorithm provides dynamic load balancing policy with a monitor component. Monitor component helps to increase cloud resource utilization by managing the Hungarian algorithm by monitoring its state and altering its state based on artificial intelligent. CloudSim used in this proposal is an extensible toolkit and it simulates cloud computing environment.
Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-01-01
Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-06-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
Cloud Based Educational Systems and Its Challenges and Opportunities and Issues
ERIC Educational Resources Information Center
Paul, Prantosh Kr.; Lata Dangwal, Kiran
2014-01-01
Cloud Computing (CC) is actually is a set of hardware, software, networks, storage, services an interface combines to deliver aspects of computing as a service. Cloud Computing (CC) actually uses the central remote servers to maintain data and applications. Practically Cloud Computing (CC) is extension of Grid computing with independency and…
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
A scoping review of cloud computing in healthcare.
Griebel, Lena; Prokosch, Hans-Ulrich; Köpcke, Felix; Toddenroth, Dennis; Christoph, Jan; Leb, Ines; Engel, Igor; Sedlmayr, Martin
2015-03-19
Cloud computing is a recent and fast growing area of development in healthcare. Ubiquitous, on-demand access to virtually endless resources in combination with a pay-per-use model allow for new ways of developing, delivering and using services. Cloud computing is often used in an "OMICS-context", e.g. for computing in genomics, proteomics and molecular medicine, while other field of application still seem to be underrepresented. Thus, the objective of this scoping review was to identify the current state and hot topics in research on cloud computing in healthcare beyond this traditional domain. MEDLINE was searched in July 2013 and in December 2014 for publications containing the terms "cloud computing" and "cloud-based". Each journal and conference article was categorized and summarized independently by two researchers who consolidated their findings. 102 publications have been analyzed and 6 main topics have been found: telemedicine/teleconsultation, medical imaging, public health and patient self-management, hospital management and information systems, therapy, and secondary use of data. Commonly used features are broad network access for sharing and accessing data and rapid elasticity to dynamically adapt to computing demands. Eight articles favor the pay-for-use characteristics of cloud-based services avoiding upfront investments. Nevertheless, while 22 articles present very general potentials of cloud computing in the medical domain and 66 articles describe conceptual or prototypic projects, only 14 articles report from successful implementations. Further, in many articles cloud computing is seen as an analogy to internet-/web-based data sharing and the characteristics of the particular cloud computing approach are unfortunately not really illustrated. Even though cloud computing in healthcare is of growing interest only few successful implementations yet exist and many papers just use the term "cloud" synonymously for "using virtual machines" or "web-based" with no described benefit of the cloud paradigm. The biggest threat to the adoption in the healthcare domain is caused by involving external cloud partners: many issues of data safety and security are still to be solved. Until then, cloud computing is favored more for singular, individual features such as elasticity, pay-per-use and broad network access, rather than as cloud paradigm on its own.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, Ajit; Khalil, Mohammad; Pettit, Chris
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
2012-04-21
the photoelectric effect. The typical shortest wavelengths needed for ion traps range from 194 nm for Hg+ to 493 nm for Ba +, corresponding to 6.4-2.5...REPORT Comprehensive Materials and Morphologies Study of Ion Traps (COMMIT) for scalable Quantum Computation - Final Report 14. ABSTRACT 16. SECURITY...CLASSIFICATION OF: Trapped ion systems, are extremely promising for large-scale quantum computation, but face a vexing problem, with motional quantum
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2017-01-01
With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging on how to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy-to-extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
Modeling the Cloud to Enhance Capabilities for Crises and Catastrophe Management
2016-11-16
order for cloud computing infrastructures to be successfully deployed in real world scenarios as tools for crisis and catastrophe management, where...Statement of the Problem Studied As cloud computing becomes the dominant computational infrastructure[1] and cloud technologies make a transition to hosting...1. Formulate rigorous mathematical models representing technological capabilities and resources in cloud computing for performance modeling and
Automating NEURON Simulation Deployment in Cloud Resources.
Stockton, David B; Santamaria, Fidel
2017-01-01
Simulations in neuroscience are performed on local servers or High Performance Computing (HPC) facilities. Recently, cloud computing has emerged as a potential computational platform for neuroscience simulation. In this paper we compare and contrast HPC and cloud resources for scientific computation, then report how we deployed NEURON, a widely used simulator of neuronal activity, in three clouds: Chameleon Cloud, a hybrid private academic cloud for cloud technology research based on the OpenStack software; Rackspace, a public commercial cloud, also based on OpenStack; and Amazon Elastic Cloud Computing, based on Amazon's proprietary software. We describe the manual procedures and how to automate cloud operations. We describe extending our simulation automation software called NeuroManager (Stockton and Santamaria, Frontiers in Neuroinformatics, 2015), so that the user is capable of recruiting private cloud, public cloud, HPC, and local servers simultaneously with a simple common interface. We conclude by performing several studies in which we examine speedup, efficiency, total session time, and cost for sets of simulations of a published NEURON model.
Automating NEURON Simulation Deployment in Cloud Resources
Santamaria, Fidel
2016-01-01
Simulations in neuroscience are performed on local servers or High Performance Computing (HPC) facilities. Recently, cloud computing has emerged as a potential computational platform for neuroscience simulation. In this paper we compare and contrast HPC and cloud resources for scientific computation, then report how we deployed NEURON, a widely used simulator of neuronal activity, in three clouds: Chameleon Cloud, a hybrid private academic cloud for cloud technology research based on the Open-Stack software; Rackspace, a public commercial cloud, also based on OpenStack; and Amazon Elastic Cloud Computing, based on Amazon’s proprietary software. We describe the manual procedures and how to automate cloud operations. We describe extending our simulation automation software called NeuroManager (Stockton and Santamaria, Frontiers in Neuroinformatics, 2015), so that the user is capable of recruiting private cloud, public cloud, HPC, and local servers simultaneously with a simple common interface. We conclude by performing several studies in which we examine speedup, efficiency, total session time, and cost for sets of simulations of a published NEURON model. PMID:27655341
Mobile Cloud Learning for Higher Education: A Case Study of Moodle in the Cloud
ERIC Educational Resources Information Center
Wang, Minjuan; Chen, Yong; Khan, Muhammad Jahanzaib
2014-01-01
Mobile cloud learning, a combination of mobile learning and cloud computing, is a relatively new concept that holds considerable promise for future development and delivery in the education sectors. Cloud computing helps mobile learning overcome obstacles related to mobile computing. The main focus of this paper is to explore how cloud computing…
Virtualized Multi-Mission Operations Center (vMMOC) and its Cloud Services
NASA Technical Reports Server (NTRS)
Ido, Haisam Kassim
2017-01-01
His presentation will cover, the current and future, technical and organizational opportunities and challenges with virtualizing a multi-mission operations center. The full deployment of Goddard Space Flight Centers (GSFC) Virtualized Multi-Mission Operations Center (vMMOC) is nearly complete. The Space Science Mission Operations (SSMO) organizations spacecraft ACE, Fermi, LRO, MMS(4), OSIRIS-REx, SDO, SOHO, Swift, and Wind are in the process of being fully migrated to the vMMOC. The benefits of the vMMOC will be the normalization and the standardization of IT services, mission operations, maintenance, and development as well as ancillary services and policies such as collaboration tools, change management systems, and IT Security. The vMMOC will also provide operational efficiencies regarding hardware, IT domain expertise, training, maintenance and support.The presentation will also cover SSMO's secure Situational Awareness Dashboard in an integrated, fleet centric, cloud based web services fashion. Additionally the SSMO Telemetry as a Service (TaaS) will be covered, which allows authorized users and processes to access telemetry for the entire SSMO fleet, and for the entirety of each spacecrafts history. Both services leverage cloud services in a secure FISMA High and FedRamp environment, and also leverage distributed object stores in order to house and provide the telemetry. The services are also in the process of leveraging the cloud computing services elasticity and horizontal scalability. In the design phase is the Navigation as a Service (NaaS) which will provide a standardized, efficient, and normalized service for the fleet's space flight dynamics operations. Additional future services that may be considered are Ground Segment as a Service (GSaaS), Telemetry and Command as a Service (TCaaS), Flight Software Simulation as a Service, etc.
76 FR 13984 - Cloud Computing Forum & Workshop III
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-15
... DEPARTMENT OF COMMERCE National Institute of Standards and Technology Cloud Computing Forum... public workshop. SUMMARY: NIST announces the Cloud Computing Forum & Workshop III to be held on April 7... provide information on the NIST strategic and tactical Cloud Computing program, including progress on the...
NASA Astrophysics Data System (ADS)
Marinos, Alexandros; Briscoe, Gerard
Cloud Computing is rising fast, with its data centres growing at an unprecedented rate. However, this has come with concerns over privacy, efficiency at the expense of resilience, and environmental sustainability, because of the dependence on Cloud vendors such as Google, Amazon and Microsoft. Our response is an alternative model for the Cloud conceptualisation, providing a paradigm for Clouds in the community, utilising networked personal computers for liberation from the centralised vendor model. Community Cloud Computing (C3) offers an alternative architecture, created by combing the Cloud with paradigms from Grid Computing, principles from Digital Ecosystems, and sustainability from Green Computing, while remaining true to the original vision of the Internet. It is more technically challenging than Cloud Computing, having to deal with distributed computing issues, including heterogeneous nodes, varying quality of service, and additional security constraints. However, these are not insurmountable challenges, and with the need to retain control over our digital lives and the potential environmental consequences, it is a challenge we must pursue.
The Hico Image Processing System: A Web-Accessible Hyperspectral Remote Sensing Toolbox
NASA Astrophysics Data System (ADS)
Harris, A. T., III; Goodman, J.; Justice, B.
2014-12-01
As the quantity of Earth-observation data increases, the use-case for hosting analytical tools in geospatial data centers becomes increasingly attractive. To address this need, HySpeed Computing and Exelis VIS have developed the HICO Image Processing System, a prototype cloud computing system that provides online, on-demand, scalable remote sensing image processing capabilities. The system provides a mechanism for delivering sophisticated image processing analytics and data visualization tools into the hands of a global user community, who will only need a browser and internet connection to perform analysis. Functionality of the HICO Image Processing System is demonstrated using imagery from the Hyperspectral Imager for the Coastal Ocean (HICO), an imaging spectrometer located on the International Space Station (ISS) that is optimized for acquisition of aquatic targets. Example applications include a collection of coastal remote sensing algorithms that are directed at deriving critical information on water and habitat characteristics of our vulnerable coastal environment. The project leverages the ENVI Services Engine as the framework for all image processing tasks, and can readily accommodate the rapid integration of new algorithms, datasets and processing tools.
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.
Simonyan, Vahan; Mazumder, Raja
2014-09-30
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis
Simonyan, Vahan; Mazumder, Raja
2014-01-01
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953
Cloud computing task scheduling strategy based on improved differential evolution algorithm
NASA Astrophysics Data System (ADS)
Ge, Junwei; He, Qian; Fang, Yiqiu
2017-04-01
In order to optimize the cloud computing task scheduling scheme, an improved differential evolution algorithm for cloud computing task scheduling is proposed. Firstly, the cloud computing task scheduling model, according to the model of the fitness function, and then used improved optimization calculation of the fitness function of the evolutionary algorithm, according to the evolution of generation of dynamic selection strategy through dynamic mutation strategy to ensure the global and local search ability. The performance test experiment was carried out in the CloudSim simulation platform, the experimental results show that the improved differential evolution algorithm can reduce the cloud computing task execution time and user cost saving, good implementation of the optimal scheduling of cloud computing tasks.
Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.
Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P
2010-12-22
Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource-Roundup-using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
75 FR 64258 - Cloud Computing Forum & Workshop II
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-19
... DEPARTMENT OF COMMERCE National Institute of Standards and Technology Cloud Computing Forum... workshop. SUMMARY: NIST announces the Cloud Computing Forum & Workshop II to be held on November 4 and 5, 2010. This workshop will provide information on a Cloud Computing Roadmap Strategy as well as provide...
76 FR 62373 - Notice of Public Meeting-Cloud Computing Forum & Workshop IV
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-07
...--Cloud Computing Forum & Workshop IV AGENCY: National Institute of Standards and Technology (NIST), Commerce. ACTION: Notice. SUMMARY: NIST announces the Cloud Computing Forum & Workshop IV to be held on... to help develop open standards in interoperability, portability and security in cloud computing. This...
Project #OA-FY14-0126, January 15, 2014. The EPA OIG is starting fieldwork on the Council of the Inspectors General on Integrity and Efficiency (CIGIE) Cloud Computing Initiative – Status of Cloud-Computing Environments Within the Federal Government.
Intelligent cloud computing security using genetic algorithm as a computational tools
NASA Astrophysics Data System (ADS)
Razuky AL-Shaikhly, Mazin H.
2018-05-01
An essential change had occurred in the field of Information Technology which represented with cloud computing, cloud giving virtual assets by means of web yet awesome difficulties in the field of information security and security assurance. Currently main problem with cloud computing is how to improve privacy and security for cloud “cloud is critical security”. This paper attempts to solve cloud security by using intelligent system with genetic algorithm as wall to provide cloud data secure, all services provided by cloud must detect who receive and register it to create list of users (trusted or un-trusted) depend on behavior. The execution of present proposal has shown great outcome.
WE-B-BRD-01: Innovation in Radiation Therapy Planning II: Cloud Computing in RT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moore, K; Kagadis, G; Xing, L
As defined by the National Institute of Standards and Technology, cloud computing is “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” Despite the omnipresent role of computers in radiotherapy, cloud computing has yet to achieve widespread adoption in clinical or research applications, though the transition to such “on-demand” access is underway. As this transition proceeds, new opportunities for aggregate studies and efficient use of computational resources are set againstmore » new challenges in patient privacy protection, data integrity, and management of clinical informatics systems. In this Session, current and future applications of cloud computing and distributed computational resources will be discussed in the context of medical imaging, radiotherapy research, and clinical radiation oncology applications. Learning Objectives: Understand basic concepts of cloud computing. Understand how cloud computing could be used for medical imaging applications. Understand how cloud computing could be employed for radiotherapy research.4. Understand how clinical radiotherapy software applications would function in the cloud.« less
Cloud Computing with iPlant Atmosphere.
McKay, Sheldon J; Skidmore, Edwin J; LaRose, Christopher J; Mercer, Andre W; Noutsos, Christos
2013-10-15
Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Copyright © 2013 John Wiley & Sons, Inc.
Energy Consumption Management of Virtual Cloud Computing Platform
NASA Astrophysics Data System (ADS)
Li, Lin
2017-11-01
For energy consumption management research on virtual cloud computing platforms, energy consumption management of virtual computers and cloud computing platform should be understood deeper. Only in this way can problems faced by energy consumption management be solved. In solving problems, the key to solutions points to data centers with high energy consumption, so people are in great need to use a new scientific technique. Virtualization technology and cloud computing have become powerful tools in people’s real life, work and production because they have strong strength and many advantages. Virtualization technology and cloud computing now is in a rapid developing trend. It has very high resource utilization rate. In this way, the presence of virtualization and cloud computing technologies is very necessary in the constantly developing information age. This paper has summarized, explained and further analyzed energy consumption management questions of the virtual cloud computing platform. It eventually gives people a clearer understanding of energy consumption management of virtual cloud computing platform and brings more help to various aspects of people’s live, work and son on.
Cloud-free resolution element statistics program
NASA Technical Reports Server (NTRS)
Liley, B.; Martin, C. D.
1971-01-01
Computer program computes number of cloud-free elements in field-of-view and percentage of total field-of-view occupied by clouds. Human error is eliminated by using visual estimation to compute cloud statistics from aerial photographs.
The Scalable Checkpoint/Restart Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A.
The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to worite our and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes.more » It has been benchmarked as high as 720GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster thanthe parallel file system.« less
Research on Influence of Cloud Environment on Traditional Network Security
NASA Astrophysics Data System (ADS)
Ming, Xiaobo; Guo, Jinhua
2018-02-01
Cloud computing is a symbol of the progress of modern information network, cloud computing provides a lot of convenience to the Internet users, but it also brings a lot of risk to the Internet users. Second, one of the main reasons for Internet users to choose cloud computing is that the network security performance is great, it also is the cornerstone of cloud computing applications. This paper briefly explores the impact on cloud environment on traditional cybersecurity, and puts forward corresponding solutions.
Globally scalable generation of high-resolution land cover from multispectral imagery
NASA Astrophysics Data System (ADS)
Stutts, S. Craig; Raskob, Benjamin L.; Wenger, Eric J.
2017-05-01
We present an automated method of generating high resolution ( 2 meter) land cover using a pattern recognition neural network trained on spatial and spectral features obtained from over 9000 WorldView multispectral images (MSI) in six distinct world regions. At this resolution, the network can classify small-scale objects such as individual buildings, roads, and irrigation ponds. This paper focuses on three key areas. First, we describe our land cover generation process, which involves the co-registration and aggregation of multiple spatially overlapping MSI, post-aggregation processing, and the registration of land cover to OpenStreetMap (OSM) road vectors using feature correspondence. Second, we discuss the generation of land cover derivative products and their impact in the areas of region reduction and object detection. Finally, we discuss the process of globally scaling land cover generation using cloud computing via Amazon Web Services (AWS).
Machine Learning Toolkit for Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-03-31
Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which includes commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory space usage including adaptive and aggressive elimination of samples for faster convergence , and sparse format representation of data samples. Several heuristics for earliest possible to lazy elimination of non-contributing samples are consideredmore » in MaTEx. In many cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets« less
77 FR 26509 - Notice of Public Meeting-Cloud Computing Forum & Workshop V
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-04
...--Cloud Computing Forum & Workshop V AGENCY: National Institute of Standards & Technology (NIST), Commerce. ACTION: Notice. SUMMARY: NIST announces the Cloud Computing Forum & Workshop V to be held on Tuesday... workshop. This workshop will provide information on the U.S. Government (USG) Cloud Computing Technology...
National electronic medical records integration on cloud computing system.
Mirza, Hebah; El-Masri, Samir
2013-01-01
Few Healthcare providers have an advanced level of Electronic Medical Record (EMR) adoption. Others have a low level and most have no EMR at all. Cloud computing technology is a new emerging technology that has been used in other industry and showed a great success. Despite the great features of Cloud computing, they haven't been utilized fairly yet in healthcare industry. This study presents an innovative Healthcare Cloud Computing system for Integrating Electronic Health Record (EHR). The proposed Cloud system applies the Cloud Computing technology on EHR system, to present a comprehensive EHR integrated environment.
Research on OpenStack of open source cloud computing in colleges and universities’ computer room
NASA Astrophysics Data System (ADS)
Wang, Lei; Zhang, Dandan
2017-06-01
In recent years, the cloud computing technology has a rapid development, especially open source cloud computing. Open source cloud computing has attracted a large number of user groups by the advantages of open source and low cost, have now become a large-scale promotion and application. In this paper, firstly we briefly introduced the main functions and architecture of the open source cloud computing OpenStack tools, and then discussed deeply the core problems of computer labs in colleges and universities. Combining with this research, it is not that the specific application and deployment of university computer rooms with OpenStack tool. The experimental results show that the application of OpenStack tool can efficiently and conveniently deploy cloud of university computer room, and its performance is stable and the functional value is good.
WASS: an open-source stereo processing pipeline for sea waves 3D reconstruction
NASA Astrophysics Data System (ADS)
Bergamasco, Filippo; Benetazzo, Alvise; Torsello, Andrea; Barbariol, Francesco; Carniel, Sandro; Sclavo, Mauro
2017-04-01
Stereo 3D reconstruction of ocean waves is gaining more and more popularity in the oceanographic community. In fact, recent advances of both computer vision algorithms and CPU processing power can now allow the study of the spatio-temporal wave fields with unprecedented accuracy, especially at small scales. Even if simple in theory, multiple details are difficult to be mastered for a practitioner so that the implementation of a 3D reconstruction pipeline is in general considered a complex task. For instance, camera calibration, reliable stereo feature matching and mean sea-plane estimation are all factors for which a well designed implementation can make the difference to obtain valuable results. For this reason, we believe that the open availability of a well-tested software package that automates the steps from stereo images to a 3D point cloud would be a valuable addition for future researches in this area. We present WASS, a completely Open-Source stereo processing pipeline for sea waves 3D reconstruction, available at http://www.dais.unive.it/wass/. Our tool completely automates the recovery of dense point clouds from stereo images by providing three main functionalities. First, WASS can automatically recover the extrinsic parameters of the stereo rig (up to scale) so that no delicate calibration has to be performed on the field. Second, WASS implements a fast 3D dense stereo reconstruction procedure so that an accurate 3D point cloud can be computed from each stereo pair. We rely on the well-consolidated OpenCV library both for the image stereo rectification and disparity map recovery. Lastly, a set of 2D and 3D filtering techniques both on the disparity map and the produced point cloud are implemented to remove the vast majority of erroneous points that can naturally arise while analyzing the optically complex nature of the water surface (examples are sun-glares, large white-capped areas, fog and water areosol, etc). Developed to be as fast as possible, WASS can process roughly four 5 MPixel stereo frames per minute (on a consumer i7 CPU) to produce a sequence of outlier-free point clouds with more than 3 million points each. Finally, it comes with an easy to use user interface and designed to be scalable on multiple parallel CPUs.
Charlebois, Kathleen; Palmour, Nicole; Knoppers, Bartha Maria
2016-01-01
This study aims to understand the influence of the ethical and legal issues on cloud computing adoption in the field of genomics research. To do so, we adapted Diffusion of Innovation (DoI) theory to enable understanding of how key stakeholders manage the various ethical and legal issues they encounter when adopting cloud computing. Twenty semi-structured interviews were conducted with genomics researchers, patient advocates and cloud service providers. Thematic analysis generated five major themes: 1) Getting comfortable with cloud computing; 2) Weighing the advantages and the risks of cloud computing; 3) Reconciling cloud computing with data privacy; 4) Maintaining trust and 5) Anticipating the cloud by creating the conditions for cloud adoption. Our analysis highlights the tendency among genomics researchers to gradually adopt cloud technology. Efforts made by cloud service providers to promote cloud computing adoption are confronted by researchers’ perpetual cost and security concerns, along with a lack of familiarity with the technology. Further underlying those fears are researchers’ legal responsibility with respect to the data that is stored on the cloud. Alternative consent mechanisms aimed at increasing patients’ control over the use of their data also provide a means to circumvent various institutional and jurisdictional hurdles that restrict access by creating siloed databases. However, the risk of creating new, cloud-based silos may run counter to the goal in genomics research to increase data sharing on a global scale. PMID:27755563
Charlebois, Kathleen; Palmour, Nicole; Knoppers, Bartha Maria
2016-01-01
This study aims to understand the influence of the ethical and legal issues on cloud computing adoption in the field of genomics research. To do so, we adapted Diffusion of Innovation (DoI) theory to enable understanding of how key stakeholders manage the various ethical and legal issues they encounter when adopting cloud computing. Twenty semi-structured interviews were conducted with genomics researchers, patient advocates and cloud service providers. Thematic analysis generated five major themes: 1) Getting comfortable with cloud computing; 2) Weighing the advantages and the risks of cloud computing; 3) Reconciling cloud computing with data privacy; 4) Maintaining trust and 5) Anticipating the cloud by creating the conditions for cloud adoption. Our analysis highlights the tendency among genomics researchers to gradually adopt cloud technology. Efforts made by cloud service providers to promote cloud computing adoption are confronted by researchers' perpetual cost and security concerns, along with a lack of familiarity with the technology. Further underlying those fears are researchers' legal responsibility with respect to the data that is stored on the cloud. Alternative consent mechanisms aimed at increasing patients' control over the use of their data also provide a means to circumvent various institutional and jurisdictional hurdles that restrict access by creating siloed databases. However, the risk of creating new, cloud-based silos may run counter to the goal in genomics research to increase data sharing on a global scale.
Cloud Computing for Complex Performance Codes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Appel, Gordon John; Hadgu, Teklu; Klein, Brandon Thorin
This report describes the use of cloud computing services for running complex public domain performance assessment problems. The work consisted of two phases: Phase 1 was to demonstrate complex codes, on several differently configured servers, could run and compute trivial small scale problems in a commercial cloud infrastructure. Phase 2 focused on proving non-trivial large scale problems could be computed in the commercial cloud environment. The cloud computing effort was successfully applied using codes of interest to the geohydrology and nuclear waste disposal modeling community.
Cloud Fingerprinting: Using Clock Skews To Determine Co Location Of Virtual Machines
2016-09-01
DISTRIBUTION CODE 13. ABSTRACT (maximum 200 words) Cloud computing has quickly revolutionized computing practices of organizations, to include the Department of... Cloud computing has quickly revolutionized computing practices of organizations, to in- clude the Department of Defense. However, security concerns...vi Table of Contents 1 Introduction 1 1.1 Proliferation of Cloud Computing . . . . . . . . . . . . . . . . . . 1 1.2 Problem Statement
Cloudbus Toolkit for Market-Oriented Cloud Computing
NASA Astrophysics Data System (ADS)
Buyya, Rajkumar; Pandey, Suraj; Vecchiola, Christian
This keynote paper: (1) presents the 21st century vision of computing and identifies various IT paradigms promising to deliver computing as a utility; (2) defines the architecture for creating market-oriented Clouds and computing atmosphere by leveraging technologies such as virtual machines; (3) provides thoughts on market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain SLA-oriented resource allocation; (4) presents the work carried out as part of our new Cloud Computing initiative, called Cloudbus: (i) Aneka, a Platform as a Service software system containing SDK (Software Development Kit) for construction of Cloud applications and deployment on private or public Clouds, in addition to supporting market-oriented resource management; (ii) internetworking of Clouds for dynamic creation of federated computing environments for scaling of elastic applications; (iii) creation of 3rd party Cloud brokering services for building content delivery networks and e-Science applications and their deployment on capabilities of IaaS providers such as Amazon along with Grid mashups; (iv) CloudSim supporting modelling and simulation of Clouds for performance studies; (v) Energy Efficient Resource Allocation Mechanisms and Techniques for creation and management of Green Clouds; and (vi) pathways for future research.
Scalable parallel distance field construction for large-scale applications
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan -Liu; ...
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. Anew distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking overtime, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate itsmore » efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. In conclusion, our work greatly extends the usability of distance fields for demanding applications.« less
Scalable Parallel Distance Field Construction for Large-Scale Applications.
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
Proposal for a Security Management in Cloud Computing for Health Care
Dzombeta, Srdan; Brandis, Knud
2014-01-01
Cloud computing is actually one of the most popular themes of information systems research. Considering the nature of the processed information especially health care organizations need to assess and treat specific risks according to cloud computing in their information security management system. Therefore, in this paper we propose a framework that includes the most important security processes regarding cloud computing in the health care sector. Starting with a framework of general information security management processes derived from standards of the ISO 27000 family the most important information security processes for health care organizations using cloud computing will be identified considering the main risks regarding cloud computing and the type of information processed. The identified processes will help a health care organization using cloud computing to focus on the most important ISMS processes and establish and operate them at an appropriate level of maturity considering limited resources. PMID:24701137
Proposal for a security management in cloud computing for health care.
Haufe, Knut; Dzombeta, Srdan; Brandis, Knud
2014-01-01
Cloud computing is actually one of the most popular themes of information systems research. Considering the nature of the processed information especially health care organizations need to assess and treat specific risks according to cloud computing in their information security management system. Therefore, in this paper we propose a framework that includes the most important security processes regarding cloud computing in the health care sector. Starting with a framework of general information security management processes derived from standards of the ISO 27000 family the most important information security processes for health care organizations using cloud computing will be identified considering the main risks regarding cloud computing and the type of information processed. The identified processes will help a health care organization using cloud computing to focus on the most important ISMS processes and establish and operate them at an appropriate level of maturity considering limited resources.
ERIC Educational Resources Information Center
Kaestner, Rich
2012-01-01
Most school business officials have heard the term "cloud computing" bandied about and may have some idea of what the term means. In fact, they likely already leverage a cloud-computing solution somewhere within their district. But what does cloud computing really mean? This brief article puts a bit of definition behind the term and helps one…
Cloud Computing in Higher Education Sector for Sustainable Development
ERIC Educational Resources Information Center
Duan, Yuchao
2016-01-01
Cloud computing is considered a new frontier in the field of computing, as this technology comprises three major entities namely: software, hardware and network. The collective nature of all these entities is known as the Cloud. This research aims to examine the impacts of various aspects namely: cloud computing, sustainability, performance…
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-01
...-1659-01] Request for Comments on NIST Special Publication 500-293, US Government Cloud Computing... Publication 500-293, US Government Cloud Computing Technology Roadmap, Release 1.0 (Draft). This document is... (USG) agencies to accelerate their adoption of cloud computing. The roadmap has been developed through...
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
NASA Astrophysics Data System (ADS)
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficiency to quickly obtain very detailed Earth surface data for a large spatial extent. Such data is important for scientific discoveries such as Earth and ecological sciences and natural disasters and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies received notable success on parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework is evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
A Comprehensive Review of Existing Risk Assessment Models in Cloud Computing
NASA Astrophysics Data System (ADS)
Amini, Ahmad; Jamil, Norziana
2018-05-01
Cloud computing is a popular paradigm in information technology and computing as it offers numerous advantages in terms of economical saving and minimal management effort. Although elasticity and flexibility brings tremendous benefits, it still raises many information security issues due to its unique characteristic that allows ubiquitous computing. Therefore, the vulnerabilities and threats in cloud computing have to be identified and proper risk assessment mechanism has to be in place for better cloud computing management. Various quantitative and qualitative risk assessment models have been proposed but up to our knowledge, none of them is suitable for cloud computing environment. This paper, we compare and analyse the strengths and weaknesses of existing risk assessment models. We then propose a new risk assessment model that sufficiently address all the characteristics of cloud computing, which was not appeared in the existing models.
Impacts and Opportunities for Engineering in the Era of Cloud Computing Systems
2012-01-31
2012 UNCLASSIFIED 1 of 58 Impacts and Opportunities for Engineering in the Era of Cloud Computing Systems A Report to the U.S. Department...2.1.7 Engineering of Computational Behavior .............................................................18 2.2 How the Cloud Will Impact Systems...58 Executive Summary This report discusses the impact of cloud computing and the broader revolution in computing on systems, on the disciplines of
Cloud Computing Value Chains: Understanding Businesses and Value Creation in the Cloud
NASA Astrophysics Data System (ADS)
Mohammed, Ashraf Bany; Altmann, Jörn; Hwang, Junseok
Based on the promising developments in Cloud Computing technologies in recent years, commercial computing resource services (e.g. Amazon EC2) or software-as-a-service offerings (e.g. Salesforce. com) came into existence. However, the relatively weak business exploitation, participation, and adoption of other Cloud Computing services remain the main challenges. The vague value structures seem to be hindering business adoption and the creation of sustainable business models around its technology. Using an extensive analyze of existing Cloud business models, Cloud services, stakeholder relations, market configurations and value structures, this Chapter develops a reference model for value chains in the Cloud. Although this model is theoretically based on porter's value chain theory, the proposed Cloud value chain model is upgraded to fit the diversity of business service scenarios in the Cloud computing markets. Using this model, different service scenarios are explained. Our findings suggest new services, business opportunities, and policy practices for realizing more adoption and value creation paths in the Cloud.
Virtualization and cloud computing in dentistry.
Chow, Frank; Muftu, Ali; Shorter, Richard
2014-01-01
The use of virtualization and cloud computing has changed the way we use computers. Virtualization is a method of placing software called a hypervisor on the hardware of a computer or a host operating system. It allows a guest operating system to run on top of the physical computer with a virtual machine (i.e., virtual computer). Virtualization allows multiple virtual computers to run on top of one physical computer and to share its hardware resources, such as printers, scanners, and modems. This increases the efficient use of the computer by decreasing costs (e.g., hardware, electricity administration, and management) since only one physical computer is needed and running. This virtualization platform is the basis for cloud computing. It has expanded into areas of server and storage virtualization. One of the commonly used dental storage systems is cloud storage. Patient information is encrypted as required by the Health Insurance Portability and Accountability Act (HIPAA) and stored on off-site private cloud services for a monthly service fee. As computer costs continue to increase, so too will the need for more storage and processing power. Virtual and cloud computing will be a method for dentists to minimize costs and maximize computer efficiency in the near future. This article will provide some useful information on current uses of cloud computing.
Scalable digital hardware for a trapped ion quantum computer
NASA Astrophysics Data System (ADS)
Mount, Emily; Gaultney, Daniel; Vrijsen, Geert; Adams, Michael; Baek, So-Young; Hudek, Kai; Isabella, Louis; Crain, Stephen; van Rynbach, Andre; Maunz, Peter; Kim, Jungsang
2016-12-01
Many of the challenges of scaling quantum computer hardware lie at the interface between the qubits and the classical control signals used to manipulate them. Modular ion trap quantum computer architectures address scalability by constructing individual quantum processors interconnected via a network of quantum communication channels. Successful operation of such quantum hardware requires a fully programmable classical control system capable of frequency stabilizing the continuous wave lasers necessary for loading, cooling, initialization, and detection of the ion qubits, stabilizing the optical frequency combs used to drive logic gate operations on the ion qubits, providing a large number of analog voltage sources to drive the trap electrodes, and a scheme for maintaining phase coherence among all the controllers that manipulate the qubits. In this work, we describe scalable solutions to these hardware development challenges.
Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.
Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios
2016-01-01
The discrete periodic radon transform (DPRT) has extensively been used in applications that involve image reconstructions from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoids the use of floating-point arithmetic associated with the use of the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and the need for a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N(2) additions per clock cycle. Compared with the previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251×251 image, for approximately 25% fewer flip-flops than required for a systolic implementation, we have that the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized just 2N + ⌈log(2) N⌉ + 1 and 2N + 3 ⌈log(2) N⌉ + B + 2 cycles, architectures that can compute the DPRT and its inverse in respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-b additions than for the systolic implementation and provides a tradeoff between speed and additional 1-b additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language (VHDL) and validated using an Field-Programmable Gate Array (FPGA) implementation.
Global Software Development with Cloud Platforms
NASA Astrophysics Data System (ADS)
Yara, Pavan; Ramachandran, Ramaseshan; Balasubramanian, Gayathri; Muthuswamy, Karthik; Chandrasekar, Divya
Offshore and outsourced distributed software development models and processes are facing challenges, previously unknown, with respect to computing capacity, bandwidth, storage, security, complexity, reliability, and business uncertainty. Clouds promise to address these challenges by adopting recent advances in virtualization, parallel and distributed systems, utility computing, and software services. In this paper, we envision a cloud-based platform that addresses some of these core problems. We outline a generic cloud architecture, its design and our first implementation results for three cloud forms - a compute cloud, a storage cloud and a cloud-based software service- in the context of global distributed software development (GSD). Our ”compute cloud” provides computational services such as continuous code integration and a compile server farm, ”storage cloud” offers storage (block or file-based) services with an on-line virtual storage service, whereas the on-line virtual labs represent a useful cloud service. We note some of the use cases for clouds in GSD, the lessons learned with our prototypes and identify challenges that must be conquered before realizing the full business benefits. We believe that in the future, software practitioners will focus more on these cloud computing platforms and see clouds as a means to supporting a ecosystem of clients, developers and other key stakeholders.
Cloud Based Applications and Platforms (Presentation)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brodt-Giles, D.
2014-05-15
Presentation to the Cloud Computing East 2014 Conference, where we are highlighting our cloud computing strategy, describing the platforms on the cloud (including Smartgrid.gov), and defining our process for implementing cloud based applications.
A source-controlled data center network model.
Yu, Yang; Liang, Mangui; Wang, Zhe
2017-01-01
The construction of data center network by applying SDN technology has become a hot research topic. The SDN architecture has innovatively separated the control plane from the data plane which makes the network more software-oriented and agile. Moreover, it provides virtual multi-tenancy, effective scheduling resources and centralized control strategies to meet the demand for cloud computing data center. However, the explosion of network information is facing severe challenges for SDN controller. The flow storage and lookup mechanisms based on TCAM device have led to the restriction of scalability, high cost and energy consumption. In view of this, a source-controlled data center network (SCDCN) model is proposed herein. The SCDCN model applies a new type of source routing address named the vector address (VA) as the packet-switching label. The VA completely defines the communication path and the data forwarding process can be finished solely relying on VA. There are four advantages in the SCDCN architecture. 1) The model adopts hierarchical multi-controllers and abstracts large-scale data center network into some small network domains that has solved the restriction for the processing ability of single controller and reduced the computational complexity. 2) Vector switches (VS) developed in the core network no longer apply TCAM for table storage and lookup that has significantly cut down the cost and complexity for switches. Meanwhile, the problem of scalability can be solved effectively. 3) The SCDCN model simplifies the establishment process for new flows and there is no need to download flow tables to VS. The amount of control signaling consumed when establishing new flows can be significantly decreased. 4) We design the VS on the NetFPGA platform. The statistical results show that the hardware resource consumption in a VS is about 27% of that in an OFS.
The Application of Computer-Aided Discovery to Spacecraft Site Selection
NASA Astrophysics Data System (ADS)
Pankratius, V.; Blair, D. M.; Gowanlock, M.; Herring, T.
2015-12-01
The selection of landing and exploration sites for interplanetary robotic or human missions is a complex task. Historically it has been labor-intensive, with large groups of scientists manually interpreting a planetary surface across a variety of datasets to identify potential sites based on science and engineering constraints. This search process can be lengthy, and excellent sites may get overlooked when the aggregate value of site selection criteria is non-obvious or non-intuitive. As planetary data collection leads to Big Data repositories and a growing set of selection criteria, scientists will face a combinatorial search space explosion that requires scalable, automated assistance. We are currently exploring more general computer-aided discovery techniques in the context of planetary surface deformation phenomena that can lend themselves to application in the landing site search problem. In particular, we are developing a general software framework that addresses key difficulties: characterizing a given phenomenon or site based on data gathered from multiple instruments (e.g. radar interferometry, gravity, thermal maps, or GPS time series), and examining a variety of possible workflows whose individual configurations are optimized to isolate different features. The framework allows algorithmic pipelines and hypothesized models to be perturbed or permuted automatically within well-defined bounds established by the scientist. For example, even simple choices for outlier and noise handling or data interpolation can drastically affect the detectability of certain features. These techniques aim to automate repetitive tasks that scientists routinely perform in exploratory analysis, and make them more efficient and scalable by executing them in parallel in the cloud. We also explore ways in which machine learning can be combined with human feedback to prune the search space and converge to desirable results. Acknowledgements: We acknowledge support from NASA AIST NNX15AG84G (PI V. Pankratius)
A source-controlled data center network model
Yu, Yang; Liang, Mangui; Wang, Zhe
2017-01-01
The construction of data center network by applying SDN technology has become a hot research topic. The SDN architecture has innovatively separated the control plane from the data plane which makes the network more software-oriented and agile. Moreover, it provides virtual multi-tenancy, effective scheduling resources and centralized control strategies to meet the demand for cloud computing data center. However, the explosion of network information is facing severe challenges for SDN controller. The flow storage and lookup mechanisms based on TCAM device have led to the restriction of scalability, high cost and energy consumption. In view of this, a source-controlled data center network (SCDCN) model is proposed herein. The SCDCN model applies a new type of source routing address named the vector address (VA) as the packet-switching label. The VA completely defines the communication path and the data forwarding process can be finished solely relying on VA. There are four advantages in the SCDCN architecture. 1) The model adopts hierarchical multi-controllers and abstracts large-scale data center network into some small network domains that has solved the restriction for the processing ability of single controller and reduced the computational complexity. 2) Vector switches (VS) developed in the core network no longer apply TCAM for table storage and lookup that has significantly cut down the cost and complexity for switches. Meanwhile, the problem of scalability can be solved effectively. 3) The SCDCN model simplifies the establishment process for new flows and there is no need to download flow tables to VS. The amount of control signaling consumed when establishing new flows can be significantly decreased. 4) We design the VS on the NetFPGA platform. The statistical results show that the hardware resource consumption in a VS is about 27% of that in an OFS. PMID:28328925
NASA Astrophysics Data System (ADS)
Magnoni, L.; Suthakar, U.; Cordeiro, C.; Georgiou, M.; Andreeva, J.; Khan, A.; Smith, D. R.
2015-12-01
Monitoring the WLCG infrastructure requires the gathering and analysis of a high volume of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks to provide a uniform and flexible interface for scientists and sites. The current architecture, where relational database systems are used to store, to process and to serve monitoring data, has limitations in coping with the foreseen increase in the volume (e.g. higher LHC luminosity) and the variety (e.g. new data-transfer protocols and new resource-types, as cloud-computing) of WLCG monitoring events. This paper presents a new scalable data store and analytics platform designed by the Support for Distributed Computing (SDC) group, at the CERN IT department, which uses a variety of technologies each one targeting specific aspects of big-scale distributed data-processing (commonly referred as lambda-architecture approach). Results of data processing on Hadoop for WLCG data activities monitoring are presented, showing how the new architecture can easily analyze hundreds of millions of transfer logs in a few minutes. Moreover, a comparison of data partitioning, compression and file format (e.g. CSV, Avro) is presented, with particular attention given to how the file structure impacts the overall MapReduce performance. In conclusion, the evolution of the current implementation, which focuses on data storage and batch processing, towards a complete lambda-architecture is discussed, with consideration of candidate technology for the serving layer (e.g. Elasticsearch) and a description of a proof of concept implementation, based on Apache Spark and Esper, for the real-time part which compensates for batch-processing latency and automates problem detection and failures.
Scalable Optical-Fiber Communication Networks
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Peterson, John C.
1993-01-01
Scalable arbitrary fiber extension network (SAFEnet) is conceptual fiber-optic communication network passing digital signals among variety of computers and input/output devices at rates from 200 Mb/s to more than 100 Gb/s. Intended for use with very-high-speed computers and other data-processing and communication systems in which message-passing delays must be kept short. Inherent flexibility makes it possible to match performance of network to computers by optimizing configuration of interconnections. In addition, interconnections made redundant to provide tolerance to faults.
SCALING AN URBAN EMERGENCY EVACUATION FRAMEWORK: CHALLENGES AND PRACTICES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar; Lu, Wei
2014-01-01
Critical infrastructure disruption, caused by severe weather events, natural disasters, terrorist attacks, etc., has significant impacts on urban transportation systems. We built a computational framework to simulate urban transportation systems under critical infrastructure disruption in order to aid real-time emergency evacuation. This framework will use large scale datasets to provide a scalable tool for emergency planning and management. Our framework, World-Wide Emergency Evacuation (WWEE), integrates population distribution and urban infrastructure networks to model travel demand in emergency situations at global level. Also, a computational model of agent-based traffic simulation is used to provide an optimal evacuation plan for traffic operationmore » purpose [1]. In addition, our framework provides a web-based high resolution visualization tool for emergency evacuation modelers and practitioners. We have successfully tested our framework with scenarios in both United States (Alexandria, VA) and Europe (Berlin, Germany) [2]. However, there are still some major drawbacks for scaling this framework to handle big data workloads in real time. On our back-end, lack of proper infrastructure limits us in ability to process large amounts of data, run the simulation efficiently and quickly, and provide fast retrieval and serving of data. On the front-end, the visualization performance of microscopic evacuation results is still not efficient enough due to high volume data communication between server and client. We are addressing these drawbacks by using cloud computing and next-generation web technologies, namely Node.js, NoSQL, WebGL, Open Layers 3 and HTML5 technologies. We will describe briefly about each one and how we are using and leveraging these technologies to provide an efficient tool for emergency management organizations. Our early experimentation demonstrates that using above technologies is a promising approach to build a scalable and high performance urban emergency evacuation framework that can improve traffic mobility and safety under critical infrastructure disruption in today s socially connected world.« less
NASA Astrophysics Data System (ADS)
Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.
2014-12-01
We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs is best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid/easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include: provenance, publication of results, monitoring, workflow tools, etc. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC).The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At same time, we are scoping seven new VLs and in the process, identifying other common components to prioritise and focus development.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-22
... explored in this series is cloud computing. The workshop on this topic will be held in Gaithersburg, MD on October 21, 2011. Assertion: ``Current implementations of cloud computing indicate a new approach to security'' Implementations of cloud computing have provided new ways of thinking about how to secure data...
77 FR 74829 - Notice of Public Meeting-Cloud Computing and Big Data Forum and Workshop
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-18
...--Cloud Computing and Big Data Forum and Workshop AGENCY: National Institute of Standards and Technology... Standards and Technology (NIST) announces a Cloud Computing and Big Data Forum and Workshop to be held on... followed by a one-day hands-on workshop. The NIST Cloud Computing and Big Data Forum and Workshop will...
ERIC Educational Resources Information Center
Tweel, Abdeneaser
2012-01-01
High uncertainties related to cloud computing adoption may hinder IT managers from making solid decisions about adopting cloud computing. The problem addressed in this study was the lack of understanding of the relationship between factors related to the adoption of cloud computing and IT managers' interest in adopting this technology. In…
When cloud computing meets bioinformatics: a review.
Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong
2013-10-01
In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
Analyzing large scale genomic data on the cloud with Sparkhit
Huang, Liren; Krüger, Jan
2018-01-01
Abstract Motivation The increasing amount of next-generation sequencing data poses a fundamental challenge on large scale genomic analytics. Existing tools use different distributed computational platforms to scale-out bioinformatics workloads. However, the scalability of these tools is not efficient. Moreover, they have heavy run time overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform. Results Sparkhit integrates a variety of analytical methods. It is implemented in the Spark extended MapReduce model. It runs 92–157 times faster than MetaSpark on metagenomic fragment recruitment and 18–32 times faster than Crossbow on data pre-processing. We analyzed 100 terabytes of data across four genomic projects in the cloud in 21 h, which includes the run times of cluster deployment and data downloading. Furthermore, our application on the entire Human Microbiome Project shotgun sequencing data was completed in 2 h, presenting an approach to easily associate large amounts of public datasets with reference data. Availability and implementation Sparkhit is freely available at: https://rhinempi.github.io/sparkhit/. Contact asczyrba@cebitec.uni-bielefeld.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29253074
NASA Astrophysics Data System (ADS)
Yu, Xiaoyuan; Yuan, Jian; Chen, Shi
2013-03-01
Cloud computing is one of the most popular topics in the IT industry and is recently being adopted by many companies. It has four development models, as: public cloud, community cloud, hybrid cloud and private cloud. Except others, private cloud can be implemented in a private network, and delivers some benefits of cloud computing without pitfalls. This paper makes a comparison of typical open source platforms through which we can implement a private cloud. After this comparison, we choose Eucalyptus and Wavemaker to do a case study on the private cloud. We also do some performance estimation of cloud platform services and development of prototype software as cloud services.
Blueprint for a microwave trapped ion quantum computer.
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K
2017-02-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine a highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits adjoint sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set up. To parameterize an inverse domain a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-10-01
Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-01-01
Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl PMID:24930141
Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup
Kudtarkar, Parul; DeLuca, Todd F.; Fusaro, Vincent A.; Tonellato, Peter J.; Wall, Dennis P.
2010-01-01
Background Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource—Roundup—using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Methods Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon’s Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. Results We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon’s computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure. PMID:21258651
The emerging role of cloud computing in molecular modelling.
Ebejer, Jean-Paul; Fulle, Simone; Morris, Garrett M; Finn, Paul W
2013-07-01
There is a growing recognition of the importance of cloud computing for large-scale and data-intensive applications. The distinguishing features of cloud computing and their relationship to other distributed computing paradigms are described, as are the strengths and weaknesses of the approach. We review the use made to date of cloud computing for molecular modelling projects and the availability of front ends for molecular modelling applications. Although the use of cloud computing technologies for molecular modelling is still in its infancy, we demonstrate its potential by presenting several case studies. Rapid growth can be expected as more applications become available and costs continue to fall; cloud computing can make a major contribution not just in terms of the availability of on-demand computing power, but could also spur innovation in the development of novel approaches that utilize that capacity in more effective ways. Copyright © 2013 Elsevier Inc. All rights reserved.
Challenges in Securing the Interface Between the Cloud and Pervasive Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lagesse, Brent J
2011-01-01
Cloud computing presents an opportunity for pervasive systems to leverage computational and storage resources to accomplish tasks that would not normally be possible on such resource-constrained devices. Cloud computing can enable hardware designers to build lighter systems that last longer and are more mobile. Despite the advantages cloud computing offers to the designers of pervasive systems, there are some limitations of leveraging cloud computing that must be addressed. We take the position that cloud-based pervasive system must be secured holistically and discuss ways this might be accomplished. In this paper, we discuss a pervasive system utilizing cloud computing resources andmore » issues that must be addressed in such a system. In this system, the user's mobile device cannot always have network access to leverage resources from the cloud, so it must make intelligent decisions about what data should be stored locally and what processes should be run locally. As a result of these decisions, the user becomes vulnerable to attacks while interfacing with the pervasive system.« less
An Architecture for Cross-Cloud System Management
NASA Astrophysics Data System (ADS)
Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad
The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in an homogenous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data
NASA Astrophysics Data System (ADS)
Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.
2016-12-01
Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.
NASA Astrophysics Data System (ADS)
Li, J.; Zhang, T.; Huang, Q.; Liu, Q.
2014-12-01
Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
'Cloud computing' and clinical trials: report from an ECRIN workshop.
Ohmann, Christian; Canham, Steve; Danielyan, Edgar; Robertshaw, Steve; Legré, Yannick; Clivio, Luca; Demotes, Jacques
2015-07-29
Growing use of cloud computing in clinical trials prompted the European Clinical Research Infrastructures Network, a European non-profit organisation established to support multinational clinical research, to organise a one-day workshop on the topic to clarify potential benefits and risks. The issues that arose in that workshop are summarised and include the following: the nature of cloud computing and the cloud computing industry; the risks in using cloud computing services now; the lack of explicit guidance on this subject, both generally and with reference to clinical trials; and some possible ways of reducing risks. There was particular interest in developing and using a European 'community cloud' specifically for academic clinical trial data. It was recognised that the day-long workshop was only the start of an ongoing process. Future discussion needs to include clarification of trial-specific regulatory requirements for cloud computing and involve representatives from the relevant regulatory bodies.