open-source cloud computing: Topics by Science.gov

Sample records for open-source cloud computing

Research on OpenStack of open source cloud computing in colleges and universities’ computer room

NASA Astrophysics Data System (ADS)

Wang, Lei; Zhang, Dandan

2017-06-01

In recent years, the cloud computing technology has a rapid development, especially open source cloud computing. Open source cloud computing has attracted a large number of user groups by the advantages of open source and low cost, have now become a large-scale promotion and application. In this paper, firstly we briefly introduced the main functions and architecture of the open source cloud computing OpenStack tools, and then discussed deeply the core problems of computer labs in colleges and universities. Combining with this research, it is not that the specific application and deployment of university computer rooms with OpenStack tool. The experimental results show that the application of OpenStack tool can efficiently and conveniently deploy cloud of university computer room, and its performance is stable and the functional value is good.
Evaluating open-source cloud computing solutions for geosciences

NASA Astrophysics Data System (ADS)

Huang, Qunying; Yang, Chaowei; Liu, Kai; Xia, Jizhe; Xu, Chen; Li, Jing; Gui, Zhipeng; Sun, Min; Li, Zhenglong

2013-09-01

Many organizations start to adopt cloud computing for better utilizing computing resources by taking advantage of its scalability, cost reduction, and easy to access characteristics. Many private or community cloud computing platforms are being built using open-source cloud solutions. However, little has been done to systematically compare and evaluate the features and performance of open-source solutions in supporting Geosciences. This paper provides a comprehensive study of three open-source cloud solutions, including OpenNebula, Eucalyptus, and CloudStack. We compared a variety of features, capabilities, technologies and performances including: (1) general features and supported services for cloud resource creation and management, (2) advanced capabilities for networking and security, and (3) the performance of the cloud solutions in provisioning and operating the cloud resources as well as the performance of virtual machines initiated and managed by the cloud solutions in supporting selected geoscience applications. Our study found that: (1) no significant performance differences in central processing unit (CPU), memory and I/O of virtual machines created and managed by different solutions, (2) OpenNebula has the fastest internal network while both Eucalyptus and CloudStack have better virtual machine isolation and security strategies, (3) Cloudstack has the fastest operations in handling virtual machines, images, snapshots, volumes and networking, followed by OpenNebula, and (4) the selected cloud computing solutions are capable for supporting concurrent intensive web applications, computing intensive applications, and small-scale model simulations without intensive data communication.
The Role of Standards in Cloud-Computing Interoperability

DTIC Science & Technology

2012-10-01

services are not shared outside the organization. CloudStack, Eucalyptus, HP, Microsoft, OpenStack , Ubuntu, and VMWare provide tools for building...center requirements • Developing usage models for cloud ven- dors • Independent IT consortium OpenStack http://www.openstack.org • Open-source...software for running private clouds • Currently consists of three core software projects: OpenStack Compute (Nova), OpenStack Object Storage (Swift
ProteoCloud: a full-featured open source proteomics cloud computing pipeline.

PubMed

Muth, Thilo; Peters, Julian; Blackburn, Jonathan; Rapp, Erdmann; Martens, Lennart

2013-08-02

We here present the ProteoCloud pipeline, a freely available, full-featured cloud-based platform to perform computationally intensive, exhaustive searches in a cloud environment using five different peptide identification algorithms. ProteoCloud is entirely open source, and is built around an easy to use and cross-platform software client with a rich graphical user interface. This client allows full control of the number of cloud instances to initiate and of the spectra to assign for identification. It also enables the user to track progress, and to visualize and interpret the results in detail. Source code, binaries and documentation are all available at http://proteocloud.googlecode.com. Copyright © 2012 Elsevier B.V. All rights reserved.
Creating a Rackspace and NASA Nebula compatible cloud using the OpenStack project (Invited)

NASA Astrophysics Data System (ADS)

Clark, R.

2010-12-01

NASA and Rackspace have both provided technology to the OpenStack that allows anyone to create a private Infrastructure as a Service (IaaS) cloud using open source software and commodity hardware. OpenStack is designed and developed completely in the open and with an open governance process. NASA donated Nova, which powers the compute portion of NASA Nebula Cloud Computing Platform, and Rackspace donated Swift, which powers Rackspace Cloud Files. The project is now in continuous development by NASA, Rackspace, and hundreds of other participants. When you create a private cloud using Openstack, you will have the ability to easily interact with your private cloud, a government cloud, and an ecosystem of public cloud providers, using the same API.
Cloud Computing for DoD

DTIC Science & Technology

2012-05-01

cloud computing 17 NASA Nebula Platform •  Cloud computing pilot program at NASA Ames •  Integrates open-source components into seamless, self...Mission support •  Education and public outreach (NASA Nebula , 2010) 18 NSF Supported Cloud Research •  Support for Cloud Computing in...Mell, P. & Grance, T. (2011). The NIST Definition of Cloud Computing. NIST Special Publication 800-145 •  NASA Nebula (2010). Retrieved from
Elastic Cloud Computing Infrastructures in the Open Cirrus Testbed Implemented via Eucalyptus

NASA Astrophysics Data System (ADS)

Baun, Christian; Kunze, Marcel

Cloud computing realizes the advantages and overcomes some restrictionsof the grid computing paradigm. Elastic infrastructures can easily be createdand managed by cloud users. In order to accelerate the research ondata center management and cloud services the OpenCirrusTM researchtestbed has been started by HP, Intel and Yahoo!. Although commercialcloud offerings are proprietary, Open Source solutions exist in the field ofIaaS with Eucalyptus, PaaS with AppScale and at the applications layerwith Hadoop MapReduce. This paper examines the I/O performance ofcloud computing infrastructures implemented with Eucalyptus in contrastto Amazon S3.
Cloud computing geospatial application for water resources based on free and open source software and open standards - a prototype

NASA Astrophysics Data System (ADS)

Delipetrev, Blagoj

2016-04-01

Presently, most of the existing software is desktop-based, designed to work on a single computer, which represents a major limitation in many ways, starting from limited computer processing, storage power, accessibility, availability, etc. The only feasible solution lies in the web and cloud. This abstract presents research and development of a cloud computing geospatial application for water resources based on free and open source software and open standards using hybrid deployment model of public - private cloud, running on two separate virtual machines (VMs). The first one (VM1) is running on Amazon web services (AWS) and the second one (VM2) is running on a Xen cloud platform. The presented cloud application is developed using free and open source software, open standards and prototype code. The cloud application presents a framework how to develop specialized cloud geospatial application that needs only a web browser to be used. This cloud application is the ultimate collaboration geospatial platform because multiple users across the globe with internet connection and browser can jointly model geospatial objects, enter attribute data and information, execute algorithms, and visualize results. The presented cloud application is: available all the time, accessible from everywhere, it is scalable, works in a distributed computer environment, it creates a real-time multiuser collaboration platform, the programing languages code and components are interoperable, and it is flexible in including additional components. The cloud geospatial application is implemented as a specialized water resources application with three web services for 1) data infrastructure (DI), 2) support for water resources modelling (WRM), 3) user management. The web services are running on two VMs that are communicating over the internet providing services to users. The application was tested on the Zletovica river basin case study with concurrent multiple users. The application is a state-of-the-art cloud geospatial collaboration platform. The presented solution is a prototype and can be used as a foundation for developing of any specialized cloud geospatial applications. Further research will be focused on distributing the cloud application on additional VMs, testing the scalability and availability of services.
CSNS computing environment Based on OpenStack

NASA Astrophysics Data System (ADS)

Li, Yakang; Qi, Fazhi; Chen, Gang; Wang, Yanming; Hong, Jianshu

2017-10-01

Cloud computing can allow for more flexible configuration of IT resources and optimized hardware utilization, it also can provide computing service according to the real need. We are applying this computing mode to the China Spallation Neutron Source(CSNS) computing environment. So, firstly, CSNS experiment and its computing scenarios and requirements are introduced in this paper. Secondly, the design and practice of cloud computing platform based on OpenStack are mainly demonstrated from the aspects of cloud computing system framework, network, storage and so on. Thirdly, some improvments to openstack we made are discussed further. Finally, current status of CSNS cloud computing environment are summarized in the ending of this paper.
OpenID Connect as a security service in cloud-based medical imaging systems.

PubMed

Ma, Weina; Sartipi, Kamran; Sharghigoorabi, Hassan; Koff, David; Bak, Peter

2016-04-01

The evolution of cloud computing is driving the next generation of medical imaging systems. However, privacy and security concerns have been consistently regarded as the major obstacles for adoption of cloud computing by healthcare domains. OpenID Connect, combining OpenID and OAuth together, is an emerging representational state transfer-based federated identity solution. It is one of the most adopted open standards to potentially become the de facto standard for securing cloud computing and mobile applications, which is also regarded as "Kerberos of cloud." We introduce OpenID Connect as an authentication and authorization service in cloud-based diagnostic imaging (DI) systems, and propose enhancements that allow for incorporating this technology within distributed enterprise environments. The objective of this study is to offer solutions for secure sharing of medical images among diagnostic imaging repository (DI-r) and heterogeneous picture archiving and communication systems (PACS) as well as Web-based and mobile clients in the cloud ecosystem. The main objective is to use OpenID Connect open-source single sign-on and authorization service and in a user-centric manner, while deploying DI-r and PACS to private or community clouds should provide equivalent security levels to traditional computing model.
OpenID connect as a security service in Cloud-based diagnostic imaging systems

NASA Astrophysics Data System (ADS)

Ma, Weina; Sartipi, Kamran; Sharghi, Hassan; Koff, David; Bak, Peter

2015-03-01

The evolution of cloud computing is driving the next generation of diagnostic imaging (DI) systems. Cloud-based DI systems are able to deliver better services to patients without constraining to their own physical facilities. However, privacy and security concerns have been consistently regarded as the major obstacle for adoption of cloud computing by healthcare domains. Furthermore, traditional computing models and interfaces employed by DI systems are not ready for accessing diagnostic images through mobile devices. RESTful is an ideal technology for provisioning both mobile services and cloud computing. OpenID Connect, combining OpenID and OAuth together, is an emerging REST-based federated identity solution. It is one of the most perspective open standards to potentially become the de-facto standard for securing cloud computing and mobile applications, which has ever been regarded as "Kerberos of Cloud". We introduce OpenID Connect as an identity and authentication service in cloud-based DI systems and propose enhancements that allow for incorporating this technology within distributed enterprise environment. The objective of this study is to offer solutions for secure radiology image sharing among DI-r (Diagnostic Imaging Repository) and heterogeneous PACS (Picture Archiving and Communication Systems) as well as mobile clients in the cloud ecosystem. Through using OpenID Connect as an open-source identity and authentication service, deploying DI-r and PACS to private or community clouds should obtain equivalent security level to traditional computing model.
OpenID Connect as a security service in cloud-based medical imaging systems

PubMed Central

Ma, Weina; Sartipi, Kamran; Sharghigoorabi, Hassan; Koff, David; Bak, Peter

2016-01-01

Abstract. The evolution of cloud computing is driving the next generation of medical imaging systems. However, privacy and security concerns have been consistently regarded as the major obstacles for adoption of cloud computing by healthcare domains. OpenID Connect, combining OpenID and OAuth together, is an emerging representational state transfer-based federated identity solution. It is one of the most adopted open standards to potentially become the de facto standard for securing cloud computing and mobile applications, which is also regarded as “Kerberos of cloud.” We introduce OpenID Connect as an authentication and authorization service in cloud-based diagnostic imaging (DI) systems, and propose enhancements that allow for incorporating this technology within distributed enterprise environments. The objective of this study is to offer solutions for secure sharing of medical images among diagnostic imaging repository (DI-r) and heterogeneous picture archiving and communication systems (PACS) as well as Web-based and mobile clients in the cloud ecosystem. The main objective is to use OpenID Connect open-source single sign-on and authorization service and in a user-centric manner, while deploying DI-r and PACS to private or community clouds should provide equivalent security levels to traditional computing model. PMID:27340682
Dynamic VM Provisioning for TORQUE in a Cloud Environment

NASA Astrophysics Data System (ADS)

Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.

2014-06-01

Cloud computing, also known as an Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.
Hybrid cloud: bridging of private and public cloud computing

NASA Astrophysics Data System (ADS)

Aryotejo, Guruh; Kristiyanto, Daniel Y.; Mufadhol

2018-05-01

Cloud Computing is quickly emerging as a promising paradigm in the recent years especially for the business sector. In addition, through cloud service providers, cloud computing is widely used by Information Technology (IT) based startup company to grow their business. However, the level of most businesses awareness on data security issues is low, since some Cloud Service Provider (CSP) could decrypt their data. Hybrid Cloud Deployment Model (HCDM) has characteristic as open source, which is one of secure cloud computing model, thus HCDM may solve data security issues. The objective of this study is to design, deploy and evaluate a HCDM as Infrastructure as a Service (IaaS). In the implementation process, Metal as a Service (MAAS) engine was used as a base to build an actual server and node. Followed by installing the vsftpd application, which serves as FTP server. In comparison with HCDM, public cloud was adopted through public cloud interface. As a result, the design and deployment of HCDM was conducted successfully, instead of having good security, HCDM able to transfer data faster than public cloud significantly. To the best of our knowledge, Hybrid Cloud Deployment model is one of secure cloud computing model due to its characteristic as open source. Furthermore, this study will serve as a base for future studies about Hybrid Cloud Deployment model which may relevant for solving big security issues of IT-based startup companies especially in Indonesia.
Menu-driven cloud computing and resource sharing for R and Bioconductor.

PubMed

Bolouri, Hamid; Dulepet, Rajiv; Angerman, Michael

2011-08-15

We report CRdata.org, a cloud-based, free, open-source web server for running analyses and sharing data and R scripts with others. In addition to using the free, public service, CRdata users can launch their own private Amazon Elastic Computing Cloud (EC2) nodes and store private data and scripts on Amazon's Simple Storage Service (S3) with user-controlled access rights. All CRdata services are provided via point-and-click menus. CRdata is open-source and free under the permissive MIT License (opensource.org/licenses/mit-license.php). The source code is in Ruby (ruby-lang.org/en/) and available at: github.com/seerdata/crdata. hbolouri@fhcrc.org.
Web Solutions Inspire Cloud Computing Software

NASA Technical Reports Server (NTRS)

2013-01-01

An effort at Ames Research Center to standardize NASA websites unexpectedly led to a breakthrough in open source cloud computing technology. With the help of Rackspace Inc. of San Antonio, Texas, the resulting product, OpenStack, has spurred the growth of an entire industry that is already employing hundreds of people and generating hundreds of millions in revenue.
Menu-driven cloud computing and resource sharing for R and Bioconductor

PubMed Central

Bolouri, Hamid; Angerman, Michael

2011-01-01

Summary: We report CRdata.org, a cloud-based, free, open-source web server for running analyses and sharing data and R scripts with others. In addition to using the free, public service, CRdata users can launch their own private Amazon Elastic Computing Cloud (EC2) nodes and store private data and scripts on Amazon's Simple Storage Service (S3) with user-controlled access rights. All CRdata services are provided via point-and-click menus. Availability and Implementation: CRdata is open-source and free under the permissive MIT License (opensource.org/licenses/mit-license.php). The source code is in Ruby (ruby-lang.org/en/) and available at: github.com/seerdata/crdata. Contact: hbolouri@fhcrc.org PMID:21685055
An Interactive Web-Based Analysis Framework for Remote Sensing Cloud Computing

NASA Astrophysics Data System (ADS)

Wang, X. Z.; Zhang, H. M.; Zhao, J. H.; Lin, Q. H.; Zhou, Y. C.; Li, J. H.

2015-07-01

Spatiotemporal data, especially remote sensing data, are widely used in ecological, geographical, agriculture, and military research and applications. With the development of remote sensing technology, more and more remote sensing data are accumulated and stored in the cloud. An effective way for cloud users to access and analyse these massive spatiotemporal data in the web clients becomes an urgent issue. In this paper, we proposed a new scalable, interactive and web-based cloud computing solution for massive remote sensing data analysis. We build a spatiotemporal analysis platform to provide the end-user with a safe and convenient way to access massive remote sensing data stored in the cloud. The lightweight cloud storage system used to store public data and users' private data is constructed based on open source distributed file system. In it, massive remote sensing data are stored as public data, while the intermediate and input data are stored as private data. The elastic, scalable, and flexible cloud computing environment is built using Docker, which is a technology of open-source lightweight cloud computing container in the Linux operating system. In the Docker container, open-source software such as IPython, NumPy, GDAL, and Grass GIS etc., are deployed. Users can write scripts in the IPython Notebook web page through the web browser to process data, and the scripts will be submitted to IPython kernel to be executed. By comparing the performance of remote sensing data analysis tasks executed in Docker container, KVM virtual machines and physical machines respectively, we can conclude that the cloud computing environment built by Docker makes the greatest use of the host system resources, and can handle more concurrent spatial-temporal computing tasks. Docker technology provides resource isolation mechanism in aspects of IO, CPU, and memory etc., which offers security guarantee when processing remote sensing data in the IPython Notebook. Users can write complex data processing code on the web directly, so they can design their own data processing algorithm.
Open Source Cloud-Based Technologies for Bim

NASA Astrophysics Data System (ADS)

Logothetis, S.; Karachaliou, E.; Valari, E.; Stylianidis, E.

2018-05-01

This paper presents a Cloud-based open source system for storing and processing data from a 3D survey approach. More specifically, we provide an online service for viewing, storing and analysing BIM. Cloud technologies were used to develop a web interface as a BIM data centre, which can handle large BIM data using a server. The server can be accessed by many users through various electronic devices anytime and anywhere so they can view online 3D models using browsers. Nowadays, the Cloud computing is engaged progressively in facilitating BIM-based collaboration between the multiple stakeholders and disciplinary groups for complicated Architectural, Engineering and Construction (AEC) projects. Besides, the development of Open Source Software (OSS) has been rapidly growing and their use tends to be united. Although BIM and Cloud technologies are extensively known and used, there is a lack of integrated open source Cloud-based platforms able to support all stages of BIM processes. The present research aims to create an open source Cloud-based BIM system that is able to handle geospatial data. In this effort, only open source tools will be used; from the starting point of creating the 3D model with FreeCAD to its online presentation through BIMserver. Python plug-ins will be developed to link the two software which will be distributed and freely available to a large community of professional for their use. The research work will be completed by benchmarking four Cloud-based BIM systems: Autodesk BIM 360, BIMserver, Graphisoft BIMcloud and Onuma System, which present remarkable results.
Dynamic Extension of a Virtualized Cluster by using Cloud Resources

NASA Astrophysics Data System (ADS)

Oberst, Oliver; Hauth, Thomas; Kernert, David; Riedel, Stephan; Quast, Günter

2012-12-01

The specific requirements concerning the software environment within the HEP community constrain the choice of resource providers for the outsourcing of computing infrastructure. The use of virtualization in HPC clusters and in the context of cloud resources is therefore a subject of recent developments in scientific computing. The dynamic virtualization of worker nodes in common batch systems provided by ViBatch serves each user with a dynamically virtualized subset of worker nodes on a local cluster. Now it can be transparently extended by the use of common open source cloud interfaces like OpenNebula or Eucalyptus, launching a subset of the virtual worker nodes within the cloud. This paper demonstrates how a dynamically virtualized computing cluster is combined with cloud resources by attaching remotely started virtual worker nodes to the local batch system.

STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.

PubMed

Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T

2014-01-01

The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.
Factors Influencing F/OSS Cloud Computing Software Product Success: A Quantitative Study

ERIC Educational Resources Information Center

Letort, D. Brian

2012-01-01

Cloud Computing introduces a new business operational model that allows an organization to shift information technology consumption from traditional capital expenditure to operational expenditure. This shift introduces challenges from both the adoption and creation vantage. This study evaluates factors that influence Free/Open Source Software…
Research on private cloud computing based on analysis on typical opensource platform: a case study with Eucalyptus and Wavemaker

NASA Astrophysics Data System (ADS)

Yu, Xiaoyuan; Yuan, Jian; Chen, Shi

2013-03-01

Cloud computing is one of the most popular topics in the IT industry and is recently being adopted by many companies. It has four development models, as: public cloud, community cloud, hybrid cloud and private cloud. Except others, private cloud can be implemented in a private network, and delivers some benefits of cloud computing without pitfalls. This paper makes a comparison of typical open source platforms through which we can implement a private cloud. After this comparison, we choose Eucalyptus and Wavemaker to do a case study on the private cloud. We also do some performance estimation of cloud platform services and development of prototype software as cloud services.
Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing

NASA Technical Reports Server (NTRS)

Pham, Long; Chen, Aijun; Kempler, Steven; Lynnes, Christopher; Theobald, Michael; Asghar, Esfandiari; Campino, Jane; Vollmer, Bruce

2011-01-01

Cloud Computing has been implemented in several commercial arenas. The NASA Nebula Cloud Computing platform is an Infrastructure as a Service (IaaS) built in 2008 at NASA Ames Research Center and 2010 at GSFC. Nebula is an open source Cloud platform intended to: a) Make NASA realize significant cost savings through efficient resource utilization, reduced energy consumption, and reduced labor costs. b) Provide an easier way for NASA scientists and researchers to efficiently explore and share large and complex data sets. c) Allow customers to provision, manage, and decommission computing capabilities on an as-needed bases
STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

PubMed Central

Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.

2014-01-01

The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756
The Integration of CloudStack and OCCI/OpenNebula with DIRAC

NASA Astrophysics Data System (ADS)

Méndez Muñoz, Víctor; Fernández Albor, Víctor; Graciani Diaz, Ricardo; Casajús Ramo, Adriàn; Fernández Pena, Tomás; Merino Arévalo, Gonzalo; José Saborido Silva, Juan

2012-12-01

The increasing availability of Cloud resources is arising as a realistic alternative to the Grid as a paradigm for enabling scientific communities to access large distributed computing resources. The DIRAC framework for distributed computing is an easy way to efficiently access to resources from both systems. This paper explains the integration of DIRAC with two open-source Cloud Managers: OpenNebula (taking advantage of the OCCI standard) and CloudStack. These are computing tools to manage the complexity and heterogeneity of distributed data center infrastructures, allowing to create virtual clusters on demand, including public, private and hybrid clouds. This approach has required to develop an extension to the previous DIRAC Virtual Machine engine, which was developed for Amazon EC2, allowing the connection with these new cloud managers. In the OpenNebula case, the development has been based on the CernVM Virtual Software Appliance with appropriate contextualization, while in the case of CloudStack, the infrastructure has been kept more general, which permits other Virtual Machine sources and operating systems being used. In both cases, CernVM File System has been used to facilitate software distribution to the computing nodes. With the resulting infrastructure, the cloud resources are transparent to the users through a friendly interface, like the DIRAC Web Portal. The main purpose of this integration is to get a system that can manage cloud and grid resources at the same time. This particular feature pushes DIRAC to a new conceptual denomination as interware, integrating different middleware. Users from different communities do not need to care about the installation of the standard software that is available at the nodes, nor the operating system of the host machine which is transparent to the user. This paper presents an analysis of the overhead of the virtual layer, doing some tests to compare the proposed approach with the existing Grid solution. License Notice: Published under licence in Journal of Physics: Conference Series by IOP Publishing Ltd.
Mapping urban green open space in Bontang city using QGIS and cloud computing

NASA Astrophysics Data System (ADS)

Agus, F.; Ramadiani; Silalahi, W.; Armanda, A.; Kusnandar

2018-04-01

Digital mapping techniques are available freely and openly so that map-based application development is easier, faster and cheaper. A rapid development of Cloud Computing Geographic Information System makes this system can help the needs of the community for the provision of geospatial information online. The presence of urban Green Open Space (GOS) provide great benefits as an oxygen supplier, carbon-binding agent and can contribute to providing comfort and beauty of city life. This study aims to propose a platform application of GIS Cloud Computing (CC) of Bontang City GOS mapping. The GIS-CC platform uses the basic map available that’s free and open source. The research used survey method to collect GOS data obtained from Bontang City Government, while application developing works Quantum GIS-CC. The result section describes the existence of GOS Bontang City and the design of GOS mapping application.
AstroCloud, a Cyber-Infrastructure for Astronomy Research: Cloud Computing Environments

NASA Astrophysics Data System (ADS)

Li, C.; Wang, J.; Cui, C.; He, B.; Fan, D.; Yang, Y.; Chen, J.; Zhang, H.; Yu, C.; Xiao, J.; Wang, C.; Cao, Z.; Fan, Y.; Hong, Z.; Li, S.; Mi, L.; Wan, W.; Wang, J.; Yin, S.

2015-09-01

AstroCloud is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on CloudStack, an open source software, we set up the cloud computing environment for AstroCloud Project. It consists of five distributed nodes across the mainland of China. Users can use and analysis data in this cloud computing environment. Based on GlusterFS, we built a scalable cloud storage system. Each user has a private space, which can be shared among different virtual machines and desktop systems. With this environments, astronomer can access to astronomical data collected by different telescopes and data centers easily, and data producers can archive their datasets safely.
Identification of Program Signatures from Cloud Computing System Telemetry Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nichols, Nicole M.; Greaves, Mark T.; Smith, William P.

Malicious cloud computing activity can take many forms, including running unauthorized programs in a virtual environment. Detection of these malicious activities while preserving the privacy of the user is an important research challenge. Prior work has shown the potential viability of using cloud service billing metrics as a mechanism for proxy identification of malicious programs. Previously this novel detection method has been evaluated in a synthetic and isolated computational environment. In this paper we demonstrate the ability of billing metrics to identify programs, in an active cloud computing environment, including multiple virtual machines running on the same hypervisor. The openmore » source cloud computing platform OpenStack, is used for private cloud management at Pacific Northwest National Laboratory. OpenStack provides a billing tool (Ceilometer) to collect system telemetry measurements. We identify four different programs running on four virtual machines under the same cloud user account. Programs were identified with up to 95% accuracy. This accuracy is dependent on the distinctiveness of telemetry measurements for the specific programs we tested. Future work will examine the scalability of this approach for a larger selection of programs to better understand the uniqueness needed to identify a program. Additionally, future work should address the separation of signatures when multiple programs are running on the same virtual machine.« less
Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

PubMed Central

Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

2009-01-01

One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

PubMed

Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

2009-06-01

One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).
A Simple Technique for Securing Data at Rest Stored in a Computing Cloud

NASA Astrophysics Data System (ADS)

Sedayao, Jeff; Su, Steven; Ma, Xiaohao; Jiang, Minghao; Miao, Kai

"Cloud Computing" offers many potential benefits, including cost savings, the ability to deploy applications and services quickly, and the ease of scaling those application and services once they are deployed. A key barrier for enterprise adoption is the confidentiality of data stored on Cloud Computing Infrastructure. Our simple technique implemented with Open Source software solves this problem by using public key encryption to render stored data at rest unreadable by unauthorized personnel, including system administrators of the cloud computing service on which the data is stored. We validate our approach on a network measurement system implemented on PlanetLab. We then use it on a service where confidentiality is critical - a scanning application that validates external firewall implementations.
Managing a tier-2 computer centre with a private cloud infrastructure

NASA Astrophysics Data System (ADS)

Bagnasco, Stefano; Berzano, Dario; Brunetti, Riccardo; Lusso, Stefano; Vallero, Sara

2014-06-01

In a typical scientific computing centre, several applications coexist and share a single physical infrastructure. An underlying Private Cloud infrastructure eases the management and maintenance of such heterogeneous applications (such as multipurpose or application-specific batch farms, Grid sites, interactive data analysis facilities and others), allowing dynamic allocation resources to any application. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques. Such infrastructures are being deployed in some large centres (see e.g. the CERN Agile Infrastructure project), but with several open-source tools reaching maturity this is becoming viable also for smaller sites. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, that hosts a full-fledged WLCG Tier-2 centre, an Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The private cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem and the OpenWRT Linux distribution (used for network virtualization); a future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and OCCI.
A computational- And storage-cloud for integration of biodiversity collections

USGS Publications Warehouse

Matsunaga, A.; Thompson, A.; Figueiredo, R. J.; Germain-Aubrey, C.C; Collins, M.; Beeman, R.S; Macfadden, B.J.; Riccardi, G.; Soltis, P.S; Page, L. M.; Fortes, J.A.B

2013-01-01

A core mission of the Integrated Digitized Biocollections (iDigBio) project is the building and deployment of a cloud computing environment customized to support the digitization workflow and integration of data from all U.S. nonfederal biocollections. iDigBio chose to use cloud computing technologies to deliver a cyberinfrastructure that is flexible, agile, resilient, and scalable to meet the needs of the biodiversity community. In this context, this paper describes the integration of open source cloud middleware, applications, and third party services using standard formats, protocols, and services. In addition, this paper demonstrates the value of the digitized information from collections in a broader scenario involving multiple disciplines.
Scalable cloud without dedicated storage

NASA Astrophysics Data System (ADS)

Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.

2015-05-01

We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.
Sector and Sphere: the design and implementation of a high-performance data cloud

PubMed Central

Gu, Yunhong; Grossman, Robert L.

2009-01-01

Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. PMID:19451100
Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem

NASA Astrophysics Data System (ADS)

Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.

2015-12-01

Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Subsequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.
A Cloud-Based Infrastructure for Near-Real-Time Processing and Dissemination of NPP Data

NASA Astrophysics Data System (ADS)

Evans, J. D.; Valente, E. G.; Chettri, S. S.

2011-12-01

We are building a scalable cloud-based infrastructure for generating and disseminating near-real-time data products from a variety of geospatial and meteorological data sources, including the new National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP). Our approach relies on linking Direct Broadcast and other data streams to a suite of scientific algorithms coordinated by NASA's International Polar-Orbiter Processing Package (IPOPP). The resulting data products are directly accessible to a wide variety of end-user applications, via industry-standard protocols such as OGC Web Services, Unidata Local Data Manager, or OPeNDAP, using open source software components. The processing chain employs on-demand computing resources from Amazon.com's Elastic Compute Cloud and NASA's Nebula cloud services. Our current prototype targets short-term weather forecasting, in collaboration with NASA's Short-term Prediction Research and Transition (SPoRT) program and the National Weather Service. Direct Broadcast is especially crucial for NPP, whose current ground segment is unlikely to deliver data quickly enough for short-term weather forecasters and other near-real-time users. Direct Broadcast also allows full local control over data handling, from the receiving antenna to end-user applications: this provides opportunities to streamline processes for data ingest, processing, and dissemination, and thus to make interpreted data products (Environmental Data Records) available to practitioners within minutes of data capture at the sensor. Cloud computing lets us grow and shrink computing resources to meet large and rapid fluctuations in data availability (twice daily for polar orbiters) - and similarly large fluctuations in demand from our target (near-real-time) users. This offers a compelling business case for cloud computing: the processing or dissemination systems can grow arbitrarily large to sustain near-real time data access despite surges in data volumes or user demand, but that computing capacity (and hourly costs) can be dropped almost instantly once the surge passes. Cloud computing also allows low-risk experimentation with a variety of machine architectures (processor types; bandwidth, memory, and storage capacities, etc.) and of system configurations (including massively parallel computing patterns). Finally, our service-based approach (in which user applications invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored products on demand. To maximize the usefulness and impact of our technology, we have emphasized open, industry-standard software interfaces. We are also using and developing open source software to facilitate the widespread adoption of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sources.
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.

PubMed

Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina

2013-01-28

Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research--GATECloud. net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
A Platform for Scalable Satellite and Geospatial Data Analysis

NASA Astrophysics Data System (ADS)

Beneke, C. M.; Skillman, S.; Warren, M. S.; Kelton, T.; Brumby, S. P.; Chartrand, R.; Mathis, M.

2017-12-01

At Descartes Labs, we use the commercial cloud to run global-scale machine learning applications over satellite imagery. We have processed over 5 Petabytes of public and commercial satellite imagery, including the full Landsat and Sentinel archives. By combining open-source tools with a FUSE-based filesystem for cloud storage, we have enabled a scalable compute platform that has demonstrated reading over 200 GB/s of satellite imagery into cloud compute nodes. In one application, we generated global 15m Landsat-8, 20m Sentinel-1, and 10m Sentinel-2 composites from 15 trillion pixels, using over 10,000 CPUs. We recently created a public open-source Python client library that can be used to query and access preprocessed public satellite imagery from within our platform, and made this platform available to researchers for non-commercial projects. In this session, we will describe how you can use the Descartes Labs Platform for rapid prototyping and scaling of geospatial analyses and demonstrate examples in land cover classification.

Digital Textbooks. Research Brief

ERIC Educational Resources Information Center

Johnston, Howard

2011-01-01

Despite their growing popularity, digital alternatives to conventional textbooks are stirring up controversy. With the introduction of tablet computers, and the growing trend toward "cloud computing" and "open source" software, the trend is accelerating because costs are coming down and free or inexpensive materials are becoming more available.…
Hybrid cloud and cluster computing paradigms for life science applications

PubMed Central

2010-01-01

Background Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Results Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. Conclusions The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. Methods We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments. PMID:21210982
Hybrid cloud and cluster computing paradigms for life science applications.

PubMed

Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey

2010-12-21

Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing especially for parallel data intensive applications. However they have limited applicability to some areas such as data mining because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system Twister. Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability comparisons in several important non iterative cases. These are linked to MPI applications for final stages of the data analysis. Further we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment while Twister promises a uniform programming environment for many Life Sciences applications. We used commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.
Genes2WordCloud: a quick way to identify biological themes from gene lists and free text.

PubMed

Baroukh, Caroline; Jenkins, Sherry L; Dannenfelser, Ruth; Ma'ayan, Avi

2011-10-13

Word-clouds recently emerged on the web as a solution for quickly summarizing text by maximizing the display of most relevant terms about a specific topic in the minimum amount of space. As biologists are faced with the daunting amount of new research data commonly presented in textual formats, word-clouds can be used to summarize and represent biological and/or biomedical content for various applications. Genes2WordCloud is a web application that enables users to quickly identify biological themes from gene lists and research relevant text by constructing and displaying word-clouds. It provides users with several different options and ideas for the sources that can be used to generate a word-cloud. Different options for rendering and coloring the word-clouds give users the flexibility to quickly generate customized word-clouds of their choice. Genes2WordCloud is a word-cloud generator and a word-cloud viewer that is based on WordCram implemented using Java, Processing, AJAX, mySQL, and PHP. Text is fetched from several sources and then processed to extract the most relevant terms with their computed weights based on word frequencies. Genes2WordCloud is freely available for use online; it is open source software and is available for installation on any web-site along with supporting documentation at http://www.maayanlab.net/G2W. Genes2WordCloud provides a useful way to summarize and visualize large amounts of textual biological data or to find biological themes from several different sources. The open source availability of the software enables users to implement customized word-clouds on their own web-sites and desktop applications.
Genes2WordCloud: a quick way to identify biological themes from gene lists and free text

PubMed Central

2011-01-01

Background Word-clouds recently emerged on the web as a solution for quickly summarizing text by maximizing the display of most relevant terms about a specific topic in the minimum amount of space. As biologists are faced with the daunting amount of new research data commonly presented in textual formats, word-clouds can be used to summarize and represent biological and/or biomedical content for various applications. Results Genes2WordCloud is a web application that enables users to quickly identify biological themes from gene lists and research relevant text by constructing and displaying word-clouds. It provides users with several different options and ideas for the sources that can be used to generate a word-cloud. Different options for rendering and coloring the word-clouds give users the flexibility to quickly generate customized word-clouds of their choice. Methods Genes2WordCloud is a word-cloud generator and a word-cloud viewer that is based on WordCram implemented using Java, Processing, AJAX, mySQL, and PHP. Text is fetched from several sources and then processed to extract the most relevant terms with their computed weights based on word frequencies. Genes2WordCloud is freely available for use online; it is open source software and is available for installation on any web-site along with supporting documentation at http://www.maayanlab.net/G2W. Conclusions Genes2WordCloud provides a useful way to summarize and visualize large amounts of textual biological data or to find biological themes from several different sources. The open source availability of the software enables users to implement customized word-clouds on their own web-sites and desktop applications. PMID:21995939
Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

NASA Astrophysics Data System (ADS)

Li, Rui; Chen, Lei; Li, Wen-Syan

Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many others) requires users to configure cloud infrastructures via programs and APIs and such configuration is fixed during the runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and a workload consisting of multiple jobs running concurrently, which aims at maximum throughput using a minimum set of processors.
Laptops and Inspired Writing

ERIC Educational Resources Information Center

Warschauer, Mark; Arada, Kathleen; Zheng, Binbin

2010-01-01

Can daily access to laptop computers help students become better writers? Are such programs affordable? Evidence from the Inspired Writing program in Littleton Public Schools, Colorado, USA, provides a resounding yes to both questions. The program employs student netbooks, open-source software, cloud computing, and social media to help students in…
The Research of the Parallel Computing Development from the Angle of Cloud Computing

NASA Astrophysics Data System (ADS)

Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

2017-10-01

Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
!CHAOS: A cloud of controls

NASA Astrophysics Data System (ADS)

Angius, S.; Bisegni, C.; Ciuffetti, P.; Di Pirro, G.; Foggetta, L. G.; Galletti, F.; Gargana, R.; Gioscio, E.; Maselli, D.; Mazzitelli, G.; Michelotti, A.; Orrù, R.; Pistoni, M.; Spagnoli, F.; Spigone, D.; Stecchi, A.; Tonto, T.; Tota, M. A.; Catani, L.; Di Giulio, C.; Salina, G.; Buzzi, P.; Checcucci, B.; Lubrano, P.; Piccini, M.; Fattibene, E.; Michelotto, M.; Cavallaro, S. R.; Diana, B. F.; Enrico, F.; Pulvirenti, S.

2016-01-01

The paper is aimed to present the !CHAOS open source project aimed to develop a prototype of a national private Cloud Computing infrastructure, devoted to accelerator control systems and large experiments of High Energy Physics (HEP). The !CHAOS project has been financed by MIUR (Italian Ministry of Research and Education) and aims to develop a new concept of control system and data acquisition framework by providing, with a high level of aaabstraction, all the services needed for controlling and managing a large scientific, or non-scientific, infrastructure. A beta version of the !CHAOS infrastructure will be released at the end of December 2015 and will run on private Cloud infrastructures based on OpenStack.
Cloud-based Jupyter Notebooks for Water Data Analysis

NASA Astrophysics Data System (ADS)

Castronova, A. M.; Brazil, L.; Seul, M.

2017-12-01

The development and adoption of technologies by the water science community to improve our ability to openly collaborate and share workflows will have a transformative impact on how we address the challenges associated with collaborative and reproducible scientific research. Jupyter notebooks offer one solution by providing an open-source platform for creating metadata-rich toolchains for modeling and data analysis applications. Adoption of this technology within the water sciences, coupled with publicly available datasets from agencies such as USGS, NASA, and EPA enables researchers to easily prototype and execute data intensive toolchains. Moreover, implementing this software stack in a cloud-based environment extends its native functionality to provide researchers a mechanism to build and execute toolchains that are too large or computationally demanding for typical desktop computers. Additionally, this cloud-based solution enables scientists to disseminate data processing routines alongside journal publications in an effort to support reproducibility. For example, these data collection and analysis toolchains can be shared, archived, and published using the HydroShare platform or downloaded and executed locally to reproduce scientific analysis. This work presents the design and implementation of a cloud-based Jupyter environment and its application for collecting, aggregating, and munging various datasets in a transparent, sharable, and self-documented manner. The goals of this work are to establish a free and open source platform for domain scientists to (1) conduct data intensive and computationally intensive collaborative research, (2) utilize high performance libraries, models, and routines within a pre-configured cloud environment, and (3) enable dissemination of research products. This presentation will discuss recent efforts towards achieving these goals, and describe the architectural design of the notebook server in an effort to support collaborative and reproducible science.
Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*

PubMed Central

Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.

2015-01-01

Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363
Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.

PubMed

Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L

2015-02-01

Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of cloud enabled Trans-Proteomic Pipeline by performing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Waggle: A Framework for Intelligent Attentive Sensing and Actuation

NASA Astrophysics Data System (ADS)

Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.

2014-12-01

Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both insitu and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports insitu computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.
Large-scale virtual screening on public cloud resources with Apache Spark.

PubMed

Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola

2017-01-01

Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive, however it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure rate hardware and fast network connection. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against [Formula: see text]2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel Structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then to scale to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).Graphical abstract.
NASA Update for Unidata Stratcomm

NASA Technical Reports Server (NTRS)

Lynnes, Chris

2017-01-01

The NASA representative to the Unidata Strategic Committee presented a semiannual update on NASAs work with and use of Unidata technologies. The talk updated Unidata on the program of cloud computing prototypes underway for the Earth Observing System Data and Information System (EOSDIS). Also discussed was a trade study on the use of the Open source Project for a Network Data Access Protocol (OPeNDAP) with Web Object Storage in the cloud.
LINUX, Virtualization, and the Cloud: A Hands-On Student Introductory Lab

ERIC Educational Resources Information Center

Serapiglia, Anthony

2013-01-01

Many students are entering Computer Science education with limited exposure to operating systems and applications other than those produced by Apple or Microsoft. This gap in familiarity with the Open Source community can quickly be bridged with a simple exercise that can also be used to strengthen two other important current computing concepts,…
Towards Efficient Scientific Data Management Using Cloud Storage

NASA Technical Reports Server (NTRS)

He, Qiming

2013-01-01

A software prototype allows users to backup and restore data to/from both public and private cloud storage such as Amazon's S3 and NASA's Nebula. Unlike other off-the-shelf tools, this software ensures user data security in the cloud (through encryption), and minimizes users operating costs by using space- and bandwidth-efficient compression and incremental backup. Parallel data processing utilities have also been developed by using massively scalable cloud computing in conjunction with cloud storage. One of the innovations in this software is using modified open source components to work with a private cloud like NASA Nebula. Another innovation is porting the complex backup to- cloud software to embedded Linux, running on the home networking devices, in order to benefit more users.
Automating NEURON Simulation Deployment in Cloud Resources.

PubMed

Stockton, David B; Santamaria, Fidel

2017-01-01

Simulations in neuroscience are performed on local servers or High Performance Computing (HPC) facilities. Recently, cloud computing has emerged as a potential computational platform for neuroscience simulation. In this paper we compare and contrast HPC and cloud resources for scientific computation, then report how we deployed NEURON, a widely used simulator of neuronal activity, in three clouds: Chameleon Cloud, a hybrid private academic cloud for cloud technology research based on the OpenStack software; Rackspace, a public commercial cloud, also based on OpenStack; and Amazon Elastic Cloud Computing, based on Amazon's proprietary software. We describe the manual procedures and how to automate cloud operations. We describe extending our simulation automation software called NeuroManager (Stockton and Santamaria, Frontiers in Neuroinformatics, 2015), so that the user is capable of recruiting private cloud, public cloud, HPC, and local servers simultaneously with a simple common interface. We conclude by performing several studies in which we examine speedup, efficiency, total session time, and cost for sets of simulations of a published NEURON model.
Automating NEURON Simulation Deployment in Cloud Resources

PubMed Central

Santamaria, Fidel

2016-01-01

Simulations in neuroscience are performed on local servers or High Performance Computing (HPC) facilities. Recently, cloud computing has emerged as a potential computational platform for neuroscience simulation. In this paper we compare and contrast HPC and cloud resources for scientific computation, then report how we deployed NEURON, a widely used simulator of neuronal activity, in three clouds: Chameleon Cloud, a hybrid private academic cloud for cloud technology research based on the Open-Stack software; Rackspace, a public commercial cloud, also based on OpenStack; and Amazon Elastic Cloud Computing, based on Amazon’s proprietary software. We describe the manual procedures and how to automate cloud operations. We describe extending our simulation automation software called NeuroManager (Stockton and Santamaria, Frontiers in Neuroinformatics, 2015), so that the user is capable of recruiting private cloud, public cloud, HPC, and local servers simultaneously with a simple common interface. We conclude by performing several studies in which we examine speedup, efficiency, total session time, and cost for sets of simulations of a published NEURON model. PMID:27655341
76 FR 62373 - Notice of Public Meeting-Cloud Computing Forum & Workshop IV

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-07

...--Cloud Computing Forum & Workshop IV AGENCY: National Institute of Standards and Technology (NIST), Commerce. ACTION: Notice. SUMMARY: NIST announces the Cloud Computing Forum & Workshop IV to be held on... to help develop open standards in interoperability, portability and security in cloud computing. This...

Application research of Ganglia in Hadoop monitoring and management

NASA Astrophysics Data System (ADS)

Li, Gang; Ding, Jing; Zhou, Lixia; Yang, Yi; Liu, Lei; Wang, Xiaolei

2017-03-01

There are many applications of Hadoop System in the field of large data, cloud computing. The test bench of storage and application in seismic network at Earthquake Administration of Tianjin use with Hadoop system, which is used the open source software of Ganglia to operate and monitor. This paper reviews the function, installation and configuration process, application effect of operating and monitoring in Hadoop system of the Ganglia system. It briefly introduces the idea and effect of Nagios software monitoring Hadoop system. It is valuable for the industry in the monitoring system of cloud computing platform.
WASS: An open-source pipeline for 3D stereo reconstruction of ocean waves

NASA Astrophysics Data System (ADS)

Bergamasco, Filippo; Torsello, Andrea; Sclavo, Mauro; Barbariol, Francesco; Benetazzo, Alvise

2017-10-01

Stereo 3D reconstruction of ocean waves is gaining more and more popularity in the oceanographic community and industry. Indeed, recent advances of both computer vision algorithms and computer processing power now allow the study of the spatio-temporal wave field with unprecedented accuracy, especially at small scales. Even if simple in theory, multiple details are difficult to be mastered for a practitioner, so that the implementation of a sea-waves 3D reconstruction pipeline is in general considered a complex task. For instance, camera calibration, reliable stereo feature matching and mean sea-plane estimation are all factors for which a well designed implementation can make the difference to obtain valuable results. For this reason, we believe that the open availability of a well tested software package that automates the reconstruction process from stereo images to a 3D point cloud would be a valuable addition for future researches in this area. We present WASS (http://www.dais.unive.it/wass), an Open-Source stereo processing pipeline for sea waves 3D reconstruction. Our tool completely automates all the steps required to estimate dense point clouds from stereo images. Namely, it computes the extrinsic parameters of the stereo rig so that no delicate calibration has to be performed on the field. It implements a fast 3D dense stereo reconstruction procedure based on the consolidated OpenCV library and, lastly, it includes set of filtering techniques both on the disparity map and the produced point cloud to remove the vast majority of erroneous points that can naturally arise while analyzing the optically complex nature of the water surface. In this paper, we describe the architecture of WASS and the internal algorithms involved. The pipeline workflow is shown step-by-step and demonstrated on real datasets acquired at sea.
75 FR 13258 - Announcing a Meeting of the Information Security and Privacy Advisory Board

Federal Register 2010, 2011, 2012, 2013, 2014

2010-03-19

.../index.html/ . Agenda: --Cloud Computing Implementations --Health IT --OpenID --Pending Cyber Security... will be available for the public and media. --OpenID --Cloud Computing Implementations --Security...
ooi: OpenStack OCCI interface

NASA Astrophysics Data System (ADS)

López García, Álvaro; Fernández del Castillo, Enol; Orviz Fernández, Pablo

In this document we present an implementation of the Open Grid Forum's Open Cloud Computing Interface (OCCI) for OpenStack, namely ooi (Openstack occi interface, 2015) [1]. OCCI is an open standard for management tasks over cloud resources, focused on interoperability, portability and integration. ooi aims to implement this open interface for the OpenStack cloud middleware, promoting interoperability with other OCCI-enabled cloud management frameworks and infrastructures. ooi focuses on being non-invasive with a vanilla OpenStack installation, not tied to a particular OpenStack release version.
CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

PubMed

Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D T; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung

2014-01-01

Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce

PubMed Central

Chung, Wei-Chun; Chen, Chien-Chih; Ho, Jan-Ming; Lin, Chung-Yen; Hsu, Wen-Lian; Wang, Yu-Chun; Lee, D. T.; Lai, Feipei; Huang, Chih-Wei; Chang, Yu-Jung

2014-01-01

Background Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. Results We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. Conclusions CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/. PMID:24897343
Environments for online maritime simulators with cloud computing capabilities

NASA Astrophysics Data System (ADS)

Raicu, Gabriel; Raicu, Alexandra

2016-12-01

This paper presents the cloud computing environments, network principles and methods for graphical development in realistic naval simulation, naval robotics and virtual interactions. The aim of this approach is to achieve a good simulation quality in large networked environments using open source solutions designed for educational purposes. Realistic rendering of maritime environments requires near real-time frameworks with enhanced computing capabilities during distance interactions. E-Navigation concepts coupled with the last achievements in virtual and augmented reality will enhance the overall experience leading to new developments and innovations. We have to deal with a multiprocessing situation using advanced technologies and distributed applications using remote ship scenario and automation of ship operations.
Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.

PubMed

Trudgian, David C; Mirzaei, Hamid

2012-12-07

We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.
Delivering Unidata Technology via the Cloud

NASA Astrophysics Data System (ADS)

Fisher, Ward; Oxelson Ganter, Jennifer

2016-04-01

Over the last two years, Docker has emerged as the clear leader in open-source containerization. Containerization technology provides a means by which software can be pre-configured and packaged into a single unit, i.e. a container. This container can then be easily deployed either on local or remote systems. Containerization is particularly advantageous when moving software into the cloud, as it simplifies the process. Unidata is adopting containerization as part of our commitment to migrate our technologies to the cloud. We are using a two-pronged approach in this endeavor. In addition to migrating our data-portal services to a cloud environment, we are also exploring new and novel ways to use cloud-specific technology to serve our community. This effort has resulted in several new cloud/Docker-specific projects at Unidata: "CloudStream," "CloudIDV," and "CloudControl." CloudStream is a docker-based technology stack for bringing legacy desktop software to new computing environments, without the need to invest significant engineering/development resources. CloudStream helps make it easier to run existing software in a cloud environment via a technology called "Application Streaming." CloudIDV is a CloudStream-based implementation of the Unidata Integrated Data Viewer (IDV). CloudIDV serves as a practical example of application streaming, and demonstrates how traditional software can be easily accessed and controlled via a web browser. Finally, CloudControl is a web-based dashboard which provides administrative controls for running docker-based technologies in the cloud, as well as providing user management. In this work we will give an overview of these three open-source technologies and the value they offer to our community.
Distributed MRI reconstruction using Gadgetron-based cloud computing.

PubMed

Xue, Hui; Inati, Souheil; Sørensen, Thomas Sangild; Kellman, Peter; Hansen, Michael S

2015-03-01

To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency. The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (eg, cardiac and neuroimaging applications). The proposed setup was able to handle acquisition and 11 -SPIRiT reconstruction of nine high temporal resolution real-time, cardiac short axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm(3) isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min. A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. © 2014 Wiley Periodicals, Inc.
Interfacing HTCondor-CE with OpenStack

NASA Astrophysics Data System (ADS)

Bockelman, B.; Caballero Bejar, J.; Hover, J.

2017-10-01

Over the past few years, Grid Computing technologies have reached a high level of maturity. One key aspect of this success has been the development and adoption of newer Compute Elements to interface the external Grid users with local batch systems. These new Compute Elements allow for better handling of jobs requirements and a more precise management of diverse local resources. However, despite this level of maturity, the Grid Computing world is lacking diversity in local execution platforms. As Grid Computing technologies have historically been driven by the needs of the High Energy Physics community, most resource providers run the platform (operating system version and architecture) that best suits the needs of their particular users. In parallel, the development of virtualization and cloud technologies has accelerated recently, making available a variety of solutions, both commercial and academic, proprietary and open source. Virtualization facilitates performing computational tasks on platforms not available at most computing sites. This work attempts to join the technologies, allowing users to interact with computing sites through one of the standard Computing Elements, HTCondor-CE, but running their jobs within VMs on a local cloud platform, OpenStack, when needed. The system will re-route, in a transparent way, end user jobs into dynamically-launched VM worker nodes when they have requirements that cannot be satisfied by the static local batch system nodes. Also, once the automated mechanisms are in place, it becomes straightforward to allow an end user to invoke a custom Virtual Machine at the site. This will allow cloud resources to be used without requiring the user to establish a separate account. Both scenarios are described in this work.
Cloud Based Metalearning System for Predictive Modeling of Biomedical Data

PubMed Central

Vukićević, Milan

2014-01-01

Rapid growth and storage of biomedical data enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other side analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. This problem is addressed by proposing a cloud based system that integrates metalearning framework for ranking and selection of best predictive algorithms for data at hand and open source big data technologies for analysis of biomedical data. PMID:24892101
Processing Uav and LIDAR Point Clouds in Grass GIS

NASA Astrophysics Data System (ADS)

Petras, V.; Petrasova, A.; Jeziorska, J.; Mitasova, H.

2016-06-01

Today's methods of acquiring Earth surface data, namely lidar and unmanned aerial vehicle (UAV) imagery, non-selectively collect or generate large amounts of points. Point clouds from different sources vary in their properties such as number of returns, density, or quality. We present a set of tools with applications for different types of points clouds obtained by a lidar scanner, structure from motion technique (SfM), and a low-cost 3D scanner. To take advantage of the vertical structure of multiple return lidar point clouds, we demonstrate tools to process them using 3D raster techniques which allow, for example, the development of custom vegetation classification methods. Dense point clouds obtained from UAV imagery, often containing redundant points, can be decimated using various techniques before further processing. We implemented and compared several decimation techniques in regard to their performance and the final digital surface model (DSM). Finally, we will describe the processing of a point cloud from a low-cost 3D scanner, namely Microsoft Kinect, and its application for interaction with physical models. All the presented tools are open source and integrated in GRASS GIS, a multi-purpose open source GIS with remote sensing capabilities. The tools integrate with other open source projects, specifically Point Data Abstraction Library (PDAL), Point Cloud Library (PCL), and OpenKinect libfreenect2 library to benefit from the open source point cloud ecosystem. The implementation in GRASS GIS ensures long term maintenance and reproducibility by the scientific community but also by the original authors themselves.
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

PubMed Central

Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

2014-01-01

Background As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Methods Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852
Curvature computation in volume-of-fluid method based on point-cloud sampling

NASA Astrophysics Data System (ADS)

Kassar, Bruno B. M.; Carneiro, João N. E.; Nieckele, Angela O.

2018-01-01

This work proposes a novel approach to compute interface curvature in multiphase flow simulation based on Volume of Fluid (VOF) method. It is well documented in the literature that curvature and normal vector computation in VOF may lack accuracy mainly due to abrupt changes in the volume fraction field across the interfaces. This may cause deterioration on the interface tension forces estimates, often resulting in inaccurate results for interface tension dominated flows. Many techniques have been presented over the last years in order to enhance accuracy in normal vectors and curvature estimates including height functions, parabolic fitting of the volume fraction, reconstructing distance functions, coupling Level Set method with VOF, convolving the volume fraction field with smoothing kernels among others. We propose a novel technique based on a representation of the interface by a cloud of points. The curvatures and the interface normal vectors are computed geometrically at each point of the cloud and projected onto the Eulerian grid in a Front-Tracking manner. Results are compared to benchmark data and significant reduction on spurious currents as well as improvement in the pressure jump are observed. The method was developed in the open source suite OpenFOAM® extending its standard VOF implementation, the interFoam solver.
Development of a cloud-based Bioinformatics Training Platform.

PubMed

Revote, Jerico; Watson-Haigh, Nathan S; Quenette, Steve; Bethwaite, Blair; McGrath, Annette; Shang, Catherine A

2017-05-01

The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. © The Author 2016. Published by Oxford University Press.
Development of a cloud-based Bioinformatics Training Platform

PubMed Central

Revote, Jerico; Watson-Haigh, Nathan S.; Quenette, Steve; Bethwaite, Blair; McGrath, Annette

2017-01-01

Abstract The Bioinformatics Training Platform (BTP) has been developed to provide access to the computational infrastructure required to deliver sophisticated hands-on bioinformatics training courses. The BTP is a cloud-based solution that is in active use for delivering next-generation sequencing training to Australian researchers at geographically dispersed locations. The BTP was built to provide an easy, accessible, consistent and cost-effective approach to delivering workshops at host universities and organizations with a high demand for bioinformatics training but lacking the dedicated bioinformatics training suites required. To support broad uptake of the BTP, the platform has been made compatible with multiple cloud infrastructures. The BTP is an open-source and open-access resource. To date, 20 training workshops have been delivered to over 700 trainees at over 10 venues across Australia using the BTP. PMID:27084333
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Hyunwoo; Timm, Steven

We present a summary of how X.509 authentication and authorization are used with OpenNebula in FermiCloud. We also describe a history of why the X.509 authentication was needed in FermiCloud, and review X.509 authorization options, both internal and external to OpenNebula. We show how these options can be and have been used to successfully run scientific workflows on federated clouds, which include OpenNebula on FermiCloud and Amazon Web Services as well as other community clouds. We also outline federation options being used by other commercial and open-source clouds and cloud research projects.
Key Technology Research on Open Architecture for The Sharing of Heterogeneous Geographic Analysis Models

NASA Astrophysics Data System (ADS)

Yue, S. S.; Wen, Y. N.; Lv, G. N.; Hu, D.

2013-10-01

In recent years, the increasing development of cloud computing technologies laid critical foundation for efficiently solving complicated geographic issues. However, it is still difficult to realize the cooperative operation of massive heterogeneous geographical models. Traditional cloud architecture is apt to provide centralized solution to end users, while all the required resources are often offered by large enterprises or special agencies. Thus, it's a closed framework from the perspective of resource utilization. Solving comprehensive geographic issues requires integrating multifarious heterogeneous geographical models and data. In this case, an open computing platform is in need, with which the model owners can package and deploy their models into cloud conveniently, while model users can search, access and utilize those models with cloud facility. Based on this concept, the open cloud service strategies for the sharing of heterogeneous geographic analysis models is studied in this article. The key technology: unified cloud interface strategy, sharing platform based on cloud service, and computing platform based on cloud service are discussed in detail, and related experiments are conducted for further verification.
Executable research compendia in geoscience research infrastructures

NASA Astrophysics Data System (ADS)

Nüst, Daniel

2017-04-01

From generation through analysis and collaboration to communication, scientific research requires the right tools. Scientists create their own software using third party libraries and platforms. Cloud computing, Open Science, public data infrastructures, and Open Source enable scientists with unprecedented opportunites, nowadays often in a field "Computational X" (e.g. computational seismology) or X-informatics (e.g. geoinformatics) [0]. This increases complexity and generates more innovation, e.g. Environmental Research Infrastructures (environmental RIs [1]). Researchers in Computational X write their software relying on both source code (e.g. from https://github.com) and binary libraries (e.g. from package managers such as APT, https://wiki.debian.org/Apt, or CRAN, https://cran.r-project.org/). They download data from domain specific (cf. https://re3data.org) or generic (e.g. https://zenodo.org) data repositories, and deploy computations remotely (e.g. European Open Science Cloud). The results themselves are archived, given persistent identifiers, connected to other works (e.g. using https://orcid.org/), and listed in metadata catalogues. A single researcher, intentionally or not, interacts with all sub-systems of RIs: data acquisition, data access, data processing, data curation, and community support [3]. To preserve computational research [3] proposes the Executable Research Compendium (ERC), a container format closing the gap of dependency preservation by encapsulating the runtime environment. ERCs and RIs can be integrated for different uses: (i) Coherence: ERC services validate completeness, integrity and results (ii) Metadata: ERCs connect the different parts of a piece of research and faciliate discovery (iii) Exchange and Preservation: ERC as usable building blocks are the shared and archived entity (iv) Self-consistency: ERCs remove dependence on ephemeral sources (v) Execution: ERC services create and execute a packaged analysis but integrate with existing platforms for display and control These integrations are vital for capturing workflows in RIs and connect key stakeholders (scientists, publishers, librarians). They are demonstrated using developments by the DFG-funded project Opening Reproducible Research (http://o2r.info). Semi-automatic creation of ERCs based on research workflows is a core goal of the project. References [0] Tony Hey, Stewart Tansley, Kristin Tolle (eds), 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. [1] P. Martin et al., Open Information Linking for Environmental Research Infrastructures, 2015 IEEE 11th International Conference on e-Science, Munich, 2015, pp. 513-520. doi: 10.1109/eScience.2015.66 [2] Y. Chen et al., Analysis of Common Requirements for Environmental Science Research Infrastructures, The International Symposium on Grids and Clouds (ISGC) 2013, Taipei, 2013, http://pos.sissa.it/archive/conferences/179/032/ISGC [3] Opening Reproducible Research, Geophysical Research Abstracts Vol. 18, EGU2016-7396, 2016, http://meetingorganizer.copernicus.org/EGU2016/EGU2016-7396.pdf

SAME4HPC: A Promising Approach in Building a Scalable and Mobile Environment for High-Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karthik, Rajasekar

2014-01-01

In this paper, an architecture for building Scalable And Mobile Environment For High-Performance Computing with spatial capabilities called SAME4HPC is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high-performance and rich user experience. A cloud infrastructure consisting of OpenStack withmore » Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry standard practices that has been adopted in this architecture.« less
Construction and application of Red5 cluster based on OpenStack

NASA Astrophysics Data System (ADS)

Wang, Jiaqing; Song, Jianxin

2017-08-01

With the application and development of cloud computing technology in various fields, the resource utilization rate of the data center has been improved obviously, and the system based on cloud computing platform has also improved the expansibility and stability. In the traditional way, Red5 cluster resource utilization is low and the system stability is poor. This paper uses cloud computing to efficiently calculate the resource allocation ability, and builds a Red5 server cluster based on OpenStack. Multimedia applications can be published to the Red5 cloud server cluster. The system achieves the flexible construction of computing resources, but also greatly improves the stability of the cluster and service efficiency.
A Geospatial Information Grid Framework for Geological Survey.

PubMed

Wu, Liang; Xue, Lei; Li, Chaoling; Lv, Xia; Chen, Zhanlong; Guo, Mingqiang; Xie, Zhong

2015-01-01

The use of digital information in geological fields is becoming very important. Thus, informatization in geological surveys should not stagnate as a result of the level of data accumulation. The integration and sharing of distributed, multi-source, heterogeneous geological information is an open problem in geological domains. Applications and services use geological spatial data with many features, including being cross-region and cross-domain and requiring real-time updating. As a result of these features, desktop and web-based geographic information systems (GISs) experience difficulties in meeting the demand for geological spatial information. To facilitate the real-time sharing of data and services in distributed environments, a GIS platform that is open, integrative, reconfigurable, reusable and elastic would represent an indispensable tool. The purpose of this paper is to develop a geological cloud-computing platform for integrating and sharing geological information based on a cloud architecture. Thus, the geological cloud-computing platform defines geological ontology semantics; designs a standard geological information framework and a standard resource integration model; builds a peer-to-peer node management mechanism; achieves the description, organization, discovery, computing and integration of the distributed resources; and provides the distributed spatial meta service, the spatial information catalog service, the multi-mode geological data service and the spatial data interoperation service. The geological survey information cloud-computing platform has been implemented, and based on the platform, some geological data services and geological processing services were developed. Furthermore, an iron mine resource forecast and an evaluation service is introduced in this paper.
A Geospatial Information Grid Framework for Geological Survey

PubMed Central

Wu, Liang; Xue, Lei; Li, Chaoling; Lv, Xia; Chen, Zhanlong; Guo, Mingqiang; Xie, Zhong

2015-01-01

The use of digital information in geological fields is becoming very important. Thus, informatization in geological surveys should not stagnate as a result of the level of data accumulation. The integration and sharing of distributed, multi-source, heterogeneous geological information is an open problem in geological domains. Applications and services use geological spatial data with many features, including being cross-region and cross-domain and requiring real-time updating. As a result of these features, desktop and web-based geographic information systems (GISs) experience difficulties in meeting the demand for geological spatial information. To facilitate the real-time sharing of data and services in distributed environments, a GIS platform that is open, integrative, reconfigurable, reusable and elastic would represent an indispensable tool. The purpose of this paper is to develop a geological cloud-computing platform for integrating and sharing geological information based on a cloud architecture. Thus, the geological cloud-computing platform defines geological ontology semantics; designs a standard geological information framework and a standard resource integration model; builds a peer-to-peer node management mechanism; achieves the description, organization, discovery, computing and integration of the distributed resources; and provides the distributed spatial meta service, the spatial information catalog service, the multi-mode geological data service and the spatial data interoperation service. The geological survey information cloud-computing platform has been implemented, and based on the platform, some geological data services and geological processing services were developed. Furthermore, an iron mine resource forecast and an evaluation service is introduced in this paper. PMID:26710255
Design and deployment of an elastic network test-bed in IHEP data center based on SDN

NASA Astrophysics Data System (ADS)

Zeng, Shan; Qi, Fazhi; Chen, Gang

2017-10-01

High energy physics experiments produce huge amounts of raw data, while because of the sharing characteristics of the network resources, there is no guarantee of the available bandwidth for each experiment which may cause link congestion problems. On the other side, with the development of cloud computing technologies, IHEP have established a cloud platform based on OpenStack which can ensure the flexibility of the computing and storage resources, and more and more computing applications have been deployed on virtual machines established by OpenStack. However, under the traditional network architecture, network capability can’t be required elastically, which becomes the bottleneck of restricting the flexible application of cloud computing. In order to solve the above problems, we propose an elastic cloud data center network architecture based on SDN, and we also design a high performance controller cluster based on OpenDaylight. In the end, we present our current test results.
Trusted computing strengthens cloud authentication.

PubMed

Ghazizadeh, Eghbal; Zamani, Mazdak; Ab Manan, Jamalul-lail; Alizadeh, Mojtaba

2014-01-01

Cloud computing is a new generation of technology which is designed to provide the commercial necessities, solve the IT management issues, and run the appropriate applications. Another entry on the list of cloud functions which has been handled internally is Identity Access Management (IAM). Companies encounter IAM as security challenges while adopting more technologies became apparent. Trust Multi-tenancy and trusted computing based on a Trusted Platform Module (TPM) are great technologies for solving the trust and security concerns in the cloud identity environment. Single sign-on (SSO) and OpenID have been released to solve security and privacy problems for cloud identity. This paper proposes the use of trusted computing, Federated Identity Management, and OpenID Web SSO to solve identity theft in the cloud. Besides, this proposed model has been simulated in .Net environment. Security analyzing, simulation, and BLP confidential model are three ways to evaluate and analyze our proposed model.
Trusted Computing Strengthens Cloud Authentication

PubMed Central

2014-01-01

Cloud computing is a new generation of technology which is designed to provide the commercial necessities, solve the IT management issues, and run the appropriate applications. Another entry on the list of cloud functions which has been handled internally is Identity Access Management (IAM). Companies encounter IAM as security challenges while adopting more technologies became apparent. Trust Multi-tenancy and trusted computing based on a Trusted Platform Module (TPM) are great technologies for solving the trust and security concerns in the cloud identity environment. Single sign-on (SSO) and OpenID have been released to solve security and privacy problems for cloud identity. This paper proposes the use of trusted computing, Federated Identity Management, and OpenID Web SSO to solve identity theft in the cloud. Besides, this proposed model has been simulated in .Net environment. Security analyzing, simulation, and BLP confidential model are three ways to evaluate and analyze our proposed model. PMID:24701149
Satellite Cloud and Radiative Property Processing and Distribution System on the NASA Langley ASDC OpenStack and OpenShift Cloud Platform

NASA Astrophysics Data System (ADS)

Nguyen, L.; Chee, T.; Palikonda, R.; Smith, W. L., Jr.; Bedka, K. M.; Spangenberg, D.; Vakhnin, A.; Lutz, N. E.; Walter, J.; Kusterer, J.

2017-12-01

Cloud Computing offers new opportunities for large-scale scientific data producers to utilize Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) IT resources to process and deliver data products in an operational environment where timely delivery, reliability, and availability are critical. The NASA Langley Research Center Atmospheric Science Data Center (ASDC) is building and testing a private and public facing cloud for users in the Science Directorate to utilize as an everyday production environment. The NASA SatCORPS (Satellite ClOud and Radiation Property Retrieval System) team processes and derives near real-time (NRT) global cloud products from operational geostationary (GEO) satellite imager datasets. To deliver these products, we will utilize the public facing cloud and OpenShift to deploy a load-balanced webserver for data storage, access, and dissemination. The OpenStack private cloud will host data ingest and computational capabilities for SatCORPS processing. This paper will discuss the SatCORPS migration towards, and usage of, the ASDC Cloud Services in an operational environment. Detailed lessons learned from use of prior cloud providers, specifically the Amazon Web Services (AWS) GovCloud and the Government Cloud administered by the Langley Managed Cloud Environment (LMCE) will also be discussed.
Advancing global marine biogeography research with open-source GIS software and cloud-computing

USGS Publications Warehouse

Fujioka, Ei; Vanden Berghe, Edward; Donnelly, Ben; Castillo, Julio; Cleary, Jesse; Holmes, Chris; McKnight, Sean; Halpin, patrick

2012-01-01

Across many scientific domains, the ability to aggregate disparate datasets enables more meaningful global analyses. Within marine biology, the Census of Marine Life served as the catalyst for such a global data aggregation effort. Under the Census framework, the Ocean Biogeographic Information System was established to coordinate an unprecedented aggregation of global marine biogeography data. The OBIS data system now contains 31.3 million observations, freely accessible through a geospatial portal. The challenges of storing, querying, disseminating, and mapping a global data collection of this complexity and magnitude are significant. In the face of declining performance and expanding feature requests, a redevelopment of the OBIS data system was undertaken. Following an Open Source philosophy, the OBIS technology stack was rebuilt using PostgreSQL, PostGIS, GeoServer and OpenLayers. This approach has markedly improved the performance and online user experience while maintaining a standards-compliant and interoperable framework. Due to the distributed nature of the project and increasing needs for storage, scalability and deployment flexibility, the entire hardware and software stack was built on a Cloud Computing environment. The flexibility of the platform, combined with the power of the application stack, enabled rapid re-development of the OBIS infrastructure, and ensured complete standards-compliance.
Cloud-based Web Services for Near-Real-Time Web access to NPP Satellite Imagery and other Data

NASA Astrophysics Data System (ADS)

Evans, J. D.; Valente, E. G.

2010-12-01

We are building a scalable, cloud computing-based infrastructure for Web access to near-real-time data products synthesized from the U.S. National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP) and other geospatial and meteorological data. Given recent and ongoing changes in the the NPP and NPOESS programs (now Joint Polar Satellite System), the need for timely delivery of NPP data is urgent. We propose an alternative to a traditional, centralized ground segment, using distributed Direct Broadcast facilities linked to industry-standard Web services by a streamlined processing chain running in a scalable cloud computing environment. Our processing chain, currently implemented on Amazon.com's Elastic Compute Cloud (EC2), retrieves raw data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) and synthesizes data products such as Sea-Surface Temperature, Vegetation Indices, etc. The cloud computing approach lets us grow and shrink computing resources to meet large and rapid fluctuations (twice daily) in both end-user demand and data availability from polar-orbiting sensors. Early prototypes have delivered various data products to end-users with latencies between 6 and 32 minutes. We have begun to replicate machine instances in the cloud, so as to reduce latency and maintain near-real time data access regardless of increased data input rates or user demand -- all at quite moderate monthly costs. Our service-based approach (in which users invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored and composite (e.g., false-color multiband) products on demand. To facilitate broad impact and adoption of our technology, we have emphasized open, industry-standard software interfaces and open source software. Through our work, we envision the widespread establishment of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sensors. A scalable architecture based on cloud computing ensures cost-effective, real-time processing and delivery of NPP and other data. Access via standard Web services maximizes its interoperability and usefulness.
Development of AN Open-Source Automatic Deformation Monitoring System for Geodetical and Geotechnical Measurements

NASA Astrophysics Data System (ADS)

Engel, P.; Schweimler, B.

2016-04-01

The deformation monitoring of structures and buildings is an important task field of modern engineering surveying, ensuring the standing and reliability of supervised objects over a long period. Several commercial hardware and software solutions for the realization of such monitoring measurements are available on the market. In addition to them, a research team at the Neubrandenburg University of Applied Sciences (NUAS) is actively developing a software package for monitoring purposes in geodesy and geotechnics, which is distributed under an open source licence and free of charge. The task of managing an open source project is well-known in computer science, but it is fairly new in a geodetic context. This paper contributes to that issue by detailing applications, frameworks, and interfaces for the design and implementation of open hardware and software solutions for sensor control, sensor networks, and data management in automatic deformation monitoring. It will be discussed how the development effort of networked applications can be reduced by using free programming tools, cloud computing technologies, and rapid prototyping methods.
Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

PubMed

Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

2014-01-01

As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Enabling a new Paradigm to Address Big Data and Open Science Challenges

NASA Astrophysics Data System (ADS)

Ramamurthy, Mohan; Fisher, Ward

2017-04-01

Data are not only the lifeblood of the geosciences but they have become the currency of the modern world in science and society. Rapid advances in computing, communi¬cations, and observational technologies — along with concomitant advances in high-resolution modeling, ensemble and coupled-systems predictions of the Earth system — are revolutionizing nearly every aspect of our field. Modern data volumes from high-resolution ensemble prediction/projection/simulation systems and next-generation remote-sensing systems like hyper-spectral satellite sensors and phased-array radars are staggering. For example, CMIP efforts alone will generate many petabytes of climate projection data for use in assessments of climate change. And NOAA's National Climatic Data Center projects that it will archive over 350 petabytes by 2030. For researchers and educators, this deluge and the increasing complexity of data brings challenges along with the opportunities for discovery and scientific breakthroughs. The potential for big data to transform the geosciences is enormous, but realizing the next frontier depends on effectively managing, analyzing, and exploiting these heterogeneous data sources, extracting knowledge and useful information from heterogeneous data sources in ways that were previously impossible, to enable discoveries and gain new insights. At the same time, there is a growing focus on the area of "Reproducibility or Replicability in Science" that has implications for Open Science. The advent of cloud computing has opened new avenues for not only addressing both big data and Open Science challenges to accelerate scientific discoveries. However, to successfully leverage the enormous potential of cloud technologies, it will require the data providers and the scientific communities to develop new paradigms to enable next-generation workflows and transform the conduct of science. Making data readily available is a necessary but not a sufficient condition. Data providers also need to give scientists an ecosystem that includes data, tools, workflows and other services needed to perform analytics, integration, interpretation, and synthesis - all in the same environment or platform. Instead of moving data to processing systems near users, as is the tradition, the cloud permits one to bring processing, computing, analysis and visualization to data - so called data proximate workbench capabilities, also known as server-side processing. In this talk, I will present the ongoing work at Unidata to facilitate a new paradigm for doing science by offering a suite of tools, resources, and platforms to leverage cloud technologies for addressing both big data and Open Science/reproducibility challenges. That work includes the development and deployment of new protocols for data access and server-side operations and Docker container images of key applications, JupyterHub Python notebook tools, and cloud-based analysis and visualization capability via the CloudIDV tool to enable reproducible workflows and effectively use the accessed data.
Scaling the CERN OpenStack cloud

NASA Astrophysics Data System (ADS)

Bell, T.; Bompastor, B.; Bukowiec, S.; Castro Leon, J.; Denis, M. K.; van Eldik, J.; Fermin Lobo, M.; Fernandez Alvarez, L.; Fernandez Rodriguez, D.; Marino, A.; Moreira, B.; Noel, B.; Oulevey, T.; Takase, W.; Wiebalck, A.; Zilli, S.

2015-12-01

CERN has been running a production OpenStack cloud since July 2013 to support physics computing and infrastructure services for the site. In the past year, CERN Cloud Infrastructure has seen a constant increase in nodes, virtual machines, users and projects. This paper will present what has been done in order to make the CERN cloud infrastructure scale out.
SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision.

PubMed

Wiewiórka, Marek S; Messina, Antonio; Pacholewska, Alicja; Maffioletti, Sergio; Gawrysiak, Piotr; Okoniewski, Michał J

2014-09-15

Many time-consuming analyses of next -: generation sequencing data can be addressed with modern cloud computing. The Apache Hadoop-based solutions have become popular in genomics BECAUSE OF: their scalability in a cloud infrastructure. So far, most of these tools have been used for batch data processing rather than interactive data querying. The SparkSeq software has been created to take advantage of a new MapReduce framework, Apache Spark, for next-generation sequencing data. SparkSeq is a general-purpose, flexible and easily extendable library for genomic cloud computing. It can be used to build genomic analysis pipelines in Scala and run them in an interactive way. SparkSeq opens up the possibility of customized ad hoc secondary analyses and iterative machine learning algorithms. This article demonstrates its scalability and overall fast performance by running the analyses of sequencing datasets. Tests of SparkSeq also prove that the use of cache and HDFS block size can be tuned for the optimal performance on multiple worker nodes. Available under open source Apache 2.0 license: https://bitbucket.org/mwiewiorka/sparkseq/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
IN13B-1660: Analytics and Visualization Pipelines for Big Data on the NASA Earth Exchange (NEX) and OpenNEX

NASA Technical Reports Server (NTRS)

Chaudhary, Aashish; Votava, Petr; Nemani, Ramakrishna R.; Michaelis, Andrew; Kotfila, Chris

2016-01-01

We are developing capabilities for an integrated petabyte-scale Earth science collaborative analysis and visualization environment. The ultimate goal is to deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging of HPC and cloud is a fairly new concept under active research and this system significantly enhances the ability of the scientific community to accelerate analysis and visualization of Earth science data from NASA missions, model outputs and other sources. We have developed a web-based system that seamlessly interfaces with both high-performance computing (HPC) and cloud environments, providing tools that enable science teams to develop and deploy large-scale analysis, visualization and QA pipelines of both the production process and the data products, and enable sharing results with the community. Our project is developed in several stages each addressing separate challenge - workflow integration, parallel execution in either cloud or HPC environments and big-data analytics or visualization. This work benefits a number of existing and upcoming projects supported by NEX, such as the Web Enabled Landsat Data (WELD), where we are developing a new QA pipeline for the 25PB system.
Analytics and Visualization Pipelines for Big Data on the NASA Earth Exchange (NEX) and OpenNEX

NASA Astrophysics Data System (ADS)

Chaudhary, A.; Votava, P.; Nemani, R. R.; Michaelis, A.; Kotfila, C.

2016-12-01

We are developing capabilities for an integrated petabyte-scale Earth science collaborative analysis and visualization environment. The ultimate goal is to deploy this environment within the NASA Earth Exchange (NEX) and OpenNEX in order to enhance existing science data production pipelines in both high-performance computing (HPC) and cloud environments. Bridging of HPC and cloud is a fairly new concept under active research and this system significantly enhances the ability of the scientific community to accelerate analysis and visualization of Earth science data from NASA missions, model outputs and other sources. We have developed a web-based system that seamlessly interfaces with both high-performance computing (HPC) and cloud environments, providing tools that enable science teams to develop and deploy large-scale analysis, visualization and QA pipelines of both the production process and the data products, and enable sharing results with the community. Our project is developed in several stages each addressing separate challenge - workflow integration, parallel execution in either cloud or HPC environments and big-data analytics or visualization. This work benefits a number of existing and upcoming projects supported by NEX, such as the Web Enabled Landsat Data (WELD), where we are developing a new QA pipeline for the 25PB system.
Homomorphic encryption experiments on IBM's cloud quantum computing platform

NASA Astrophysics Data System (ADS)

Huang, He-Liang; Zhao, You-Wei; Li, Tan; Li, Feng-Guang; Du, Yu-Tao; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su

2017-02-01

Quantum computing has undergone rapid development in recent years. Owing to limitations on scalability, personal quantum computers still seem slightly unrealistic in the near future. The first practical quantum computer for ordinary users is likely to be on the cloud. However, the adoption of cloud computing is possible only if security is ensured. Homomorphic encryption is a cryptographic protocol that allows computation to be performed on encrypted data without decrypting them, so it is well suited to cloud computing. Here, we first applied homomorphic encryption on IBM's cloud quantum computer platform. In our experiments, we successfully implemented a quantum algorithm for linear equations while protecting our privacy. This demonstration opens a feasible path to the next stage of development of cloud quantum information technology.
Integration of Cloud resources in the LHCb Distributed Computing

NASA Astrophysics Data System (ADS)

Úbeda García, Mario; Méndez Muñoz, Víctor; Stagni, Federico; Cabarrou, Baptiste; Rauschmayr, Nathalie; Charpentier, Philippe; Closier, Joel

2014-06-01

This contribution describes how Cloud resources have been integrated in the LHCb Distributed Computing. LHCb is using its specific Dirac extension (LHCbDirac) as an interware for its Distributed Computing. So far, it was seamlessly integrating Grid resources and Computer clusters. The cloud extension of DIRAC (VMDIRAC) allows the integration of Cloud computing infrastructures. It is able to interact with multiple types of infrastructures in commercial and institutional clouds, supported by multiple interfaces (Amazon EC2, OpenNebula, OpenStack and CloudStack) - instantiates, monitors and manages Virtual Machines running on this aggregation of Cloud resources. Moreover, specifications for institutional Cloud resources proposed by Worldwide LHC Computing Grid (WLCG), mainly by the High Energy Physics Unix Information Exchange (HEPiX) group, have been taken into account. Several initiatives and computing resource providers in the eScience environment have already deployed IaaS in production during 2013. Keeping this on mind, pros and cons of a cloud based infrasctructure have been studied in contrast with the current setup. As a result, this work addresses four different use cases which represent a major improvement on several levels of our infrastructure. We describe the solution implemented by LHCb for the contextualisation of the VMs based on the idea of Cloud Site. We report on operational experience of using in production several institutional Cloud resources that are thus becoming integral part of the LHCb Distributed Computing resources. Furthermore, we describe as well the gradual migration of our Service Infrastructure towards a fully distributed architecture following the Service as a Service (SaaS) model.
Enhancing Security by System-Level Virtualization in Cloud Computing Environments

NASA Astrophysics Data System (ADS)

Sun, Dawei; Chang, Guiran; Tan, Chunguang; Wang, Xingwei

Many trends are opening up the era of cloud computing, which will reshape the IT industry. Virtualization techniques have become an indispensable ingredient for almost all cloud computing system. By the virtual environments, cloud provider is able to run varieties of operating systems as needed by each cloud user. Virtualization can improve reliability, security, and availability of applications by using consolidation, isolation, and fault tolerance. In addition, it is possible to balance the workloads by using live migration techniques. In this paper, the definition of cloud computing is given; and then the service and deployment models are introduced. An analysis of security issues and challenges in implementation of cloud computing is identified. Moreover, a system-level virtualization case is established to enhance the security of cloud computing environments.

Efficient LIDAR Point Cloud Data Managing and Processing in a Hadoop-Based Distributed Framework

NASA Astrophysics Data System (ADS)

Wang, C.; Hu, F.; Sha, D.; Han, X.

2017-10-01

Light Detection and Ranging (LiDAR) is one of the most promising technologies in surveying and mapping city management, forestry, object recognition, computer vision engineer and others. However, it is challenging to efficiently storage, query and analyze the high-resolution 3D LiDAR data due to its volume and complexity. In order to improve the productivity of Lidar data processing, this study proposes a Hadoop-based framework to efficiently manage and process LiDAR data in a distributed and parallel manner, which takes advantage of Hadoop's storage and computing ability. At the same time, the Point Cloud Library (PCL), an open-source project for 2D/3D image and point cloud processing, is integrated with HDFS and MapReduce to conduct the Lidar data analysis algorithms provided by PCL in a parallel fashion. The experiment results show that the proposed framework can efficiently manage and process big LiDAR data.
MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.

PubMed

Expósito, Roberto R; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan

2017-09-01

This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool. Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es . rreye@udc.es. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Implementation of Web-Based Education in Egypt through Cloud Computing Technologies and Its Effect on Higher Education

ERIC Educational Resources Information Center

El-Seoud, M. Samir Abou; El-Sofany, Hosam F.; Taj-Eddin, Islam A. T. F.; Nosseir, Ann; El-Khouly, Mahmoud M.

2013-01-01

The information technology educational programs at most universities in Egypt face many obstacles that can be overcome using technology enhanced learning. An open source Moodle eLearning platform has been implemented at many public and private universities in Egypt, as an aid to deliver e-content and to provide the institution with various…
Development of an Innovative Interactive Virtual Classroom System for K-12 Education Using Google App Engine

ERIC Educational Resources Information Center

Mumba, Frackson; Zhu, Mengxia

2013-01-01

This paper presents a Simulation-based interactive Virtual ClassRoom web system (SVCR: www.vclasie.com) powered by the state-of-the-art cloud computing technology from Google SVCR integrates popular free open-source math, science and engineering simulations and provides functions such as secure user access control and management of courses,…
Sirepo for Synchrotron Radiation Workshop

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nagler, Robert; Moeller, Paul; Rakitin, Maksim

Sirepo is an open source framework for cloud computing. The graphical user interface (GUI) for Sirepo, also known as the client, executes in any HTML5 compliant web browser on any computing platform, including tablets. The client is built in JavaScript, making use of the following open source libraries: Bootstrap, which is fundamental for cross-platform web applications; AngularJS, which provides a model–view–controller (MVC) architecture and GUI components; and D3.js, which provides interactive plots and data-driven transformations. The Sirepo server is built on the following Python technologies: Flask, which is a lightweight framework for web development; Jinja, which is a secure andmore » widely used templating language; and Werkzeug, a utility library that is compliant with the WSGI standard. We use Nginx as the HTTP server and proxy, which provides a scalable event-driven architecture. The physics codes supported by Sirepo execute inside a Docker container. One of the codes supported by Sirepo is the Synchrotron Radiation Workshop (SRW). SRW computes synchrotron radiation from relativistic electrons in arbitrary magnetic fields and propagates the radiation wavefronts through optical beamlines. SRW is open source and is primarily supported by Dr. Oleg Chubar of NSLS-II at Brookhaven National Laboratory.« less
Comparison of Point Cloud Registration Algorithms for Better Result Assessment - Towards AN Open-Source Solution

NASA Astrophysics Data System (ADS)

Lachat, E.; Landes, T.; Grussenmeyer, P.

2018-05-01

Terrestrial and airborne laser scanning, photogrammetry and more generally 3D recording techniques are used in a wide range of applications. After recording several individual 3D datasets known in local systems, one of the first crucial processing steps is the registration of these data into a common reference frame. To perform such a 3D transformation, commercial and open source software as well as programs from the academic community are available. Due to some lacks in terms of computation transparency and quality assessment in these solutions, it has been decided to develop an open source algorithm which is presented in this paper. It is dedicated to the simultaneous registration of multiple point clouds as well as their georeferencing. The idea is to use this algorithm as a start point for further implementations, involving the possibility of combining 3D data from different sources. Parallel to the presentation of the global registration methodology which has been employed, the aim of this paper is to confront the results achieved this way with the above-mentioned existing solutions. For this purpose, first results obtained with the proposed algorithm to perform the global registration of ten laser scanning point clouds are presented. An analysis of the quality criteria delivered by two selected software used in this study and a reflexion about these criteria is also performed to complete the comparison of the obtained results. The final aim of this paper is to validate the current efficiency of the proposed method through these comparisons.
Managing competing elastic Grid and Cloud scientific computing applications using OpenNebula

NASA Astrophysics Data System (ADS)

Bagnasco, S.; Berzano, D.; Lusso, S.; Masera, M.; Vallero, S.

2015-12-01

Elastic cloud computing applications, i.e. applications that automatically scale according to computing needs, work on the ideal assumption of infinite resources. While large public cloud infrastructures may be a reasonable approximation of this condition, scientific computing centres like WLCG Grid sites usually work in a saturated regime, in which applications compete for scarce resources through queues, priorities and scheduling policies, and keeping a fraction of the computing cores idle to allow for headroom is usually not an option. In our particular environment one of the applications (a WLCG Tier-2 Grid site) is much larger than all the others and cannot autoscale easily. Nevertheless, other smaller applications can benefit of automatic elasticity; the implementation of this property in our infrastructure, based on the OpenNebula cloud stack, will be described and the very first operational experiences with a small number of strategies for timely allocation and release of resources will be discussed.
Scalable computing for evolutionary genomics.

PubMed

Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

2012-01-01

Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy

PubMed Central

Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

2012-01-01

Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.

PubMed

Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

2012-06-01

Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
A Hybrid Cloud Computing Service for Earth Sciences

NASA Astrophysics Data System (ADS)

Yang, C. P.

2016-12-01

Cloud Computing is becoming a norm for providing computing capabilities for advancing Earth sciences including big Earth data management, processing, analytics, model simulations, and many other aspects. A hybrid spatiotemporal cloud computing service is bulit at George Mason NSF spatiotemporal innovation center to meet this demands. This paper will report the service including several aspects: 1) the hardware includes 500 computing services and close to 2PB storage as well as connection to XSEDE Jetstream and Caltech experimental cloud computing environment for sharing the resource; 2) the cloud service is geographically distributed at east coast, west coast, and central region; 3) the cloud includes private clouds managed using open stack and eucalyptus, DC2 is used to bridge these and the public AWS cloud for interoperability and sharing computing resources when high demands surfing; 4) the cloud service is used to support NSF EarthCube program through the ECITE project, ESIP through the ESIP cloud computing cluster, semantics testbed cluster, and other clusters; 5) the cloud service is also available for the earth science communities to conduct geoscience. A brief introduction about how to use the cloud service will be included.
A Cloud-Computing Service for Environmental Geophysics and Seismic Data Processing

NASA Astrophysics Data System (ADS)

Heilmann, B. Z.; Maggi, P.; Piras, A.; Satta, G.; Deidda, G. P.; Bonomi, E.

2012-04-01

Cloud computing is establishing worldwide as a new high performance computing paradigm that offers formidable possibilities to industry and science. The presented cloud-computing portal, part of the Grida3 project, provides an innovative approach to seismic data processing by combining open-source state-of-the-art processing software and cloud-computing technology, making possible the effective use of distributed computation and data management with administratively distant resources. We substituted the user-side demanding hardware and software requirements by remote access to high-performance grid-computing facilities. As a result, data processing can be done quasi in real-time being ubiquitously controlled via Internet by a user-friendly web-browser interface. Besides the obvious advantages over locally installed seismic-processing packages, the presented cloud-computing solution creates completely new possibilities for scientific education, collaboration, and presentation of reproducible results. The web-browser interface of our portal is based on the commercially supported grid portal EnginFrame, an open framework based on Java, XML, and Web Services. We selected the hosted applications with the objective to allow the construction of typical 2D time-domain seismic-imaging workflows as used for environmental studies and, originally, for hydrocarbon exploration. For data visualization and pre-processing, we chose the free software package Seismic Un*x. We ported tools for trace balancing, amplitude gaining, muting, frequency filtering, dip filtering, deconvolution and rendering, with a customized choice of options as services onto the cloud-computing portal. For structural imaging and velocity-model building, we developed a grid version of the Common-Reflection-Surface stack, a data-driven imaging method that requires no user interaction at run time such as manual picking in prestack volumes or velocity spectra. Due to its high level of automation, CRS stacking can benefit largely from the hardware parallelism provided by the cloud deployment. The resulting output, post-stack section, coherence, and NMO-velocity panels are used to generate a smooth migration-velocity model. Residual static corrections are calculated as a by-product of the stack and can be applied iteratively. As a final step, a time migrated subsurface image is obtained by a parallelized Kirchhoff time migration scheme. Processing can be done step-by-step or using a graphical workflow editor that can launch a series of pipelined tasks. The status of the submitted jobs is monitored by a dedicated service. All results are stored in project directories, where they can be downloaded of viewed directly in the browser. Currently, the portal has access to three research clusters having a total number of 70 nodes with 4 cores each. They are shared with four other cloud-computing applications bundled within the GRIDA3 project. To demonstrate the functionality of our "seismic cloud lab", we will present results obtained for three different types of data, all taken from hydrogeophysical studies: (1) a seismic reflection data set, made of compressional waves from explosive sources, recorded in Muravera, Sardinia; (2) a shear-wave data set from, Sardinia; (3) a multi-offset Ground-Penetrating-Radar data set from Larreule, France. The presented work was funded by the government of the Autonomous Region of Sardinia and by the Italian Ministry of Research and Education.
RBioCloud: A Light-Weight Framework for Bioconductor and R-based Jobs on the Cloud.

PubMed

Varghese, Blesson; Patel, Ishan; Barker, Adam

2015-01-01

Large-scale ad hoc analytics of genomic data is popular using the R-programming language supported by over 700 software packages provided by Bioconductor. More recently, analytical jobs are benefitting from on-demand computing and storage, their scalability and their low maintenance cost, all of which are offered by the cloud. While biologists and bioinformaticists can take an analytical job and execute it on their personal workstations, it remains challenging to seamlessly execute the job on the cloud infrastructure without extensive knowledge of the cloud dashboard. How analytical jobs can not only with minimum effort be executed on the cloud, but also how both the resources and data required by the job can be managed is explored in this paper. An open-source light-weight framework for executing R-scripts using Bioconductor packages, referred to as `RBioCloud', is designed and developed. RBioCloud offers a set of simple command-line tools for managing the cloud resources, the data and the execution of the job. Three biological test cases validate the feasibility of RBioCloud. The framework is available from http://www.rbiocloud.com.
Open Source Dataturbine (OSDT) Android Sensorpod in Environmental Observing Systems

NASA Astrophysics Data System (ADS)

Fountain, T. R.; Shin, P.; Tilak, S.; Trinh, T.; Smith, J.; Kram, S.

2014-12-01

The OSDT Android SensorPod is a custom-designed mobile computing platform for assembling wireless sensor networks for environmental monitoring applications. Funded by an award from the Gordon and Betty Moore Foundation, the OSDT SensorPod represents a significant technological advance in the application of mobile and cloud computing technologies to near-real-time applications in environmental science, natural resources management, and disaster response and recovery. It provides a modular architecture based on open standards and open-source software that allows system developers to align their projects with industry best practices and technology trends, while avoiding commercial vendor lock-in to expensive proprietary software and hardware systems. The integration of mobile and cloud-computing infrastructure represents a disruptive technology in the field of environmental science, since basic assumptions about technology requirements are now open to revision, e.g., the roles of special purpose data loggers and dedicated site infrastructure. The OSDT Android SensorPod was designed with these considerations in mind, and the resulting system exhibits the following characteristics - it is flexible, efficient and robust. The system was developed and tested in the three science applications: 1) a fresh water limnology deployment in Wisconsin, 2) a near coastal marine science deployment at the UCSD Scripps Pier, and 3) a terrestrial ecological deployment in the mountains of Taiwan. As part of a public education and outreach effort, a Facebook page with daily ocean pH measurements from the UCSD Scripps pier was developed. Wireless sensor networks and the virtualization of data and network services is the future of environmental science infrastructure. The OSDT Android SensorPod was designed and developed to harness these new technology developments for environmental monitoring applications.
Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing

NASA Technical Reports Server (NTRS)

Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane

2012-01-01

Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. In particular, for disasters, response requires rapid access to large data volumes, substantial storage space and high performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation is of work being conducted by Applied Sciences Program Office at NASA-Stennis Space Center. A prototypical set of image manipulation and transformation processes that incorporate sample Unmanned Airborne System data were developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing of open source software developed process code on a local prototype platform, and then transitioning this code with associated environment requirements into an analogous, but memory and processor enhanced cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar type data processes. Cloud infrastructure service providers were evaluated by taking these locally tested processing functions, and then applying them to a given cloud-enabled infrastructure to assesses and compare environment setup options and enabled technologies. This project reviews findings that were observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.
Cloud prediction of protein structure and function with PredictProtein for Debian.

PubMed

Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

2013-01-01

We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.
Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

PubMed Central

Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

2013-01-01

We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032
Monte Carlo simulation of photon migration in a cloud computing environment with MapReduce

PubMed Central

Pratx, Guillem; Xing, Lei

2011-01-01

Monte Carlo simulation is considered the most reliable method for modeling photon migration in heterogeneous media. However, its widespread use is hindered by the high computational cost. The purpose of this work is to report on our implementation of a simple MapReduce method for performing fault-tolerant Monte Carlo computations in a massively-parallel cloud computing environment. We ported the MC321 Monte Carlo package to Hadoop, an open-source MapReduce framework. In this implementation, Map tasks compute photon histories in parallel while a Reduce task scores photon absorption. The distributed implementation was evaluated on a commercial compute cloud. The simulation time was found to be linearly dependent on the number of photons and inversely proportional to the number of nodes. For a cluster size of 240 nodes, the simulation of 100 billion photon histories took 22 min, a 1258 × speed-up compared to the single-threaded Monte Carlo program. The overall computational throughput was 85,178 photon histories per node per second, with a latency of 100 s. The distributed simulation produced the same output as the original implementation and was resilient to hardware failure: the correctness of the simulation was unaffected by the shutdown of 50% of the nodes. PMID:22191916
Clinician accessible tools for GUI computational models of transcranial electrical stimulation: BONSAI and SPHERES.

PubMed

Truong, Dennis Q; Hüber, Mathias; Xie, Xihe; Datta, Abhishek; Rahman, Asif; Parra, Lucas C; Dmochowski, Jacek P; Bikson, Marom

2014-01-01

Computational models of brain current flow during transcranial electrical stimulation (tES), including transcranial direct current stimulation (tDCS) and transcranial alternating current stimulation (tACS), are increasingly used to understand and optimize clinical trials. We propose that broad dissemination requires a simple graphical user interface (GUI) software that allows users to explore and design montages in real-time, based on their own clinical/experimental experience and objectives. We introduce two complimentary open-source platforms for this purpose: BONSAI and SPHERES. BONSAI is a web (cloud) based application (available at neuralengr.com/bonsai) that can be accessed through any flash-supported browser interface. SPHERES (available at neuralengr.com/spheres) is a stand-alone GUI application that allow consideration of arbitrary montages on a concentric sphere model by leveraging an analytical solution. These open-source tES modeling platforms are designed go be upgraded and enhanced. Trade-offs between open-access approaches that balance ease of access, speed, and flexibility are discussed. Copyright © 2014 Elsevier Inc. All rights reserved.
Design and Implementation of a Modern Automatic Deformation Monitoring System

NASA Astrophysics Data System (ADS)

Engel, Philipp; Schweimler, Björn

2016-03-01

The deformation monitoring of structures and buildings is an important task field of modern engineering surveying, ensuring the standing and reliability of supervised objects over a long period. Several commercial hardware and software solutions for the realization of such monitoring measurements are available on the market. In addition to them, a research team at the University of Applied Sciences in Neubrandenburg (NUAS) is actively developing a software package for monitoring purposes in geodesy and geotechnics, which is distributed under an open source licence and free of charge. The task of managing an open source project is well-known in computer science, but it is fairly new in a geodetic context. This paper contributes to that issue by detailing applications, frameworks, and interfaces for the design and implementation of open hardware and software solutions for sensor control, sensor networks, and data management in automatic deformation monitoring. It will be discussed how the development effort of networked applications can be reduced by using free programming tools, cloud computing technologies, and rapid prototyping methods.

A case study in open source innovation: developing the Tidepool Platform for interoperability in type 1 diabetes management.

PubMed

Neinstein, Aaron; Wong, Jenise; Look, Howard; Arbiter, Brandon; Quirk, Kent; McCanne, Steve; Sun, Yao; Blum, Michael; Adi, Saleh

2016-03-01

Develop a device-agnostic cloud platform to host diabetes device data and catalyze an ecosystem of software innovation for type 1 diabetes (T1D) management. An interdisciplinary team decided to establish a nonprofit company, Tidepool, and build open-source software. Through a user-centered design process, the authors created a software platform, the Tidepool Platform, to upload and host T1D device data in an integrated, device-agnostic fashion, as well as an application ("app"), Blip, to visualize the data. Tidepool's software utilizes the principles of modular components, modern web design including REST APIs and JavaScript, cloud computing, agile development methodology, and robust privacy and security. By consolidating the currently scattered and siloed T1D device data ecosystem into one open platform, Tidepool can improve access to the data and enable new possibilities and efficiencies in T1D clinical care and research. The Tidepool Platform decouples diabetes apps from diabetes devices, allowing software developers to build innovative apps without requiring them to design a unique back-end (e.g., database and security) or unique ways of ingesting device data. It allows people with T1D to choose to use any preferred app regardless of which device(s) they use. The authors believe that the Tidepool Platform can solve two current problems in the T1D device landscape: 1) limited access to T1D device data and 2) poor interoperability of data from different devices. If proven effective, Tidepool's open source, cloud model for health data interoperability is applicable to other healthcare use cases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
A case study in open source innovation: developing the Tidepool Platform for interoperability in type 1 diabetes management

PubMed Central

Wong, Jenise; Look, Howard; Arbiter, Brandon; Quirk, Kent; McCanne, Steve; Sun, Yao; Blum, Michael; Adi, Saleh

2016-01-01

Objective Develop a device-agnostic cloud platform to host diabetes device data and catalyze an ecosystem of software innovation for type 1 diabetes (T1D) management. Materials and Methods An interdisciplinary team decided to establish a nonprofit company, Tidepool, and build open-source software. Results Through a user-centered design process, the authors created a software platform, the Tidepool Platform, to upload and host T1D device data in an integrated, device-agnostic fashion, as well as an application (“app”), Blip, to visualize the data. Tidepool’s software utilizes the principles of modular components, modern web design including REST APIs and JavaScript, cloud computing, agile development methodology, and robust privacy and security. Discussion By consolidating the currently scattered and siloed T1D device data ecosystem into one open platform, Tidepool can improve access to the data and enable new possibilities and efficiencies in T1D clinical care and research. The Tidepool Platform decouples diabetes apps from diabetes devices, allowing software developers to build innovative apps without requiring them to design a unique back-end (e.g., database and security) or unique ways of ingesting device data. It allows people with T1D to choose to use any preferred app regardless of which device(s) they use. Conclusion The authors believe that the Tidepool Platform can solve two current problems in the T1D device landscape: 1) limited access to T1D device data and 2) poor interoperability of data from different devices. If proven effective, Tidepool’s open source, cloud model for health data interoperability is applicable to other healthcare use cases. PMID:26338218
JINR cloud infrastructure evolution

NASA Astrophysics Data System (ADS)

Baranov, A. V.; Balashov, N. A.; Kutovskiy, N. A.; Semenov, R. N.

2016-09-01

To fulfil JINR commitments in different national and international projects related to the use of modern information technologies such as cloud and grid computing as well as to provide a modern tool for JINR users for their scientific research a cloud infrastructure was deployed at Laboratory of Information Technologies of Joint Institute for Nuclear Research. OpenNebula software was chosen as a cloud platform. Initially it was set up in simple configuration with single front-end host and a few cloud nodes. Some custom development was done to tune JINR cloud installation to fit local needs: web form in the cloud web-interface for resources request, a menu item with cloud utilization statistics, user authentication via Kerberos, custom driver for OpenVZ containers. Because of high demand in that cloud service and its resources over-utilization it was re-designed to cover increasing users' needs in capacity, availability and reliability. Recently a new cloud instance has been deployed in high-availability configuration with distributed network file system and additional computing power.
Arc4nix: A cross-platform geospatial analytical library for cluster and cloud computing

NASA Astrophysics Data System (ADS)

Tang, Jingyin; Matyas, Corene J.

2018-02-01

Big Data in geospatial technology is a grand challenge for processing capacity. The ability to use a GIS for geospatial analysis on Cloud Computing and High Performance Computing (HPC) clusters has emerged as a new approach to provide feasible solutions. However, users lack the ability to migrate existing research tools to a Cloud Computing or HPC-based environment because of the incompatibility of the market-dominating ArcGIS software stack and Linux operating system. This manuscript details a cross-platform geospatial library "arc4nix" to bridge this gap. Arc4nix provides an application programming interface compatible with ArcGIS and its Python library "arcpy". Arc4nix uses a decoupled client-server architecture that permits geospatial analytical functions to run on the remote server and other functions to run on the native Python environment. It uses functional programming and meta-programming language to dynamically construct Python codes containing actual geospatial calculations, send them to a server and retrieve results. Arc4nix allows users to employ their arcpy-based script in a Cloud Computing and HPC environment with minimal or no modification. It also supports parallelizing tasks using multiple CPU cores and nodes for large-scale analyses. A case study of geospatial processing of a numerical weather model's output shows that arcpy scales linearly in a distributed environment. Arc4nix is open-source software.
Computer Education and Instructional Technology Teacher Trainees' Opinions about Cloud Computing Technology

ERIC Educational Resources Information Center

Karamete, Aysen

2015-01-01

This study aims to show the present conditions about the usage of cloud computing in the department of Computer Education and Instructional Technology (CEIT) amongst teacher trainees in School of Necatibey Education, Balikesir University, Turkey. In this study, a questionnaire with open-ended questions was used. 17 CEIT teacher trainees…
WASS: an open-source stereo processing pipeline for sea waves 3D reconstruction

NASA Astrophysics Data System (ADS)

Bergamasco, Filippo; Benetazzo, Alvise; Torsello, Andrea; Barbariol, Francesco; Carniel, Sandro; Sclavo, Mauro

2017-04-01

Stereo 3D reconstruction of ocean waves is gaining more and more popularity in the oceanographic community. In fact, recent advances of both computer vision algorithms and CPU processing power can now allow the study of the spatio-temporal wave fields with unprecedented accuracy, especially at small scales. Even if simple in theory, multiple details are difficult to be mastered for a practitioner so that the implementation of a 3D reconstruction pipeline is in general considered a complex task. For instance, camera calibration, reliable stereo feature matching and mean sea-plane estimation are all factors for which a well designed implementation can make the difference to obtain valuable results. For this reason, we believe that the open availability of a well-tested software package that automates the steps from stereo images to a 3D point cloud would be a valuable addition for future researches in this area. We present WASS, a completely Open-Source stereo processing pipeline for sea waves 3D reconstruction, available at http://www.dais.unive.it/wass/. Our tool completely automates the recovery of dense point clouds from stereo images by providing three main functionalities. First, WASS can automatically recover the extrinsic parameters of the stereo rig (up to scale) so that no delicate calibration has to be performed on the field. Second, WASS implements a fast 3D dense stereo reconstruction procedure so that an accurate 3D point cloud can be computed from each stereo pair. We rely on the well-consolidated OpenCV library both for the image stereo rectification and disparity map recovery. Lastly, a set of 2D and 3D filtering techniques both on the disparity map and the produced point cloud are implemented to remove the vast majority of erroneous points that can naturally arise while analyzing the optically complex nature of the water surface (examples are sun-glares, large white-capped areas, fog and water areosol, etc). Developed to be as fast as possible, WASS can process roughly four 5 MPixel stereo frames per minute (on a consumer i7 CPU) to produce a sequence of outlier-free point clouds with more than 3 million points each. Finally, it comes with an easy to use user interface and designed to be scalable on multiple parallel CPUs.
Integrating Cloud-Computing-Specific Model into Aircraft Design

NASA Astrophysics Data System (ADS)

Zhimin, Tian; Qi, Lin; Guangwen, Yang

Cloud Computing is becoming increasingly relevant, as it will enable companies involved in spreading this technology to open the door to Web 3.0. In the paper, the new categories of services introduced will slowly replace many types of computational resources currently used. In this perspective, grid computing, the basic element for the large scale supply of cloud services, will play a fundamental role in defining how those services will be provided. The paper tries to integrate cloud computing specific model into aircraft design. This work has acquired good results in sharing licenses of large scale and expensive software, such as CFD (Computational Fluid Dynamics), UG, CATIA, and so on.
Open Reading Frame Phylogenetic Analysis on the Cloud

PubMed Central

2013-01-01

Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843
Enabling Research Network Connectivity to Clouds with Virtual Router Technology

NASA Astrophysics Data System (ADS)

Seuster, R.; Casteels, K.; Leavett-Brown, CR; Paterson, M.; Sobie, RJ

2017-10-01

The use of opportunistic cloud resources by HEP experiments has significantly increased over the past few years. Clouds that are owned or managed by the HEP community are connected to the LHCONE network or the research network with global access to HEP computing resources. Private clouds, such as those supported by non-HEP research funds are generally connected to the international research network; however, commercial clouds are either not connected to the research network or only connect to research sites within their national boundaries. Since research network connectivity is a requirement for HEP applications, we need to find a solution that provides a high-speed connection. We are studying a solution with a virtual router that will address the use case when a commercial cloud has research network connectivity in a limited region. In this situation, we host a virtual router in our HEP site and require that all traffic from the commercial site transit through the virtual router. Although this may increase the network path and also the load on the HEP site, it is a workable solution that would enable the use of the remote cloud for low I/O applications. We are exploring some simple open-source solutions. In this paper, we present the results of our studies and how it will benefit our use of private and public clouds for HEP computing.
Generalized Intelligent Framework for Tutoring (GIFT) Cloud/Virtual Open Campus Quick Start Guide (Revision 1)

DTIC Science & Technology

2017-06-01

for GIFT Cloud, the web -based application version of the Generalized Intelligent Framework for Tutoring (GIFT). GIFT is a modular, open-source...external applications. GIFT is available to users with a GIFT Account at no cost. GIFT Cloud is an implementation of GIFT. This web -based application...section. Approved for public release; distribution is unlimited. 3 3. Requirements for GIFT Cloud GIFT Cloud is accessed via a web browser
Performance testing of 3D point cloud software

NASA Astrophysics Data System (ADS)

Varela-González, M.; González-Jorge, H.; Riveiro, B.; Arias, P.

2013-10-01

LiDAR systems are being used widely in recent years for many applications in the engineering field: civil engineering, cultural heritage, mining, industry and environmental engineering. One of the most important limitations of this technology is the large computational requirements involved in data processing, especially for large mobile LiDAR datasets. Several software solutions for data managing are available in the market, including open source suites, however, users often unknown methodologies to verify their performance properly. In this work a methodology for LiDAR software performance testing is presented and four different suites are studied: QT Modeler, VR Mesh, AutoCAD 3D Civil and the Point Cloud Library running in software developed at the University of Vigo (SITEGI). The software based on the Point Cloud Library shows better results in the loading time of the point clouds and CPU usage. However, it is not as strong as commercial suites in working set and commit size tests.
Diagnosing turbulence for research aircraft safety using open source toolkits

NASA Astrophysics Data System (ADS)

Lang, T. J.; Guy, N.

Open source software toolkits have been developed and applied to diagnose in-cloud turbulence in the vicinity of Earth science research aircraft, via analysis of ground-based Doppler radar data. Based on multiple retrospective analyses, these toolkits show promise for detecting significant turbulence well prior to cloud penetrations by research aircraft. A pilot study demonstrated the ability to provide mission scientists turbulence estimates in near real time during an actual field campaign, and thus these toolkits are recommended for usage in future cloud-penetrating aircraft field campaigns.
Seqcrawler: biological data indexing and browsing platform.

PubMed

Sallou, Olivier; Bretaudeau, Anthony; Roult, Aurelien

2012-07-24

Seqcrawler takes its roots in software like SRS or Lucegene. It provides an indexing platform to ease the search of data and meta-data in biological banks and it can scale to face the current flow of data. While many biological bank search tools are available on the Internet, mainly provided by large organizations to search their data, there is a lack of free and open source solutions to browse one's own set of data with a flexible query system and able to scale from a single computer to a cloud system. A personal index platform will help labs and bioinformaticians to search their meta-data but also to build a larger information system with custom subsets of data. The software is scalable from a single computer to a cloud-based infrastructure. It has been successfully tested in a private cloud with 3 index shards (pieces of index) hosting ~400 millions of sequence information (whole GenBank, UniProt, PDB and others) for a total size of 600 GB in a fault tolerant architecture (high-availability). It has also been successfully integrated with software to add extra meta-data from blast results to enhance users' result analysis. Seqcrawler provides a complete open source search and store solution for labs or platforms needing to manage large amount of data/meta-data with a flexible and customizable web interface. All components (search engine, visualization and data storage), though independent, share a common and coherent data system that can be queried with a simple HTTP interface. The solution scales easily and can also provide a high availability infrastructure.
Leveraging human oversight and intervention in large-scale parallel processing of open-source data

NASA Astrophysics Data System (ADS)

Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

2015-05-01

The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
Virtualized Networks and Virtualized Optical Line Terminal (vOLT)

NASA Astrophysics Data System (ADS)

Ma, Jonathan; Israel, Stephen

2017-03-01

The success of the Internet and the proliferation of the Internet of Things (IoT) devices is forcing telecommunications carriers to re-architecture a central office as a datacenter (CORD) so as to bring the datacenter economics and cloud agility to a central office (CO). The Open Network Operating System (ONOS) is the first open-source software-defined network (SDN) operating system which is capable of managing and controlling network, computing, and storage resources to support CORD infrastructure and network virtualization. The virtualized Optical Line Termination (vOLT) is one of the key components in such virtualized networks.
Efficacy of Cloud-Radiative Perturbations in Deep Open- and Closed-Cell Stratocumulus Clouds due to Aerosol Perturbations

NASA Astrophysics Data System (ADS)

Possner, A.; Wang, H.; Caldeira, K.; Wood, R.; Ackerman, T. P.

2017-12-01

Aerosol-cloud interactions (ACIs) in marine stratocumulus remain a significant source of uncertainty in constraining the cloud-radiative effect in a changing climate. Ship tracks are undoubted manifestations of ACIs embedded within stratocumulus cloud decks and have proven to be a useful framework to study the effect of aerosol perturbations on cloud morphology, macrophysical, microphyiscal and cloud-radiative properties. However, so far most observational (Christensen et al. 2012, Chen et al. 2015) and numerical studies (Wang et al. 2011, Possner et al. 2015, Berner et al. 2015) have concentrated on ship tracks in shallow boundary layers of depths between 300 - 800 m, while most stratocumulus decks form in significantly deeper boundary layers (Muhlbauer et al. 2014). In this study we investigate the efficacy of aerosol perturbations in deep open and closed cell stratocumulus. Multi-day idealised cloud-resolving simulations are performed for the RF06 flight of the VOCALS-Rex field campaign (Wood et al. 2011). During this flight pockets of deep open and closed cells were observed in a 1410 m deep boundary layer. The efficacy of aerosol perturbations of varied concentration and spatial gradients in altering the cloud micro- and macrophysical state and cloud-radiative effect is determined in both cloud regimes. Our simulations show that a continued point source emission flux of 1.16*1011 particles m-2 s-1 applied within a 300x300 m2 gridbox induces pronounced cloud cover changes in approximately a third of the simulated 80x80 km2 domain, a weakening of the diurnal cycle in the open-cell regime and a resulting increase in domain-mean cloud albedo of 0.2. Furthermore, we contrast the efficacy of equal strength near-surface or above-cloud aerosol perturbations in altering the cloud state.
Architecture Design and Experimental Platform Demonstration of Optical Network based on OpenFlow Protocol

NASA Astrophysics Data System (ADS)

Xing, Fangyuan; Wang, Honghuan; Yin, Hongxi; Li, Ming; Luo, Shenzi; Wu, Chenguang

2016-02-01

With the extensive application of cloud computing and data centres, as well as the constantly emerging services, the big data with the burst characteristic has brought huge challenges to optical networks. Consequently, the software defined optical network (SDON) that combines optical networks with software defined network (SDN), has attracted much attention. In this paper, an OpenFlow-enabled optical node employed in optical cross-connect (OXC) and reconfigurable optical add/drop multiplexer (ROADM), is proposed. An open source OpenFlow controller is extended on routing strategies. In addition, the experiment platform based on OpenFlow protocol for software defined optical network, is designed. The feasibility and availability of the OpenFlow-enabled optical nodes and the extended OpenFlow controller are validated by the connectivity test, protection switching and load balancing experiments in this test platform.
Self-Similar Spin Images for Point Cloud Matching

NASA Astrophysics Data System (ADS)

Pulido, Daniel

The rapid growth of Light Detection And Ranging (Lidar) technologies that collect, process, and disseminate 3D point clouds have allowed for increasingly accurate spatial modeling and analysis of the real world. Lidar sensors can generate massive 3D point clouds of a collection area that provide highly detailed spatial and radiometric information. However, a Lidar collection can be expensive and time consuming. Simultaneously, the growth of crowdsourced Web 2.0 data (e.g., Flickr, OpenStreetMap) have provided researchers with a wealth of freely available data sources that cover a variety of geographic areas. Crowdsourced data can be of varying quality and density. In addition, since it is typically not collected as part of a dedicated experiment but rather volunteered, when and where the data is collected is arbitrary. The integration of these two sources of geoinformation can provide researchers the ability to generate products and derive intelligence that mitigate their respective disadvantages and combine their advantages. Therefore, this research will address the problem of fusing two point clouds from potentially different sources. Specifically, we will consider two problems: scale matching and feature matching. Scale matching consists of computing feature metrics of each point cloud and analyzing their distributions to determine scale differences. Feature matching consists of defining local descriptors that are invariant to common dataset distortions (e.g., rotation and translation). Additionally, after matching the point clouds they can be registered and processed further (e.g., change detection). The objective of this research is to develop novel methods to fuse and enhance two point clouds from potentially disparate sources (e.g., Lidar and crowdsourced Web 2.0 datasets). The scope of this research is to investigate both scale and feature matching between two point clouds. The specific focus of this research will be in developing a novel local descriptor based on the concept of self-similarity to aid in the scale and feature matching steps. An open problem in fusion is how best to extract features from two point clouds and then perform feature-based matching. The proposed approach for this matching step is the use of local self-similarity as an invariant measure to match features. In particular, the proposed approach is to combine the concept of local self-similarity with a well-known feature descriptor, Spin Images, and thereby define "Self-Similar Spin Images". This approach is then extended to the case of matching two points clouds in very different coordinate systems (e.g., a geo-referenced Lidar point cloud and stereo-image derived point cloud without geo-referencing). The use of Self-Similar Spin Images is again applied to address this problem by introducing a "Self-Similar Keyscale" that matches the spatial scales of two point clouds. Another open problem is how best to detect changes in content between two point clouds. A method is proposed to find changes between two point clouds by analyzing the order statistics of the nearest neighbors between the two clouds, and thereby define the "Nearest Neighbor Order Statistic" method. Note that the well-known Hausdorff distance is a special case as being just the maximum order statistic. Therefore, by studying the entire histogram of these nearest neighbors it is expected to yield a more robust method to detect points that are present in one cloud but not the other. This approach is applied at multiple resolutions. Therefore, changes detected at the coarsest level will yield large missing targets and at finer levels will yield smaller targets.
Context-aware distributed cloud computing using CloudScheduler

NASA Astrophysics Data System (ADS)

Seuster, R.; Leavett-Brown, CR; Casteels, K.; Driemel, C.; Paterson, M.; Ring, D.; Sobie, RJ; Taylor, RP; Weldon, J.

2017-10-01

The distributed cloud using the CloudScheduler VM provisioning service is one of the longest running systems for HEP workloads. It has run millions of jobs for ATLAS and Belle II over the past few years using private and commercial clouds around the world. Our goal is to scale the distributed cloud to the 10,000-core level, with the ability to run any type of application (low I/O, high I/O and high memory) on any cloud. To achieve this goal, we have been implementing changes that utilize context-aware computing designs that are currently employed in the mobile communication industry. Context-awareness makes use of real-time and archived data to respond to user or system requirements. In our distributed cloud, we have many opportunistic clouds with no local HEP services, software or storage repositories. A context-aware design significantly improves the reliability and performance of our system by locating the nearest location of the required services. We describe how we are collecting and managing contextual information from our workload management systems, the clouds, the virtual machines and our services. This information is used not only to monitor the system but also to carry out automated corrective actions. We are incrementally adding new alerting and response services to our distributed cloud. This will enable us to scale the number of clouds and virtual machines. Further, a context-aware design will enable us to run analysis or high I/O application on opportunistic clouds. We envisage an open-source HTTP data federation (for example, the DynaFed system at CERN) as a service that would provide us access to existing storage elements used by the HEP experiments.
High-performance scientific computing in the cloud

NASA Astrophysics Data System (ADS)

Jorissen, Kevin; Vila, Fernando; Rehr, John

2011-03-01

Cloud computing has the potential to open up high-performance computational science to a much broader class of researchers, owing to its ability to provide on-demand, virtualized computational resources. However, before such approaches can become commonplace, user-friendly tools must be developed that hide the unfamiliar cloud environment and streamline the management of cloud resources for many scientific applications. We have recently shown that high-performance cloud computing is feasible for parallelized x-ray spectroscopy calculations. We now present benchmark results for a wider selection of scientific applications focusing on electronic structure and spectroscopic simulation software in condensed matter physics. These applications are driven by an improved portable interface that can manage virtual clusters and run various applications in the cloud. We also describe a next generation of cluster tools, aimed at improved performance and a more robust cluster deployment. Supported by NSF grant OCI-1048052.

The EPOS Vision for the Open Science Cloud

NASA Astrophysics Data System (ADS)

Jeffery, Keith; Harrison, Matt; Cocco, Massimo

2016-04-01

Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time and so CLOUD computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of Cloud Computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications to allow CLOUD middleware to optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use. Already discussions are underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS.. Provision of access to ICS-Ds from ICS-C concerns several aspects: (a) Technical : it may be more or less difficult to connect and pass from ICS-C to the ICS-d/ CES the 'package' (probably a virtual machine) of data and software; (b) Security/privacy : including passing personal information e.g. related to AAAI (Authentication, authorization, accounting Infrastructure); (c) financial and legal : such as payment, licence conditions; Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of federated approach to HPC(High Performance Computing) -HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.
Subject-enabled analytics model on measurement statistics in health risk expert system for public health informatics.

PubMed

Chung, Chi-Jung; Kuo, Yu-Chen; Hsieh, Yun-Yu; Li, Tsai-Chung; Lin, Cheng-Chieh; Liang, Wen-Miin; Liao, Li-Na; Li, Chia-Ing; Lin, Hsueh-Chun

2017-11-01

This study applied open source technology to establish a subject-enabled analytics model that can enhance measurement statistics of case studies with the public health data in cloud computing. The infrastructure of the proposed model comprises three domains: 1) the health measurement data warehouse (HMDW) for the case study repository, 2) the self-developed modules of online health risk information statistics (HRIStat) for cloud computing, and 3) the prototype of a Web-based process automation system in statistics (PASIS) for the health risk assessment of case studies with subject-enabled evaluation. The system design employed freeware including Java applications, MySQL, and R packages to drive a health risk expert system (HRES). In the design, the HRIStat modules enforce the typical analytics methods for biomedical statistics, and the PASIS interfaces enable process automation of the HRES for cloud computing. The Web-based model supports both modes, step-by-step analysis and auto-computing process, respectively for preliminary evaluation and real time computation. The proposed model was evaluated by computing prior researches in relation to the epidemiological measurement of diseases that were caused by either heavy metal exposures in the environment or clinical complications in hospital. The simulation validity was approved by the commercial statistics software. The model was installed in a stand-alone computer and in a cloud-server workstation to verify computing performance for a data amount of more than 230K sets. Both setups reached efficiency of about 10 5 sets per second. The Web-based PASIS interface can be used for cloud computing, and the HRIStat module can be flexibly expanded with advanced subjects for measurement statistics. The analytics procedure of the HRES prototype is capable of providing assessment criteria prior to estimating the potential risk to public health. Copyright © 2017 Elsevier B.V. All rights reserved.
ASSURED CLOUD COMPUTING UNIVERSITY CENTER OFEXCELLENCE (ACC UCOE)

DTIC Science & Technology

2018-01-18

average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed...infrastructure security -Design of algorithms and techniques for real- time assuredness in cloud computing -Map-reduce task assignment with data locality...46 DESIGN OF ALGORITHMS AND TECHNIQUES FOR REAL- TIME ASSUREDNESS IN CLOUD COMPUTING
Scaling predictive modeling in drug development with cloud computing.

PubMed

Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola

2015-01-26

Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.
HPC: Rent or Buy

ERIC Educational Resources Information Center

Fredette, Michelle

2012-01-01

"Rent or buy?" is a question people ask about everything from housing to textbooks. It is also a question universities must consider when it comes to high-performance computing (HPC). With the advent of Amazon's Elastic Compute Cloud (EC2), Microsoft Windows HPC Server, Rackspace's OpenStack, and other cloud-based services, researchers now have…
Web-based Tsunami Early Warning System with instant Tsunami Propagation Calculations in the GPU Cloud

NASA Astrophysics Data System (ADS)

Hammitzsch, M.; Spazier, J.; Reißland, S.

2014-12-01

Usually, tsunami early warning and mitigation systems (TWS or TEWS) are based on several software components deployed in a client-server based infrastructure. The vast majority of systems importantly include desktop-based clients with a graphical user interface (GUI) for the operators in early warning centers. However, in times of cloud computing and ubiquitous computing the use of concepts and paradigms, introduced by continuously evolving approaches in information and communications technology (ICT), have to be considered even for early warning systems (EWS). Based on the experiences and the knowledge gained in three research projects - 'German Indonesian Tsunami Early Warning System' (GITEWS), 'Distant Early Warning System' (DEWS), and 'Collaborative, Complex, and Critical Decision-Support in Evolving Crises' (TRIDEC) - new technologies are exploited to implement a cloud-based and web-based prototype to open up new prospects for EWS. This prototype, named 'TRIDEC Cloud', merges several complementary external and in-house cloud-based services into one platform for automated background computation with graphics processing units (GPU), for web-mapping of hazard specific geospatial data, and for serving relevant functionality to handle, share, and communicate threat specific information in a collaborative and distributed environment. The prototype in its current version addresses tsunami early warning and mitigation. The integration of GPU accelerated tsunami simulation computations have been an integral part of this prototype to foster early warning with on-demand tsunami predictions based on actual source parameters. However, the platform is meant for researchers around the world to make use of the cloud-based GPU computation to analyze other types of geohazards and natural hazards and react upon the computed situation picture with a web-based GUI in a web browser at remote sites. The current website is an early alpha version for demonstration purposes to give the concept a whirl and to shape science's future. Further functionality, improvements and possible profound changes have to implemented successively based on the users' evolving needs.
Scaling Agile Infrastructure to People

NASA Astrophysics Data System (ADS)

Jones, B.; McCance, G.; Traylen, S.; Barrientos Arias, N.

2015-12-01

When CERN migrated its infrastructure away from homegrown fabric management tools to emerging industry-standard open-source solutions, the immediate technical challenges and motivation were clear. The move to a multi-site Cloud Computing model meant that the tool chains that were growing around this ecosystem would be a good choice, the challenge was to leverage them. The use of open-source tools brings challenges other than merely how to deploy them. Homegrown software, for all the deficiencies identified at the outset of the project, has the benefit of growing with the organization. This paper will examine what challenges there were in adapting open-source tools to the needs of the organization, particularly in the areas of multi-group development and security. Additionally, the increase in scale of the plant required changes to how Change Management was organized and managed. Continuous Integration techniques are used in order to manage the rate of change across multiple groups, and the tools and workflow for this will be examined.
The StratusLab cloud distribution: Use-cases and support for scientific applications

NASA Astrophysics Data System (ADS)

Floros, E.

2012-04-01

The StratusLab project is integrating an open cloud software distribution that enables organizations to setup and provide their own private or public IaaS (Infrastructure as a Service) computing clouds. StratusLab distribution capitalizes on popular infrastructure virtualization solutions like KVM, the OpenNebula virtual machine manager, Claudia service manager and SlipStream deployment platform, which are further enhanced and expanded with additional components developed within the project. The StratusLab distribution covers the core aspects of a cloud IaaS architecture, namely Computing (life-cycle management of virtual machines), Storage, Appliance management and Networking. The resulting software stack provides a packaged turn-key solution for deploying cloud computing services. The cloud computing infrastructures deployed using StratusLab can support a wide range of scientific and business use cases. Grid computing has been the primary use case pursued by the project and for this reason the initial priority has been the support for the deployment and operation of fully virtualized production-level grid sites; a goal that has already been achieved by operating such a site as part of EGI's (European Grid Initiative) pan-european grid infrastructure. In this area the project is currently working to provide non-trivial capabilities like elastic and autonomic management of grid site resources. Although grid computing has been the motivating paradigm, StratusLab's cloud distribution can support a wider range of use cases. Towards this direction, we have developed and currently provide support for setting up general purpose computing solutions like Hadoop, MPI and Torque clusters. For what concerns scientific applications the project is collaborating closely with the Bioinformatics community in order to prepare VM appliances and deploy optimized services for bioinformatics applications. In a similar manner additional scientific disciplines like Earth Science can take advantage of StratusLab cloud solutions. Interested users are welcomed to join StratusLab's user community by getting access to the reference cloud services deployed by the project and offered to the public.
TethysCluster: A comprehensive approach for harnessing cloud resources for hydrologic modeling

NASA Astrophysics Data System (ADS)

Nelson, J.; Jones, N.; Ames, D. P.

2015-12-01

Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
Exploration of cloud computing late start LDRD #149630 : Raincoat. v. 2.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Echeverria, Victor T.; Metral, Michael David; Leger, Michelle A.

This report contains documentation from an interoperability study conducted under the Late Start LDRD 149630, Exploration of Cloud Computing. A small late-start LDRD from last year resulted in a study (Raincoat) on using Virtual Private Networks (VPNs) to enhance security in a hybrid cloud environment. Raincoat initially explored the use of OpenVPN on IPv4 and demonstrates that it is possible to secure the communication channel between two small 'test' clouds (a few nodes each) at New Mexico Tech and Sandia. We extended the Raincoat study to add IPSec support via Vyatta routers, to interface with a public cloud (Amazon Elasticmore » Compute Cloud (EC2)), and to be significantly more scalable than the previous iteration. The study contributed to our understanding of interoperability in a hybrid cloud.« less
Enabling BOINC in infrastructure as a service cloud system

NASA Astrophysics Data System (ADS)

Montes, Diego; Añel, Juan A.; Pena, Tomás F.; Uhe, Peter; Wallom, David C. H.

2017-02-01

Volunteer or crowd computing is becoming increasingly popular for solving complex research problems from an increasingly diverse range of areas. The majority of these have been built using the Berkeley Open Infrastructure for Network Computing (BOINC) platform, which provides a range of different services to manage all computation aspects of a project. The BOINC system is ideal in those cases where not only does the research community involved need low-cost access to massive computing resources but also where there is a significant public interest in the research being done.We discuss the way in which cloud services can help BOINC-based projects to deliver results in a fast, on demand manner. This is difficult to achieve using volunteers, and at the same time, using scalable cloud resources for short on demand projects can optimize the use of the available resources. We show how this design can be used as an efficient distributed computing platform within the cloud, and outline new approaches that could open up new possibilities in this field, using Climateprediction.net (http://www.climateprediction.net/) as a case study.
Multidimensional Environmental Data Resource Brokering on Computational Grids and Scientific Clouds

NASA Astrophysics Data System (ADS)

Montella, Raffaele; Giunta, Giulio; Laccetti, Giuliano

Grid computing has widely evolved over the past years, and its capabilities have found their way even into business products and are no longer relegated to scientific applications. Today, grid computing technology is not restricted to a set of specific grid open source or industrial products, but rather it is comprised of a set of capabilities virtually within any kind of software to create shared and highly collaborative production environments. These environments are focused on computational (workload) capabilities and the integration of information (data) into those computational capabilities. An active grid computing application field is the fully virtualization of scientific instruments in order to increase their availability and decrease operational and maintaining costs. Computational and information grids allow to manage real-world objects in a service-oriented way using industrial world-spread standards.
Sirepo - Warp

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nagler, Robert; Moeller, Paul

Sirepo is an open source framework for cloud computing. The graphical user interface (GUI) for Sirepo, also known as the client, executes in any HTML5 compliant web browser on any computing platform, including tablets. The client is built in JavaScript, making use of the following open source libraries: Bootstrap, which is fundamental for cross-platform web applications; AngularJS, which provides a model–view–controller (MVC) architecture and GUI components; and D3.js, which provides interactive plots and data-driven transformations. The Sirepo server is built on the following Python technologies: Flask, which is a lightweight framework for web development; Jin-ja, which is a secure andmore » widely used templating language; and Werkzeug, a utility library that is compliant with the WSGI standard. We use Nginx as the HTTP server and proxy, which provides a scalable event-driven architecture. The physics codes supported by Sirepo execute inside a Docker container. One of the codes supported by Sirepo is Warp. Warp is a particle-in-cell (PIC) code de-signed to simulate high-intensity charged particle beams and plasmas in both the electrostatic and electromagnetic regimes, with a wide variety of integrated physics models and diagnostics. At pre-sent, Sirepo supports a small subset of Warp’s capabilities. Warp is open source and is part of the Berkeley Lab Accelerator Simulation Toolkit.« less
Multi-Dimensional Optimization for Cloud Based Multi-Tier Applications

ERIC Educational Resources Information Center

Jung, Gueyoung

2010-01-01

Emerging trends toward cloud computing and virtualization have been opening new avenues to meet enormous demands of space, resource utilization, and energy efficiency in modern data centers. By being allowed to host many multi-tier applications in consolidated environments, cloud infrastructure providers enable resources to be shared among these…
Agile Infrastructure Monitoring

NASA Astrophysics Data System (ADS)

Andrade, P.; Ascenso, J.; Fedorko, I.; Fiorini, B.; Paladin, M.; Pigueiras, L.; Santos, M.

2014-06-01

At the present time, data centres are facing a massive rise in virtualisation and cloud computing. The Agile Infrastructure (AI) project is working to deliver new solutions to ease the management of CERN data centres. Part of the solution consists in a new "shared monitoring architecture" which collects and manages monitoring data from all data centre resources. In this article, we present the building blocks of this new monitoring architecture, the different open source technologies selected for each architecture layer, and how we are building a community around this common effort.
The Confluence of GIS, Cloud and Open Source, Enabling Big Raster Data Applications

NASA Astrophysics Data System (ADS)

Plesea, L.; Emmart, C. B.; Boller, R. A.; Becker, P.; Baynes, K.

2016-12-01

The rapid evolution of available cloud services is profoundly changing the way applications are being developed and used. Massive object stores, service scalability, continuous integration are some of the most important cloud technology advances that directly influence science applications and GIS. At the same time, more and more scientists are using GIS platforms in their day to day research. Yet with new opportunities there are always some challenges. Given the large amount of data commonly required in science applications, usually large raster datasets, connectivity is one of the biggest problems. Connectivity has two aspects, one being the limited bandwidth and latency of the communication link due to the geographical location of the resources, the other one being the interoperability and intrinsic efficiency of the interface protocol used to connect. NASA and Esri are actively helping each other and collaborating on a few open source projects, aiming to provide some of the core technology components to directly address the GIS enabled data connectivity problems. Last year Esri contributed LERC, a very fast and efficient compression algorithm to the GDAL/MRF format, which itself is a NASA/Esri collaboration project. The MRF raster format has some cloud aware features that make it possible to build high performance web services on cloud platforms, as some of the Esri projects demonstrate. Currently, another NASA open source project, the high performance OnEarth WMTS server is being refactored and enhanced to better integrate with MRF, GDAL and Esri software. Taken together, the GDAL, MRF and OnEarth form the core of an open source CloudGIS toolkit that is already showing results. Since it is well integrated with GDAL, which is the most common interoperability component of GIS applications, this approach should improve the connectivity and performance of many science and GIS applications in the cloud.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

PubMed

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-12-01

Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

PubMed Central

Meng, Bowen; Pratx, Guillem; Xing, Lei

2011-01-01

Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842
Georeferencing UAS Derivatives Through Point Cloud Registration with Archived Lidar Datasets

NASA Astrophysics Data System (ADS)

Magtalas, M. S. L. Y.; Aves, J. C. L.; Blanco, A. C.

2016-10-01

Georeferencing gathered images is a common step before performing spatial analysis and other processes on acquired datasets using unmanned aerial systems (UAS). Methods of applying spatial information to aerial images or their derivatives is through onboard GPS (Global Positioning Systems) geotagging, or through tying of models through GCPs (Ground Control Points) acquired in the field. Currently, UAS (Unmanned Aerial System) derivatives are limited to meter-levels of accuracy when their generation is unaided with points of known position on the ground. The use of ground control points established using survey-grade GPS or GNSS receivers can greatly reduce model errors to centimeter levels. However, this comes with additional costs not only with instrument acquisition and survey operations, but also in actual time spent in the field. This study uses a workflow for cloud-based post-processing of UAS data in combination with already existing LiDAR data. The georeferencing of the UAV point cloud is executed using the Iterative Closest Point algorithm (ICP). It is applied through the open-source CloudCompare software (Girardeau-Montaut, 2006) on a `skeleton point cloud'. This skeleton point cloud consists of manually extracted features consistent on both LiDAR and UAV data. For this cloud, roads and buildings with minimal deviations given their differing dates of acquisition are considered consistent. Transformation parameters are computed for the skeleton cloud which could then be applied to the whole UAS dataset. In addition, a separate cloud consisting of non-vegetation features automatically derived using CANUPO classification algorithm (Brodu and Lague, 2012) was used to generate a separate set of parameters. Ground survey is done to validate the transformed cloud. An RMSE value of around 16 centimeters was found when comparing validation data to the models georeferenced using the CANUPO cloud and the manual skeleton cloud. Cloud-to-cloud distance computations of CANUPO and manual skeleton clouds were obtained with values for both equal to around 0.67 meters at 1.73 standard deviation.
Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele

2014-11-11

has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization towards minimizing the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux’s Hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56more » virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as w a MPI-intensive application (the HPL Linpack benchmark).« less

Towards a Low-Cost Real-Time Photogrammetric Landslide Monitoring System Utilising Mobile and Cloud Computing Technology

NASA Astrophysics Data System (ADS)

Chidburee, P.; Mills, J. P.; Miller, P. E.; Fieber, K. D.

2016-06-01

Close-range photogrammetric techniques offer a potentially low-cost approach in terms of implementation and operation for initial assessment and monitoring of landslide processes over small areas. In particular, the Structure-from-Motion (SfM) pipeline is now extensively used to help overcome many constraints of traditional digital photogrammetry, offering increased user-friendliness to nonexperts, as well as lower costs. However, a landslide monitoring approach based on the SfM technique also presents some potential drawbacks due to the difficulty in managing and processing a large volume of data in real-time. This research addresses the aforementioned issues by attempting to combine a mobile device with cloud computing technology to develop a photogrammetric measurement solution as part of a monitoring system for landslide hazard analysis. The research presented here focusses on (i) the development of an Android mobile application; (ii) the implementation of SfM-based open-source software in the Amazon cloud computing web service, and (iii) performance assessment through a simulated environment using data collected at a recognized landslide test site in North Yorkshire, UK. Whilst the landslide monitoring mobile application is under development, this paper describes experiments carried out to ensure effective performance of the system in the future. Investigations presented here describe the initial assessment of a cloud-implemented approach, which is developed around the well-known VisualSFM algorithm. Results are compared to point clouds obtained from alternative SfM 3D reconstruction approaches considering a commercial software solution (Agisoft PhotoScan) and a web-based system (Autodesk 123D Catch). Investigations demonstrate that the cloud-based photogrammetric measurement system is capable of providing results of centimeter-level accuracy, evidencing its potential to provide an effective approach for quantifying and analyzing landslide hazard at a local-scale.
The Matsu Wheel: A Cloud-Based Framework for Efficient Analysis and Reanalysis of Earth Satellite Imagery

NASA Technical Reports Server (NTRS)

Patterson, Maria T.; Anderson, Nicholas; Bennett, Collin; Bruggemann, Jacob; Grossman, Robert L.; Handy, Matthew; Ly, Vuong; Mandl, Daniel J.; Pederson, Shane; Pivarski, James;

2016-01-01

Project Matsu is a collaboration between the Open Commons Consortium and NASA focused on developing open source technology for cloud-based processing of Earth satellite imagery with practical applications to aid in natural disaster detection and relief. Project Matsu has developed an open source cloud-based infrastructure to process, analyze, and reanalyze large collections of hyperspectral satellite image data using OpenStack, Hadoop, MapReduce and related technologies. We describe a framework for efficient analysis of large amounts of data called the Matsu "Wheel." The Matsu Wheel is currently used to process incoming hyperspectral satellite data produced daily by NASA's Earth Observing-1 (EO-1) satellite. The framework allows batches of analytics, scanning for new data, to be applied to data as it flows in. In the Matsu Wheel, the data only need to be accessed and preprocessed once, regardless of the number or types of analytics, which can easily be slotted into the existing framework. The Matsu Wheel system provides a significantly more efficient use of computational resources over alternative methods when the data are large, have high-volume throughput, may require heavy preprocessing, and are typically used for many types of analysis. We also describe our preliminary Wheel analytics, including an anomaly detector for rare spectral signatures or thermal anomalies in hyperspectral data and a land cover classifier that can be used for water and flood detection. Each of these analytics can generate visual reports accessible via the web for the public and interested decision makers. The result products of the analytics are also made accessible through an Open Geospatial Compliant (OGC)-compliant Web Map Service (WMS) for further distribution. The Matsu Wheel allows many shared data services to be performed together to efficiently use resources for processing hyperspectral satellite image data and other, e.g., large environmental datasets that may be analyzed for many purposes.

Implementation of Grid Tier 2 and Tier 3 facilities on a Distributed OpenStack Cloud

NASA Astrophysics Data System (ADS)

Limosani, Antonio; Boland, Lucien; Coddington, Paul; Crosby, Sean; Huang, Joanna; Sevior, Martin; Wilson, Ross; Zhang, Shunde

2014-06-01

The Australian Government is making a AUD 100 million investment in Compute and Storage for the academic community. The Compute facilities are provided in the form of 30,000 CPU cores located at 8 nodes around Australia in a distributed virtualized Infrastructure as a Service facility based on OpenStack. The storage will eventually consist of over 100 petabytes located at 6 nodes. All will be linked via a 100 Gb/s network. This proceeding describes the development of a fully connected WLCG Tier-2 grid site as well as a general purpose Tier-3 computing cluster based on this architecture. The facility employs an extension to Torque to enable dynamic allocations of virtual machine instances. A base Scientific Linux virtual machine (VM) image is deployed in the OpenStack cloud and automatically configured as required using Puppet. Custom scripts are used to launch multiple VMs, integrate them into the dynamic Torque cluster and to mount remote file systems. We report on our experience in developing this nation-wide ATLAS and Belle II Tier 2 and Tier 3 computing infrastructure using the national Research Cloud and storage facilities.
A study on strategic provisioning of cloud computing services.

PubMed

Whaiduzzaman, Md; Haque, Mohammad Nazmul; Rejaul Karim Chowdhury, Md; Gani, Abdullah

2014-01-01

Cloud computing is currently emerging as an ever-changing, growing paradigm that models "everything-as-a-service." Virtualised physical resources, infrastructure, and applications are supplied by service provisioning in the cloud. The evolution in the adoption of cloud computing is driven by clear and distinct promising features for both cloud users and cloud providers. However, the increasing number of cloud providers and the variety of service offerings have made it difficult for the customers to choose the best services. By employing successful service provisioning, the essential services required by customers, such as agility and availability, pricing, security and trust, and user metrics can be guaranteed by service provisioning. Hence, continuous service provisioning that satisfies the user requirements is a mandatory feature for the cloud user and vitally important in cloud computing service offerings. Therefore, we aim to review the state-of-the-art service provisioning objectives, essential services, topologies, user requirements, necessary metrics, and pricing mechanisms. We synthesize and summarize different provision techniques, approaches, and models through a comprehensive literature review. A thematic taxonomy of cloud service provisioning is presented after the systematic review. Finally, future research directions and open research issues are identified.
A Study on Strategic Provisioning of Cloud Computing Services

PubMed Central

Rejaul Karim Chowdhury, Md

2014-01-01

Cloud computing is currently emerging as an ever-changing, growing paradigm that models “everything-as-a-service.” Virtualised physical resources, infrastructure, and applications are supplied by service provisioning in the cloud. The evolution in the adoption of cloud computing is driven by clear and distinct promising features for both cloud users and cloud providers. However, the increasing number of cloud providers and the variety of service offerings have made it difficult for the customers to choose the best services. By employing successful service provisioning, the essential services required by customers, such as agility and availability, pricing, security and trust, and user metrics can be guaranteed by service provisioning. Hence, continuous service provisioning that satisfies the user requirements is a mandatory feature for the cloud user and vitally important in cloud computing service offerings. Therefore, we aim to review the state-of-the-art service provisioning objectives, essential services, topologies, user requirements, necessary metrics, and pricing mechanisms. We synthesize and summarize different provision techniques, approaches, and models through a comprehensive literature review. A thematic taxonomy of cloud service provisioning is presented after the systematic review. Finally, future research directions and open research issues are identified. PMID:25032243
Design of Control Plane Architecture Based on Cloud Platform and Experimental Network Demonstration for Multi-domain SDON

NASA Astrophysics Data System (ADS)

Li, Ming; Yin, Hongxi; Xing, Fangyuan; Wang, Jingchao; Wang, Honghuan

2016-02-01

With the features of network virtualization and resource programming, Software Defined Optical Network (SDON) is considered as the future development trend of optical network, provisioning a more flexible, efficient and open network function, supporting intraconnection and interconnection of data centers. Meanwhile cloud platform can provide powerful computing, storage and management capabilities. In this paper, with the coordination of SDON and cloud platform, a multi-domain SDON architecture based on cloud control plane has been proposed, which is composed of data centers with database (DB), path computation element (PCE), SDON controller and orchestrator. In addition, the structure of the multidomain SDON orchestrator and OpenFlow-enabled optical node are proposed to realize the combination of centralized and distributed effective management and control platform. Finally, the functional verification and demonstration are performed through our optical experiment network.
GenomeVIP: a cloud platform for genomic variant discovery and interpretation

PubMed Central

Mashl, R. Jay; Scott, Adam D.; Huang, Kuan-lin; Wyczalkowski, Matthew A.; Yoon, Christopher J.; Niu, Beifang; DeNardo, Erin; Yellapantula, Venkata D.; Handsaker, Robert E.; Chen, Ken; Koboldt, Daniel C.; Ye, Kai; Fenyö, David; Raphael, Benjamin J.; Wendl, Michael C.; Ding, Li

2017-01-01

Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets. PMID:28522612
Leveraging Cloud Technology to Provide a Responsive, Reliable and Scalable Backend for the Virtual Ice Sheet Laboratory Using the Ice Sheet System Model and Amazon's Elastic Compute Cloud

NASA Astrophysics Data System (ADS)

Perez, G. L.; Larour, E. Y.; Halkides, D. J.; Cheng, D. L. C.

2015-12-01

The Virtual Ice Sheet Laboratory(VISL) is a Cryosphere outreach effort byscientists at the Jet Propulsion Laboratory(JPL) in Pasadena, CA, Earth and SpaceResearch(ESR) in Seattle, WA, and the University of California at Irvine (UCI), with the goal of providing interactive lessons for K-12 and college level students,while conforming to STEM guidelines. At the core of VISL is the Ice Sheet System Model(ISSM), an open-source project developed jointlyat JPL and UCI whose main purpose is to model the evolution of the polar ice caps in Greenland and Antarctica. By using ISSM, VISL students have access tostate-of-the-art modeling software that is being used to conduct scientificresearch by users all over the world. However, providing this functionality isby no means simple. The modeling of ice sheets in response to sea and atmospheric temperatures, among many other possible parameters, requiressignificant computational resources. Furthermore, this service needs to beresponsive and capable of handling burst requests produced by classrooms ofstudents. Cloud computing providers represent a burgeoning industry. With majorinvestments by tech giants like Amazon, Google and Microsoft, it has never beeneasier or more affordable to deploy computational elements on-demand. This isexactly what VISL needs and ISSM is capable of. Moreover, this is a promisingalternative to investing in expensive and rapidly devaluing hardware.
Tidal disruption of open clusters in their parent molecular clouds

NASA Technical Reports Server (NTRS)

Long, Kevin

1989-01-01

A simple model of tidal encounters has been applied to the problem of an open cluster in a clumpy molecular cloud. The parameters of the clumps are taken from the Blitz, Stark, and Long (1988) catalog of clumps in the Rosette molecular cloud. Encounters are modeled as impulsive, rectilinear collisions between Plummer spheres, but the tidal approximation is not invoked. Mass and binding energy changes during an encounter are computed by considering the velocity impulses given to individual stars in a random realization of a Plummer sphere. Mean rates of mass and binding energy loss are then computed by integrating over many encounters. Self-similar evolutionary calculations using these rates indicate that the disruption process is most sensitive to the cluster radius and relatively insensitive to cluster mass. The calculations indicate that clusters which are born in a cloud similar to the Rosette with a cluster radius greater than about 2.5 pc will not survive long enough to leave the cloud. The majority of clusters, however, have smaller radii and will survive the passage through their parent cloud.
Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects

PubMed Central

Clematis, Andrea; Quarati, Alfonso; Cesini, Daniele; Milanesi, Luciano; Merelli, Ivan

2013-01-01

Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements. PMID:24106693
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Towards Cloud-based Asynchronous Elasticity for Iterative HPC Applications

NASA Astrophysics Data System (ADS)

da Rosa Righi, Rodrigo; Facco Rodrigues, Vinicius; André da Costa, Cristiano; Kreutz, Diego; Heiss, Hans-Ulrich

2015-10-01

Elasticity is one of the key features of cloud computing. It allows applications to dynamically scale computing and storage resources, avoiding over- and under-provisioning. In high performance computing (HPC), initiatives are normally modeled to handle bag-of-tasks or key-value applications through a load balancer and a loosely-coupled set of virtual machine (VM) instances. In the joint-field of Message Passing Interface (MPI) and tightly-coupled HPC applications, we observe the need of rewriting source codes, previous knowledge of the application and/or stop-reconfigure-and-go approaches to address cloud elasticity. Besides, there are problems related to how profit this new feature in the HPC scope, since in MPI 2.0 applications the programmers need to handle communicators by themselves, and a sudden consolidation of a VM, together with a process, can compromise the entire execution. To address these issues, we propose a PaaS-based elasticity model, named AutoElastic. It acts as a middleware that allows iterative HPC applications to take advantage of dynamic resource provisioning of cloud infrastructures without any major modification. AutoElastic provides a new concept denoted here as asynchronous elasticity, i.e., it provides a framework to allow applications to either increase or decrease their computing resources without blocking the current execution. The feasibility of AutoElastic is demonstrated through a prototype that runs a CPU-bound numerical integration application on top of the OpenNebula middleware. The results showed the saving of about 3 min at each scaling out operations, emphasizing the contribution of the new concept on contexts where seconds are precious.
Cloud4Psi: cloud computing for 3D protein structure similarity searching.

PubMed

Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur

2014-10-01

Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching

PubMed Central

Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur

2014-01-01

Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl PMID:24930141
Survey of MapReduce frame operation in bioinformatics.

PubMed

Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

2014-07-01

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.

PubMed

Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu

2015-01-01

The power of cloud computing and distributed computing has been harnessed to handle vast and heterogeneous data required to be processed in any virtual screening protocol. A cloud computing platorm ChemInfoCloud was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It has also been provided with some of the bioinformatics functionalities including sequence alignment, active site pose prediction and protein ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users using container based virtualization, OpenVz.
Research on the digital education resources of sharing pattern in independent colleges based on cloud computing environment

NASA Astrophysics Data System (ADS)

Xiong, Ting; He, Zhiwen

2017-06-01

Cloud computing was first proposed by Google Company in the United States, which was based on the Internet center, providing a standard and open network sharing service approach. With the rapid development of the higher education in China, the educational resources provided by colleges and universities had greatly gap in the actual needs of teaching resources. therefore, Cloud computing of using the Internet technology to provide shared methods liked the timely rain, which had become an important means of the Digital Education on sharing applications in the current higher education. Based on Cloud computing environment, the paper analyzed the existing problems about the sharing of digital educational resources in Jiangxi Province Independent Colleges. According to the sharing characteristics of mass storage, efficient operation and low input about Cloud computing, the author explored and studied the design of the sharing model about the digital educational resources of higher education in Independent College. Finally, the design of the shared model was put into the practical applications.
Consolidation of cloud computing in ATLAS

NASA Astrophysics Data System (ADS)

Taylor, Ryan P.; Domingues Cordeiro, Cristovao Jose; Giordano, Domenico; Hover, John; Kouba, Tomas; Love, Peter; McNab, Andrew; Schovancova, Jaroslava; Sobie, Randall; ATLAS Collaboration

2017-10-01

Throughout the first half of LHC Run 2, ATLAS cloud computing has undergone a period of consolidation, characterized by building upon previously established systems, with the aim of reducing operational effort, improving robustness, and reaching higher scale. This paper describes the current state of ATLAS cloud computing. Cloud activities are converging on a common contextualization approach for virtual machines, and cloud resources are sharing monitoring and service discovery components. We describe the integration of Vacuum resources, streamlined usage of the Simulation at Point 1 cloud for offline processing, extreme scaling on Amazon compute resources, and procurement of commercial cloud capacity in Europe. Finally, building on the previously established monitoring infrastructure, we have deployed a real-time monitoring and alerting platform which coalesces data from multiple sources, provides flexible visualization via customizable dashboards, and issues alerts and carries out corrective actions in response to problems.
Cloud Environment Automation: from infrastructure deployment to application monitoring

NASA Astrophysics Data System (ADS)

Aiftimiei, C.; Costantini, A.; Bucchi, R.; Italiano, A.; Michelotto, D.; Panella, M.; Pergolesi, M.; Saletta, M.; Traldi, S.; Vistoli, C.; Zizzi, G.; Salomoni, D.

2017-10-01

The potential offered by the cloud paradigm is often limited by technical issues, rules and regulations. In particular, the activities related to the design and deployment of the Infrastructure as a Service (IaaS) cloud layer can be difficult to apply and time-consuming for the infrastructure maintainers. In this paper the research activity, carried out during the Open City Platform (OCP) research project [1], aimed at designing and developing an automatic tool for cloud-based IaaS deployment is presented. Open City Platform is an industrial research project funded by the Italian Ministry of University and Research (MIUR), started in 2014. It intends to research, develop and test new technological solutions open, interoperable and usable on-demand in the field of Cloud Computing, along with new sustainable organizational models that can be deployed for and adopted by the Public Administrations (PA). The presented work and the related outcomes are aimed at simplifying the deployment and maintenance of a complete IaaS cloud-based infrastructure.

phpMs: A PHP-Based Mass Spectrometry Utilities Library.

PubMed

Collins, Andrew; Jones, Andrew R

2018-03-02

The recent establishment of cloud computing, high-throughput networking, and more versatile web standards and browsers has led to a renewed interest in web-based applications. While traditionally big data has been the domain of optimized desktop and server applications, it is now possible to store vast amounts of data and perform the necessary calculations offsite in cloud storage and computing providers, with the results visualized in a high-quality cross-platform interface via a web browser. There are number of emerging platforms for cloud-based mass spectrometry data analysis; however, there is limited pre-existing code accessible to web developers, especially for those that are constrained to a shared hosting environment where Java and C applications are often forbidden from use by the hosting provider. To remedy this, we provide an open-source mass spectrometry library for one of the most commonly used web development languages, PHP. Our new library, phpMs, provides objects for storing and manipulating spectra and identification data as well as utilities for file reading, file writing, calculations, peptide fragmentation, and protein digestion as well as a software interface for controlling search engines. We provide a working demonstration of some of the capabilities at http://pgb.liv.ac.uk/phpMs .
TomoMiner and TomoMinerCloud: A software platform for large-scale subtomogram structural analysis

PubMed Central

Frazier, Zachary; Xu, Min; Alber, Frank

2017-01-01

SUMMARY Cryo-electron tomography (cryoET) captures the 3D electron density distribution of macromolecular complexes in close to native state. With the rapid advance of cryoET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample due to variations in structure and composition of a complex in situ form or because particles are a mixture of different complexes. In this case subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. Additionally, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiners high-performance features. PMID:28552576
Voltages induced on a power distribution line by overhead cloud lightning

NASA Technical Reports Server (NTRS)

Yacoub, Ziad; Rubinstein, Marcos; Uman, Martin A.; Thomson, Ewen M.; Medelius, Pedro J.

1991-01-01

Voltages induced by overhead cloud lightning on a 448 m open circuited power distribution line and the corresponding north-south component of the lightning magnetic field were simultaneously measured at the NASA Kennedy Space Center during the summer of 1986. The incident electric field was calculated from the measured magnetic field. The electric field was then used as an input to the computer program, EMPLIN, that calculated the voltages at the two ends of the power line. EMPLIN models the frequency domain field/power coupling theory found, for example, in Ianoz et al. The direction of the source, which is also one of the inputs to EMPLIN, was crudely determined from a three station time delay technique. The authors found reasonably good agreement between calculated and measured waveforms.
Integrating multiple scientific computing needs via a Private Cloud infrastructure

NASA Astrophysics Data System (ADS)

Bagnasco, S.; Berzano, D.; Brunetti, R.; Lusso, S.; Vallero, S.

2014-06-01

In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It allows to dynamically and efficiently allocate resources to any application and to tailor the virtual machines according to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily and minimizing the downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, that hosts a full-fledged WLCG Tier-2 site and a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.
Open Source Surrogate Safety Assessment Model, 2017 Enhancement and Update: SSAM Version 3.0 [Tech Brief

DOT National Transportation Integrated Search

2016-11-17

The ETFOMM (Enhanced Transportation Flow Open Source Microscopic Model) Cloud Service (ECS) is a software product sponsored by the U.S. Department of Transportation in conjunction with the Microscopic Traffic Simulation Models and SoftwareAn Op...
Millstone: software for multiplex microbial genome analysis and engineering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.

Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering.

PubMed

Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

2017-05-25

Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering

DOE PAGES

Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.; ...

2017-05-25

Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Cloud regimes as phase transitions

NASA Astrophysics Data System (ADS)

Stechmann, Samuel; Hottovy, Scott

2017-11-01

Clouds are repeatedly identified as a leading source of uncertainty in future climate predictions. Of particular importance are stratocumulus clouds, which can appear as either (i) closed cells that reflect solar radiation back to space or (ii) open cells that allow solar radiation to reach the Earth's surface. Here we show that these clouds regimes - open versus closed cells - fit the paradigm of a phase transition. In addition, this paradigm characterizes pockets of open cells (POCs) as the interface between the open- and closed-cell regimes, and it identifies shallow cumulus clouds as a regime of higher variability. This behavior can be understood using an idealized model for the dynamics of atmospheric water as a stochastic diffusion process. Similar viewpoints of deep convection and self-organized criticality will also be discussed. With these new conceptual viewpoints, ideas from statistical mechanics could potentially be used for understanding uncertainties related to clouds in the climate system and climate predictions. The research of S.N.S. is partially supported by a Sloan Research Fellowship, ONR Young Investigator Award N00014-12-1-0744, and ONR MURI Grant N00014-12-1-0912.
Science Gateways, Scientific Workflows and Open Community Software

NASA Astrophysics Data System (ADS)

Pierce, M. E.; Marru, S.

2014-12-01

Science gateways and scientific workflows occupy different ends of the spectrum of user-focused cyberinfrastructure. Gateways, sometimes called science portals, provide a way for enabling large numbers of users to take advantage of advanced computing resources (supercomputers, advanced storage systems, science clouds) by providing Web and desktop interfaces and supporting services. Scientific workflows, at the other end of the spectrum, support advanced usage of cyberinfrastructure that enable "power users" to undertake computational experiments that are not easily done through the usual mechanisms (managing simulations across multiple sites, for example). Despite these different target communities, gateways and workflows share many similarities and can potentially be accommodated by the same software system. For example, pipelines to process InSAR imagery sets or to datamine GPS time series data are workflows. The results and the ability to make downstream products may be made available through a gateway, and power users may want to provide their own custom pipelines. In this abstract, we discuss our efforts to build an open source software system, Apache Airavata, that can accommodate both gateway and workflow use cases. Our approach is general, and we have applied the software to problems in a number of scientific domains. In this talk, we discuss our applications to usage scenarios specific to earth science, focusing on earthquake physics examples drawn from the QuakSim.org and GeoGateway.org efforts. We also examine the role of the Apache Software Foundation's open community model as a way to build up common commmunity codes that do not depend upon a single "owner" to sustain. Pushing beyond open source software, we also see the need to provide gateways and workflow systems as cloud services. These services centralize operations, provide well-defined programming interfaces, scale elastically, and have global-scale fault tolerance. We discuss our work providing Apache Airavata as a hosted service to provide these features.
Design Patterns to Achieve 300x Speedup for Oceanographic Analytics in the Cloud

NASA Astrophysics Data System (ADS)

Jacob, J. C.; Greguska, F. R., III; Huang, T.; Quach, N.; Wilson, B. D.

2017-12-01

We describe how we achieve super-linear speedup over standard approaches for oceanographic analytics on a cluster computer and the Amazon Web Services (AWS) cloud. NEXUS is an open source platform for big data analytics in the cloud that enables this performance through a combination of horizontally scalable data parallelism with Apache Spark and rapid data search, subset, and retrieval with tiled array storage in cloud-aware NoSQL databases like Solr and Cassandra. NEXUS is the engine behind several public portals at NASA and OceanWorks is a newly funded project for the ocean community that will mature and extend this capability for improved data discovery, subset, quality screening, analysis, matchup of satellite and in situ measurements, and visualization. We review the Python language API for Spark and how to use it to quickly convert existing programs to use Spark to run with cloud-scale parallelism, and discuss strategies to improve performance. We explain how partitioning the data over space, time, or both leads to algorithmic design patterns for Spark analytics that can be applied to many different algorithms. We use NEXUS analytics as examples, including area-averaged time series, time averaged map, and correlation map.
plas.io: Open Source, Browser-based WebGL Point Cloud Visualization

NASA Astrophysics Data System (ADS)

Butler, H.; Finnegan, D. C.; Gadomski, P. J.; Verma, U. K.

2014-12-01

Point cloud data, in the form of Light Detection and Ranging (LiDAR), RADAR, or semi-global matching (SGM) image processing, are rapidly becoming a foundational data type to quantify and characterize geospatial processes. Visualization of these data, due to overall volume and irregular arrangement, is often difficult. Technological advancement in web browsers, in the form of WebGL and HTML5, have made interactivity and visualization capabilities ubiquitously available which once only existed in desktop software. plas.io is an open source JavaScript application that provides point cloud visualization, exploitation, and compression features in a web-browser platform, reducing the reliance for client-based desktop applications. The wide reach of WebGL and browser-based technologies mean plas.io's capabilities can be delivered to a diverse list of devices -- from phones and tablets to high-end workstations -- with very little custom software development. These properties make plas.io an ideal open platform for researchers and software developers to communicate visualizations of complex and rich point cloud data to devices to which everyone has easy access.
The AppScale Cloud Platform

PubMed Central

Krintz, Chandra

2013-01-01

AppScale is an open source distributed software system that implements a cloud platform as a service (PaaS). AppScale makes cloud applications easy to deploy and scale over disparate cloud fabrics, implementing a set of APIs and architecture that also makes apps portable across the services they employ. AppScale is API-compatible with Google App Engine (GAE) and thus executes GAE applications on-premise or over other cloud infrastructures, without modification. PMID:23828721
Enhancing data utilization through adoption of cloud-based data architectures (Invited Paper 211869)

NASA Astrophysics Data System (ADS)

Kearns, E. J.

2017-12-01

A traditional approach to data distribution and utilization of open government data involves continuously moving those data from a central government location to each potential user, who would then utilize them on their local computer systems. An alternate approach would be to bring those users to the open government data, where users would also have access to computing and analytics capabilities that would support data utilization. NOAA's Big Data Project is exploring such an alternate approach through an experimental collaboration with Amazon Web Services, Google Cloud Platform, IBM, Microsoft Azure, and the Open Commons Consortium. As part of this ongoing experiment, NOAA is providing open data of interest which are freely hosted by the Big Data Project Collaborators, who provide a variety of cloud-based services and capabilities to enable utilization by data users. By the terms of the agreement, the Collaborators may charge for those value-added services and processing capacities to recover their costs to freely host the data and to generate profits if so desired. Initial results have shown sustained increases in data utilization from 2 to over 100 times previously-observed access patterns from traditional approaches. Significantly increased utilization speed as compared to the traditional approach has also been observed by NOAA data users who have volunteered their experiences on these cloud-based systems. The potential for implementing and sustaining the alternate cloud-based approach as part of a change in operational data utilization strategies will be discussed.
The Community Intercomparison Suite (CIS)

NASA Astrophysics Data System (ADS)

Watson-Parris, Duncan; Schutgens, Nick; Cook, Nick; Kipling, Zak; Kershaw, Phil; Gryspeerdt, Ed; Lawrence, Bryan; Stier, Philip

2017-04-01

Earth observations (both remote and in-situ) create vast amounts of data providing invaluable constraints for the climate science community. Efficient exploitation of these complex and highly heterogeneous datasets has been limited however by the lack of suitable software tools, particularly for comparison of gridded and ungridded data, thus reducing scientific productivity. CIS (http://cistools.net) is an open-source, command line tool and Python library which allows the straight-forward quantitative analysis, intercomparison and visualisation of remote sensing, in-situ and model data. The CIS can read gridded and ungridded remote sensing, in-situ and model data - and many other data sources 'out-of-the-box', such as ESA Aerosol and Cloud CCI product, MODIS, Cloud CCI, Cloudsat, AERONET. Perhaps most importantly however CIS also employs a modular plugin architecture to allow for the reading of limitless different data types. Users are able to write their own plugins for reading the data sources which they are familiar with, and share them within the community, allowing all to benefit from their expertise. To enable the intercomparison of this data the CIS provides a number of operations including: the aggregation of ungridded and gridded datasets to coarser representations using a number of different built in averaging kernels; the subsetting of data to reduce its extent or dimensionality; the co-location of two distinct datasets onto a single set of co-ordinates; the visualisation of the input or output data through a number of different plots and graphs; the evaluation of arbitrary mathematical expressions against any number of datasets; and a number of other supporting functions such as a statistical comparison of two co-located datasets. These operations can be performed efficiently on local machines or large computing clusters - and is already available on the JASMIN computing facility. A case-study using the GASSP collection of in-situ aerosol observations will demonstrate the power of using CIS to perform model evaluations. The use of an open-source, community developed tool in this way opens up a huge amount of data which would previously have been inaccessible to many users, while also providing replicable, repeatable analysis which scientists and policy-makers alike can trust and understand.
MarFS, a Near-POSIX Interface to Cloud Objects

DOE Office of Scientific and Technical Information (OSTI.GOV)

Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson

The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have alsomore » developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.« less
MarFS, a Near-POSIX Interface to Cloud Objects

DOE PAGES

Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson; ...

2017-01-01

The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have alsomore » developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.« less
BluePyOpt: Leveraging Open Source Software and Cloud Infrastructure to Optimise Model Parameters in Neuroscience.

PubMed

Van Geit, Werner; Gevaert, Michael; Chindemi, Giuseppe; Rössert, Christian; Courcol, Jean-Denis; Muller, Eilif B; Schürmann, Felix; Segev, Idan; Markram, Henry

2016-01-01

At many scales in neuroscience, appropriate mathematical models take the form of complex dynamical systems. Parameterizing such models to conform to the multitude of available experimental constraints is a global non-linear optimisation problem with a complex fitness landscape, requiring numerical techniques to find suitable approximate solutions. Stochastic optimisation approaches, such as evolutionary algorithms, have been shown to be effective, but often the setting up of such optimisations and the choice of a specific search algorithm and its parameters is non-trivial, requiring domain-specific expertise. Here we describe BluePyOpt, a Python package targeted at the broad neuroscience community to simplify this task. BluePyOpt is an extensible framework for data-driven model parameter optimisation that wraps and standardizes several existing open-source tools. It simplifies the task of creating and sharing these optimisations, and the associated techniques and knowledge. This is achieved by abstracting the optimisation and evaluation tasks into various reusable and flexible discrete elements according to established best-practices. Further, BluePyOpt provides methods for setting up both small- and large-scale optimisations on a variety of platforms, ranging from laptops to Linux clusters and cloud-based compute infrastructures. The versatility of the BluePyOpt framework is demonstrated by working through three representative neuroscience specific use cases.
ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification.

PubMed

Bao, Riyue; Hernandez, Kyle; Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

2015-01-01

Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.
Visualization and Quality Control Web Tools for CERES Products

NASA Astrophysics Data System (ADS)

Mitrescu, C.; Doelling, D. R.

2017-12-01

The NASA CERES project continues to provide the scientific communities a wide variety of satellite-derived data products such as observed TOA broadband shortwave and longwave observed fluxes, computed TOA and Surface fluxes, as well as cloud, aerosol, and other atmospheric parameters. They encompass a wide range of temporal and spatial resolutions, suited to specific applications. CERES data is used mostly by climate modeling communities but also by a wide variety of educational institutions. To better serve our users, a web-based Ordering and Visualization Tool (OVT) was developed by using Opens Source Software such as Eclipse, java, javascript, OpenLayer, Flot, Google Maps, python, and others. Due to increased demand by our own scientists, we also implemented a series of specialized functions to be used in the process of CERES Data Quality Control (QC) such as 1- and 2-D histograms, anomalies and differences, temporal and spatial averaging, side-by-side parameter comparison, and others that made the process of QC far easier and faster, but more importantly far more portable. With the integration of ground site observed surface fluxes we further facilitate the CERES project to QC the CERES computed surface fluxes. An overview of the CERES OVT basic functions using Open Source Software, as well as future steps in expanding its capabilities will be presented at the meeting.

Hybrid Cloud Computing Environment for EarthCube and Geoscience Community

NASA Astrophysics Data System (ADS)

Yang, C. P.; Qin, H.

2016-12-01

The NSF EarthCube Integration and Test Environment (ECITE) has built a hybrid cloud computing environment to provides cloud resources from private cloud environments by using cloud system software - OpenStack and Eucalyptus, and also manages public cloud - Amazon Web Service that allow resource synchronizing and bursting between private and public cloud. On ECITE hybrid cloud platform, EarthCube and geoscience community can deploy and manage the applications by using base virtual machine images or customized virtual machines, analyze big datasets by using virtual clusters, and real-time monitor the virtual resource usage on the cloud. Currently, a number of EarthCube projects have deployed or started migrating their projects to this platform, such as CHORDS, BCube, CINERGI, OntoSoft, and some other EarthCube building blocks. To accomplish the deployment or migration, administrator of ECITE hybrid cloud platform prepares the specific needs (e.g. images, port numbers, usable cloud capacity, etc.) of each project in advance base on the communications between ECITE and participant projects, and then the scientists or IT technicians in those projects launch one or multiple virtual machines, access the virtual machine(s) to set up computing environment if need be, and migrate their codes, documents or data without caring about the heterogeneity in structure and operations among different cloud platforms.
The Cloud Area Padovana: from pilot to production

NASA Astrophysics Data System (ADS)

Andreetto, P.; Costa, F.; Crescente, A.; Dorigo, A.; Fantinel, S.; Fanzago, F.; Sgaravatto, M.; Traldi, S.; Verlato, M.; Zangrando, L.

2017-10-01

The Cloud Area Padovana has been running for almost two years. This is an OpenStack-based scientific cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs. The hardware resources have been scaled horizontally and vertically, by upgrading some hypervisors and by adding new ones: currently it provides about 1100 cores. Some in-house developments were also integrated in the OpenStack dashboard, such as a tool for user and project registrations with direct support for the INFN-AAI Identity Provider as a new option for the user authentication. In collaboration with the EU-funded Indigo DataCloud project, the integration with Docker-based containers has been experimented with and will be available in production soon. This computing facility now satisfies the computational and storage demands of more than 70 users affiliated with about 20 research projects. We present here the architecture of this Cloud infrastructure, the tools and procedures used to operate it. We also focus on the lessons learnt in these two years, describing the problems that were found and the corrective actions that had to be applied. We also discuss about the chosen strategy for upgrades, which combines the need to promptly integrate the OpenStack new developments, the demand to reduce the downtimes of the infrastructure, and the need to limit the effort requested for such updates. We also discuss how this Cloud infrastructure is being used. In particular we focus on two big physics experiments which are intensively exploiting this computing facility: CMS and SPES. CMS deployed on the cloud a complex computational infrastructure, composed of several user interfaces for job submission in the Grid environment/local batch queues or for interactive processes; this is fully integrated with the local Tier-2 facility. To avoid a static allocation of the resources, an elastic cluster, based on cernVM, has been configured: it allows to automatically create and delete virtual machines according to the user needs. SPES, using a client-server system called TraceWin, exploits INFN’s virtual resources performing a very large number of simulations on about a thousand nodes elastically managed.
OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.

2016-12-01

The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets.Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: Parallel generation (Spark on a compute cluster) of 15 to 30-year Ocean Climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; Parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; Parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; Shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; Scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
Towards large-scale data analysis: challenges in the design of portable systems and use of Cloud computing.

PubMed

Diaz, Javier; Arrizabalaga, Saioa; Bustamante, Paul; Mesa, Iker; Añorga, Javier; Goya, Jon

2013-01-01

Portable systems and global communications open a broad spectrum for new health applications. In the framework of electrophysiological applications, several challenges are faced when developing portable systems embedded in Cloud computing services. In order to facilitate new developers in this area based on our experience, five areas of interest are presented in this paper where strategies can be applied for improving the performance of portable systems: transducer and conditioning, processing, wireless communications, battery and power management. Likewise, for Cloud services, scalability, portability, privacy and security guidelines have been highlighted.
The Virtual Geophysics Laboratory (VGL): Scientific Workflows Operating Across Organizations and Across Infrastructures

NASA Astrophysics Data System (ADS)

Cox, S. J.; Wyborn, L. A.; Fraser, R.; Rankine, T.; Woodcock, R.; Vote, J.; Evans, B.

2012-12-01

The Virtual Geophysics Laboratory (VGL) is web portal that provides geoscientists with an integrated online environment that: seamlessly accesses geophysical and geoscience data services from the AuScope national geoscience information infrastructure; loosely couples these data to a variety of gesocience software tools; and provides large scale processing facilities via cloud computing. VGL is a collaboration between CSIRO, Geoscience Australia, National Computational Infrastructure, Monash University, Australian National University and the University of Queensland. The VGL provides a distributed system whereby a user can enter an online virtual laboratory to seamlessly connect to OGC web services for geoscience data. The data is supplied in open standards formats using international standards like GeoSciML. A VGL user uses a web mapping interface to discover and filter the data sources using spatial and attribute filters to define a subset. Once the data is selected the user is not required to download the data. VGL collates the service query information for later in the processing workflow where it will be staged directly to the computing facilities. The combination of deferring data download and access to Cloud computing enables VGL users to access their data at higher resolutions and to undertake larger scale inversions, more complex models and simulations than their own local computing facilities might allow. Inside the Virtual Geophysics Laboratory, the user has access to a library of existing models, complete with exemplar workflows for specific scientific problems based on those models. For example, the user can load a geological model published by Geoscience Australia, apply a basic deformation workflow provided by a CSIRO scientist, and have it run in a scientific code from Monash. Finally the user can publish these results to share with a colleague or cite in a paper. This opens new opportunities for access and collaboration as all the resources (models, code, data, processing) are shared in the one virtual laboratory. VGL provides end users with access to an intuitive, user-centered interface that leverages cloud storage and cloud and cluster processing from both the research communities and commercial suppliers (e.g. Amazon). As the underlying data and information services are agnostic of the scientific domain, they can support many other data types. This fundamental characteristic results in a highly reusable virtual laboratory infrastructure that could also be used for example natural hazards, satellite processing, soil geochemistry, climate modeling, agriculture crop modeling.
Enabling Large-Scale Biomedical Analysis in the Cloud

PubMed Central

Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen

2013-01-01

Recent progress in high-throughput instrumentations has led to an astonishing growth in both volume and complexity of biomedical data collected from various sources. The planet-size data brings serious challenges to the storage and computing technologies. Cloud computing is an alternative to crack the nut because it gives concurrent consideration to enable storage and high-performance computing on large-scale data. This work briefly introduces the data intensive computing system and summarizes existing cloud-based resources in bioinformatics. These developments and applications would facilitate biomedical research to make the vast amount of diversification data meaningful and usable. PMID:24288665
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system

NASA Astrophysics Data System (ADS)

Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.

2014-06-01

The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
Visualization and Quality Control Web Tools for CERES Products

NASA Astrophysics Data System (ADS)

Mitrescu, C.; Doelling, D. R.; Rutan, D. A.

2016-12-01

The CERES project continues to provide the scientific community a wide variety of satellite-derived data products such as observed TOA broadband shortwave and longwave observed fluxes, computed TOA and Surface fluxes, as well as cloud, aerosol, and other atmospheric parameters. They encompass a wide range of temporal and spatial resolutions, suited to specific applications. Now in its 16-year, CERES products are mostly used by climate modeling communities that focus on global mean energetics, meridianal heat transport, and climate trend studies. In order to serve all our users, we developed a web-based Ordering and Visualization Tool (OVT). Using Opens Source Software such as Eclipse, java, javascript, OpenLayer, Flot, Google Maps, python, and others, the OVT Team developed a series of specialized functions to be used in the process of CERES Data Quality Control (QC). We mention 1- and 2-D histogram, anomaly, deseasonalization, temporal and spatial averaging, side-by-side parameter comparison, and others that made the process of QC far easier and faster, but more importantly far more portable. We are now in the process of integrating ground site observed surface fluxes to further facilitate the CERES project to QC the CERES computed surface fluxes. These features will give users the opportunity to perform their own comparisons of the CERES computed surface fluxes and observed ground site fluxes. An overview of the CERES OVT basic functions using Open Source Software, as well as future steps in expanding its capabilities will be presented at the meeting.
Generating Accurate 3d Models of Architectural Heritage Structures Using Low-Cost Camera and Open Source Algorithms

NASA Astrophysics Data System (ADS)

Zacharek, M.; Delis, P.; Kedzierski, M.; Fryskowska, A.

2017-05-01

These studies have been conductedusing non-metric digital camera and dense image matching algorithms, as non-contact methods of creating monuments documentation.In order toprocess the imagery, few open-source software and algorithms of generating adense point cloud from images have been executed. In the research, the OSM Bundler, VisualSFM software, and web application ARC3D were used. Images obtained for each of the investigated objects were processed using those applications, and then dense point clouds and textured 3D models were created. As a result of post-processing, obtained models were filtered and scaled.The research showedthat even using the open-source software it is possible toobtain accurate 3D models of structures (with an accuracy of a few centimeters), but for the purpose of documentation and conservation of cultural and historical heritage, such accuracy can be insufficient.
A Comparative Study of Point Cloud Data Collection and Processing

NASA Astrophysics Data System (ADS)

Pippin, J. E.; Matheney, M.; Gentle, J. N., Jr.; Pierce, S. A.; Fuentes-Pineda, G.

2016-12-01

Over the past decade, there has been dramatic growth in the acquisition of publicly funded high-resolution topographic data for scientific, environmental, engineering and planning purposes. These data sets are valuable for applications of interest across a large and varied user community. However, because of the large volumes of data produced by high-resolution mapping technologies and expense of aerial data collection, it is often difficult to collect and distribute these datasets. Furthermore, the data can be technically challenging to process, requiring software and computing resources not readily available to many users. This study presents a comparison of advanced computing hardware and software that is used to collect and process point cloud datasets, such as LIDAR scans. Activities included implementation and testing of open source libraries and applications for point cloud data processing such as, Meshlab, Blender, PDAL, and PCL. Additionally, a suite of commercial scale applications, Skanect and Cloudcompare, were applied to raw datasets. Handheld hardware solutions, a Structure Scanner and Xbox 360 Kinect V1, were tested for their ability to scan at three field locations. The resultant data projects successfully scanned and processed subsurface karst features ranging from small stalactites to large rooms, as well as a surface waterfall feature. Outcomes support the feasibility of rapid sensing in 3D at field scales.
Cloud-In-Cell modeling of shocked particle-laden flows at a ``SPARSE'' cost

NASA Astrophysics Data System (ADS)

Taverniers, Soren; Jacobs, Gustaaf; Sen, Oishik; Udaykumar, H. S.

2017-11-01

A common tool for enabling process-scale simulations of shocked particle-laden flows is Eulerian-Lagrangian Particle-Source-In-Cell (PSIC) modeling where each particle is traced in its Lagrangian frame and treated as a mathematical point. Its dynamics are governed by Stokes drag corrected for high Reynolds and Mach numbers. The computational burden is often reduced further through a ``Cloud-In-Cell'' (CIC) approach which amalgamates groups of physical particles into computational ``macro-particles''. CIC does not account for subgrid particle fluctuations, leading to erroneous predictions of cloud dynamics. A Subgrid Particle-Averaged Reynolds-Stress Equivalent (SPARSE) model is proposed that incorporates subgrid interphase velocity and temperature perturbations. A bivariate Gaussian source distribution, whose covariance captures the cloud's deformation to first order, accounts for the particles' momentum and energy influence on the carrier gas. SPARSE is validated by conducting tests on the interaction of a particle cloud with the accelerated flow behind a shock. The cloud's average dynamics and its deformation over time predicted with SPARSE converge to their counterparts computed with reference PSIC models as the number of Gaussians is increased from 1 to 16. This work was supported by AFOSR Grant No. FA9550-16-1-0008.
COMBAT: mobile-Cloud-based cOmpute/coMmunications infrastructure for BATtlefield applications

NASA Astrophysics Data System (ADS)

Soyata, Tolga; Muraleedharan, Rajani; Langdon, Jonathan; Funai, Colin; Ames, Scott; Kwon, Minseok; Heinzelman, Wendi

2012-05-01

The amount of data processed annually over the Internet has crossed the zetabyte boundary, yet this Big Data cannot be efficiently processed or stored using today's mobile devices. Parallel to this explosive growth in data, a substantial increase in mobile compute-capability and the advances in cloud computing have brought the state-of-the- art in mobile-cloud computing to an inflection point, where the right architecture may allow mobile devices to run applications utilizing Big Data and intensive computing. In this paper, we propose the MObile Cloud-based Hybrid Architecture (MOCHA), which formulates a solution to permit mobile-cloud computing applications such as object recognition in the battlefield by introducing a mid-stage compute- and storage-layer, called the cloudlet. MOCHA is built on the key observation that many mobile-cloud applications have the following characteristics: 1) they are compute-intensive, requiring the compute-power of a supercomputer, and 2) they use Big Data, requiring a communications link to cloud-based database sources in near-real-time. In this paper, we describe the operation of MOCHA in battlefield applications, by formulating the aforementioned mobile and cloudlet to be housed within a soldier's vest and inside a military vehicle, respectively, and enabling access to the cloud through high latency satellite links. We provide simulations using the traditional mobile-cloud approach as well as utilizing MOCHA with a mid-stage cloudlet to quantify the utility of this architecture. We show that the MOCHA platform for mobile-cloud computing promises a future for critical battlefield applications that access Big Data, which is currently not possible using existing technology.
Secure open cloud in data transmission using reference pattern and identity with enhanced remote privacy checking

NASA Astrophysics Data System (ADS)

Vijay Singh, Ran; Agilandeeswari, L.

2017-11-01

To handle the large amount of client’s data in open cloud lots of security issues need to be address. Client’s privacy should not be known to other group members without data owner’s valid permission. Sometime clients are fended to have accessing with open cloud servers due to some restrictions. To overcome the security issues and these restrictions related to storing, data sharing in an inter domain network and privacy checking, we propose a model in this paper which is based on an identity based cryptography in data transmission and intermediate entity which have client’s reference with identity that will take control handling of data transmission in an open cloud environment and an extended remote privacy checking technique which will work at admin side. On behalf of data owner’s authority this proposed model will give best options to have secure cryptography in data transmission and remote privacy checking either as private or public or instructed. The hardness of Computational Diffie-Hellman assumption algorithm for key exchange makes this proposed model more secure than existing models which are being used for public cloud environment.
Identifying Key Features, Cutting Edge Cloud Resources, and Artificial Intelligence Tools to Achieve User-Friendly Water Science in the Cloud

NASA Astrophysics Data System (ADS)

Pierce, S. A.

2017-12-01

Decision making for groundwater systems is becoming increasingly important, as shifting water demands increasingly impact aquifers. As buffer systems, aquifers provide room for resilient responses and augment the actual timeframe for hydrological response. Yet the pace impacts, climate shifts, and degradation of water resources is accelerating. To meet these new drivers, groundwater science is transitioning toward the emerging field of Integrated Water Resources Management, or IWRM. IWRM incorporates a broad array of dimensions, methods, and tools to address problems that tend to be complex. Computational tools and accessible cyberinfrastructure (CI) are needed to cross the chasm between science and society. Fortunately cloud computing environments, such as the new Jetstream system, are evolving rapidly. While still targeting scientific user groups systems such as, Jetstream, offer configurable cyberinfrastructure to enable interactive computing and data analysis resources on demand. The web-based interfaces allow researchers to rapidly customize virtual machines, modify computing architecture and increase the usability and access for broader audiences to advanced compute environments. The result enables dexterous configurations and opening up opportunities for IWRM modelers to expand the reach of analyses, number of case studies, and quality of engagement with stakeholders and decision makers. The acute need to identify improved IWRM solutions paired with advanced computational resources refocuses the attention of IWRM researchers on applications, workflows, and intelligent systems that are capable of accelerating progress. IWRM must address key drivers of community concern, implement transdisciplinary methodologies, adapt and apply decision support tools in order to effectively support decisions about groundwater resource management. This presentation will provide an overview of advanced computing services in the cloud using integrated groundwater management case studies to highlight how Cloud CI streamlines the process for setting up an interactive decision support system. Moreover, advances in artificial intelligence offer new techniques for old problems from integrating data to adaptive sensing or from interactive dashboards to optimizing multi-attribute problems. The combination of scientific expertise, flexible cloud computing solutions, and intelligent systems opens new research horizons.
Program listing for the REEDM (Rocket Exhaust Effluent Diffusion Model) computer program

NASA Technical Reports Server (NTRS)

Bjorklund, J. R.; Dumbauld, R. K.; Cheney, C. S.; Geary, H. V.

1982-01-01

The program listing for the REEDM Computer Program is provided. A mathematical description of the atmospheric dispersion models, cloud-rise models, and other formulas used in the REEDM model; vehicle and source parameters, other pertinent physical properties of the rocket exhaust cloud and meteorological layering techniques; user's instructions for the REEDM computer program; and worked example problems are contained in NASA CR-3646.
Dynamic partitioning as a way to exploit new computing paradigms: the cloud use case.

NASA Astrophysics Data System (ADS)

Ciaschini, Vincenzo; Dal Pra, Stefano; dell'Agnello, Luca

2015-12-01

The WLCG community and many groups in the HEP community have based their computing strategy on the Grid paradigm, which proved successful and still ensures its goals. However, Grid technology has not spread much over other communities; in the commercial world, the cloud paradigm is the emerging way to provide computing services. WLCG experiments aim to achieve integration of their existing current computing model with cloud deployments and take advantage of the so-called opportunistic resources (including HPC facilities) which are usually not Grid compliant. One missing feature in the most common cloud frameworks, is the concept of job scheduler, which plays a key role in a traditional computing centre, by enabling a fairshare based access at the resources to the experiments in a scenario where demand greatly outstrips availability. At CNAF we are investigating the possibility to access the Tier-1 computing resources as an OpenStack based cloud service. The system, exploiting the dynamic partitioning mechanism already being used to enable Multicore computing, allowed us to avoid a static splitting of the computing resources in the Tier-1 farm, while permitting a share friendly approach. The hosts in a dynamically partitioned farm may be moved to or from the partition, according to suitable policies for request and release of computing resources. Nodes being requested in the partition switch their role and become available to play a different one. In the cloud use case hosts may switch from acting as Worker Node in the Batch system farm to cloud compute node member, made available to tenants. In this paper we describe the dynamic partitioning concept, its implementation and integration with our current batch system, LSF.
A cloud-based system for measuring radiation treatment plan similarity

NASA Astrophysics Data System (ADS)

Andrea, Jennifer

PURPOSE: Radiation therapy is used to treat cancer using carefully designed plans that maximize the radiation dose delivered to the target and minimize damage to healthy tissue, with the dose administered over multiple occasions. Creating treatment plans is a laborious process and presents an obstacle to more frequent replanning, which remains an unsolved problem. However, in between new plans being created, the patient's anatomy can change due to multiple factors including reduction in tumor size and loss of weight, which results in poorer patient outcomes. Cloud computing is a newer technology that is slowly being used for medical applications with promising results. The objective of this work was to design and build a system that could analyze a database of previously created treatment plans, which are stored with their associated anatomical information in studies, to find the one with the most similar anatomy to a new patient. The analyses would be performed in parallel on the cloud to decrease the computation time of finding this plan. METHODS: The system used SlicerRT, a radiation therapy toolkit for the open-source platform 3D Slicer, for its tools to perform the similarity analysis algorithm. Amazon Web Services was used for the cloud instances on which the analyses were performed, as well as for storage of the radiation therapy studies and messaging between the instances and a master local computer. A module was built in SlicerRT to provide the user with an interface to direct the system on the cloud, as well as to perform other related tasks. RESULTS: The cloud-based system out-performed previous methods of conducting the similarity analyses in terms of time, as it analyzed 100 studies in approximately 13 minutes, and produced the same similarity values as those methods. It also scaled up to larger numbers of studies to analyze in the database with a small increase in computation time of just over 2 minutes. CONCLUSION: This system successfully analyzes a large database of radiation therapy studies and finds the one that is most similar to a new patient, which represents a potential step forward in achieving feasible adaptive radiation therapy replanning.
Reconstructing evolutionary trees in parallel for massive sequences.

PubMed

Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

2017-12-14

Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .
Secure and Resilient Cloud Computing for the Department of Defense

DTIC Science & Technology

2015-11-16

platform as a service (PaaS), and software as a service ( SaaS )—that target system administrators, developers, and end-users respectively (see Table 2...interfaces (API) and services Medium Amazon Elastic MapReduce, MathWorks Cloud, Red Hat OpenShift SaaS Full-fledged applications Low Google gMail
TOSCA-based orchestration of complex clusters at the IaaS level

NASA Astrophysics Data System (ADS)

Caballer, M.; Donvito, G.; Moltó, G.; Rocha, R.; Velten, M.

2017-10-01

This paper describes the adoption and extension of the TOSCA standard by the INDIGO-DataCloud project for the definition and deployment of complex computing clusters together with the required support in both OpenStack and OpenNebula, carried out in close collaboration with industry partners such as IBM. Two examples of these clusters are described in this paper, the definition of an elastic computing cluster to support the Galaxy bioinformatics application where the nodes are dynamically added and removed from the cluster to adapt to the workload, and the definition of an scalable Apache Mesos cluster for the execution of batch jobs and support for long-running services. The coupling of TOSCA with Ansible Roles to perform automated installation has resulted in the definition of high-level, deterministic templates to provision complex computing clusters across different Cloud sites.

Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis.

PubMed

Sreedharan, Vipin T; Schultheiss, Sebastian J; Jean, Géraldine; Kahles, André; Bohnert, Regina; Drewe, Philipp; Mudrakarta, Pramod; Görnitz, Nico; Zeller, Georg; Rätsch, Gunnar

2014-05-01

We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.
Enhanced K-means clustering with encryption on cloud

NASA Astrophysics Data System (ADS)

Singh, Iqjot; Dwivedi, Prerna; Gupta, Taru; Shynu, P. G.

2017-11-01

This paper tries to solve the problem of storing and managing big files over cloud by implementing hashing on Hadoop in big-data and ensure security while uploading and downloading files. Cloud computing is a term that emphasis on sharing data and facilitates to share infrastructure and resources.[10] Hadoop is an open source software that gives us access to store and manage big files according to our needs on cloud. K-means clustering algorithm is an algorithm used to calculate distance between the centroid of the cluster and the data points. Hashing is a algorithm in which we are storing and retrieving data with hash keys. The hashing algorithm is called as hash function which is used to portray the original data and later to fetch the data stored at the specific key. [17] Encryption is a process to transform electronic data into non readable form known as cipher text. Decryption is the opposite process of encryption, it transforms the cipher text into plain text that the end user can read and understand well. For encryption and decryption we are using Symmetric key cryptographic algorithm. In symmetric key cryptography are using DES algorithm for a secure storage of the files. [3
Towards a Multi-Mission, Airborne Science Data System Environment

NASA Astrophysics Data System (ADS)

Crichton, D. J.; Hardman, S.; Law, E.; Freeborn, D.; Kay-Im, E.; Lau, G.; Oswald, J.

2011-12-01

NASA earth science instruments are increasingly relying on airborne missions. However, traditionally, there has been limited common infrastructure support available to principal investigators in the area of science data systems. As a result, each investigator has been required to develop their own computing infrastructures for the science data system. Typically there is little software reuse and many projects lack sufficient resources to provide a robust infrastructure to capture, process, distribute and archive the observations acquired from airborne flights. At NASA's Jet Propulsion Laboratory (JPL), we have been developing a multi-mission data system infrastructure for airborne instruments called the Airborne Cloud Computing Environment (ACCE). ACCE encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis in order to improve the quality, cost effectiveness, and capabilities to enable new scientific discovery and research in earth observation. This includes improving data system interoperability across each instrument. A principal characteristic is being able to provide an agile infrastructure that is architected to allow for a variety of configurations of the infrastructure from locally installed compute and storage services to provisioning those services via the "cloud" from cloud computer vendors such as Amazon.com. Investigators often have different needs that require a flexible configuration. The data system infrastructure is built on the Apache's Object Oriented Data Technology (OODT) suite of components which has been used for a number of spaceborne missions and provides a rich set of open source software components and services for constructing science processing and data management systems. In 2010, a partnership was formed between the ACCE team and the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to support the data processing and data management needs. A principal goal is to provide support for the Fourier Transform Spectrometer (FTS) instrument which will produce over 700,000 soundings over the life of their three-year mission. The cost to purchase and operate a cluster-based system in order to generate Level 2 Full Physics products from this data was prohibitive. Through an evaluation of cloud computing solutions, Amazon's Elastic Compute Cloud (EC2) was selected for the CARVE deployment. As the ACCE infrastructure is developed and extended to form an infrastructure for airborne missions, the experience of working with CARVE has provided a number of lessons learned and has proven to be important in reinforcing the unique aspects of airborne missions and the importance of the ACCE infrastructure in developing a cost effective, flexible multi-mission capability that leverages emerging capabilities in cloud computing, workflow management, and distributed computing.
A price and performance comparison of three different storage architectures for data in cloud-based systems

NASA Astrophysics Data System (ADS)

Gallagher, J. H. R.; Jelenak, A.; Potter, N.; Fulker, D. W.; Habermann, T.

2017-12-01

Providing data services based on cloud computing technology that is equivalent to those developed for traditional computing and storage systems is critical for successful migration to cloud-based architectures for data production, scientific analysis and storage. OPeNDAP Web-service capabilities (comprising the Data Access Protocol (DAP) specification plus open-source software for realizing DAP in servers and clients) are among the most widely deployed means for achieving data-as-service functionality in the Earth sciences. OPeNDAP services are especially common in traditional data center environments where servers offer access to datasets stored in (very large) file systems, and a preponderance of the source data for these services is being stored in the Hierarchical Data Format Version 5 (HDF5). Three candidate architectures for serving NASA satellite Earth Science HDF5 data via Hyrax running on Amazon Web Services (AWS) were developed and their performance examined for a set of representative use cases. The performance was based both on runtime and incurred cost. The three architectures differ in how HDF5 files are stored in the Amazon Simple Storage Service (S3) and how the Hyrax server (as an EC2 instance) retrieves their data. The results for both the serial and parallel access to HDF5 data in the S3 will be presented. While the study focused on HDF5 data, OPeNDAP and the Hyrax data server, the architectures are generic and the analysis can be extrapolated to many different data formats, web APIs, and data servers.
Astronomy In The Cloud: Using Mapreduce For Image Coaddition

NASA Astrophysics Data System (ADS)

Wiley, Keith; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.

2011-01-01

In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computational challenges such as anomaly detection, classification, and moving object tracking. Since such studies require the highest quality data, methods such as image coaddition, i.e., registration, stacking, and mosaicing, will be critical to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources, e.g., asteroids, or transient objects, e.g., supernovae, these datastreams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this paper we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data is partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources, i.e., platforms where Hadoop is offered as a service. We report on our experience implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multi-terabyte imaging dataset provides a good testbed for algorithm development since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image coaddition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results compring their performance. This work is funded by the NSF and by NASA.
Astronomy in the Cloud: Using MapReduce for Image Co-Addition

NASA Astrophysics Data System (ADS)

Wiley, K.; Connolly, A.; Gardner, J.; Krughoff, S.; Balazinska, M.; Howe, B.; Kwon, Y.; Bu, Y.

2011-03-01

In the coming decade, astronomical surveys of the sky will generate tens of terabytes of images and detect hundreds of millions of sources every night. The study of these sources will involve computation challenges such as anomaly detection and classification and moving-object tracking. Since such studies benefit from the highest-quality data, methods such as image co-addition, i.e., astrometric registration followed by per-pixel summation, will be a critical preprocessing step prior to scientific investigation. With a requirement that these images be analyzed on a nightly basis to identify moving sources such as potentially hazardous asteroids or transient objects such as supernovae, these data streams present many computational challenges. Given the quantity of data involved, the computational load of these problems can only be addressed by distributing the workload over a large number of nodes. However, the high data throughput demanded by these applications may present scalability challenges for certain storage architectures. One scalable data-processing method that has emerged in recent years is MapReduce, and in this article we focus on its popular open-source implementation called Hadoop. In the Hadoop framework, the data are partitioned among storage attached directly to worker nodes, and the processing workload is scheduled in parallel on the nodes that contain the required input data. A further motivation for using Hadoop is that it allows us to exploit cloud computing resources: i.e., platforms where Hadoop is offered as a service. We report on our experience of implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop. This multiterabyte imaging data set provides a good testbed for algorithm development, since its scope and structure approximate future surveys. First, we describe MapReduce and how we adapted image co-addition to the MapReduce framework. Then we describe a number of optimizations to our basic approach and report experimental results comparing their performance.
Space Subdivision in Indoor Mobile Laser Scanning Point Clouds Based on Scanline Analysis.

PubMed

Zheng, Yi; Peter, Michael; Zhong, Ruofei; Oude Elberink, Sander; Zhou, Quan

2018-06-05

Indoor space subdivision is an important aspect of scene analysis that provides essential information for many applications, such as indoor navigation and evacuation route planning. Until now, most proposed scene understanding algorithms have been based on whole point clouds, which has led to complicated operations, high computational loads and low processing speed. This paper presents novel methods to efficiently extract the location of openings (e.g., doors and windows) and to subdivide space by analyzing scanlines. An opening detection method is demonstrated that analyses the local geometric regularity in scanlines to refine the extracted opening. Moreover, a space subdivision method based on the extracted openings and the scanning system trajectory is described. Finally, the opening detection and space subdivision results are saved as point cloud labels which will be used for further investigations. The method has been tested on a real dataset collected by ZEB-REVO. The experimental results validate the completeness and correctness of the proposed method for different indoor environment and scanning paths.
SeqWare Query Engine: storing and searching sequence data in the cloud.

PubMed

O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

2010-12-21

Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.
SeqWare Query Engine: storing and searching sequence data in the cloud

PubMed Central

2010-01-01

Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981
SaaS enabled admission control for MCMC simulation in cloud computing infrastructures

NASA Astrophysics Data System (ADS)

Vázquez-Poletti, J. L.; Moreno-Vozmediano, R.; Han, R.; Wang, W.; Llorente, I. M.

2017-02-01

Markov Chain Monte Carlo (MCMC) methods are widely used in the field of simulation and modelling of materials, producing applications that require a great amount of computational resources. Cloud computing represents a seamless source for these resources in the form of HPC. However, resource over-consumption can be an important drawback, specially if the cloud provision process is not appropriately optimized. In the present contribution we propose a two-level solution that, on one hand, takes advantage of approximate computing for reducing the resource demand and on the other, uses admission control policies for guaranteeing an optimal provision to running applications.
Proposed Use of the NASA Ames Nebula Cloud Computing Platform for Numerical Weather Prediction and the Distribution of High Resolution Satellite Imagery

NASA Technical Reports Server (NTRS)

Limaye, Ashutosh S.; Molthan, Andrew L.; Srikishen, Jayanthi

2010-01-01

The development of the Nebula Cloud Computing Platform at NASA Ames Research Center provides an open-source solution for the deployment of scalable computing and storage capabilities relevant to the execution of real-time weather forecasts and the distribution of high resolution satellite data to the operational weather community. Two projects at Marshall Space Flight Center may benefit from use of the Nebula system. The NASA Short-term Prediction Research and Transition (SPoRT) Center facilitates the use of unique NASA satellite data and research capabilities in the operational weather community by providing datasets relevant to numerical weather prediction, and satellite data sets useful in weather analysis. SERVIR provides satellite data products for decision support, emphasizing environmental threats such as wildfires, floods, landslides, and other hazards, with interests in numerical weather prediction in support of disaster response. The Weather Research and Forecast (WRF) model Environmental Modeling System (WRF-EMS) has been configured for Nebula cloud computing use via the creation of a disk image and deployment of repeated instances. Given the available infrastructure within Nebula and the "infrastructure as a service" concept, the system appears well-suited for the rapid deployment of additional forecast models over different domains, in response to real-time research applications or disaster response. Future investigations into Nebula capabilities will focus on the development of a web mapping server and load balancing configuration to support the distribution of high resolution satellite data sets to users within the National Weather Service and international partners of SERVIR.
Stewardship and management challenges within a cloud-based open data ecosystem (Invited Paper 211863)

NASA Astrophysics Data System (ADS)

Kearns, E. J.

2017-12-01

NOAA's Big Data Project is conducting an experiment in the collaborative distribution of open government data to non-governmental cloud-based systems. Through Cooperative Research and Development Agreements signed in 2015 between NOAA and Amazon Web Services, Google Cloud Platform, IBM, Microsoft Azure, and the Open Commons Consortium, NOAA is distributing open government data to a wide community of potential users. There are a number of significant advantages related to the use of open data on commercial cloud platforms, but through this experiment NOAA is also discovering significant challenges for those stewarding and maintaining NOAA's data resources in support of users in the wider open data ecosystem. Among the challenges that will be discussed are: the need to provide effective interpretation of the data content to enable their use by data scientists from other expert communities; effective maintenance of Collaborators' open data stores through coordinated publication of new data and new versions of older data; the provenance and verification of open data as authentic NOAA-sourced data across multiple management boundaries and analytical tools; and keeping pace with the accelerating expectations of users with regard to improved quality control, data latency, availability, and discoverability. Suggested strategies to address these challenges will also be described.
Cloud and surface textural features in polar regions

NASA Technical Reports Server (NTRS)

Welch, Ronald M.; Kuo, Kwo-Sen; Sengupta, Sailes K.

1990-01-01

The study examines the textural signatures of clouds, ice-covered mountains, solid and broken sea ice and floes, and open water. The textural features are computed from sum and difference histogram and gray-level difference vector statistics defined at various pixel displacement distances derived from Landsat multispectral scanner data. Polar cloudiness, snow-covered mountainous regions, solid sea ice, glaciers, and open water have distinguishable texture features. This suggests that textural measures can be successfully applied to the detection of clouds over snow-covered mountains, an ability of considerable importance for the modeling of snow-melt runoff. However, broken stratocumulus cloud decks and thin cirrus over broken sea ice remain difficult to distinguish texturally. It is concluded that even with high spatial resolution imagery, it may not be possible to distinguish broken stratocumulus and thin clouds from sea ice in the marginal ice zone using the visible channel textural features alone.
Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

NASA Astrophysics Data System (ADS)

Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

2014-06-01

Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.
Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt

Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared overmore » the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.« less
McIDAS-V: Advanced Visualization for 3D Remote Sensing Data

NASA Astrophysics Data System (ADS)

Rink, T.; Achtor, T. H.

2010-12-01

McIDAS-V is a Java-based, open-source, freely available software package for analysis and visualization of geophysical data. Its advanced capabilities provide very interactive 4-D displays, including 3D volumetric rendering and fast sub-manifold slicing, linked to an abstract mathematical data model with built-in metadata for units, coordinate system transforms and sampling topology. A Jython interface provides user defined analysis and computation in terms of the internal data model. These powerful capabilities to integrate data, analysis and visualization are being applied to hyper-spectral sounding retrievals, eg. AIRS and IASI, of moisture and cloud density to interrogate and analyze their 3D structure, as well as, validate with instruments such as CALIPSO, CloudSat and MODIS. The object oriented framework design allows for specialized extensions for novel displays and new sources of data. Community defined CF-conventions for gridded data are understood by the software, and can be immediately imported into the application. This presentation will show examples how McIDAS-V is used in 3-dimensional data analysis, display and evaluation.
Integration of XRootD into the cloud infrastructure for ALICE data analysis

NASA Astrophysics Data System (ADS)

Kompaniets, Mikhail; Shadura, Oksana; Svirin, Pavlo; Yurchenko, Volodymyr; Zarochentsev, Andrey

2015-12-01

Cloud technologies allow easy load balancing between different tasks and projects. From the viewpoint of the data analysis in the ALICE experiment, cloud allows to deploy software using Cern Virtual Machine (CernVM) and CernVM File System (CVMFS), to run different (including outdated) versions of software for long term data preservation and to dynamically allocate resources for different computing activities, e.g. grid site, ALICE Analysis Facility (AAF) and possible usage for local projects or other LHC experiments. We present a cloud solution for Tier-3 sites based on OpenStack and Ceph distributed storage with an integrated XRootD based storage element (SE). One of the key features of the solution is based on idea that Ceph has been used as a backend for Cinder Block Storage service for OpenStack, and in the same time as a storage backend for XRootD, with redundancy and availability of data preserved by Ceph settings. For faster and easier OpenStack deployment was applied the Packstack solution, which is based on the Puppet configuration management system. Ceph installation and configuration operations are structured and converted to Puppet manifests describing node configurations and integrated into Packstack. This solution can be easily deployed, maintained and used even in small groups with limited computing resources and small organizations, which usually have lack of IT support. The proposed infrastructure has been tested on two different clouds (SPbSU & BITP) and integrates successfully with the ALICE data analysis model.
ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification

PubMed Central

Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

2015-01-01

Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud. PMID:26271043
Phenomenology tools on cloud infrastructures using OpenStack

NASA Astrophysics Data System (ADS)

Campos, I.; Fernández-del-Castillo, E.; Heinemeyer, S.; Lopez-Garcia, A.; Pahlen, F.; Borges, G.

2013-04-01

We present a new environment for computations in particle physics phenomenology employing recent developments in cloud computing. On this environment users can create and manage "virtual" machines on which the phenomenology codes/tools can be deployed easily in an automated way. We analyze the performance of this environment based on "virtual" machines versus the utilization of physical hardware. In this way we provide a qualitative result for the influence of the host operating system on the performance of a representative set of applications for phenomenology calculations.
BluePyOpt: Leveraging Open Source Software and Cloud Infrastructure to Optimise Model Parameters in Neuroscience

PubMed Central

Van Geit, Werner; Gevaert, Michael; Chindemi, Giuseppe; Rössert, Christian; Courcol, Jean-Denis; Muller, Eilif B.; Schürmann, Felix; Segev, Idan; Markram, Henry

2016-01-01

At many scales in neuroscience, appropriate mathematical models take the form of complex dynamical systems. Parameterizing such models to conform to the multitude of available experimental constraints is a global non-linear optimisation problem with a complex fitness landscape, requiring numerical techniques to find suitable approximate solutions. Stochastic optimisation approaches, such as evolutionary algorithms, have been shown to be effective, but often the setting up of such optimisations and the choice of a specific search algorithm and its parameters is non-trivial, requiring domain-specific expertise. Here we describe BluePyOpt, a Python package targeted at the broad neuroscience community to simplify this task. BluePyOpt is an extensible framework for data-driven model parameter optimisation that wraps and standardizes several existing open-source tools. It simplifies the task of creating and sharing these optimisations, and the associated techniques and knowledge. This is achieved by abstracting the optimisation and evaluation tasks into various reusable and flexible discrete elements according to established best-practices. Further, BluePyOpt provides methods for setting up both small- and large-scale optimisations on a variety of platforms, ranging from laptops to Linux clusters and cloud-based compute infrastructures. The versatility of the BluePyOpt framework is demonstrated by working through three representative neuroscience specific use cases. PMID:27375471

A modular (almost) automatic set-up for elastic multi-tenants cloud (micro)infrastructures

NASA Astrophysics Data System (ADS)

Amoroso, A.; Astorino, F.; Bagnasco, S.; Balashov, N. A.; Bianchi, F.; Destefanis, M.; Lusso, S.; Maggiora, M.; Pellegrino, J.; Yan, L.; Yan, T.; Zhang, X.; Zhao, X.

2017-10-01

An auto-installing tool on an usb drive can allow for a quick and easy automatic deployment of OpenNebula-based cloud infrastructures remotely managed by a central VMDIRAC instance. A single team, in the main site of an HEP Collaboration or elsewhere, can manage and run a relatively large network of federated (micro-)cloud infrastructures, making an highly dynamic and elastic use of computing resources. Exploiting such an approach can lead to modular systems of cloud-bursting infrastructures addressing complex real-life scenarios.
Cloud-based large-scale air traffic flow optimization

NASA Astrophysics Data System (ADS)

Cao, Yi

The ever-increasing traffic demand makes the efficient use of airspace an imperative mission, and this paper presents an effort in response to this call. Firstly, a new aggregate model, called Link Transmission Model (LTM), is proposed, which models the nationwide traffic as a network of flight routes identified by origin-destination pairs. The traversal time of a flight route is assumed to be the mode of distribution of historical flight records, and the mode is estimated by using Kernel Density Estimation. As this simplification abstracts away physical trajectory details, the complexity of modeling is drastically decreased, resulting in efficient traffic forecasting. The predicative capability of LTM is validated against recorded traffic data. Secondly, a nationwide traffic flow optimization problem with airport and en route capacity constraints is formulated based on LTM. The optimization problem aims at alleviating traffic congestions with minimal global delays. This problem is intractable due to millions of variables. A dual decomposition method is applied to decompose the large-scale problem such that the subproblems are solvable. However, the whole problem is still computational expensive to solve since each subproblem is an smaller integer programming problem that pursues integer solutions. Solving an integer programing problem is known to be far more time-consuming than solving its linear relaxation. In addition, sequential execution on a standalone computer leads to linear runtime increase when the problem size increases. To address the computational efficiency problem, a parallel computing framework is designed which accommodates concurrent executions via multithreading programming. The multithreaded version is compared with its monolithic version to show decreased runtime. Finally, an open-source cloud computing framework, Hadoop MapReduce, is employed for better scalability and reliability. This framework is an "off-the-shelf" parallel computing model that can be used for both offline historical traffic data analysis and online traffic flow optimization. It provides an efficient and robust platform for easy deployment and implementation. A small cloud consisting of five workstations was configured and used to demonstrate the advantages of cloud computing in dealing with large-scale parallelizable traffic problems.
TRIDEC Cloud - a Web-based Platform for Tsunami Early Warning tested with NEAMWave14 Scenarios

NASA Astrophysics Data System (ADS)

Hammitzsch, Martin; Spazier, Johannes; Reißland, Sven; Necmioglu, Ocal; Comoglu, Mustafa; Ozer Sozdinler, Ceren; Carrilho, Fernando; Wächter, Joachim

2015-04-01

In times of cloud computing and ubiquitous computing the use of concepts and paradigms introduced by information and communications technology (ICT) have to be considered even for early warning systems (EWS). Based on the experiences and the knowledge gained in research projects new technologies are exploited to implement a cloud-based and web-based platform - the TRIDEC Cloud - to open up new prospects for EWS. The platform in its current version addresses tsunami early warning and mitigation. It merges several complementary external and in-house cloud-based services for instant tsunami propagation calculations and automated background computation with graphics processing units (GPU), for web-mapping of hazard specific geospatial data, and for serving relevant functionality to handle, share, and communicate threat specific information in a collaborative and distributed environment. The TRIDEC Cloud can be accessed in two different modes, the monitoring mode and the exercise-and-training mode. The monitoring mode provides important functionality required to act in a real event. So far, the monitoring mode integrates historic and real-time sea level data and latest earthquake information. The integration of sources is supported by a simple and secure interface. The exercise and training mode enables training and exercises with virtual scenarios. This mode disconnects real world systems and connects with a virtual environment that receives virtual earthquake information and virtual sea level data re-played by a scenario player. Thus operators and other stakeholders are able to train skills and prepare for real events and large exercises. The GFZ German Research Centre for Geosciences (GFZ), the Kandilli Observatory and Earthquake Research Institute (KOERI), and the Portuguese Institute for the Sea and Atmosphere (IPMA) have used the opportunity provided by NEAMWave14 to test the TRIDEC Cloud as a collaborative activity based on previous partnership and commitments at the European scale. The TRIDEC Cloud has not been involved officially in Part B of the NEAMWave14 scenarios. However, the scenarios have been used by GFZ, KOERI, and IPMA for testing in exercise runs on October 27-28, 2014. Additionally, the Greek NEAMWave14 scenario has been tested in an exercise run by GFZ only on October 29, 2014 (see ICG/NEAMTWS-XI/13). The exercise runs demonstrated that operators in warning centres and stakeholders of other involved parties just need a standard web browser to access a full-fledged TEWS. The integration of GPU accelerated tsunami simulation computations have been an integral part to foster early warning with on-demand tsunami predictions based on actual source parameters. Thus tsunami travel times, estimated times of arrival and estimated wave heights are available immediately for visualization and for further analysis and processing. The generation of warning messages is based on internationally agreed message structures and includes static and dynamic information based on earthquake information, instant computations of tsunami simulations, and actual measurements. Generated messages are served for review, modification, and addressing in one simple form for dissemination via Cloud Messages, Shared Maps, e-mail, FTP/GTS, SMS, and FAX. Cloud Messages and Shared Maps are complementary channels and integrate interactive event and simulation data. Thus recipients are enabled to interact dynamically with a map and diagrams beyond traditional text information.
Research on Visualization of Ground Laser Radar Data Based on Osg

NASA Astrophysics Data System (ADS)

Huang, H.; Hu, C.; Zhang, F.; Xue, H.

2018-04-01

Three-dimensional (3D) laser scanning is a new advanced technology integrating light, machine, electricity, and computer technologies. It can conduct 3D scanning to the whole shape and form of space objects with high precision. With this technology, you can directly collect the point cloud data of a ground object and create the structure of it for rendering. People use excellent 3D rendering engine to optimize and display the 3D model in order to meet the higher requirements of real time realism rendering and the complexity of the scene. OpenSceneGraph (OSG) is an open source 3D graphics engine. Compared with the current mainstream 3D rendering engine, OSG is practical, economical, and easy to expand. Therefore, OSG is widely used in the fields of virtual simulation, virtual reality, science and engineering visualization. In this paper, a dynamic and interactive ground LiDAR data visualization platform is constructed based on the OSG and the cross-platform C++ application development framework Qt. In view of the point cloud data of .txt format and the triangulation network data file of .obj format, the functions of 3D laser point cloud and triangulation network data display are realized. It is proved by experiments that the platform is of strong practical value as it is easy to operate and provides good interaction.
SenseMyHeart: A cloud service and API for wearable heart monitors.

PubMed

Pinto Silva, P M; Silva Cunha, J P

2015-01-01

In the era of ubiquitous computing, the growing adoption of wearable systems and body sensor networks is trailing the path for new research and software for cardiovascular intensity, energy expenditure and stress and fatigue detection through cardiovascular monitoring. Several systems have received clinical-certification and provide huge amounts of reliable heart-related data in a continuous basis. PhysioNet provides equally reliable open-source software tools for ECG processing and analysis that can be combined with these devices. However, this software remains difficult to use in a mobile environment and for researchers unfamiliar with Linux-based systems. In the present paper we present an approach that aims at tackling these limitations by developing a cloud service that provides an API for a PhysioNet-based pipeline for ECG processing and Heart Rate Variability measurement. We describe the proposed solution, along with its advantages and tradeoffs. We also present some client tools (windows and Android) and several projects where the developed cloud service has been used successfully as a standard for Heart Rate and Heart Rate Variability studies in different scenarios.
Scaling Critical Zone analysis tasks from desktop to the cloud utilizing contemporary distributed computing and data management approaches: A case study for project based learning of Cyberinfrastructure concepts

NASA Astrophysics Data System (ADS)

Swetnam, T. L.; Pelletier, J. D.; Merchant, N.; Callahan, N.; Lyons, E.

2015-12-01

Earth science is making rapid advances through effective utilization of large-scale data repositories such as aerial LiDAR and access to NSF-funded cyberinfrastructures (e.g. the OpenTopography.org data portal, iPlant Collaborative, and XSEDE). Scaling analysis tasks that are traditionally developed using desktops, laptops or computing clusters to effectively leverage national and regional scale cyberinfrastructure pose unique challenges and barriers to adoption. To address some of these challenges in Fall 2014 an 'Applied Cyberinfrastructure Concepts' a project-based learning course (ISTA 420/520) at the University of Arizona focused on developing scalable models of 'Effective Energy and Mass Transfer' (EEMT, MJ m-2 yr-1) for use by the NSF Critical Zone Observatories (CZO) project. EEMT is a quantitative measure of the flux of available energy to the critical zone, and its computation involves inputs that have broad applicability (e.g. solar insolation). The course comprised of 25 students with varying level of computational skills and with no prior domain background in the geosciences, collaborated with domain experts to develop the scalable workflow. The original workflow relying on open-source QGIS platform on a laptop was scaled to effectively utilize cloud environments (Openstack), UA Campus HPC systems, iRODS, and other XSEDE and OSG resources. The project utilizes public data, e.g. DEMs produced by OpenTopography.org and climate data from Daymet, which are processed using GDAL, GRASS and SAGA and the Makeflow and Work-queue task management software packages. Students were placed into collaborative groups to develop the separate aspects of the project. They were allowed to change teams, alter workflows, and design and develop novel code. The students were able to identify all necessary dependencies, recompile source onto the target execution platforms, and demonstrate a functional workflow, which was further improved upon by one of the group leaders over Spring 2015. All of the code, documentation and workflow description are currently available on GitHub and a public data portal is in development. We present a case study of how students reacted to the challenge of a real science problem, their interactions with end-users, what went right, and what could be done better in the future.
Scalable and Resilient Middleware to Handle Information Exchange during Environment Crisis

NASA Astrophysics Data System (ADS)

Tao, R.; Poslad, S.; Moßgraber, J.; Middleton, S.; Hammitzsch, M.

2012-04-01

The EU FP7 TRIDEC project focuses on enabling real-time, intelligent, information management of collaborative, complex, critical decision processes for earth management. A key challenge is to promote a communication infrastructure to facilitate interoperable environment information services during environment events and crises such as tsunamis and drilling, during which increasing volumes and dimensionality of disparate information sources, including sensor-based and human-based ones, can result, and need to be managed. Such a system needs to support: scalable, distributed messaging; asynchronous messaging; open messaging to handling changing clients such as new and retired automated system and human information sources becoming online or offline; flexible data filtering, and heterogeneous access networks (e.g., GSM, WLAN and LAN). In addition, the system needs to be resilient to handle the ICT system failures, e.g. failure, degradation and overloads, during environment events. There are several system middleware choices for TRIDEC based upon a Service-oriented-architecture (SOA), Event-driven-Architecture (EDA), Cloud Computing, and Enterprise Service Bus (ESB). In an SOA, everything is a service (e.g. data access, processing and exchange); clients can request on demand or subscribe to services registered by providers; more often interaction is synchronous. In an EDA system, events that represent significant changes in state can be processed simply, or as streams or more complexly. Cloud computing is a virtualization, interoperable and elastic resource allocation model. An ESB, a fundamental component for enterprise messaging, supports synchronous and asynchronous message exchange models and has inbuilt resilience against ICT failure. Our middleware proposal is an ESB based hybrid architecture model: an SOA extension supports more synchronous workflows; EDA assists the ESB to handle more complex event processing; Cloud computing can be used to increase and decrease the ESB resources on demand. To reify this hybrid ESB centric architecture, we will adopt two complementary approaches: an open source one for scalability and resilience improvement while a commercial one can be used for ultra-speed messaging, whilst we can bridge between these two to support interoperability. In TRIDEC, to manage such a hybrid messaging system, overlay and underlay management techniques will be adopted. The managers (both global and local) will collect, store and update status information (e.g. CPU utilization, free space, number of clients) and balance the usage, throughput, and delays to improve resilience and scalability. The expected resilience improvement includes dynamic failover, self-healing, pre-emptive load balancing, and bottleneck prediction while the expected improvement for scalability includes capacity estimation, Http Bridge, and automatic configuration and reconfiguration (e.g. add or delete clients and servers).
Software Defined Networking challenges and future direction: A case study of implementing SDN features on OpenStack private cloud

NASA Astrophysics Data System (ADS)

Shamugam, Veeramani; Murray, I.; Leong, J. A.; Sidhu, Amandeep S.

2016-03-01

Cloud computing provides services on demand instantly, such as access to network infrastructure consisting of computing hardware, operating systems, network storage, database and applications. Network usage and demands are growing at a very fast rate and to meet the current requirements, there is a need for automatic infrastructure scaling. Traditional networks are difficult to automate because of the distributed nature of their decision making process for switching or routing which are collocated on the same device. Managing complex environments using traditional networks is time-consuming and expensive, especially in the case of generating virtual machines, migration and network configuration. To mitigate the challenges, network operations require efficient, flexible, agile and scalable software defined networks (SDN). This paper discuss various issues in SDN and suggests how to mitigate the network management related issues. A private cloud prototype test bed was setup to implement the SDN on the OpenStack platform to test and evaluate the various network performances provided by the various configurations.
Implementation and performance test of cloud platform based on Hadoop

NASA Astrophysics Data System (ADS)

Xu, Jingxian; Guo, Jianhong; Ren, Chunlan

2018-01-01

Hadoop, as an open source project for the Apache foundation, is a distributed computing framework that deals with large amounts of data and has been widely used in the Internet industry. Therefore, it is meaningful to study the implementation of Hadoop platform and the performance of test platform. The purpose of this subject is to study the method of building Hadoop platform and to study the performance of test platform. This paper presents a method to implement Hadoop platform and a test platform performance method. Experimental results show that the proposed test performance method is effective and it can detect the performance of Hadoop platform.
Adventures in Private Cloud: Balancing Cost and Capability at the CloudSat Data Processing Center

NASA Astrophysics Data System (ADS)

Partain, P.; Finley, S.; Fluke, J.; Haynes, J. M.; Cronk, H. Q.; Miller, S. D.

2016-12-01

Since the beginning of the CloudSat Mission in 2006, The CloudSat Data Processing Center (DPC) at the Cooperative Institute for Research in the Atmosphere (CIRA) has been ingesting data from the satellite and other A-Train sensors, producing data products, and distributing them to researchers around the world. The computing infrastructure was specifically designed to fulfill the requirements as specified at the beginning of what nominally was a two-year mission. The environment consisted of servers dedicated to specific processing tasks in a rigid workflow to generate the required products. To the benefit of science and with credit to the mission engineers, CloudSat has lasted well beyond its planned lifetime and is still collecting data ten years later. Over that period requirements of the data processing system have greatly expanded and opportunities for providing value-added services have presented themselves. But while demands on the system have increased, the initial design allowed for very little expansion in terms of scalability and flexibility. The design did change to include virtual machine processing nodes and distributed workflows but infrastructure management was still a time consuming task when system modification was required to run new tests or implement new processes. To address the scalability, flexibility, and manageability of the system Cloud computing methods and technologies are now being employed. The use of a public cloud like Amazon Elastic Compute Cloud or Google Compute Engine was considered but, among other issues, data transfer and storage cost becomes a problem especially when demand fluctuates as a result of reprocessing and the introduction of new products and services. Instead, the existing system was converted to an on premises private Cloud using the OpenStack computing platform and Ceph software defined storage to reap the benefits of the Cloud computing paradigm. This work details the decisions that were made, the benefits that have been realized, the difficulties that were encountered and issues that still exist.
[Porting Radiotherapy Software of Varian to Cloud Platform].

PubMed

Zou, Lian; Zhang, Weisha; Liu, Xiangxiang; Xie, Zhao; Xie, Yaoqin

2017-09-30

To develop a low-cost private cloud platform of radiotherapy software. First, a private cloud platform which was based on OpenStack and the virtual GPU hardware was builded. Then on the private cloud platform, all the Varian radiotherapy software modules were installed to the virtual machine, and the corresponding function configuration was completed. Finally the software on the cloud was able to be accessed by virtual desktop client. The function test results of the cloud workstation show that a cloud workstation is equivalent to an isolated physical workstation, and any clients on the LAN can use the cloud workstation smoothly. The cloud platform transplantation in this study is economical and practical. The project not only improves the utilization rates of radiotherapy software, but also makes it possible that the cloud computing technology can expand its applications to the field of radiation oncology.
Cloud Computing for Mission Design and Operations

NASA Technical Reports Server (NTRS)

Arrieta, Juan; Attiyah, Amy; Beswick, Robert; Gerasimantos, Dimitrios

2012-01-01

The space mission design and operations community already recognizes the value of cloud computing and virtualization. However, natural and valid concerns, like security, privacy, up-time, and vendor lock-in, have prevented a more widespread and expedited adoption into official workflows. In the interest of alleviating these concerns, we propose a series of guidelines for internally deploying a resource-oriented hub of data and algorithms. These guidelines provide a roadmap for implementing an architecture inspired in the cloud computing model: associative, elastic, semantical, interconnected, and adaptive. The architecture can be summarized as exposing data and algorithms as resource-oriented Web services, coordinated via messaging, and running on virtual machines; it is simple, and based on widely adopted standards, protocols, and tools. The architecture may help reduce common sources of complexity intrinsic to data-driven, collaborative interactions and, most importantly, it may provide the means for teams and agencies to evaluate the cloud computing model in their specific context, with minimal infrastructure changes, and before committing to a specific cloud services provider.
Unified Geophysical Cloud Platform (UGCP) for Seismic Monitoring and other Geophysical Applications.

NASA Astrophysics Data System (ADS)

Synytsky, R.; Starovoit, Y. O.; Henadiy, S.; Lobzakov, V.; Kolesnikov, L.

2016-12-01

We present Unified Geophysical Cloud Platform (UGCP) or UniGeoCloud as an innovative approach for geophysical data processing in the Cloud environment with the ability to run any type of data processing software in isolated environment within the single Cloud platform. We've developed a simple and quick method of several open-source widely known software seismic packages (SeisComp3, Earthworm, Geotool, MSNoise) installation which does not require knowledge of system administration, configuration, OS compatibility issues etc. and other often annoying details preventing time wasting for system configuration work. Installation process is simplified as "mouse click" on selected software package from the Cloud market place. The main objective of the developed capability was the software tools conception with which users are able to design and install quickly their own highly reliable and highly available virtual IT-infrastructure for the organization of seismic (and in future other geophysical) data processing for either research or monitoring purposes. These tools provide access to any seismic station data available in open IP configuration from the different networks affiliated with different Institutions and Organizations. It allows also setting up your own network as you desire by selecting either regionally deployed stations or the worldwide global network based on stations selection form the global map. The processing software and products and research results could be easily monitored from everywhere using variety of user's devices form desk top computers to IT gadgets. Currents efforts of the development team are directed to achieve Scalability, Reliability and Sustainability (SRS) of proposed solutions allowing any user to run their applications with the confidence of no data loss and no failure of the monitoring or research software components. The system is suitable for quick rollout of NDC-in-Box software package developed for State Signatories and aimed for promotion of data processing collected by the IMS Network.
Fast two-stream method for computing diurnal-mean actinic flux in vertically inhomogeneous atmospheres

NASA Technical Reports Server (NTRS)

Filyushkin, V. V.; Madronich, S.; Brasseur, G. P.; Petropavlovskikh, I. V.

1994-01-01

Based on a derivation of the two-stream daytime-mean equations of radiative flux transfer, a method for computing the daytime-mean actinic fluxes in the absorbing and scattering vertically inhomogeneous atmosphere is suggested. The method applies direct daytime integration of the particular solutions of the two-stream approximations or the source functions. It is valid for any duration of period of averaging. The merit of the method is that the multiple scattering computation is carried out only once for the whole averaging period. It can be implemented with a number of widely used two-stream approximations. The method agrees with the results obtained with 200-point multiple scattering calculations. The method was also tested in runs with a 1-km cloud layer with optical depth of 10, as well as with aerosol background. Comparison of the results obtained for a cloud subdivided into 20 layers with those obtained for a one-layer cloud with the same optical parameters showed that direct integration of particular solutions possesses an 'analytical' accuracy. In the case of the source function interpolation, the actinic fluxes calculated above the one-layer and 20-layer clouds agreed within 1%-1.5%, while below the cloud they may differ up to 5% (in the worst case). The ways of enhancing the accuracy (in a 'two-stream sense') and computational efficiency of the method are discussed.
An Expressive, Lightweight and Secure Construction of Key Policy Attribute-Based Cloud Data Sharing Access Control

NASA Astrophysics Data System (ADS)

Lin, Guofen; Hong, Hanshu; Xia, Yunhao; Sun, Zhixin

2017-10-01

Attribute-based encryption (ABE) is an interesting cryptographic technique for flexible cloud data sharing access control. However, some open challenges hinder its practical application. In previous schemes, all attributes are considered as in the same status while they are not in most of practical scenarios. Meanwhile, the size of access policy increases dramatically with the raise of its expressiveness complexity. In addition, current research hardly notices that mobile front-end devices, such as smartphones, are poor in computational performance while too much bilinear pairing computation is needed for ABE. In this paper, we propose a key-policy weighted attribute-based encryption without bilinear pairing computation (KP-WABE-WB) for secure cloud data sharing access control. A simple weighted mechanism is presented to describe different importance of each attribute. We introduce a novel construction of ABE without executing any bilinear pairing computation. Compared to previous schemes, our scheme has a better performance in expressiveness of access policy and computational efficiency.
The Metadata Cloud: The Last Piece of a Distributed Data System Model

NASA Astrophysics Data System (ADS)

King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.

2012-12-01

Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems have evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud based systems. Initially metadata was tightly coupled to the data either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data multiplied, data volumes have increased and services have specialized to improve efficiency; a cloud system model has emerged. In a cloud system computing and storage are provided as services with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data; a metadata cloud is formed. With a metadata cloud information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data" where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, how standards play a role and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.
Deploying Crowd-Sourced Formal Verification Systems in a DoD Network

DTIC Science & Technology

2013-09-01

INTENTIONALLY LEFT BLANK 1 I. INTRODUCTION A. INTRODUCTION In 2014 cyber attacks on critical infrastructure are expected to increase...CSFV systems on the Internet‒‒possibly using cloud infrastructure (Dean, 2013). By using Amazon Compute Cloud (EC2) systems, DARPA will use ordinary...through standard access methods. Those clients could be mobile phones, laptops, netbooks, tablet computers or personal digital assistants (PDAs) (Smoot
Cloud-Coffee: implementation of a parallel consistency-based multiple alignment algorithm in the T-Coffee package and its benchmarking on the Amazon Elastic-Cloud.

PubMed

Di Tommaso, Paolo; Orobitg, Miquel; Guirado, Fernando; Cores, Fernado; Espinosa, Toni; Notredame, Cedric

2010-08-01

We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment. T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html
Design and verification of a cloud field optical simulator

NASA Technical Reports Server (NTRS)

Davis, J. M.; Cox, S. K.; Mckee, T. B.

1982-01-01

A concept and an apparatus designed to investigate the reflected and transmitted distributions of light from optically thick clouds is presented. The Cloud Field Optical Simulator (CFOS) is a laboratory device which utilizes an array of incandescent lamps as a source, simulated clouds made from cotton or styrofoam as targets, and an array of silicon photodiodes as detectors. The device allows virtually any source-target-detector geometry to be examined. Similitude between real clouds and their CFOS cotton or styrofoam counterparts is established by relying on a linear relationship between optical depth and the ratio of reflected to transmitted light for a semi-infinite layer. Comparisons of principal plane radiances observed by the CFOS with Monte Carlo computations for a water cloud at 0.7 microns show excellent agreement.
A cloud platform for remote diagnosis of breast cancer in mammography by fusion of machine and human intelligence

NASA Astrophysics Data System (ADS)

Jiang, Guodong; Fan, Ming; Li, Lihua

2016-03-01

Mammography is the gold standard for breast cancer screening, reducing mortality by about 30%. The application of a computer-aided detection (CAD) system to assist a single radiologist is important to further improve mammographic sensitivity for breast cancer detection. In this study, a design and realization of the prototype for remote diagnosis system in mammography based on cloud platform were proposed. To build this system, technologies were utilized including medical image information construction, cloud infrastructure and human-machine diagnosis model. Specifically, on one hand, web platform for remote diagnosis was established by J2EE web technology. Moreover, background design was realized through Hadoop open-source framework. On the other hand, storage system was built up with Hadoop distributed file system (HDFS) technology which enables users to easily develop and run on massive data application, and give full play to the advantages of cloud computing which is characterized by high efficiency, scalability and low cost. In addition, the CAD system was realized through MapReduce frame. The diagnosis module in this system implemented the algorithms of fusion of machine and human intelligence. Specifically, we combined results of diagnoses from doctors' experience and traditional CAD by using the man-machine intelligent fusion model based on Alpha-Integration and multi-agent algorithm. Finally, the applications on different levels of this system in the platform were also discussed. This diagnosis system will have great importance for the balanced health resource, lower medical expense and improvement of accuracy of diagnosis in basic medical institutes.

Community-driven computational biology with Debian Linux.

PubMed

Möller, Steffen; Krabbenhöft, Hajo Nils; Tille, Andreas; Paleino, David; Williams, Alan; Wolstencroft, Katy; Goble, Carole; Holland, Richard; Belhachemi, Dominique; Plessy, Charles

2010-12-21

The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers.
Advances in Grid Computing for the FabrIc for Frontier Experiments Project at Fermialb

DOE Office of Scientific and Technical Information (OSTI.GOV)

Herner, K.; Alba Hernandex, A. F.; Bhat, S.

The FabrIc for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientic Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of diering size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certicate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the projects inception the experiments under the FIFE umbrella have signicantly matured, and present an increasinglymore » complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the eorts of the HEP Cloud projects to provision a variety of back end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana. in helping experiments manage their large-scale production work ows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular work ows, and support troubleshooting and triage in case of problems. Recently a new certicate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specic third-party Certicate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.« less
Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab

NASA Astrophysics Data System (ADS)

Herner, K.; Alba Hernandez, A. F.; Bhat, S.; Box, D.; Boyd, J.; Di Benedetto, V.; Ding, P.; Dykstra, D.; Fattoruso, M.; Garzoglio, G.; Kirby, M.; Kreymer, A.; Levshina, T.; Mazzacane, A.; Mengel, M.; Mhashilkar, P.; Podstavkov, V.; Retzke, K.; Sharma, N.; Teheran, J.

2017-10-01

The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the projects inception the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana. in helping experiments manage their large-scale production workflows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.
NCI's Distributed Geospatial Data Server

NASA Astrophysics Data System (ADS)

Larraondo, P. R.; Evans, B. J. K.; Antony, J.

2016-12-01

Earth systems, environmental and geophysics datasets are an extremely valuable source of information about the state and evolution of the Earth. However, different disciplines and applications require this data to be post-processed in different ways before it can be used. For researchers experimenting with algorithms across large datasets or combining multiple data sets, the traditional approach to batch data processing and storing all the output for later analysis rapidly becomes unfeasible, and often requires additional work to publish for others to use. Recent developments on distributed computing using interactive access to significant cloud infrastructure opens the door for new ways of processing data on demand, hence alleviating the need for storage space for each individual copy of each product. The Australian National Computational Infrastructure (NCI) has developed a highly distributed geospatial data server which supports interactive processing of large geospatial data products, including satellite Earth Observation data and global model data, using flexible user-defined functions. This system dynamically and efficiently distributes the required computations among cloud nodes and thus provides a scalable analysis capability. In many cases this completely alleviates the need to preprocess and store the data as products. This system presents a standards-compliant interface, allowing ready accessibility for users of the data. Typical data wrangling problems such as handling different file formats and data types, or harmonising the coordinate projections or temporal and spatial resolutions, can now be handled automatically by this service. The geospatial data server exposes functionality for specifying how the data should be aggregated and transformed. The resulting products can be served using several standards such as the Open Geospatial Consortium's (OGC) Web Map Service (WMS) or Web Feature Service (WFS), Open Street Map tiles, or raw binary arrays under different conventions. We will show some cases where we have used this new capability to provide a significant improvement over previous approaches.
OMPC: an Open-Source MATLAB®-to-Python Compiler

PubMed Central

Jurica, Peter; van Leeuwen, Cees

2008-01-01

Free access to scientific information facilitates scientific progress. Open-access scientific journals are a first step in this direction; a further step is to make auxiliary and supplementary materials that accompany scientific publications, such as methodological procedures and data-analysis tools, open and accessible to the scientific community. To this purpose it is instrumental to establish a software base, which will grow toward a comprehensive free and open-source language of technical and scientific computing. Endeavors in this direction are met with an important obstacle. MATLAB®, the predominant computation tool in many fields of research, is a closed-source commercial product. To facilitate the transition to an open computation platform, we propose Open-source MATLAB®-to-Python Compiler (OMPC), a platform that uses syntax adaptation and emulation to allow transparent import of existing MATLAB® functions into Python programs. The imported MATLAB® modules will run independently of MATLAB®, relying on Python's numerical and scientific libraries. Python offers a stable and mature open source platform that, in many respects, surpasses commonly used, expensive commercial closed source packages. The proposed software will therefore facilitate the transparent transition towards a free and general open-source lingua franca for scientific computation, while enabling access to the existing methods and algorithms of technical computing already available in MATLAB®. OMPC is available at http://ompc.juricap.com. PMID:19225577
HyspIRI Low Latency Concept and Benchmarks

NASA Technical Reports Server (NTRS)

Mandl, Dan

2010-01-01

Topics include HyspIRI low latency data ops concept, HyspIRI data flow, ongoing efforts, experiment with Web Coverage Processing Service (WCPS) approach to injecting new algorithms into SensorWeb, low fidelity HyspIRI IPM testbed, compute cloud testbed, open cloud testbed environment, Global Lambda Integrated Facility (GLIF) and OCC collaboration with Starlight, delay tolerant network (DTN) protocol benchmarking, and EO-1 configuration for preliminary DTN prototype.
Software for Real-Time Analysis of Subsonic Test Shot Accuracy

DTIC Science & Technology

2014-03-01

used the C++ programming language, the Open Source Computer Vision ( OpenCV ®) software library, and Microsoft Windows® Application Programming...video for comparison through OpenCV image analysis tools. Based on the comparison, the software then computed the coordinates of each shot relative to...DWB researchers wanted to use the Open Source Computer Vision ( OpenCV ) software library for capturing and analyzing frames of video. OpenCV contains
A Web-based Tool for Transparent, Collaborative Urban Water System Planning for Monterrey, Mexico

NASA Astrophysics Data System (ADS)

Rheinheimer, D. E.; Medellin-Azuara, J.; Garza Díaz, L. E.; Ramírez, A. I.

2017-12-01

Recent rapid advances in web technologies and cloud computing show great promise for facilitating collaboration and transparency in water planning efforts. Water resources planning is increasingly in the context of a rapidly urbanizing world, particularly in developing countries. In such countries with democratic traditions, the degree of transparency and collaboration in water planning can mean the difference between success and failure of water planning efforts. This is exemplified in the city of Monterrey, Mexico, where an effort to build a new long-distance aqueduct to increase water supply to the city dramatically failed due to lack of transparency and top-down planning. To help address, we used a new, web-based water system modeling platform, called OpenAgua, to develop a prototype decision support system for water planning in Monterrey. OpenAgua is designed to promote transparency and collaboration, as well as provide strong, cloud-based, water system modeling capabilities. We developed and assessed five water management options intended to increase water supply yield and/or reliability, a dominant water management concern in Latin America generally: 1) a new long-distance source (the previously-rejected project), 2) a new nearby reservoir, 3) expansion/re-operation of an existing major canal, 4) desalination, and 5) industrial water reuse. Using the integrated modeling and analytic capabilities of OpenAgua, and some customization, we assessed the performance of these options for water supply yield and reliability to help identify the most promising ones. In presenting this assessment, we demonstrate the viability of using online, cloud-based modeling systems for improving transparency and collaboration in decision making, reducing the gap between citizens, policy makers and water managers, and future directions.
Open NASA Earth Exchange (OpenNEX): Strategies for enabling cross organization collaboration in the earth sciences

NASA Astrophysics Data System (ADS)

Michaelis, A.; Ganguly, S.; Nemani, R. R.; Votava, P.; Wang, W.; Lee, T. J.; Dungan, J. L.

2014-12-01

Sharing community-valued codes, intermediary datasets and results from individual efforts with others that are not in a direct funded collaboration can be a challenge. Cross organization collaboration is often impeded due to infrastructure security constraints, rigid financial controls, bureaucracy, and workforce nationalities, etc., which can force groups to work in a segmented fashion and/or through awkward and suboptimal web services. We show how a focused community may come together, share modeling and analysis codes, computing configurations, scientific results, knowledge and expertise on a public cloud platform; diverse groups of researchers working together at "arms length". Through the OpenNEX experimental workshop, users can view short technical "how-to" videos and explore encapsulated working environment. Workshop participants can easily instantiate Amazon Machine Images (AMI) or launch full cluster and data processing configurations within minutes. Enabling users to instantiate computing environments from configuration templates on large public cloud infrastructures, such as Amazon Web Services, may provide a mechanism for groups to easily use each others work and collaborate indirectly. Moreover, using the public cloud for this workshop allowed a single group to host a large read only data archive, making datasets of interest to the community widely available on the public cloud, enabling other groups to directly connect to the data and reduce the costs of the collaborative work by freeing other individual groups from redundantly retrieving, integrating or financing the storage of the datasets of interest.
Design and implementation of a cloud based lithography illumination pupil processing application

NASA Astrophysics Data System (ADS)

Zhang, Youbao; Ma, Xinghua; Zhu, Jing; Zhang, Fang; Huang, Huijie

2017-02-01

Pupil parameters are important parameters to evaluate the quality of lithography illumination system. In this paper, a cloud based full-featured pupil processing application is implemented. A web browser is used for the UI (User Interface), the websocket protocol and JSON format are used for the communication between the client and the server, and the computing part is implemented in the server side, where the application integrated a variety of high quality professional libraries, such as image processing libraries libvips and ImageMagic, automatic reporting system latex, etc., to support the program. The cloud based framework takes advantage of server's superior computing power and rich software collections, and the program could run anywhere there is a modern browser due to its web UI design. Compared to the traditional way of software operation model: purchased, licensed, shipped, downloaded, installed, maintained, and upgraded, the new cloud based approach, which is no installation, easy to use and maintenance, opens up a new way. Cloud based application probably is the future of the software development.
Simulation of partially coherent light propagation using parallel computing devices

NASA Astrophysics Data System (ADS)

Magalhães, Tiago C.; Rebordão, José M.

2017-08-01

Light acquires or loses coherence and coherence is one of the few optical observables. Spectra can be derived from coherence functions and understanding any interferometric experiment is also relying upon coherence functions. Beyond the two limiting cases (full coherence or incoherence) the coherence of light is always partial and it changes with propagation. We have implemented a code to compute the propagation of partially coherent light from the source plane to the observation plane using parallel computing devices (PCDs). In this paper, we restrict the propagation in free space only. To this end, we used the Open Computing Language (OpenCL) and the open-source toolkit PyOpenCL, which gives access to OpenCL parallel computation through Python. To test our code, we chose two coherence source models: an incoherent source and a Gaussian Schell-model source. In the former case, we divided into two different source shapes: circular and rectangular. The results were compared to the theoretical values. Our implemented code allows one to choose between the PyOpenCL implementation and a standard one, i.e using the CPU only. To test the computation time for each implementation (PyOpenCL and standard), we used several computer systems with different CPUs and GPUs. We used powers of two for the dimensions of the cross-spectral density matrix (e.g. 324, 644) and a significant speed increase is observed in the PyOpenCL implementation when compared to the standard one. This can be an important tool for studying new source models.
Design and verification of a cloud field optical simulator

NASA Technical Reports Server (NTRS)

Davis, J. M.; Cox, S. K.; Mckee, T. B.

1983-01-01

A concept and an apparatus designed to investigate the reflected and transmitted distributions of light from optically thick clouds is presented. The Cloud Field Optical Simulator (CFOS) is a laboratory device which utilizes an array of incandescent lamps as a source, simulated clouds made from cotton or styrofoam as targets, and an array of silicon photodiodes as detectors. The device allows virtually any source-target-detector geometry to be examined. Similitude between real clouds and their CFOS cotton or styrofoam counterparts is established by relying on a linear relationship between optical depth and the ratio of reflected to transmitted light for a semiinfinite layer. Comparisons of principal plane radiances observed by the CFOS with Monte Carlo computations for a water cloud at 0.7 micron show excellent agreement. Initial applications of the CFOS are discussed.
Federated and Cloud Enabled Resources for Data Management and Utilization

NASA Astrophysics Data System (ADS)

Rankin, R.; Gordon, M.; Potter, R. G.; Satchwill, B.

2011-12-01

The emergence of cloud computing over the past three years has led to a paradigm shift in how data can be managed, processed and made accessible. Building on the federated data management system offered through the Canadian Space Science Data Portal (www.cssdp.ca), we demonstrate how heterogeneous and geographically distributed data sets and modeling tools have been integrated to form a virtual data center and computational modeling platform that has services for data processing and visualization embedded within it. We also discuss positive and negative experiences in utilizing Eucalyptus and OpenStack cloud applications, and job scheduling facilitated by Condor and Star Cluster. We summarize our findings by demonstrating use of these technologies in the Cloud Enabled Space Weather Data Assimilation and Modeling Platform CESWP (www.ceswp.ca), which is funded through Canarie's (canarie.ca) Network Enabled Platforms program in Canada.
Jade: using on-demand cloud analysis to give scientists back their flow

NASA Astrophysics Data System (ADS)

Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.

2017-12-01

The UK's Met Office generates 400 TB weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap.Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: An under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts Hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores Clusters which autoscale up/down in response to calculation load using Kubernetes, and balances the cluster across providers based on the current price of compute Lazy data access from cloud storage via containerised OpenDAP This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger scales. The use of ephemeral compute resources also makes this implementation cost efficient.
How NASA is Building a Petabyte Scale Geospatial Archive in the Cloud

NASA Technical Reports Server (NTRS)

Pilone, Dan; Quinn, Patrick; Jazayeri, Alireza; Baynes, Kathleen; Murphy, Kevin J.

2018-01-01

NASA's Earth Observing System Data and Information System (EOSDIS) is working towards a vision of a cloud-based, highly-flexible, ingest, archive, management, and distribution system for its ever-growing and evolving data holdings. This free and open source system, Cumulus, is emerging from its prototyping stages and is poised to make a huge impact on how NASA manages and disseminates its Earth science data. This talk outlines the motivation for this work, present the achievements and hurdles of the past 18 months and charts a course for the future expansion of Cumulus. We explore not just the technical, but also the socio-technical challenges that we face in evolving a system of this magnitude into the cloud. The NASA EOSDIS archive is currently at nearly 30 PBs and will grow to over 300PBs in the coming years. We've presented progress on this effort at AWS re:Invent and the American Geophysical Union (AGU) Fall Meeting in 2017 and hope to have the opportunity to share with FOSS4G attendees information on the availability of the open sourced software and how NASA intends on making its Earth Observing Geospatial data available for free to the public in the cloud.
AstroCloud, a Cyber-Infrastructure for Astronomy Research: Overview

NASA Astrophysics Data System (ADS)

Cui, C.; Yu, C.; Xiao, J.; He, B.; Li, C.; Fan, D.; Wang, C.; Hong, Z.; Li, S.; Mi, L.; Wan, W.; Cao, Z.; Wang, J.; Yin, S.; Fan, Y.; Wang, J.

2015-09-01

AstroCloud is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Tasks such as proposal submission, proposal peer-review, data archiving, data quality control, data release and open access, Cloud based data processing and analyzing, will be all supported on the platform. It will act as a full lifecycle management system for astronomical data and telescopes. Achievements from international Virtual Observatories and Cloud Computing are adopted heavily. In this paper, backgrounds of the project, key features of the system, and latest progresses are introduced.
Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies

PubMed Central

Zhao, Shanrong; Prenger, Kurt; Smith, Lance

2013-01-01

RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets. PMID:25937948
IoT gateways, cloud and the last mile for energy efficiency and sustainability in the era of CPS expansion: "A bot is irrigating my farm.. "

NASA Astrophysics Data System (ADS)

Papageorgas, Panagiotis G.; Agavanakis, Kyriakos; Dogas, Ioannis; Piromalis, Dimitrios D.

2018-05-01

A cloud-based architecture is presented for the internetworking of sensors and actuators through a universal gateway, network server and application user interface design. The proposed approach targets to Energy Efficiency and sustainability in a holistic way, by integrating an open-source test bed prototype based on long-range low-bandwidth wireless networking technology for sensing and actuation as the elementary block of a viable, cost-effective and reliable solution. The prototype presented is capable of supporting both sensors and actuators, processing data locally and transmitting the results of the imposed computations to a higher level node. Additionally, it is combined with a service-oriented architecture and involves publish/subscribe middleware protocols and cloud technology to confront with the system needs in terms of data volume and processing power. In this context, the integration of instant message (chat) services is demonstrated so that they can be part of an emerging global-scope eco-system of Cyber-Physical Systems to support a wide variety of IoT applications, with strong advantages such as usability, scalability and security, while adopting a unified gateway design and a simple - yet powerful - user interface.
D Data Acquisition Based on Opencv for Close-Range Photogrammetry Applications

NASA Astrophysics Data System (ADS)

Jurjević, L.; Gašparović, M.

2017-05-01

Development of the technology in the area of the cameras, computers and algorithms for 3D the reconstruction of the objects from the images resulted in the increased popularity of the photogrammetry. Algorithms for the 3D model reconstruction are so advanced that almost anyone can make a 3D model of photographed object. The main goal of this paper is to examine the possibility of obtaining 3D data for the purposes of the close-range photogrammetry applications, based on the open source technologies. All steps of obtaining 3D point cloud are covered in this paper. Special attention is given to the camera calibration, for which two-step process of calibration is used. Both, presented algorithm and accuracy of the point cloud are tested by calculating the spatial difference between referent and produced point clouds. During algorithm testing, robustness and swiftness of obtaining 3D data is noted, and certainly usage of this and similar algorithms has a lot of potential in the real-time application. That is the reason why this research can find its application in the architecture, spatial planning, protection of cultural heritage, forensic, mechanical engineering, traffic management, medicine and other sciences.
Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

PubMed

Zhao, Shanrong; Prenger, Kurt; Smith, Lance

2013-01-01

RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets.

a Low-Cost Panoramic Camera for the 3d Documentation of Contaminated Crime Scenes

NASA Astrophysics Data System (ADS)

Abate, D.; Toschi, I.; Sturdy-Colls, C.; Remondino, F.

2017-11-01

Crime scene documentation is a fundamental task which has to be undertaken in a fast, accurate and reliable way, highlighting evidence which can be further used for ensuring justice for victims and for guaranteeing the successful prosecution of perpetrators. The main focus of this paper is on the documentation of a typical crime scene and on the rapid recording of any possible contamination that could have influenced its original appearance. A 3D reconstruction of the environment is first generated by processing panoramas acquired with the low-cost Ricoh Theta 360 camera, and further analysed to highlight potentials and limits of this emerging and consumer-grade technology. Then, a methodology is proposed for the rapid recording of changes occurring between the original and the contaminated crime scene. The approach is based on an automatic 3D feature-based data registration, followed by a cloud-to-cloud distance computation, given as input the 3D point clouds generated before and after e.g. the misplacement of evidence. All the algorithms adopted for panoramas pre-processing, photogrammetric 3D reconstruction, 3D geometry registration and analysis, are presented and currently available in open-source or low-cost software solutions.
Generic Module for Collecting Data in Smart Cities

NASA Astrophysics Data System (ADS)

Martinez, A.; Ramirez, F.; Estrada, H.; Torres, L. A.

2017-09-01

The Future Internet brings new technologies to the common life of people, such as Internet of Things, Cloud Computing or Big Data. All this technologies have change the way people communicate and also the way the devices interact with the context, giving rise to new paradigms, as the case of smart cities. Currently, the mobile devices represent one of main sources of information for new applications that take into account the user context, such as apps for mobility, health, of security. Several platforms have been proposed that consider the development of Future Internet applications, however, no generic modules can be found that implement the collection of context data from smartphones. In this research work we present a generic module to collect data from different sensors of the mobile devices and also to send, in a standard manner, this data to the Open FIWARE Cloud to be stored or analyzed by software tools. The proposed module enables the human-as-a-sensor approach for FIWARE Platform.
OMPC: an Open-Source MATLAB-to-Python Compiler.

PubMed

Jurica, Peter; van Leeuwen, Cees

2009-01-01

Free access to scientific information facilitates scientific progress. Open-access scientific journals are a first step in this direction; a further step is to make auxiliary and supplementary materials that accompany scientific publications, such as methodological procedures and data-analysis tools, open and accessible to the scientific community. To this purpose it is instrumental to establish a software base, which will grow toward a comprehensive free and open-source language of technical and scientific computing. Endeavors in this direction are met with an important obstacle. MATLAB((R)), the predominant computation tool in many fields of research, is a closed-source commercial product. To facilitate the transition to an open computation platform, we propose Open-source MATLAB((R))-to-Python Compiler (OMPC), a platform that uses syntax adaptation and emulation to allow transparent import of existing MATLAB((R)) functions into Python programs. The imported MATLAB((R)) modules will run independently of MATLAB((R)), relying on Python's numerical and scientific libraries. Python offers a stable and mature open source platform that, in many respects, surpasses commonly used, expensive commercial closed source packages. The proposed software will therefore facilitate the transparent transition towards a free and general open-source lingua franca for scientific computation, while enabling access to the existing methods and algorithms of technical computing already available in MATLAB((R)). OMPC is available at http://ompc.juricap.com.
The design of an m-Health monitoring system based on a cloud computing platform

NASA Astrophysics Data System (ADS)

Xu, Boyi; Xu, Lida; Cai, Hongming; Jiang, Lihong; Luo, Yang; Gu, Yizhi

2017-01-01

Compared to traditional medical services provided within hospitals, m-Health monitoring systems (MHMSs) face more challenges in personalised health data processing. To achieve personalised and high-quality health monitoring by means of new technologies, such as mobile network and cloud computing, in this paper, a framework of an m-Health monitoring system based on a cloud computing platform (Cloud-MHMS) is designed to implement pervasive health monitoring. Furthermore, the modules of the framework, which are Cloud Storage and Multiple Tenants Access Control Layer, Healthcare Data Annotation Layer, and Healthcare Data Analysis Layer, are discussed. In the data storage layer, a multiple tenant access method is designed to protect patient privacy. In the data annotation layer, linked open data are adopted to augment health data interoperability semantically. In the data analysis layer, the process mining algorithm and similarity calculating method are implemented to support personalised treatment plan selection. These three modules cooperate to implement the core functions in the process of health monitoring, which are data storage, data processing, and data analysis. Finally, we study the application of our architecture in the monitoring of antimicrobial drug usage to demonstrate the usability of our method in personal healthcare analysis.
BioBlocks: Programming Protocols in Biology Made Easier.

PubMed

Gupta, Vishal; Irimia, Jesús; Pau, Iván; Rodríguez-Patón, Alfonso

2017-07-21

The methods to execute biological experiments are evolving. Affordable fluid handling robots and on-demand biology enterprises are making automating entire experiments a reality. Automation offers the benefit of high-throughput experimentation, rapid prototyping, and improved reproducibility of results. However, learning to automate and codify experiments is a difficult task as it requires programming expertise. Here, we present a web-based visual development environment called BioBlocks for describing experimental protocols in biology. It is based on Google's Blockly and Scratch, and requires little or no experience in computer programming to automate the execution of experiments. The experiments can be specified, saved, modified, and shared between multiple users in an easy manner. BioBlocks is open-source and can be customized to execute protocols on local robotic platforms or remotely, that is, in the cloud. It aims to serve as a de facto open standard for programming protocols in Biology.
Impacts of cloud water droplets on the OH production rate from peroxide photolysis.

PubMed

Martins-Costa, M T C; Anglada, J M; Francisco, J S; Ruiz-López, Manuel F

2017-12-06

Understanding the difference between observed and modeled concentrations of HO x radicals in the troposphere is a current major issue in atmospheric chemistry. It is widely believed that existing atmospheric models miss a source of such radicals and several potential new sources have been proposed. In recent years, interest has increased on the role played by cloud droplets and organic aerosols. Computer modeling of ozone photolysis, for instance, has shown that atmospheric aqueous interfaces accelerate the associated OH production rate by as much as 3-4 orders of magnitude. Since methylhydroperoxide is a main source and sink of HO x radicals, especially at low NO x concentrations, it is fundamental to assess what is the influence of clouds on its chemistry and photochemistry. In this study, computer simulations for the photolysis of methylhydroperoxide at the air-water interface have been carried out showing that the OH production rate is severely enhanced, reaching a comparable level to ozone photolysis.
Leveraging Open Standards and Technologies to Enhance Community Access to Earth Science Lidar Data

NASA Astrophysics Data System (ADS)

Crosby, C. J.; Nandigam, V.; Krishnan, S.; Cowart, C.; Baru, C.; Arrowsmith, R.

2011-12-01

Lidar (Light Detection and Ranging) data, collected from space, airborne and terrestrial platforms, have emerged as an invaluable tool for a variety of Earth science applications ranging from ice sheet monitoring to modeling of earth surface processes. However, lidar present a unique suite of challenges from the perspective of building cyberinfrastructure systems that enable the scientific community to access these valuable research datasets. Lidar data are typically characterized by millions to billions of individual measurements of x,y,z position plus attributes; these "raw" data are also often accompanied by derived raster products and are frequently terabytes in size. As a relatively new and rapidly evolving data collection technology, relevant open data standards and software projects are immature compared to those for other remote sensing platforms. The NSF-funded OpenTopography Facility project has developed an online lidar data access and processing system that co-locates data with on-demand processing tools to enable users to access both raw point cloud data as well as custom derived products and visualizations. OpenTopography is built on a Service Oriented Architecture (SOA) in which applications and data resources are deployed as standards compliant (XML and SOAP) Web services with the open source Opal Toolkit. To develop the underlying applications for data access, filtering and conversion, and various processing tasks, OpenTopography has heavily leveraged existing open source software efforts for both lidar and raster data. Operating on the de facto LAS binary point cloud format (maintained by ASPRS), open source libLAS and LASlib libraries provide OpenTopography data ingestion, query and translation capabilities. Similarly, raster data manipulation is performed through a suite of services built on the Geospatial Data Abstraction Library (GDAL). OpenTopography has also developed our own algorithm for high-performance gridding of lidar point cloud data, Points2Grid, and have released the code as an open source project. An emerging conversation that the lidar community and OpenTopography are actively engaged in is the need for open, community supported standards and metadata for both full waveform and terrestrial (waveform and discrete return) lidar data. Further, given the immature nature of many lidar data archives and limited online access to public domain data, there is an opportunity to develop interoperable data catalogs based on an open standard such as the OGC CSW specification to facilitate discovery and access to Earth science oriented lidar data.
OpenNEX, a private-public partnership in support of the national climate assessment

NASA Astrophysics Data System (ADS)

Nemani, R. R.; Wang, W.; Michaelis, A.; Votava, P.; Ganguly, S.

2016-12-01

The NASA Earth Exchange (NEX) is a collaborative computing platform that has been developed with the objective of bringing scientists together with the software tools, massive global datasets, and supercomputing resources necessary to accelerate research in Earth systems science and global change. NEX is funded as an enabling tool for sustaining the national climate assessment. Over the past five years, researchers have used the NEX platform and produced a number of data sets highly relevant to the National Climate Assessment. These include high-resolution climate projections using different downscaling techniques and trends in historical climate from satellite data. To enable a broader community in exploiting the above datasets, the NEX team partnered with public cloud providers to create the OpenNEX platform. OpenNEX provides ready access to NEX data holdings on a number of public cloud platforms along with pertinent analysis tools and workflows in the form of Machine Images and Docker Containers, lectures and tutorials by experts. We will showcase some of the applications of OpenNEX data and tools by the community on Amazon Web Services, Google Cloud and the NEX Sandbox.
A pilot biomedical engineering course in rapid prototyping for mobile health.

PubMed

Stokes, Todd H; Venugopalan, Janani; Hubbard, Elena N; Wang, May D

2013-01-01

Rapid prototyping of medically assistive mobile devices promises to fuel innovation and provides opportunity for hands-on engineering training in biomedical engineering curricula. This paper presents the design and outcomes of a course offered during a 16-week semester in Fall 2011 with 11 students enrolled. The syllabus covered a mobile health design process from end-to-end, including storyboarding, non-functional prototypes, integrated circuit programming, 3D modeling, 3D printing, cloud computing database programming, and developing patient engagement through animated videos describing the benefits of a new device. Most technologies presented in this class are open source and thus provide unlimited "hackability". They are also cost-effective and easily transferrable to other departments.
A review on the state-of-the-art privacy-preserving approaches in the e-health clouds.

PubMed

Abbas, Assad; Khan, Samee U

2014-07-01

Cloud computing is emerging as a new computing paradigm in the healthcare sector besides other business domains. Large numbers of health organizations have started shifting the electronic health information to the cloud environment. Introducing the cloud services in the health sector not only facilitates the exchange of electronic medical records among the hospitals and clinics, but also enables the cloud to act as a medical record storage center. Moreover, shifting to the cloud environment relieves the healthcare organizations of the tedious tasks of infrastructure management and also minimizes development and maintenance costs. Nonetheless, storing the patient health data in the third-party servers also entails serious threats to data privacy. Because of probable disclosure of medical records stored and exchanged in the cloud, the patients' privacy concerns should essentially be considered when designing the security and privacy mechanisms. Various approaches have been used to preserve the privacy of the health information in the cloud environment. This survey aims to encompass the state-of-the-art privacy-preserving approaches employed in the e-Health clouds. Moreover, the privacy-preserving approaches are classified into cryptographic and noncryptographic approaches and taxonomy of the approaches is also presented. Furthermore, the strengths and weaknesses of the presented approaches are reported and some open issues are highlighted.
MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants.

PubMed

Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter J; Wall, Dennis P; Abouelhoda, Mohamed

2017-01-20

Next Generation Genome sequencing techniques became affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the used analysis tools to reduce the overall computational processing time, and concomitantly reduce the processing cost. Furthermore, it is important to capitalize on the use of the recent development in the cloud computing market, which have witnessed more providers competing in terms of products and prices. In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as, any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web-interface and it is available for free for academic use.
Looking at Clouds from All Sides Now

ERIC Educational Resources Information Center

Katz, Richard N.

2010-01-01

On February 9 and 10, 2010, fifty leaders from colleges, universities, corporations, professional associations, and state networks met in Tempe, Arizona, to discuss cloud computing and the impending shift in the mix of where infrastructure, applications, and services are sourced. This group identified a set of actions that colleges and…
Point Cloud Management Through the Realization of the Intelligent Cloud Viewer Software

NASA Astrophysics Data System (ADS)

Costantino, D.; Angelini, M. G.; Settembrini, F.

2017-05-01

The paper presents a software dedicated to the elaboration of point clouds, called Intelligent Cloud Viewer (ICV), made in-house by AESEI software (Spin-Off of Politecnico di Bari), allowing to view point cloud of several tens of millions of points, also on of "no" very high performance systems. The elaborations are carried out on the whole point cloud and managed by means of the display only part of it in order to speed up rendering. It is designed for 64-bit Windows and is fully written in C ++ and integrates different specialized modules for computer graphics (Open Inventor by SGI, Silicon Graphics Inc), maths (BLAS, EIGEN), computational geometry (CGAL, Computational Geometry Algorithms Library), registration and advanced algorithms for point clouds (PCL, Point Cloud Library), advanced data structures (BOOST, Basic Object Oriented Supporting Tools), etc. ICV incorporates a number of features such as, for example, cropping, transformation and georeferencing, matching, registration, decimation, sections, distances calculation between clouds, etc. It has been tested on photographic and TLS (Terrestrial Laser Scanner) data, obtaining satisfactory results. The potentialities of the software have been tested by carrying out the photogrammetric survey of the Castel del Monte which was already available in previous laser scanner survey made from the ground by the same authors. For the aerophotogrammetric survey has been adopted a flight height of approximately 1000ft AGL (Above Ground Level) and, overall, have been acquired over 800 photos in just over 15 minutes, with a covering not less than 80%, the planned speed of about 90 knots.
Collaborative Working Architecture for IoT-Based Applications.

PubMed

Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus

2018-05-23

The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications, however, there is still much research to be done to properly gear all the systems for working together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.
Feasibility and demonstration of a cloud-based RIID analysis system

NASA Astrophysics Data System (ADS)

Wright, Michael C.; Hertz, Kristin L.; Johnson, William C.; Sword, Eric D.; Younkin, James R.; Sadler, Lorraine E.

2015-06-01

A significant limitation in the operational utility of handheld and backpack radioisotope identifiers (RIIDs) is the inability of their onboard algorithms to accurately and reliably identify the isotopic sources of the measured gamma-ray energy spectrum. A possible solution is to move the spectral analysis computations to an external device, the cloud, where significantly greater capabilities are available. The implementation and demonstration of a prototype cloud-based RIID analysis system have shown this type of system to be feasible with currently available communication and computational technology. A system study has shown that the potential user community could derive significant benefits from an appropriately implemented cloud-based analysis system and has identified the design and operational characteristics required by the users and stakeholders for such a system. A general description of the hardware and software necessary to implement reliable cloud-based analysis, the value of the cloud expressed by the user community, and the aspects of the cloud implemented in the demonstrations are discussed.
AtomPy: an open atomic-data curation environment

NASA Astrophysics Data System (ADS)

Bautista, Manuel; Mendoza, Claudio; Boswell, Josiah S; Ajoku, Chukwuemeka

2014-06-01

We present a cloud-computing environment for atomic data curation, networking among atomic data providers and users, teaching-and-learning, and interfacing with spectral modeling software. The system is based on Google-Drive Sheets, Pandas (Python Data Analysis Library) DataFrames, and IPython Notebooks for open community-driven curation of atomic data for scientific and technological applications. The atomic model for each ionic species is contained in a multi-sheet Google-Drive workbook, where the atomic parameters from all known public sources are progressively stored. Metadata (provenance, community discussion, etc.) accompanying every entry in the database are stored through Notebooks. Education tools on the physics of atomic processes as well as their relevance to plasma and spectral modeling are based on IPython Notebooks that integrate written material, images, videos, and active computer-tool workflows. Data processing workflows and collaborative software developments are encouraged and managed through the GitHub social network. Relevant issues this platform intends to address are: (i) data quality by allowing open access to both data producers and users in order to attain completeness, accuracy, consistency, provenance and currentness; (ii) comparisons of different datasets to facilitate accuracy assessment; (iii) downloading to local data structures (i.e. Pandas DataFrames) for further manipulation and analysis by prospective users; and (iv) data preservation by avoiding the discard of outdated sets.
Real-time WAMI streaming target tracking in fog

NASA Astrophysics Data System (ADS)

Chen, Yu; Blasch, Erik; Chen, Ning; Deng, Anna; Ling, Haibin; Chen, Genshe

2016-05-01

Real-time information fusion based on WAMI (Wide-Area Motion Imagery), FMV (Full Motion Video), and Text data is highly desired for many mission critical emergency or security applications. Cloud Computing has been considered promising to achieve big data integration from multi-modal sources. In many mission critical tasks, however, powerful Cloud technology cannot satisfy the tight latency tolerance as the servers are allocated far from the sensing platform, actually there is no guaranteed connection in the emergency situations. Therefore, data processing, information fusion, and decision making are required to be executed on-site (i.e., near the data collection). Fog Computing, a recently proposed extension and complement for Cloud Computing, enables computing on-site without outsourcing jobs to a remote Cloud. In this work, we have investigated the feasibility of processing streaming WAMI in the Fog for real-time, online, uninterrupted target tracking. Using a single target tracking algorithm, we studied the performance of a Fog Computing prototype. The experimental results are very encouraging that validated the effectiveness of our Fog approach to achieve real-time frame rates.
Spatially explicit spectral analysis of point clouds and geospatial data

USGS Publications Warehouse

Buscombe, Daniel D.

2015-01-01

The increasing use of spatially explicit analyses of high-resolution spatially distributed data (imagery and point clouds) for the purposes of characterising spatial heterogeneity in geophysical phenomena necessitates the development of custom analytical and computational tools. In recent years, such analyses have become the basis of, for example, automated texture characterisation and segmentation, roughness and grain size calculation, and feature detection and classification, from a variety of data types. In this work, much use has been made of statistical descriptors of localised spatial variations in amplitude variance (roughness), however the horizontal scale (wavelength) and spacing of roughness elements is rarely considered. This is despite the fact that the ratio of characteristic vertical to horizontal scales is not constant and can yield important information about physical scaling relationships. Spectral analysis is a hitherto under-utilised but powerful means to acquire statistical information about relevant amplitude and wavelength scales, simultaneously and with computational efficiency. Further, quantifying spatially distributed data in the frequency domain lends itself to the development of stochastic models for probing the underlying mechanisms which govern the spatial distribution of geological and geophysical phenomena. The software packagePySESA (Python program for Spatially Explicit Spectral Analysis) has been developed for generic analyses of spatially distributed data in both the spatial and frequency domains. Developed predominantly in Python, it accesses libraries written in Cython and C++ for efficiency. It is open source and modular, therefore readily incorporated into, and combined with, other data analysis tools and frameworks with particular utility for supporting research in the fields of geomorphology, geophysics, hydrography, photogrammetry and remote sensing. The analytical and computational structure of the toolbox is described, and its functionality illustrated with an example of a high-resolution bathymetric point cloud data collected with multibeam echosounder.
Cloud-Based Tools to Support High-Resolution Modeling (Invited)

NASA Astrophysics Data System (ADS)

Jones, N.; Nelson, J.; Swain, N.; Christensen, S.

2013-12-01

The majority of watershed models developed to support decision-making by water management agencies are simple, lumped-parameter models. Maturity in research codes and advances in the computational power from multi-core processors on desktop machines, commercial cloud-computing resources, and supercomputers with thousands of cores have created new opportunities for employing more accurate, high-resolution distributed models for routine use in decision support. The barriers for using such models on a more routine basis include massive amounts of spatial data that must be processed for each new scenario and lack of efficient visualization tools. In this presentation we will review a current NSF-funded project called CI-WATER that is intended to overcome many of these roadblocks associated with high-resolution modeling. We are developing a suite of tools that will make it possible to deploy customized web-based apps for running custom scenarios for high-resolution models with minimal effort. These tools are based on a software stack that includes 52 North, MapServer, PostGIS, HT Condor, CKAN, and Python. This open source stack provides a simple scripting environment for quickly configuring new custom applications for running high-resolution models as geoprocessing workflows. The HT Condor component facilitates simple access to local distributed computers or commercial cloud resources when necessary for stochastic simulations. The CKAN framework provides a powerful suite of tools for hosting such workflows in a web-based environment that includes visualization tools and storage of model simulations in a database to archival, querying, and sharing of model results. Prototype applications including land use change, snow melt, and burned area analysis will be presented. This material is based upon work supported by the National Science Foundation under Grant No. 1135482
Genomics and privacy: implications of the new reality of closed data for the field.

PubMed

Greenbaum, Dov; Sboner, Andrea; Mu, Xinmeng Jasmine; Gerstein, Mark

2011-12-01

Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches-for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums. © 2011 Greenbaum et al.

Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

PubMed

Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

2013-06-27

Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

PubMed Central

2013-01-01

Background Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html. PMID:23802613
User's manual for the REEDM (Rocket Exhaust Effluent Diffusion Model) computer program

NASA Technical Reports Server (NTRS)

Bjorklund, J. R.; Dumbauld, R. K.; Cheney, C. S.; Geary, H. V.

1982-01-01

The REEDM computer program predicts concentrations, dosages, and depositions downwind from normal and abnormal launches of rocket vehicles at NASA's Kennedy Space Center. The atmospheric dispersion models, cloud-rise models, and other formulas used in the REEDM model are described mathematically Vehicle and source parameters, other pertinent physical properties of the rocket exhaust cloud, and meteorological layering techniques are presented as well as user's instructions for REEDM. Worked example problems are included.
Building a cloud based distributed active archive data center

NASA Astrophysics Data System (ADS)

Ramachandran, Rahul; Baynes, Katie; Murphy, Kevin

2017-04-01

NASA's Earth Science Data System (ESDS) Program serves as a central cog in facilitating the implementation of NASA's Earth Science strategic plan. Since 1994, the ESDS Program has committed to the full and open sharing of Earth science data obtained from NASA instruments to all users. One of the key responsibilities of the ESDS Program is to continuously evolve the entire data and information system to maximize returns on the collected NASA data. An independent review was conducted in 2015 to holistically review the EOSDIS in order to identify gaps. The review recommendations were to investigate two areas: one, whether commercial cloud providers offer potential for storage, processing, and operational efficiencies, and two, the potential development of new data access and analysis paradigms. In response, ESDS has initiated several prototypes investigating the advantages and risks of leveraging cloud computing. This poster will provide an overview of one such prototyping activity, "Cumulus". Cumulus is being designed and developed as a "native" cloud-based data ingest, archive and management system that can be used for all future NASA Earth science data streams. The long term vision for Cumulus, its requirements, overall architecture, and implementation details, as well as lessons learned from the completion of the first phase of this prototype will be covered. We envision Cumulus will foster design of new analysis/visualization tools to leverage collocated data from all of the distributed DAACs as well as elastic cloud computing resources to open new research opportunities.
A Lightweight Remote Parallel Visualization Platform for Interactive Massive Time-varying Climate Data Analysis

NASA Astrophysics Data System (ADS)

Li, J.; Zhang, T.; Huang, Q.; Liu, Q.

2014-12-01

Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
What's the Point of a Raster ? Advantages of 3D Point Cloud Processing over Raster Based Methods for Accurate Geomorphic Analysis of High Resolution Topography.

NASA Astrophysics Data System (ADS)

Lague, D.

2014-12-01

High Resolution Topographic (HRT) datasets are predominantly stored and analyzed as 2D raster grids of elevations (i.e., Digital Elevation Models). Raster grid processing is common in GIS software and benefits from a large library of fast algorithms dedicated to geometrical analysis, drainage network computation and topographic change measurement. Yet, all instruments or methods currently generating HRT datasets (e.g., ALS, TLS, SFM, stereo satellite imagery) output natively 3D unstructured point clouds that are (i) non-regularly sampled, (ii) incomplete (e.g., submerged parts of river channels are rarely measured), and (iii) include 3D elements (e.g., vegetation, vertical features such as river banks or cliffs) that cannot be accurately described in a DEM. Interpolating the raw point cloud onto a 2D grid generally results in a loss of position accuracy, spatial resolution and in more or less controlled interpolation. Here I demonstrate how studying earth surface topography and processes directly on native 3D point cloud datasets offers several advantages over raster based methods: point cloud methods preserve the accuracy of the original data, can better handle the evaluation of uncertainty associated to topographic change measurements and are more suitable to study vegetation characteristics and steep features of the landscape. In this presentation, I will illustrate and compare Point Cloud based and Raster based workflows with various examples involving ALS, TLS and SFM for the analysis of bank erosion processes in bedrock and alluvial rivers, rockfall statistics (including rockfall volume estimate directly from point clouds) and the interaction of vegetation/hydraulics and sedimentation in salt marshes. These workflows use 2 recently published algorithms for point cloud classification (CANUPO) and point cloud comparison (M3C2) now implemented in the open source software CloudCompare.
GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data

NASA Astrophysics Data System (ADS)

Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.

2016-12-01

Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.
Marine CCN Activation: A Battle Between Primary and Secondary Sources

NASA Astrophysics Data System (ADS)

Fossum, K. N.; Ovadnevaite, J.; Ceburnis, D.; Preissler, J.; O'Dowd, C. D. D.

2017-12-01

Low-altitude marine clouds are cooling components of the Earth's radiative budget, and the direct measurements of the properties of these cloud forming particles, called cloud condensation nuclei (CCN), helps modellers reconstruct aerosol-to-cloud droplet processes, improving climate predictions. In this study, CCN are directly measured (CCNC commercially available from Droplet Measurement Technologies, Inc.), resolving activation efficiency at varying supersaturated conditions. Previous studies show that sub-micron sea salt particulates activate competitively, reducing the cloud peak supersaturation and inhibiting the activation of sulphate particulates into cloud droplets, making chemical composition an important component in determining cloud droplet number concentration (CDNC). This effect and the sea salt numbers needed to induce it have not been previously studied long-term in the natural environment. As part of this work, data was analysed from a two month marine research ship campaign during the Antarctic Austral summer, in 2015. Ambient aerosol in the Scotia Sea region was sampled continuously, and through the use of multiple aerosol in-situ instruments, this study shows that CCN number in both the open ocean and ice-pack influenced air masses are largely proportionate to secondary aerosol. However, open ocean air masses show a significant primary aerosol influence which changes the aerosol characteristics. Higher sea salt mass concentrations in the open ocean lead to better CCN activation efficiencies. Coupled with high wind speeds and sea surface turbulence, open ocean air masses show a repression of the CDNC number compared with the theoretical values that should be expected with the sub-cloud aerosol number concentration. This is not seen in the ice-pack air masses. Work is ongoing, looking into a long-term North Atlantic marine aerosol data set, but it would appear that chemical composition plays a large role in aerosol to cloud droplet processes, and can initially restrict CDNC when sea salt is abundant and updraft velocities are relatively low.
Matsu: An Elastic Cloud Connected to a SensorWeb for Disaster Response

NASA Technical Reports Server (NTRS)

Mandl, Daniel

2011-01-01

This slide presentation reviews the use of cloud computing combined with the SensorWeb in aiding disaster response planning. Included is an overview of the architecture of the SensorWeb, and overviews of the phase 1 of the EO-1 system and the steps to improve it to transform it to an On-demand product cloud as part of the Open Cloud Consortium (OCC). The effectiveness of this system is demonstrated in the SensorWeb for the Namibia flood in 2010, using information blended from MODIS, TRMM, River Gauge data, and the Google Earth version of Namibia the system enabled river surge predictions and could enable planning for future disaster responses.
CyVerse Data Commons: lessons learned in cyberinfrastructure management and data hosting from the Life Sciences

NASA Astrophysics Data System (ADS)

Swetnam, T. L.; Walls, R.; Merchant, N.

2017-12-01

CyVerse, is a US National Science Foundation funded initiative "to design, deploy, and expand a national cyberinfrastructure for life sciences research, and to train scientists in its use," supporting and enabling cross disciplinary collaborations across institutions. CyVerse' free, open-source, cyberinfrastructure is being adopted into biogeoscience and space sciences research. CyVerse data-science agnostic platforms provide shared data storage, high performance computing, and cloud computing that allow analysis of very large data sets (including incomplete or work-in-progress data sets). Part of CyVerse success has been in addressing the handling of data through its entire lifecycle, from creation to final publication in a digital data repository to reuse in new analyses. CyVerse developers and user communities have learned many lessons that are germane to Earth and Environmental Science. We present an overview of the tools and services available through CyVerse including: interactive computing with the Discovery Environment (https://de.cyverse.org/), an interactive data science workbench featuring data storage and transfer via the Data Store; cloud computing with Atmosphere (https://atmo.cyverse.org); and access to HPC via Agave API (https://agaveapi.co/). Each CyVerse service emphasizes access to long term data storage, including our own Data Commons (http://datacommons.cyverse.org), as well as external repositories. The Data Commons service manages, organizes, preserves, publishes, allows for discovery and reuse of data. All data published to CyVerse's Curated Data receive a permanent identifier (PID) in the form of a DOI (Digital Object Identifier) or ARK (Archival Resource Key). Data that is more fluid can also be published in the Data commons through Community Collaborated data. The Data Commons provides landing pages, permanent DOIs or ARKs, and supports data reuse and citation through features such as open data licenses and downloadable citations. The ability to access and do computing on data within the CyVerse framework or with external compute resources when necessary, has proven highly beneficial to our user community, which has continuously grown since the inception of CyVerse nine years ago.
Comparing and characterizing three-dimensional point clouds derived by structure from motion photogrammetry

NASA Astrophysics Data System (ADS)

Schwind, Michael

Structure from Motion (SfM) is a photogrammetric technique whereby three-dimensional structures (3D) are estimated from overlapping two-dimensional (2D) image sequences. It is studied in the field of computer vision and utilized in fields such as archeology, engineering, and the geosciences. Currently, many SfM software packages exist that allow for the generation of 3D point clouds. Little work has been done to show how topographic data generated from these software differ over varying terrain types and why they might produce different results. This work aims to compare and characterize the differences between point clouds generated by three different SfM software packages: two well-known proprietary solutions (Pix4D, Agisoft PhotoScan) and one open source solution (OpenDroneMap). Five terrain types were imaged utilizing a DJI Phantom 3 Professional small unmanned aircraft system (sUAS). These terrain types include a marsh environment, a gently sloped sandy beach and jetties, a forested peninsula, a house, and a flat parking lot. Each set of imagery was processed with each software and then directly compared to each other. Before processing the sets of imagery, the software settings were analyzed and chosen in a manner that allowed for the most similar settings to be set across the three software types. This was done in an attempt to minimize point cloud differences caused by dissimilar settings. The characteristics of the resultant point clouds were then compared with each other. Furthermore, a terrestrial light detection and ranging (LiDAR) survey was conducted over the flat parking lot using a Riegl VZ- 400 scanner. This data served as ground truth in order to conduct an accuracy assessment of the sUAS-SfM point clouds. Differences were found between the different results, apparent not only in the characteristics of the clouds, but also the accuracy. This study allows for users of SfM photogrammetry to have a better understanding of how different processing software compare and the inherent sensitivity of SfM automation in 3D reconstruction. Because this study used mostly default settings within the software, it would be beneficial for further research to investigate the effects of changing parameters have on the fidelity of point cloud datasets generated from different SfM software packages.
Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application.

PubMed

Hanwell, Marcus D; de Jong, Wibe A; Harris, Christopher J

2017-10-30

An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform offers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It offers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing resources, and offer command-line tools to automate interaction-connecting distributed teams to this software platform on their own terms. The platform was developed openly, and all source code hosted on the GitHub platform with automated deployment possible using Ansible coupled with standard Ubuntu-based machine images deployed to cloud machines. The platform is designed to enable teams to reap the benefits of the connected web-going beyond what conventional search and analytics platforms offer in this area. It also has the goal of offering federated instances, that can be customized to the sites/research performed. Data gets stored using JSON, extending upon previous approaches using XML, building structures that support computational chemistry calculations. These structures were developed to make it easy to process data across different languages, and send data to a JavaScript-based web client.
Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application

DOE PAGES

Hanwell, Marcus D.; de Jong, Wibe A.; Harris, Christopher J.

2017-10-30

An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform offers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It offers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing resources, and offer command-line tools to automate interaction - connecting distributed teams to this software platform on their own terms. The platform was developed openly, and all source code hosted on the GitHub platformmore » with automated deployment possible using Ansible coupled with standard Ubuntu-based machine images deployed to cloud machines. The platform is designed to enable teams to reap the benefits of the connected web - going beyond what conventional search and analytics platforms offer in this area. It also has the goal of offering federated instances, that can be customized to the sites/research performed. Data gets stored using JSON, extending upon previous approaches using XML, building structures that support computational chemistry calculations. These structures were developed to make it easy to process data across different languages, and send data to a JavaScript-based web client.« less
Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hanwell, Marcus D.; de Jong, Wibe A.; Harris, Christopher J.

An end-to-end platform for chemical science research has been developed that integrates data from computational and experimental approaches through a modern web-based interface. The platform offers an interactive visualization and analytics environment that functions well on mobile, laptop and desktop devices. It offers pragmatic solutions to ensure that large and complex data sets are more accessible. Existing desktop applications/frameworks were extended to integrate with high-performance computing resources, and offer command-line tools to automate interaction - connecting distributed teams to this software platform on their own terms. The platform was developed openly, and all source code hosted on the GitHub platformmore » with automated deployment possible using Ansible coupled with standard Ubuntu-based machine images deployed to cloud machines. The platform is designed to enable teams to reap the benefits of the connected web - going beyond what conventional search and analytics platforms offer in this area. It also has the goal of offering federated instances, that can be customized to the sites/research performed. Data gets stored using JSON, extending upon previous approaches using XML, building structures that support computational chemistry calculations. These structures were developed to make it easy to process data across different languages, and send data to a JavaScript-based web client.« less
Multi-source Geospatial Data Analysis with Google Earth Engine

NASA Astrophysics Data System (ADS)

Erickson, T.

2014-12-01

The Google Earth Engine platform is a cloud computing environment for data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog is a multi-petabyte archive of georeferenced datasets that include images from Earth observing satellite and airborne sensors (examples: USGS Landsat, NASA MODIS, USDA NAIP), weather and climate datasets, and digital elevation models. Earth Engine supports both a just-in-time computation model that enables real-time preview and debugging during algorithm development for open-ended data exploration, and a batch computation mode for applying algorithms over large spatial and temporal extents. The platform automatically handles many traditionally-onerous data management tasks, such as data format conversion, reprojection, and resampling, which facilitates writing algorithms that combine data from multiple sensors and/or models. Although the primary use of Earth Engine, to date, has been the analysis of large Earth observing satellite datasets, the computational platform is generally applicable to a wide variety of use cases that require large-scale geospatial data analyses. This presentation will focus on how Earth Engine facilitates the analysis of geospatial data streams that originate from multiple separate sources (and often communities) and how it enables collaboration during algorithm development and data exploration. The talk will highlight current projects/analyses that are enabled by this functionality.https://earthengine.google.org
The Computable Catchment: An executable document for model-data software sharing, reproducibility and interactive visualization

NASA Astrophysics Data System (ADS)

Gil, Y.; Duffy, C.

2015-12-01

This paper proposes the concept of a "Computable Catchment" which is used to develop a collaborative platform for watershed modeling and data analysis. The object of the research is a sharable, executable document similar to a pdf, but one that includes documentation of the underlying theoretical concepts, interactive computational/numerical resources, linkage to essential data repositories and the ability for interactive model-data visualization and analysis. The executable document for each catchment is stored in the cloud with automatic provisioning and a unique identifier allowing collaborative model and data enhancements for historical hydroclimatic reconstruction and/or future landuse or climate change scenarios to be easily reconstructed or extended. The Computable Catchment adopts metadata standards for naming all variables in the model and the data. The a-priori or initial data is derived from national data sources for soils, hydrogeology, climate, and land cover available from the www.hydroterre.psu.edu data service (Leonard and Duffy, 2015). The executable document is based on Wolfram CDF or Computable Document Format with an interactive open-source reader accessible by any modern computing platform. The CDF file and contents can be uploaded to a website or simply shared as a normal document maintaining all interactive features of the model and data. The Computable Catchment concept represents one application for Geoscience Papers of the Future representing an extensible document that combines theory, models, data and analysis that are digitally shared, documented and reused among research collaborators, students, educators and decision makers.
Community-driven computational biology with Debian Linux

PubMed Central

2010-01-01

Background The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. Results The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Conclusions Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers. PMID:21210984
Operation of the Australian Store.Synchrotron for macromolecular crystallography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meyer, Grischa R.; Aragão, David; Mudie, Nathan J.

2014-10-01

The Store.Synchrotron service, a fully functional, cloud computing-based solution to raw X-ray data archiving and dissemination at the Australian Synchrotron, is described. The Store.Synchrotron service, a fully functional, cloud computing-based solution to raw X-ray data archiving and dissemination at the Australian Synchrotron, is described. The service automatically receives and archives raw diffraction data, related metadata and preliminary results of automated data-processing workflows. Data are able to be shared with collaborators and opened to the public. In the nine months since its deployment in August 2013, the service has handled over 22.4 TB of raw data (∼1.7 million diffraction images). Severalmore » real examples from the Australian crystallographic community are described that illustrate the advantages of the approach, which include real-time online data access and fully redundant, secure storage. Discoveries in biological sciences increasingly require multidisciplinary approaches. With this in mind, Store.Synchrotron has been developed as a component within a greater service that can combine data from other instruments at the Australian Synchrotron, as well as instruments at the Australian neutron source ANSTO. It is therefore envisaged that this will serve as a model implementation of raw data archiving and dissemination within the structural biology research community.« less
Massive stereo-based DTM production for Mars on cloud computers

NASA Astrophysics Data System (ADS)

Tao, Y.; Muller, J.-P.; Sidiropoulos, P.; Xiong, Si-Ting; Putri, A. R. D.; Walter, S. H. G.; Veitch-Michaelis, J.; Yershov, V.

2018-05-01

Digital Terrain Model (DTM) creation is essential to improving our understanding of the formation processes of the Martian surface. Although there have been previous demonstrations of open-source or commercial planetary 3D reconstruction software, planetary scientists are still struggling with creating good quality DTMs that meet their science needs, especially when there is a requirement to produce a large number of high quality DTMs using "free" software. In this paper, we describe a new open source system to overcome many of these obstacles by demonstrating results in the context of issues found from experience with several planetary DTM pipelines. We introduce a new fully automated multi-resolution DTM processing chain for NASA Mars Reconnaissance Orbiter (MRO) Context Camera (CTX) and High Resolution Imaging Science Experiment (HiRISE) stereo processing, called the Co-registration Ames Stereo Pipeline (ASP) Gotcha Optimised (CASP-GO), based on the open source NASA ASP. CASP-GO employs tie-point based multi-resolution image co-registration, and Gotcha sub-pixel refinement and densification. CASP-GO pipeline is used to produce planet-wide CTX and HiRISE DTMs that guarantee global geo-referencing compliance with respect to High Resolution Stereo Colour imaging (HRSC), and thence to the Mars Orbiter Laser Altimeter (MOLA); providing refined stereo matching completeness and accuracy. All software and good quality products introduced in this paper are being made open-source to the planetary science community through collaboration with NASA Ames, United States Geological Survey (USGS) and the Jet Propulsion Laboratory (JPL), Advanced Multi-Mission Operations System (AMMOS) Planetary Data System (PDS) Pipeline Service (APPS-PDS4), as well as browseable and visualisable through the iMars web based Geographic Information System (webGIS) system.
The Namibia Early Flood Warning System, A CEOS Pilot Project

NASA Technical Reports Server (NTRS)

Mandl, Daniel; Frye, Stuart; Cappelaere, Pat; Sohlberg, Robert; Handy, Matthew; Grossman, Robert

2012-01-01

Over the past year few years, an international collaboration has developed a pilot project under the auspices of Committee on Earth Observation Satellite (CEOS) Disasters team. The overall team consists of civilian satellite agencies. For this pilot effort, the development team consists of NASA, Canadian Space Agency, Univ. of Maryland, Univ. of Colorado, Univ. of Oklahoma, Ukraine Space Research Institute and Joint Research Center(JRC) for European Commission. This development team collaborates with regional , national and international agencies to deliver end-to-end disaster coverage. In particular, the team in collaborating on this effort with the Namibia Department of Hydrology to begin in Namibia . However, the ultimate goal is to expand the functionality to provide early warning over the South Africa region. The initial collaboration was initiated by United Nations Office of Outer Space Affairs and CEOS Working Group for Information Systems and Services (WGISS). The initial driver was to demonstrate international interoperability using various space agency sensors and models along with regional in-situ ground sensors. In 2010, the team created a preliminary semi-manual system to demonstrate moving and combining key data streams and delivering the data to the Namibia Department of Hydrology during their flood season which typically is January through April. In this pilot, a variety of moderate resolution and high resolution satellite flood imagery was rapidly delivered and used in conjunction with flood predictive models in Namibia. This was collected in conjunction with ground measurements and was used to examine how to create a customized flood early warning system. During the first year, the team made use of SensorWeb technology to gather various sensor data which was used to monitor flood waves traveling down basins originating in Angola, but eventually flooding villages in Namibia. The team made use of standardized interfaces such as those articulated under the Open Cloud Consortium (OGC) Sensor Web Enablement (SWE) set of web services was good [1][2]. However, it was discovered that in order to make a system like this functional, there were many performance issues. Data sets were large and located in a variety of location behind firewalls and had to be accessed across open networks, so security was an issue. Furthermore, the network access acted as bottleneck to transfer map products to where they are needed. Finally, during disasters, many users and computer processes act in parallel and thus it was very easy to overload the single string of computers stitched together in a virtual system that was initially developed. To address some of these performance issues, the team partnered with the Open Cloud Consortium (OCC) who supplied a Computation Cloud located at the University of Illinois at Chicago and some manpower to administer this Cloud. The Flood SensorWeb [3] system was interfaced to the Cloud to provide a high performance user interface and product development engine. Figure 1 shows the functional diagram of the Flood SensorWeb. Figure 2 shows some of the functionality of the Computation Cloud that was integrated. A significant portion of the original system was ported to the Cloud and during the past year, technical issues were resolved which included web access to the Cloud, security over the open Internet, beginning experiments on how to handle surge capacity by using the virtual machines in the cloud in parallel, using tiling techniques to render large data sets as layers on map, interfaces to allow user to customize the data processing/product chain and other performance enhancing techniques. The conclusion reached from the effort and this presentation is that defining the interoperability standards in a small fraction of the work. For example, once open web service standards were defined, many users could not make use of the standards due to security restrictions. Furthermore, once an interoperable sysm is functional, then a surge of users can render a system unusable, especially in the disaster domain.

Performing quantum computing experiments in the cloud

NASA Astrophysics Data System (ADS)

Devitt, Simon J.

2016-09-01

Quantum computing technology has reached a second renaissance in the past five years. Increased interest from both the private and public sector combined with extraordinary theoretical and experimental progress has solidified this technology as a major advancement in the 21st century. As anticipated my many, some of the first realizations of quantum computing technology has occured over the cloud, with users logging onto dedicated hardware over the classical internet. Recently, IBM has released the Quantum Experience, which allows users to access a five-qubit quantum processor. In this paper we take advantage of this online availability of actual quantum hardware and present four quantum information experiments. We utilize the IBM chip to realize protocols in quantum error correction, quantum arithmetic, quantum graph theory, and fault-tolerant quantum computation by accessing the device remotely through the cloud. While the results are subject to significant noise, the correct results are returned from the chip. This demonstrates the power of experimental groups opening up their technology to a wider audience and will hopefully allow for the next stage of development in quantum information technology.
An open, interoperable, transdisciplinary approach to a point cloud data service using OGC standards and open source software.

NASA Astrophysics Data System (ADS)

Steer, Adam; Trenham, Claire; Druken, Kelsey; Evans, Benjamin; Wyborn, Lesley

2017-04-01

High resolution point clouds and other topology-free point data sources are widely utilised for research, management and planning activities. A key goal for research and management users is making these data and common derivatives available in a way which is seamlessly interoperable with other observed and modelled data. The Australian National Computational Infrastructure (NCI) stores point data from a range of disciplines, including terrestrial and airborne LiDAR surveys, 3D photogrammetry, airborne and ground-based geophysical observations, bathymetric observations and 4D marine tracers. These data are stored alongside a significant store of Earth systems data including climate and weather, ecology, hydrology, geoscience and satellite observations, and available from NCI's National Environmental Research Data Interoperability Platform (NERDIP) [1]. Because of the NERDIP requirement for interoperability with gridded datasets, the data models required to store these data may not conform to the LAS/LAZ format - the widely accepted community standard for point data storage and transfer. The goal for NCI is making point data discoverable, accessible and useable in ways which allow seamless integration with earth observation datasets and model outputs - in turn assisting researchers and decision-makers in the often-convoluted process of handling and analyzing massive point datasets. With a use-case of providing a web data service and supporting a derived product workflow, NCI has implemented and tested a web-based point cloud service using the Open Geospatial Consortium (OGC) Web Processing Service [2] as a transaction handler between a web-based client and server-side computing tools based on a native Linux operating system. Using this model, the underlying toolset for driving a data service is flexible and can take advantage of NCI's highly scalable research cloud. Present work focusses on the Point Data Abstraction Library (PDAL) [3] as a logical choice for efficiently handling LAS/LAZ based point workflows, and native HDF5 libraries for handling point data kept in HDF5-based structures (eg NetCDF4, SPDlib [4]). Points stored in database tables (eg postgres-pointcloud [5]) will be considered as testing continues. Visualising and exploring massive point datasets in a web browser alongside multiple datasets has been demonstrated by the entwine-3D tiles project [6]. This is a powerful interface which enables users to investigate and select appropriate data, and is also being investigated as a potential front-end to a WPS-based point data service. In this work we show preliminary results for a WPS-based point data access system, in preparation for demonstration at FOSS4G 2017, Boston (http://2017.foss4g.org/) [1] http://nci.org.au/data-collections/nerdip/ [2] http://www.opengeospatial.org/standards/wps [3] http://www.pdal.io [4] http://www.spdlib.org/doku.php [5] https://github.com/pgpointcloud/pointcloud [6] http://cesium.entwine.io
Identifying Meteorological Controls on Open and Closed Mesoscale Cellular Convection Associated with Marine Cold Air Outbreaks

NASA Astrophysics Data System (ADS)

McCoy, Isabel L.; Wood, Robert; Fletcher, Jennifer K.

2017-11-01

Mesoscale cellular convective (MCC) clouds occur in large-scale patterns over the ocean and have important radiative effects on the climate system. An examination of time-varying meteorological conditions associated with satellite-observed open and closed MCC clouds is conducted to illustrate the influence of large-scale meteorological conditions. Marine cold air outbreaks (MCAO) influence the development of open MCC clouds and the transition from closed to open MCC clouds. MCC neural network classifications on Moderate Resolution Imaging Spectroradiometer (MODIS) data for 2008 are collocated with Clouds and the Earth's Radiant Energy System (CERES) data and ERA-Interim reanalysis to determine the radiative effects of MCC clouds and their thermodynamic environments. Closed MCC clouds are found to have much higher albedo on average than open MCC clouds for the same cloud fraction. Three meteorological control metrics are tested: sea-air temperature difference (ΔT), estimated inversion strength (EIS), and a MCAO index (M). These predictive metrics illustrate the importance of atmospheric surface forcing and static stability for open and closed MCC cloud formation. Predictive sigmoidal relations are found between M and MCC cloud frequency globally and regionally: negative for closed MCC cloud and positive for open MCC cloud. The open MCC cloud seasonal cycle is well correlated with M, while the seasonality of closed MCC clouds is well correlated with M in the midlatitudes and EIS in the tropics and subtropics. M is found to best distinguish open and closed MCC clouds on average over shorter time scales. The possibility of a MCC cloud feedback is discussed.
Data-Proximate Analysis and Visualization in the Cloud using Cloudstream, an Open-Source Application Streaming Technology Stack

NASA Astrophysics Data System (ADS)

Fisher, W. I.

2017-12-01

The rise in cloud computing, coupled with the growth of "Big Data", has lead to a migration away from local scientific data storage. The increasing size of remote scientific data sets increase, however, makes it difficult for scientists to subject them to large-scale analysis and visualization. These large datasets can take an inordinate amount of time to download; subsetting is a potential solution, but subsetting services are not yet ubiquitous. Data providers may also pay steep prices, as many cloud providers meter data based on how much data leaves their cloud service. The solution to this problem is a deceptively simple one; move data analysis and visualization tools to the cloud, so that scientists may perform data-proximate analysis and visualization. This results in increased transfer speeds, while egress costs are lowered or completely eliminated. Moving standard desktop analysis and visualization tools to the cloud is enabled via a technique called "Application Streaming". This technology allows a program to run entirely on a remote virtual machine while still allowing for interactivity and dynamic visualizations. When coupled with containerization technology such as Docker, we are able to easily deploy legacy analysis and visualization software to the cloud whilst retaining access via a desktop, netbook, a smartphone, or the next generation of hardware, whatever it may be. Unidata has created a Docker-based solution for easily adapting legacy software for Application Streaming. This technology stack, dubbed Cloudstream, allows desktop software to run in the cloud with little-to-no effort. The docker container is configured by editing text files, and the legacy software does not need to be modified in any way. This work will discuss the underlying technologies used by Cloudstream, and outline how to use Cloudstream to run and access an existing desktop application to the cloud.
Open Source Molecular Modeling

PubMed Central

Pirhadi, Somayeh; Sunseri, Jocelyn; Koes, David Ryan

2016-01-01

The success of molecular modeling and computational chemistry efforts are, by definition, dependent on quality software applications. Open source software development provides many advantages to users of modeling applications, not the least of which is that the software is free and completely extendable. In this review we categorize, enumerate, and describe available open source software packages for molecular modeling and computational chemistry. PMID:27631126
Cloud Offload in Hostile Environments

DTIC Science & Technology

2011-12-01

of recognized objects in an input image. FACE: Windows XP C++ application based on the OpenCV library [45]. It returns the coordinates and identities...SOLDIER. Energy-Efficient Technolo- gies for the Dismounted Soldier”. National Research Council, 1997. [16] COMMITTEE ON SOLDIER POWER/ENERGY SYSTEMS...vol. 4658 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2007. [45] OPENCV . OpenCV Wiki. http://opencv.willowgarage.com/wiki/. [46
Detecting Distributed SQL Injection Attacks in a Eucalyptus Cloud Environment

NASA Technical Reports Server (NTRS)

Kebert, Alan; Barnejee, Bikramjit; Solano, Juan; Solano, Wanda

2013-01-01

The cloud computing environment offers malicious users the ability to spawn multiple instances of cloud nodes that are similar to virtual machines, except that they can have separate external IP addresses. In this paper we demonstrate how this ability can be exploited by an attacker to distribute his/her attack, in particular SQL injection attacks, in such a way that an intrusion detection system (IDS) could fail to identify this attack. To demonstrate this, we set up a small private cloud, established a vulnerable website in one instance, and placed an IDS within the cloud to monitor the network traffic. We found that an attacker could quite easily defeat the IDS by periodically altering its IP address. To detect such an attacker, we propose to use multi-agent plan recognition, where the multiple source IPs are considered as different agents who are mounting a collaborative attack. We show that such a formulation of this problem yields a more sophisticated approach to detecting SQL injection attacks within a cloud computing environment.
Prototype methodology for obtaining cloud seeding guidance from HRRR model data

NASA Astrophysics Data System (ADS)

Dawson, N.; Blestrud, D.; Kunkel, M. L.; Waller, B.; Ceratto, J.

2017-12-01

Weather model data, along with real time observations, are critical to determine whether atmospheric conditions are prime for super-cooled liquid water during cloud seeding operations. Cloud seeding groups can either use operational forecast models, or run their own model on a computer cluster. A custom weather model provides the most flexibility, but is also expensive. For programs with smaller budgets, openly-available operational forecasting models are the de facto method for obtaining forecast data. The new High-Resolution Rapid Refresh (HRRR) model (3 x 3 km grid size), developed by the Earth System Research Laboratory (ESRL), provides hourly model runs with 18 forecast hours per run. While the model cannot be fine-tuned for a specific area or edited to provide cloud-seeding-specific output, model output is openly available on a near-real-time basis. This presentation focuses on a prototype methodology for using HRRR model data to create maps which aid in near-real-time cloud seeding decision making. The R programming language is utilized to run a script on a Windows® desktop/laptop computer either on a schedule (such as every half hour) or manually. The latest HRRR model run is downloaded from NOAA's Operational Model Archive and Distribution System (NOMADS). A GRIB-filter service, provided by NOMADS, is used to obtain surface and mandatory pressure level data for a subset domain which greatly cuts down on the amount of data transfer. Then, a set of criteria, identified by the Idaho Power Atmospheric Science Group, is used to create guidance maps. These criteria include atmospheric stability (lapse rates), dew point depression, air temperature, and wet bulb temperature. The maps highlight potential areas where super-cooled liquid water may exist, reasons as to why cloud seeding should not be attempted, and wind speed at flight level.
Arctic sea ice melt leads to atmospheric new particle formation.

PubMed

Dall Osto, M; Beddows, D C S; Tunved, P; Krejci, R; Ström, J; Hansson, H-C; Yoon, Y J; Park, Ki-Tae; Becagli, S; Udisti, R; Onasch, T; O Dowd, C D; Simó, R; Harrison, Roy M

2017-06-12

Atmospheric new particle formation (NPF) and growth significantly influences climate by supplying new seeds for cloud condensation and brightness. Currently, there is a lack of understanding of whether and how marine biota emissions affect aerosol-cloud-climate interactions in the Arctic. Here, the aerosol population was categorised via cluster analysis of aerosol size distributions taken at Mt Zeppelin (Svalbard) during a 11 year record. The daily temporal occurrence of NPF events likely caused by nucleation in the polar marine boundary layer was quantified annually as 18%, with a peak of 51% during summer months. Air mass trajectory analysis and atmospheric nitrogen and sulphur tracers link these frequent nucleation events to biogenic precursors released by open water and melting sea ice regions. The occurrence of such events across a full decade was anti-correlated with sea ice extent. New particles originating from open water and open pack ice increased the cloud condensation nuclei concentration background by at least ca. 20%, supporting a marine biosphere-climate link through sea ice melt and low altitude clouds that may have contributed to accelerate Arctic warming. Our results prompt a better representation of biogenic aerosol sources in Arctic climate models.
Cloud Computing Fundamentals

NASA Astrophysics Data System (ADS)

Furht, Borko

In the introductory chapter we define the concept of cloud computing and cloud services, and we introduce layers and types of cloud computing. We discuss the differences between cloud computing and cloud services. New technologies that enabled cloud computing are presented next. We also discuss cloud computing features, standards, and security issues. We introduce the key cloud computing platforms, their vendors, and their offerings. We discuss cloud computing challenges and the future of cloud computing.
Increasing the value of geospatial informatics with open approaches for Big Data

NASA Astrophysics Data System (ADS)

Percivall, G.; Bermudez, L. E.

2017-12-01

Open approaches to big data provide geoscientists with new capabilities to address problems of unmatched size and complexity. Consensus approaches for Big Geo Data have been addressed in multiple international workshops and testbeds organized by the Open Geospatial Consortium (OGC) in the past year. Participants came from government (NASA, ESA, USGS, NOAA, DOE); research (ORNL, NCSA, IU, JPL, CRIM, RENCI); industry (ESRI, Digital Globe, IBM, rasdaman); standards (JTC 1/NIST); and open source software communities. Results from the workshops and testbeds are documented in Testbed reports and a White Paper published by the OGC. The White Paper identifies the following set of use cases: Collection and Ingest: Remote sensed data processing; Data stream processing Prepare and Structure: SQL and NoSQL databases; Data linking; Feature identification Analytics and Visualization: Spatial-temporal analytics; Machine Learning; Data Exploration Modeling and Prediction: Integrated environmental models; Urban 4D models. Open implementations were developed in the Arctic Spatial Data Pilot using Discrete Global Grid Systems (DGGS) and in Testbeds using WPS and ESGF to publish climate predictions. Further development activities to advance open implementations of Big Geo Data include the following: Open Cloud Computing: Avoid vendor lock-in through API interoperability and Application portability. Open Source Extensions: Implement geospatial data representations in projects from Apache, Location Tech, and OSGeo. Investigate parallelization strategies for N-Dimensional spatial data. Geospatial Data Representations: Schemas to improve processing and analysis using geospatial concepts: Features, Coverages, DGGS. Use geospatial encodings like NetCDF and GeoPackge. Big Linked Geodata: Use linked data methods scaled to big geodata. Analysis Ready Data: Support "Download as last resort" and "Analytics as a service". Promote elements common to "datacubes."
Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data.

PubMed

Dinov, Ivo D

2016-01-01

Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be 'team science'.
OpenFDA: an innovative platform providing access to a wealth of FDA's publicly available data.

PubMed

Kass-Hout, Taha A; Xu, Zhiheng; Mohebbi, Matthew; Nelsen, Hans; Baker, Adam; Levine, Jonathan; Johanson, Elaine; Bright, Roselie A

2016-05-01

The objective of openFDA is to facilitate access and use of big important Food and Drug Administration public datasets by developers, researchers, and the public through harmonization of data across disparate FDA datasets provided via application programming interfaces (APIs). Using cutting-edge technologies deployed on FDA's new public cloud computing infrastructure, openFDA provides open data for easier, faster (over 300 requests per second per process), and better access to FDA datasets; open source code and documentation shared on GitHub for open community contributions of examples, apps and ideas; and infrastructure that can be adopted for other public health big data challenges. Since its launch on June 2, 2014, openFDA has developed four APIs for drug and device adverse events, recall information for all FDA-regulated products, and drug labeling. There have been more than 20 million API calls (more than half from outside the United States), 6000 registered users, 20,000 connected Internet Protocol addresses, and dozens of new software (mobile or web) apps developed. A case study demonstrates a use of openFDA data to understand an apparent association of a drug with an adverse event. With easier and faster access to these datasets, consumers worldwide can learn more about FDA-regulated products. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
OpenFDA: an innovative platform providing access to a wealth of FDA’s publicly available data

PubMed Central

Kass-Hout, Taha A; Mohebbi, Matthew; Nelsen, Hans; Baker, Adam; Levine, Jonathan; Johanson, Elaine; Bright, Roselie A

2016-01-01

Objective The objective of openFDA is to facilitate access and use of big important Food and Drug Administration public datasets by developers, researchers, and the public through harmonization of data across disparate FDA datasets provided via application programming interfaces (APIs). Materials and Methods Using cutting-edge technologies deployed on FDA’s new public cloud computing infrastructure, openFDA provides open data for easier, faster (over 300 requests per second per process), and better access to FDA datasets; open source code and documentation shared on GitHub for open community contributions of examples, apps and ideas; and infrastructure that can be adopted for other public health big data challenges. Results Since its launch on June 2, 2014, openFDA has developed four APIs for drug and device adverse events, recall information for all FDA-regulated products, and drug labeling. There have been more than 20 million API calls (more than half from outside the United States), 6000 registered users, 20,000 connected Internet Protocol addresses, and dozens of new software (mobile or web) apps developed. A case study demonstrates a use of openFDA data to understand an apparent association of a drug with an adverse event. Conclusion With easier and faster access to these datasets, consumers worldwide can learn more about FDA-regulated products. PMID:26644398
Performance Comparison of Big Data Analytics With NEXUS and Giovanni

NASA Astrophysics Data System (ADS)

Jacob, J. C.; Huang, T.; Lynnes, C.

2016-12-01

NEXUS is an emerging data-intensive analysis framework developed with a new approach for handling science data that enables large-scale data analysis. It is available through open source. We compare performance of NEXUS and Giovanni for 3 statistics algorithms applied to NASA datasets. Giovanni is a statistics web service at NASA Distributed Active Archive Centers (DAACs). NEXUS is a cloud-computing environment developed at JPL and built on Apache Solr, Cassandra, and Spark. We compute global time-averaged map, correlation map, and area-averaged time series. The first two algorithms average over time to produce a value for each pixel in a 2-D map. The third algorithm averages spatially to produce a single value for each time step. This talk is our report on benchmark comparison findings that indicate 15x speedup with NEXUS over Giovanni to compute area-averaged time series of daily precipitation rate for the Tropical Rainfall Measuring Mission (TRMM with 0.25 degree spatial resolution) for the Continental United States over 14 years (2000-2014) with 64-way parallelism and 545 tiles per granule. 16-way parallelism with 16 tiles per granule worked best with NEXUS for computing an 18-year (1998-2015) TRMM daily precipitation global time averaged map (2.5 times speedup) and 18-year global map of correlation between TRMM daily precipitation and TRMM real time daily precipitation (7x speedup). These and other benchmark results will be presented along with key lessons learned in applying the NEXUS tiling approach to big data analytics in the cloud.
Cloudy Skies over AGN: Observations with Simbol-X

NASA Astrophysics Data System (ADS)

Salvati, M.; Risaliti, G.

2009-05-01

Recent time-resolved spectroscopic X-ray studies of bright obscured AGN show that column density variability on time scales of hours/days may be common, at least for sources with NH>1023 cm-2. This opens new oppurtunities in the analysis of the structure of the circumnuclear medium and of the X-ray source: resolving the variations due to single clouds covering/uncovering the X-ray source provides tight constraints on the source size, the clouds' size and distance, and their average number, density and column density. We show how Simbol-X will provide a breakthrough in this field, thanks to its broad band coverage, allowing (a) to precisely disentangle the continuum and NH variations, and (2) to extend the NH variability analysis to column densities >1023 cm-2.
Elastic Extension of a CMS Computing Centre Resources on External Clouds

NASA Astrophysics Data System (ADS)

Codispoti, G.; Di Maria, R.; Aiftimiei, C.; Bonacorsi, D.; Calligola, P.; Ciaschini, V.; Costantini, A.; Dal Pra, S.; DeGirolamo, D.; Grandi, C.; Michelotto, D.; Panella, M.; Peco, G.; Sapunenko, V.; Sgaravatto, M.; Taneja, S.; Zizzi, G.

2016-10-01

After the successful LHC data taking in Run-I and in view of the future runs, the LHC experiments are facing new challenges in the design and operation of the computing facilities. The computing infrastructure for Run-II is dimensioned to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however originate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, CMS - along the lines followed by other LHC experiments - is exploring the opportunity to access Cloud resources provided by external partners or commercial providers. Specific use cases have already been explored and successfully exploited during Long Shutdown 1 (LS1) and the first part of Run 2. In this work we present the proof of concept of the elastic extension of a CMS site, specifically the Bologna Tier-3, on an external OpenStack infrastructure. We focus on the “Cloud Bursting” of a CMS Grid site using a newly designed LSF configuration that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on the OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time they serve as an extension of the farm for the local usage. The amount of resources allocated thus can be elastically modeled to cope up with the needs of CMS experiment and local users. Moreover, a direct access/integration of OpenStack resources to the CMS workload management system is explored. In this paper we present this approach, we report on the performances of the on-demand allocated resources, and we discuss the lessons learned and the next steps.
Expeditionary Oblong Mezzanine

DTIC Science & Technology

2016-03-01

Operating System OSI Open Systems Interconnection OS X Operating System Ten PDU Power Distribution Unit POE Power Over Ethernet xvii SAAS ...providing infrastructure as a service (IaaS) and software as a service ( SaaS ) cloud computing technologies. IaaS is a way of providing computing services...such as servers, storage, and network equipment services (Mell & Grance, 2009). SaaS is a means of providing software and applications as an on
Remote Numerical Simulations of the Interaction of High Velocity Clouds with Random Magnetic Fields

NASA Astrophysics Data System (ADS)

Santillan, Alfredo; Hernandez--Cervantes, Liliana; Gonzalez--Ponce, Alejandro; Kim, Jongsoo

The numerical simulations associated with the interaction of High Velocity Clouds (HVC) with the Magnetized Galactic Interstellar Medium (ISM) are a powerful tool to describe the evolution of the interaction of these objects in our Galaxy. In this work we present a new project referred to as Theoretical Virtual i Observatories. It is oriented toward to perform numerical simulations in real time through a Web page. This is a powerful astrophysical computational tool that consists of an intuitive graphical user interface (GUI) and a database produced by numerical calculations. In this Website the user can make use of the existing numerical simulations from the database or run a new simulation introducing initial conditions such as temperatures, densities, velocities, and magnetic field intensities for both the ISM and HVC. The prototype is programmed using Linux, Apache, MySQL, and PHP (LAMP), based on the open source philosophy. All simulations were performed with the MHD code ZEUS-3D, which solves the ideal MHD equations by finite differences on a fixed Eulerian mesh. Finally, we present typical results that can be obtained with this tool.
The Human Toxome Collaboratorium: A Shared Environment for Multi-Omic Computational Collaboration within a Consortium.

PubMed

Fasani, Rick A; Livi, Carolina B; Choudhury, Dipanwita R; Kleensang, Andre; Bouhifd, Mounir; Pendse, Salil N; McMullen, Patrick D; Andersen, Melvin E; Hartung, Thomas; Rosenberg, Michael

2015-01-01

The Human Toxome Project is part of a long-term vision to modernize toxicity testing for the 21st century. In the initial phase of the project, a consortium of six academic, commercial, and government organizations has partnered to map pathways of toxicity, using endocrine disruption as a model hazard. Experimental data is generated at multiple sites, and analyzed using a range of computational tools. While effectively gathering, managing, and analyzing the data for high-content experiments is a challenge in its own right, doing so for a growing number of -omics technologies, with larger data sets, across multiple institutions complicates the process. Interestingly, one of the most difficult, ongoing challenges has been the computational collaboration between the geographically separate institutions. Existing solutions cannot handle the growing heterogeneous data, provide a computational environment for consistent analysis, accommodate different workflows, and adapt to the constantly evolving methods and goals of a research project. To meet the needs of the project, we have created and managed The Human Toxome Collaboratorium, a shared computational environment hosted on third-party cloud services. The Collaboratorium provides a familiar virtual desktop, with a mix of commercial, open-source, and custom-built applications. It shares some of the challenges of traditional information technology, but with unique and unexpected constraints that emerge from the cloud. Here we describe the problems we faced, the current architecture of the solution, an example of its use, the major lessons we learned, and the future potential of the concept. In particular, the Collaboratorium represents a novel distribution method that could increase the reproducibility and reusability of results from similar large, multi-omic studies.

A Development of Lightweight Grid Interface

NASA Astrophysics Data System (ADS)

Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.

2011-12-01

In order to help a rapid development of Grid/Cloud aware applications, we have developed API to abstract the distributed computing infrastructures based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications to access distributed computing infrastructures, such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), which is a set of command line interfaces (CLI) and APIs, aims to offer simpler API to combine several SAGA interfaces with richer functionalities. These CLIs of the UGAPI offer typical functionalities required by end users for job management and file access to the different distributed computing infrastructures as well as local computing resources. We have also built a web interface for the particle therapy simulation and demonstrated the large scale calculation using the different infrastructures at the same time. In this paper, we would like to present how the web interface based on UGAPI and SAGA achieve more efficient utilization of computing resources over the different infrastructures with technical details and practical experiences.
Open-source meteor detection software for low-cost single-board computers

NASA Astrophysics Data System (ADS)

Vida, D.; Zubović, D.; Šegon, D.; Gural, P.; Cupec, R.

2016-01-01

This work aims to overcome the current price threshold of meteor stations which can sometimes deter meteor enthusiasts from owning one. In recent years small card-sized computers became widely available and are used for numerous applications. To utilize such computers for meteor work, software which can run on them is needed. In this paper we present a detailed description of newly-developed open-source software for fireball and meteor detection optimized for running on low-cost single board computers. Furthermore, an update on the development of automated open-source software which will handle video capture, fireball and meteor detection, astrometry and photometry is given.
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

PubMed Central

Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

2015-01-01

Background Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation. PMID:26501966
Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.

PubMed

Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

2015-01-01

Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
Considerations for Software Defined Networking (SDN): Approaches and use cases

NASA Astrophysics Data System (ADS)

Bakshi, K.

Software Defined Networking (SDN) is an evolutionary approach to network design and functionality based on the ability to programmatically modify the behavior of network devices. SDN uses user-customizable and configurable software that's independent of hardware to enable networked systems to expand data flow control. SDN is in large part about understanding and managing a network as a unified abstraction. It will make networks more flexible, dynamic, and cost-efficient, while greatly simplifying operational complexity. And this advanced solution provides several benefits including network and service customizability, configurability, improved operations, and increased performance. There are several approaches to SDN and its practical implementation. Among them, two have risen to prominence with differences in pedigree and implementation. This paper's main focus will be to define, review, and evaluate salient approaches and use cases of the OpenFlow and Virtual Network Overlay approaches to SDN. OpenFlow is a communication protocol that gives access to the forwarding plane of a network's switches and routers. The Virtual Network Overlay relies on a completely virtualized network infrastructure and services to abstract the underlying physical network, which allows the overlay to be mobile to other physical networks. This is an important requirement for cloud computing, where applications and associated network services are migrated to cloud service providers and remote data centers on the fly as resource demands dictate. The paper will discuss how and where SDN can be applied and implemented, including research and academia, virtual multitenant data center, and cloud computing applications. Specific attention will be given to the cloud computing use case, where automated provisioning and programmable overlay for scalable multi-tenancy is leveraged via the SDN approach.
Large-Scale, Multi-Sensor Atmospheric Data Fusion Using Hybrid Cloud Computing

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

2015-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, MODIS, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. HySDS is a Hybrid-Cloud Science Data System that has been developed and applied under NASA AIST, MEaSUREs, and ACCESS grants. HySDS uses the SciFlow workflow engine to partition analysis workflows into parallel tasks (e.g. segmenting by time or space) that are pushed into a durable job queue. The tasks are "pulled" from the queue by worker Virtual Machines (VM's) and executed in an on-premise Cloud (Eucalyptus or OpenStack) or at Amazon in the public Cloud or govCloud. In this way, years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the transferred data. We are using HySDS to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a MEASURES grant. We will present the architecture of HySDS, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. Our system demonstrates how one can pull A-Train variables (Levels 2 & 3) on-demand into the Amazon Cloud, and cache only those variables that are heavily used, so that any number of compute jobs can be executed "near" the multi-sensor data. Decade-long, multi-sensor studies can be performed without pre-staging data, with the researcher paying only his own Cloud compute bill.
Transport of infrared radiation in cuboidal clouds

NASA Technical Reports Server (NTRS)

HARSHVARDHAN; Weinman, J. A.; Davies, R.

1981-01-01

The transport of infrared radiation in a single cuboidal cloud using a vertical two steam approximation was modeled. The emittance of the top face of the model cloud is always less than that for a plane parallel cloud of the same optical depth. The hemisphere flux escaping from the cloud top has a gradient from the center to the edges which brighten when the cloud is over warmer ground. Cooling rate calculations in the 8 to 13.6 micrometer region show that there is cooling from the sides of the cloud at all levels even when there is heating of the core from the ground below. The radiances exiting from model cuboidal clouds were computed by path integration over the source function obtained with the two stream approximation. It is suggested that the brightness temperature measured from finite clouds will overestimate the cloud top temperature.
Cloud storage based mobile assessment facility for patients with post-traumatic stress disorder using integrated signal processing algorithm

NASA Astrophysics Data System (ADS)

Balbin, Jessie R.; Pinugu, Jasmine Nadja J.; Basco, Abigail Joy S.; Cabanada, Myla B.; Gonzales, Patrisha Melrose V.; Marasigan, Juan Carlos C.

2017-06-01

The research aims to build a tool in assessing patients for post-traumatic stress disorder or PTSD. The parameters used are heart rate, skin conductivity, and facial gestures. Facial gestures are recorded using OpenFace, an open-source face recognition program that uses facial action units in to track facial movements. Heart rate and skin conductivity is measured through sensors operated using Raspberry Pi. Results are stored in a database for easy and quick access. Databases to be used are uploaded to a cloud platform so that doctors have direct access to the data. This research aims to analyze these parameters and give accurate assessment of the patient.
Calibration and Field Deployment of the NSF G-V VCSEL Hygrometer

NASA Astrophysics Data System (ADS)

DiGangi, J. P.; O'Brien, A.; Diao, M.; Hamm, C.; Zhang, Q.; Beaton, S. P.; Zondlo, M. A.

2012-12-01

Cloud formation and dynamics have a significant influence on the Earth's radiative forcing budget, which illustrates the importance of clouds with respect to global climate. Therefore, an accurate understanding of the microscale processes dictating cloud formation is crucial for accurate computer modeling of global climate change. A critical tool for understanding these processes from an airborne platform is an instrument capable of measuring water vapor with both high accuracy and time, thus spatial, resolution. Our work focuses on an open-path, compact, vertical-cavity surface-emitting laser (VCSEL) absorption-based hygrometer, capable of 25 Hz temporal resolution, deployed on the NSF/NCAR Gulfstream-V aircraft platform. The open path nature of our instrument also helps to minimize sampling artifacts. We will discuss our efforts toward achieving within 5% accuracy over 5 orders of magnitude of water vapor concentrations. This involves an intercomparison of five independent calibration methods: ice surface saturators using an oil temperature bath, solvent slush baths (e.g. chloroform/LN2, water/ice), a research-grade frost point hygrometer, static pressure experiments, and Pt catalyzed hydrogen gas. This wide variety of available tools allows us to accurately constrain the calibrant water vapor concentrations both before and after the VCSEL hygrometer sampling chamber. For example, the mixing ratio as measured by research-grade frost point hygrometer after the VCSEL hygrometer agreed within 2% of the mixing ration expected from the water/ice bubbler source before the VCSEL over the temperature range -50°C to 20°C. Finally, due to the compact nature of our instrument, we are able to perform these calibrations simultaneously at the same temperatures (-80°C to 30°C) and pressures (150 mbar to 760 mbar) as sampled ambient air during a flight. This higher accuracy can significantly influence the science utilizing this data, which we will illustrate using preliminary data from our most recent field deployment, the NSF Deep Convective Clouds and Chemistry Experiment in May-June 2012
Cloud based, Open Source Software Application for Mitigating Herbicide Drift

NASA Astrophysics Data System (ADS)

Saraswat, D.; Scott, B.

2014-12-01

The spread of herbicide resistant weeds has resulted in the need for clearly marked fields. In response to this need, the University of Arkansas Cooperative Extension Service launched a program named Flag the Technology in 2011. This program uses color-coded flags as a visual alert of the herbicide trait technology within a farm field. The flag based program also serves to help avoid herbicide misapplication and prevent herbicide drift damage between fields with differing crop technologies. This program has been endorsed by Southern Weed Science Society of America and is attracting interest from across the USA, Canada, and Australia. However, flags have risk of misplacement or disappearance due to mischief or severe windstorms/thunderstorms, respectively. This presentation will discuss the design and development of a cloud-based, free application utilizing open-source technologies, called Flag the Technology Cloud (FTTCloud), for allowing agricultural stakeholders to color code their farm fields for indicating herbicide resistant technologies. The developed software utilizes modern web development practices, widely used design technologies, and basic geographic information system (GIS) based interactive interfaces for representing, color-coding, searching, and visualizing fields. This program has also been made compatible for a wider usability on different size devices- smartphones, tablets, desktops and laptops.
Optical properties of potential condensates in exoplanetary atmospheres

NASA Astrophysics Data System (ADS)

Kitzmann, Daniel; Heng, Kevin

2018-03-01

The prevalence of clouds in currently observable exoplanetary atmospheres motivates the compilation and calculation of their optical properties. First, we present a new open-source Mie scattering code known as LX-MIE, which is able to consider large-size parameters (˜107) using a single computational treatment. We validate LX-MIE against the classical MIEVO code as well as previous studies. Secondly, we embark on an expanded survey of the published literature for both the real and imaginary components of the refractive indices of 32 condensate species. As much as possible, we rely on experimental measurements of the refractive indices and resort to obtaining the real from the imaginary component (or vice versa), via the Kramers-Kronig relation, only in the absence of data. We use these refractive indices as input for LX-MIE to compute the absorption, scattering and extinction efficiencies of all 32 condensate species. Finally, we use a three-parameter function to provide convenient fits to the shape of the extinction efficiency curve. We show that the errors associated with these simple fits in the Wide Field Camera 3 (WFC3), J, H, and K wavebands are ˜ 10 per cent. These fits allow for the extinction cross-section or opacity of the condensate species to be easily included in retrieval analyses of transmission spectra. We discuss prospects for future experimental work. The compilation of the optical constants and LX-MIE is publicly available as part of the open-source Exoclime Simulation Platform (http://www.exoclime.org).
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

PubMed Central

Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Templet Web: the use of volunteer computing approach in PaaS-style cloud

NASA Astrophysics Data System (ADS)

Vostokin, Sergei; Artamonov, Yuriy; Tsarev, Daniil

2018-03-01

This article presents the Templet Web cloud service. The service is designed for high-performance scientific computing automation. The use of high-performance technology is specifically required by new fields of computational science such as data mining, artificial intelligence, machine learning, and others. Cloud technologies provide a significant cost reduction for high-performance scientific applications. The main objectives to achieve this cost reduction in the Templet Web service design are: (a) the implementation of "on-demand" access; (b) source code deployment management; (c) high-performance computing programs development automation. The distinctive feature of the service is the approach mainly used in the field of volunteer computing, when a person who has access to a computer system delegates his access rights to the requesting user. We developed an access procedure, algorithms, and software for utilization of free computational resources of the academic cluster system in line with the methods of volunteer computing. The Templet Web service has been in operation for five years. It has been successfully used for conducting laboratory workshops and solving research problems, some of which are considered in this article. The article also provides an overview of research directions related to service development.
SU-D-BRD-02: A Web-Based Image Processing and Plan Evaluation Platform (WIPPEP) for Future Cloud-Based Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chai, X; Liu, L; Xing, L

Purpose: Visualization and processing of medical images and radiation treatment plan evaluation have traditionally been constrained to local workstations with limited computation power and ability of data sharing and software update. We present a web-based image processing and planning evaluation platform (WIPPEP) for radiotherapy applications with high efficiency, ubiquitous web access, and real-time data sharing. Methods: This software platform consists of three parts: web server, image server and computation server. Each independent server communicates with each other through HTTP requests. The web server is the key component that provides visualizations and user interface through front-end web browsers and relay informationmore » to the backend to process user requests. The image server serves as a PACS system. The computation server performs the actual image processing and dose calculation. The web server backend is developed using Java Servlets and the frontend is developed using HTML5, Javascript, and jQuery. The image server is based on open source DCME4CHEE PACS system. The computation server can be written in any programming language as long as it can send/receive HTTP requests. Our computation server was implemented in Delphi, Python and PHP, which can process data directly or via a C++ program DLL. Results: This software platform is running on a 32-core CPU server virtually hosting the web server, image server, and computation servers separately. Users can visit our internal website with Chrome browser, select a specific patient, visualize image and RT structures belonging to this patient and perform image segmentation running Delphi computation server and Monte Carlo dose calculation on Python or PHP computation server. Conclusion: We have developed a webbased image processing and plan evaluation platform prototype for radiotherapy. This system has clearly demonstrated the feasibility of performing image processing and plan evaluation platform through a web browser and exhibited potential for future cloud based radiotherapy.« less
QMachine: commodity supercomputing in web browsers.

PubMed

Wilkinson, Sean R; Almeida, Jonas S

2014-06-09

Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics' "Big Data" from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running "download and install" software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments.
Efficient Open Source Lidar for Desktop Users

NASA Astrophysics Data System (ADS)

Flanagan, Jacob P.

Lidar --- Light Detection and Ranging --- is a remote sensing technology that utilizes a device similar to a rangefinder to determine a distance to a target. A laser pulse is shot at an object and the time it takes for the pulse to return in measured. The distance to the object is easily calculated using the speed property of light. For lidar, this laser is moved (primarily in a rotational movement usually accompanied by a translational movement) and records the distances to objects several thousands of times per second. From this, a 3 dimensional structure can be procured in the form of a point cloud. A point cloud is a collection of 3 dimensional points with at least an x, a y and a z attribute. These 3 attributes represent the position of a single point in 3 dimensional space. Other attributes can be associated with the points that include properties such as the intensity of the return pulse, the color of the target or even the time the point was recorded. Another very useful, post processed attribute is point classification where a point is associated with the type of object the point represents (i.e. ground.). Lidar has gained popularity and advancements in the technology has made its collection easier and cheaper creating larger and denser datasets. The need to handle this data in a more efficiently manner has become a necessity; The processing, visualizing or even simply loading lidar can be computationally intensive due to its very large size. Standard remote sensing and geographical information systems (GIS) software (ENVI, ArcGIS, etc.) was not originally built for optimized point cloud processing and its implementation is an afterthought and therefore inefficient. Newer, more optimized software for point cloud processing (QTModeler, TopoDOT, etc.) usually lack more advanced processing tools, requires higher end computers and are very costly. Existing open source lidar approaches the loading and processing of lidar in an iterative fashion that requires implementing batch coding and processing time that could take months for a standard lidar dataset. This project attempts to build a software with the best approach for creating, importing and exporting, manipulating and processing lidar, especially in the environmental field. Development of this software is described in 3 sections - (1) explanation of the search methods for efficiently extracting the "area of interest" (AOI) data from disk (file space), (2) using file space (for storage), budgeting memory space (for efficient processing) and moving between the two, and (3) method development for creating lidar products (usually raster based) used in environmental modeling and analysis (i.e.: hydrology feature extraction, geomorphological studies, ecology modeling, etc.).
Status and Roadmap of CernVM

NASA Astrophysics Data System (ADS)

Berzano, D.; Blomer, J.; Buncic, P.; Charalampidis, I.; Ganis, G.; Meusel, R.

2015-12-01

Cloud resources nowadays contribute an essential share of resources for computing in high-energy physics. Such resources can be either provided by private or public IaaS clouds (e.g. OpenStack, Amazon EC2, Google Compute Engine) or by volunteers computers (e.g. LHC@Home 2.0). In any case, experiments need to prepare a virtual machine image that provides the execution environment for the physics application at hand. The CernVM virtual machine since version 3 is a minimal and versatile virtual machine image capable of booting different operating systems. The virtual machine image is less than 20 megabyte in size. The actual operating system is delivered on demand by the CernVM File System. CernVM 3 has matured from a prototype to a production environment. It is used, for instance, to run LHC applications in the cloud, to tune event generators using a network of volunteer computers, and as a container for the historic Scientific Linux 5 and Scientific Linux 4 based software environments in the course of long-term data preservation efforts of the ALICE, CMS, and ALEPH experiments. We present experience and lessons learned from the use of CernVM at scale. We also provide an outlook on the upcoming developments. These developments include adding support for Scientific Linux 7, the use of container virtualization, such as provided by Docker, and the streamlining of virtual machine contextualization towards the cloud-init industry standard.
Construction of a Digital Learning Environment Based on Cloud Computing

ERIC Educational Resources Information Center

Ding, Jihong; Xiong, Caiping; Liu, Huazhong

2015-01-01

Constructing the digital learning environment for ubiquitous learning and asynchronous distributed learning has opened up immense amounts of concrete research. However, current digital learning environments do not fully fulfill the expectations on supporting interactive group learning, shared understanding and social construction of knowledge.…
The EarthServer Federation: State, Role, and Contribution to GEOSS

NASA Astrophysics Data System (ADS)

Merticariu, Vlad; Baumann, Peter

2016-04-01

The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Its service interface being rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a series of clients can dock into the services, ranging from open-source OpenLayers and QGIS over open-source NASA WorldWind to proprietary ESRI ArcGIS. Datacube fusion in a "mix and match" style is supported by the platform technolgy, the rasdaman Array Database System, which transparently federates queries so that users simply approach any node of the federation to access any data item, internally optimized for minimal data transfer. Notably, rasdaman is part of GEOSS GCI. NASA is contributing its Web WorldWind virtual globe for user-friendly data extraction, navigation, and analysis. Integrated datacube / metadata queries are contributed by CITE. Current federation members include ESA (managed by MEEO sr.l.), Plymouth Marine Laboratory (PML), the European Centre for Medium-Range Weather Forecast (ECMWF), Australia's National Computational Infrastructure, and Jacobs University (adding in Planetary Science). Further data centers have expressed interest in joining. We present the EarthServer approach, discuss its underlying technology, and illustrate the contribution this datacube platform can make to GEOSS.
Dusty Donuts: Modeling the Reverberation Response of the Circumnuclear Dusty Torus Emission in AGN

NASA Astrophysics Data System (ADS)

Almeyda, Triana R.

The obscuring circumnuclear torus of dusty molecular gas is one of the major components of AGN (active galactic nuclei), yet its size, composition, and structure are not well understood. These properties can be studied by analyzing the temporal variations of the infrared (IR) dust emission from the torus in response to variations in the AGN continuum luminosity; a technique known as reverberation mapping. In a recent international campaign 12 AGN were monitored using the Spitzer Space Telescope and several ground-based telescopes, providing a unique set of well-sampled mid-IR and optical light curves which are required in order to determine the approximate sizes of the tori in these AGN. To help extract structural information contained in the data a computer model, TORMAC, has been developed that simulates the reverberation response of the clumpy torus emission. Given an input optical light curve, the code computes the emission of a 3D ensemble of dust clouds as a function of time at selected IR wavelengths, taking into account light travel delays. A large library of torus reverberation response simulations has been constructed, to investigate the effects of various geometrical and structural properties such as inclination, cloud distribution, disk half-opening angle, and radial depth. The effects of dust cloud orientation, cloud optical depth, anisotropy of the illuminating AGN radiation field, dust cloud shadowing, and cloud occultation are also explored in detail. TORMAC was also used to generate synthetic IR light curves for the Seyfert 1 galaxy, NGC 6418, using the observed optical light curve as the input, to investigate how the torus and dust cloud properties incorporated in the code affect the results obtained from reverberation mapping. This dissertation presents the most comprehensive investigation to date showing that radiative transfer effects within the torus and anisotropic illumination of the torus can strongly influence the torus IR response at different wavelengths, and should be accounted for when interpreting reverberation mapping data. TORMAC provides a powerful modeling tool that can generate simulated IR light curves for direct comparison to observations. As many types of astronomical sources are both variable and embedded in, or surrounded, by dust, TORMAC also has applications for dust reverberation studies well beyond the AGN observed in the Spitzer monitoring campaign.

Lowering the barriers to computational modeling of Earth's surface: coupling Jupyter Notebooks with Landlab, HydroShare, and CyberGIS for research and education.

NASA Astrophysics Data System (ADS)

Bandaragoda, C.; Castronova, A. M.; Phuong, J.; Istanbulluoglu, E.; Strauch, R. L.; Nudurupati, S. S.; Tarboton, D. G.; Wang, S. W.; Yin, D.; Barnhart, K. R.; Tucker, G. E.; Hutton, E.; Hobley, D. E. J.; Gasparini, N. M.; Adams, J. M.

2017-12-01

The ability to test hypotheses about hydrology, geomorphology and atmospheric processes is invaluable to research in the era of big data. Although community resources are available, there remain significant educational, logistical and time investment barriers to their use. Knowledge infrastructure is an emerging intellectual framework to understand how people are creating, sharing and distributing knowledge - which has been dramatically transformed by Internet technologies. In addition to the technical and social components in a cyberinfrastructure system, knowledge infrastructure considers educational, institutional, and open source governance components required to advance knowledge. We are designing an infrastructure environment that lowers common barriers to reproducing modeling experiments for earth surface investigation. Landlab is an open-source modeling toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for sharing hydrologic data and models. CyberGIS-Jupyter is an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using the Jupyter Notebook based on ROGER - the first cyberGIS supercomputer, so that models that can be elastically reproduced through cloud computing approaches. Our team of geomorphologists, hydrologists, and computer geoscientists has created a new infrastructure environment that combines these three pieces of software to enable knowledge discovery. Through this novel integration, any user can interactively execute and explore their shared data and model resources. Landlab on HydroShare with CyberGIS-Jupyter supports the modeling continuum from fully developed modelling applications, prototyping new science tools, hands on research demonstrations for training workshops, and classroom applications. Computational geospatial models based on big data and high performance computing can now be more efficiently developed, improved, scaled, and seamlessly reproduced among multidisciplinary users, thereby expanding the active learning curriculum and research opportunities for students in earth surface modeling and informatics.
Distributed chemical computing using ChemStar: an open source java remote method invocation architecture applied to large scale molecular data from PubChem.

PubMed

Karthikeyan, M; Krishnan, S; Pandey, Anil Kumar; Bender, Andreas; Tropsha, Alexander

2008-04-01

We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adopted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit as well as the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its "plug-in architecture". The complete source codes as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
Development of a Cloud Resolving Model for Heterogeneous Supercomputers

NASA Astrophysics Data System (ADS)

Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.

2017-12-01

A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection - such as convective storm systems. This research describes the porting effort to enable SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphical Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results, lessons learned as well as optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes wherein each node has 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, ACME-MMF component of the U.S. Department of Energy(DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of global climate model. Super-parameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME, and explore its full potential to scientifically and computationally advance climate simulation and prediction.
From One Pixel to One Earth: Building a Living Atlas in the Cloud to Analyze and Monitor Global Patterns

NASA Astrophysics Data System (ADS)

Moody, D.; Brumby, S. P.; Chartrand, R.; Franco, E.; Keisler, R.; Kelton, T.; Kontgis, C.; Mathis, M.; Raleigh, D.; Rudelis, X.; Skillman, S.; Warren, M. S.; Longbotham, N.

2016-12-01

The recent computing performance revolution has driven improvements in sensor, communication, and storage technology. Historical, multi-decadal remote sensing datasets at the petabyte scale are now available in commercial clouds, with new satellite constellations generating petabytes per year of high-resolution imagery with daily global coverage. Cloud computing and storage, combined with recent advances in machine learning and open software, are enabling understanding of the world at an unprecedented scale and detail. We have assembled all available satellite imagery from the USGS Landsat, NASA MODIS, and ESA Sentinel programs, as well as commercial PlanetScope and RapidEye imagery, and have analyzed over 2.8 quadrillion multispectral pixels. We leveraged the commercial cloud to generate a tiled, spatio-temporal mosaic of the Earth for fast iteration and development of new algorithms combining analysis techniques from remote sensing, machine learning, and scalable compute infrastructure. Our data platform enables processing at petabytes per day rates using multi-source data to produce calibrated, georeferenced imagery stacks at desired points in time and space that can be used for pixel level or global scale analysis. We demonstrate our data platform capability by using the European Space Agency's (ESA) published 2006 and 2009 GlobCover 20+ category label maps to train and test a Land Cover Land Use (LCLU) classifier, and generate current self-consistent LCLU maps in Brazil. We train a standard classifier on 2006 GlobCover categories using temporal imagery stacks, and we validate our results on co-registered 2009 Globcover LCLU maps and 2009 imagery. We then extend the derived LCLU model to current imagery stacks to generate an updated, in-season label map. Changes in LCLU labels can now be seamlessly monitored for a given location across the years in order to track, for example, cropland expansion, forest growth, and urban developments. An example of change monitoring is illustrated in the included figure showing rainfed cropland change in the Mato Grosso region of Brazil between 2006 and 2009.
Cloud-based opportunities in scientific computing: insights from processing Suomi National Polar-Orbiting Partnership (S-NPP) Direct Broadcast data

NASA Astrophysics Data System (ADS)

Evans, J. D.; Hao, W.; Chettri, S.

2013-12-01

The cloud is proving to be a uniquely promising platform for scientific computing. Our experience with processing satellite data using Amazon Web Services highlights several opportunities for enhanced performance, flexibility, and cost effectiveness in the cloud relative to traditional computing -- for example: - Direct readout from a polar-orbiting satellite such as the Suomi National Polar-Orbiting Partnership (S-NPP) requires bursts of processing a few times a day, separated by quiet periods when the satellite is out of receiving range. In the cloud, by starting and stopping virtual machines in minutes, we can marshal significant computing resources quickly when needed, but not pay for them when not needed. To take advantage of this capability, we are automating a data-driven approach to the management of cloud computing resources, in which new data availability triggers the creation of new virtual machines (of variable size and processing power) which last only until the processing workflow is complete. - 'Spot instances' are virtual machines that run as long as one's asking price is higher than the provider's variable spot price. Spot instances can greatly reduce the cost of computing -- for software systems that are engineered to withstand unpredictable interruptions in service (as occurs when a spot price exceeds the asking price). We are implementing an approach to workflow management that allows data processing workflows to resume with minimal delays after temporary spot price spikes. This will allow systems to take full advantage of variably-priced 'utility computing.' - Thanks to virtual machine images, we can easily launch multiple, identical machines differentiated only by 'user data' containing individualized instructions (e.g., to fetch particular datasets or to perform certain workflows or algorithms) This is particularly useful when (as is the case with S-NPP data) we need to launch many very similar machines to process an unpredictable number of data files concurrently. Our experience shows the viability and flexibility of this approach to workflow management for scientific data processing. - Finally, cloud computing is a promising platform for distributed volunteer ('interstitial') computing, via mechanisms such as the Berkeley Open Infrastructure for Network Computing (BOINC) popularized with the SETI@Home project and others such as ClimatePrediction.net and NASA's Climate@Home. Interstitial computing faces significant challenges as commodity computing shifts from (always on) desktop computers towards smartphones and tablets (untethered and running on scarce battery power); but cloud computing offers significant slack capacity. This capacity includes virtual machines with unused RAM or underused CPUs; virtual storage volumes allocated (& paid for) but not full; and virtual machines that are paid up for the current hour but whose work is complete. We are devising ways to facilitate the reuse of these resources (i.e., cloud-based interstitial computing) for satellite data processing and related analyses. We will present our findings and research directions on these and related topics.
McIDAS-V: A Data Analysis and Visualization Tool for Global Satellite Data

NASA Astrophysics Data System (ADS)

Achtor, T. H.; Rink, T. D.

2011-12-01

The Man-computer Interactive Data Access System (McIDAS-V) is a java-based, open-source, freely available system for scientists, researchers and algorithm developers working with atmospheric data. The McIDAS-V software tools provide powerful new data manipulation and visualization capabilities, including 4-dimensional displays, an abstract data model with integrated metadata, user defined computation, and a powerful scripting capability. As such, McIDAS-V is a valuable tool for scientists and researchers within the GEO and GOESS domains. The advancing polar and geostationary orbit environmental satellite missions conducted by several countries will carry advanced instrumentation and systems that will collect and distribute land, ocean, and atmosphere data. These systems provide atmospheric and sea surface temperatures, humidity sounding, cloud and aerosol properties, and numerous other environmental products. This presentation will display and demonstrate some of the capabilities of McIDAS-V to analyze and display high temporal and spectral resolution data using examples from international environmental satellites.
JobCenter: an open source, cross-platform, and distributed job queue management system optimized for scalability and versatility.

PubMed

Jaschob, Daniel; Riffle, Michael

2012-07-30

Laboratories engaged in computational biology or bioinformatics frequently need to run lengthy, multistep, and user-driven computational jobs. Each job can tie up a computer for a few minutes to several days, and many laboratories lack the expertise or resources to build and maintain a dedicated computer cluster. JobCenter is a client-server application and framework for job management and distributed job execution. The client and server components are both written in Java and are cross-platform and relatively easy to install. All communication with the server is client-driven, which allows worker nodes to run anywhere (even behind external firewalls or "in the cloud") and provides inherent load balancing. Adding a worker node to the worker pool is as simple as dropping the JobCenter client files onto any computer and performing basic configuration, which provides tremendous ease-of-use, flexibility, and limitless horizontal scalability. Each worker installation may be independently configured, including the types of jobs it is able to run. Executed jobs may be written in any language and may include multistep workflows. JobCenter is a versatile and scalable distributed job management system that allows laboratories to very efficiently distribute all computational work among available resources. JobCenter is freely available at http://code.google.com/p/jobcenter/.
Heterogeneous access and processing of EO-Data on a Cloud based Infrastructure delivering operational Products

NASA Astrophysics Data System (ADS)

Niggemann, F.; Appel, F.; Bach, H.; de la Mar, J.; Schirpke, B.; Dutting, K.; Rucker, G.; Leimbach, D.

2015-04-01

To address the challenges of effective data handling faced by Small and Medium Sized Enterprises (SMEs) a cloud-based infrastructure for accessing and processing of Earth Observation(EO)-data has been developed within the project APPS4GMES(www.apps4gmes.de). To gain homogenous multi mission data access an Input Data Portal (IDP) been implemented on this infrastructure. The IDP consists of an Open Geospatial Consortium (OGC) conformant catalogue, a consolidation module for format conversion and an OGC-conformant ordering framework. Metadata of various EO-sources and with different standards is harvested and transferred to an OGC conformant Earth Observation Product standard and inserted into the catalogue by a Metadata Harvester. The IDP can be accessed for search and ordering of the harvested datasets by the services implemented on the cloud infrastructure. Different land-surface services have been realised by the project partners, using the implemented IDP and cloud infrastructure. Results of these are customer ready products, as well as pre-products (e.g. atmospheric corrected EO data), serving as a basis for other services. Within the IDP an automated access to ESA's Sentinel-1 Scientific Data Hub has been implemented. Searching and downloading of the SAR data can be performed in an automated way. With the implementation of the Sentinel-1 Toolbox and own software, for processing of the datasets for further use, for example for Vista's snow monitoring, delivering input for the flood forecast services, can also be performed in an automated way. For performance tests of the cloud environment a sophisticated model based atmospheric correction and pre-classification service has been implemented. Tests conducted an automated synchronised processing of one entire Landsat 8 (LS-8) coverage for Germany and performance comparisons to standard desktop systems. Results of these tests, showing a performance improvement by the factor of six, proved the high flexibility and computing power of the cloud environment. To make full use of the cloud capabilities a possibility for automated upscaling of the hardware resources has been implemented. Together with the IDP infrastructure fast and automated processing of various satellite sources to deliver market ready products can be realised, thus increasing customer needs and numbers can be satisfied without loss of accuracy and quality.
Cloud-based calculators for fast and reliable access to NOAA's geomagnetic field models

NASA Astrophysics Data System (ADS)

Woods, A.; Nair, M. C.; Boneh, N.; Chulliat, A.

2017-12-01

While the Global Positioning System (GPS) provides accurate point locations, it does not provide pointing directions. Therefore, the absolute directional information provided by the Earth's magnetic field is of primary importance for navigation and for the pointing of technical devices such as aircrafts, satellites and lately, mobile phones. The major magnetic sources that affect compass-based navigation are the Earth's core, its magnetized crust and the electric currents in the ionosphere and magnetosphere. NOAA/CIRES Geomagnetism (ngdc.noaa.gov/geomag/) group develops and distributes models that describe all these important sources to aid navigation. Our geomagnetic models are used in variety of platforms including airplanes, ships, submarines and smartphones. While the magnetic field from Earth's core can be described in relatively fewer parameters and is suitable for offline computation, the magnetic sources from Earth's crust, ionosphere and magnetosphere require either significant computational resources or real-time capabilities and are not suitable for offline calculation. This is especially important for small navigational devices or embedded systems, where computational resources are limited. Recognizing the need for a fast and reliable access to our geomagnetic field models, we developed cloud-based application program interfaces (APIs) for NOAA's ionospheric and magnetospheric magnetic field models. In this paper we will describe the need for reliable magnetic calculators, the challenges faced in running geomagnetic field models in the cloud in real-time and the feedback from our user community. We discuss lessons learned harvesting and validating the data which powers our cloud services, as well as our strategies for maintaining near real-time service, including load-balancing, real-time monitoring, and instance cloning. We will also briefly talk about the progress we achieved on NOAA's Big Earth Data Initiative (BEDI) funded project to develop API interface to our Enhanced Magnetic Model (EMM).
Large Spatial Scale Ground Displacement Mapping through the P-SBAS Processing of Sentinel-1 Data on a Cloud Computing Environment

NASA Astrophysics Data System (ADS)

Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.

2017-12-01

Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role on SAR data availability and dissemination all over the World. Indeed, the free and open access data policy adopted by the European Copernicus program together with the global coverage acquisition strategy, make the Sentinel constellation as a game changer in the Earth Observation scenario. Being the SAR data become ubiquitous, the technological and scientific challenge is focused on maximizing the exploitation of such huge data flow. In this direction, the use of innovative processing algorithms and distributed computing infrastructures, such as the Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data for the generation of large spatial scale deformation time series in efficient, automatic and systematic way. Such a DInSAR chain ingests Sentinel 1 SLC images and carries out several processing steps, to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies have been designed ad hoc for each processing step of the P-SBAS S1 chain, encompassing both multi-core and multi-node programming techniques, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform and a thorough analysis of the attained parallel performances has been performed to identify and overcome the major bottlenecks to the scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits. Such an experiment confirms the big advantage of exploiting large computational and storage resources of Cloud Computing platforms for large scale DInSAR analysis. The presented Cloud Computing P-SBAS processing chain can be a precious tool in the perspective of developing operational services disposable for the EO scientific community related to hazard monitoring and risk prevention and mitigation.
Progress on the Fabric for Frontier Experiments Project at Fermilab

NASA Astrophysics Data System (ADS)

Box, Dennis; Boyd, Joseph; Dykstra, Dave; Garzoglio, Gabriele; Herner, Kenneth; Kirby, Michael; Kreymer, Arthur; Levshina, Tanya; Mhashilkar, Parag; Sharma, Neha

2015-12-01

The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.
Cloud Computing

DTIC Science & Technology

2010-04-29

Cloud Computing The answer, my friend, is blowing in the wind. The answer is blowing in the wind. 1Bingue ‐ Cook Cloud Computing STSC 2010... Cloud Computing STSC 2010 Objectives • Define the cloud • Risks of cloud computing f l d i• Essence o c ou comput ng • Deployed clouds in DoD 3Bingue...Cook Cloud Computing STSC 2010 Definitions of Cloud Computing Cloud computing is a model for enabling b d d ku
Use of Open Standards and Technologies at the Lunar Mapping and Modeling Project

NASA Astrophysics Data System (ADS)

Law, E.; Malhotra, S.; Bui, B.; Chang, G.; Goodale, C. E.; Ramirez, P.; Kim, R. M.; Sadaqathulla, S.; Rodriguez, L.

2011-12-01

The Lunar Mapping and Modeling Project (LMMP), led by the Marshall Space Flight center (MSFC), is tasked by NASA. The project is responsible for the development of an information system to support lunar exploration activities. It provides lunar explorers a set of tools and lunar map and model products that are predominantly derived from present lunar missions (e.g., the Lunar Reconnaissance Orbiter (LRO)) and from historical missions (e.g., Apollo). At Jet Propulsion Laboratory (JPL), we have built the LMMP interoperable geospatial information system's underlying infrastructure and a single point of entry - the LMMP Portal by employing a number of open standards and technologies. The Portal exposes a set of services to users to allow search, visualization, subset, and download of lunar data managed by the system. Users also have access to a set of tools that visualize, analyze and annotate the data. The infrastructure and Portal are based on web service oriented architecture. We designed the system to support solar system bodies in general including asteroids, earth and planets. We employed a combination of custom software, commercial and open-source components, off-the-shelf hardware and pay-by-use cloud computing services. The use of open standards and web service interfaces facilitate platform and application independent access to the services and data, offering for instances, iPad and Android mobile applications and large screen multi-touch with 3-D terrain viewing functions, for a rich browsing and analysis experience from a variety of platforms. The web services made use of open standards including: Representational State Transfer (REST); and Open Geospatial Consortium (OGC)'s Web Map Service (WMS), Web Coverage Service (WCS), Web Feature Service (WFS). Its data management services have been built on top of a set of open technologies including: Object Oriented Data Technology (OODT) - open source data catalog, archive, file management, data grid framework; openSSO - open source access management and federation platform; solr - open source enterprise search platform; redmine - open source project collaboration and management framework; GDAL - open source geospatial data abstraction library; and others. Its data products are compliant with Federal Geographic Data Committee (FGDC) metadata standard. This standardization allows users to access the data products via custom written applications or off-the-shelf applications such as GoogleEarth. We will demonstrate this ready-to-use system for data discovery and visualization by walking through the data services provided through the portal such as browse, search, and other tools. We will further demonstrate image viewing and layering of lunar map images from the Internet, via mobile devices such as Apple's iPad.
GC31G-1182: Opennex, a Private-Public Partnership in Support of the National Climate Assessment

NASA Technical Reports Server (NTRS)

Nemani, Ramakrishna R.; Wang, Weile; Michaelis, Andrew; Votava, Petr; Ganguly, Sangram

2016-01-01

The NASA Earth Exchange (NEX) is a collaborative computing platform that has been developed with the objective of bringing scientists together with the software tools, massive global datasets, and supercomputing resources necessary to accelerate research in Earth systems science and global change. NEX is funded as an enabling tool for sustaining the national climate assessment. Over the past five years, researchers have used the NEX platform and produced a number of data sets highly relevant to the National Climate Assessment. These include high-resolution climate projections using different downscaling techniques and trends in historical climate from satellite data. To enable a broader community in exploiting the above datasets, the NEX team partnered with public cloud providers to create the OpenNEX platform. OpenNEX provides ready access to NEX data holdings on a number of public cloud platforms along with pertinent analysis tools and workflows in the form of Machine Images and Docker Containers, lectures and tutorials by experts. We will showcase some of the applications of OpenNEX data and tools by the community on Amazon Web Services, Google Cloud and the NEX Sandbox.
Secure Dynamic access control scheme of PHR in cloud computing.

PubMed

Chen, Tzer-Shyong; Liu, Chia-Hui; Chen, Tzer-Long; Chen, Chin-Sheng; Bau, Jian-Guo; Lin, Tzu-Ching

2012-12-01

With the development of information technology and medical technology, medical information has been developed from traditional paper records into electronic medical records, which have now been widely applied. The new-style medical information exchange system "personal health records (PHR)" is gradually developed. PHR is a kind of health records maintained and recorded by individuals. An ideal personal health record could integrate personal medical information from different sources and provide complete and correct personal health and medical summary through the Internet or portable media under the requirements of security and privacy. A lot of personal health records are being utilized. The patient-centered PHR information exchange system allows the public autonomously maintain and manage personal health records. Such management is convenient for storing, accessing, and sharing personal medical records. With the emergence of Cloud computing, PHR service has been transferred to storing data into Cloud servers that the resources could be flexibly utilized and the operation cost can be reduced. Nevertheless, patients would face privacy problem when storing PHR data into Cloud. Besides, it requires a secure protection scheme to encrypt the medical records of each patient for storing PHR into Cloud server. In the encryption process, it would be a challenge to achieve accurately accessing to medical records and corresponding to flexibility and efficiency. A new PHR access control scheme under Cloud computing environments is proposed in this study. With Lagrange interpolation polynomial to establish a secure and effective PHR information access scheme, it allows to accurately access to PHR with security and is suitable for enormous multi-users. Moreover, this scheme also dynamically supports multi-users in Cloud computing environments with personal privacy and offers legal authorities to access to PHR. From security and effectiveness analyses, the proposed PHR access scheme in Cloud computing environments is proven flexible and secure and could effectively correspond to real-time appending and deleting user access authorization and appending and revising PHR records.
Coal and Open-pit surface mining impacts on American Lands (COAL)

NASA Astrophysics Data System (ADS)

Brown, T. A.; McGibbney, L. J.

2017-12-01

Mining is known to cause environmental degradation, but software tools to identify its impacts are lacking. However, remote sensing, spectral reflectance, and geographic data are readily available, and high-performance cloud computing resources exist for scientific research. Coal and Open-pit surface mining impacts on American Lands (COAL) provides a suite of algorithms and documentation to leverage these data and resources to identify evidence of mining and correlate it with environmental impacts over time.COAL was originally developed as a 2016 - 2017 senior capstone collaboration between scientists at the NASA Jet Propulsion Laboratory (JPL) and computer science students at Oregon State University (OSU). The COAL team implemented a free and open-source software library called "pycoal" in the Python programming language which facilitated a case study of the effects of coal mining on water resources. Evidence of acid mine drainage associated with an open-pit coal mine in New Mexico was derived by correlating imaging spectrometer data from the JPL Airborne Visible/InfraRed Imaging Spectrometer - Next Generation (AVIRIS-NG), spectral reflectance data published by the USGS Spectroscopy Laboratory in the USGS Digital Spectral Library 06, and GIS hydrography data published by the USGS National Geospatial Program in The National Map. This case study indicated that the spectral and geospatial algorithms developed by COAL can be used successfully to analyze the environmental impacts of mining activities.Continued development of COAL has been promoted by a Startup allocation award of high-performance computing resources from the Extreme Science and Engineering Discovery Environment (XSEDE). These resources allow the team to undertake further benchmarking, evaluation, and experimentation using multiple XSEDE resources. The opportunity to use computational infrastructure of this caliber will further enable the development of a science gateway to continue foundational COAL research.This work documents the original design and development of COAL and provides insight into continuing research efforts which have potential applications beyond the project to environmental data science and other fields.
Geospatial Data as a Service: The GEOGLAM Rangelands and Pasture Productivity Map Experience

NASA Astrophysics Data System (ADS)

Evans, B. J. K.; Antony, J.; Guerschman, J. P.; Larraondo, P. R.; Richards, C. J.

2017-12-01

Empowering end-users like pastoralists, land management specialists and land policy makers in the use of earth observation data for both day-to-day and seasonal planning needs both interactive delivery of multiple geospatial datasets and the capability of supporting on-the-fly dynamic queries while simultaneously fostering a community around the effort. The use of and wide adoption of large data archives, like those produced by earth observation missions, are often limited by compute and storage capabilities of the remote user. We demonstrate that wide-scale use of large data archives can be facilitated by end-users dynamically requesting value-added products using open standards (WCS, WMS, WPS), with compute running in the cloud or dedicated data-centres and visualizing outputs on web-front ends. As an example, we will demonstrate how a tool called GSKY can empower a remote end-user by providing the data delivery and analytics capabilities for the GEOGLAM Rangelands and Pasture Productivity (RAPP) Map tool. The GEOGLAM RAPP initiative from the Group on Earth Observations (GEO) and its Agricultural Monitoring subgroup aims at providing practical tools to end-users focusing on the important role of rangelands and pasture systems in providing food production security from both agricultural crops and animal protein. Figure 1, is a screen capture from the RAPP Map interface for an important pasture area in the Namibian rangelands. The RAPP Map has been in production for six months and has garnered significant interest from groups and users all over the world. GSKY, being formulated around the theme of Open Geospatial Data-as-a-Service capabilities uses distributed computing and storage to facilitate this. It works behind the scenes, accepting OGC standard requests in WCS, WMS and WPS. Results from these requests are rendered on a web-front end. In this way, the complexities of data locality and compute execution are masked from an end user. On-the-fly computation of products such as NDVI, Leaf Area Index, vegetation cover and others from original source data including MODIS are achived, with Landsat and Sentinel-2 on the horizon. Innovative use of cloud computing and storage along with flexible front-ends, allow the democratization of data dissemination and we hope better outcomes for the planet.
Low Cloud Feedback to Surface Warming in the World's First Global Climate Model with Explicit Embedded Boundary Layer Turbulence

NASA Astrophysics Data System (ADS)

Parishani, H.; Pritchard, M. S.; Bretherton, C. S.; Wyant, M. C.; Khairoutdinov, M.; Singh, B.

2017-12-01

Biases and parameterization formulation uncertainties in the representation of boundary layer clouds remain a leading source of possible systematic error in climate projections. Here we show the first results of cloud feedback to +4K SST warming in a new experimental climate model, the ``Ultra-Parameterized (UP)'' Community Atmosphere Model, UPCAM. We have developed UPCAM as an unusually high-resolution implementation of cloud superparameterization (SP) in which a global set of cloud resolving arrays is embedded in a host global climate model. In UP, the cloud-resolving scale includes sufficient internal resolution to explicitly generate the turbulent eddies that form marine stratocumulus and trade cumulus clouds. This is computationally costly but complements other available approaches for studying low clouds and their climate interaction, by avoiding parameterization of the relevant scales. In a recent publication we have shown that UP, while not without its own complexity trade-offs, can produce encouraging improvements in low cloud climatology in multi-month simulations of the present climate and is a promising target for exascale computing (Parishani et al. 2017). Here we show results of its low cloud feedback to warming in multi-year simulations for the first time. References: Parishani, H., M. S. Pritchard, C. S. Bretherton, M. C. Wyant, and M. Khairoutdinov (2017), Toward low-cloud-permitting cloud superparameterization with explicit boundary layer turbulence, J. Adv. Model. Earth Syst., 9, doi:10.1002/2017MS000968.
Stereo Vision Inside Tire

DTIC Science & Technology

2015-08-21

using the Open Computer Vision ( OpenCV ) libraries [6] for computer vision and the Qt library [7] for the user interface. The software has the...depth. The software application calibrates the cameras using the plane based calibration model from the OpenCV calib3D module and allows the...6] OpenCV . 2015. OpenCV Open Source Computer Vision. [Online]. Available at: opencv.org [Accessed]: 09/01/2015. [7] Qt. 2015. Qt Project home
Open source molecular modeling.

PubMed

Pirhadi, Somayeh; Sunseri, Jocelyn; Koes, David Ryan

2016-09-01

The success of molecular modeling and computational chemistry efforts are, by definition, dependent on quality software applications. Open source software development provides many advantages to users of modeling applications, not the least of which is that the software is free and completely extendable. In this review we categorize, enumerate, and describe available open source software packages for molecular modeling and computational chemistry. An updated online version of this catalog can be found at https://opensourcemolecularmodeling.github.io. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

PubMed

Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

2018-01-01

Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provide access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
Implementation and use of a highly available and innovative IaaS solution: the Cloud Area Padovana

NASA Astrophysics Data System (ADS)

Aiftimiei, C.; Andreetto, P.; Bertocco, S.; Biasotto, M.; Dal Pra, S.; Costa, F.; Crescente, A.; Dorigo, A.; Fantinel, S.; Fanzago, F.; Frizziero, E.; Gulmini, M.; Michelotto, M.; Sgaravatto, M.; Traldi, S.; Venaruzzo, M.; Verlato, M.; Zangrando, L.

2015-12-01

While in the business world the cloud paradigm is typically implemented purchasing resources and services from third party providers (e.g. Amazon), in the scientific environment there's usually the need of on-premises IaaS infrastructures which allow efficient usage of the hardware distributed among (and owned by) different scientific administrative domains. In addition, the requirement of open source adoption has led to the choice of products like OpenStack by many organizations. We describe a use case of the Italian National Institute for Nuclear Physics (INFN) which resulted in the implementation of a unique cloud service, called ’Cloud Area Padovana’, which encompasses resources spread over two different sites: the INFN Legnaro National Laboratories and the INFN Padova division. We describe how this IaaS has been implemented, which technologies have been adopted and how services have been configured in high-availability (HA) mode. We also discuss how identity and authorization management were implemented, adopting a widely accepted standard architecture based on SAML2 and OpenID: by leveraging the versatility of those standards the integration with authentication federations like IDEM was implemented. We also discuss some other innovative developments, such as a pluggable scheduler, implemented as an extension of the native OpenStack scheduler, which allows the allocation of resources according to a fair-share based model and which provides a persistent queuing mechanism for handling user requests that can not be immediately served. Tools, technologies, procedures used to install, configure, monitor, operate this cloud service are also discussed. Finally we present some examples that show how this IaaS infrastructure is being used.
Exploring the Earth Using Deep Learning Techniques

NASA Astrophysics Data System (ADS)

Larraondo, P. R.; Evans, B. J. K.; Antony, J.

2016-12-01

Research using deep neural networks have significantly matured in recent times, and there is now a surge in interest to apply such methods to Earth systems science and the geosciences. When combined with Big Data, we believe there are opportunities for significantly transforming a number of areas relevant to researchers and policy makers. In particular, by using a combination of data from a range of satellite Earth observations as well as computer simulations from climate models and reanalysis, we can gain new insights into the information that is locked within the data. Global geospatial datasets describe a wide range of physical and chemical parameters, which are mostly available using regular grids covering large spatial and temporal extents. This makes them perfect candidates to apply deep learning methods. So far, these techniques have been successfully applied to image analysis through the use of convolutional neural networks. However, this is only one field of interest, and there is potential for many more use cases to be explored. The deep learning algorithms require fast access to large amounts of data in the form of tensors and make intensive use of CPU in order to train its models. The Australian National Computational Infrastructure (NCI) has recently augmented its Raijin 1.2 PFlop supercomputer with hardware accelerators. Together with NCI's 3000 core high performance OpenStack cloud, these computational systems have direct access to NCI's 10+ PBytes of datasets and associated Big Data software technologies (see http://geonetwork.nci.org.au/ and http://nci.org.au/systems-services/national-facility/nerdip/). To effectively use these computing infrastructures requires that both the data and software are organised in a way that readily supports the deep learning software ecosystem. Deep learning software, such as the open source TensorFlow library, has allowed us to demonstrate the possibility of generating geospatial models by combining information from our different data sources. This opens the door to an exciting new way of generating products and extracting features that have previously been labour intensive. In this paper, we will explore some of these geospatial use cases and share some of the lessons learned from this experience.
OpenMx: An Open Source Extended Structural Equation Modeling Framework

ERIC Educational Resources Information Center

Boker, Steven; Neale, Michael; Maes, Hermine; Wilde, Michael; Spiegel, Michael; Brick, Timothy; Spies, Jeffrey; Estabrook, Ryne; Kenny, Sarah; Bates, Timothy; Mehta, Paras; Fox, John

2011-01-01

OpenMx is free, full-featured, open source, structural equation modeling (SEM) software. OpenMx runs within the "R" statistical programming environment on Windows, Mac OS-X, and Linux computers. The rationale for developing OpenMx is discussed along with the philosophy behind the user interface. The OpenMx data structures are…
An Overview of Cloud Computing in Distributed Systems

NASA Astrophysics Data System (ADS)

Divakarla, Usha; Kumari, Geetha

2010-11-01

Cloud computing is the emerging trend in the field of distributed computing. Cloud computing evolved from grid computing and distributed computing. Cloud plays an important role in huge organizations in maintaining huge data with limited resources. Cloud also helps in resource sharing through some specific virtual machines provided by the cloud service provider. This paper gives an overview of the cloud organization and some of the basic security issues pertaining to the cloud.
Vision 2010: The Future of Higher Education Business and Learning Applications

ERIC Educational Resources Information Center

Carey, Patrick; Gleason, Bernard

2006-01-01

The global software industry is in the midst of a major evolutionary shift--one based on open computing--and this trend, like many transformative trends in technology, is being led by the IT staffs and academic computing faculty of the higher education industry. The elements of this open computing approach are open source, open standards, open…
Generation of Digital Surface Models from satellite photogrammetry: the DSM-OPT service of the ESA Geohazards Exploitation Platform (GEP)

NASA Astrophysics Data System (ADS)

Stumpf, André; Michéa, David; Malet, Jean-Philippe

2017-04-01

The continuously increasing fleet of agile stereo-capable very-high resolution (VHR) optical satellites has facilitated the acquisition of multi-view images of the earth surface. Theoretical revisit times have been reduced to less than one day and the highest spatial resolution which is commercially available amounts now to 30 cm/pixel. Digital Surface Models (DSM) and point clouds computed from such satellite stereo-acquisitions can provide valuable input for studies in geomorphology, tectonics, glaciology, hydrology and urban remote sensing The photogrammetric processing, however, still requires significant expertise, computational resources and costly commercial software. To enable a large Earth Science community (researcher and end-users) to process easily and rapidly VHR multi-view images, the work targets the implementation of a fully automatic satellite-photogrammetry pipeline (i.e DSM-OPT) on the ESA Geohazards Exploitation Platform (GEP). The implemented pipeline is based on the open-source photogrammetry library MicMac [1] and is designed for distributed processing on a cloud-based infrastructure. The service can be employed in pre-defined processing modes (i.e. urban, plain, hilly, and mountainous environments) or in an advanced processing mode (i.e. in which expert-users have the possibility to adapt the processing parameters to their specific applications). Four representative use cases are presented to illustrate the accuracy of the resulting surface models and ortho-images as well as the overall processing time. These use cases consisted of the construction of surface models from series of Pléiades images for four applications: urban analysis (Strasbourg, France), landslide detection in mountainous environments (South French Alps), co-seismic deformation in mountain environments (Central Italy earthquake sequence of 2016) and fault recognition for paleo-tectonic analysis (North-East India). Comparisons of the satellite-derived topography to airborne LiDAR topography are discussed. [1] Rupnik, E., Pierrot Deseilligny, M., Delorme, A., and Klinger, Y.: Refined satellite image orientation in the free open-source photogrammetric tools APERO/MICMAC, ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., III-1, 83-90, doi:10.5194/isprs-annals-III-1-83-2016, 2016.
A new 3D maser code applied to flaring events

NASA Astrophysics Data System (ADS)

Gray, M. D.; Mason, L.; Etoka, S.

2018-06-01

We set out the theory and discretization scheme for a new finite-element computer code, written specifically for the simulation of maser sources. The code was used to compute fractional inversions at each node of a 3D domain for a range of optical thicknesses. Saturation behaviour of the nodes with regard to location and optical depth was broadly as expected. We have demonstrated via formal solutions of the radiative transfer equation that the apparent size of the model maser cloud decreases as expected with optical depth as viewed by a distant observer. Simulations of rotation of the cloud allowed the construction of light curves for a number of observable quantities. Rotation of the model cloud may be a reasonable model for quasi-periodic variability, but cannot explain periodic flaring.
ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)

EPA Science Inventory

ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...
Spatially Resolved Metal Gas Clouds

NASA Astrophysics Data System (ADS)

Péroux, C.; Rahmani, H.; Arrigoni Battaia, F.; Augustin, R.

2018-05-01

We now have mounting evidences that the circumgalactic medium (CGM) of galaxies is polluted with metals processed through stars. The fate of these metals is however still an open question and several findings indicate that they remain poorly mixed. A powerful tool to study the low-density gas of the CGM is offered by absorption lines in quasar spectra, although the information retrieved is limited to 1D along the sightline. We report the serendipitous discovery of two close-by bright zgal=1.148 extended galaxies with a fortuitous intervening zabs=1.067 foreground absorber. MUSE IFU observations spatially probes kpc-scales in absorption in the plane of the sky over a total area spanning ˜30 kpc-2. We identify two [O II] emitters at zabs down to 21 kpc with SFR˜2 M⊙/yr. We measure small fractional variations (<30%) in the equivalent widths of Fe II and Mg II cold gas absorbers on coherence scales of 8kpc but stronger variation on larger scales (25kpc). We compute the corresponding cloud gas mass <2 × 109M⊙. Our results indicate a good efficiency of the metal mixing on kpc-scales in the CGM of a typical z˜1 galaxy. This study show-cases new prospects for mapping the distribution and sizes of metal clouds observed in absorption against extended background sources with 3D spectroscopy.
Virtual Geophysics Laboratory: Exploiting the Cloud and Empowering Geophysicsts

NASA Astrophysics Data System (ADS)

Fraser, Ryan; Vote, Josh; Goh, Richard; Cox, Simon

2013-04-01

Over the last five decades geoscientists from Australian state and federal agencies have collected and assembled around 3 Petabytes of geoscience data sets under public funding. As a consequence of technological progress, data is now being acquired at exponential rates and in higher resolution than ever before. Effective use of these big data sets challenges the storage and computational infrastructure of most organizations. The Virtual Geophysics Laboratory (VGL) is a scientific workflow portal addresses some of the resulting issues by providing Australian geophysicists with access to a Web 2.0 or Rich Internet Application (RIA) based integrated environment that exploits eResearch tools and Cloud computing technology, and promotes collaboration between the user community. VGL simplifies and automates large portions of what were previously manually intensive scientific workflow processes, allowing scientists to focus on the natural science problems, rather than computer science and IT. A number of geophysical processing codes are incorporated to support multiple workflows. For example a gravity inversion can be performed by combining the Escript/Finley codes (from the University of Queensland) with the gravity data registered in VGL. Likewise, tectonic processes can also be modeled by combining the Underworld code (from Monash University) with one of the various 3D models available to VGL. Cloud services provide scalable and cost effective compute resources. VGL is built on top of mature standards-compliant information services, many deployed using the Spatial Information Services Stack (SISS), which provides direct access to geophysical data. A large number of data sets from Geoscience Australia assist users in data discovery. GeoNetwork provides a metadata catalog to store workflow results for future use, discovery and provenance tracking. VGL has been developed in collaboration with the research community using incremental software development practices and open source tools. While developed to provide the geophysics research community with a sustainable platform and scalable infrastructure; VGL has also developed a number of concepts, patterns and generic components of which have been reused for cases beyond geophysics, including natural hazards, satellite processing and other areas requiring spatial data discovery and processing. Future plans for VGL include a number of improvements in both functional and non-functional areas in response to its user community needs and advancement in information technologies. In particular, research is underway in the following areas (a) distributed and parallel workflow processing in the cloud, (b) seamless integration with various cloud providers, and (c) integration with virtual laboratories representing other science domains. Acknowledgements: VGL was developed by CSIRO in collaboration with Geoscience Australia, National Computational Infrastructure, Australia National University, Monash University and University of Queensland, and has been supported by the Australian Government's Education Investment Funds through NeCTAR.
Analysis on the security of cloud computing

NASA Astrophysics Data System (ADS)

He, Zhonglin; He, Yuhua

2011-02-01

Cloud computing is a new technology, which is the fusion of computer technology and Internet development. It will lead the revolution of IT and information field. However, in cloud computing data and application software is stored at large data centers, and the management of data and service is not completely trustable, resulting in safety problems, which is the difficult point to improve the quality of cloud service. This paper briefly introduces the concept of cloud computing. Considering the characteristics of cloud computing, it constructs the security architecture of cloud computing. At the same time, with an eye toward the security threats cloud computing faces, several corresponding strategies are provided from the aspect of cloud computing users and service providers.
Development of an open-source cloud-connected sensor-monitoring platform

USDA-ARS?s Scientific Manuscript database

Rapid advances in electronics and communications technologies offer continuously evolving options for sensing and awareness of the physical environment. Many of these advances are becoming increasingly available to “non-professionals,” that is, those without formal training or expertise in discipli...
Trends and New Directions in Software Architecture

DTIC Science & Technology

2014-10-10

frameworks  Open source  Cloud strategies  NoSQL  Machine Learning  MDD  Incremental approaches  Dashboards  Distributed development...complexity grows  NoSQL Models are not created equal 2014 Our Current Research  Lightweight Evaluation and Architecture Prototyping for Big Data
Exo-Transmit: An Open-Source Code for Calculating Transmission Spectra for Exoplanet Atmospheres of Varied Composition

NASA Astrophysics Data System (ADS)

Kempton, Eliza M.-R.; Lupu, Roxana; Owusu-Asare, Albert; Slough, Patrick; Cale, Bryson

2017-04-01

We present Exo-Transmit, a software package to calculate exoplanet transmission spectra for planets of varied composition. The code is designed to generate spectra of planets with a wide range of atmospheric composition, temperature, surface gravity, and size, and is therefore applicable to exoplanets ranging in mass and size from hot Jupiters down to rocky super-Earths. Spectra can be generated with or without clouds or hazes with options to (1) include an optically thick cloud deck at a user-specified atmospheric pressure or (2) to augment the nominal Rayleigh scattering by a user-specified factor. The Exo-Transmit code is written in C and is extremely easy to use. Typically the user will only need to edit parameters in a single user input file in order to run the code for a planet of their choosing. Exo-Transmit is available publicly on Github with open-source licensing at https://github.com/elizakempton/Exo_Transmit.
Open source bioimage informatics for cell biology.

PubMed

Swedlow, Jason R; Eliceiri, Kevin W

2009-11-01

Significant technical advances in imaging, molecular biology and genomics have fueled a revolution in cell biology, in that the molecular and structural processes of the cell are now visualized and measured routinely. Driving much of this recent development has been the advent of computational tools for the acquisition, visualization, analysis and dissemination of these datasets. These tools collectively make up a new subfield of computational biology called bioimage informatics, which is facilitated by open source approaches. We discuss why open source tools for image informatics in cell biology are needed, some of the key general attributes of what make an open source imaging application successful, and point to opportunities for further operability that should greatly accelerate future cell biology discovery.
Computational toxicology using the OpenTox application programming interface and Bioclipse

PubMed Central

2011-01-01

Background Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. Findings This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. Conclusions A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, that combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers. PMID:22075173
ScipionCloud: An integrative and interactive gateway for large scale cryo electron microscopy image processing on commercial and academic clouds.

PubMed

Cuenca-Alba, Jesús; Del Cano, Laura; Gómez Blanco, Josué; de la Rosa Trevín, José Miguel; Conesa Mingo, Pablo; Marabini, Roberto; S Sorzano, Carlos Oscar; Carazo, Jose María

2017-10-01

New instrumentation for cryo electron microscopy (cryoEM) has significantly increased data collection rate as well as data quality, creating bottlenecks at the image processing level. Current image processing model of moving the acquired images from the data source (electron microscope) to desktops or local clusters for processing is encountering many practical limitations. However, computing may also take place in distributed and decentralized environments. In this way, cloud is a new form of accessing computing and storage resources on demand. Here, we evaluate on how this new computational paradigm can be effectively used by extending our current integrative framework for image processing, creating ScipionCloud. This new development has resulted in a full installation of Scipion both in public and private clouds, accessible as public "images", with all the required preinstalled cryoEM software, just requiring a Web browser to access all Graphical User Interfaces. We have profiled the performance of different configurations on Amazon Web Services and the European Federated Cloud, always on architectures incorporating GPU's, and compared them with a local facility. We have also analyzed the economical convenience of different scenarios, so cryoEM scientists have a clearer picture of the setup that is best suited for their needs and budgets. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Future of Department of Defense Cloud Computing Amid Cultural Confusion

DTIC Science & Technology

2013-03-01

enterprise cloud - computing environment and transition to a public cloud service provider. Services have started the development of individual cloud - computing environments...endorsing cloud computing . It addresses related issues in matters of service culture changes and how strategic leaders will dictate the future of cloud ...through data center consolidation and individual Service provided cloud computing .
Enabling a systems biology knowledgebase with gaggle and firegoose

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baliga, Nitin S.

The overall goal of this project was to extend the existing Gaggle and Firegoose systems to develop an open-source technology that runs over the web and links desktop applications with many databases and software applications. This technology would enable researchers to incorporate workflows for data analysis that can be executed from this interface to other online applications. The four specific aims were to (1) provide one-click mapping of genes, proteins, and complexes across databases and species; (2) enable multiple simultaneous workflows; (3) expand sophisticated data analysis for online resources; and enhance open-source development of the Gaggle-Firegoose infrastructure. Gaggle is anmore » open-source Java software system that integrates existing bioinformatics programs and data sources into a user-friendly, extensible environment to allow interactive exploration, visualization, and analysis of systems biology data. Firegoose is an extension to the Mozilla Firefox web browser that enables data transfer between websites and desktop tools including Gaggle. In the last phase of this funding period, we have made substantial progress on development and application of the Gaggle integration framework. We implemented the workspace to the Network Portal. Users can capture data from Firegoose and save them to the workspace. Users can create workflows to start multiple software components programmatically and pass data between them. Results of analysis can be saved to the cloud so that they can be easily restored on any machine. We also developed the Gaggle Chrome Goose, a plugin for the Google Chrome browser in tandem with an opencpu server in the Amazon EC2 cloud. This allows users to interactively perform data analysis on a single web page using the R packages deployed on the opencpu server. The cloud-based framework facilitates collaboration between researchers from multiple organizations. We have made a number of enhancements to the cmonkey2 application to enable and improve the integration within different environments, and we have created a new tools pipeline for generating EGRIN2 models in a largely automated way.« less

Cloud-Based NoSQL Open Database of Pulmonary Nodules for Computer-Aided Lung Cancer Diagnosis and Reproducible Research.

PubMed

Ferreira Junior, José Raniery; Oliveira, Marcelo Costa; de Azevedo-Marques, Paulo Mazzoncini

2016-12-01

Lung cancer is the leading cause of cancer-related deaths in the world, and its main manifestation is pulmonary nodules. Detection and classification of pulmonary nodules are challenging tasks that must be done by qualified specialists, but image interpretation errors make those tasks difficult. In order to aid radiologists on those hard tasks, it is important to integrate the computer-based tools with the lesion detection, pathology diagnosis, and image interpretation processes. However, computer-aided diagnosis research faces the problem of not having enough shared medical reference data for the development, testing, and evaluation of computational methods for diagnosis. In order to minimize this problem, this paper presents a public nonrelational document-oriented cloud-based database of pulmonary nodules characterized by 3D texture attributes, identified by experienced radiologists and classified in nine different subjective characteristics by the same specialists. Our goal with the development of this database is to improve computer-aided lung cancer diagnosis and pulmonary nodule detection and classification research through the deployment of this database in a cloud Database as a Service framework. Pulmonary nodule data was provided by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), image descriptors were acquired by a volumetric texture analysis, and database schema was developed using a document-oriented Not only Structured Query Language (NoSQL) approach. The proposed database is now with 379 exams, 838 nodules, and 8237 images, 4029 of them are CT scans and 4208 manually segmented nodules, and it is allocated in a MongoDB instance on a cloud infrastructure.
Marine boundary layer cloud regimes and POC formation in an LES coupled to a bulk aerosol scheme

NASA Astrophysics Data System (ADS)

Berner, A. H.; Bretherton, C. S.; Wood, R.; Muhlbauer, A.

2013-07-01

A large-eddy simulation (LES) coupled to a new bulk aerosol scheme is used to study long-lived regimes of aerosol-boundary layer cloud-precipitation interaction and the development of pockets of open cells (POCs) in subtropical stratocumulus cloud layers. The aerosol scheme prognoses mass and number concentration of a single log-normal accumulation mode with surface and entrainment sources, evolving subject to processing of activated aerosol and scavenging of dry aerosol by cloud and rain. The LES with the aerosol scheme is applied to a range of steadily-forced simulations idealized from a well-observed POC case. The long-term system evolution is explored with extended two-dimensional simulations of up to 20 days, mostly with diurnally-averaged insolation. One three-dimensional two-day simulation confirms the initial development of the corresponding two-dimensional case. With weak mean subsidence, an initially aerosol-rich mixed layer deepens, the capping stratocumulus cloud slowly thickens and increasingly depletes aerosol via precipitation accretion, then the boundary layer transitions within a few hours into an open-cell regime with scattered precipitating cumuli, in which entrainment is much weaker. The inversion slowly collapses for several days until the cumulus clouds are too shallow to efficiently precipitate. Inversion cloud then reforms and radiatively drives renewed entrainment, allowing the boundary layer to deepen and become more aerosol-rich, until the stratocumulus layer thickens enough to undergo another cycle of open-cell formation. If mean subsidence is stronger, the stratocumulus never thickens enough to initiate drizzle and settles into a steady state. With lower initial aerosol concentrations, this system quickly transitions into open cells, collapses, and redevelops into a different steady state with a shallow, optically thin cloud layer. In these steady states, interstitial scavenging by cloud droplets is the main sink of aerosol number. The system is described in a reduced two-dimensional phase plane with inversion height and boundary-layer average aerosol concentrations as the state variables. Simulations with a full diurnal cycle show similar evolutions, except that open-cell formation is phase-locked into the early morning hours. The same steadily-forced modeling framework is applied to the development and evolution of a POC and the surrounding overcast boundary layer. An initial aerosol perturbation applied to a portion of the model domain leads that portion to transition into open-cell convection, forming a POC. Reduced entrainment in the POC induces a negative feedback between areal fraction covered by the POC and boundary layer depth changes. This stabilizes the system by controlling liquid water path and precipitation sinks of aerosol number in the overcast region, while also preventing boundary-layer collapse within the POC, allowing the POC and overcast to coexist indefinitely in a quasi-steady equilibrium.
BioContainers: an open-source and community-driven framework for software standardization.

PubMed

da Veiga Leprevost, Felipe; Grüning, Björn A; Alves Aflitos, Saulo; Röst, Hannes L; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I; Perez-Riverol, Yasset

2017-08-15

BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). The software is freely available at github.com/BioContainers/. yperez@ebi.ac.uk. © The Author(s) 2017. Published by Oxford University Press.
BioContainers: an open-source and community-driven framework for software standardization

PubMed Central

da Veiga Leprevost, Felipe; Grüning, Björn A.; Alves Aflitos, Saulo; Röst, Hannes L.; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C.; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I.; Perez-Riverol, Yasset

2017-01-01

Abstract Motivation BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation The software is freely available at github.com/BioContainers/. Contact yperez@ebi.ac.uk PMID:28379341
Virtual Labs (Science Gateways) as platforms for Free and Open Source Science

NASA Astrophysics Data System (ADS)

Lescinsky, David; Car, Nicholas; Fraser, Ryan; Friedrich, Carsten; Kemp, Carina; Squire, Geoffrey

2016-04-01

The Free and Open Source Software (FOSS) movement promotes community engagement in software development, as well as provides access to a range of sophisticated technologies that would be prohibitively expensive if obtained commercially. However, as geoinformatics and eResearch tools and services become more dispersed, it becomes more complicated to identify and interface between the many required components. Virtual Laboratories (VLs, also known as Science Gateways) simplify the management and coordination of these components by providing a platform linking many, if not all, of the steps in particular scientific processes. These enable scientists to focus on their science, rather than the underlying supporting technologies. We describe a modular, open source, VL infrastructure that can be reconfigured to create VLs for a wide range of disciplines. Development of this infrastructure has been led by CSIRO in collaboration with Geoscience Australia and the National Computational Infrastructure (NCI) with support from the National eResearch Collaboration Tools and Resources (NeCTAR) and the Australian National Data Service (ANDS). Initially, the infrastructure was developed to support the Virtual Geophysical Laboratory (VGL), and has subsequently been repurposed to create the Virtual Hazards Impact and Risk Laboratory (VHIRL) and the reconfigured Australian National Virtual Geophysics Laboratory (ANVGL). During each step of development, new capabilities and services have been added and/or enhanced. We plan on continuing to follow this model using a shared, community code base. The VL platform facilitates transparent and reproducible science by providing access to both the data and methodologies used during scientific investigations. This is further enhanced by the ability to set up and run investigations using computational resources accessed through the VL. Data is accessed using registries pointing to catalogues within public data repositories (notably including the NCI National Environmental Research Data Interoperability Platform), or by uploading data directly from user supplied addresses or files. Similarly, scientific software is accessed through registries pointing to software repositories (e.g., GitHub). Runs are configured by using or modifying default templates designed by subject matter experts. After the appropriate computational resources are identified by the user, Virtual Machines (VMs) are spun up and jobs are submitted to service providers (currently the NeCTAR public cloud or Amazon Web Services). Following completion of the jobs the results can be reviewed and downloaded if desired. By providing a unified platform for science, the VL infrastructure enables sophisticated provenance capture and management. The source of input data (including both collection and queries), user information, software information (version and configuration details) and output information are all captured and managed as a VL resource which can be linked to output data sets. This provenance resource provides a mechanism for publication and citation for Free and Open Source Science.
76 FR 33399 - Advisory Committee on International Economic Policy; Notice of Open Meeting

Federal Register 2010, 2011, 2012, 2013, 2014

2011-06-08

... to the advent of cloud computing as a new business model in international trade, the implications of... of State for Economic, Energy, and Business Affairs Jose W. Fernandez and Committee Chair Ted... and Business Affairs, at (202) 647-2231 or [email protected] . This announcement might appear in the...
Infrastructure Suitability Assessment Modeling for Cloud Computing Solutions

DTIC Science & Technology

2011-09-01

Virtualization vs . Para-Virtualization .......................................................10 Figure 4. Modeling alternatives in relation to model...the conceptual difference between full virtualization and para-virtualization. Figure 3. Full Virtualization vs . Para-Virtualization 2. XEN...Besides Microsoft’s own client implementations, dubbed “Remote Desktop Con- nection Client” for Windows® and Apple ® operating systems, various open
Application of Open Source Technologies for Oceanographic Data Analysis

NASA Astrophysics Data System (ADS)

Huang, T.; Gangl, M.; Quach, N. T.; Wilson, B. D.; Chang, G.; Armstrong, E. M.; Chin, T. M.; Greguska, F.

2015-12-01

NEXUS is a data-intensive analysis solution developed with a new approach for handling science data that enables large-scale data analysis by leveraging open source technologies such as Apache Cassandra, Apache Spark, Apache Solr, and Webification. NEXUS has been selected to provide on-the-fly time-series and histogram generation for the Soil Moisture Active Passive (SMAP) mission for Level 2 and Level 3 Active, Passive, and Active Passive products. It also provides an on-the-fly data subsetting capability. NEXUS is designed to scale horizontally, enabling it to handle massive amounts of data in parallel. It takes a new approach on managing time and geo-referenced array data by dividing data artifacts into chunks and stores them in an industry-standard, horizontally scaled NoSQL database. This approach enables the development of scalable data analysis services that can infuse and leverage the elastic computing infrastructure of the Cloud. It is equipped with a high-performance geospatial and indexed data search solution, coupled with a high-performance data Webification solution free from file I/O bottlenecks, as well as a high-performance, in-memory data analysis engine. In this talk, we will focus on the recently funded AIST 2014 project by using NEXUS as the core for oceanographic anomaly detection service and web portal. We call it, OceanXtremes
AOI 1— COMPUTATIONAL ENERGY SCIENCES:MULTIPHASE FLOW RESEARCH High-fidelity multi-phase radiation module for modern coal combustion systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Modest, Michael

The effects of radiation in particle-laden flows were the object of the present research. The presence of particles increases optical thickness substantially, making the use of the “optically thin” approximation in most cases a very poor assumption. However, since radiation fluxes peak at intermediate optical thicknesses, overall radiative effects may not necessarily be stronger than in gas combustion. Also, the spectral behavior of particle radiation properties is much more benign, making spectral models simpler (and making the assumption of a gray radiator halfway acceptable, at least for fluidized beds when gas radiation is not large). On the other hand, particlesmore » scatter radiation, making the radiative transfer equation (RTE) much more di fficult to solve. The research carried out in this project encompassed three general areas: (i) assessment of relevant radiation properties of particle clouds encountered in fluidized bed and pulverized coal combustors, (ii) development of proper spectral models for gas–particulate mixtures for various types of two-phase combustion flows, and (iii) development of a Radiative Transfer Equation (RTE) solution module for such applications. The resulting models were validated against artificial cases since open literature experimental data were not available. The final models are in modular form tailored toward maximum portability, and were incorporated into two research codes: (i) the open-source CFD code OpenFOAM, which we have extensively used in our previous work, and (ii) the open-source multi-phase flow code MFIX, which is maintained by NETL.« less
Possible external sources of terrestrial cloud cover variability: the solar wind

NASA Astrophysics Data System (ADS)

Voiculescu, Mirela; Usoskin, Ilya; Condurache-Bota, Simona

2014-05-01

Cloud cover plays an important role in the terrestrial radiation budget. The possible influence of the solar activity on cloud cover is still an open question with contradictory answers. An extraterrestrial factor potentially affecting the cloud cover is related to fields associated with solar wind. We focus here on a derived quantity, the interplanetary electric field (IEF), defined as the product between the solar wind speed and the meridional component, Bz, of the interplanetary magnetic field (IMF) in the Geocentric Solar Magnetospheric (GSM) system. We show that cloud cover at mid-high latitudes systematically correlates with positive IEF, which has a clear energetic input into the atmosphere, but not with negative IEF, in general agreement with predictions of the global electric circuit (GEC)-related mechanism. Since the IEF responds differently to solar activity than, for instance, cosmic ray flux or solar irradiance, we also show that such a study allows distinguishing one solar-driven mechanism of cloud evolution, via the GEC, from others. We also present results showing that the link between cloud cover and IMF varies depending on composition and altitude of clouds.
A New Heuristic Anonymization Technique for Privacy Preserved Datasets Publication on Cloud Computing

NASA Astrophysics Data System (ADS)

Aldeen Yousra, S.; Mazleena, Salleh

2018-05-01

Recent advancement in Information and Communication Technologies (ICT) demanded much of cloud services to sharing users’ private data. Data from various organizations are the vital information source for analysis and research. Generally, this sensitive or private data information involves medical, census, voter registration, social network, and customer services. Primary concern of cloud service providers in data publishing is to hide the sensitive information of individuals. One of the cloud services that fulfill the confidentiality concerns is Privacy Preserving Data Mining (PPDM). The PPDM service in Cloud Computing (CC) enables data publishing with minimized distortion and absolute privacy. In this method, datasets are anonymized via generalization to accomplish the privacy requirements. However, the well-known privacy preserving data mining technique called K-anonymity suffers from several limitations. To surmount those shortcomings, I propose a new heuristic anonymization framework for preserving the privacy of sensitive datasets when publishing on cloud. The advantages of K-anonymity, L-diversity and (α, k)-anonymity methods for efficient information utilization and privacy protection are emphasized. Experimental results revealed the superiority and outperformance of the developed technique than K-anonymity, L-diversity, and (α, k)-anonymity measure.
Airborne Open Polar/Imaging Nephelometer for Ice Particles in Cirrus Clouds and Aerosols Field Campaign Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martins, JV

2016-04-01

The Open Imaging Nephelometer (O-I-Neph) instrument is an adaptation of a proven laboratory instrument built and tested at the University of Maryland, Baltimore County (UMBC), the Polarized Imaging Nephelometer (PI-Neph). The instrument design of both imaging nephelometers uses a narrow-beam laser source and a wide-field-of-view imaging camera to capture the entire scattering-phase function in one image, quasi-instantaneously.
Cloud-Based Distributed Control of Unmanned Systems

DTIC Science & Technology

2015-04-01

during mission execution. At best, the data is saved onto hard-drives and is accessible only by the local team. Data history in a form available and...following open source technologies: GeoServer, OpenLayers, PostgreSQL , and PostGIS are chosen to implement the back-end database and server. A brief...geospatial map data. 3. PostgreSQL : An SQL-compliant object-relational database that easily scales to accommodate large amounts of data - upwards to
SU-E-T-314: The Application of Cloud Computing in Pencil Beam Scanning Proton Therapy Monte Carlo Simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Z; Gao, M

Purpose: Monte Carlo simulation plays an important role for proton Pencil Beam Scanning (PBS) technique. However, MC simulation demands high computing power and is limited to few large proton centers that can afford a computer cluster. We study the feasibility of utilizing cloud computing in the MC simulation of PBS beams. Methods: A GATE/GEANT4 based MC simulation software was installed on a commercial cloud computing virtual machine (Linux 64-bits, Amazon EC2). Single spot Integral Depth Dose (IDD) curves and in-air transverse profiles were used to tune the source parameters to simulate an IBA machine. With the use of StarCluster softwaremore » developed at MIT, a Linux cluster with 2–100 nodes can be conveniently launched in the cloud. A proton PBS plan was then exported to the cloud where the MC simulation was run. Results: The simulated PBS plan has a field size of 10×10cm{sup 2}, 20cm range, 10cm modulation, and contains over 10,000 beam spots. EC2 instance type m1.medium was selected considering the CPU/memory requirement and 40 instances were used to form a Linux cluster. To minimize cost, master node was created with on-demand instance and worker nodes were created with spot-instance. The hourly cost for the 40-node cluster was $0.63 and the projected cost for a 100-node cluster was $1.41. Ten million events were simulated to plot PDD and profile, with each job containing 500k events. The simulation completed within 1 hour and an overall statistical uncertainty of < 2% was achieved. Good agreement between MC simulation and measurement was observed. Conclusion: Cloud computing is a cost-effective and easy to maintain platform to run proton PBS MC simulation. When proton MC packages such as GATE and TOPAS are combined with cloud computing, it will greatly facilitate the pursuing of PBS MC studies, especially for newly established proton centers or individual researchers.« less
Computer Forensics Education - the Open Source Approach

NASA Astrophysics Data System (ADS)

Huebner, Ewa; Bem, Derek; Cheung, Hon

In this chapter we discuss the application of the open source software tools in computer forensics education at tertiary level. We argue that open source tools are more suitable than commercial tools, as they provide the opportunity for students to gain in-depth understanding and appreciation of the computer forensic process as opposed to familiarity with one software product, however complex and multi-functional. With the access to all source programs the students become more than just the consumers of the tools as future forensic investigators. They can also examine the code, understand the relationship between the binary images and relevant data structures, and in the process gain necessary background to become the future creators of new and improved forensic software tools. As a case study we present an advanced subject, Computer Forensics Workshop, which we designed for the Bachelor's degree in computer science at the University of Western Sydney. We based all laboratory work and the main take-home project in this subject on open source software tools. We found that without exception more than one suitable tool can be found to cover each topic in the curriculum adequately. We argue that this approach prepares students better for forensic field work, as they gain confidence to use a variety of tools, not just a single product they are familiar with.
Remote Monitoring of Soil Water Content, Temperature, and Heat Flow Using Low-Cost Cellular (3G) IoT Technology

NASA Astrophysics Data System (ADS)

Ham, J. M.

2016-12-01

New microprocessor boards, open-source sensors, and cloud infrastructure developed for the Internet of Things (IoT) can be used to create low-cost monitoring systems for environmental research. This project describes two applications in soil science and hydrology: 1) remote monitoring of the soil temperature regime near oil and gas operations to detect the thermal signature associated with the natural source zone degradation of hydrocarbon contaminants in the vadose zone, and 2) remote monitoring of soil water content near the surface as part of a global citizen science network. In both cases, prototype data collection systems were built around the cellular (2G/3G) "Electron" microcontroller (www.particle.io). This device allows connectivity to the cloud using a low-cost global SIM and data plan. The systems have cellular connectivity in over 100 countries and data can be logged to the cloud for storage. Users can view data real time over any internet connection or via their smart phone. For both projects, data logging, storage, and visualization was done using IoT services like Thingspeak (thingspeak.com). The soil thermal monitoring system was tested on experimental plots in Colorado USA to evaluate the accuracy and reliability of different temperature sensors and 3D printed housings. The soil water experiment included comparison opens-source capacitance-based sensors to commercial versions. Results demonstrate the power of leveraging IoT technology for field research.
TERRA REF: Advancing phenomics with high resolution, open access sensor and genomics data

NASA Astrophysics Data System (ADS)

LeBauer, D.; Kooper, R.; Burnette, M.; Willis, C.

2017-12-01

Automated plant measurement has the potential to improve understanding of genetic and environmental controls on plant traits (phenotypes). The application of sensors and software in the automation of high throughput phenotyping reflects a fundamental shift from labor intensive hand measurements to drone, tractor, and robot mounted sensing platforms. These tools are expected to speed the rate of crop improvement by enabling plant breeders to more accurately select plants with improved yields, resource use efficiency, and stress tolerance. However, there are many challenges facing high throughput phenomics: sensors and platforms are expensive, currently there are few standard methods of data collection and storage, and the analysis of large data sets requires high performance computers and automated, reproducible computing pipelines. To overcome these obstacles and advance the science of high throughput phenomics, the TERRA Phenotyping Reference Platform (TERRA-REF) team is developing an open-access database of high resolution sensor data. TERRA REF is an integrated field and greenhouse phenotyping system that includes: a reference field scanner with fifteen sensors that can generate terrabytes of data each day at mm resolution; UAV, tractor, and fixed field sensing platforms; and an automated controlled-environment scanner. These platforms will enable investigation of diverse sensing modalities, and the investigation of traits under controlled and field environments. It is the goal of TERRA REF to lower the barrier to entry for academic and industry researchers by providing high-resolution data, open source software, and online computing resources. Our project is unique in that all data will be made fully public in November 2018, and is already available to early adopters through the beta-user program. We will describe the datasets and how to use them as well as the databases and computing pipeline and how these can be reused and remixed in other phenomics pipelines. Finally, we will describe the National Data Service workbench, a cloud computing platform that can access the petabyte scale data while supporting reproducible research.
Combustion Aerosol over Marine Stratus: Long Range Transport, Subsidence and Aerosol-Cloud Interactions over the South East Pacific

NASA Astrophysics Data System (ADS)

Clarke, A. D.; Snider, J.; Freitag, S.; Feingold, G.; Campos, T. L.; Breckhovskikh, V.; Kazil, J.

2011-12-01

The worlds largest stratus deck over the South East Pacific (SEP) was a study target for the VOCALS (http://www.eol.ucar.edu/projects/vocals/) experiment in October 2008. Aerosol-cloud interactions were one major goal of several ship and aircraft studies including results from 14 flights of the NCAR C-130 aircraft reported here. Each flight covered about a 1000 km range with multiple profiles and legs below, in and above the Sc deck. Strong aerosol sources along the coast of Chile were expected and found to influence cloud condensation nuclei (CCN) in coastal clouds. However; "rivers" of elevated CO, black carbon (BC) associated with combustion aerosol effective as CCN at <0.3%S were also common in subsiding FT air overlying the extensive Sc deck for over 1000km offshore. This subsidence, linked to the Hadley circulation, brought in aerosol from sources over the western Pacific as well as South America. Observed entrainment of this aerosol appeared linked to cloud related turbulence. When present, this combustion aerosol increased available CCN and decreased effective radius compared to clouds in "clean" MBL air advected from the South Pacific. We hypothesize that this entrainment can help buffer MBL clouds over the SEP against depletion of CCN by drizzle. This may delay transition of closed cell to open cell convection, potentially leading to increased lifetimes of Sc clouds that entrain such aerosol.
Performance analysis of data intensive cloud systems based on data management and replication: a survey

DOE Office of Scientific and Technical Information (OSTI.GOV)

Malik, Saif Ur Rehman; Khan, Samee U.; Ewen, Sam J.

2015-03-14

As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data was created on a daily basis that originated from myriad of sources and applications including mobiledevices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such ‘Data Explosions’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicated, filter, and the like) such large amountmore » of data and identify new ways to analyze large amounts of data for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems as it leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet Every Cloud Service Provider must assure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. In the aforementioned perspective, data replication is used in the cloud for improving the performance (e.g., read and write delay) of applications that access data. Through replication a data intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that are developed by both industrial and research communities. The focus of the survey is to discuss and characterize the existing approaches of data replication and management that tackle the resource usage and QoS provisioning with different levels of efficiencies. Moreover, the breakdown of both influential expressions (data replication and management) to provide different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in the cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.« less
The Use of Computer Vision Algorithms for Automatic Orientation of Terrestrial Laser Scanning Data

NASA Astrophysics Data System (ADS)

Markiewicz, Jakub Stefan

2016-06-01

The paper presents analysis of the orientation of terrestrial laser scanning (TLS) data. In the proposed data processing methodology, point clouds are considered as panoramic images enriched by the depth map. Computer vision (CV) algorithms are used for orientation, which are applied for testing the correctness of the detection of tie points and time of computations, and for assessing difficulties in their implementation. The BRISK, FASRT, MSER, SIFT, SURF, ASIFT and CenSurE algorithms are used to search for key-points. The source data are point clouds acquired using a Z+F 5006h terrestrial laser scanner on the ruins of Iłża Castle, Poland. Algorithms allowing combination of the photogrammetric and CV approaches are also presented.

PyEEG: an open source Python module for EEG/MEG feature extraction.

PubMed

Bao, Forrest Sheng; Liu, Xin; Zhang, Christina

2011-01-01

Computer-aided diagnosis of neural diseases from EEG signals (or other physiological signals that can be treated as time series, e.g., MEG) is an emerging field that has gained much attention in past years. Extracting features is a key component in the analysis of EEG signals. In our previous works, we have implemented many EEG feature extraction functions in the Python programming language. As Python is gaining more ground in scientific computing, an open source Python module for extracting EEG features has the potential to save much time for computational neuroscientists. In this paper, we introduce PyEEG, an open source Python module for EEG feature extraction.
PyEEG: An Open Source Python Module for EEG/MEG Feature Extraction

PubMed Central

Bao, Forrest Sheng; Liu, Xin; Zhang, Christina

2011-01-01

Computer-aided diagnosis of neural diseases from EEG signals (or other physiological signals that can be treated as time series, e.g., MEG) is an emerging field that has gained much attention in past years. Extracting features is a key component in the analysis of EEG signals. In our previous works, we have implemented many EEG feature extraction functions in the Python programming language. As Python is gaining more ground in scientific computing, an open source Python module for extracting EEG features has the potential to save much time for computational neuroscientists. In this paper, we introduce PyEEG, an open source Python module for EEG feature extraction. PMID:21512582
Toward a Big Data Science: A challenge of "Science Cloud"

NASA Astrophysics Data System (ADS)

Murata, Ken T.; Watanabe, Hidenobu

2013-04-01

During these 50 years, along with appearance and development of high-performance computers (and super-computers), numerical simulation is considered to be a third methodology for science, following theoretical (first) and experimental and/or observational (second) approaches. The variety of data yielded by the second approaches has been getting more and more. It is due to the progress of technologies of experiments and observations. The amount of the data generated by the third methodologies has been getting larger and larger. It is because of tremendous development and programming techniques of super computers. Most of the data files created by both experiments/observations and numerical simulations are saved in digital formats and analyzed on computers. The researchers (domain experts) are interested in not only how to make experiments and/or observations or perform numerical simulations, but what information (new findings) to extract from the data. However, data does not usually tell anything about the science; sciences are implicitly hidden in the data. Researchers have to extract information to find new sciences from the data files. This is a basic concept of data intensive (data oriented) science for Big Data. As the scales of experiments and/or observations and numerical simulations get larger, new techniques and facilities are required to extract information from a large amount of data files. The technique is called as informatics as a fourth methodology for new sciences. Any methodologies must work on their facilities: for example, space environment are observed via spacecraft and numerical simulations are performed on super-computers, respectively in space science. The facility of the informatics, which deals with large-scale data, is a computational cloud system for science. This paper is to propose a cloud system for informatics, which has been developed at NICT (National Institute of Information and Communications Technology), Japan. The NICT science cloud, we named as OneSpaceNet (OSN), is the first open cloud system for scientists who are going to carry out their informatics for their own science. The science cloud is not for simple uses. Many functions are expected to the science cloud; such as data standardization, data collection and crawling, large and distributed data storage system, security and reliability, database and meta-database, data stewardship, long-term data preservation, data rescue and preservation, data mining, parallel processing, data publication and provision, semantic web, 3D and 4D visualization, out-reach and in-reach, and capacity buildings. Figure (not shown here) is a schematic picture of the NICT science cloud. Both types of data from observation and simulation are stored in the storage system in the science cloud. It should be noted that there are two types of data in observation. One is from archive site out of the cloud: this is a data to be downloaded through the Internet to the cloud. The other one is data from the equipment directly connected to the science cloud. They are often called as sensor clouds. In the present talk, we first introduce the NICT science cloud. We next demonstrate the efficiency of the science cloud, showing several scientific results which we achieved with this cloud system. Through the discussions and demonstrations, the potential performance of sciences cloud will be revealed for any research fields.
The International Symposium on Grids and Clouds and the Open Grid Forum

NASA Astrophysics Data System (ADS)

The International Symposium on Grids and Clouds 20111 was held at Academia Sinica in Taipei, Taiwan on 19th to 25th March 2011. A series of workshops and tutorials preceded the symposium. The aim of ISGC is to promote the use of grid and cloud computing in the Asia Pacific region. Over the 9 years that ISGC has been running, the programme has evolved to become more user community focused with subjects reaching out to a larger population. Research communities are making widespread use of distributed computing facilities. Linking together data centers, production grids, desktop systems or public clouds, many researchers are able to do more research and produce results more quickly. They could do much more if the computing infrastructures they use worked together more effectively. Changes in the way we approach distributed computing, and new services from commercial providers, mean that boundaries are starting to blur. This opens the way for hybrid solutions that make it easier for researchers to get their job done. Consequently the theme for ISGC2011 was the opportunities that better integrated computing infrastructures can bring, and the steps needed to achieve the vision of a seamless global research infrastructure. 2011 is a year of firsts for ISGC. First the title - while the acronym remains the same, its meaning has changed to reflect the evolution of computing: The International Symposium on Grids and Clouds. Secondly the programming - ISGC 2011 has always included topical workshops and tutorials. But 2011 is the first year that ISGC has been held in conjunction with the Open Grid Forum2 which held its 31st meeting with a series of working group sessions. The ISGC plenary session included keynote speakers from OGF that highlighted the relevance of standards for the research community. ISGC with its focus on applications and operational aspects complemented well with OGF's focus on standards development. ISGC brought to OGF real-life use cases and needs to be addressed while OGF exposed the state of current developments and issues to be resolved if commonalities are to be exploited. Another first is for the Proceedings for 2011, an open access online publishing scheme will ensure these Proceedings will appear more quickly and more people will have access to the results, providing a long-term online archive of the event. The symposium attracted more than 212 participants from 29 countries spanning Asia, Europe and the Americas. Coming so soon after the earthquake and tsunami in Japan, the participation of our Japanese colleagues was particularly appreciated. Keynotes by invited speakers highlighted the impact of distributed computing infrastructures in the social sciences and humanities, high energy physics, earth and life sciences. Plenary sessions entitled Grid Activities in Asia Pacific surveyed the state of grid deployment across 11 Asian countries. Through the parallel sessions, the impact of distributed computing infrastructures in a range of research disciplines was highlighted. Operational procedures, middleware and security aspects were addressed in a dedicated sessions. The symposium was covered online in real-time by the GridCast team from the GridTalk project. A running blog including summarises of specific sessions as well as video interviews with keynote speakers and personalities and photos. As with all regions of the world, grid and cloud computing has to be prove it is adding value to researchers if it is be accepted by them and demonstrate its impact on society as a while if it to be supported by national governments, funding agencies and the general public. ISGC has helped foster the emergence of a strong regional interest in the earth and life sciences, notably for natural disaster mitigation and bioinformatics studies. Prof. Simon C. Lin organised an intense social programme with a gastronomic tour of Taipei culminating with a banquet for all the symposium's participants at the hotel Palais de Chine. I would like to thank all the members of the programme committee, the participants and above all our hosts, Prof. Simon C. Lin and his excellent support team at Academia Sinica. Dr. Bob Jones Programme Chair 1 http://event.twgrid.org/isgc2011/ 2 http://www.gridforum.org/
Improving volcanic sulfur dioxide cloud dispersal forecasts by progressive assimilation of satellite observations

NASA Astrophysics Data System (ADS)

Boichu, Marie; Clarisse, Lieven; Khvorostyanov, Dmitry; Clerbaux, Cathy

2014-04-01

Forecasting the dispersal of volcanic clouds during an eruption is of primary importance, especially for ensuring aviation safety. As volcanic emissions are characterized by rapid variations of emission rate and height, the (generally) high level of uncertainty in the emission parameters represents a critical issue that limits the robustness of volcanic cloud dispersal forecasts. An inverse modeling scheme, combining satellite observations of the volcanic cloud with a regional chemistry-transport model, allows reconstructing this source term at high temporal resolution. We demonstrate here how a progressive assimilation of freshly acquired satellite observations, via such an inverse modeling procedure, allows for delivering robust sulfur dioxide (SO2) cloud dispersal forecasts during the eruption. This approach provides a computationally cheap estimate of the expected location and mass loading of volcanic clouds, including the identification of SO2-rich parts.
Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

PubMed Central

Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

2014-01-01

Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294
Genetic Constructor: An Online DNA Design Platform.

PubMed

Bates, Maxwell; Lachoff, Joe; Meech, Duncan; Zulkower, Valentin; Moisy, Anaïs; Luo, Yisha; Tekotte, Hille; Franziska Scheitz, Cornelia Johanna; Khilari, Rupal; Mazzoldi, Florencio; Chandran, Deepak; Groban, Eli

2017-12-15

Genetic Constructor is a cloud Computer Aided Design (CAD) application developed to support synthetic biologists from design intent through DNA fabrication and experiment iteration. The platform allows users to design, manage, and navigate complex DNA constructs and libraries, using a new visual language that focuses on functional parts abstracted from sequence. Features like combinatorial libraries and automated primer design allow the user to separate design from construction by focusing on functional intent, and design constraints aid iterative refinement of designs. A plugin architecture enables contributions from scientists and coders to leverage existing powerful software and connect to DNA foundries. The software is easily accessible and platform agnostic, free for academics, and available in an open-source community edition. Genetic Constructor seeks to democratize DNA design, manufacture, and access to tools and services from the synthetic biology community.
QMachine: commodity supercomputing in web browsers

PubMed Central

2014-01-01

Background Ongoing advancements in cloud computing provide novel opportunities in scientific computing, especially for distributed workflows. Modern web browsers can now be used as high-performance workstations for querying, processing, and visualizing genomics’ “Big Data” from sources like The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) without local software installation or configuration. The design of QMachine (QM) was driven by the opportunity to use this pervasive computing model in the context of the Web of Linked Data in Biomedicine. Results QM is an open-sourced, publicly available web service that acts as a messaging system for posting tasks and retrieving results over HTTP. The illustrative application described here distributes the analyses of 20 Streptococcus pneumoniae genomes for shared suffixes. Because all analytical and data retrieval tasks are executed by volunteer machines, few server resources are required. Any modern web browser can submit those tasks and/or volunteer to execute them without installing any extra plugins or programs. A client library provides high-level distribution templates including MapReduce. This stark departure from the current reliance on expensive server hardware running “download and install” software has already gathered substantial community interest, as QM received more than 2.2 million API calls from 87 countries in 12 months. Conclusions QM was found adequate to deliver the sort of scalable bioinformatics solutions that computation- and data-intensive workflows require. Paradoxically, the sandboxed execution of code by web browsers was also found to enable them, as compute nodes, to address critical privacy concerns that characterize biomedical environments. PMID:24913605
Open source system OpenVPN in a function of Virtual Private Network

NASA Astrophysics Data System (ADS)

Skendzic, A.; Kovacic, B.

2017-05-01

Using of Virtual Private Networks (VPN) can establish high security level in network communication. VPN technology enables high security networking using distributed or public network infrastructure. VPN uses different security and managing rules inside networks. It can be set up using different communication channels like Internet or separate ISP communication infrastructure. VPN private network makes security communication channel over public network between two endpoints (computers). OpenVPN is an open source software product under GNU General Public License (GPL) that can be used to establish VPN communication between two computers inside business local network over public communication infrastructure. It uses special security protocols and 256-bit Encryption and it is capable of traversing network address translators (NATs) and firewalls. It allows computers to authenticate each other using a pre-shared secret key, certificates or username and password. This work gives review of VPN technology with a special accent on OpenVPN. This paper will also give comparison and financial benefits of using open source VPN software in business environment.
Enabling a Scientific Cloud Marketplace: VGL (Invited)

NASA Astrophysics Data System (ADS)

Fraser, R.; Woodcock, R.; Wyborn, L. A.; Vote, J.; Rankine, T.; Cox, S. J.

2013-12-01

The Virtual Geophysics Laboratory (VGL) provides a flexible, web based environment where researchers can browse data and use a variety of scientific software packaged into tool kits that run in the Cloud. Both data and tool kits are published by multiple researchers and registered with the VGL infrastructure forming a data and application marketplace. The VGL provides the basic work flow of Discovery and Access to the disparate data sources and a Library for tool kits and scripting to drive the scientific codes. Computation is then performed on the Research or Commercial Clouds. Provenance information is collected throughout the work flow and can be published alongside the results allowing for experiment comparison and sharing with other researchers. VGL's "mix and match" approach to data, computational resources and scientific codes, enables a dynamic approach to scientific collaboration. VGL allows scientists to publish their specific contribution, be it data, code, compute or work flow, knowing the VGL framework will provide other components needed for a complete application. Other scientists can choose the pieces that suit them best to assemble an experiment. The coarse grain workflow of the VGL framework combined with the flexibility of the scripting library and computational toolkits allows for significant customisation and sharing amongst the community. The VGL utilises the cloud computational and storage resources from the Australian academic research cloud provided by the NeCTAR initiative and a large variety of data accessible from national and state agencies via the Spatial Information Services Stack (SISS - http://siss.auscope.org). VGL v1.2 screenshot - http://vgl.auscope.org
Open source bioimage informatics for cell biology

PubMed Central

Swedlow, Jason R.; Eliceiri, Kevin W.

2009-01-01

Significant technical advances in imaging, molecular biology and genomics have fueled a revolution in cell biology, in that the molecular and structural processes of the cell are now visualized and measured routinely. Driving much of this recent development has been the advent of computational tools for the acquisition, visualization, analysis and dissemination of these datasets. These tools collectively make up a new subfield of computational biology called bioimage informatics, which is facilitated by open source approaches. We discuss why open source tools for image informatics in cell biology are needed, some of the key general attributes of what make an open source imaging application successful, and point to opportunities for further operability that should greatly accelerate future cell biology discovery. PMID:19833518
SymPy: Symbolic computing in python

DOE PAGES

Meurer, Aaron; Smith, Christopher P.; Paprocki, Mateusz; ...

2017-01-02

Here, SymPy is a full featured computer algebra system (CAS) written in the Python programming language. It is open source, being licensed under the extremely permissive 3-clause BSD license. SymPy was started by Ondrej Certik in 2005, and it has since grown into a large open source project, with over 500 contributors.
Arkas: Rapid reproducible RNAseq analysis

PubMed Central

Colombo, Anthony R.; J. Triche Jr, Timothy; Ramsingh, Giridharan

2017-01-01

The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer cloud-scale RNAseq pipelines Arkas-Quantification, and Arkas-Analysis available within Illumina’s BaseSpace cloud application platform which expedites Kallisto preparatory routines, reliably calculates differential expression, and performs gene-set enrichment of REACTOME pathways . Due to inherit inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributive environment improving data management services and data importing. Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application titled SRA Import. Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, calculates the differential expression and gene-set enrichment analysis on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from the SRA Import facilitating raw sequencing importing, SRA FASTQ conversion, RNA quantification and analysis steps. PMID:28868134
Cloud Computing for radiologists.

PubMed

Kharat, Amit T; Safvi, Amjad; Thind, Ss; Singh, Amarjit

2012-07-01

Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future.
Cloud Computing for radiologists

PubMed Central

Kharat, Amit T; Safvi, Amjad; Thind, SS; Singh, Amarjit

2012-01-01

Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future. PMID:23599560
Comparison of the different approaches to generate holograms from data acquired with a Kinect sensor

NASA Astrophysics Data System (ADS)

Kang, Ji-Hoon; Leportier, Thibault; Ju, Byeong-Kwon; Song, Jin Dong; Lee, Kwang-Hoon; Park, Min-Chul

2017-05-01

Data of real scenes acquired in real-time with a Kinect sensor can be processed with different approaches to generate a hologram. 3D models can be generated from a point cloud or a mesh representation. The advantage of the point cloud approach is that computation process is well established since it involves only diffraction and propagation of point sources between parallel planes. On the other hand, the mesh representation enables to reduce the number of elements necessary to represent the object. Then, even though the computation time for the contribution of a single element increases compared to a simple point, the total computation time can be reduced significantly. However, the algorithm is more complex since propagation of elemental polygons between non-parallel planes should be implemented. Finally, since a depth map of the scene is acquired at the same time than the intensity image, a depth layer approach can also be adopted. This technique is appropriate for a fast computation since propagation of an optical wavefront from one plane to another can be handled efficiently with the fast Fourier transform. Fast computation with depth layer approach is convenient for real time applications, but point cloud method is more appropriate when high resolution is needed. In this study, since Kinect can be used to obtain both point cloud and depth map, we examine the different approaches that can be adopted for hologram computation and compare their performance.
The Impact and Promise of Open-Source Computational Material for Physics Teaching

NASA Astrophysics Data System (ADS)

Christian, Wolfgang

2017-01-01

A computer-based modeling approach to teaching must be flexible because students and teachers have different skills and varying levels of preparation. Learning how to run the ``software du jour'' is not the objective for integrating computational physics material into the curriculum. Learning computational thinking, how to use computation and computer-based visualization to communicate ideas, how to design and build models, and how to use ready-to-run models to foster critical thinking is the objective. Our computational modeling approach to teaching is a research-proven pedagogy that predates computers. It attempts to enhance student achievement through the Modeling Cycle. This approach was pioneered by Robert Karplus and the SCIS Project in the 1960s and 70s and later extended by the Modeling Instruction Program led by Jane Jackson and David Hestenes at Arizona State University. This talk describes a no-cost open-source computational approach aligned with a Modeling Cycle pedagogy. Our tools, curricular material, and ready-to-run examples are freely available from the Open Source Physics Collection hosted on the AAPT-ComPADRE digital library. Examples will be presented.
Learner Ownership of Technology-Enhanced Learning

ERIC Educational Resources Information Center

Dommett, Eleanor J.

2018-01-01

Purpose: This paper aims to examine the different ways in which learners may have ownership over technology-enhanced learning by reflecting on technical, legal and psychological ownership. Design/methodology/approach: The paper uses a variety of examples of technology-enhanced learning ranging from open-source software to cloud storage to discuss…
Progress on the FabrIc for Frontier Experiments project at Fermilab

DOE PAGES

Box, Dennis; Boyd, Joseph; Dykstra, Dave; ...

2015-12-23

The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercialmore » cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, flexible data transfer client, and access to opportunistic resources on the Open Science Grid. Hence, the progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide« less
Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

PubMed Central

Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

2015-01-01

Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

Analytical optical scattering in clouds

NASA Technical Reports Server (NTRS)

Phanord, Dieudonne D.

1989-01-01

An analytical optical model for scattering of light due to lightning by clouds of different geometry is being developed. The self-consistent approach and the equivalent medium concept of Twersky was used to treat the case corresponding to outside illumination. Thus, the resulting multiple scattering problem is transformed with the knowledge of the bulk parameters, into scattering by a single obstacle in isolation. Based on the size parameter of a typical water droplet as compared to the incident wave length, the problem for the single scatterer equivalent to the distribution of cloud particles can be solved either by Mie or Rayleigh scattering theory. The super computing code of Wiscombe can be used immediately to produce results that can be compared to the Monte Carlo computer simulation for outside incidence. A fairly reasonable inverse approach using the solution of the outside illumination case was proposed to model analytically the situation for point sources located inside the thick optical cloud. Its mathematical details are still being investigated. When finished, it will provide scientists an enhanced capability to study more realistic clouds. For testing purposes, the direct approach to the inside illumination of clouds by lightning is under consideration. Presently, an analytical solution for the cubic cloud will soon be obtained. For cylindrical or spherical clouds, preliminary results are needed for scattering by bounded obstacles above or below a penetrable surface interface.
The ISB Cancer Genomics Cloud: A Flexible Cloud-Based Platform for Cancer Genomics Research.

PubMed

Reynolds, Sheila M; Miller, Michael; Lee, Phyliss; Leinonen, Kalle; Paquette, Suzanne M; Rodebaugh, Zack; Hahn, Abigail; Gibbs, David L; Slagel, Joseph; Longabaugh, William J; Dhankani, Varsha; Reyes, Madelyn; Pihl, Todd; Backus, Mark; Bookman, Matthew; Deflaux, Nicole; Bingham, Jonathan; Pot, David; Shmulevich, Ilya

2017-11-01

The ISB Cancer Genomics Cloud (ISB-CGC) is one of three pilot projects funded by the National Cancer Institute to explore new approaches to computing on large cancer datasets in a cloud environment. With a focus on Data as a Service, the ISB-CGC offers multiple avenues for accessing and analyzing The Cancer Genome Atlas, TARGET, and other important references such as GENCODE and COSMIC using the Google Cloud Platform. The open approach allows researchers to choose approaches best suited to the task at hand: from analyzing terabytes of data using complex workflows to developing new analysis methods in common languages such as Python, R, and SQL; to using an interactive web application to create synthetic patient cohorts and to explore the wealth of available genomic data. Links to resources and documentation can be found at www.isb-cgc.org Cancer Res; 77(21); e7-10. ©2017 AACR . ©2017 American Association for Cancer Research.
Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics

PubMed Central

Deutsch, Eric W.; Mendoza, Luis; Shteynberg, David; Slagel, Joseph; Sun, Zhi; Moritz, Robert L.

2015-01-01

Democratization of genomics technologies has enabled the rapid determination of genotypes. More recently the democratization of comprehensive proteomics technologies is enabling the determination of the cellular phenotype and the molecular events that define its dynamic state. Core proteomic technologies include mass spectrometry to define protein sequence, protein:protein interactions, and protein post-translational modifications. Key enabling technologies for proteomics are bioinformatic pipelines to identify, quantitate, and summarize these events. The Trans-Proteomics Pipeline (TPP) is a robust open-source standardized data processing pipeline for large-scale reproducible quantitative mass spectrometry proteomics. It supports all major operating systems and instrument vendors via open data formats. Here we provide a review of the overall proteomics workflow supported by the TPP, its major tools, and how it can be used in its various modes from desktop to cloud computing. We describe new features for the TPP, including data visualization functionality. We conclude by describing some common perils that affect the analysis of tandem mass spectrometry datasets, as well as some major upcoming features. PMID:25631240
Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics.

PubMed

Deutsch, Eric W; Mendoza, Luis; Shteynberg, David; Slagel, Joseph; Sun, Zhi; Moritz, Robert L

2015-08-01

Democratization of genomics technologies has enabled the rapid determination of genotypes. More recently the democratization of comprehensive proteomics technologies is enabling the determination of the cellular phenotype and the molecular events that define its dynamic state. Core proteomic technologies include MS to define protein sequence, protein:protein interactions, and protein PTMs. Key enabling technologies for proteomics are bioinformatic pipelines to identify, quantitate, and summarize these events. The Trans-Proteomics Pipeline (TPP) is a robust open-source standardized data processing pipeline for large-scale reproducible quantitative MS proteomics. It supports all major operating systems and instrument vendors via open data formats. Here, we provide a review of the overall proteomics workflow supported by the TPP, its major tools, and how it can be used in its various modes from desktop to cloud computing. We describe new features for the TPP, including data visualization functionality. We conclude by describing some common perils that affect the analysis of MS/MS datasets, as well as some major upcoming features. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Uncover the Cloud for Geospatial Sciences and Applications to Adopt Cloud Computing

NASA Astrophysics Data System (ADS)

Yang, C.; Huang, Q.; Xia, J.; Liu, K.; Li, J.; Xu, C.; Sun, M.; Bambacus, M.; Xu, Y.; Fay, D.

2012-12-01

Cloud computing is emerging as the future infrastructure for providing computing resources to support and enable scientific research, engineering development, and application construction, as well as work force education. On the other hand, there is a lot of doubt about the readiness of cloud computing to support a variety of scientific research, development and educations. This research is a project funded by NASA SMD to investigate through holistic studies how ready is the cloud computing to support geosciences. Four applications with different computing characteristics including data, computing, concurrent, and spatiotemporal intensities are taken to test the readiness of cloud computing to support geosciences. Three popular and representative cloud platforms including Amazon EC2, Microsoft Azure, and NASA Nebula as well as a traditional cluster are utilized in the study. Results illustrates that cloud is ready to some degree but more research needs to be done to fully implemented the cloud benefit as advertised by many vendors and defined by NIST. Specifically, 1) most cloud platform could help stand up new computing instances, a new computer, in a few minutes as envisioned, therefore, is ready to support most computing needs in an on demand fashion; 2) the load balance and elasticity, a defining characteristic, is ready in some cloud platforms, such as Amazon EC2, to support bigger jobs, e.g., needs response in minutes, while some are not ready to support the elasticity and load balance well. All cloud platform needs further research and development to support real time application at subminute level; 3) the user interface and functionality of cloud platforms vary a lot and some of them are very professional and well supported/documented, such as Amazon EC2, some of them needs significant improvement for the general public to adopt cloud computing without professional training or knowledge about computing infrastructure; 4) the security is a big concern in cloud computing platform, with the sharing spirit of cloud computing, it is very hard to ensure higher level security, except a private cloud is built for a specific organization without public access, public cloud platform does not support FISMA medium level yet and may never be able to support FISMA high level; 5) HPC jobs needs of cloud computing is not well supported and only Amazon EC2 supports this well. The research is being taken by NASA and other agencies to consider cloud computing adoption. We hope the publication of the research would also benefit the public to adopt cloud computing.
Virtual pools for interactive analysis and software development through an integrated Cloud environment

NASA Astrophysics Data System (ADS)

Grandi, C.; Italiano, A.; Salomoni, D.; Calabrese Melcarne, A. K.

2011-12-01

WNoDeS, an acronym for Worker Nodes on Demand Service, is software developed at CNAF-Tier1, the National Computing Centre of the Italian Institute for Nuclear Physics (INFN) located in Bologna. WNoDeS provides on demand, integrated access to both Grid and Cloud resources through virtualization technologies. Besides the traditional use of computing resources in batch mode, users need to have interactive and local access to a number of systems. WNoDeS can dynamically select these computers instantiating Virtual Machines, according to the requirements (computing, storage and network resources) of users through either the Open Cloud Computing Interface API, or through a web console. An interactive use is usually limited to activities in user space, i.e. where the machine configuration is not modified. In some other instances the activity concerns development and testing of services and thus implies the modification of the system configuration (and, therefore, root-access to the resource). The former use case is a simple extension of the WNoDeS approach, where the resource is provided in interactive mode. The latter implies saving the virtual image at the end of each user session so that it can be presented to the user at subsequent requests. This work describes how the LHC experiments at INFN-Bologna are testing and making use of these dynamically created ad-hoc machines via WNoDeS to support flexible, interactive analysis and software development at the INFN Tier-1 Computing Centre.
Evaluation of Arctic Clouds And Their Response to External Forcing in Climate Models

NASA Astrophysics Data System (ADS)

Wang, Y.; Jiang, J. H.; Ming, Y.; Su, H.; Yung, Y. L.

2017-12-01

A warming Arctic is undergoing significant environmental changes, mostly evidenced by the reduction in Arctic sea-ice extent (SIE). However, the role of Arctic clouds in determining the sea ice melting remains elusive, as different phases of clouds can induce either positive or negative radiative forcing in different seasons. The possible cloud feedbacks following the opened ocean surface are also debatable due to variations of polar boundary structure. Therefore, Arctic cloud simulation has long been considered as the largest source of uncertainty in the climate sensitivity assessment. Other local or remote atmospheric factors, such as poleward moisture and heat transport as well as atmospheric aerosols seeding liquid and ice clouds, further complicate our understanding of the Arctic cloud change. Our recent efforts focus on the post-CMIP5 and CMIP6 models, which improve atmospheric compositions, cloud macro- and microphysics, convection parameterizations, etc. In this study, we utilize long-term satellite measurements with high-resolution coverage and broad wavelength spectrum to evaluate the mean states and variations of mixed-phase clouds in the Arctic, along with the concurrent moisture and SIE measurements. The model sensitivity experiments to understand external perturbations on the atmosphere-cryosphere coupling in the Arctic will be presented.
Elastic extension of a local analysis facility on external clouds for the LHC experiments

NASA Astrophysics Data System (ADS)

Ciaschini, V.; Codispoti, G.; Rinaldi, L.; Aiftimiei, D. C.; Bonacorsi, D.; Calligola, P.; Dal Pra, S.; De Girolamo, D.; Di Maria, R.; Grandi, C.; Michelotto, D.; Panella, M.; Taneja, S.; Semeria, F.

2017-10-01

The computing infrastructures serving the LHC experiments have been designed to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however originate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, the LHC experiments are exploring the opportunity to access Cloud resources provided by external partners or commercial providers. In this work we present the proof of concept of the elastic extension of a local analysis facility, specifically the Bologna Tier-3 Grid site, for the LHC experiments hosted at the site, on an external OpenStack infrastructure. We focus on the Cloud Bursting of the Grid site using DynFarm, a newly designed tool that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on an OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time they serve as an extension of the farm for the local usage.
Biotic games and cloud experimentation as novel media for biophysics education

NASA Astrophysics Data System (ADS)

Riedel-Kruse, Ingmar; Blikstein, Paulo

2014-03-01

First-hand, open-ended experimentation is key for effective formal and informal biophysics education. We developed, tested and assessed multiple new platforms that enable students and children to directly interact with and learn about microscopic biophysical processes: (1) Biotic games that enable local and online play using galvano- and photo-tactic stimulation of micro-swimmers, illustrating concepts such as biased random walks, Low Reynolds number hydrodynamics, and Brownian motion; (2) an undergraduate course where students learn optics, electronics, micro-fluidics, real time image analysis, and instrument control by building biotic games; and (3) a graduate class on the biophysics of multi-cellular systems that contains a cloud experimentation lab enabling students to execute open-ended chemotaxis experiments on slimemolds online, analyze their data, and build biophysical models. Our work aims to generate the equivalent excitement and educational impact for biophysics as robotics and video games have had for mechatronics and computer science, respectively. We also discuss how scaled-up cloud experimentation systems can support MOOCs with true lab components and life-science research in general.
"helix Nebula - the Science Cloud", a European Science Driven Cross-Domain Initiative Implemented in via AN Active Ppp Set-Up

NASA Astrophysics Data System (ADS)

Lengert, W.; Mondon, E.; Bégin, M. E.; Ferrer, M.; Vallois, F.; DelaMar, J.

2015-12-01

Helix Nebula, a European science cross-domain initiative building on an active PPP, is aiming to implement the concept of an open science commons[1] while using a cloud hybrid model[2] as the proposed implementation solution. This approach allows leveraging and merging of complementary data intensive Earth Science disciplines (e.g. instrumentation[3] and modeling), without introducing significant changes in the contributors' operational set-up. Considering the seamless integration with life-science (e.g. EMBL), scientific exploitation of meteorological, climate, and Earth Observation data and models open an enormous potential for new big data science. The work of Helix Nebula has shown that is it feasible to interoperate publicly funded infrastructures, such as EGI [5] and GEANT [6], with commercial cloud services. Such hybrid systems are in the interest of the existing users of publicly funded infrastructures and funding agencies because they will provide "freedom and choice" over the type of computing resources to be consumed and the manner in which they can be obtained. But to offer such freedom and choice across a spectrum of suppliers, various issues such as intellectual property, legal responsibility, service quality agreements and related issues need to be addressed. Finding solutions to these issues is one of the goals of the Helix Nebula initiative. [1] http://www.egi.eu/news-and-media/publications/OpenScienceCommons_v3.pdf [2] http://www.helix-nebula.eu/events/towards-the-european-open-science-cloud [3] e.g. https://sentinel.esa.int/web/sentinel/sentinel-data-access [5] http://www.egi.eu/ [6] http://www.geant.net/
E-Learning 3.0 = E-Learning 2.0 + Web 3.0?

ERIC Educational Resources Information Center

Hussain, Fehmida

2012-01-01

Web 3.0, termed as the semantic web or the web of data is the transformed version of Web 2.0 with technologies and functionalities such as intelligent collaborative filtering, cloud computing, big data, linked data, openness, interoperability and smart mobility. If Web 2.0 is about social networking and mass collaboration between the creator and…
Open Courseware in Design and Planning Education and Utilization of Distance Education Opportunity: Anadolu University Experience

ERIC Educational Resources Information Center

Halac, Hicran Hanim; Cabuk, Alper

2013-01-01

Depending on the evolving technological possibilities, distance and online education applications have gradually gained more significance in the education system. Regarding the issues, such as advancements in the server services, disc capacity, cloud computing opportunities resulting from the increase in the number of the broadband internet users,…
Toward low-cloud-permitting cloud superparameterization with explicit boundary layer turbulence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parishani, Hossein; Pritchard, Michael S.; Bretherton, Christopher S.

Systematic biases in the representation of boundary layer (BL) clouds are a leading source of uncertainty in climate projections. A variation on superparameterization (SP) called “ultraparameterization” (UP) is developed, in which the grid spacing of the cloud-resolving models (CRMs) is fine enough (250 × 20 m) to explicitly capture the BL turbulence, associated clouds, and entrainment in a global climate model capable of multiyear simulations. UP is implemented within the Community Atmosphere Model using 2° resolution (~14,000 embedded CRMs) with one-moment microphysics. By using a small domain and mean-state acceleration, UP is computationally feasible today and promising for exascale computers.more » Short-duration global UP hindcasts are compared with SP and satellite observations of top-of-atmosphere radiation and cloud vertical structure. The most encouraging improvement is a deeper BL and more realistic vertical structure of subtropical stratocumulus (Sc) clouds, due to stronger vertical eddy motions that promote entrainment. Results from 90 day integrations show climatological errors that are competitive with SP, with a significant improvement in the diurnal cycle of offshore Sc liquid water. Ongoing concerns with the current UP implementation include a dim bias for near-coastal Sc that also occurs less prominently in SP and a bright bias over tropical continental deep convection zones. Nevertheless, UP makes global eddy-permitting simulation a feasible and interesting alternative to conventionally parameterized GCMs or SP-GCMs with turbulence parameterizations for studying BL cloud-climate and cloud-aerosol feedback.« less
Toward low-cloud-permitting cloud superparameterization with explicit boundary layer turbulence

DOE PAGES

Parishani, Hossein; Pritchard, Michael S.; Bretherton, Christopher S.; ...

2017-06-19

Systematic biases in the representation of boundary layer (BL) clouds are a leading source of uncertainty in climate projections. A variation on superparameterization (SP) called “ultraparameterization” (UP) is developed, in which the grid spacing of the cloud-resolving models (CRMs) is fine enough (250 × 20 m) to explicitly capture the BL turbulence, associated clouds, and entrainment in a global climate model capable of multiyear simulations. UP is implemented within the Community Atmosphere Model using 2° resolution (~14,000 embedded CRMs) with one-moment microphysics. By using a small domain and mean-state acceleration, UP is computationally feasible today and promising for exascale computers.more » Short-duration global UP hindcasts are compared with SP and satellite observations of top-of-atmosphere radiation and cloud vertical structure. The most encouraging improvement is a deeper BL and more realistic vertical structure of subtropical stratocumulus (Sc) clouds, due to stronger vertical eddy motions that promote entrainment. Results from 90 day integrations show climatological errors that are competitive with SP, with a significant improvement in the diurnal cycle of offshore Sc liquid water. Ongoing concerns with the current UP implementation include a dim bias for near-coastal Sc that also occurs less prominently in SP and a bright bias over tropical continental deep convection zones. Nevertheless, UP makes global eddy-permitting simulation a feasible and interesting alternative to conventionally parameterized GCMs or SP-GCMs with turbulence parameterizations for studying BL cloud-climate and cloud-aerosol feedback.« less
Toward low-cloud-permitting cloud superparameterization with explicit boundary layer turbulence

NASA Astrophysics Data System (ADS)

Parishani, Hossein; Pritchard, Michael S.; Bretherton, Christopher S.; Wyant, Matthew C.; Khairoutdinov, Marat

2017-07-01

Systematic biases in the representation of boundary layer (BL) clouds are a leading source of uncertainty in climate projections. A variation on superparameterization (SP) called "ultraparameterization" (UP) is developed, in which the grid spacing of the cloud-resolving models (CRMs) is fine enough (250 × 20 m) to explicitly capture the BL turbulence, associated clouds, and entrainment in a global climate model capable of multiyear simulations. UP is implemented within the Community Atmosphere Model using 2° resolution (˜14,000 embedded CRMs) with one-moment microphysics. By using a small domain and mean-state acceleration, UP is computationally feasible today and promising for exascale computers. Short-duration global UP hindcasts are compared with SP and satellite observations of top-of-atmosphere radiation and cloud vertical structure. The most encouraging improvement is a deeper BL and more realistic vertical structure of subtropical stratocumulus (Sc) clouds, due to stronger vertical eddy motions that promote entrainment. Results from 90 day integrations show climatological errors that are competitive with SP, with a significant improvement in the diurnal cycle of offshore Sc liquid water. Ongoing concerns with the current UP implementation include a dim bias for near-coastal Sc that also occurs less prominently in SP and a bright bias over tropical continental deep convection zones. Nevertheless, UP makes global eddy-permitting simulation a feasible and interesting alternative to conventionally parameterized GCMs or SP-GCMs with turbulence parameterizations for studying BL cloud-climate and cloud-aerosol feedback.
Do Clouds Compute? A Framework for Estimating the Value of Cloud Computing

NASA Astrophysics Data System (ADS)

Klems, Markus; Nimis, Jens; Tai, Stefan

On-demand provisioning of scalable and reliable compute services, along with a cost model that charges consumers based on actual service usage, has been an objective in distributed computing research and industry for a while. Cloud Computing promises to deliver on this objective: consumers are able to rent infrastructure in the Cloud as needed, deploy applications and store data, and access them via Web protocols on a pay-per-use basis. The acceptance of Cloud Computing, however, depends on the ability for Cloud Computing providers and consumers to implement a model for business value co-creation. Therefore, a systematic approach to measure costs and benefits of Cloud Computing is needed. In this paper, we discuss the need for valuation of Cloud Computing, identify key components, and structure these components in a framework. The framework assists decision makers in estimating Cloud Computing costs and to compare these costs to conventional IT solutions. We demonstrate by means of representative use cases how our framework can be applied to real world scenarios.
Cloud Computing and Its Applications in GIS

NASA Astrophysics Data System (ADS)

Kang, Cao

2011-12-01

Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
Commissioning the CERN IT Agile Infrastructure with experiment workloads

NASA Astrophysics Data System (ADS)

Medrano Llamas, Ramón; Harald Barreiro Megino, Fernando; Kucharczyk, Katarzyna; Kamil Denis, Marek; Cinquilli, Mattia

2014-06-01

In order to ease the management of their infrastructure, most of the WLCG sites are adopting cloud based strategies. In the case of CERN, the Tier 0 of the WLCG, is completely restructuring the resource and configuration management of their computing center under the codename Agile Infrastructure. Its goal is to manage 15,000 Virtual Machines by means of an OpenStack middleware in order to unify all the resources in CERN's two datacenters: the one placed in Meyrin and the new on in Wigner, Hungary. During the commissioning of this infrastructure, CERN IT is offering an attractive amount of computing resources to the experiments (800 cores for ATLAS and CMS) through a private cloud interface. ATLAS and CMS have joined forces to exploit them by running stress tests and simulation workloads since November 2012. This work will describe the experience of the first deployments of the current experiment workloads on the CERN private cloud testbed. The paper is organized as follows: the first section will explain the integration of the experiment workload management systems (WMS) with the cloud resources. The second section will revisit the performance and stress testing performed with HammerCloud in order to evaluate and compare the suitability for the experiment workloads. The third section will go deeper into the dynamic provisioning techniques, such as the use of the cloud APIs directly by the WMS. The paper finishes with a review of the conclusions and the challenges ahead.
Human genome and open source: balancing ethics and business.

PubMed

Marturano, Antonio

2011-01-01

The Human Genome Project has been completed thanks to a massive use of computer techniques, as well as the adoption of the open-source business and research model by the scientists involved. This model won over the proprietary model and allowed a quick propagation and feedback of research results among peers. In this paper, the author will analyse some ethical and legal issues emerging by the use of such computer model in the Human Genome property rights. The author will argue that the Open Source is the best business model, as it is able to balance business and human rights perspectives.
IBM Cloud Computing Powering a Smarter Planet

NASA Astrophysics Data System (ADS)

Zhu, Jinzy; Fang, Xing; Guo, Zhe; Niu, Meng Hua; Cao, Fan; Yue, Shuang; Liu, Qin Yu

With increasing need for intelligent systems supporting the world's businesses, Cloud Computing has emerged as a dominant trend to provide a dynamic infrastructure to make such intelligence possible. The article introduced how to build a smarter planet with cloud computing technology. First, it introduced why we need cloud, and the evolution of cloud technology. Secondly, it analyzed the value of cloud computing and how to apply cloud technology. Finally, it predicted the future of cloud in the smarter planet.

Molgenis-impute: imputation pipeline in a box.

PubMed

Kanterakis, Alexandros; Deelen, Patrick; van Dijk, Freerk; Byelas, Heorhiy; Dijkstra, Martijn; Swertz, Morris A

2015-08-19

Genotype imputation is an important procedure in current genomic analysis such as genome-wide association studies, meta-analyses and fine mapping. Although high quality tools are available that perform the steps of this process, considerable effort and expertise is required to set up and run a best practice imputation pipeline, particularly for larger genotype datasets, where imputation has to scale out in parallel on computer clusters. Here we present MOLGENIS-impute, an 'imputation in a box' solution that seamlessly and transparently automates the set up and running of all the steps of the imputation process. These steps include genome build liftover (liftovering), genotype phasing with SHAPEIT2, quality control, sample and chromosomal chunking/merging, and imputation with IMPUTE2. MOLGENIS-impute builds on MOLGENIS-compute, a simple pipeline management platform for submission and monitoring of bioinformatics tasks in High Performance Computing (HPC) environments like local/cloud servers, clusters and grids. All the required tools, data and scripts are downloaded and installed in a single step. Researchers with diverse backgrounds and expertise have tested MOLGENIS-impute on different locations and imputed over 30,000 samples so far using the 1,000 Genomes Project and new Genome of the Netherlands data as the imputation reference. The tests have been performed on PBS/SGE clusters, cloud VMs and in a grid HPC environment. MOLGENIS-impute gives priority to the ease of setting up, configuring and running an imputation. It has minimal dependencies and wraps the pipeline in a simple command line interface, without sacrificing flexibility to adapt or limiting the options of underlying imputation tools. It does not require knowledge of a workflow system or programming, and is targeted at researchers who just want to apply best practices in imputation via simple commands. It is built on the MOLGENIS compute workflow framework to enable customization with additional computational steps or it can be included in other bioinformatics pipelines. It is available as open source from: https://github.com/molgenis/molgenis-imputation.
Cloud Computing Security Issue: Survey

NASA Astrophysics Data System (ADS)

Kamal, Shailza; Kaur, Rajpreet

2011-12-01

Cloud computing is the growing field in IT industry since 2007 proposed by IBM. Another company like Google, Amazon, and Microsoft provides further products to cloud computing. The cloud computing is the internet based computing that shared recourses, information on demand. It provides the services like SaaS, IaaS and PaaS. The services and recourses are shared by virtualization that run multiple operation applications on cloud computing. This discussion gives the survey on the challenges on security issues during cloud computing and describes some standards and protocols that presents how security can be managed.
T-Check in System-of-Systems Technologies: Cloud Computing

DTIC Science & Technology

2010-09-01

T-Check in System-of-Systems Technologies: Cloud Computing Harrison D. Strowd Grace A. Lewis September 2010 TECHNICAL NOTE CMU/SEI-2010... Cloud Computing 1 1.2 Types of Cloud Computing 2 1.3 Drivers and Barriers to Cloud Computing Adoption 5 2 Using the T-Check Method 7 2.1 T-Check...Hypothesis 3 25 3.4.2 Deployment View of the Solution for Testing Hypothesis 3 27 3.5 Selecting Cloud Computing Providers 30 3.6 Implementing the T-Check
Identity federation in OpenStack - an introduction to hybrid clouds

NASA Astrophysics Data System (ADS)

Denis, Marek; Castro Leon, Jose; Ormancey, Emmanuel; Tedesco, Paolo

2015-12-01

We are evaluating cloud identity federation available in the OpenStack ecosystem that allows for on premise bursting into remote clouds with use of local identities (i.e. domain accounts). Further enhancements to identity federation are a clear way to hybrid cloud architectures - virtualized infrastructures layered across independent private and public clouds.
Influence of Ice Cloud Microphysics on Imager-Based Estimates of Earth's Radiation Budget

NASA Astrophysics Data System (ADS)

Loeb, N. G.; Kato, S.; Minnis, P.; Yang, P.; Sun-Mack, S.; Rose, F. G.; Hong, G.; Ham, S. H.

2016-12-01

A central objective of the Clouds and the Earth's Radiant Energy System (CERES) is to produce a long-term global climate data record of Earth's radiation budget from the TOA down to the surface along with the associated atmospheric and surface properties that influence it. CERES relies on a number of data sources, including broadband radiometers measuring incoming and reflected solar radiation and OLR, high-resolution spectral imagers, meteorological, aerosol and ozone assimilation data, and snow/sea-ice maps based on microwave radiometer data. While the TOA radiation budget is largely determined directly from accurate broadband radiometer measurements, the surface radiation budget is derived indirectly through radiative transfer model calculations initialized using imager-based cloud and aerosol retrievals and meteorological assimilation data. Because ice cloud particles exhibit a wide range of shapes, sizes and habits that cannot be independently retrieved a priori from passive visible/infrared imager measurements, assumptions about the scattering properties of ice clouds are necessary in order to retrieve ice cloud optical properties (e.g., optical depth) from imager radiances and to compute broadband radiative fluxes. This presentation will examine how the choice of an ice cloud particle model impacts computed shortwave (SW) radiative fluxes at the top-of-atmosphere (TOA) and surface. The ice cloud particle models considered correspond to those from prior, current and future CERES data product versions. During the CERES Edition2 (and Edition3) processing, ice cloud particles were assumed to be smooth hexagonal columns. In the Edition4, roughened hexagonal columns are assumed. The CERES team is now working on implementing in a future version an ice cloud particle model comprised of a two-habit ice cloud model consisting of roughened hexagonal columns and aggregates of roughened columnar elements. In each case, we use the same ice particle model in both the imager-based cloud retrievals (inverse problem) and the computed radiative fluxes (forward calculation). In addition to comparing radiative fluxes using the different ice cloud particle models, we also compare instantaneous TOA flux calculations with those observed by the CERES instrument.
Information Security: Governmentwide Guidance Needed to Assist Agencies in Implementing Cloud Computing

DTIC Science & Technology

2010-07-01

Cloud computing , an emerging form of computing in which users have access to scalable, on-demand capabilities that are provided through Internet... cloud computing , (2) the information security implications of using cloud computing services in the Federal Government, and (3) federal guidance and...efforts to address information security when using cloud computing . The complete report is titled Information Security: Federal Guidance Needed to
GES DISC Data Recipes in Jupyter Notebooks

NASA Astrophysics Data System (ADS)

Li, A.; Banavige, B.; Garimella, K.; Rice, J.; Shen, S.; Liu, Z.

2017-12-01

The Earth Science Data and Information System (ESDIS) Project manages twelve Distributed Active Archive Centers (DAACs) which are geographically dispersed across the United States. The DAACs are responsible for ingesting, processing, archiving, and distributing Earth science data produced from various sources (satellites, aircraft, field measurements, etc.). In response to projections of an exponential increase in data production, there has been a recent effort to prototype various DAAC activities in the cloud computing environment. This, in turn, led to the creation of an initiative, called the Cloud Analysis Toolkit to Enable Earth Science (CATEES), to develop a Python software package in order to transition Earth science data processing to the cloud. This project, in particular, supports CATEES and has two primary goals. One, transition data recipes created by the Goddard Earth Science Data and Information Service Center (GES DISC) DAAC into an interactive and educational environment using Jupyter Notebooks. Two, acclimate Earth scientists to cloud computing. To accomplish these goals, we create Jupyter Notebooks to compartmentalize the different steps of data analysis and help users obtain and parse data from the command line. We also develop a Docker container, comprised of Jupyter Notebooks, Python library dependencies, and command line tools, and configure it into an easy to deploy package. The end result is an end-to-end product that simulates the use case of end users working in the cloud computing environment.
Risk in the Clouds?: Security Issues Facing Government Use of Cloud Computing

NASA Astrophysics Data System (ADS)

Wyld, David C.

Cloud computing is poised to become one of the most important and fundamental shifts in how computing is consumed and used. Forecasts show that government will play a lead role in adopting cloud computing - for data storage, applications, and processing power, as IT executives seek to maximize their returns on limited procurement budgets in these challenging economic times. After an overview of the cloud computing concept, this article explores the security issues facing public sector use of cloud computing and looks to the risk and benefits of shifting to cloud-based models. It concludes with an analysis of the challenges that lie ahead for government use of cloud resources.
A Review Study on Cloud Computing Issues

NASA Astrophysics Data System (ADS)

Kanaan Kadhim, Qusay; Yusof, Robiah; Sadeq Mahdi, Hamid; Al-shami, Sayed Samer Ali; Rahayu Selamat, Siti

2018-05-01

Cloud computing is the most promising current implementation of utility computing in the business world, because it provides some key features over classic utility computing, such as elasticity to allow clients dynamically scale-up and scale-down the resources in execution time. Nevertheless, cloud computing is still in its premature stage and experiences lack of standardization. The security issues are the main challenges to cloud computing adoption. Thus, critical industries such as government organizations (ministries) are reluctant to trust cloud computing due to the fear of losing their sensitive data, as it resides on the cloud with no knowledge of data location and lack of transparency of Cloud Service Providers (CSPs) mechanisms used to secure their data and applications which have created a barrier against adopting this agile computing paradigm. This study aims to review and classify the issues that surround the implementation of cloud computing which a hot area that needs to be addressed by future research.
NeuronDepot: keeping your colleagues in sync by combining modern cloud storage services, the local file system, and simple web applications

PubMed Central

Rautenberg, Philipp L.; Kumaraswamy, Ajayrama; Tejero-Cantero, Alvaro; Doblander, Christoph; Norouzian, Mohammad R.; Kai, Kazuki; Jacobsen, Hans-Arno; Ai, Hiroyuki; Wachtler, Thomas; Ikeno, Hidetoshi

2014-01-01

Neuroscience today deals with a “data deluge” derived from the availability of high-throughput sensors of brain structure and brain activity, and increased computational resources for detailed simulations with complex output. We report here (1) a novel approach to data sharing between collaborating scientists that brings together file system tools and cloud technologies, (2) a service implementing this approach, called NeuronDepot, and (3) an example application of the service to a complex use case in the neurosciences. The main drivers for our approach are to facilitate collaborations with a transparent, automated data flow that shields scientists from having to learn new tools or data structuring paradigms. Using NeuronDepot is simple: one-time data assignment from the originator and cloud based syncing—thus making experimental and modeling data available across the collaboration with minimum overhead. Since data sharing is cloud based, our approach opens up the possibility of using new software developments and hardware scalabitliy which are associated with elastic cloud computing. We provide an implementation that relies on existing synchronization services and is usable from all devices via a reactive web interface. We are motivating our solution by solving the practical problems of the GinJang project, a collaboration of three universities across eight time zones with a complex workflow encompassing data from electrophysiological recordings, imaging, morphological reconstructions, and simulations. PMID:24971059
NeuronDepot: keeping your colleagues in sync by combining modern cloud storage services, the local file system, and simple web applications.

PubMed

Rautenberg, Philipp L; Kumaraswamy, Ajayrama; Tejero-Cantero, Alvaro; Doblander, Christoph; Norouzian, Mohammad R; Kai, Kazuki; Jacobsen, Hans-Arno; Ai, Hiroyuki; Wachtler, Thomas; Ikeno, Hidetoshi

2014-01-01

Neuroscience today deals with a "data deluge" derived from the availability of high-throughput sensors of brain structure and brain activity, and increased computational resources for detailed simulations with complex output. We report here (1) a novel approach to data sharing between collaborating scientists that brings together file system tools and cloud technologies, (2) a service implementing this approach, called NeuronDepot, and (3) an example application of the service to a complex use case in the neurosciences. The main drivers for our approach are to facilitate collaborations with a transparent, automated data flow that shields scientists from having to learn new tools or data structuring paradigms. Using NeuronDepot is simple: one-time data assignment from the originator and cloud based syncing-thus making experimental and modeling data available across the collaboration with minimum overhead. Since data sharing is cloud based, our approach opens up the possibility of using new software developments and hardware scalabitliy which are associated with elastic cloud computing. We provide an implementation that relies on existing synchronization services and is usable from all devices via a reactive web interface. We are motivating our solution by solving the practical problems of the GinJang project, a collaboration of three universities across eight time zones with a complex workflow encompassing data from electrophysiological recordings, imaging, morphological reconstructions, and simulations.
Pteros: fast and easy to use open-source C++ library for molecular analysis.

PubMed

Yesylevskyy, Semen O

2012-07-15

An open-source Pteros library for molecular modeling and analysis of molecular dynamics trajectories for C++ programming language is introduced. Pteros provides a number of routine analysis operations ranging from reading and writing trajectory files and geometry transformations to structural alignment and computation of nonbonded interaction energies. The library features asynchronous trajectory reading and parallel execution of several analysis routines, which greatly simplifies development of computationally intensive trajectory analysis algorithms. Pteros programming interface is very simple and intuitive while the source code is well documented and easily extendible. Pteros is available for free under open-source Artistic License from http://sourceforge.net/projects/pteros/. Copyright © 2012 Wiley Periodicals, Inc.
GeoChronos: An On-line Collaborative Platform for Earth Observation Scientists

NASA Astrophysics Data System (ADS)

Gamon, J. A.; Kiddle, C.; Curry, R.; Markatchev, N.; Zonta-Pastorello, G., Jr.; Rivard, B.; Sanchez-Azofeifa, G. A.; Simmonds, R.; Tan, T.

2009-12-01

Recent advances in cyberinfrastructure are offering new solutions to the growing challenges of managing and sharing large data volumes. Web 2.0 and social networking technologies, provide the means for scientists to collaborate and share information more effectively. Cloud computing technologies can provide scientists with transparent and on-demand access to applications served over the Internet in a dynamic and scalable manner. Semantic Web technologies allow for data to be linked together in a manner understandable by machines, enabling greater automation. Combining all of these technologies together can enable the creation of very powerful platforms. GeoChronos (http://geochronos.org/), part of a CANARIE Network Enabled Platforms project, is an online collaborative platform that incorporates these technologies to enable members of the earth observation science community to share data and scientific applications and to collaborate more effectively. The GeoChronos portal is built on an open source social networking platform called Elgg. Elgg provides a full set of social networking functionalities similar to Facebook including blogs, tags, media/document sharing, wikis, friends/contacts, groups, discussions, message boards, calendars, status, activity feeds and more. An underlying cloud computing infrastructure enables scientists to access dynamically provisioned applications via the portal for visualizing and analyzing data. Users are able to access and run the applications from any computer that has a Web browser and Internet connectivity and do not need to manage and maintain the applications themselves. Semantic Web Technologies, such as the Resource Description Framework (RDF) are being employed for relating and linking together spectral, satellite, meteorological and other data. Social networking functionality plays an integral part in facilitating the sharing of data and applications. Examples of recent GeoChronos users during the early testing phase have included the IAI International Wireless Sensor Networking Summer School at the University of Alberta, and the IAI Tropi-Dry community. Current GeoChronos activities include the development of a web-based spectral library and related analytical and visualization tools, in collaboration with members of the SpecNet community. The GeoChronos portal will be open to all members of the earth observation science community when the project nears completion at the end of 2010.
A Cloud Microphysics Model for the Gas Giant Planets

NASA Astrophysics Data System (ADS)

Palotai, Csaba J.; Le Beau, Raymond P.; Shankar, Ramanakumar; Flom, Abigail; Lashley, Jacob; McCabe, Tyler

2016-10-01

Recent studies have significantly increased the quality and the number of observed meteorological features on the jovian planets, revealing banded cloud structures and discrete features. Our current understanding of the formation and decay of those clouds also defines the conceptual modes about the underlying atmospheric dynamics. The full interpretation of the new observational data set and the related theories requires modeling these features in a general circulation model (GCM). Here, we present details of our bulk cloud microphysics model that was designed to simulate clouds in the Explicit Planetary Hybrid-Isentropic Coordinate (EPIC) GCM for the jovian planets. The cloud module includes hydrological cycles for each condensable species that consist of interactive vapor, cloud and precipitation phases and it also accounts for latent heating and cooling throughout the transfer processes (Palotai and Dowling, 2008. Icarus, 194, 303-326). Previously, the self-organizing clouds in our simulations successfully reproduced the vertical and horizontal ammonia cloud structure in the vicinity of Jupiter's Great Red Spot and Oval BA (Palotai et al. 2014, Icarus, 232, 141-156). In our recent work, we extended this model to include water clouds on Jupiter and Saturn, ammonia clouds on Saturn, and methane clouds on Uranus and Neptune. Details of our cloud parameterization scheme, our initial results and their comparison with observations will be shown. The latest version of EPIC model is available as open source software from NASA's PDS Atmospheres Node.
Accurate mobile malware detection and classification in the cloud.

PubMed

Wang, Xiaolei; Yang, Yuexiang; Zeng, Yingzhi

2015-01-01

As the dominator of the Smartphone operating system market, consequently android has attracted the attention of s malware authors and researcher alike. The number of types of android malware is increasing rapidly regardless of the considerable number of proposed malware analysis systems. In this paper, by taking advantages of low false-positive rate of misuse detection and the ability of anomaly detection to detect zero-day malware, we propose a novel hybrid detection system based on a new open-source framework CuckooDroid, which enables the use of Cuckoo Sandbox's features to analyze Android malware through dynamic and static analysis. Our proposed system mainly consists of two parts: anomaly detection engine performing abnormal apps detection through dynamic analysis; signature detection engine performing known malware detection and classification with the combination of static and dynamic analysis. We evaluate our system using 5560 malware samples and 6000 benign samples. Experiments show that our anomaly detection engine with dynamic analysis is capable of detecting zero-day malware with a low false negative rate (1.16 %) and acceptable false positive rate (1.30 %); it is worth noting that our signature detection engine with hybrid analysis can accurately classify malware samples with an average positive rate 98.94 %. Considering the intensive computing resources required by the static and dynamic analysis, our proposed detection system should be deployed off-device, such as in the Cloud. The app store markets and the ordinary users can access our detection system for malware detection through cloud service.
78 FR 54453 - Notice of Public Meeting-Intersection of Cloud Computing and Mobility Forum and Workshop

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-04

...--Intersection of Cloud Computing and Mobility Forum and Workshop AGENCY: National Institute of Standards and.../intersection-of-cloud-and-mobility.cfm . SUPPLEMENTARY INFORMATION: NIST hosted six prior Cloud Computing Forum... interoperability, portability, and security, discuss the Federal Government's experience with cloud computing...
Embracing the Cloud: Six Ways to Look at the Shift to Cloud Computing

ERIC Educational Resources Information Center

Ullman, David F.; Haggerty, Blake

2010-01-01

Cloud computing is the latest paradigm shift for the delivery of IT services. Where previous paradigms (centralized, decentralized, distributed) were based on fairly straightforward approaches to technology and its management, cloud computing is radical in comparison. The literature on cloud computing, however, suffers from many divergent…
BigDataScript: a scripting language for data pipelines.

PubMed

Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu

2015-01-01

The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. © The Author 2014. Published by Oxford University Press.
The GridPP DIRAC project - DIRAC for non-LHC communities

NASA Astrophysics Data System (ADS)

Bauer, D.; Colling, D.; Currie, R.; Fayer, S.; Huffman, A.; Martyniak, J.; Rand, D.; Richards, A.

2015-12-01

The GridPP consortium in the UK is currently testing a multi-VO DIRAC service aimed at non-LHC VOs. These VOs (Virtual Organisations) are typically small and generally do not have a dedicated computing support post. The majority of these represent particle physics experiments (e.g. NA62 and COMET), although the scope of the DIRAC service is not limited to this field. A few VOs have designed bespoke tools around the EMI-WMS & LFC, while others have so far eschewed distributed resources as they perceive the overhead for accessing them to be too high. The aim of the GridPP DIRAC project is to provide an easily adaptable toolkit for such VOs in order to lower the threshold for access to distributed resources such as Grid and cloud computing. As well as hosting a centrally run DIRAC service, we will also publish our changes and additions to the upstream DIRAC codebase under an open-source license. We report on the current status of this project and show increasing adoption of DIRAC within the non-LHC communities.
BigDataScript: a scripting language for data pipelines

PubMed Central

Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu

2015-01-01

Motivation: The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. Results: We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. Availability and implementation: BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. Contact: pablo.e.cingolani@gmail.com PMID:25189778

Cloud computing basics for librarians.

PubMed

Hoy, Matthew B

2012-01-01

"Cloud computing" is the name for the recent trend of moving software and computing resources to an online, shared-service model. This article briefly defines cloud computing, discusses different models, explores the advantages and disadvantages, and describes some of the ways cloud computing can be used in libraries. Examples of cloud services are included at the end of the article. Copyright © Taylor & Francis Group, LLC
A Novel College Network Resource Management Method using Cloud Computing

NASA Astrophysics Data System (ADS)

Lin, Chen

At present information construction of college mainly has construction of college networks and management information system; there are many problems during the process of information. Cloud computing is development of distributed processing, parallel processing and grid computing, which make data stored on the cloud, make software and services placed in the cloud and build on top of various standards and protocols, you can get it through all kinds of equipments. This article introduces cloud computing and function of cloud computing, then analyzes the exiting problems of college network resource management, the cloud computing technology and methods are applied in the construction of college information sharing platform.
Storage quality-of-service in cloud-based scientific environments: a standardization approach

NASA Astrophysics Data System (ADS)

Millar, Paul; Fuhrmann, Patrick; Hardt, Marcus; Ertl, Benjamin; Brzezniak, Maciej

2017-10-01

When preparing the Data Management Plan for larger scientific endeavors, PIs have to balance between the most appropriate qualities of storage space along the line of the planned data life-cycle, its price and the available funding. Storage properties can be the media type, implicitly determining access latency and durability of stored data, the number and locality of replicas, as well as available access protocols or authentication mechanisms. Negotiations between the scientific community and the responsible infrastructures generally happen upfront, where the amount of storage space, media types, like: disk, tape and SSD and the foreseeable data life-cycles are negotiated. With the introduction of cloud management platforms, both in computing and storage, resources can be brokered to achieve the best price per unit of a given quality. However, in order to allow the platform orchestrator to programmatically negotiate the most appropriate resources, a standard vocabulary for different properties of resources and a commonly agreed protocol to communicate those, has to be available. In order to agree on a basic vocabulary for storage space properties, the storage infrastructure group in INDIGO-DataCloud together with INDIGO-associated and external scientific groups, created a working group under the umbrella of the Research Data Alliance (RDA). As communication protocol, to query and negotiate storage qualities, the Cloud Data Management Interface (CDMI) has been selected. Necessary extensions to CDMI are defined in regular meetings between INDIGO and the Storage Network Industry Association (SNIA). Furthermore, INDIGO is contributing to the SNIA CDMI reference implementation as the basis for interfacing the various storage systems in INDIGO to the agreed protocol and to provide an official Open-Source skeleton for systems not being maintained by INDIGO partners.
Open source acceleration of wave optics simulations on energy efficient high-performance computing platforms

NASA Astrophysics Data System (ADS)

Beck, Jeffrey; Bos, Jeremy P.

2017-05-01

We compare several modifications to the open-source wave optics package, WavePy, intended to improve execution time. Specifically, we compare the relative performance of the Intel MKL, a CPU based OpenCV distribution, and GPU-based version. Performance is compared between distributions both on the same compute platform and between a fully-featured computing workstation and the NVIDIA Jetson TX1 platform. Comparisons are drawn in terms of both execution time and power consumption. We have found that substituting the Fast Fourier Transform operation from OpenCV provides a marked improvement on all platforms. In addition, we show that embedded platforms offer some possibility for extensive improvement in terms of efficiency compared to a fully featured workstation.
Sea spray as a source of ice nucleating particles - results from the AIDA Ocean03 campaign

NASA Astrophysics Data System (ADS)

Salter, M. E.; Ickes, L.; Adams, M.; Bierbauer, S.; Bilde, M.; Christiansen, S.; Ekman, A.; Gorokhova, E.; Höhler, K.; Kiselev, A. A.; Leck, C.; Mohr, C.; Mohler, O.; Murray, B. J.; Porter, G.; Ullrich, R.; Wagner, R.

2017-12-01

Clouds and their radiative effects are one of the major influences on the radiative fluxes in the atmosphere, but at the same time they remain the largest uncertainty in climate models. This lack of understanding is especially pronounced in the high Arctic. Summertime clouds can persist over long periods in this region, which is difficult to replicate in models based on our current understanding. The clouds most often encountered in the summertime high Arctic consist of a mixture of ice crystals and super-cooled water droplets, so-called mixed-phase clouds. This cloud type is sensitive to the availability of aerosol particles, which can act as cloud condensation nuclei and ice nuclei. However, since the high Arctic is a pristine region, aerosol particles are not very abundant, and the hypothesis of open leads in the Arctic as a potentially important source of cloud and ice nucleating particles via bubble bursting has emerged. In this context, we have conducted a series of experiments at the AIDA chamber at KIT, designed to investigate the mechanisms linking marine biology, seawater chemistry and aerosol physics/potential cloud impacts. During this campaign, two marine diatom species (Melosira arctica and Skeletonema marinoi) as well as sea surface microlayer samples collected during several Arctic Ocean research cruises were investigated. To aerosolize the samples, a variety of methods were used including a sea spray simulation chamber to mimic the process of bubble-bursting. The ice nucleating efficiency (mixed-phase cloud regime) of the samples was determined either directly in the AIDA chamber during adiabatic expansions, or using the INKA continuous flow diffusion chamber, or a cold stage. Results from the campaign along with the potential implications are presented.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing.

PubMed

Cole, Brian S; Moore, Jason H

2018-03-01

Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world's largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.
Eleven quick tips for architecting biomedical informatics workflows with cloud computing

PubMed Central

Moore, Jason H.

2018-01-01

Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for architecting biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction. PMID:29596416
Evaluating cloud retrieval algorithms with the ARM BBHRP framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mlawer,E.; Dunn,M.; Mlawer, E.

2008-03-10

Climate and weather prediction models require accurate calculations of vertical profiles of radiative heating. Although heating rate calculations cannot be directly validated due to the lack of corresponding observations, surface and top-of-atmosphere measurements can indirectly establish the quality of computed heating rates through validation of the calculated irradiances at the atmospheric boundaries. The ARM Broadband Heating Rate Profile (BBHRP) project, a collaboration of all the working groups in the program, was designed with these heating rate validations as a key objective. Given the large dependence of radiative heating rates on cloud properties, a critical component of BBHRP radiative closure analysesmore » has been the evaluation of cloud microphysical retrieval algorithms. This evaluation is an important step in establishing the necessary confidence in the continuous profiles of computed radiative heating rates produced by BBHRP at the ARM Climate Research Facility (ACRF) sites that are needed for modeling studies. This poster details the continued effort to evaluate cloud property retrieval algorithms within the BBHRP framework, a key focus of the project this year. A requirement for the computation of accurate heating rate profiles is a robust cloud microphysical product that captures the occurrence, height, and phase of clouds above each ACRF site. Various approaches to retrieve the microphysical properties of liquid, ice, and mixed-phase clouds have been processed in BBHRP for the ACRF Southern Great Plains (SGP) and the North Slope of Alaska (NSA) sites. These retrieval methods span a range of assumptions concerning the parameterization of cloud location, particle density, size, shape, and involve different measurement sources. We will present the radiative closure results from several different retrieval approaches for the SGP site, including those from Microbase, the current 'reference' retrieval approach in BBHRP. At the NSA, mixed-phase clouds and cloud with a low optical depth are prevalent; the radiative closure studies using Microbase demonstrated significant residuals. As an alternative to Microbase at NSA, the Shupe-Turner cloud property retrieval algorithm, aimed at improving the partitioning of cloud phase and incorporating more constrained, conditional microphysics retrievals, also has been evaluated using the BBHRP data set.« less
A cloud-based data network approach for translational cancer research.

PubMed

Xing, Wei; Tsoumakos, Dimitrios; Ghanem, Moustafa

2015-01-01

We develop a new model and associated technology for constructing and managing self-organizing data to support translational cancer research studies. We employ a semantic content network approach to address the challenges of managing cancer research data. Such data is heterogeneous, large, decentralized, growing and continually being updated. Moreover, the data originates from different information sources that may be partially overlapping, creating redundancies as well as contradictions and inconsistencies. Building on the advantages of elasticity of cloud computing, we deploy the cancer data networks on top of the CELAR Cloud platform to enable more effective processing and analysis of Big cancer data.
Towards Large-area Field-scale Operational Evapotranspiration for Water Use Mapping

NASA Astrophysics Data System (ADS)

Senay, G. B.; Friedrichs, M.; Morton, C.; Huntington, J. L.; Verdin, J.

2017-12-01

Field-scale evapotranspiration (ET) estimates are needed for improving surface and groundwater use and water budget studies. Ideally, field-scale ET estimates would be at regional to national levels and cover long time periods. As a result of large data storage and computational requirements associated with processing field-scale satellite imagery such as Landsat, numerous challenges remain to develop operational ET estimates over large areas for detailed water use and availability studies. However, the combination of new science, data availability, and cloud computing technology is enabling unprecedented capabilities for ET mapping. To demonstrate this capability, we used Google's Earth Engine cloud computing platform to create nationwide annual ET estimates with 30-meter resolution Landsat ( 16,000 images) and gridded weather data using the Operational Simplified Surface Energy Balance (SSEBop) model in support of the National Water Census, a USGS research program designed to build decision support capacity for water management agencies and other natural resource managers. By leveraging Google's Earth Engine Application Programming Interface (API) and developing software in a collaborative, open-platform environment, we rapidly advance from research towards applications for large-area field-scale ET mapping. Cloud computing of the Landsat image archive combined with other satellite, climate, and weather data, is creating never imagined opportunities for assessing ET model behavior and uncertainty, and ultimately providing the ability for more robust operational monitoring and assessment of water use at field-scales.
Research on Quantum Authentication Methods for the Secure Access Control Among Three Elements of Cloud Computing

NASA Astrophysics Data System (ADS)

Dong, Yumin; Xiao, Shufen; Ma, Hongyang; Chen, Libo

2016-12-01

Cloud computing and big data have become the developing engine of current information technology (IT) as a result of the rapid development of IT. However, security protection has become increasingly important for cloud computing and big data, and has become a problem that must be solved to develop cloud computing. The theft of identity authentication information remains a serious threat to the security of cloud computing. In this process, attackers intrude into cloud computing services through identity authentication information, thereby threatening the security of data from multiple perspectives. Therefore, this study proposes a model for cloud computing protection and management based on quantum authentication, introduces the principle of quantum authentication, and deduces the quantum authentication process. In theory, quantum authentication technology can be applied in cloud computing for security protection. This technology cannot be cloned; thus, it is more secure and reliable than classical methods.
Insights into low-latitude cloud feedbacks from high-resolution models.

PubMed

Bretherton, Christopher S

2015-11-13

Cloud feedbacks are a leading source of uncertainty in the climate sensitivity simulated by global climate models (GCMs). Low-latitude boundary-layer and cumulus cloud regimes are particularly problematic, because they are sustained by tight interactions between clouds and unresolved turbulent circulations. Turbulence-resolving models better simulate such cloud regimes and support the GCM consensus that they contribute to positive global cloud feedbacks. Large-eddy simulations using sub-100 m grid spacings over small computational domains elucidate marine boundary-layer cloud response to greenhouse warming. Four observationally supported mechanisms contribute: 'thermodynamic' cloudiness reduction from warming of the atmosphere-ocean column, 'radiative' cloudiness reduction from CO2- and H2O-induced increase in atmospheric emissivity aloft, 'stability-induced' cloud increase from increased lower tropospheric stratification, and 'dynamical' cloudiness increase from reduced subsidence. The cloudiness reduction mechanisms typically dominate, giving positive shortwave cloud feedback. Cloud-resolving models with horizontal grid spacings of a few kilometres illuminate how cumulonimbus cloud systems affect climate feedbacks. Limited-area simulations and superparameterized GCMs show upward shift and slight reduction of cloud cover in a warmer climate, implying positive cloud feedbacks. A global cloud-resolving model suggests tropical cirrus increases in a warmer climate, producing positive longwave cloud feedback, but results are sensitive to subgrid turbulence and ice microphysics schemes. © 2015 The Author(s).
BioBlend: automating pipeline analyses within Galaxy and CloudMan.

PubMed

Sloggett, Clare; Goonasekera, Nuwan; Afgan, Enis

2013-07-01

We present BioBlend, a unified API in a high-level language (python) that wraps the functionality of Galaxy and CloudMan APIs. BioBlend makes it easy for bioinformaticians to automate end-to-end large data analysis, from scratch, in a way that is highly accessible to collaborators, by allowing them to both provide the required infrastructure and automate complex analyses over large datasets within the familiar Galaxy environment. http://bioblend.readthedocs.org/. Automated installation of BioBlend is available via PyPI (e.g. pip install bioblend). Alternatively, the source code is available from the GitHub repository (https://github.com/afgane/bioblend) under the MIT open source license. The library has been tested and is working on Linux, Macintosh and Windows-based systems.
Study on Huizhou architecture of point cloud registration based on optimized ICP algorithm

NASA Astrophysics Data System (ADS)

Zhang, Runmei; Wu, Yulu; Zhang, Guangbin; Zhou, Wei; Tao, Yuqian

2018-03-01

In view of the current point cloud registration software has high hardware requirements, heavy workload and moltiple interactive definition, the source of software with better processing effect is not open, a two--step registration method based on normal vector distribution feature and coarse feature based iterative closest point (ICP) algorithm is proposed in this paper. This method combines fast point feature histogram (FPFH) algorithm, define the adjacency region of point cloud and the calculation model of the distribution of normal vectors, setting up the local coordinate system for each key point, and obtaining the transformation matrix to finish rough registration, the rough registration results of two stations are accurately registered by using the ICP algorithm. Experimental results show that, compared with the traditional ICP algorithm, the method used in this paper has obvious time and precision advantages for large amount of point clouds.
Sodium D-line emission from Io - Comparison of observed and theoretical line profiles

NASA Technical Reports Server (NTRS)

Carlson, R. W.; Matson, D. L.; Johnson, T. V.; Bergstralh, J. T.

1978-01-01

High-resolution spectra of the D-line profiles have been obtained for Io's sodium emission cloud. These lines, which are produced through resonance scattering of sunlight, are broad and asymmetric and can be used to infer source and dynamical properties of the sodium cloud. In this paper we compare line profile data with theoretical line shapes computed for several assumed initial velocity distributions corresponding to various source mechanisms. We also examine the consequences of source distributions which are nonuniform over the surface of Io. It is found that the experimental data are compatible with escape of sodium atoms from the leading hemisphere of Io and with velocity distributions characteristic of sputtering processes. Thermal escape and simple models of plasma sweeping are found to be incompatible with the observations.
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

NASA Technical Reports Server (NTRS)

Stocker, John C.; Golomb, Andrew M.

2011-01-01

Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand using a probability distribution for each type of service request as well as enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
Outer satellite atmospheres: Their nature and planetary interactions

NASA Technical Reports Server (NTRS)

Smyth, W. H.; Combi, M. R.

1982-01-01

Significant progress is reported in early modeling analysis of observed sodium cloud images with our new model which includes the oscillating Io plasma torus ionization sink. Both the general w-D morphology of the region B cloud as well as the large spatial gradient seen between the region A and B clouds are found to be consistent with an isotropic flux of sodium atoms from Io. Model analysis of the spatially extended high velocity directional features provided substantial evidence for a magnetospheric wind driven gas escape mechanism from Io. In our efforts to define the source(s) of hydrogen atoms in the Saturn system, major steps were taken in order to understand the role of Titan. We have completed the comparison of the Voyager UVS data with previous Titan model results, as well as the update of the old model computer code to handle the spatially varying ionization sink for H atoms.
SmartVeh: Secure and Efficient Message Access Control and Authentication for Vehicular Cloud Computing.

PubMed

Huang, Qinlong; Yang, Yixian; Shi, Yuxiang

2018-02-24

With the growing number of vehicles and popularity of various services in vehicular cloud computing (VCC), message exchanging among vehicles under traffic conditions and in emergency situations is one of the most pressing demands, and has attracted significant attention. However, it is an important challenge to authenticate the legitimate sources of broadcast messages and achieve fine-grained message access control. In this work, we propose SmartVeh, a secure and efficient message access control and authentication scheme in VCC. A hierarchical, attribute-based encryption technique is utilized to achieve fine-grained and flexible message sharing, which ensures that vehicles whose persistent or dynamic attributes satisfy the access policies can access the broadcast message with equipped on-board units (OBUs). Message authentication is enforced by integrating an attribute-based signature, which achieves message authentication and maintains the anonymity of the vehicles. In order to reduce the computations of the OBUs in the vehicles, we outsource the heavy computations of encryption, decryption and signing to a cloud server and road-side units. The theoretical analysis and simulation results reveal that our secure and efficient scheme is suitable for VCC.
SmartVeh: Secure and Efficient Message Access Control and Authentication for Vehicular Cloud Computing

PubMed Central

Yang, Yixian; Shi, Yuxiang

2018-01-01

With the growing number of vehicles and popularity of various services in vehicular cloud computing (VCC), message exchanging among vehicles under traffic conditions and in emergency situations is one of the most pressing demands, and has attracted significant attention. However, it is an important challenge to authenticate the legitimate sources of broadcast messages and achieve fine-grained message access control. In this work, we propose SmartVeh, a secure and efficient message access control and authentication scheme in VCC. A hierarchical, attribute-based encryption technique is utilized to achieve fine-grained and flexible message sharing, which ensures that vehicles whose persistent or dynamic attributes satisfy the access policies can access the broadcast message with equipped on-board units (OBUs). Message authentication is enforced by integrating an attribute-based signature, which achieves message authentication and maintains the anonymity of the vehicles. In order to reduce the computations of the OBUs in the vehicles, we outsource the heavy computations of encryption, decryption and signing to a cloud server and road-side units. The theoretical analysis and simulation results reveal that our secure and efficient scheme is suitable for VCC. PMID:29495269
Establishing a Cloud Computing Success Model for Hospitals in Taiwan.

PubMed

Lian, Jiunn-Woei

2017-01-01

The purpose of this study is to understand the critical quality-related factors that affect cloud computing success of hospitals in Taiwan. In this study, private cloud computing is the major research target. The chief information officers participated in a questionnaire survey. The results indicate that the integration of trust into the information systems success model will have acceptable explanatory power to understand cloud computing success in the hospital. Moreover, information quality and system quality directly affect cloud computing satisfaction, whereas service quality indirectly affects the satisfaction through trust. In other words, trust serves as the mediator between service quality and satisfaction. This cloud computing success model will help hospitals evaluate or achieve success after adopting private cloud computing health care services.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.