Progress in Machine Learning Studies for the CMS Computing Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo
2017-12-06
Computing systems for the LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for the LHC experiments and its recent evolution can be found elsewhere, it is worth mentioning here the scale of the computing resources needed to fulfill the needs of the LHC experiments in Run-1 and Run-2 so far.
GSDC: A Unique Data Center in Korea for HEP research
NASA Astrophysics Data System (ADS)
Ahn, Sang-Un
2017-04-01
The Global Science experimental Data hub Center (GSDC) at the Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea, established to promote fundamental research fields by supporting them with expertise in Information and Communication Technology (ICT) and with infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC) and networking. GSDC has supported various research fields in South Korea dealing with large-scale data, e.g. the RENO experiment for neutrino research, the LIGO experiment for gravitational wave detection, genome sequencing projects for bio-medical research, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for the ALICE experiment at the CERN LHC since 2013. In this talk, we present an overview of the computing infrastructure that GSDC runs for these research fields, and we discuss the data center infrastructure management system deployed at GSDC.
Evolution of the Virtualized HPC Infrastructure of Novosibirsk Scientific Center
NASA Astrophysics Data System (ADS)
Adakin, A.; Anisenkov, A.; Belov, S.; Chubarov, D.; Kalyuzhny, V.; Kaplin, V.; Korol, A.; Kuchin, N.; Lomakin, S.; Nikultsev, V.; Skovpen, K.; Sukharev, A.; Zaytsev, A.
2012-12-01
Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers, hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of the Russian Academy of Sciences, including the Budker Institute of Nuclear Physics (BINP), the Institute of Computational Technologies, and the Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG). Since each institute has specific requirements on the architecture of the computing farms involved in its research field, several computing facilities are currently hosted by NSC institutes, each optimized for a particular set of tasks; the largest are the NSU Supercomputer Center, the Siberian Supercomputer Center (ICM&MG), and the Grid Computing Facility of BINP. A dedicated optical network with an initial bandwidth of 10 Gb/s connecting these three facilities was built in order to make it possible to share the computing resources among the research communities, thus increasing the efficiency of operating the existing computing facilities and offering a common platform for building the computing infrastructure for future scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technology based on the XEN and KVM platforms. This contribution gives a thorough review of the present status and future development prospects of the NSC virtualized computing infrastructure and the experience gained while using it to run production data analysis jobs related to HEP experiments carried out at BINP, especially the KEDR detector experiment at the VEPP-4M electron-positron collider.
A Development of Lightweight Grid Interface
NASA Astrophysics Data System (ADS)
Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.
2011-12-01
In order to support rapid development of Grid/Cloud aware applications, we have developed an API to abstract distributed computing infrastructures, based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications for access to distributed computing infrastructures, such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), a set of command line interfaces (CLIs) and APIs, aims to offer a simpler API combining several SAGA interfaces with richer functionalities. The UGAPI CLIs offer the typical functionality required by end users for job management and file access on different distributed computing infrastructures as well as local computing resources. We have also built a web interface for particle therapy simulation and demonstrated a large-scale calculation using the different infrastructures at the same time. In this paper, we present how the web interface based on UGAPI and SAGA achieves more efficient utilization of computing resources over different infrastructures, with technical details and practical experiences.
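The UGAPI layer itself is not reproduced in this abstract. As a rough illustration of the SAGA programming model it builds on, a minimal job-submission sketch using the radical.saga Python binding might look as follows; the resource URL, executable and file names are placeholders, not part of UGAPI:

```python
# Minimal SAGA-style job submission sketch using the radical.saga
# Python binding; UGAPI wraps interfaces like this one behind a
# simpler CLI/API. Resource URL and executable are placeholders.
import radical.saga as rs

# A job service bound to a compute resource; Grid back-ends would
# use e.g. an ssh:// or batch-system URL instead of the local host.
js = rs.job.Service('fork://localhost')

jd = rs.job.Description()
jd.executable = '/bin/echo'
jd.arguments = ['hello from SAGA']
jd.output = 'job.out'

job = js.create_job(jd)
job.run()    # submit to the resource
job.wait()   # block until completion
print('job finished with state:', job.state)
js.close()
```

The point of the abstraction is that only the service URL changes when the same job is redirected from a local machine to a Grid or Cloud back-end.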
Cloud Infrastructure & Applications - CloudIA
NASA Astrophysics Data System (ADS)
Sulistio, Anthony; Reich, Christoph; Doelitzscher, Frank
The idea behind Cloud Computing is to deliver Infrastructure-as-a-Service and Software-as-a-Service over the Internet on an easy pay-per-use business model. To harness the potential of Cloud Computing for e-Learning and research purposes, and for small- and medium-sized enterprises, the Hochschule Furtwangen University has established a new project called Cloud Infrastructure & Applications (CloudIA). The CloudIA project is a market-oriented cloud infrastructure that leverages different virtualization technologies by supporting Service-Level Agreements for various service offerings. This paper describes the CloudIA project in detail and reports our early experiences in building a private cloud using an existing infrastructure.
Infrastructures for Distributed Computing: the case of BESIII
NASA Astrophysics Data System (ADS)
Pellegrino, J.
2018-05-01
BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed at investigating Tau-Charm physics. BESIII has now been running for several years and has gathered more than 1 PB of raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based upon DIRAC and has been in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites of the BESIII Collaboration from all over the world have joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field due to its scalability and elasticity. Cloud infrastructures take advantage of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources of commercial clouds, the computing capacity can scale accordingly in order to deal with any burst demands. General computing models are addressed herewith, with particular focus on the BESIII infrastructure; new computing tools and upcoming infrastructures are addressed as well.
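As an illustration of how user jobs typically enter a DIRAC-based system such as the one described above, here is a minimal sketch using the standard DIRAC client API; the job name, script and sandbox files are invented placeholders, not BESIII specifics:

```python
# Sketch of job submission through the DIRAC client API, on which the
# BESIII distributed computing system is built. Executable, arguments
# and sandbox contents are illustrative placeholders.
from DIRAC.Core.Base.Script import parseCommandLine
parseCommandLine()  # initialize the DIRAC client environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

j = Job()
j.setName('bes-sim-demo')
j.setExecutable('/bin/sh', arguments='run_simulation.sh')
j.setInputSandbox(['run_simulation.sh'])
j.setOutputSandbox(['std.out', 'std.err'])

# DIRAC decides whether the job lands on a cluster, grid site or
# cloud-provisioned VM; the user-facing description stays the same.
result = Dirac().submitJob(j)
print('submission result:', result)
```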
Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models.
Rao, Nageswara S V; Poole, Stephen W; Ma, Chris Y T; He, Fei; Zhuang, Jun; Yau, David K Y
2016-04-01
The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities, expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical subinfrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein their components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures, are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. The analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures. © 2015 Society for Risk Analysis.
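The paper's exact formulation is not reproduced in the abstract. As a rough sketch of the stated structure, utilities written as sums of cost and system terms, with notation invented here for illustration, the players' expected utilities might take a form like:

```latex
% Hypothetical notation, not taken from the paper:
%   n_A, n_D    : numbers of attacked / reinforced units
%   c_A, c_D    : unit attack / defense costs
%   P(n_A, n_D) : probability the infrastructure survives
%   G           : value of a functioning infrastructure
\begin{align*}
  U_D(n_A, n_D) &= \underbrace{P(n_A, n_D)\, G}_{\text{system term}}
                 - \underbrace{c_D\, n_D}_{\text{cost term}}, \\
  U_A(n_A, n_D) &= \big(1 - P(n_A, n_D)\big)\, G - c_A\, n_A .
\end{align*}
```

A Nash equilibrium is then a pair of strategies from which neither player gains by deviating unilaterally; under uniform costs, the abstract states that this equilibrium is computable in polynomial time.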
NASA Astrophysics Data System (ADS)
Farooq, Umer; Schank, Patricia; Harris, Alexandra; Fusco, Judith; Schlager, Mark
Community computing has recently grown to become a major research area in human-computer interaction. One of the objectives of community computing is to support computer-supported cooperative work among distributed collaborators working toward shared professional goals in online communities of practice. A core issue in designing and developing community computing infrastructures, the underlying sociotechnical layer that supports communitarian activities, is sustainability. Many community computing initiatives fail because the underlying infrastructure does not meet end user requirements; the community is unable to maintain a critical mass of users consistently over time; it generates insufficient social capital to support significant contributions by members of the community; or, as typically happens with funded initiatives, financial and human capital resources become unavailable to further maintain the infrastructure. On the basis of more than 9 years of design experience with Tapped In, an online community of practice for education professionals, we present a case study that discusses four design interventions that have sustained the Tapped In infrastructure and its community to date. These interventions represent broader design strategies for developing online environments for professional communities of practice.
Managing a tier-2 computer centre with a private cloud infrastructure
NASA Astrophysics Data System (ADS)
Bagnasco, Stefano; Berzano, Dario; Brunetti, Riccardo; Lusso, Stefano; Vallero, Sara
2014-06-01
In a typical scientific computing centre, several applications coexist and share a single physical infrastructure. An underlying private cloud infrastructure eases the management and maintenance of such heterogeneous applications (multipurpose or application-specific batch farms, Grid sites, interactive data analysis facilities and others), allowing dynamic allocation of resources to any application. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques. Such infrastructures are being deployed in some large centres (see e.g. the CERN Agile Infrastructure project), but with several open-source tools reaching maturity this is becoming viable also for smaller sites. In this contribution we describe the private cloud infrastructure at the INFN-Torino Computer Centre, which hosts a full-fledged WLCG Tier-2 centre, an interactive analysis facility for the ALICE experiment at the CERN LHC, and several smaller scientific computing applications. The private cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem and the OpenWRT Linux distribution (used for network virtualization); a future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs such as EC2 and OCCI.
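Since the infrastructure exposes EC2-compatible APIs for federation, a higher-level client could in principle drive it with standard tooling. A minimal sketch with boto3 follows; the endpoint URL, credentials, image ID and instance type are placeholders, not the actual Torino configuration:

```python
# Sketch: starting a VM through an EC2-compatible endpoint such as
# the one OpenNebula can expose. Endpoint, credentials, image and
# instance type are placeholders.
import boto3

ec2 = boto3.client(
    'ec2',
    endpoint_url='https://cloud.example.org:8773/services/Cloud',  # placeholder
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    region_name='site-local',
)

resp = ec2.run_instances(
    ImageId='ami-00000001',   # contextualized worker-node image
    InstanceType='m1.medium',
    MinCount=1,
    MaxCount=1,
)
print('started instance:', resp['Instances'][0]['InstanceId'])
```

The same client code, pointed at a public cloud instead of the private endpoint, is what makes federation into a higher-level infrastructure plausible.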
Cloud access to interoperable IVOA-compliant VOSpace storage
NASA Astrophysics Data System (ADS)
Bertocco, S.; Dowler, P.; Gaudet, S.; Major, B.; Pasian, F.; Taffoni, G.
2018-07-01
Handling, processing and archiving the huge amount of data produced by the new generation of experiments and instruments in Astronomy and Astrophysics are among the most exciting challenges to address in designing future data management infrastructures and computing services. We investigated the feasibility of a data management and computation infrastructure, available worldwide, with the aim of merging the FAIR data management provided by IVOA standards with the efficiency and reliability of a cloud approach. Our work involved the Canadian Advanced Network for Astronomy Research (CANFAR) infrastructure and the European EGI federated cloud (EFC). We designed and deployed a pilot data management and computation infrastructure that provides IVOA-compliant VOSpace storage resources and wide access to interoperable federated clouds. In this paper, we detail the main user requirements covered, the technical choices and the implemented solutions, and we describe the resulting hybrid cloud worldwide infrastructure, its benefits and limitations.
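As an illustration of what IVOA-compliant VOSpace access looks like from the user side, here is a minimal sketch using the CADC vos Python client employed on CANFAR; the vos: URIs and file names are placeholders, and a working authentication setup (proxy certificate or token) is assumed:

```python
# Sketch: moving data in and out of an IVOA VOSpace with the CADC
# "vos" client library. The vos: URIs are placeholders and valid
# credentials are assumed to be configured.
from vos import Client

client = Client()

# upload a local file into a VOSpace container node
client.copy('/tmp/catalog.fits', 'vos:myproject/inputs/catalog.fits')

# list the container and download a result produced elsewhere
for node in client.listdir('vos:myproject/outputs'):
    print(node)
client.copy('vos:myproject/outputs/result.fits', '/tmp/result.fits')
```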
Towards a Multi-Mission, Airborne Science Data System Environment
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Hardman, S.; Law, E.; Freeborn, D.; Kay-Im, E.; Lau, G.; Oswald, J.
2011-12-01
NASA earth science instruments are increasingly relying on airborne missions. However, traditionally there has been limited common infrastructure support available to principal investigators in the area of science data systems. As a result, each investigator has been required to develop their own computing infrastructure for the science data system. Typically there is little software reuse, and many projects lack sufficient resources to provide a robust infrastructure to capture, process, distribute and archive the observations acquired from airborne flights. At NASA's Jet Propulsion Laboratory (JPL), we have been developing a multi-mission data system infrastructure for airborne instruments called the Airborne Cloud Computing Environment (ACCE). ACCE encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis, in order to improve the quality, cost effectiveness, and capabilities that enable new scientific discovery and research in earth observation. This includes improving data system interoperability across each instrument. A principal characteristic is the ability to provide an agile infrastructure architected to allow a variety of configurations, from locally installed compute and storage services to services provisioned via the "cloud" from vendors such as Amazon.com. Investigators often have different needs that require a flexible configuration. The data system infrastructure is built on Apache's Object Oriented Data Technology (OODT) suite of components, which has been used for a number of spaceborne missions and provides a rich set of open source software components and services for constructing science processing and data management systems. In 2010, a partnership was formed between the ACCE team and the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to support its data processing and data management needs. A principal goal is to provide support for the Fourier Transform Spectrometer (FTS) instrument, which will produce over 700,000 soundings over the life of its three-year mission. The cost to purchase and operate a cluster-based system in order to generate Level 2 Full Physics products from this data was prohibitive. Through an evaluation of cloud computing solutions, Amazon's Elastic Compute Cloud (EC2) was selected for the CARVE deployment. As the ACCE infrastructure is developed and extended to form an infrastructure for airborne missions, the experience of working with CARVE has provided a number of lessons learned and has proven important in reinforcing the unique aspects of airborne missions and the importance of the ACCE infrastructure in developing a cost-effective, flexible multi-mission capability that leverages emerging capabilities in cloud computing, workflow management, and distributed computing.
First results from a combined analysis of CERN computing infrastructure metrics
NASA Astrophysics Data System (ADS)
Duellmann, Dirk; Nieke, Christian
2017-10-01
The IT Analysis Working Group (AWG) has been formed at CERN across individual computing units and the experiments to attempt a cross-cutting analysis of computing infrastructure and application metrics. In this presentation we will describe the first results obtained using medium/long-term data (1 month to 1 year) correlating box-level metrics, job-level metrics from LSF and HTCondor, IO metrics from the physics analysis disk pools (EOS), and networking and application-level metrics from the experiment dashboards. We will cover in particular the measurement of hardware performance and the prediction of job duration, the latency sensitivity of different job types, and a search for bottlenecks with the production job mix in the current infrastructure. The presentation will conclude with the proposal of a small set of metrics to simplify drawing conclusions also in the more constrained environment of public cloud deployments.
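The concrete analysis pipeline is not given in the abstract. As a schematic of the kind of study described, correlating box- and job-level metrics to predict job duration, a sketch with pandas and scikit-learn follows; the input file and column names are invented:

```python
# Schematic of the analysis style described: join host- and job-level
# metrics and fit a simple predictor of job wall time. The CSV file
# and column names are invented for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv('job_metrics.csv')   # one row per finished job
features = ['hs06_per_core', 'io_wait_frac', 'eos_read_mb', 'job_type_id']

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df['wall_time_s'], test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print('R^2 on held-out jobs:', model.score(X_test, y_test))
```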
Grids, virtualization, and clouds at Fermilab
NASA Astrophysics Data System (ADS)
Timm, S.; Chadwick, K.; Garzoglio, G.; Noh, S.
2014-06-01
Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004 the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services offering significant performance, reliability and serviceability improvements. This deployment was further enhanced through a distributed redundant network core architecture and the physical distribution of the systems hosting the virtual machines across multiple buildings on the Fermilab campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility (GPCF) and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud and GPCF) and core computing (Virtual Services). This work will present the evolution of the Fermilab Campus Grid, virtualization and cloud computing infrastructure, together with plans for the future.
Grid Computing at GSI for ALICE and FAIR - present and future
NASA Astrophysics Data System (ADS)
Schwarz, Kilian; Uhlig, Florian; Karabowicz, Radoslaw; Montiel-Gonzalez, Almudena; Zynovyev, Mykhaylo; Preuss, Carsten
2012-12-01
The future FAIR experiments CBM and PANDA have computing requirements that fall into a category that could currently not be satisfied by a single computing centre. A larger, distributed computing infrastructure is needed to cope with the amount of data to be simulated and analysed. Since 2002, GSI has operated a Tier-2 centre for ALICE@CERN. The central component of the GSI computing facility, and hence the core of the ALICE Tier-2 centre, is an LSF/SGE batch farm, currently split into three subclusters with a total of 15000 CPU cores shared by the participating experiments, accessible both locally and soon also completely via Grid. In terms of data storage, a 5.5 PB Lustre file system, directly accessible from all worker nodes, is maintained, as well as a 300 TB xrootd-based Grid storage element. Based on this existing expertise, and utilising ALICE's middleware 'AliEn', the Grid infrastructure for PANDA and CBM is being built. Besides a Tier-0 centre at GSI, the computing Grids of the two FAIR collaborations now encompass more than 17 sites in 11 countries and are constantly expanding. The operation of the distributed FAIR computing infrastructure benefits significantly from the experience gained with the ALICE Tier-2 centre. A close collaboration between ALICE Offline and FAIR provides mutual advantages. The employment of a common Grid middleware as well as compatible simulation and analysis software frameworks ensures significant synergy effects.
FermiGrid—experience and future plans
NASA Astrophysics Data System (ADS)
Chadwick, K.; Berman, E.; Canal, P.; Hesselroth, T.; Garzoglio, G.; Levshina, T.; Sergeev, V.; Sfiligoi, I.; Sharma, N.; Timm, S.; Yocum, D. R.
2008-07-01
Fermilab supports a scientific program that includes experiments and scientists located across the globe. In order to better serve this community, Fermilab has placed its production computer resources in a Campus Grid infrastructure called 'FermiGrid'. The FermiGrid infrastructure allows the large experiments at Fermilab to have priority access to their own resources, enables sharing of these resources in an opportunistic fashion, and movement of work (jobs, data) between the Campus Grid and National Grids such as Open Science Grid (OSG) and the Worldwide LHC Computing Grid Collaboration (WLCG). FermiGrid resources support multiple Virtual Organizations (VOs), including VOs from the OSG, EGEE, and the WLCG. Fermilab also makes leading contributions to the Open Science Grid in the areas of accounting, batch computing, grid security, job management, resource selection, site infrastructure, storage management, and VO services. Through the FermiGrid interfaces, authenticated and authorized VOs and individuals may access our core grid services, the 10,000+ Fermilab resident CPUs, near-petabyte (including CMS) online disk pools and the multi-petabyte Fermilab Mass Storage System. These core grid services include a site wide Globus gatekeeper, VO management services for several VOs, Fermilab site authorization services, grid user mapping services, as well as job accounting and monitoring, resource selection and data movement services. Access to these services is via standard and well-supported grid interfaces. We will report on the user experience of using the FermiGrid campus infrastructure interfaced to a national cyberinfrastructure - the successes and the problems.
Colling, D.; Britton, D.; Gordon, J.; Lloyd, S.; Doyle, A.; Gronbech, P.; Coles, J.; Sansum, A.; Patrick, G.; Jones, R.; Middleton, R.; Kelsey, D.; Cass, A.; Geddes, N.; Clark, P.; Barnby, L.
2013-01-01
The Large Hadron Collider (LHC) is one of the greatest scientific endeavours to date. The construction of the collider itself and the experiments that collect data from it represent a huge investment, both financially and in terms of human effort, in our hope to understand the way the Universe works at a deeper level. Yet the volumes of data produced are so large that they cannot be analysed at any single computing centre. Instead, the experiments have all adopted distributed computing models based on the LHC Computing Grid. Without the correct functioning of this grid infrastructure the experiments would not be able to understand the data that they have collected. Within the UK, the Grid infrastructure needed by the experiments is provided by the GridPP project. We report on the operations, performance and contributions made to the experiments by the GridPP project during the years of 2010 and 2011—the first two significant years of the running of the LHC. PMID:23230163
Commissioning the CERN IT Agile Infrastructure with experiment workloads
NASA Astrophysics Data System (ADS)
Medrano Llamas, Ramón; Harald Barreiro Megino, Fernando; Kucharczyk, Katarzyna; Kamil Denis, Marek; Cinquilli, Mattia
2014-06-01
In order to ease the management of their infrastructure, most of the WLCG sites are adopting cloud-based strategies. In the case of CERN, the Tier-0 of the WLCG, the resource and configuration management of the computing centre is being completely restructured under the codename Agile Infrastructure. Its goal is to manage 15,000 virtual machines by means of an OpenStack middleware in order to unify all the resources in CERN's two data centres: the one in Meyrin and the new one in Wigner, Hungary. During the commissioning of this infrastructure, CERN IT is offering an attractive amount of computing resources to the experiments (800 cores for ATLAS and CMS) through a private cloud interface. ATLAS and CMS have joined forces to exploit them by running stress tests and simulation workloads since November 2012. This work describes the experience of the first deployments of the current experiment workloads on the CERN private cloud testbed. The paper is organized as follows: the first section explains the integration of the experiment workload management systems (WMS) with the cloud resources. The second section revisits the performance and stress testing performed with HammerCloud in order to evaluate and compare the suitability for the experiment workloads. The third section goes deeper into dynamic provisioning techniques, such as the use of the cloud APIs directly by the WMS. The paper finishes with a review of the conclusions and the challenges ahead.
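Underneath the experiment workload management systems, provisioning on such a testbed reduces to OpenStack API calls. A minimal sketch with the openstacksdk follows; the cloud entry, image, flavor and network IDs are placeholders, not CERN's actual configuration:

```python
# Sketch: provisioning a worker VM on an OpenStack cloud such as an
# Agile Infrastructure-style testbed. Cloud name, image, flavor and
# network IDs are placeholders resolved from a local clouds.yaml.
import openstack

conn = openstack.connect(cloud='private-testbed')   # placeholder entry

server = conn.compute.create_server(
    name='wms-worker-001',
    image_id='IMAGE_UUID',
    flavor_id='FLAVOR_UUID',
    networks=[{'uuid': 'NETWORK_UUID'}],
)
server = conn.compute.wait_for_server(server)
print(server.name, 'is', server.status)
```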
Climate simulations and services on HPC, Cloud and Grid infrastructures
NASA Astrophysics Data System (ADS)
Cofino, Antonio S.; Blanco, Carlos; Minondo Tshuma, Antonio
2017-04-01
Cloud, Grid and High Performance Computing have changed the accessibility and availability of computing resources for Earth Science research communities, especially for the climate community. These paradigms are modifying the way climate applications are executed. By using these technologies the number, variety and complexity of experiments and resources are increasing substantially. But, although computational capacity is increasing, the traditional applications and tools used by the community are not good enough to manage this large volume and variety of experiments and computing resources. In this contribution, we evaluate the challenges of running climate simulations and services on Grid, Cloud and HPC infrastructures and how to tackle them. The Grid and Cloud infrastructures provided by EGI's VOs (esr, earth.vo.ibergrid and fedcloud.egi.eu) will be evaluated, as well as HPC resources from the PRACE infrastructure and institutional clusters. To solve those challenges, solutions using the DRM4G framework will be shown. DRM4G provides a good framework to manage a large volume and variety of computing resources for climate experiments. This work has been supported by the Spanish National R&D Plan under projects WRF4G (CGL2011-28864), INSIGNIA (CGL2016-79210-R) and MULTI-SDM (CGL2015-66583-R); the IS-ENES2 project from the 7FP of the European Commission (grant agreement no. 312979); the European Regional Development Fund (ERDF); and the Programa de Personal Investigador en Formación Predoctoral from Universidad de Cantabria and Government of Cantabria.
Fermilab computing at the Intensity Frontier
Group, Craig; Fuess, S.; Gutsche, O.; ...
2015-12-23
The Intensity Frontier refers to a diverse set of particle physics experiments using high-intensity beams. In this paper I focus the discussion on the computing requirements and solutions of a set of neutrino and muon experiments in progress or planned to take place at the Fermi National Accelerator Laboratory located near Chicago, Illinois. The experiments face unique challenges, but also have overlapping computational needs. In principle, by exploiting the commonality and utilizing centralized computing tools and resources, requirements can be satisfied efficiently, and scientists of individual experiments can focus more on the science and less on the development of tools and infrastructure.
Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation
NASA Astrophysics Data System (ADS)
Anisenkov, A. V.
2018-03-01
In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes of raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on a heterogeneous, geographically distributed computing environment, which includes the Worldwide LHC Computing Grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and to provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS Grid Information System (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, and to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in unifying the description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).
Belle II grid computing: An overview of the distributed data management system.
NASA Astrophysics Data System (ADS)
Bansal, Vikas; Schram, Malachi; Belle II Collaboration
2017-01-01
The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start physics data taking in 2018 and will accumulate 50 ab^-1 of e+e- collision data, about 50 times larger than the data set of the Belle experiment. The computing requirements of Belle II are comparable to those of a Run I LHC experiment. Computing at this scale requires efficient use of the compute grids in North America, Asia and Europe, and will take advantage of upgrades to the high-speed global network. We present the architecture of data flow and data handling as a part of the Belle II computing infrastructure.
De La Flor, Grace; Ojaghi, Mobin; Martínez, Ignacio Lamata; Jirotka, Marina; Williams, Martin S; Blakeborough, Anthony
2010-09-13
When transitioning local laboratory practices into distributed environments, the interdependent relationship between experimental procedure and the technologies used to execute experiments becomes highly visible and a focal point for system requirements. We present an analysis of ways in which this reciprocal relationship is reconfiguring laboratory practices in earthquake engineering as a new computing infrastructure is embedded within three laboratories in order to facilitate the execution of shared experiments across geographically distributed sites. The system has been developed as part of the UK Network for Earthquake Engineering Simulation e-Research project, which links together three earthquake engineering laboratories at the universities of Bristol, Cambridge and Oxford. We consider the ways in which researchers have successfully adapted their local laboratory practices through the modification of experimental procedure so that they may meet the challenges of coordinating distributed earthquake experiments.
A model to forecast data centre infrastructure costs.
NASA Astrophysics Data System (ADS)
Vernet, R.
2015-12-01
The computing needs of the HEP community are increasing steadily, but the current funding situation in many countries is tight. As a consequence, experiments, data centres and funding agencies have to rationalize resource usage and expenditures. CC-IN2P3 (Lyon, France) provides computing resources to many experiments, including the LHC experiments, and is a major partner for astroparticle projects like LSST, CTA or Euclid. The financial cost to accommodate all these experiments is substantial and has to be planned well in advance for funding and strategic reasons. In that perspective, leveraging the infrastructure expenses, electric power costs and hardware performance observed at our site over the last years, we have built a model that integrates these data and provides estimates of the investments that would be required to cater to the experiments in the mid-term future. We present how our model is built and the expenditure forecast it produces, taking into account the experiment roadmaps. We also examine the resource growth predicted by our model over the next years, assuming a flat-budget scenario.
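The abstract does not spell the model out. A toy version of the same idea, projecting yearly purchases and power costs against experiment requirements under a fixed hardware service life and a falling price per unit of capacity, could look as follows; every number below is invented for illustration:

```python
# Toy data-centre cost forecast: each year, retire hardware past its
# service life, buy capacity to meet the pledged requirement at the
# current price point, and add power costs. All numbers are invented.
SERVICE_LIFE_Y = 4
PRICE_EUR_PER_HS06 = 10.0       # purchase price per HS06 today
PRICE_DECAY = 0.85              # yearly price improvement factor
POWER_EUR_PER_HS06_Y = 0.5      # electricity + cooling per HS06-year

required = [100_000, 120_000, 150_000, 185_000]   # HS06 pledges per year
installed = []                                    # (purchase_year, capacity)

for year, need in enumerate(required):
    # drop hardware older than its service life
    installed = [(y, c) for (y, c) in installed if year - y < SERVICE_LIFE_Y]
    capacity = sum(c for _, c in installed)
    to_buy = max(0, need - capacity)
    installed.append((year, to_buy))
    price = PRICE_EUR_PER_HS06 * PRICE_DECAY ** year
    cost = to_buy * price + need * POWER_EUR_PER_HS06_Y
    print(f'year {year}: buy {to_buy:>7} HS06, spend {cost:>12,.0f} EUR')
```

Under a flat budget, the interesting output of such a model is the year in which the affordable capacity growth falls below the experiment roadmaps.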
Evolution of a Materials Data Infrastructure
NASA Astrophysics Data System (ADS)
Warren, James A.; Ward, Charles H.
2018-06-01
The field of materials science and engineering is writing a new chapter in its evolution, one of digitally empowered materials discovery, development, and deployment. The 2008 Integrated Computational Materials Engineering (ICME) study report helped usher in this paradigm shift, making a compelling case and strong recommendations for an infrastructure supporting ICME that would enable access to precompetitive materials data for both scientific and engineering applications. With the launch of the Materials Genome Initiative in 2011, which drew substantial inspiration from the ICME study, digital data was highlighted as a core component of a Materials Innovation Infrastructure, along with experimental and computational tools. Over the past 10 years, our understanding of what it takes to provide accessible materials data has matured and rapid progress has been made in establishing a Materials Data Infrastructure (MDI). We are learning that the MDI is essential to eliminating the seams between experiment and computation by providing a means for them to connect effortlessly. Additionally, the MDI is becoming an enabler, allowing materials engineering to tie into a much broader model-based engineering enterprise for product design.
SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology.
Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E; Troein, Carl; Millar, Andrew J; Goryanin, Igor; Gilmore, Stephen
2013-03-01
Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI's use of standard data formats. All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials.
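SBSI's own APIs are not shown here. The core computation that SBSINumerics parallelizes, fitting model parameters to experimental time-course data, can be sketched in plain Python with scipy; the model, data and parameter names are invented:

```python
# Sketch of the computation SBSINumerics distributes: fit kinetic
# parameters of a small ODE model to time-course data by least
# squares. Model, data and parameter names are invented.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares

t_obs = np.linspace(0.0, 10.0, 21)

def model(y, t, k_syn, k_deg):
    return k_syn - k_deg * y          # simple synthesis/decay kinetics

def simulate(params):
    k_syn, k_deg = params
    return odeint(model, 0.0, t_obs, args=(k_syn, k_deg)).ravel()

# synthetic "experimental" data from known parameters plus noise
rng = np.random.default_rng(0)
y_obs = simulate([2.0, 0.5]) + rng.normal(0.0, 0.05, t_obs.size)

fit = least_squares(lambda p: simulate(p) - y_obs, x0=[1.0, 1.0],
                    bounds=([0.0, 0.0], [10.0, 10.0]))
print('estimated parameters:', fit.x)
```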
NASA Astrophysics Data System (ADS)
Angius, S.; Bisegni, C.; Ciuffetti, P.; Di Pirro, G.; Foggetta, L. G.; Galletti, F.; Gargana, R.; Gioscio, E.; Maselli, D.; Mazzitelli, G.; Michelotti, A.; Orrù, R.; Pistoni, M.; Spagnoli, F.; Spigone, D.; Stecchi, A.; Tonto, T.; Tota, M. A.; Catani, L.; Di Giulio, C.; Salina, G.; Buzzi, P.; Checcucci, B.; Lubrano, P.; Piccini, M.; Fattibene, E.; Michelotto, M.; Cavallaro, S. R.; Diana, B. F.; Enrico, F.; Pulvirenti, S.
2016-01-01
The paper presents the !CHAOS open source project, aimed at developing a prototype of a national private Cloud Computing infrastructure devoted to accelerator control systems and large experiments of High Energy Physics (HEP). The !CHAOS project has been financed by MIUR (Italian Ministry of Research and Education) and aims to develop a new concept of control system and data acquisition framework by providing, with a high level of abstraction, all the services needed for controlling and managing a large scientific, or non-scientific, infrastructure. A beta version of the !CHAOS infrastructure will be released at the end of December 2015 and will run on private Cloud infrastructures based on OpenStack.
1001 Ways to run AutoDock Vina for virtual screening
NASA Astrophysics Data System (ADS)
Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D.
2016-03-01
Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.
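As a sketch of points (1) and (2) above, ligand-level parallelization with an explicitly captured per-ligand seed, the following assumes a vina binary on $PATH; the receptor, docking box and ligand files are placeholders:

```python
# Sketch of points (1) and (2): run one single-CPU Vina process per
# ligand in parallel, fixing and recording the seed per ligand so
# runs can be reproduced on heterogeneous resources. Paths, docking
# box and the ligand list are placeholders.
import subprocess
from multiprocessing import Pool

LIGANDS = ['lig_%04d.pdbqt' % i for i in range(100)]

def dock(args):
    idx, ligand = args
    seed = 10_000 + idx                  # captured, reproducible seed
    cmd = ['vina', '--receptor', 'receptor.pdbqt', '--ligand', ligand,
           '--center_x', '12.0', '--center_y', '5.0', '--center_z', '-8.0',
           '--size_x', '20', '--size_y', '20', '--size_z', '20',
           '--exhaustiveness', '8', '--cpu', '1', '--seed', str(seed),
           '--out', ligand.replace('.pdbqt', '_out.pdbqt')]
    subprocess.run(cmd, check=True)
    return ligand, seed

if __name__ == '__main__':
    # extra parallel level on one multi-core box: 8 Vina processes
    with Pool(processes=8) as pool:
        for ligand, seed in pool.imap_unordered(dock, enumerate(LIGANDS)):
            print(f'{ligand} docked with seed {seed}')
```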
NASA Technical Reports Server (NTRS)
Hale, Mark A.
1996-01-01
Computer applications for design have evolved rapidly over the past several decades, and significant payoffs are being achieved by organizations through reductions in design cycle times. These applications are overwhelmed by the requirements imposed during complex, open engineering systems design. Organizations are faced with a number of different methodologies, numerous legacy disciplinary tools, and a very large amount of data. Yet they are also faced with few interdisciplinary tools for design collaboration or methods for achieving the revolutionary product designs required to maintain a competitive advantage in the future. These organizations are looking for a software infrastructure that integrates current corporate design practices with newer simulation and solution techniques. Such an infrastructure must be robust to changes in both corporate needs and enabling technologies. In addition, this infrastructure must be user-friendly, modular and scalable. This need is the motivation for the research described in this dissertation. The research is focused on the development of an open computing infrastructure that facilitates product and process design. In addition, this research explicitly deals with human interactions during design through a model that focuses on the role of a designer as that of decision-maker. The research perspective here is taken from that of design as a discipline, with a focus on Decision-Based Design, Theory of Languages, Information Science, and Integration Technology. Given this background, a Model of IPPD is developed and implemented along the lines of a traditional experimental procedure, with the steps of establishing context, formalizing a theory, building an apparatus, conducting an experiment, reviewing results, and providing recommendations. Based on this Model, Design Processes and Specification can be explored in a structured and implementable architecture. An architecture for exploring design called DREAMS (Developing Robust Engineering Analysis Models and Specifications) has been developed, which supports the activities of both meta-design and actual design execution. This is accomplished through a systematic process comprising the stages of Formulation, Translation, and Evaluation. During this process, elements from a Design Specification are integrated into Design Processes. In addition, a software infrastructure called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment) was developed; it represents a virtual apparatus in the Design Experiment conducted in this research. IMAGE is an innovative architecture because it explicitly supports design-related activities. This is accomplished through a GUI-driven and agent-based implementation of DREAMS. An HSCT design has been adopted from the Framework for Interdisciplinary Design Optimization (FIDO) and is implemented in IMAGE. This problem shows how Design Processes and Specification interact in a design system. In addition, the problem utilizes two different solution models concurrently: optimal and satisfying. The satisfying model allows for more design flexibility and allows a designer to maintain design freedom. As a result of following this experimental procedure, this infrastructure is an open system that is robust to changes in both corporate needs and computer technologies.
The development of this infrastructure leads to a number of significant intellectual contributions: 1) A new approach to implementing IPPD with the aid of a computer; 2) A formal Design Experiment; 3) A combined Process and Specification architecture that is language-based; 4) An infrastructure for exploring design; 5) An integration strategy for implementing computer resources; and 6) A seamless modeling language. The need for these contributions is emphasized by the demand by industry and government agencies for the development of these technologies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Critical infrastructures around the world are at constant risk from earthquakes. Most of these critical structures were designed using archaic seismic simulation methods built on early digital computers from the 1970s. Idaho National Laboratory's Seismic Research Group is working to modernize these simulation methods through computational research and large-scale laboratory experiments.
NGScloud: RNA-seq analysis of non-model species using cloud computing.
Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai
2018-05-03
RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon, which permit access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so that its costs and runtimes can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. Contact: unai.lopezdeheredia@upm.es.
Integration of Cloud resources in the LHCb Distributed Computing
NASA Astrophysics Data System (ADS)
Úbeda García, Mario; Méndez Muñoz, Víctor; Stagni, Federico; Cabarrou, Baptiste; Rauschmayr, Nathalie; Charpentier, Philippe; Closier, Joel
2014-06-01
This contribution describes how Cloud resources have been integrated in the LHCb Distributed Computing. LHCb is using its specific DIRAC extension (LHCbDirac) as an interware for its Distributed Computing, which so far seamlessly integrated Grid resources and computer clusters. The cloud extension of DIRAC (VMDIRAC) allows the integration of Cloud computing infrastructures. It is able to interact with multiple types of infrastructures in commercial and institutional clouds, supported by multiple interfaces (Amazon EC2, OpenNebula, OpenStack and CloudStack), and it instantiates, monitors and manages Virtual Machines running on this aggregation of Cloud resources. Moreover, specifications for institutional Cloud resources proposed by the Worldwide LHC Computing Grid (WLCG), mainly by the High Energy Physics Unix Information Exchange (HEPiX) group, have been taken into account. Several initiatives and computing resource providers in the eScience environment have already deployed IaaS in production during 2013. Keeping this in mind, the pros and cons of a cloud-based infrastructure have been studied in contrast with the current setup. As a result, this work addresses four different use cases which represent a major improvement on several levels of our infrastructure. We describe the solution implemented by LHCb for the contextualisation of the VMs, based on the idea of a Cloud Site. We report on operational experience of using in production several institutional Cloud resources that are thus becoming an integral part of the LHCb Distributed Computing resources. Furthermore, we describe the gradual migration of our Service Infrastructure towards a fully distributed architecture following the Software as a Service (SaaS) model.
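VMDIRAC's internals are not reproduced in this abstract. The multi-interface pattern it implements, one manager driving EC2-style, OpenStack and other clouds through a uniform layer, can be sketched with Apache Libcloud; all credentials, endpoints, image and size names below are placeholders:

```python
# Sketch of the multi-cloud pattern VMDIRAC implements: one piece of
# code instantiating VMs through different provider interfaces. All
# credentials, endpoints, image and size names are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def boot_worker(driver, image_name, size_name, name='lhcb-worker'):
    # resolve image and flavor by name through the uniform driver API
    image = next(i for i in driver.list_images() if i.name == image_name)
    size = next(s for s in driver.list_sizes() if s.name == size_name)
    return driver.create_node(name=name, image=image, size=size)

# EC2-style commercial cloud
ec2 = get_driver(Provider.EC2)('ACCESS_KEY', 'SECRET_KEY',
                               region='eu-west-1')

# institutional OpenStack cloud
os_cls = get_driver(Provider.OPENSTACK)
osd = os_cls('USER', 'PASSWORD',
             ex_force_auth_url='https://keystone.example.org:5000',
             ex_force_auth_version='3.x_password',
             ex_tenant_name='lhcb')

for driver in (ec2, osd):
    node = boot_worker(driver, 'worker-image', 'm1.medium')
    print('booted', node.name, 'on', driver.name)
```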
Cybersecurity Workforce Development and the Protection of Critical Infrastructure
2017-03-31
communicat ions products, and limited travel for site visits and conferencing. The CSCC contains a developed web-based coordination site, computer ...the CSCC. The Best Practices Ana~yst position maintains a lisr of best practices, computer related patches. and standard operating procedures (SOP...involved in conducting vulnerability assessments of computer networks. To adequately exercise and experiment with industry standard software, it was
The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications
NASA Technical Reports Server (NTRS)
Johnston, William E.
2002-01-01
With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of integrated environments that can combine the resources needed to support the large-scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technology infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources, such as databases, data catalogues, and archives; and collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios, both large and small scale.
Using Multi-modal Sensing for Human Activity Modeling in the Real World
NASA Astrophysics Data System (ADS)
Harrison, Beverly L.; Consolvo, Sunny; Choudhury, Tanzeem
Traditionally smart environments have been understood to represent those (often physical) spaces where computation is embedded into the users' surrounding infrastructure, buildings, homes, and workplaces. Users of this "smartness" move in and out of these spaces. Ambient intelligence assumes that users are automatically and seamlessly provided with context-aware, adaptive information, applications and even sensing - though this remains a significant challenge even when limited to these specialized, instrumented locales. Since not all environments are "smart", the experience is not a pervasive one; rather, users move between these intelligent islands of computationally enhanced space while we still aspire to achieve a more ideal anytime, anywhere experience. Two key technological trends are helping to bridge the gap between these smart environments and make the associated experience more persistent and pervasive. Smaller and more computationally sophisticated mobile devices allow sensing, communication, and services to be more directly and continuously experienced by users. Improved infrastructure and the availability of uninterrupted data streams, for instance, location-based data, enable new services and applications to persist across environments.
Benchmarking infrastructure for mutation text mining
2014-01-01
Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.
Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo
2014-02-25
Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
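To make the "metrics via SPARQL" idea concrete, here is a hedged Python sketch using rdflib that counts, per text mining system, the annotations agreeing with a gold standard; precision and recall follow from such counts. The graph file, namespace and predicate names are invented placeholders, not the paper's actual ontology terms.

# Sketch: counting gold-standard matches per system with SPARQL over RDF.
# The file, namespace and predicates below are hypothetical placeholders.
from rdflib import Graph

g = Graph()
g.parse("benchmark_annotations.ttl", format="turtle")

QUERY = """
PREFIX ex: <http://example.org/mutation#>
SELECT ?sys (COUNT(?m) AS ?tp) WHERE {
    ?a ex:producedBy ?sys ;
       ex:annotates ?m .
    ?gold ex:goldStandard true ;
          ex:annotates ?m .
} GROUP BY ?sys
"""

for sys_uri, true_positives in g.query(QUERY):
    print(sys_uri, "true positives:", true_positives)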
Progress on the Fabric for Frontier Experiments Project at Fermilab
NASA Astrophysics Data System (ADS)
Box, Dennis; Boyd, Joseph; Dykstra, Dave; Garzoglio, Gabriele; Herner, Kenneth; Kirby, Michael; Kreymer, Arthur; Levshina, Tanya; Mhashilkar, Parag; Sharma, Neha
2015-12-01
The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiments. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, a flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.
The diverse use of clouds by CMS
Andronis, Anastasios; Bauer, Daniela; Chaze, Olivier; ...
2015-12-23
The resources CMS is using are increasingly being offered as clouds. In Run 2 of the LHC the majority of CMS CERN resources, both in Meyrin and at the Wigner Computing Centre, will be presented as cloud resources on which CMS will have to build its own infrastructure. This infrastructure will need to run all of the CMS workflows including: Tier 0, production and user analysis. In addition, the CMS High Level Trigger will provide a compute resource comparable in scale to the total offered by the CMS Tier 1 sites, when it is not running as part of the trigger system. During these periods a cloud infrastructure will be overlaid on this resource, making it accessible for general CMS use. Finally, CMS is starting to utilise cloud resources being offered by individual institutes and is gaining experience to facilitate the use of opportunistically available cloud resources. We conclude with a snapshot of this infrastructure and its operation at the time of the CHEP2015 conference.
Managing competing elastic Grid and Cloud scientific computing applications using OpenNebula
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Berzano, D.; Lusso, S.; Masera, M.; Vallero, S.
2015-12-01
Elastic cloud computing applications, i.e. applications that automatically scale according to computing needs, work on the ideal assumption of infinite resources. While large public cloud infrastructures may be a reasonable approximation of this condition, scientific computing centres like WLCG Grid sites usually work in a saturated regime, in which applications compete for scarce resources through queues, priorities and scheduling policies, and keeping a fraction of the computing cores idle to allow for headroom is usually not an option. In our particular environment one of the applications (a WLCG Tier-2 Grid site) is much larger than all the others and cannot autoscale easily. Nevertheless, other smaller applications can benefit from automatic elasticity; we describe the implementation of this property in our infrastructure, based on the OpenNebula cloud stack, and discuss the very first operational experience with a small number of strategies for the timely allocation and release of resources.
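In its simplest form, the elasticity logic described above reduces to a control loop that compares queued demand with running capacity and asks the cloud stack for more or fewer VMs, never exceeding the share available in a saturated centre. The sketch below is a deliberately minimal, hypothetical version of such a loop; the batch-system and cloud calls are stubs, not the OpenNebula API.

# Minimal sketch of an elastic-scaling control loop; the batch-system and
# cloud-stack calls below are hypothetical stubs, not the OpenNebula API.
import time

JOBS_PER_VM = 8     # job slots provided by one worker VM
MAX_VMS = 50        # hard cap: the infrastructure is shared and saturated

_vms = []                               # stub state

def get_queued_jobs():                  # stub: would query the batch system
    return 20

def running_vms():                      # stub: would query the cloud stack
    return _vms

def start_vm():                         # stub: would instantiate a worker VM
    _vms.append(type("VM", (), {"idle": True})())

def stop_vm(vm):                        # stub: would terminate an idle VM
    _vms.remove(vm)

def target_vms(queued):
    """VMs needed for the queue depth, bounded by the site-wide cap."""
    wanted = -(-queued // JOBS_PER_VM)  # ceiling division
    return max(0, min(MAX_VMS, wanted))

for _ in range(3):                      # demo cycles; a daemon would loop forever
    target, current = target_vms(get_queued_jobs()), len(running_vms())
    if target > current:
        for _ in range(target - current):
            start_vm()
    else:
        # Scale down by releasing idle VMs only, so running jobs survive.
        for vm in [v for v in running_vms() if v.idle][:current - target]:
            stop_vm(vm)
    print("running VMs:", len(running_vms()))
    time.sleep(1)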
Challenges and opportunities of cloud computing for atmospheric sciences
NASA Astrophysics Data System (ADS)
Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.
2016-04-01
Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand, it has begun to make its way into scientific research. One of the greatest advantages of cloud computing for scientific research is independence from access to a large cyberinfrastructure when funding or performing a research project. Cloud computing can avoid the maintenance expenses of large supercomputers and has the potential to 'democratize' access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are the computational cost of, and the uncertainty in, meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we present results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The associated monetary cost is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.
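A back-of-the-envelope comparison of the kind performed in such studies can be written in a few lines: given a model's throughput (simulated years per wall-clock day) and an hourly node price, cost and time-to-solution can be compared across platforms. All numbers below are invented placeholders, not the paper's measurements.

# Toy cost/time comparison for a climate run; all numbers are placeholders.
def cost_and_walltime(sim_years, sim_years_per_day, nodes, price_per_node_hour):
    """Return (total cost in $, wall-clock days) for a simulation campaign."""
    days = sim_years / sim_years_per_day
    cost = days * 24 * nodes * price_per_node_hour
    return cost, days

for name, rate, nodes, price in [
    ("supercomputer", 5.0, 4, 0.00),   # allocation already funded
    ("cloud vendor A", 3.5, 4, 1.20),
    ("cloud vendor B", 4.2, 4, 1.60),
]:
    cost, days = cost_and_walltime(100, rate, nodes, price)
    print(f"{name:15s} {days:6.1f} days  ${cost:10,.2f}")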
NASA Astrophysics Data System (ADS)
Read, A.; Taga, A.; O-Saada, F.; Pajchel, K.; Samset, B. H.; Cameron, D.
2008-07-01
Computing and storage resources connected by the Nordugrid ARC middleware in the Nordic countries, Switzerland and Slovenia are a part of the ATLAS computing Grid. This infrastructure is being commissioned with the ongoing ATLAS Monte Carlo simulation production in preparation for the commencement of data taking in 2008. The unique non-intrusive architecture of ARC, its straightforward interplay with the ATLAS Production System via the Dulcinea executor, and its performance during the commissioning exercise are described. ARC support for flexible and powerful end-user analysis within the GANGA distributed analysis framework is also shown. Whereas the storage solution for this Grid was earlier based on a large, distributed collection of GridFTP-servers, the ATLAS computing design includes a structured SRM-based system with a limited number of storage endpoints. The characteristics, integration and performance of the old and new storage solutions are presented. Although the hardware resources in this Grid are quite modest, it has provided more than double the agreed contribution to the ATLAS production with an efficiency above 95% during long periods of stable operation.
Rattanatamrong, Prapaporn; Matsunaga, Andrea; Raiturkar, Pooja; Mesa, Diego; Zhao, Ming; Mahmoudi, Babak; Digiovanna, Jack; Principe, Jose; Figueiredo, Renato; Sanchez, Justin; Fortes, Jose
2010-01-01
The CyberWorkstation (CW) is an advanced cyber-infrastructure for Brain-Machine Interface (BMI) research. It allows the development, configuration and execution of BMI computational models using high-performance computing resources. The CW's concept is implemented using a software structure in which an "experiment engine" is used to coordinate all software modules needed to capture, communicate and process brain signals and motor-control commands. A generic BMI-model template, which specifies a common interface to the CW's experiment engine, and a common communication protocol enable easy addition, removal or replacement of models without disrupting system operation. This paper reviews the essential components of the CW and shows how templates can facilitate the processes of BMI model development, testing and incorporation into the CW. It also discusses the ongoing work towards making this process infrastructure-independent.
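To illustrate the template idea, here is a hedged sketch of what a common model interface of this kind might look like in Python; the method names are invented for illustration and are not the CyberWorkstation's actual API.

# Hedged sketch of a generic BMI-model template: any model exposing this
# interface could be swapped in and out by an experiment engine.
# Method names are hypothetical, not the CyberWorkstation's actual API.
from abc import ABC, abstractmethod

class BMIModel(ABC):
    @abstractmethod
    def configure(self, params: dict) -> None:
        """Receive model parameters before the experiment starts."""

    @abstractmethod
    def process(self, neural_sample: list) -> list:
        """Map one window of neural data to motor-control commands."""

    @abstractmethod
    def shutdown(self) -> None:
        """Release resources when the engine removes the model."""

class LinearDecoder(BMIModel):
    """Toy model: a fixed linear map from neural features to commands."""
    def configure(self, params):
        self.weights = params["weights"]

    def process(self, neural_sample):
        return [sum(w * x for w, x in zip(row, neural_sample))
                for row in self.weights]

    def shutdown(self):
        self.weights = None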
Moretti, Loris; Sartori, Luca
2016-09-01
In the field of Computer-Aided Drug Discovery and Development (CADDD) the proper software infrastructure is essential for everyday investigations. The creation of such an environment should be carefully planned and implemented with certain features in order to be productive and efficient. Here we describe a solution to integrate standard computational services into a functional unit that empowers modelling applications for drug discovery. This system allows users with various levels of expertise to run in silico experiments automatically and without the burden of file formatting for different software, managing the actual computation, keeping track of the activities and graphical rendering of the structural outcomes. To showcase the potential of this approach, the performances of five different docking programs on an HIV-1 protease test set are presented. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Using OSG Computing Resources with (iLC)Dirac
NASA Astrophysics Data System (ADS)
Sailer, A.; Petric, M.; CLICdp Collaboration
2017-10-01
CPU cycles for small experiments and projects can be scarce, thus making use of all available resources, whether dedicated or opportunistic, is mandatory. While enabling uniform access to the LCG computing elements (ARC, CREAM), the DIRAC grid interware was not able to use OSG computing elements (GlobusCE, HTCondor-CE) without dedicated support at the grid site through so-called ‘SiteDirectors’, which directly submit to the local batch system. This in turn requires additional dedicated effort for small experiments on the grid site. Adding interfaces to the OSG CEs through the respective grid middleware therefore allows accessing them within the DIRAC software without additional site-specific infrastructure. This enables greater use of opportunistic resources for experiments and projects without dedicated clusters or an established computing infrastructure with the DIRAC software. To allow sending jobs to HTCondor-CE and legacy Globus computing elements inside DIRAC, the required wrapper classes were developed. Not only is the usage of these types of computing elements now completely transparent for all DIRAC instances, which makes DIRAC a flexible solution for OSG-based virtual organisations, but it also allows LCG Grid sites to move to the HTCondor-CE software without shutting DIRAC-based VOs out of their site. In these proceedings we detail how we interfaced the DIRAC system to the HTCondor-CE and Globus computing elements, explain the encountered obstacles and the solutions developed, and describe how the linear collider community uses resources in the OSG.
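For context, the sketch below shows the core of what such a wrapper class has to do: render a submit description that targets a remote HTCondor-CE and hand it to condor_submit. It is a simplified illustration of the submission path, not the actual DIRAC wrapper code, and the CE hostname is a placeholder.

# Sketch of the core of an HTCondor-CE submission wrapper (not DIRAC's code).
# The CE endpoint "ce.example.org" is a placeholder.
import subprocess
import tempfile

SUBMIT_TEMPLATE = """universe      = grid
grid_resource = condor {ce} {ce}:9619
executable    = {executable}
output        = job.out
error         = job.err
log           = job.log
queue 1
"""

def submit_to_ce(ce, executable):
    """Write a submit file for the remote CE and run condor_submit on it."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(SUBMIT_TEMPLATE.format(ce=ce, executable=executable))
        subfile = f.name
    result = subprocess.run(["condor_submit", subfile],
                            capture_output=True, text=True, check=True)
    return result.stdout  # contains the cluster id on success

print(submit_to_ce("ce.example.org", "/bin/hostname"))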
A Framework for Debugging Geoscience Projects in a High Performance Computing Environment
NASA Astrophysics Data System (ADS)
Baxter, C.; Matott, L.
2012-12-01
High performance computing (HPC) infrastructure has become ubiquitous in today's world with the emergence of commercial cloud computing and academic supercomputing centers. Teams of geoscientists, hydrologists and engineers can take advantage of this infrastructure to undertake large research projects - for example, linking one or more site-specific environmental models with soft computing algorithms, such as heuristic global search procedures, to perform parameter estimation and predictive uncertainty analysis, and/or design least-cost remediation systems. However, the size, complexity and distributed nature of these projects can make identifying failures in the associated numerical experiments using conventional ad-hoc approaches both time-consuming and ineffective. To address these problems a multi-tiered debugging framework has been developed. The framework allows for quickly isolating and remedying a number of potential experimental failures, including: failures in the HPC scheduler; bugs in the soft computing code; bugs in the modeling code; and permissions and access control errors. The utility of the framework is demonstrated via application to a series of over 200,000 numerical experiments involving a suite of 5 heuristic global search algorithms and 15 mathematical test functions serving as cheap analogues for the simulation-based optimization of pump-and-treat subsurface remediation systems.
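As a flavour of what multi-tiered triage can look like in practice, the sketch below walks a failed experiment's artefacts through ordered checks (scheduler, permissions, model output) and reports the first tier that fails. The checks and file names are hypothetical, not the framework's actual rules.

# Hypothetical multi-tier triage of a failed numerical experiment.
import os

def check_scheduler(job_dir):
    """Tier 1: did the HPC scheduler even run the job?"""
    return os.path.exists(os.path.join(job_dir, "job.log"))

def check_permissions(job_dir):
    """Tier 2: can we read/write where the experiment expects to?"""
    return os.access(job_dir, os.R_OK | os.W_OK)

def check_model_output(job_dir):
    """Tier 3: did the modeling code produce a non-empty result file?"""
    out = os.path.join(job_dir, "model.out")
    return os.path.exists(out) and os.path.getsize(out) > 0

TIERS = [("scheduler", check_scheduler),
         ("permissions", check_permissions),
         ("model output", check_model_output)]

def triage(job_dir):
    for name, check in TIERS:
        if not check(job_dir):
            return f"failure at tier: {name}"
    return "no infrastructure failure detected; inspect algorithm behaviour"

print(triage("runs/experiment_00042"))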
ATLAS computing on Swiss Cloud SWITCHengines
NASA Astrophysics Data System (ADS)
Haug, S.; Sciacca, F. G.; ATLAS Collaboration
2017-10-01
Consolidation towards more computing at flat budgets, beyond what pure chip technology can offer, is a requirement for the full scientific exploitation of the future data from the Large Hadron Collider at CERN in Geneva. One consolidation measure is to exploit cloud infrastructures whenever they are financially competitive. We report on the technical solutions used and the performance achieved while running simulation tasks for the ATLAS experiment on SWITCHengines. SWITCHengines is a new infrastructure-as-a-service offering for Swiss academia from the National Research and Education Network SWITCH. While the solutions and performance are general, the financial considerations and policies, on which we also report, are country-specific.
Separating Added Value from Hype: Some Experiences and Prognostications
NASA Astrophysics Data System (ADS)
Reed, Dan
2004-03-01
These are exciting times for the interplay of science and computing technology. As new data archives, instruments and computing facilities are connected nationally and internationally, a new model of distributed scientific collaboration is emerging. However, any new technology brings both opportunities and challenges -- Grids are no exception. In this talk, we will discuss some of the experiences deploying Grid software in production environments, illustrated with experiences from the NSF PACI Alliance, the NSF Extensible Terascale Facility (ETF) and other Grid projects. From these experiences, we derive some guidelines for deployment and some suggestions for community engagement, software development and infrastructure.
Integrating multiple scientific computing needs via a Private Cloud infrastructure
NASA Astrophysics Data System (ADS)
Bagnasco, S.; Berzano, D.; Brunetti, R.; Lusso, S.; Vallero, S.
2014-06-01
In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It allows resources to be dynamically and efficiently allocated to any application and the virtual machines to be tailored to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily while minimizing the downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, which hosts a full-fledged WLCG Tier-2 site, a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC, and several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.
Integration of a neuroimaging processing pipeline into a pan-Canadian computing grid
NASA Astrophysics Data System (ADS)
Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.
2012-02-01
The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.
The INDIGO-Datacloud Authentication and Authorization Infrastructure
NASA Astrophysics Data System (ADS)
Ceccanti, A.; Hardt, M.; Wegh, B.; Millar, AP; Caberletti, M.; Vianello, E.; Licehammer, S.
2017-10-01
Contemporary distributed computing infrastructures (DCIs) are not easily and securely accessible by scientists. These computing environments are typically hard to integrate due to interoperability problems resulting from the use of different authentication mechanisms, identity negotiation protocols and access control policies. Such limitations have a big impact on the user experience, making it hard for user communities to port and run their scientific applications on resources aggregated from multiple providers. The INDIGO-DataCloud project aims to provide the services and tools needed to enable a secure composition of resources from multiple providers in support of scientific applications. In order to do so, a common AAI architecture has to be defined that supports multiple authentication mechanisms, supports delegated authorization across services and can be easily integrated in off-the-shelf software. In this contribution we introduce the INDIGO Authentication and Authorization Infrastructure, describing its main components and their status, and how authentication, delegation and authorization flows are implemented across services.
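The delegation and authorization flows mentioned here are token-based. As a hedged illustration, the sketch below performs a standard OAuth2 client-credentials exchange and presents the resulting bearer token to a protected service, which is the general shape of such flows; the endpoint URLs and client credentials are placeholders, and the snippet is not INDIGO's actual client code.

# Sketch of a standard OAuth2 client-credentials flow, as used by
# token-based AAIs. URLs and credentials are hypothetical placeholders.
import requests

TOKEN_ENDPOINT = "https://iam.example.org/token"
CLIENT_ID, CLIENT_SECRET = "my-client", "my-secret"

# 1. Authenticate to the authorization server and obtain an access token.
resp = requests.post(TOKEN_ENDPOINT,
                     data={"grant_type": "client_credentials"},
                     auth=(CLIENT_ID, CLIENT_SECRET))
resp.raise_for_status()
token = resp.json()["access_token"]

# 2. Present the bearer token to a protected resource service.
api = requests.get("https://storage.example.org/api/files",
                   headers={"Authorization": f"Bearer {token}"})
print(api.status_code)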
SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology
Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E.; Troein, Carl; Millar, Andrew J.; Goryanin, Igor; Gilmore, Stephen
2013-01-01
Summary: Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI’s use of standard data formats. Availability and implementation: All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials. Contact: stg@inf.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23329415
Progress on the FabrIc for Frontier Experiments project at Fermilab
Box, Dennis; Boyd, Joseph; Dykstra, Dave; ...
2015-12-23
The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiments. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, a flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.
NASA Astrophysics Data System (ADS)
Delle Fratte, C.; Kennedy, J. A.; Kluth, S.; Mazzaferro, L.
2015-12-01
In a grid computing infrastructure, tasks such as continuous upgrades, service installations and software deployments are part of an admin's daily work. In such an environment, tools to help with the management, provisioning and monitoring of the deployed systems and services have become crucial. As experiments such as those at the LHC increase in scale, the computing infrastructure also becomes larger and more complex. Moreover, today's admins increasingly work within teams that share responsibilities and tasks. Such a scaled-up situation requires tools that not only reduce the workload on administrators but also enable them to work seamlessly in teams. In this paper we present our experience of managing the Max Planck Institute Tier-2 using Puppet and Gitolite in a cooperative way to help the system administrators in their daily work. In addition to describing the Puppet-Gitolite system, best practices and customizations will also be shown.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yilk, Todd
The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms online: Trinity, with separate partitions built around the Haswell and Knights Landing CPU architectures, for capability computing, and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL, and all were qualified at the highest level of the Green500 benchmark. This paper discusses supercomputing at LANL and the Green500 benchmark, and offers notes on our experience meeting the Green500's reporting requirements.
Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL
Yilk, Todd
2018-02-17
The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms online: Trinity, with separate partitions built around the Haswell and Knights Landing CPU architectures, for capability computing, and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL, and all were qualified at the highest level of the Green500 benchmark. This paper discusses supercomputing at LANL and the Green500 benchmark, and offers notes on our experience meeting the Green500's reporting requirements.
NASA Astrophysics Data System (ADS)
Maffioletti, Sergio; Dawes, Nicholas; Bavay, Mathias; Sarni, Sofiane; Lehning, Michael
2013-04-01
The Swiss Experiment platform (SwissEx: http://www.swiss-experiment.ch) provides a distributed storage and processing infrastructure for environmental research experiments. The aim of the second-phase project (the Open Support Platform for Environmental Research, OSPER, 2012-2015) is to develop the existing infrastructure to provide scientists with an improved workflow. This improved workflow will include pre-defined, documented and connected processing routines. A large-scale computing and data facility is required to provide reliable and scalable access to data for analysis, and it is desirable that such an infrastructure should be free of traditional data handling methods. Such an infrastructure has been developed using the cloud-based part of the Swiss national infrastructure SMSCG (http://www.smscg.ch) and Academic Cloud. The infrastructure under construction supports two main usage models:

1) Ad-hoc data analysis scripts: these are simple processing scripts, written by the environmental researchers themselves, which can be applied to large data sets via the high-power infrastructure. Examples of this type of script are spatial statistical analysis scripts (R-based scripts), mostly computed on raw meteorological and/or soil moisture data. These provide processed output in the form of a grid, a plot, or a KML file.

2) Complex models: a more intense data analysis pipeline centered (initially) around the physical process model Alpine3D and the MeteoIO plugin; depending on the data set, this may require a tightly coupled infrastructure. SMSCG already supports Alpine3D executions as both regular grid jobs and as virtual software appliances. A dedicated appliance with the Alpine3D-specific libraries has been created and made available through the SMSCG infrastructure. The analysis pipelines are activated and supervised by simple control scripts that, depending on the data fetched from the meteorological stations, launch new instances of the Alpine3D appliance, execute location-based subroutines at each grid point and store the results back into the central repository for post-processing. An optional extension of this infrastructure will be to provide a 'ring buffer'-type database infrastructure, such that model results (e.g. test runs made to check parameter dependency or for development) can be visualised and downloaded after completion without submitting them to a permanent storage infrastructure.

Data organization: data collected from sensors are archived and classified in distributed sites connected with an open-source software middleware, GSN. Publicly available data are accessible through common web services and via a cloud storage server (based on Swift). Collocation of the data and processing in the cloud would eventually eliminate data transfer requirements.

Execution control logic: execution of the data analysis pipelines (for both the R-based analysis and the Alpine3D simulations) has been implemented using the GC3Pie framework developed by UZH (https://code.google.com/p/gc3pie/). This allows large-scale, fault-tolerant execution of the pipelines to be described in terms of software appliances. GC3Pie also allows supervision of the execution of large campaigns of appliances as a single simulation.

This poster will present the fundamental architectural components of the data analysis pipelines together with initial experimental results.
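Since the execution control logic is built on GC3Pie, a rough sketch of how one simulation step might be described as a GC3Pie application follows. This is a hedged simplification under the assumption of the documented gc3libs.Application interface; the Alpine3D invocation, file names and core counts are placeholders.

# Hedged sketch of a GC3Pie application wrapping one Alpine3D-style run.
# Arguments, file names and core counts are hypothetical placeholders.
from gc3libs import Application

class Alpine3DRun(Application):
    """One grid-point simulation task, describable to any GC3Pie backend."""
    def __init__(self, station_data, config_file, **extra):
        super().__init__(
            arguments=["alpine3d", "--config", config_file],
            inputs=[station_data, config_file],   # staged to the VM/node
            outputs=["results/"],                 # fetched back afterwards
            output_dir="alpine3d-output",
            requested_cores=4,
            **extra)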
DOE Office of Scientific and Technical Information (OSTI.GOV)
Box, D.; Boyd, J.; Di Benedetto, V.
2016-01-01
The FabrIc for Frontier Experiments (FIFE) project is an initiative within the Fermilab Scientific Computing Division designed to steer the computing model for non-LHC Fermilab experiments across multiple physics areas. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying size, needs, and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of solutions for high throughput computing, data management, database access and collaboration management within an experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid compute sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including a common job submission service, software and reference data distribution through CVMFS repositories, flexible and robust data transfer clients, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken the leading role in defining the computing model for Fermilab experiments, aided in the design of experiments beyond those hosted at Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.
2013-01-01
Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden, serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts, as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome, de novo and exome sequencing, targeted resequencing, SNP discovery, RNA-Seq, and methylation analysis. There are over 300 projects that utilize UPPNEX, including large undertakings such as the sequencing of the flycatcher and the Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020
Elastic extension of a local analysis facility on external clouds for the LHC experiments
NASA Astrophysics Data System (ADS)
Ciaschini, V.; Codispoti, G.; Rinaldi, L.; Aiftimiei, D. C.; Bonacorsi, D.; Calligola, P.; Dal Pra, S.; De Girolamo, D.; Di Maria, R.; Grandi, C.; Michelotto, D.; Panella, M.; Taneja, S.; Semeria, F.
2017-10-01
The computing infrastructures serving the LHC experiments have been designed to cope, at most, with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however generate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, the LHC experiments are exploring the opportunity to access Cloud resources provided by external partners or commercial providers. In this work we present the proof of concept of the elastic extension of a local analysis facility, specifically the Bologna Tier-3 Grid site, onto an external OpenStack infrastructure, for the LHC experiments hosted at the site. We focus on the Cloud Bursting of the Grid site using DynFarm, a newly designed tool that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on an OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time they serve as an extension of the farm for local usage.
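For illustration, the cloud-bursting step (boot a worker VM on OpenStack, which then registers itself with the batch system at first boot) could look like the hedged sketch below using the openstacksdk client. The cloud, image, flavour and network names are placeholders, and the registration script stands in for the site-specific contextualisation that a tool like DynFarm coordinates.

# Sketch: bursting a Grid site onto OpenStack with openstacksdk.
# Cloud/image/flavour/network names are hypothetical placeholders; the
# user-data script that registers the node with LSF is site-specific.
import openstack

conn = openstack.connect(cloud="tier3-burst")   # credentials from clouds.yaml

USER_DATA = """#!/bin/sh
# contextualisation hook: configure and start the batch client here
/opt/site/register-worker.sh
"""

server = conn.create_server(
    name="wn-dynamic-001",
    image="wn-image",          # placeholder image name
    flavor="m1.large",         # placeholder flavour
    network="wn-net",          # placeholder network
    userdata=USER_DATA,        # runs at first boot via cloud-init
    wait=True)                 # block until the instance is ACTIVE
print("worker node up:", server.id)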
The International Symposium on Grids and Clouds
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the tenth anniversary of the ISGC, which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia-Pacific region into a coherent community. With the continuous support and dedication of the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that have produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.
Research infrastructure support to address ecosystem dynamics
NASA Astrophysics Data System (ADS)
Los, Wouter
2014-05-01
Predicting the evolution of ecosystems under climate change or human pressures is a challenge. Even understanding past or current processes is complicated, as a result of the many interactions and feedbacks that occur within and between components of the system. This talk will present an example of current research on changes in landscape evolution, hydrology, soil biogeochemical processes, zoological food webs, and plant community succession, and how these affect feedbacks to components of the systems, including the climate system. Multiple observations, experiments, and simulations provide a wealth of data, but not necessarily understanding. Model development for the coupled processes on different spatial and temporal scales is sensitive to variations in the data and to parameter changes. Fast high-performance computing may help to visualize the effect of these changes and the potential stability (and reliability) of the models. This may then allow for iteration between data production and models, towards stable models that reduce uncertainty and improve the prediction of change. The role of research infrastructures becomes crucial in overcoming barriers to such research. Environmental infrastructures cover physical site facilities, dedicated instrumentation and e-infrastructure. The LifeWatch infrastructure for biodiversity and ecosystem research will provide services for data integration, analysis and modeling, but it has to cooperate intensively with the other kinds of infrastructures in order to support the iteration between data production and model computation. The cooperation in the ENVRI project (Common Operations of Environmental Research Infrastructures) is one of the initiatives to foster such multidisciplinary research.
Advances in Grid Computing for the FabrIc for Frontier Experiments Project at Fermilab
DOE Office of Scientific and Technical Information (OSTI.GOV)
Herner, K.; Alba Hernandex, A. F.; Bhat, S.
The FabrIc for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the project's inception the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back-end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana, to help experiments manage their large-scale production workflows. The production operations group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and to support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general-purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding the FIFE experiments' presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.
Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab
NASA Astrophysics Data System (ADS)
Herner, K.; Alba Hernandez, A. F.; Bhat, S.; Box, D.; Boyd, J.; Di Benedetto, V.; Ding, P.; Dykstra, D.; Fattoruso, M.; Garzoglio, G.; Kirby, M.; Kreymer, A.; Levshina, T.; Mazzacane, A.; Mengel, M.; Mhashilkar, P.; Podstavkov, V.; Retzke, K.; Sharma, N.; Teheran, J.
2017-10-01
The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the project's inception the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back-end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana, to help experiments manage their large-scale production workflows. The production operations group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and to support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general-purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding the FIFE experiments' presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.
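As a flavour of the monitoring stack mentioned above, the sketch below queries a hypothetical Elasticsearch index of job records for failure counts per experiment over the last day, using the standard elasticsearch-py client. The host, index and field names are invented placeholders.

# Hedged sketch: counting failed jobs per experiment in Elasticsearch.
# Host, index and field names are hypothetical placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://monitoring.example.org:9200")

resp = es.search(
    index="fife-jobs",
    size=0,
    query={"bool": {"filter": [
        {"term": {"status": "failed"}},
        {"range": {"timestamp": {"gte": "now-24h"}}},
    ]}},
    aggs={"per_exp": {"terms": {"field": "experiment"}}},
)

for bucket in resp["aggregations"]["per_exp"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])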
Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-01-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.
Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James
2012-06-01
Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
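As a minimal, hedged illustration of the provisioning step that CloudMan automates, the following boto3 snippet launches a single EC2 instance from a machine image and passes it user data consumed at first boot. The AMI ID, key pair and instance type are placeholders, not the real CloudBioLinux image.

# Sketch: programmatic EC2 provisioning of a (placeholder) analysis image.
# AMI id, key name and instance type are hypothetical stand-ins.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="m5.xlarge",
    KeyName="my-keypair",
    MinCount=1, MaxCount=1,
    UserData="cluster_name: demo\n",   # consumed at first boot
)
instance = instances[0]
instance.wait_until_running()
instance.reload()                      # refresh attributes such as DNS name
print("instance up at", instance.public_dns_name)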
Mock Data Challenge for the MPD/NICA Experiment on the HybriLIT Cluster
NASA Astrophysics Data System (ADS)
Gertsenberger, Konstantin; Rogachevsky, Oleg
2018-02-01
Simulation of data processing before receiving the first experimental data is an important issue in high-energy physics experiments. This article presents the current Event Data Model and the Mock Data Challenge for the MPD experiment at the NICA accelerator complex, which uses ongoing simulation studies to stress-test the distributed computing infrastructure and the experiment software in the full production environment, from simulated data through to physics analysis.
Current Grid operation and future role of the Grid
NASA Astrophysics Data System (ADS)
Smirnova, O.
2012-12-01
Grid-like technologies and approaches became an integral part of HEP experiments. Some other scientific communities also use similar technologies for data-intensive computations. The distinct feature of Grid computing is the ability to federate heterogeneous resources of different ownership into a seamless infrastructure, accessible via a single log-on. Like other infrastructures of similar nature, Grid functioning requires not only a technologically sound basis, but also reliable operation procedures, monitoring and accounting. The two aspects, technological and operational, are closely related: the weaker the technology, the more burden is on operations, and the other way around. As of today, Grid technologies are still evolving: at CERN alone, every LHC experiment uses its own Grid-like system. This inevitably creates a heavy load on operations. Infrastructure maintenance, monitoring and incident response are done on several levels, from local system administrators to large international organisations, involving massive human effort worldwide. The necessity to commit substantial resources is one of the obstacles faced by smaller research communities when moving computing to the Grid. Moreover, most current Grid solutions were developed under significant influence of HEP use cases, and thus need additional effort to adapt them to other applications. The reluctance of many non-HEP researchers to use the Grid negatively affects the outlook for national Grid organisations, which strive to provide multi-science services. We started from a situation where Grid organisations were fused with HEP laboratories and national HEP research programmes; we hope to move towards a world where the Grid will ultimately reach the status of a generic public computing and storage service provider, and permanent national and international Grid infrastructures will be established. How far we will be able to advance along this path depends on us. If no standardisation and convergence efforts take place, the Grid will remain limited to HEP; if, however, the current multitude of Grid-like systems converges to a generic, modular and extensible solution, the Grid will become true to its name.
Controlling Infrastructure Costs: Right-Sizing the Mission Control Facility
NASA Technical Reports Server (NTRS)
Martin, Keith; Sen-Roy, Michael; Heiman, Jennifer
2009-01-01
Johnson Space Center's Mission Control Center is a space-vehicle- and space-program-agnostic facility. The current operational design is essentially identical to the original facility architecture that was developed and deployed in the mid-1990s. In an effort to streamline the support costs of the mission critical facility, the Mission Operations Division (MOD) of Johnson Space Center (JSC) has sponsored an exploratory project to evaluate and inject current state-of-the-practice Information Technology (IT) tools, processes and technology into legacy operations. For the past several years, the general push in the IT industry has been towards a data-centric computer infrastructure. Organizations facing challenges with facility operations costs are turning to creative solutions combining hardware consolidation, virtualization and remote access to meet and exceed performance, security, and availability requirements. The Operations Technology Facility (OTF) organization at the Johnson Space Center has been chartered to build and evaluate a parallel Mission Control infrastructure, replacing the existing thick-client distributed computing model and network architecture with a data center model utilizing virtualization to provide the MCC Infrastructure as a Service. The OTF will design a replacement architecture for the Mission Control Facility, leveraging hardware consolidation through the use of blade servers, increasing utilization rates for compute platforms through virtualization, and expanding connectivity options through the deployment of secure remote access. The architecture demonstrates the maturity of the technologies generally available in industry today and the ability to successfully abstract the tightly coupled relationship between thick-client software and legacy hardware into a hardware-agnostic "Infrastructure as a Service" capability that can scale to meet future requirements of new space programs and spacecraft. This paper discusses the benefits and difficulties that a migration to cloud-based computing philosophies has uncovered when compared to the legacy Mission Control Center architecture. The team consists of system and software engineers with extensive experience with the MCC infrastructure and software currently used to support the International Space Station (ISS) and Space Shuttle program (SSP).
Federated data storage and management infrastructure
NASA Astrophysics Data System (ADS)
Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.
2016-10-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate storage needs to grow by orders of magnitude; this will require new approaches in data storage organization and data handling. In our project we address the fundamental problem of designing an architecture to integrate distributed heterogeneous disk resources for LHC experiments and other data-intensive science applications, and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for the Russian T1 and T2 centers located in Moscow, St. Petersburg and Gatchina, as well as a Russian/CERN federation. We have conducted extensive tests of the underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including ones running on a supercomputing platform, cloud computing and the Grid for the ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally, demonstrating our ability to efficiently use federated data storage experiment-wide within national academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bioinformatics.
Processing of the WLCG monitoring data using NoSQL
NASA Astrophysics Data System (ADS)
Andreeva, J.; Beche, A.; Belov, S.; Dzhunov, I.; Kadochnikov, I.; Karavakis, E.; Saiz, P.; Schovancova, J.; Tuckett, D.
2014-06-01
The Worldwide LHC Computing Grid (WLCG) today includes more than 150 computing centres where more than 2 million jobs are being executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments, over such a huge heterogeneous infrastructure, is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems. While existing solutions are traditionally based on Oracle for data storage and processing, recent developments evaluate NoSQL for processing large-scale monitoring datasets. NoSQL databases are getting increasingly popular for processing datasets at the terabyte and petabyte scale using commodity hardware. In this contribution, the integration of NoSQL data processing in the Experiment Dashboard framework is described along with first experiences of using this technology for monitoring the LHC computing activities.
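To make the style of processing concrete, here is a toy, plain-Python analogue of the map-reduce aggregation such a system performs over job-monitoring records; the field names are invented for the example, and the production pipeline runs over a NoSQL store rather than an in-memory list.

    from collections import defaultdict

    # Toy monitoring records; real WLCG records carry many more fields.
    records = [
        {"site": "CERN-PROD", "status": "finished", "cpu_hours": 3.2},
        {"site": "FZK-LCG2",  "status": "failed",   "cpu_hours": 0.4},
        {"site": "CERN-PROD", "status": "finished", "cpu_hours": 1.1},
    ]

    # Reduce step: per-site job counts, failures and CPU totals.
    summary = defaultdict(lambda: {"jobs": 0, "failed": 0, "cpu_hours": 0.0})
    for rec in records:
        s = summary[rec["site"]]
        s["jobs"] += 1
        s["failed"] += rec["status"] == "failed"
        s["cpu_hours"] += rec["cpu_hours"]

    for site, s in sorted(summary.items()):
        print(site, s)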
The ATLAS Simulation Infrastructure
Aad, G.; Abbott, B.; Abdallah, J.; ...
2010-09-25
The simulation software for the ATLAS Experiment at the Large Hadron Collider is being used for large-scale production of events on the LHC Computing Grid. This simulation requires many components, from the generators that simulate particle collisions, through packages simulating the response of the various detectors and triggers. All of these components come together under the ATLAS simulation infrastructure. In this paper, that infrastructure is discussed, including the components supporting the detector description, interfacing the event generation, and combining the GEANT4 simulation of the response of the individual detectors. Also described are the tools allowing the software validation, performance testing, and the validation of the simulated output against known physics processes.
AGIS: Integration of new technologies used in ATLAS Distributed Computing
NASA Astrophysics Data System (ADS)
Anisenkov, Alexey; Di Girolamo, Alessandro; Alandes Pradillo, Maria
2017-10-01
The variety of the ATLAS Distributed Computing infrastructure requires a central information system to define the topology of computing resources and to store the different parameters and configuration data needed by various ATLAS software components. The ATLAS Grid Information System (AGIS) is the system designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by ATLAS Distributed Computing applications and services. Being an intermediate middleware system between clients and external information sources (such as the central BDII, GOCDB, MyOSG), AGIS defines the relations between experiment-specific resources and physical distributed computing capabilities. Having been in production throughout LHC Run-1, AGIS became the central information system for Distributed Computing in ATLAS and is continuously evolving to fulfil new user requests, enable enhanced operations and follow the extension of the ATLAS Computing model. The ATLAS Computing model and the data structures used by Distributed Computing applications and services are continuously evolving and tend to follow newer requirements from the ADC community. In this note, we describe the evolution and the recent developments of AGIS functionalities, related to the integration of new technologies that have recently become widely used in ATLAS Computing, such as the flexible utilization of opportunistic Cloud and HPC resources, the integration of ObjectStore services for the Distributed Data Management (Rucio) and ATLAS workload management (PanDA) systems, and the unified storage protocol declaration required for PanDA Pilot site movers. The improvements of the information model and general updates are also shown; in particular we explain how other collaborations outside ATLAS could benefit from the system as a computing resources information catalogue. AGIS is evolving towards a common information system, not coupled to a specific experiment.
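Since AGIS serves its content to clients over HTTP, a client-side query might look like the following hedged sketch; the URL, query path and field names are placeholders for illustration, not the documented AGIS API.

    import requests

    # Placeholder endpoint; a real deployment exposes its own query paths.
    AGIS_URL = "https://agis.example.cern.ch/request/site/query/list/?json"

    sites = requests.get(AGIS_URL, timeout=30).json()
    for site in sites:
        # Field names are illustrative only.
        print(site.get("name"), site.get("cloud"), site.get("state"))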
Elastic Extension of a CMS Computing Centre Resources on External Clouds
NASA Astrophysics Data System (ADS)
Codispoti, G.; Di Maria, R.; Aiftimiei, C.; Bonacorsi, D.; Calligola, P.; Ciaschini, V.; Costantini, A.; Dal Pra, S.; DeGirolamo, D.; Grandi, C.; Michelotto, D.; Panella, M.; Peco, G.; Sapunenko, V.; Sgaravatto, M.; Taneja, S.; Zizzi, G.
2016-10-01
After the successful LHC data taking in Run-I and in view of the future runs, the LHC experiments are facing new challenges in the design and operation of the computing facilities. The computing infrastructure for Run-II is dimensioned to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however produce large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, CMS - along the lines followed by other LHC experiments - is exploring the opportunity to access Cloud resources provided by external partners or commercial providers. Specific use cases have already been explored and successfully exploited during Long Shutdown 1 (LS1) and the first part of Run 2. In this work we present the proof of concept of the elastic extension of a CMS site, specifically the Bologna Tier-3, on an external OpenStack infrastructure. We focus on the "Cloud Bursting" of a CMS Grid site using a newly designed LSF configuration that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on the OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time serve as an extension of the farm for local usage. The amount of allocated resources can thus be adjusted elastically to the needs of the CMS experiment and local users. Moreover, a direct access/integration of OpenStack resources into the CMS workload management system is explored. In this paper we present this approach, we report on the performance of the resources allocated on demand, and we discuss the lessons learned and the next steps.
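A minimal sketch of the cloud-bursting step, assuming the openstacksdk Python client and a pre-built worker-node image; the image, flavor and network names are placeholders, and the registration of the new node with LSF would happen in a boot script on the instance rather than here.

    import openstack

    # Credentials come from the standard OS_* environment variables.
    conn = openstack.connect()

    # Boot a worker node from a pre-configured image; on first boot the
    # node registers itself with the LSF master (not shown).
    server = conn.compute.create_server(
        name="cms-wn-001",
        image_id=conn.compute.find_image("cms-worker-image").id,
        flavor_id=conn.compute.find_flavor("m1.large").id,
        networks=[{"uuid": conn.network.find_network("cms-net").id}],
    )
    server = conn.compute.wait_for_server(server)
    print("worker ready:", server.name, server.status)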
AGIS: The ATLAS Grid Information System
NASA Astrophysics Data System (ADS)
Anisenkov, A.; Di Girolamo, A.; Klimentov, A.; Oleynik, D.; Petrosyan, A.; Atlas Collaboration
2014-06-01
ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization of computing resources in order to meet the ATLAS requirement of petabyte-scale data operations. In this paper we describe the ATLAS Grid Information System (AGIS), designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.
Using high-performance networks to enable computational aerosciences applications
NASA Technical Reports Server (NTRS)
Johnson, Marjory J.
1992-01-01
One component of the U.S. Federal High Performance Computing and Communications Program (HPCCP) is the establishment of a gigabit network to provide a communications infrastructure for researchers across the nation. This gigabit network will provide new services and capabilities, in addition to increased bandwidth, to enable future applications. An understanding of these applications is necessary to guide the development of the gigabit network and other high-performance networks of the future. In this paper we focus on computational aerosciences applications run remotely using the Numerical Aerodynamic Simulation (NAS) facility located at NASA Ames Research Center. We characterize these applications in terms of network-related parameters and relate user experiences that reveal limitations imposed by the current wide-area networking infrastructure. Then we investigate how the development of a nationwide gigabit network would enable users of the NAS facility to work in new, more productive ways.
Mitigating Communication Delays in Remotely Connected Hardware-in-the-loop Experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cale, James; Johnson, Brian; Dall'Anese, Emiliano
Here, this paper introduces a potential approach for mitigating the effects of communication delays between multiple closed-loop hardware-in-the-loop experiments which are virtually connected, yet physically separated. The approach consists of an analytical method for the compensation of communication delays, along with the supporting computational and communication infrastructure. The control design leverages tools for the design of observers for the compensation of measurement errors in systems with time-varying delays. The proposed methodology is validated through computer simulation and hardware experimentation connecting hardware-in-the-loop experiments conducted between laboratories separated by a distance of over 100 km.
Mitigating Communication Delays in Remotely Connected Hardware-in-the-loop Experiments
Cale, James; Johnson, Brian; Dall'Anese, Emiliano; ...
2018-03-30
Here, this paper introduces a potential approach for mitigating the effects of communication delays between multiple closed-loop hardware-in-the-loop experiments which are virtually connected, yet physically separated. The approach consists of an analytical method for the compensation of communication delays, along with the supporting computational and communication infrastructure. The control design leverages tools for the design of observers for the compensation of measurement errors in systems with time-varying delays. The proposed methodology is validated through computer simulation and hardware experimentation connecting hardware-in-the-loop experiments conducted between laboratories separated by a distance of over 100 km.
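The underlying idea can be illustrated with a toy, Smith-predictor-style sketch: given a measurement that is d samples old, replay the buffered control inputs through a plant model to predict the present state. The first-order plant, gain values and delay below are invented for the illustration and are not the paper's observer design for time-varying delays.

    # Toy first-order plant x[k+1] = a*x[k] + b*u[k]; values are illustrative.
    a, b = 0.95, 0.05
    d = 5                    # communication delay, in samples
    u_buffer = [0.0] * d     # inputs sent but not yet visible in measurements

    def compensate(y_delayed, u_new):
        """Predict the current state from a d-sample-old measurement by
        replaying the buffered inputs through the plant model."""
        x = y_delayed
        for u in u_buffer:   # forward-simulate across the delay window
            x = a * x + b * u
        u_buffer.pop(0)
        u_buffer.append(u_new)
        return x             # delay-compensated state estimate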
Dynamic Collaboration Infrastructure for Hydrologic Science
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.
2016-12-01
Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data-intensive modeling and analysis. It supports the sharing of and collaboration around "resources", which are social objects defined to include both data and models in a structured, standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud-based computation for the execution of hydrologic models and the analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure that can be accessed from environments like HydroShare are increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast array of data and computing infrastructure without having to learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure enabling the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure." In this presentation we discuss the results of this proof-of-concept prototype, which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.
Recent evolution of the offline computing model of the NOvA experiment
Habig, Alec; Norman, A.; Group, Craig
2015-12-23
The NOvA experiment at Fermilab is a long-baseline neutrino experiment designed to study ν e appearance in a ν μ beam. Over the last few years there has been intense work to streamline the computing infrastructure in preparation for data, which started to flow in from the far detector in Fall 2013. Major accomplishments for this effort include migration to the use of off-site resources through the use of the Open Science Grid and upgrading the file-handling framework from simple disk storage to a tiered system using a comprehensive data management and delivery system to find and access files on either disk or tape storage. NOvA has already produced more than 6.5 million files and more than 1 PB of raw data and Monte Carlo simulation files which are managed under this model. In addition, the current system has demonstrated sustained rates of up to 1 TB/hour of file transfer by the data handling system. NOvA pioneered the use of new tools and this paved the way for their use by other Intensity Frontier experiments at Fermilab. Most importantly, the new framework places the experiment's infrastructure on a firm foundation, and is ready to produce the files needed for first physics.
Recent Evolution of the Offline Computing Model of the NOvA Experiment
NASA Astrophysics Data System (ADS)
Habig, Alec; Norman, A.
2015-12-01
The NOvA experiment at Fermilab is a long-baseline neutrino experiment designed to study νe appearance in a νμ beam. Over the last few years there has been intense work to streamline the computing infrastructure in preparation for data, which started to flow in from the far detector in Fall 2013. Major accomplishments for this effort include migration to the use of off-site resources through the use of the Open Science Grid and upgrading the file-handling framework from simple disk storage to a tiered system using a comprehensive data management and delivery system to find and access files on either disk or tape storage. NOvA has already produced more than 6.5 million files and more than 1 PB of raw data and Monte Carlo simulation files which are managed under this model. The current system has demonstrated sustained rates of up to 1 TB/hour of file transfer by the data handling system. NOvA pioneered the use of new tools and this paved the way for their use by other Intensity Frontier experiments at Fermilab. Most importantly, the new framework places the experiment's infrastructure on a firm foundation, and is ready to produce the files needed for first physics.
Characterizing Crowd Participation and Productivity of Foldit Through Web Scraping
2016-03-01
Front-matter and snippet fragments from the report: acronym definitions (BOINC: Berkeley Open Infrastructure for Network Computing; CDF: Cumulative Distribution Function; CPU: Central Processing Unit; CSSG: Crowdsourced Serious Game) and text noting that, according to Anderson [6], principal investigator for BOINC, many computers working at once can create a similar capacity, and that the software-based distributed computing platform BOINC grew out of a project searching for extraterrestrial life.
Pinthong, Watthanai; Muangruen, Panya
2016-01-01
Development of high-throughput technologies, such as next-generation sequencing, allows thousands of experiments to be performed simultaneously while reducing resource requirements. Consequently, a massive amount of experimental data is now rapidly generated. Nevertheless, the data are not readily usable or meaningful until they are further analysed and interpreted. Due to the size of the data, a high-performance computer (HPC) is required for the analysis and interpretation. However, an HPC is expensive and difficult to access. Other means have been developed to give researchers the power of an HPC without the need to purchase and maintain one, such as cloud computing services and grid computing systems. In this study, we implemented grid computing in a computer training center environment using the Berkeley Open Infrastructure for Network Computing (BOINC) as a job distributor and data manager, combining all desktop computers into a virtualized HPC. Fifty desktop computers were used to set up a grid system during the off-hours. In order to test the performance of the grid system, we adapted the Basic Local Alignment Search Tool (BLAST) to the BOINC system. Sequencing results from the Illumina platform were aligned to the human genome database by BLAST on the grid system. The result and processing time were compared to those from a single desktop computer and from an HPC. The estimated durations of a BLAST analysis of 4 million sequence reads on a desktop PC, the HPC and the grid system were 568, 24 and 5 days, respectively. Thus, the grid implementation of BLAST by BOINC is an efficient alternative to the HPC for sequence alignment. The grid implementation by BOINC also helped tap unused computing resources during the off-hours and could be easily modified for other available bioinformatics software. PMID:27547555
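A hedged sketch of the work-unit preparation step: splitting a FASTA file of reads into fixed-size chunks that can be distributed as independent BLAST jobs. File names and the chunk size are arbitrary, and the BOINC server-side packaging is not shown.

    def split_fasta(path, reads_per_chunk=10000):
        """Yield lists of FASTA records, each list one grid work unit."""
        chunk, record = [], []
        with open(path) as fh:
            for line in fh:
                if line.startswith(">") and record:
                    chunk.append("".join(record))
                    record = []
                    if len(chunk) == reads_per_chunk:
                        yield chunk
                        chunk = []
                record.append(line)
        if record:
            chunk.append("".join(record))
        if chunk:
            yield chunk

    for i, unit in enumerate(split_fasta("reads.fasta")):
        with open(f"workunit_{i:04d}.fasta", "w") as out:
            out.writelines(unit)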
The Next Generation of Lab and Classroom Computing - The Silver Lining
2016-12-01
A virtual desktop infrastructure (VDI) solution, as well as the computing solutions at three universities, was selected as the basis for comparison. Keywords from the report: infrastructure, VDI, hardware cost, software cost, manpower, availability, cloud computing, private cloud, bring your own device (BYOD), thin client.
ERIC Educational Resources Information Center
Neal, James G.
1999-01-01
Examines the changes that are affecting academic library collection development. Highlights include computer technology; digital information; networking; virtual reality; hypertext; fair use and copyrights; technological infrastructure; digital libraries; information policy; academic and scholarly publishing; and experiences at the Johns Hopkins…
den Besten, Matthijs; Thomas, Arthur J; Schroeder, Ralph
2009-04-22
It is often said that the life sciences are transforming into an information science. As laboratory experiments are starting to yield ever increasing amounts of data and the capacity to deal with those data is catching up, an increasing share of scientific activity is seen to be taking place outside the laboratories, sifting through the data and modelling "in silico" the processes observed "in vitro." The transformation of the life sciences and similar developments in other disciplines have inspired a variety of initiatives around the world to create technical infrastructure to support the new scientific practices that are emerging. The e-Science programme in the United Kingdom and the NSF Office for Cyberinfrastructure are examples of these. In Switzerland there have been no such national initiatives. Yet, this has not prevented scientists from exploring the development of similar types of computing infrastructures. In 2004, a group of researchers in Switzerland established a project, SwissBioGrid, to explore whether Grid computing technologies could be successfully deployed within the life sciences. This paper presents their experiences as a case study of how the life sciences are currently operating as an information science and presents the lessons learned about how existing institutional and technical arrangements facilitate or impede this operation. SwissBioGrid gave rise to two pilot projects: one for proteomics data analysis and the other for high-throughput molecular docking ("virtual screening") to find new drugs for neglected diseases (specifically, for dengue fever). The proteomics project was an example of a data management problem, applying many different analysis algorithms to Terabyte-sized datasets from mass spectrometry, involving comparisons with many different reference databases; the virtual screening project was more a purely computational problem, modelling the interactions of millions of small molecules with a limited number of protein targets on the coat of the dengue virus. Both present interesting lessons about how scientific practices are changing when they tackle the problems of large-scale data analysis and data management by means of creating a novel technical infrastructure. In the experience of SwissBioGrid, data intensive discovery has a lot to gain from close collaboration with industry and harnessing distributed computing power. Yet the diversity in life science research implies only a limited role for generic infrastructure; and the transience of support means that researchers need to integrate their efforts with others if they want to sustain the benefits of their success, which are otherwise lost.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khan, Shahryar Muhammad; /SLAC /NUST, Rawalpindi; Cottrell, R.Les
2007-10-30
The future of Computing in High Energy Physics (HEP) applications depends on both the Network and Grid infrastructure. South Asian countries such as India and Pakistan are making significant progress by building clusters as well as improving their network infrastructure. However, to facilitate the use of these resources, they need to manage the issues of network connectivity to be among the leading participants in Computing for HEP experiments. In this paper we classify the connectivity for academic and research institutions of South Asia. The quantitative measurements are carried out using the PingER methodology, an approach that induces minimal ICMP traffic to gather active end-to-end network statistics. The PingER project has been measuring Internet performance for the last decade. Currently the measurement infrastructure comprises over 700 hosts in more than 130 countries which collectively represent approximately 99% of the world's Internet-connected population. Thus, we are well positioned to characterize the world's connectivity. Here we present the current state of the National Research and Educational Networks (NRENs) and Grid Infrastructure in the South Asian countries and identify the areas of concern. We also present comparisons between South Asia and other developing as well as developed regions. We show that there is a strong correlation between network performance and several Human Development indices.
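The measurement principle is simple enough to sketch: send a handful of ICMP echo requests and parse loss and round-trip time from the output. This assumes a Unix-style ping; PingER's real probes, schedules and derived metrics are considerably more elaborate.

    import re, subprocess

    def probe(host, count=10):
        """Return (loss_percent, avg_rtt_ms) from a small ICMP probe."""
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True).stdout
        loss = float(re.search(r"([\d.]+)% packet loss", out).group(1))
        m = re.search(r"= [\d.]+/([\d.]+)/", out)  # min/avg/max summary line
        return loss, float(m.group(1)) if m else None

    print(probe("www.example.edu"))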
Analysis of CERN computing infrastructure and monitoring data
NASA Astrophysics Data System (ADS)
Nieke, C.; Lassnig, M.; Menichetti, L.; Motesnitsalis, E.; Duellmann, D.
2015-12-01
Optimizing a computing infrastructure on the scale of the LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments are collecting a multitude of logs and performance probes, which are already successfully used for short-term analysis (e.g. operational dashboards) within each group. The IT analytics working group has been created with the goal of bringing data sources from different services and different abstraction levels together and implementing a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single-service boundaries and for the exchange of analysis methods and tools. To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats and selecting an efficient storage format for MapReduce and external access, and describes the repository user interface. Using this infrastructure we were able to quantitatively analyze the relationship between CPU/wall fraction, latency/throughput constraints of network and disk, and the effective job throughput. In this contribution we first describe the design of the shared analysis infrastructure and then present a summary of first analysis results from the combined data sources.
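As a simplified illustration of one such analysis, the snippet below computes the CPU-time to wall-time fraction per queue from cleaned job records; the field names are invented, and the production analysis runs as MapReduce jobs over the Hadoop repository rather than over an in-memory list.

    from collections import defaultdict

    # Toy cleaned records as they might come out of the repository.
    jobs = [
        {"queue": "batch",    "cpu_s": 3400.0, "wall_s": 3600.0},
        {"queue": "batch",    "cpu_s": 1200.0, "wall_s": 3600.0},
        {"queue": "analysis", "cpu_s":  800.0, "wall_s": 1000.0},
    ]

    cpu, wall = defaultdict(float), defaultdict(float)
    for j in jobs:
        cpu[j["queue"]] += j["cpu_s"]
        wall[j["queue"]] += j["wall_s"]

    for q in cpu:
        print(f"{q}: CPU/wall = {cpu[q] / wall[q]:.2f}")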
Supporting research sites in resource-limited settings: Challenges in implementing IT infrastructure
Whalen, Christopher; Donnell, Deborah; Tartakovsky, Michael
2014-01-01
As Information and Communication Technology infrastructure becomes more reliable, new methods of Electronic Data Capture (EDC), data marts/data warehouses, and mobile computing provide platforms for rapid coordination of international research projects and multisite studies. However, despite the increasing availability of internet connectivity and communication systems in remote regions of the world, there are still significant obstacles. Sites with poor infrastructure face serious challenges participating in modern clinical and basic research, particularly research relying on EDC and internet communication technologies. This report discusses our experiences in supporting research in resource-limited settings (RLS). We describe examples of the practical and ethical/regulatory challenges raised by use of these newer technologies for data collection in multisite clinical studies. PMID:24321986
Whalen, Christopher J; Donnell, Deborah; Tartakovsky, Michael
2014-01-01
As information and communication technology infrastructure becomes more reliable, new methods of electronic data capture, data marts/data warehouses, and mobile computing provide platforms for rapid coordination of international research projects and multisite studies. However, despite the increasing availability of Internet connectivity and communication systems in remote regions of the world, there are still significant obstacles. Sites with poor infrastructure face serious challenges participating in modern clinical and basic research, particularly that relying on electronic data capture and Internet communication technologies. This report discusses our experiences in supporting research in resource-limited settings. We describe examples of the practical and ethical/regulatory challenges raised by the use of these newer technologies for data collection in multisite clinical studies.
The dependence of educational infrastructure on clinical infrastructure.
Cimino, C.
1998-01-01
The Albert Einstein College of Medicine needed to assess the growth of its infrastructure for educational computing as a first step to determining if student needs were being met. Included in computing infrastructure are space, equipment, software, and computing services. The infrastructure was assessed by reviewing purchasing and support logs for the six-year period from 1992 to 1998. This included equipment, software, and e-mail accounts provided to students and to faculty for educational purposes. Student space has grown at a constant rate (averaging a 14% increase each year). Student equipment on campus has grown by a constant amount each year (an average of 8.3 computers each year). Student infrastructure off campus and educational support of faculty have not kept pace; they have either declined or remained level over the six-year period. The availability of electronic mail clearly demonstrates this, with accounts being used by 99% of students, 78% of Basic Science Course Leaders, 38% of Clerkship Directors, 18% of Clerkship Site Directors, and 8% of Clinical Elective Directors. The collection of the initial descriptive infrastructure data has revealed problems that may generalize to other medical schools. The discrepancy between infrastructure available to students and faculty on campus and students and faculty off campus creates a setting where students perceive a paradoxical declining support for computer use as they progress through medical school. While clinical infrastructure may be growing, it is at the expense of educational infrastructure at affiliate hospitals. PMID:9929262
2017-01-05
AFRL-AFOSR-JP-TR-2017-0002: Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure (Manabu…; grant FA2386…; distribution unlimited, public release). Abstract: This report for the project titled 'Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure'…
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
NASA Astrophysics Data System (ADS)
Klimentov, A.; Buncic, P.; De, K.; Jha, S.; Maeno, T.; Mount, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Porter, R. J.; Read, K. F.; Vaniachine, A.; Wells, J. C.; Wenaus, T.
2015-05-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled ‘Next Generation Workload Management and Analysis System for Big Data’ (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
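The light-weight MPI wrapper idea can be sketched with mpi4py: a single MPI job fills a batch allocation, and each rank launches one single-threaded payload, so many independent jobs run in parallel. The payload command and input naming below are illustrative, not the PanDA pilot's actual wrapper.

    from mpi4py import MPI
    import subprocess

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank runs one single-threaded payload on its own input; the
    # batch system sees a single MPI job occupying whole nodes.
    payload = ["./run_simulation.sh", f"input_{rank:05d}.dat"]
    rc = subprocess.run(payload).returncode

    # Gather exit codes on rank 0 for bookkeeping.
    codes = comm.gather(rc, root=0)
    if rank == 0:
        print("payload exit codes:", codes)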
Collaborative Multi-Scale 3d City and Infrastructure Modeling and Simulation
NASA Astrophysics Data System (ADS)
Breunig, M.; Borrmann, A.; Rank, E.; Hinz, S.; Kolbe, T.; Schilcher, M.; Mundani, R.-P.; Jubierre, J. R.; Flurl, M.; Thomsen, A.; Donaubauer, A.; Ji, Y.; Urban, S.; Laun, S.; Vilgertshofer, S.; Willenborg, B.; Menninghaus, M.; Steuer, H.; Wursthorn, S.; Leitloff, J.; Al-Doori, M.; Mazroobsemnani, N.
2017-09-01
Computer-aided collaborative and multi-scale 3D planning is a challenge for complex railway and subway track infrastructure projects in the built environment. Many legal, economic, environmental, and structural requirements have to be taken into account. The stringent use of 3D models in the different phases of the planning process facilitates communication and collaboration between stakeholders such as civil engineers, geological engineers, and decision makers. This paper presents concepts, developments, and experiences gained by an interdisciplinary research group from civil engineering informatics and geo-informatics, combining the skills of both the Building Information Modeling and the 3D GIS worlds. New approaches, including the development of a collaborative platform and 3D multi-scale modelling, are proposed for collaborative planning and simulation to improve the digital 3D planning of subway tracks and other infrastructures. Experiences during this research and lessons learned are presented, as well as an outlook on future research focusing on Building Information Modeling and 3D GIS applications for cities of the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Klimentov, A
2016-01-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, Europe and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the Mira supercomputer at the Argonne Leadership Computing Facility (ALCF), the supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCFs' multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments and has been in full production for the ATLAS experiment since September 2015. We will present our current accomplishments with running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
The Cloud Area Padovana: from pilot to production
NASA Astrophysics Data System (ADS)
Andreetto, P.; Costa, F.; Crescente, A.; Dorigo, A.; Fantinel, S.; Fanzago, F.; Sgaravatto, M.; Traldi, S.; Verlato, M.; Zangrando, L.
2017-10-01
The Cloud Area Padovana has been running for almost two years. This is an OpenStack-based scientific cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs. The hardware resources have been scaled horizontally and vertically, by upgrading some hypervisors and by adding new ones: currently it provides about 1100 cores. Some in-house developments were also integrated into the OpenStack dashboard, such as a tool for user and project registrations with direct support for the INFN-AAI Identity Provider as a new option for user authentication. In collaboration with the EU-funded Indigo DataCloud project, integration with Docker-based containers has been tested and will be available in production soon. This computing facility now satisfies the computational and storage demands of more than 70 users affiliated with about 20 research projects. We present here the architecture of this Cloud infrastructure and the tools and procedures used to operate it. We also focus on the lessons learnt in these two years, describing the problems that were found and the corrective actions that had to be applied. We also discuss the chosen strategy for upgrades, which combines the need to promptly integrate new OpenStack developments, the demand to reduce the downtimes of the infrastructure, and the need to limit the effort required for such updates. Finally, we discuss how this Cloud infrastructure is being used, focusing on two big physics experiments which are intensively exploiting this computing facility: CMS and SPES. CMS deployed on the cloud a complex computational infrastructure, composed of several user interfaces for job submission in the Grid environment/local batch queues or for interactive processes; this is fully integrated with the local Tier-2 facility. To avoid a static allocation of the resources, an elastic cluster, based on cernVM, has been configured: it automatically creates and deletes virtual machines according to user needs. SPES, using a client-server system called TraceWin, exploits INFN's virtual resources, performing a very large number of simulations on about a thousand elastically managed nodes.
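The elastic-cluster logic can be sketched as a scaling rule driven by batch queue depth, as below; the pending_jobs, spawn_worker and retire_worker helpers are hypothetical placeholders for the cernVM/OpenStack integration, and the packing factor and pool limits are invented.

    JOBS_PER_WORKER = 8    # illustrative packing factor
    MIN_W, MAX_W = 0, 50   # illustrative pool limits

    def autoscale(pending_jobs, current_workers, spawn_worker, retire_worker):
        """One scaling decision: grow toward demand, shrink when idle."""
        target = min(MAX_W, max(MIN_W, -(-pending_jobs // JOBS_PER_WORKER)))
        for _ in range(max(0, target - current_workers)):
            spawn_worker()    # e.g. boot a cernVM instance on OpenStack
        for _ in range(max(0, current_workers - target)):
            retire_worker()   # e.g. drain and delete an idle instance
        return target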
Shorov, Andrey; Kotenko, Igor
2014-01-01
The paper outlines a bioinspired approach named "network nervous system" and methods of simulation of infrastructure attacks and protection mechanisms based on this approach. The protection mechanisms based on this approach consist of distributed procedures of information collection and processing, which coordinate the activities of the main devices of a computer network, identify attacks, and determine necessary countermeasures. Attacks and protection mechanisms are specified as structural models using a set-theoretic approach. An environment for simulation of protection mechanisms based on the biological metaphor is considered; the experiments demonstrating the effectiveness of the protection mechanisms are described.
ERIC Educational Resources Information Center
Conn, Samuel S.; Reichgelt, Han
2013-01-01
Cloud computing represents an architecture and paradigm of computing designed to deliver infrastructure, platforms, and software as constructible computing resources on demand to networked users. As campuses are challenged to better accommodate academic needs for applications and computing environments, cloud computing can provide an accommodating…
Advanced Optical Burst Switched Network Concepts
NASA Astrophysics Data System (ADS)
Nejabati, Reza; Aracil, Javier; Castoldi, Piero; de Leenheer, Marc; Simeonidou, Dimitra; Valcarenghi, Luca; Zervas, Georgios; Wu, Jian
In recent years, as the bandwidth and the speed of networks have increased significantly, a new generation of network-based applications using the concept of distributed computing and collaborative services is emerging (e.g., Grid computing applications). The use of the available fiber and DWDM infrastructure for these applications is a logical choice, offering huge amounts of cheap bandwidth and ensuring global reach of computing resources [230]. Currently, there is a great deal of interest in deploying optical circuit (wavelength) switched network infrastructure for distributed computing applications that require long-lived wavelength paths and address the specific needs of a small number of well-known users. Typical users are particle physicists who, due to their international collaborations and experiments, generate enormous amounts of data (Petabytes per year). These users require a network infrastructure that can support processing and analysis of large datasets through globally distributed computing resources [230]. However, providing wavelength-granularity bandwidth services is not an efficient and scalable solution for applications and services that address a wider base of user communities with different traffic profiles and connectivity requirements. Examples of such applications are: scientific collaboration on a smaller scale (e.g., bioinformatics, environmental research), distributed virtual laboratories (e.g., remote instrumentation), e-health, national security and defense, personalized learning environments and digital libraries, and evolving broadband user services (i.e., high-resolution home video editing, real-time rendering, high-definition interactive TV). As a specific example, in e-health services, and in particular mammography applications, the size and quantity of images produced by remote mammography impose stringent network requirements. Initial calculations have shown that for 100 patients to be screened remotely, the network would have to securely transport 1.2 GB of data every 30 s [230]. From the above it is clear that these types of applications need a new network infrastructure and transport technology that makes large amounts of bandwidth at subwavelength granularity, storage, computation, and visualization resources potentially available to a wide user base for specified time durations. As these types of collaborative and network-based applications evolve to address a wide range and large number of users, it is infeasible to build dedicated networks for each application type or category. Consequently, there should be an adaptive network infrastructure able to support all application types, each with their own access, network, and resource usage patterns. This infrastructure should offer flexible and intelligent network elements and control mechanisms able to deploy new applications quickly and efficiently.
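The quoted mammography figure fixes the required sustained rate, which a one-line check makes explicit:

    data_bytes = 1.2e9   # 1.2 GB of images
    window_s = 30        # delivered every 30 seconds
    gbps = data_bytes * 8 / window_s / 1e9
    print(f"required sustained rate = {gbps:.2f} Gbit/s")  # 0.32 Gbit/s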
ERIC Educational Resources Information Center
Hmeljak, Dimitrij
2010-01-01
Virtual worlds provide useful platforms for social behavioral research, but impose stringent limitations on the rules of engagement, responsiveness, and data collection, along with other resource restrictions. The major challenge from a computer science standpoint in developing group behavior applications for such environments is accommodating the…
Grids and clouds in the Czech NGI
NASA Astrophysics Data System (ADS)
Kundrát, Jan; Adam, Martin; Adamová, Dagmar; Chudoba, Jiří; Kouba, Tomáš; Lokajíček, Miloš; Mikula, Alexandr; Říkal, Václav; Švec, Jan; Vohnout, Rudolf
2016-09-01
There are several infrastructure operators within the Czech Republic NGI (National Grid Initiative) which provide users with access to high-performance computing facilities over a grid and cloud interface. This article focuses on those where the primary author has personal first-hand experience. We cover some operational issues as well as the history of these facilities.
NASA Astrophysics Data System (ADS)
Dawes, N.; Salehi, A.; Clifton, A.; Bavay, M.; Aberer, K.; Parlange, M. B.; Lehning, M.
2010-12-01
It has long been known that environmental processes are cross-disciplinary, but data has continued to be acquired and held for a single purpose. Swiss Experiment is a rapidly evolving cross-disciplinary, distributed sensor data infrastructure, where tools for the environmental science community stem directly from computer science research. The platform uses the bleeding edge of computer science to acquire, store and distribute data and metadata from all environmental science disciplines at a variety of temporal and spatial resolutions. SwissEx is simultaneously developing new technologies to allow low cost, high spatial and temporal resolution measurements such that small areas can be intensely monitored. This data is then combined with existing widespread, low density measurements in the cross-disciplinary platform to provide well documented datasets, which are of use to multiple research disciplines. We present a flexible, generic infrastructure at an advanced stage of development. The infrastructure makes the most of Web 2.0 technologies for a collaborative working environment and as a user interface for a metadata database. This environment is already closely integrated with GSN, an open-source database middleware developed under Swiss Experiment for acquisition and storage of generic time-series data (2D and 3D). GSN can be queried directly by common data processing packages and makes data available in real-time to models and 3rd party software interfaces via its web service interface. It also provides real-time push or pull data exchange between instances, a user management system which leaves data owners in charge of their data, advanced real-time processing and much more. The SwissEx interface is increasingly gaining users and supporting environmental science in Switzerland. It is also an integral part of environmental education projects ClimAtscope and O3E, where the technologies can provide rapid feedback of results for children of all ages and where the data from their own stations can be compared to national data networks.
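As an illustration of programmatic access to such a platform, here is a hedged sketch of pulling a time series from a GSN server's web-service interface; the host, virtual-sensor name, field and parameter names are placeholders rather than a documented deployment.

    import requests

    # Placeholder GSN endpoint; real deployments expose their own URLs.
    URL = "http://gsn.example.ch/multidata"
    params = {
        "vs[0]": "wind_station_1",      # virtual sensor (placeholder)
        "field[0]": "air_temperature",  # measured field (placeholder)
        "from": "01/01/2010 00:00:00",
        "to": "02/01/2010 00:00:00",
    }
    resp = requests.get(URL, params=params, timeout=60)
    for line in resp.text.splitlines()[:5]:  # CSV-style rows
        print(line)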
Integration of Panda Workload Management System with supercomputers
NASA Astrophysics Data System (ADS)
De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Novikov, A.; Oleynik, D.; Panitkin, S.; Poyda, A.; Read, K. F.; Ryabinkin, E.; Teslyuk, A.; Velikhov, V.; Wells, J. C.; Wenaus, T.
2016-09-01
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3+ petaFLOPS, the next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, Europe and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms. We will present our current accomplishments in running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facility's infrastructure for High Energy and Nuclear Physics, as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
NASA Astrophysics Data System (ADS)
Bandaragoda, C.; Castronova, A. M.; Phuong, J.; Istanbulluoglu, E.; Strauch, R. L.; Nudurupati, S. S.; Tarboton, D. G.; Wang, S. W.; Yin, D.; Barnhart, K. R.; Tucker, G. E.; Hutton, E.; Hobley, D. E. J.; Gasparini, N. M.; Adams, J. M.
2017-12-01
The ability to test hypotheses about hydrology, geomorphology and atmospheric processes is invaluable to research in the era of big data. Although community resources are available, there remain significant educational, logistical and time-investment barriers to their use. Knowledge infrastructure is an emerging intellectual framework to understand how people are creating, sharing and distributing knowledge, which has been dramatically transformed by Internet technologies. In addition to the technical and social components in a cyberinfrastructure system, knowledge infrastructure considers the educational, institutional, and open-source governance components required to advance knowledge. We are designing an infrastructure environment that lowers common barriers to reproducing modeling experiments for earth surface investigation. Landlab is an open-source modeling toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for sharing hydrologic data and models. CyberGIS-Jupyter is an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using the Jupyter Notebook, based on ROGER, the first cyberGIS supercomputer, so that models can be elastically reproduced through cloud computing approaches. Our team of geomorphologists, hydrologists, and computer geoscientists has created a new infrastructure environment that combines these three pieces of software to enable knowledge discovery. Through this novel integration, any user can interactively execute and explore their shared data and model resources. Landlab on HydroShare with CyberGIS-Jupyter supports the modeling continuum from fully developed modeling applications, prototyping new science tools, hands-on research demonstrations for training workshops, and classroom applications. Computational geospatial models based on big data and high-performance computing can now be more efficiently developed, improved, scaled, and seamlessly reproduced among multidisciplinary users, thereby expanding the active learning curriculum and research opportunities for students in earth surface modeling and informatics.
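Landlab itself is a Python toolkit, so a notebook shared through this environment can reproduce a complete, if tiny, model setup in a few lines; the grid size and synthetic elevation below are arbitrary.

    from landlab import RasterModelGrid

    # Build a small 2-D raster grid and attach an elevation field to nodes.
    grid = RasterModelGrid((5, 6), xy_spacing=10.0)
    z = grid.add_zeros("topographic__elevation", at="node")
    z += grid.x_of_node * 0.01  # gentle synthetic slope to the east

    # Grid fields are plain NumPy arrays, so analysis composes naturally.
    print(grid.number_of_nodes, z.mean())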
NASA Astrophysics Data System (ADS)
Klimentov, A.; De, K.; Jha, S.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Wells, J.; Wenaus, T.
2016-10-01
The LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than the Grid can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility. The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCF's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments and has been in full production for ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
A Cloud-based Infrastructure and Architecture for Environmental System Research
NASA Astrophysics Data System (ADS)
Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.
2016-12-01
The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics services and utilities to the environmental system research community, and will provide unique capabilities that allow terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflows in a straightforward fashion. The infrastructure will minimize large disruptions of current project-based data submission workflows, for better acceptance by existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by providing unified data services and infrastructure. The infrastructure consists of two key components: (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from the environmental system research community, and (2) scalable data management services and systems, originated and developed by ORNL data centers.
NASA Astrophysics Data System (ADS)
Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.
2012-12-01
Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
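As a hedged sketch of the automated-restart piece, the snippet below uses the libvirt Python bindings to restart guests found inactive on a hypervisor; the connection URI is a placeholder, and the production scripts also handle deployment, adaptive scheduling and live migration.

    import libvirt

    # Placeholder hypervisor URI; production would iterate over all hosts.
    conn = libvirt.open("qemu+ssh://hv01.example.infn.it/system")

    for dom in conn.listAllDomains():
        if not dom.isActive():
            # Restart guests left stopped, e.g. after a hypervisor failure.
            print("restarting", dom.name())
            dom.create()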
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karthik, Rajasekar
2014-01-01
In this paper, an architecture for building a Scalable And Mobile Environment For High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high performance and a rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks is among the key open-source and industry-standard technologies adopted in this architecture.
Kotenko, Igor
2014-01-01
The paper outlines a bioinspired approach named "network nervous system" and methods of simulation of infrastructure attacks and protection mechanisms based on this approach. The protection mechanisms based on this approach consist of distributed procedures of information collection and processing, which coordinate the activities of the main devices of a computer network, identify attacks, and determine necessary countermeasures. Attacks and protection mechanisms are specified as structural models using a set-theoretic approach. An environment for simulation of protection mechanisms based on the biological metaphor is considered; the experiments demonstrating the effectiveness of the protection mechanisms are described. PMID:25254229
Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing
Klimentov, A.; Buncic, P.; De, K.; ...
2015-05-22
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, and O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
Using Amazon's Elastic Compute Cloud to dynamically scale CMS computational resources
NASA Astrophysics Data System (ADS)
Evans, D.; Fisk, I.; Holzman, B.; Melo, A.; Metson, S.; Pordes, R.; Sheldon, P.; Tiradani, A.
2011-12-01
Large international scientific collaborations such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider have traditionally addressed their data reduction and analysis needs by building and maintaining dedicated computational infrastructure. Emerging cloud computing services such as Amazon's Elastic Compute Cloud (EC2) offer short-term CPU and storage resources with costs based on usage. These services allow experiments to purchase computing resources as needed, without significant prior planning and without long-term investments in facilities and their management. We have demonstrated that services such as EC2 can successfully be integrated into the production-computing model of CMS, and find that they work very well as worker nodes. The cost structure and transient nature of EC2 services make them inappropriate for some CMS production services and functions. We also found that the resources are not truly "on-demand", as limits and caps on usage are imposed. Our trial workflows allow us to make a cost comparison between EC2 resources and dedicated CMS resources at a university, and conclude that it is most cost effective to purchase dedicated resources for the "base-line" needs of experiments such as CMS. However, if the ability to use cloud computing resources is built into an experiment's software framework before demand requires their use, cloud computing resources make sense for bursting during spikes in usage.
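To make the "bursting" idea concrete, here is a hedged sketch of renting and releasing EC2 worker capacity with the boto3 client; the AMI ID, instance type and tag values are placeholders, not CMS configuration from the paper.

```python
# Sketch: rent burst capacity from EC2 when local queues spike, release it
# when the spike passes. All identifiers below are illustrative.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def burst_workers(n: int) -> list[str]:
    """Start n worker-node instances pre-configured to join the pool."""
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # hypothetical worker-node image
        InstanceType="c5.xlarge",
        MinCount=n, MaxCount=n,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "cms-worker"}],
        }],
    )
    return [i["InstanceId"] for i in resp["Instances"]]

def release_workers(instance_ids: list[str]) -> None:
    """Terminate burst capacity once the usage spike has passed."""
    ec2.terminate_instances(InstanceIds=instance_ids)
```

The pay-per-use model only pays off if the experiment's framework can schedule onto such transient nodes transparently, which is exactly the precondition the abstract identifies.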
Flexible services for the support of research.
Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John
2013-01-28
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amounts of data without copying them across cloud infrastructures and can scale their resource provision when the local cloud resources become insufficient.
Building a Prototype of LHC Analysis Oriented Computing Centers
NASA Astrophysics Data System (ADS)
Bagliesi, G.; Boccali, T.; Della Ricca, G.; Donvito, G.; Paganoni, M.
2012-12-01
A consortium of four LHC computing centres (Bari, Milano, Pisa and Trieste) was formed in 2010 to prototype analysis-oriented facilities for CMS data analysis, profiting from a grant from the Italian Ministry of Research. The consortium aims to realize an ad-hoc infrastructure to ease the analysis activities on the huge data set collected at the LHC collider. While "Tier-2" computing centres, specialized in organized processing tasks like Monte Carlo simulation, are nowadays a well established concept with years of running experience, sites specialized towards end-user chaotic analysis activities do not yet have a de facto standard implementation. In our effort, we focus on all the aspects that can make the analysis tasks easier for a physics user not expert in computing. On the storage side, we are experimenting with storage techniques allowing for remote data access and with storage optimization for the typical analysis access patterns. On the networking side, we are studying the differences between flat and tiered LAN architectures, also using virtual partitioning of the same physical network for the different use patterns. Finally, on the user side, we are developing tools and instruments to allow for an exhaustive monitoring of their processes at the site, and for an efficient support system in case of problems. We report on the results of the tests executed on the different subsystems and give a description of the layout of the infrastructure in place at the sites participating in the consortium.
Implementing Computer-Aided Instruction in Distance Education: An Infrastructure. RR/89-06.
ERIC Educational Resources Information Center
Kotze, Paula
The infrastructure required for the implementation of computer aided instruction is described with particular reference to the distance education environment at the University of South Africa. A review of the state of the art of online distance education in the United States and Europe is followed by an outline of the proposed infrastructure for…
Enabling Grid Computing resources within the KM3NeT computing model
NASA Astrophysics Data System (ADS)
Filippidis, Christos
2016-04-01
KM3NeT is a future European deep-sea research infrastructure hosting a new generation of neutrino detectors that - located at the bottom of the Mediterranean Sea - will open a new window on the universe and answer fundamental questions in both particle physics and astrophysics. International collaborative scientific experiments, like KM3NeT, are generating datasets which are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. Most of these experiments adopt computing models consisting of different tiers, with several computing centres providing a specific set of services for the different steps of data processing such as detector calibration, simulation and data filtering, reconstruction and analysis. The computing requirements are extremely demanding and usually span from serial to multi-parallel or GPU-optimized jobs. The collaborative nature of these experiments demands very frequent WAN data transfers and data sharing among individuals and groups. In order to support these demanding computing requirements we enabled Grid computing resources, operated by EGI, within the KM3NeT computing model. In this study we describe our first advances in this field and the method by which KM3NeT users can utilize the EGI computing resources in a simulation-driven use-case.
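The abstract does not name the submission interface; as one plausible illustration, many EGI communities use the DIRAC interware, whose Python API lets a simulation job be defined and submitted roughly as follows (the job name and payload script are hypothetical).

```python
# Illustrative only: the abstract does not specify the middleware client.
# Many EGI communities use the DIRAC interware; this is a minimal job
# submission sketch with its Python API. Names and limits are placeholders.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)   # initialise DIRAC configuration

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("km3net-sim")                    # hypothetical job name
job.setExecutable("run_simulation.sh")       # hypothetical simulation payload
job.setCPUTime(86400)                        # one day of CPU time, illustrative

result = Dirac().submitJob(job)
print("Submitted job:", result.get("Value"))
```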
Data Center Consolidation: A Step towards Infrastructure Clouds
NASA Astrophysics Data System (ADS)
Winter, Markus
Application service providers face enormous challenges and rising costs in managing and operating a growing number of heterogeneous system and computing landscapes. Limitations of traditional computing environments force IT decision-makers to reorganize computing resources within the data center, as continuous growth leads to an inefficient utilization of the underlying hardware infrastructure. This paper discusses a way for infrastructure providers to improve data center operations based on the findings of a case study on resource utilization of very large business applications and presents an outlook beyond server consolidation endeavors, transforming corporate data centers into compute clouds.
Elastic Cloud Computing Infrastructures in the Open Cirrus Testbed Implemented via Eucalyptus
NASA Astrophysics Data System (ADS)
Baun, Christian; Kunze, Marcel
Cloud computing realizes the advantages and overcomes some restrictions of the grid computing paradigm. Elastic infrastructures can easily be created and managed by cloud users. In order to accelerate the research on data center management and cloud services, the Open Cirrus™ research testbed has been started by HP, Intel and Yahoo!. Although commercial cloud offerings are proprietary, Open Source solutions exist in the field of IaaS with Eucalyptus, PaaS with AppScale and at the applications layer with Hadoop MapReduce. This paper examines the I/O performance of cloud computing infrastructures implemented with Eucalyptus in contrast to Amazon S3.
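The kind of I/O measurement the paper performs can be approximated with a small timing probe against an S3-compatible endpoint (Amazon S3, or Eucalyptus Walrus, which speaks the same protocol); the endpoint URL, bucket name and object size below are assumptions for illustration.

```python
# Sketch: time object PUT/GET against an S3-compatible storage endpoint.
# Endpoint, bucket and credentials are placeholders, not from the paper.
import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://walrus.example.org:8773/services/Walrus",  # hypothetical
)

payload = b"x" * (64 * 1024 * 1024)  # 64 MiB test object

t0 = time.perf_counter()
s3.put_object(Bucket="benchmark", Key="probe", Body=payload)
t1 = time.perf_counter()
s3.get_object(Bucket="benchmark", Key="probe")["Body"].read()
t2 = time.perf_counter()

print(f"write: {len(payload) / (t1 - t0) / 1e6:.1f} MB/s, "
      f"read: {len(payload) / (t2 - t1) / 1e6:.1f} MB/s")
```

Because both backends expose the same API, the identical probe can be pointed at either, which is what makes the paper's like-for-like comparison possible.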
Network and computing infrastructure for scientific applications in Georgia
NASA Astrophysics Data System (ADS)
Kvatadze, R.; Modebadze, Z.
2016-09-01
The status of the network and computing infrastructure and the available services for the research and education community of Georgia are presented. The Research and Educational Networking Association - GRENA provides the following network services: Internet connectivity, network services, cyber security, technical support, etc. Computing resources used by the research teams are located at GRENA and at major state universities. The GE-01-GRENA site is included in the European Grid Infrastructure. The paper also contains information about the programs of the Learning Center and the research and development projects in which GRENA is participating.
Implementation of Grid Tier 2 and Tier 3 facilities on a Distributed OpenStack Cloud
NASA Astrophysics Data System (ADS)
Limosani, Antonio; Boland, Lucien; Coddington, Paul; Crosby, Sean; Huang, Joanna; Sevior, Martin; Wilson, Ross; Zhang, Shunde
2014-06-01
The Australian Government is making a AUD 100 million investment in Compute and Storage for the academic community. The Compute facilities are provided in the form of 30,000 CPU cores located at 8 nodes around Australia in a distributed virtualized Infrastructure as a Service facility based on OpenStack. The storage will eventually consist of over 100 petabytes located at 6 nodes. All will be linked via a 100 Gb/s network. This proceeding describes the development of a fully connected WLCG Tier-2 grid site as well as a general purpose Tier-3 computing cluster based on this architecture. The facility employs an extension to Torque to enable dynamic allocations of virtual machine instances. A base Scientific Linux virtual machine (VM) image is deployed in the OpenStack cloud and automatically configured as required using Puppet. Custom scripts are used to launch multiple VMs, integrate them into the dynamic Torque cluster and to mount remote file systems. We report on our experience in developing this nation-wide ATLAS and Belle II Tier 2 and Tier 3 computing infrastructure using the national Research Cloud and storage facilities.
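A hedged sketch of the dynamic-expansion step described above: boot a worker VM on OpenStack whose boot script, standing in here for the Puppet-driven configuration, would register the node with the Torque cluster. The cloud name, image/flavor/network UUIDs and the join script are illustrative, not the project's actual setup.

```python
# Sketch: launch a worker VM on an OpenStack cloud and hand it a boot
# script that configures it and joins it to the batch cluster.
import base64
import openstack

JOIN_SCRIPT = b"""#!/bin/bash
# Placeholder for the Puppet bootstrap and Torque registration step.
puppet agent --test
"""

conn = openstack.connect(cloud="research-cloud")   # assumed clouds.yaml entry

server = conn.compute.create_server(
    name="worker-001",
    image_id="IMAGE_UUID",                         # base SL image, placeholder
    flavor_id="FLAVOR_UUID",
    networks=[{"uuid": "NETWORK_UUID"}],
    user_data=base64.b64encode(JOIN_SCRIPT).decode(),
)
server = conn.compute.wait_for_server(server)
print("worker ready at", server.access_ipv4)
```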
Methodologies and systems for heterogeneous concurrent computing
NASA Technical Reports Server (NTRS)
Sunderam, V. S.
1994-01-01
Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.
NASA Astrophysics Data System (ADS)
Belyaev, A.; Berezhnaya, A.; Betev, L.; Buncic, P.; De, K.; Drizhuk, D.; Klimentov, A.; Lazin, Y.; Lyalin, I.; Mashinistov, R.; Novikov, A.; Oleynik, D.; Polyakov, A.; Poyda, A.; Ryabinkin, E.; Teslyuk, A.; Tkachenko, I.; Yasnopolskiy, L.
2015-12-01
The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run 2). The need for simulation, data processing and analysis would overwhelm the expected capacity of the grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge, the integration of opportunistic resources into the LHC computing model is highly important. The Tier-1 facility at the Kurchatov Institute (NRC-KI) in Moscow is a part of WLCG and will process, simulate and store up to 10% of the total data obtained from the ALICE, ATLAS and LHCb experiments. In addition, the Kurchatov Institute has supercomputers with a peak performance of 0.12 PFLOPS. The delegation of even a fraction of supercomputing resources to LHC computing will notably increase the total capacity. In 2014, the development of a portal combining the Tier-1 and a supercomputer at the Kurchatov Institute was started to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences like biology, with genome sequencing analysis, and astrophysics, with cosmic ray analysis and antimatter and dark matter searches, etc.
Research Institute for Technical Careers
NASA Technical Reports Server (NTRS)
Glenn, Ronald L.
1996-01-01
The NASA research grant to Wilberforce University enabled us to establish the Research Institute for Technical Careers (RITC) in order to improve the teaching of science and engineering at Wilberforce. The major components of the research grant are infrastructure development, establishment of the Wilberforce Intensive Summer Experience (WISE), and Joint Research Collaborations with NASA Scientists. (A) Infrastructure Development. The NASA grant has enabled us to improve the standard of our chemistry laboratory and establish the electronics, design, and robotics laboratories. These laboratories have significantly improved the level of instruction at Wilberforce University. (B) Wilberforce Intensive Summer Experience (WISE). The WISE program is a science and engineering bridge program for prefreshman students. It is an intensive academic experience designed to strengthen students' knowledge in mathematics, science, engineering, computing skills, and writing. (C) Joint Collaboration. Another feature of the grant is research collaborations between NASA Scientists and Wilberforce University Scientists. These collaborations have enabled our faculty and students to conduct research at NASA Lewis during the summer and publish research findings in various journals and scientific proceedings.
Consolidating WLCG topology and configuration in the Computing Resource Information Catalogue
NASA Astrophysics Data System (ADS)
Alandes, Maria; Andreeva, Julia; Anisenkov, Alexey; Bagliesi, Giuseppe; Belforte, Stephano; Campana, Simone; Dimou, Maria; Flix, Jose; Forti, Alessandra; di Girolamo, A.; Karavakis, Edward; Lammel, Stephan; Litmaath, Maarten; Sciaba, Andrea; Valassi, Andrea
2017-10-01
The Worldwide LHC Computing Grid infrastructure links about 200 participating computing centres affiliated with several partner projects. It is built by integrating heterogeneous computer and storage resources in diverse data centres all over the world and provides CPU and storage capacity to the LHC experiments to perform data processing and physics analysis. In order to be used by the experiments, these distributed resources should be well described, which implies easy service discovery and detailed description of service configuration. Currently this information is scattered over multiple generic information sources like GOCDB, OIM, BDII and experiment-specific information systems. Such a model does not allow topology and configuration information to be validated easily. Moreover, information in the various sources is not always consistent. Finally, the evolution of computing technologies introduces new challenges. Experiments increasingly rely on opportunistic resources, which by their nature are more dynamic and should also be well described in the WLCG information system. This contribution describes the new WLCG configuration service CRIC (Computing Resource Information Catalogue), which collects information from various information providers, performs validation and provides a consistent set of UIs and APIs to the LHC VOs for service discovery and usage configuration. The main requirements for CRIC are simplicity, agility and robustness. CRIC should be able to be quickly adapted to new types of computing resources and new information sources, and allow for new data structures to be implemented easily following the evolution of the computing models and operations of the experiments.
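Purely as an illustration of the service-discovery APIs mentioned above: the abstract does not document endpoints, so the URL and JSON shape below are assumptions, not the real CRIC interface. A client-side query could look roughly like this:

```python
# Hypothetical client for a CRIC-style catalogue REST API. The endpoint
# and the response layout are invented for illustration only.
import requests

BASE = "https://cric.example.cern.ch/api"      # placeholder endpoint

def sites_supporting(vo: str) -> list[str]:
    """Return the names of sites declaring support for a given VO."""
    resp = requests.get(f"{BASE}/sites", params={"vo": vo}, timeout=30)
    resp.raise_for_status()
    return [site["name"] for site in resp.json()]   # assumed JSON shape

print(sites_supporting("cms"))
```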
Bringing the CMS distributed computing system into scalable operations
NASA Astrophysics Data System (ADS)
Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.
2010-04-01
Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.
Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing
Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong
2018-01-01
The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination. PMID:29565313
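As a toy version of the content-selection step, the following sketch greedily fills a base-station cache with the contents of highest expected QoE gain per byte under a storage budget; it simplifies the joint problem in the paper, which also couples recommendation and delivery, and all numbers are made up.

```python
# Toy greedy cache selection: rank contents by expected QoE gain per
# megabyte and fill the cache until the storage budget is exhausted.
from dataclasses import dataclass

@dataclass
class Content:
    name: str
    size_mb: float
    expected_qoe_gain: float   # e.g. popularity x per-view QoE improvement

def select_cache(contents: list[Content], capacity_mb: float) -> list[Content]:
    cached, used = [], 0.0
    # Highest QoE-gain density first (a classic knapsack heuristic).
    for c in sorted(contents, key=lambda c: c.expected_qoe_gain / c.size_mb,
                    reverse=True):
        if used + c.size_mb <= capacity_mb:
            cached.append(c)
            used += c.size_mb
    return cached

catalog = [Content("clipA", 700, 9.0), Content("clipB", 300, 5.5),
           Content("clipC", 1200, 10.0)]
print([c.name for c in select_cache(catalog, capacity_mb=1000)])
```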
Centre for Research Infrastructure of Polish GNSS Data - response and possible contribution to EPOS
NASA Astrophysics Data System (ADS)
Araszkiewicz, Andrzej; Rohm, Witold; Bosy, Jaroslaw; Szolucha, Marcin; Kaplon, Jan; Kroszczynski, Krzysztof
2017-04-01
In the frame of the first call under Action 4.2 (Development of Modern Research Infrastructure of the Science Sector) of the Smart Growth Operational Programme 2014-2020, the "EPOS-PL" project was launched in late 2016. The following institutes are responsible for the implementation of this project: the Institute of Geophysics, Polish Academy of Sciences (Project Leader), the Academic Computer Centre Cyfronet of the AGH University of Science and Technology, the Central Mining Institute, the Institute of Geodesy and Cartography, the Wrocław University of Environmental and Life Sciences, and the Military University of Technology. In addition, resources constituting the entrepreneur's own contribution will come from the Polish Mining Group. The EPOS-PL Research Infrastructure will integrate both existing and newly built National Research Infrastructures (Theme Centres for Research Infrastructures), which, under the premise of the EPOS program, are financed exclusively from national funds. In addition, an e-science platform will be developed. The Centre for Research Infrastructure of GNSS Data (CIBDG - Task 5) will be built based on the experience and facilities of two institutions: the Military University of Technology and the Wrocław University of Environmental and Life Sciences. The project includes the construction of the National GNSS Repository with data QC procedures and the adaptation of two Regional GNSS Analysis Centres for rapid and long-term geodynamical monitoring.
Federation in genomics pipelines: techniques and challenges.
Chaterji, Somali; Koo, Jinkyu; Li, Ninghui; Meyer, Folker; Grama, Ananth; Bagchi, Saurabh
2017-08-29
Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. the SBGrid Consortium for structural biology and the Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, with fast-increasing data volumes and increasing processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Being a widely used resource, we have to move toward federation without disrupting its 50K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. It is hoped that our manuscript can serve to spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus allow them to scale to support larger user bases.
ERIC Educational Resources Information Center
Waga, Duncan; Makori, Esther; Rabah, Kefa
2014-01-01
Kenya's educational and research fraternity has highly qualified human resource capacity with globally gained experience. However, each research entity works in isolation due to the absence of a common digital platform, while educational units lack even the basic infrastructure. For sustainability of education and research progression,…
Heterogeneous concurrent computing with exportable services
NASA Technical Reports Server (NTRS)
Sunderam, Vaidy
1995-01-01
Heterogeneous concurrent computing, based on the traditional process-oriented model, is approaching its functionality and performance limits. An alternative paradigm, based on the concept of services, supporting data-driven computation, and built on a lightweight process infrastructure, is proposed to enhance the functional capabilities and the operational efficiency of heterogeneous network-based concurrent computing. TPVM is an experimental prototype system supporting exportable services, thread-based computation, and remote memory operations that is built as an extension of and an enhancement to the PVM concurrent computing system. TPVM offers a significantly different computing paradigm for network-based computing, while maintaining a close resemblance to the conventional PVM model in the interest of compatibility and ease of transition. Preliminary experiences have demonstrated that the TPVM framework presents a natural yet powerful concurrent programming interface, while being capable of delivering performance improvements of up to thirty percent.
KODAMA and VPC based Framework for Ubiquitous Systems and its Experiment
NASA Astrophysics Data System (ADS)
Takahashi, Kenichi; Amamiya, Satoshi; Iwao, Tadashige; Zhong, Guoqiang; Kainuma, Tatsuya; Amamiya, Makoto
Recently, agent technologies have attracted a lot of interest as an emerging programming paradigm. With such agent technologies, services are provided through collaboration among agents. At the same time, the spread of mobile technologies and communication infrastructures has made it possible to access the network anytime and from anywhere. Using agents and mobile technologies to realize ubiquitous computing systems, we propose a new framework based on KODAMA and VPC. KODAMA provides distributed management mechanisms by using the concept of community and communication infrastructure to deliver messages among agents without agents being aware of the physical network. VPC provides a method of defining peer-to-peer services based on agent communication with policy packages. By merging the characteristics of both KODAMA and VPC functions, we propose a new framework for ubiquitous computing environments. It provides distributed management functions according to the concept of agent communities, agent communications which are abstracted from the physical environment, and agent collaboration with policy packages. Using our new framework, we conducted a large-scale experiment in shopping malls in Nagoya, which sent advertisement e-mails to users' cellular phones according to user location and attributes. The empirical results showed that our new framework worked effectively for sales in shopping malls.
Integration of end-user Cloud storage for CMS analysis
Riahi, Hassen; Aimar, Alberto; Ayllon, Alejandro Alvarez; ...
2017-05-19
End-user Cloud storage is increasing rapidly in popularity in research communities thanks to the collaboration capabilities it offers, namely synchronisation and sharing. CERN IT has implemented a model of such storage named CERNBox, integrated with the CERN AuthN and AuthZ services. To exploit the use of the end-user Cloud storage for the distributed data analysis activity, the CMS experiment has started the integration of CERNBox as a Grid resource. This will allow CMS users to make use of their own storage in the Cloud for their analysis activities as well as to benefit from synchronisation and sharing capabilities to achieve results faster and more effectively. It will provide an integration model of Cloud storages in the Grid, which is implemented and commissioned over the world's largest computing Grid infrastructure, the Worldwide LHC Computing Grid (WLCG). In this paper, we present the integration strategy and infrastructure changes needed in order to transparently integrate end-user Cloud storage with the CMS distributed computing model. We describe the new challenges faced in data management between Grid and Cloud and how they were addressed, along with details of the support for Cloud storage recently introduced into the WLCG data movement middleware, FTS3. Finally, the commissioning experience of CERNBox for the distributed data analysis activity is also presented.
NASA Astrophysics Data System (ADS)
Barberis, Stefano; Carminati, Leonardo; Leveraro, Franco; Mazza, Simone Michele; Perini, Laura; Perlz, Francesco; Rebatto, David; Tura, Ruggero; Vaccarossa, Luca; Villaplana, Miguel
2015-12-01
We present the approach of the University of Milan Physics Department and the local unit of INFN to allow and encourage the sharing of computing, storage and networking resources among different research areas (the largest resources being those composing the Milan WLCG Tier-2 centre, tailored to the needs of the ATLAS experiment). Computing resources are organised as independent HTCondor pools, with a global master in charge of monitoring them and optimising their usage. The configuration has to provide satisfactory throughput for both serial and parallel (multicore, MPI) jobs. A combination of local, remote and cloud storage options is available. The experience of users from different research areas operating on this shared infrastructure is discussed. The promising direction of improving scientific computing throughput by federating access to distributed computing and storage also fits very well with the objectives listed in the European Horizon 2020 framework for research and development.
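A minimal sketch of what submission to such a shared HTCondor pool can look like with the HTCondor Python bindings (whose details vary slightly between versions); the payload, file names and resource requests are placeholders, with the multicore request illustrating the mixed serial/parallel workload mentioned above.

```python
# Sketch: submit a multicore job to an HTCondor pool via the Python
# bindings. Executable, arguments and resource numbers are placeholders.
import htcondor

sub = htcondor.Submit({
    "executable": "analysis.sh",       # hypothetical user payload
    "arguments": "run42.root",
    "request_cpus": "8",               # multicore slot from the shared pool
    "request_memory": "16GB",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()             # local scheduler of the pool
result = schedd.submit(sub, count=1)
print("submitted cluster", result.cluster())
```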
The CMS Tier0 goes Cloud and Grid for LHC Run 2
NASA Astrophysics Data System (ADS)
Hufnagel, Dirk
2015-12-01
In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threaded framework to deal with the increased event complexity and to ensure efficient use of the resources. This contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.
Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina
2013-01-28
Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research, GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
Development of stable Grid service at the next generation system of KEKCC
NASA Astrophysics Data System (ADS)
Nakamura, T.; Iwai, G.; Matsunaga, H.; Murakami, K.; Sasaki, T.; Suzuki, S.; Takase, W.
2017-10-01
A lot of experiments in the field of accelerator-based science are actively running at the High Energy Accelerator Research Organization (KEK), using the SuperKEKB and J-PARC accelerators in Japan. The computing demand from the various experiments for data processing, analysis, and MC simulation is monotonically increasing at KEK. This is not only the case for the high-energy experiments; the computing requirements of the hadron and neutrino experiments and of some astro-particle physics projects are also rapidly increasing due to their very high precision measurements. Under this situation, several projects supported by KEK - the Belle II, T2K, ILC and KAGRA experiments - are going to utilize the Grid computing infrastructure as their main computing resource. The Grid system and services at KEK, which are already in production, were upgraded for further stable operation at the same time as the whole-scale hardware replacement of the KEK Central Computer System (KEKCC). The next generation KEKCC started operation at the beginning of September 2016. The basic Grid services, e.g. BDII, VOMS, LFC, the CREAM computing element and the StoRM storage element, are deployed on a more robust hardware configuration. Since raw data transfer is one of the most important tasks for the KEKCC, two redundant GridFTP servers with 40 Gbps network bandwidth on the LHCONE routing are attached to the StoRM service instances. These are dedicated to Belle II raw data transfer to the other sites, separate from the servers used for data transfer by the other VOs. Additionally, we prepared a redundant configuration for database-oriented services like LFC and AMGA using LifeKeeper. The LFC service consists of two read/write servers and two read-only servers for the Belle II experiment, each with an individual database for load balancing. The FTS3 service is newly deployed for Belle II data distribution. A CVMFS stratum-0 service hosts the Belle II software repository, and a stratum-1 service is provided for the other VOs. In this way, there are many upgrades to the production Grid services at the KEK Computing Research Center. In this paper, we introduce the detailed hardware configuration of the Grid instances and several mechanisms used to construct a robust Grid system in the next generation KEKCC.
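As an illustration of the FTS3 service mentioned above, a third-party transfer can be submitted with the fts3-rest "easy" Python bindings roughly as follows; the endpoint and storage URLs are placeholders, not the actual KEKCC configuration.

```python
# Sketch: submit a third-party transfer to an FTS3 service with the
# fts3-rest "easy" bindings. Endpoint and SURLs are illustrative.
import fts3.rest.client.easy as fts3

context = fts3.Context("https://fts.example.kek.jp:8446")  # hypothetical

transfer = fts3.new_transfer(
    "gsiftp://storm.example.kek.jp/belle/raw/run0001.root",  # illustrative
    "gsiftp://dcache.example.org/belle/raw/run0001.root",
)
job = fts3.new_job([transfer], retry=3, verify_checksum=True)
job_id = fts3.submit(context, job)
print("FTS3 job:", job_id)
```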
The computing and data infrastructure to interconnect EEE stations
NASA Astrophysics Data System (ADS)
Noferini, F.; EEE Collaboration
2016-07-01
The Extreme Energy Event (EEE) experiment is devoted to the search for high energy cosmic rays through a network of telescopes installed in about 50 high schools distributed throughout the Italian territory. This project requires a dedicated data management infrastructure to collect data registered in stations very far from each other and to allow a coordinated analysis. Such an infrastructure is realized at INFN-CNAF, which operates a Cloud facility based on the OpenStack open-source Cloud framework and provides Infrastructure as a Service (IaaS) for its users. In 2014 EEE started to use it for collecting, monitoring and reconstructing the data acquired in all the EEE stations. For the synchronization between the stations and the INFN-CNAF infrastructure we used BitTorrent Sync, a free peer-to-peer software tool designed to optimize data synchronization between distributed nodes. All data folders are synchronized with the central repository in real time to allow an immediate reconstruction of the data and their publication on a monitoring webpage. We present the architecture and the functionalities of this data management system, which provides a flexible environment for the specific needs of the EEE project.
Modeling the Cloud to Enhance Capabilities for Crises and Catastrophe Management
2016-11-16
...in order for cloud computing infrastructures to be successfully deployed in real-world scenarios as tools for crisis and catastrophe management, where... Statement of the Problem Studied: As cloud computing becomes the dominant computational infrastructure [1] and cloud technologies make a transition to hosting... 1. Formulate rigorous mathematical models representing technological capabilities and resources in cloud computing for performance modeling and...
Infrastructure Systems for Advanced Computing in E-science applications
NASA Astrophysics Data System (ADS)
Terzo, Olivier
2013-04-01
In the e-science field there are growing needs for computing infrastructure that is more dynamic and customizable, with an on-demand model of use that follows the exact request in terms of resources and storage capacities. The integration of grid and cloud infrastructure solutions allows us to offer services whose availability adapts by scaling resources up and down. The main challenge for e-science domains will be to implement infrastructure solutions for scientific computing that dynamically adapt to the demand for computing resources, with a strong emphasis on optimizing the use of computing resources to reduce investment costs. Instrumentation, data volumes, algorithms and analysis all contribute to increasing the complexity of applications that require high processing power and storage for a limited time, often exceeding the computational resources that equip the majority of laboratories and research units in an organization. Very often it is necessary to adapt or even rethink tools and algorithms, and to consolidate existing applications through a phase of reverse engineering, in order to adapt them for deployment on a cloud infrastructure. For example, in areas such as rainfall monitoring, meteorological analysis, hydrometeorology, climatology, bioinformatics (next-generation sequencing), computational electromagnetics and radio occultation, the complexity of the analysis raises several issues such as processing time, the scheduling of processing tasks, storage of results, and a multi-user environment. For these reasons, it is necessary to rethink the way e-science applications are written so that they can exploit the potential of cloud computing services through the IaaS, PaaS and SaaS layers. Another important focus is on creating and using hybrid infrastructures, typically a federation between private and public clouds: when all resources owned by the organization are in use, a federated cloud infrastructure makes it easy to add resources from the public cloud to follow the needs in terms of computational and storage capacity, and to release them when processing is finished. Following the hybrid model, the scheduling approach is important for managing both cloud models, as sketched below. Thanks to this model, additional IT capacity is available at any time and can be used on demand for a limited period without having to purchase additional servers.
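The hybrid scheduling idea can be made concrete with a schematic: place jobs on the private cloud while it has free capacity, burst to the public federation otherwise, and release public resources as jobs finish. This is a minimal sketch under assumed capacities, not a description of any specific scheduler.

```python
# Toy hybrid (private + public) placement: prefer the private cloud,
# burst to the public one when full. Capacities are illustrative.
from dataclasses import dataclass, field

@dataclass
class Cloud:
    name: str
    capacity: int            # total job slots
    running: int = 0

    def free(self) -> int:
        return self.capacity - self.running

@dataclass
class HybridScheduler:
    private: Cloud
    public: Cloud
    placements: dict = field(default_factory=dict)

    def place(self, job_id: str) -> str:
        target = self.private if self.private.free() > 0 else self.public
        target.running += 1
        self.placements[job_id] = target
        return target.name

    def finish(self, job_id: str) -> None:
        self.placements.pop(job_id).running -= 1   # releases the slot

sched = HybridScheduler(Cloud("private", 2), Cloud("public", 100))
print([sched.place(j) for j in ("j1", "j2", "j3")])  # j3 bursts to public
```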
Monitoring techniques and alarm procedures for CMS services and sites in WLCG
DOE Office of Scientific and Technical Information (OSTI.GOV)
Molina-Perez, J.; Bonacorsi, D.; Gutsche, O.
2012-01-01
The CMS offline computing system is composed of roughly 80 sites (including most experienced T3s) and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS: the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range; the latter measures the quality of service and warns managers when a service is affected. CMS has established computing shift procedures with personnel operating worldwide from remote computing centers, under the supervision of the Computing Run Coordinator at CERN. This dedicated 24/7 computing shift personnel contributes to detecting and reacting in a timely manner to any unexpected error, and hence ensures that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure the 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first two years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the proactive alarming systems and the proficient troubleshooting procedures that helped the CMS computing facilities and infrastructure operate at high reliability levels.
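As a toy illustration of the Lemon-style pattern described above (a sensor reports metrics, and an alarm fires when a value leaves its allowed range), with metric names and thresholds invented for the example:

```python
# Toy threshold-based alarming: compare reported metric values against
# allowed ranges and emit an alarm for each violation. All values are
# made up for illustration.
RANGES = {"disk_used_pct": (0, 90), "load_avg": (0, 32)}

def check(metrics: dict[str, float]) -> list[str]:
    alarms = []
    for name, value in metrics.items():
        lo, hi = RANGES[name]
        if not lo <= value <= hi:
            alarms.append(f"ALARM {name}={value} outside [{lo}, {hi}]")
    return alarms

for line in check({"disk_used_pct": 96.5, "load_avg": 12.0}):
    print(line)   # a shifter (or automation) would react to this
```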
A cyber infrastructure for the SKA Telescope Manager
NASA Astrophysics Data System (ADS)
Barbosa, Domingos; Barraca, João. P.; Carvalho, Bruno; Maia, Dalmiro; Gupta, Yashwant; Natarajan, Swaminathan; Le Roux, Gerhard; Swart, Paul
2016-07-01
The Square Kilometre Array Telescope Manager (SKA TM) will be responsible for assisting SKA Operations and Observation Management, carrying out system diagnosis and collecting Monitoring and Control data from the SKA subsystems and components. To provide adequate compute resources, scalability, operational continuity and high availability, as well as strict Quality of Service, the TM cyber-infrastructure (embodied in the Local Infrastructure - LINFRA) consists of COTS hardware and infrastructural software (for example: server monitoring software, host operating system, virtualization software, device firmware), providing a specially tailored Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solution. The TM infrastructure provides services in the form of computational power, software-defined networking, power and storage abstractions, and high-level, state-of-the-art IaaS and PaaS management interfaces. This cyber platform will be tailored to each of the two SKA Phase 1 telescope instances (SKA_MID in South Africa and SKA_LOW in Australia), each presenting different computational and storage infrastructures and conditioned by location. This cyber platform will provide a compute model enabling TM to manage the deployment and execution of its multiple components (observation scheduler, proposal submission tools, M&C components, forensic tools, several databases, etc.). In this sense, the TM LINFRA is primarily focused on the provision of isolated instances, mostly resorting to virtualization technologies, while defaulting to bare hardware if specifically required due to performance, security, availability, or other requirements.
Lindberg, D A; Humphreys, B L
1995-01-01
The High-Performance Computing and Communications (HPCC) program is a multiagency federal effort to advance the state of computing and communications and to provide the technologic platform on which the National Information Infrastructure (NII) can be built. The HPCC program supports the development of high-speed computers, high-speed telecommunications, related software and algorithms, education and training, and information infrastructure technology and applications. The vision of the NII is to extend access to high-performance computing and communications to virtually every U.S. citizen so that the technology can be used to improve the civil infrastructure, lifelong learning, energy management, health care, etc. Development of the NII will require resolution of complex economic and social issues, including information privacy. Health-related applications supported under the HPCC program and NII initiatives include connection of health care institutions to the Internet; enhanced access to gene sequence data; the "Visible Human" Project; and test-bed projects in telemedicine, electronic patient records, shared informatics tool development, and image systems. PMID:7614116
Future Naval Use of COTS Networking Infrastructure
2009-07-01
user to benefit from Google's vast databases and computational resources. Obviously, the ability to harness the full power of the Cloud could be... [table-of-contents fragment: Computing Impact; Findings; Action Items; Take-Aways; Appendices (pages 54-68): A. Terms of Reference Document; B. Sample Definitions of Cloud] ...and definition of Cloud Computing. While Cloud Computing is developing in many variations – including Infrastructure as a Service (IaaS), Platform as
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amerio, S.; Behari, S.; Boyd, J.
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have approximately 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 and beyond. To achieve this goal, we have implemented a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology and leverages resources available from currently-running experiments at Fermilab. Lastly, these efforts have also provided useful lessons in ensuring long-term data access for numerous experiments, and enable high-quality scientific output for years to come.
Data preservation at the Fermilab Tevatron
NASA Astrophysics Data System (ADS)
Amerio, S.; Behari, S.; Boyd, J.; Brochmann, M.; Culbertson, R.; Diesburg, M.; Freeman, J.; Garren, L.; Greenlee, H.; Herner, K.; Illingworth, R.; Jayatilaka, B.; Jonckheere, A.; Li, Q.; Naymola, S.; Oleynik, G.; Sakumoto, W.; Varnes, E.; Vellidis, C.; Watts, G.; White, S.
2017-04-01
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have approximately 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 and beyond. To achieve this goal, we have implemented a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology and leverages resources available from currently-running experiments at Fermilab. These efforts have also provided useful lessons in ensuring long-term data access for numerous experiments, and enable high-quality scientific output for years to come.
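The automated-validation step mentioned in this and the related records can be pictured as running one reference job under both the legacy and the migrated software stack and comparing outputs. The sketch below assumes hypothetical output file names and a bit-identical checksum criterion, which is stricter than what a real validation campaign might use.

```python
# Hypothetical sketch of automated validation for a software/OS migration:
# run the same reference job under the old and new stacks and require
# identical outputs before accepting the migration. Commands and output
# file names are assumptions for illustration only.
import hashlib
import subprocess

def checksum(path: str) -> str:
    """SHA-256 of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def validate(reference_cmd: list[str], candidate_cmd: list[str]) -> bool:
    subprocess.run(reference_cmd, check=True)   # e.g. legacy release in a VM
    subprocess.run(candidate_cmd, check=True)   # e.g. migrated release/OS image
    # Assumed convention: each command writes its result to a known file.
    return checksum("reference.out") == checksum("candidate.out")
```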
NASA Astrophysics Data System (ADS)
Bonacorsi, D.; Gutsche, O.
The Worldwide LHC Computing Grid (WLCG) project decided in March 2009 to perform scale tests of parts of its overall Grid infrastructure before the start of LHC data taking. The "Scale Test for the Experiment Program" (STEP'09) was performed mainly in June 2009, with further selected tests in September-October 2009, and emphasized the simultaneous testing of the computing systems of all 4 LHC experiments. CMS tested its Tier-0 tape writing and processing capabilities. The Tier-1 tape systems were stress tested using the complete range of Tier-1 work-flows: transfer from Tier-0 and custody of data on tape, processing and subsequent archival, redistribution of datasets amongst all Tier-1 sites as well as burst transfers of datasets to Tier-2 sites. The Tier-2 analysis capacity was tested using bulk analysis job submissions to backfill normal user activity. In this talk, we report on the different tests performed and present their post-mortem analysis.
Scientific Services on the Cloud
NASA Astrophysics Data System (ADS)
Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong
Scientific Computing was one of the first ever applications of parallel and distributed computation. To this day, scientific applications remain some of the most compute intensive, and have inspired the creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reasons as businesses and other professionals: the hardware is provided, maintained, and administered by a third party; software abstraction and virtualization provide reliability and fault tolerance; graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and are by far the easiest high-performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part in the scientific computing initiative.
AGIS: Evolution of Distributed Computing information system for ATLAS
NASA Astrophysics Data System (ADS)
Anisenkov, A.; Di Girolamo, A.; Alandes, M.; Karavakis, E.
2015-12-01
ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization of computing resources in order to meet the ATLAS requirements of petabyte-scale data operations. It has evolved after the first period of LHC data taking (Run-1) in order to cope with the new challenges of the upcoming Run-2. In this paper we describe the evolution and recent developments of the ATLAS Grid Information System (AGIS), developed in order to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.
NASA Astrophysics Data System (ADS)
Cheok, Adrian David
This chapter details the Human Pacman system to illuminate entertainment computing which ventures to embed the natural physical world seamlessly with a fantasy virtual playground by capitalizing on infrastructure provided by mobile computing, wireless LAN, and ubiquitous computing. With Human Pacman, we have a physical role-playing computer fantasy together with real human-social and mobile gaming that emphasizes collaboration and competition between players in a wide outdoor physical area that allows natural wide-area human-physical movements. Pacmen and Ghosts are now real human players in the real world experiencing mixed computer-graphics fantasy-reality provided by the wearable computers on them. Virtual cookies and actual tangible physical objects are incorporated into the game play to provide novel experiences of seamless transitions between the real and virtual worlds. This is an example of a new form of gaming that anchors on physicality, mobility, social interaction, and ubiquitous computing.
The VISPA internet platform for outreach, education and scientific research in various experiments
NASA Astrophysics Data System (ADS)
van Asseldonk, D.; Erdmann, M.; Fischer, B.; Fischer, R.; Glaser, C.; Heidemann, F.; Müller, G.; Quast, T.; Rieger, M.; Urban, M.; Welling, C.
2015-12-01
VISPA provides a graphical front-end to computing infrastructures, giving its users all the functionality needed for working conditions comparable to a personal computer. It is a framework that can be extended with custom applications to support individual needs, e.g. graphical interfaces for experiment-specific software. By design, VISPA serves as a multipurpose platform for many disciplines and experiments, as demonstrated in the following use-cases: a GUI for the analysis framework OFFLINE of the Pierre Auger collaboration, submission and monitoring of computing jobs, university teaching of hundreds of students, and outreach activity, especially in CERN's open data initiative. Serving heterogeneous user groups and applications has given us substantial experience, which helps us in maturing the system, i.e. improving the robustness and responsiveness and the interplay of the components. Among the lessons learned are the choice of a file system, the implementation of websockets, efficient load balancing, and the fine-tuning of existing technologies like RPC over SSH. We present in detail the improved server setup and report on the performance, the user acceptance and the realized applications of the system.
Consolidation and development roadmap of the EMI middleware
NASA Astrophysics Data System (ADS)
Kónya, B.; Aiftimiei, C.; Cecchi, M.; Field, L.; Fuhrmann, P.; Nilsen, J. K.; White, J.
2012-12-01
Scientific research communities have benefited recently from the increasing availability of computing and data infrastructures with unprecedented capabilities for large scale distributed initiatives. These infrastructures are largely defined and enabled by the middleware they deploy. One of the major issues in the current usage of research infrastructures is the need to use similar but often incompatible middleware solutions. The European Middleware Initiative (EMI) is a collaboration of the major European middleware providers ARC, dCache, gLite and UNICORE. EMI aims to: deliver a consolidated set of middleware components for deployment in EGI, PRACE and other Distributed Computing Infrastructures; extend the interoperability between grids and other computing infrastructures; strengthen the reliability of the services; establish a sustainable model to maintain and evolve the middleware; fulfil the requirements of the user communities. This paper presents the consolidation and development objectives of the EMI software stack covering the last two years. The EMI development roadmap is introduced along the four technical areas of compute, data, security and infrastructure. The compute area plan focuses on consolidation of standards and agreements through a unified interface for job submission and management, a common format for accounting, the wide adoption of GLUE schema version 2.0 and the provision of a common framework for the execution of parallel jobs. The security area is working towards a unified security model and lowering the barriers to Grid usage by allowing users to gain access with their own credentials. The data area is focusing on implementing standards to ensure interoperability with other grids and industry components and to reuse already existing clients in operating systems and open source distributions. One of the highlights of the infrastructure area is the consolidation of the information system services via the creation of a common information backbone.
Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius
2016-01-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418
Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J
2016-09-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis, built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40 TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Minimizing Overhead for Secure Computation and Fully Homomorphic Encryption: Overhead
2015-11-01
many inputs. We also improved our compiler infrastructure to handle very large circuits in a more scalable way. In Jan'13, we employed the AESNI and... Amazon's elastic compute infrastructure, and is running under a Xen hypervisor. Since we do not have direct access to the bare metal, we cannot... creating novel opportunities for compressing authentication overhead. It is especially compelling that existing public key infrastructures can be used
Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A.
2013-01-01
Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/ PMID:24124417
Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A
2013-01-01
Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/
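As an illustration of the cross-dataset event queries such a PostgreSQL-backed store enables, the following sketch finds stimulus events followed by a blink within 500 ms across all datasets. The table and column names are hypothetical, not MOBBED's actual schema.

```python
# Sketch of a co-occurrence query over an event store. The schema
# (an "events" table with dataset_id, event_type, onset_s) is an invented
# stand-in for whatever schema MOBBED actually uses.
import psycopg2

QUERY = """
SELECT a.dataset_id, a.onset_s
FROM events a
JOIN events b
  ON a.dataset_id = b.dataset_id
 AND b.onset_s BETWEEN a.onset_s AND a.onset_s + 0.5
WHERE a.event_type = 'stimulus' AND b.event_type = 'blink';
"""

with psycopg2.connect("dbname=mobbed") as conn, conn.cursor() as cur:
    cur.execute(QUERY)            # stimuli followed by a blink within 500 ms
    for dataset_id, onset in cur.fetchall():
        print(dataset_id, onset)
```

The point of the database approach is exactly this: the query runs across every archived dataset without reading any raw recording from disk.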
NASA Astrophysics Data System (ADS)
Cofino, A. S.; Fernández Quiruelas, V.; Blanco Real, J. C.; García Díez, M.; Fernández, J.
2013-12-01
Nowadays Grid computing is a powerful computational tool ready to be used by the scientific community in different areas (such as biomedicine, astrophysics, climate, etc.). However, the use of these distributed computing infrastructures (DCIs) is not yet common practice in climate research, and only a few teams and applications in this area take advantage of them. The WRF4G project objective is therefore to popularize the use of this technology in the atmospheric sciences area. In order to achieve this objective, one of the most widely used applications has been taken (WRF, a limited-area model, successor of the MM5 model), which has a user community of more than 8000 researchers worldwide. This community develops its research activity in different areas and could benefit from the advantages of Grid resources (case study simulations, regional hind-cast/forecast, sensitivity studies, etc.). The WRF model is used by many groups in the climate research community to carry out downscaling simulations, so this community will also benefit. However, Grid infrastructures have some drawbacks for the execution of applications that make intensive use of CPU and memory over long periods of time. This makes it necessary to develop a specific framework (middleware). This middleware encapsulates the application and provides appropriate services for the monitoring and management of the simulations and the data. Thus, another objective of the WRF4G project is the development of a generic adaptation of WRF to DCIs. It should simplify access to the DCIs for researchers, and also free them from the technical and computational aspects of using these DCIs. Finally, in order to demonstrate the ability of WRF4G to solve actual scientific challenges of interest and relevance to climate science (implying a high computational cost), we show results from different kinds of downscaling experiments, such as ERA-Interim re-analysis, CMIP5 models, or seasonal forecasts. WRF4G is being used to run WRF simulations which are contributing to the CORDEX initiative and other projects like SPECS and EUPORIAS. This work has been partially funded by the European Regional Development Fund (ERDF) and the Spanish National R&D Plan 2008-2011 (CGL2011-28864).
Yokohama, Noriya
2013-07-01
This report describes the design of architectures and performance measurements for a parallel computing environment running Monte Carlo simulations for particle therapy, using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed approximately 28 times faster execution than a single-threaded architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
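The reported speedup rests on the embarrassingly parallel nature of Monte Carlo transport: independent particle batches can run on separate cores and be combined afterwards. A minimal sketch of that pattern, with a toy stand-in for the real particle-therapy code:

```python
# Sketch of the parallelization pattern behind a multi-core Monte Carlo
# speedup: split the calculation into independent, separately seeded batches.
# The "simulate" function is a toy stand-in, not the actual therapy code.
import random
from multiprocessing import Pool

def simulate(args):
    n_particles, seed = args
    rng = random.Random(seed)          # independent stream per batch
    # Toy stand-in: count "hits" instead of transporting real particles.
    return sum(rng.random() < 0.3 for _ in range(n_particles))

if __name__ == "__main__":
    batches = [(100_000, seed) for seed in range(32)]   # 32 independent batches
    with Pool() as pool:                                # one worker per core
        hits = sum(pool.map(simulate, batches))
    print(hits / (32 * 100_000))                        # Monte Carlo estimate
```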
75 FR 70899 - Submission for OMB Review; Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-19
... submit to the Office of Management and Budget (OMB) for clearance the following proposal for collection... Annual Burden Hours: 2,952. Public Computer Center Reports (Quarterly and Annually) Number of Respondents... specific to Infrastructure and Comprehensive Community Infrastructure, Public Computer Center, and...
Dashboard Task Monitor for Managing ATLAS User Analysis on the Grid
NASA Astrophysics Data System (ADS)
Sargsyan, L.; Andreeva, J.; Jha, M.; Karavakis, E.; Kokoszkiewicz, L.; Saiz, P.; Schovancova, J.; Tuckett, D.; Atlas Collaboration
2014-06-01
The organization of the distributed user analysis on the Worldwide LHC Computing Grid (WLCG) infrastructure is one of the most challenging tasks among the computing activities at the Large Hadron Collider. The Experiment Dashboard offers a solution that not only monitors but also manages (kill, resubmit) user tasks and jobs via a web interface. The ATLAS Dashboard Task Monitor provides analysis users with a tool that is independent of the operating system and Grid environment. This contribution describes the functionality of the application and its implementation details, in particular authentication, authorization and audit of the management operations.
INTEGRATION OF PANDA WORKLOAD MANAGEMENT SYSTEM WITH SUPERCOMPUTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
De, K; Jha, S; Maeno, T
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with the Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), the supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms. We will present our current accomplishments in running PanDA WMS at supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facility's infrastructure for High Energy and Nuclear Physics, as well as other data-intensive science applications, such as bioinformatics and astro-particle physics.
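The light-weight MPI wrapper idea can be sketched as follows: every MPI rank launches one single-threaded payload on its own input, so a serial workload fills a multi-core worker node. The sketch uses mpi4py for illustration; the executable and file names are assumptions, not the actual PanDA pilot code.

```python
# Sketch of an MPI wrapper for single-threaded payloads: rank N processes
# input file N. Launch with e.g. "mpirun -np 16 python wrapper.py".
from mpi4py import MPI
import subprocess

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank runs one serial payload on its own slice of the work.
# "./run_simulation" and the file naming scheme are hypothetical.
result = subprocess.run(
    ["./run_simulation", f"--input=events_{rank}.dat", f"--output=out_{rank}.dat"],
    check=False,
)

# Rank 0 collects exit codes so the wrapper can report partial failures.
statuses = comm.gather(result.returncode, root=0)
if rank == 0:
    print("failed ranks:", [i for i, rc in enumerate(statuses) if rc != 0])
```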
Optimisation of Critical Infrastructure Protection: The SiVe Project on Airport Security
NASA Astrophysics Data System (ADS)
Breiing, Marcus; Cole, Mara; D'Avanzo, John; Geiger, Gebhard; Goldner, Sascha; Kuhlmann, Andreas; Lorenz, Claudia; Papproth, Alf; Petzel, Erhard; Schwetje, Oliver
This paper outlines the scientific goals, ongoing work and first results of the SiVe research project on critical infrastructure security. The methodology is generic while pilot studies are chosen from airport security. The outline proceeds in three major steps: (1) building a threat scenario, (2) development of simulation models as scenario refinements, and (3) assessment of alternatives. Advanced techniques of systems analysis and simulation are employed to model relevant airport structures and processes as well as offences. Computer experiments are carried out to compare and optimise alternative solutions. The optimality analyses draw on approaches to quantitative risk assessment recently developed in the operational sciences. To exploit the advantages of the various techniques, an integrated simulation workbench is built up in the project.
New security infrastructure model for distributed computing systems
NASA Astrophysics Data System (ADS)
Dubenskaya, J.; Kryukov, A.; Demichev, A.; Prikhodko, N.
2016-02-01
In this paper we propose a new approach to setting up a user-friendly and yet secure authentication and authorization procedure in a distributed computing system. The security concept of most heterogeneous distributed computing systems is based on a public key infrastructure along with proxy certificates which are used for rights delegation. In practice, a contradiction between the limited lifetime of the proxy certificates and the unpredictable time of request processing is a big issue for the end users of the system. We propose to use hashes with unlimited lifetime, individual for each request, instead of proxy certificates. Our approach avoids the use of proxy certificates altogether, so the security infrastructure of a distributed computing system becomes easier to develop, support and use.
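One plausible realization of the proposed per-request hashes is a keyed HMAC bound to the user identity and the request; the paper's exact construction may differ, and all names below are illustrative.

```python
# Illustrative sketch: instead of a time-limited proxy certificate, the
# server issues a per-request token (a hash with unlimited lifetime) that it
# can verify whenever the request is finally processed. A keyed HMAC is one
# plausible construction; the actual scheme in the paper may differ.
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)   # kept secret on the server side

def issue_token(user_dn: str, request_id: str) -> str:
    """Bind a token to a specific user identity and request."""
    msg = f"{user_dn}|{request_id}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).hexdigest()

def verify_token(user_dn: str, request_id: str, token: str) -> bool:
    """Valid whenever checked: no expiry, unlike a proxy certificate."""
    return hmac.compare_digest(issue_token(user_dn, request_id), token)

t = issue_token("/DC=org/CN=alice", "req-42")
assert verify_token("/DC=org/CN=alice", "req-42", t)
```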
Critical infrastructure protection : significant challenges in developing national capabilities
DOT National Transportation Integrated Search
2001-04-01
To address the concerns about protecting the nation's critical computer-dependent infrastructure, this General Accounting Office (GAO) report describes the progress of the National Infrastructure Protection Center (NIPC) in (1) developing national capabilities...
Distributed Accounting on the Grid
NASA Technical Reports Server (NTRS)
Thigpen, William; Hacker, Thomas J.; McGinnis, Laura F.; Athey, Brian D.
2001-01-01
By the late 1990s, the Internet was adequately equipped to move vast amounts of data between HPC (High Performance Computing) systems, and efforts were initiated to link the national infrastructure of high-performance computational and data storage resources together into a general computational utility 'grid', analogous to the national electrical power grid infrastructure. The purpose of the computational grid is to provide dependable, consistent, pervasive, and inexpensive access to computational resources for the computing community in the form of a computing utility. This paper presents a fully distributed view of Grid usage accounting and a methodology for allocating Grid computational resources for use on a Grid computing system.
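The per-job usage record such a distributed accounting system must collect and aggregate can be sketched as below; the field names are loosely modeled on common grid usage-record attributes, not the paper's actual schema.

```python
# Sketch of usage-record aggregation for grid accounting. The record fields
# are illustrative; real systems track many more attributes (queue, VO,
# memory, normalization factors, ...).
from dataclasses import dataclass

@dataclass
class UsageRecord:
    user: str
    site: str
    wall_time_s: float
    cpu_time_s: float

def charge(records: list[UsageRecord]) -> dict[str, float]:
    """Aggregate CPU time per user across all sites, for allocation tracking."""
    totals: dict[str, float] = {}
    for r in records:
        totals[r.user] = totals.get(r.user, 0.0) + r.cpu_time_s
    return totals

print(charge([UsageRecord("alice", "siteA", 3600, 3400),
              UsageRecord("alice", "siteB", 1800, 1700)]))
# {'alice': 5100.0}
```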
Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics
Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe
2015-01-01
Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831
Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.
Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe
2015-05-01
The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.
Computation Directorate Annual Report 2003
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crawford, D L; McGraw, J R; Ashby, S F
Big computers are icons: symbols of the culture, and of the larger computing infrastructure that exists at Lawrence Livermore. Through the collective effort of Laboratory personnel, they enable scientific discovery and engineering development on an unprecedented scale. For more than three decades, the Computation Directorate has supplied the big computers that enable the science necessary for Laboratory missions and programs. Livermore supercomputing is uniquely mission driven. The high-fidelity weapon simulation capabilities essential to the Stockpile Stewardship Program compel major advances in weapons codes and science, compute power, and computational infrastructure. Computation's activities align with this vital mission of the Department of Energy. Increasingly, non-weapons Laboratory programs also rely on computer simulation. World-class achievements have been accomplished by LLNL specialists working in multi-disciplinary research and development teams. In these teams, Computation personnel employ a wide array of skills, from desktop support expertise, to complex applications development, to advanced research. Computation's skilled professionals make the Directorate the success that it has become. These individuals know the importance of the work they do and the many ways it contributes to Laboratory missions. They make appropriate and timely decisions that move the entire organization forward. They make Computation a leader in helping LLNL achieve its programmatic milestones. I dedicate this inaugural Annual Report to the people of Computation in recognition of their continuing contributions. I am proud that we perform our work securely and safely. Despite increased cyber attacks on our computing infrastructure from the Internet, advanced cyber security practices ensure that our computing environment remains secure. Through Integrated Safety Management (ISM) and diligent oversight, we address safety issues promptly and aggressively. The safety of our employees, whether at work or at home, is a paramount concern. Even as the Directorate meets today's supercomputing requirements, we are preparing for the future. We are investigating open-source cluster technology, the basis of our highly successful Multiprogrammatic Capability Resource (MCR). Several breakthrough discoveries have resulted from MCR calculations coupled with theory and experiment, prompting Laboratory scientists to demand ever-greater capacity and capability. This demand is being met by a new 23-TF system, Thunder, with architecture modeled on MCR. In preparation for the "after-next" computer, we are researching technology even farther out on the horizon--cell-based computers. Assuming that the funding and the technology hold, we will acquire the cell-based machine BlueGene/L within the next 12 months.
Fostering incidental experiences of nature through green infrastructure planning.
Beery, Thomas H; Raymond, Christopher M; Kyttä, Marketta; Olafsson, Anton Stahl; Plieninger, Tobias; Sandberg, Mattias; Stenseke, Marie; Tengö, Maria; Jönsson, K Ingemar
2017-11-01
Concern for a diminished human experience of nature and subsequent decreased human well-being is addressed via a consideration of green infrastructure's potential to facilitate unplanned or incidental nature experience. Incidental nature experience is conceptualized and illustrated in order to consider this seldom addressed aspect of human interaction with nature in green infrastructure planning. Special attention has been paid to the ability of incidental nature experience to redirect attention from a primary activity toward an unplanned focus (in this case, nature phenomena). The value of such experience for human well-being is considered. The role of green infrastructure to provide the opportunity for incidental nature experience may serve as a nudge or guide toward meaningful interaction. These ideas are explored using examples of green infrastructure design in two Nordic municipalities: Kristianstad, Sweden, and Copenhagen, Denmark. The outcome of the case study analysis coupled with the review of literature is a set of sample recommendations for how green infrastructure can be designed to support a range of incidental nature experiences with the potential to support human well-being.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boyd, J.; Herner, K.; Jayatilaka, B.
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have nearly 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 or beyond. To achieve this, we are implementing a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology as well as leveraging resources available from currently-running experiments at Fermilab. Furthermore, these efforts will provide useful lessons in ensuring long-term data access for numerous experiments throughout high-energy physics, and provide a roadmap for high-quality scientific output for years to come.
CMS Centres Worldwide - a New Collaborative Infrastructure
NASA Astrophysics Data System (ADS)
Taylor, Lucas
2011-12-01
The CMS Experiment at the LHC has established a network of more than fifty inter-connected "CMS Centres" at CERN and in institutes in the Americas, Asia, Australasia, and Europe. These facilities are used by people doing CMS detector and computing grid operations, remote shifts, data quality monitoring and analysis, as well as education and outreach. We present the computing, software, and collaborative tools and videoconferencing systems. These include permanently running "telepresence" video links (hardware-based H.323, EVO and Vidyo), Webcasts, and generic Web tools such as CMS-TV for broadcasting live monitoring and outreach information. Being Web-based and experiment-independent, these systems could easily be extended to other organizations. We describe the experiences of using CMS Centres Worldwide in the CMS data-taking operations as well as for major media events with several hundred TV channels, radio stations, and many more press journalists simultaneously around the world.
Data preservation at the Fermilab Tevatron
Amerio, S.; Behari, S.; Boyd, J.; ...
2017-01-22
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have approximately 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 and beyond. To achieve this goal, we have implemented a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology and leverages resources available from currently-running experiments at Fermilab. Lastly, these efforts have also provided useful lessons in ensuring long-term data access for numerous experiments, and enable high-quality scientific output for years to come.
Data preservation at the Fermilab Tevatron
Boyd, J.; Herner, K.; Jayatilaka, B.; ...
2015-12-23
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have nearly 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 or beyond. To achieve this, we are implementing a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology as well as leveraging resources available from currently-running experiments at Fermilab. Furthermore, these efforts will provide useful lessons in ensuring long-term data access for numerous experiments throughout high-energy physics, and provide a roadmap for high-quality scientific output for years to come.
Data preservation at the Fermilab Tevatron
NASA Astrophysics Data System (ADS)
Boyd, J.; Herner, K.; Jayatilaka, B.; Roser, R.; Sakumoto, W.
2015-12-01
The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have nearly 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 or beyond. To achieve this, we are implementing a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology as well as leveraging resources available from currently-running experiments at Fermilab. These efforts will provide useful lessons in ensuring long-term data access for numerous experiments throughout high-energy physics, and provide a roadmap for high-quality scientific output for years to come.
NASA Astrophysics Data System (ADS)
Taraba, M.; Fauland, H.; Turetschek, T.; Stumptner, W.; Kudielka, V.; Scheer, D.; Sattler, B.; Fritz, A.; Stingl, B.; Fuchs, H.; Gubo, B.; Hettrich, S.; Hirtl, A.; Unger, E.; Soucek, A.; Frischauf, N.; Grömer, G.
2014-12-01
The Passepartout sounding balloon transportation system for low-mass (< 1200 g) experiments or hardware validation at altitudes up to 35 km is described. We present the general flight configuration, the set-up of the flight control system, environmental and position sensors, the power system, buoyancy considerations, as well as the ground control infrastructure including recovery operations. In the telemetry and command module, the integrated airborne computer is able to control the experiment, transmit telemetry and environmental data, and allows for duplex communication with a control centre for tele-commanding. The experiment module is mounted below the telemetry and command module and can either work as a standalone system or be controlled by the airborne computer. This separation between experiment and control units allows for high flexibility in the experiment design. After a parachute landing, the on-board satellite-based recovery subsystems allow for rapid tracking and recovery of the telemetry and command module and the experiment. We discuss flight data and lessons learned from two representative flights with research payloads.
Waggle: A Framework for Intelligent Attentive Sensing and Actuation
NASA Astrophysics Data System (ADS)
Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.
2014-12-01
Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both in situ and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and the Waggle cloud-computing infrastructure. The Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports in situ computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.
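The sense-compute-actuate loop that Waggle shortens can be pictured as below; the sensor, actuator, and publishing functions are placeholders for illustration, not the actual Waggle API.

```python
# Conceptual sketch of a field-node loop: in-situ computation makes the
# low-latency decision locally, while data still flows to the cloud tier
# for longitudinal study. All functions are invented placeholders.
import random
import time

def read_sensor() -> float:
    """Placeholder for a real sensor driver (e.g. a temperature probe)."""
    return 20.0 + 25.0 * random.random()

def actuate(command: str) -> None:
    """Placeholder for a real actuator call."""
    print("actuating:", command)

def publish_to_cloud(sample: float) -> None:
    """Placeholder for pushing data to the aggregation tier."""
    print("publishing:", round(sample, 2))

for _ in range(3):                      # a real node would loop indefinitely
    sample = read_sensor()
    if sample > 40.0:                   # local, low-latency decision in situ
        actuate("increase_sampling_rate")
    publish_to_cloud(sample)            # longitudinal record for later study
    time.sleep(1)
```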
Pilots 2.0: DIRAC pilots for all the skies
NASA Astrophysics Data System (ADS)
Stagni, F.; Tsaregorodtsev, A.; McNab, A.; Luzzi, C.
2015-12-01
In the last few years, new types of computing infrastructures, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are opportunistic. Most of these new infrastructures are based on virtualization techniques. Meanwhile, some concepts, such as distributed queues, lost appeal, while still supporting a vast amount of resources. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to hide the diversity of underlying resources has become essential. The DIRAC WMS is based on the concept of pilot jobs that was introduced back in 2004. A pilot is what creates the possibility to run jobs on a worker node. Within DIRAC, we developed a new generation of pilot jobs, that we dubbed Pilots 2.0. Pilots 2.0 are not tied to a specific infrastructure; rather they are generic, fully configurable and extendible pilots. A Pilot 2.0 can be sent, as a script to be run, or it can be fetched from a remote location. A pilot 2.0 can run on every computing resource, e.g.: on CREAM Computing elements, on DIRAC Computing elements, on Virtual Machines as part of the contextualization script, or IAAC resources, provided that these machines are properly configured, hiding all the details of the Worker Nodes (WNs) infrastructure. Pilots 2.0 can be generated server and client side. Pilots 2.0 are the “pilots to fly in all the skies”, aiming at easy use of computing power, in whatever form it is presented. Another aim is the unification and simplification of the monitoring infrastructure for all kinds of computing resources, by using pilots as a network of distributed sensors coordinated by a central resource monitoring system. Pilots 2.0 have been developed using the command pattern. VOs using DIRAC can tune pilots 2.0 as they need, and extend or replace each and every pilot command in an easy way. In this paper we describe how Pilots 2.0 work with distributed and heterogeneous resources providing the necessary abstraction to deal with different kind of computing resources.
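Since the paper states that Pilots 2.0 are built with the command pattern and that VOs can extend or replace every pilot command, a minimal sketch of that structure may help; the class and command names below are illustrative, not DIRAC's actual ones.

```python
# Sketch of the command pattern for a pilot: each step is a command object,
# and a VO can subclass or swap any step without touching the others.
class PilotCommand:
    def execute(self) -> None:
        raise NotImplementedError

class CheckEnvironment(PilotCommand):
    def execute(self) -> None:
        print("checking worker node environment")

class InstallSoftware(PilotCommand):
    def execute(self) -> None:
        print("installing pilot software")

class LaunchJobAgent(PilotCommand):
    def execute(self) -> None:
        print("matching and running payload jobs")

# A pilot is then just an ordered, configurable list of commands, which is
# what makes it independent of the underlying computing resource.
for command in [CheckEnvironment(), InstallSoftware(), LaunchJobAgent()]:
    command.execute()
```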
Networking Technologies Enable Advances in Earth Science
NASA Technical Reports Server (NTRS)
Johnson, Marjory; Freeman, Kenneth; Gilstrap, Raymond; Beck, Richard
2004-01-01
This paper describes an experiment to prototype a new way of conducting science by applying networking and distributed computing technologies to an Earth Science application. A combination of satellite, wireless, and terrestrial networking provided geologists at a remote field site with interactive access to supercomputer facilities at two NASA centers, thus enabling them to validate and calibrate remotely sensed geological data in near-real time. This represents a fundamental shift in the way that Earth scientists analyze remotely sensed data. In this paper we describe the experiment and the network infrastructure that enabled it, analyze the data flow during the experiment, and discuss the scientific impact of the results.
GreenView and GreenLand Applications Development on SEE-GRID Infrastructure
NASA Astrophysics Data System (ADS)
Mihon, Danut; Bacu, Victor; Gorgan, Dorian; Mészáros, Róbert; Gelybó, Györgyi; Stefanut, Teodor
2010-05-01
The GreenView and GreenLand applications [1] have been developed through the SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience) FP7 project co-funded by the European Commission [2]. The development of environment applications is a challenge for Grid technologies and software development methodologies. This presentation exemplifies the development of the GreenView and GreenLand applications over the SEE-GRID infrastructure with the Grid Application Development Methodology [3]. Today's environmental applications are used in various domains of Earth Science such as meteorology, ground and atmospheric pollution, ground metal detection or weather prediction. These applications run on satellite images (e.g. Landsat, MERIS, MODIS, etc.) and the accuracy of the output results depends mostly on the quality of these images. The main drawback of such environmental applications is the need for computing and storage power (some images are almost 1 GB in size) to process such a large data volume. Most applications requiring high computational resources have therefore approached migration onto the Grid infrastructure, which offers the required computing power by running the atomic application components on different Grid nodes in sequential or parallel mode. The middleware used between the Grid infrastructure and client applications is ESIP (Environment Oriented Satellite Image Processing Platform), which is based on the gProcess platform [4]. In its current form, gProcess is used for launching new processes on the Grid nodes, but also for monitoring the execution status of these processes. This presentation highlights two case studies of Grid-based environmental applications, GreenView and GreenLand [5]. GreenView is used in correlation with MODIS (Moderate Resolution Imaging Spectroradiometer) satellite images and meteorological datasets, in order to produce pseudo-colored temperature and vegetation maps for different geographical CEE (Central Eastern Europe) regions. On the other hand, GreenLand is used for generating maps of different vegetation indexes (e.g. NDVI, EVI, SAVI, GEMI) based on Landsat satellite images. Both applications use interpolation and random value generation algorithms, but also specific formulas for computing vegetation index values. The GreenView and GreenLand applications have been tested over the SEE-GRID infrastructure and the performance evaluation is reported in [6]. The improvement of the execution time (obtained through a better parallelization of jobs), the extension of the geographical areas to other parts of the Earth, and new user interaction techniques on spatial data and large sets of satellite images are the goals of future work. References [1] GreenView application on Wiki, http://wiki.egee-see.org/index.php/GreenView [2] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/ [3] Gorgan D., Stefanut T., Bâcu V., Mihon D., Grid based Environment Application Development Methodology, SCICOM, 7th International Conference on "Large-Scale Scientific Computations", 4-8 June 2009, Sozopol, Bulgaria (to be published by Springer), (2009). [4] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September 2009, Cosenza, Italy, IEEE Computer Press, 247-252 (2009).
[5] Mihon D., Bacu V., Stefanut T., Gorgan D., "Grid Based Environment Application Development - GreenView Application". ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27 Aug 2009, Cluj-Napoca. IEEE Computer Press, pp. 275-282 (2009). [6] Mihon D., Bacu V., Gorgan D., Mészáros R., Gelybó G., Stefanut T., Practical Considerations on the GreenView Application Development and Execution over SEE-GRID. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 167-175 (2009).
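Of the vegetation indexes named above, NDVI has the simplest closed form: the normalized difference of near-infrared and red reflectance. A minimal NumPy sketch follows; the small epsilon guarding against division by zero is our addition, not part of the standard definition.

```python
# NDVI = (NIR - Red) / (NIR + Red), one of the vegetation indexes the
# GreenLand application computes from Landsat bands. Values lie in [-1, 1];
# dense vegetation approaches 1.
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index, computed pixel-wise."""
    return (nir - red) / (nir + red + 1e-12)   # epsilon avoids 0/0

nir = np.array([[0.5, 0.6], [0.4, 0.7]])       # toy reflectance bands
red = np.array([[0.1, 0.2], [0.3, 0.1]])
print(ndvi(nir, red))
```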
Biswas, Amitava; Liu, Chen; Monga, Inder; ...
2016-01-01
For the last few years there has been tremendous growth in data traffic due to the high adoption rate of mobile devices and cloud computing. The Internet of Things (IoT) will stimulate even further growth. This is increasing the scale and complexity of telecom/internet service provider (SP) and enterprise data centre (DC) compute and network infrastructures. As a result, managing these large network-compute converged infrastructures is becoming complex and cumbersome. To cope, network and DC operators are trying to automate network and system operations, administration and management (OAM) functions. OAM includes all non-functional mechanisms which keep the network running.
Cloud computing can simplify HIT infrastructure management.
Glaser, John
2011-08-01
Software as a Service (SaaS), built on cloud computing technology, is emerging as the forerunner in IT infrastructure because it helps healthcare providers reduce capital investments. Cloud computing leads to predictable, monthly, fixed operating expenses for hospital IT staff. Outsourced cloud computing facilities are state-of-the-art data centers boasting some of the most sophisticated networking equipment on the market. The SaaS model helps hospitals safeguard against technology obsolescence, minimizes maintenance requirements, and simplifies management.
NASA Astrophysics Data System (ADS)
Kazarov, A.; Lehmann Miotto, G.; Magnoni, L.
2012-06-01
The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at CERN is the infrastructure responsible for collecting and transferring ATLAS experimental data from detectors to the mass storage system. It relies on a large, distributed computing environment, including thousands of computing nodes with thousands of applications running concurrently. In such a complex environment, information analysis is fundamental for controlling application behavior, error reporting and operational monitoring. During data-taking runs, streams of messages sent by applications via the message reporting system, together with data published from applications via information services, are the main sources of knowledge about the correctness of running operations. The flow of data produced (with an average rate of O(1-10 kHz)) is constantly monitored by experts to detect problems or misbehavior. This requires strong competence and experience in understanding and discovering problems and root causes, and often the meaningful information is not in a single message or update, but in the aggregated behavior over a certain time-line. The AAL project aims at reducing the manpower needs and at assuring a constantly high quality of problem detection by automating most of the monitoring tasks and providing real-time correlation of data-taking and system metrics. This project combines technologies coming from different disciplines; in particular it leverages an Event Driven Architecture to unify the flow of data from the ATLAS infrastructure, a Complex Event Processing (CEP) engine for correlation of events, and a message-oriented architecture for component integration. The project is composed of two main components: a core processing engine, responsible for correlation of events through expert-defined queries, and a web-based front-end to present real-time information and interact with the system. All components work in a loosely coupled, event-based architecture, with a message broker to centralize all communication between modules. The result is an intelligent system able to extract and compute relevant information from the flow of operational data to provide real-time feedback to human experts, who can promptly react when needed. The paper presents the design and implementation of the AAL project, together with the results of its usage as an automated monitoring assistant for the ATLAS data taking infrastructure.
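The CEP-style correlation can be illustrated with a toy sliding-window rule: the meaningful signal is in the aggregated behavior over time, not in any single message. The rule below ("alert on three or more ERRORs from one host within 10 s") is an invented example of an expert-defined query, not one of AAL's actual queries.

```python
# Toy complex-event-processing rule over a stream of operational events:
# correlate messages per host in a sliding time window instead of reacting
# to single messages. Rule and event format are invented for illustration.
from collections import defaultdict, deque

WINDOW_S, THRESHOLD = 10.0, 3
recent: dict[str, deque] = defaultdict(deque)   # host -> recent ERROR times

def on_event(t: float, host: str, severity: str) -> None:
    if severity != "ERROR":
        return
    q = recent[host]
    q.append(t)
    while q and t - q[0] > WINDOW_S:            # drop events outside the window
        q.popleft()
    if len(q) >= THRESHOLD:
        print(f"ALERT: {len(q)} errors from {host} in the last {WINDOW_S}s")

for t, host, sev in [(0, "node1", "ERROR"), (2, "node1", "ERROR"),
                     (4, "node1", "ERROR"), (20, "node2", "ERROR")]:
    on_event(t, host, sev)
```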
Towards usable and interdisciplinary e-infrastructure (Invited)
NASA Astrophysics Data System (ADS)
de Roure, D.
2010-12-01
e-Science and cyberinfrastructure at their outset tended to focus on ‘big science’ and cross-organisational infrastructures, demonstrating complex engineering with the promise of high returns. It soon became evident that the key to researchers harnessing new technology for everyday use is a user-centric approach which empowers the user, both from a developer and an end-user viewpoint. For example, this philosophy is demonstrated in workflow systems for systematic data processing and in the Web 2.0 approach as exemplified by the myExperiment social web site for sharing workflows, methods and ‘research objects’. Hence the most disruptive aspect of Cloud and virtualisation is perhaps that they make new computational resources and applications usable, creating a flourishing ecosystem for routine processing and innovation alike, and in this we must consider software sustainability. This talk will discuss the changing nature of the e-Science digital ecosystem, focus on e-infrastructure for cross-disciplinary work, and highlight issues in sustainable software development in this context.
Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John
2010-07-14
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques that researchers often require. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.
Collaborative Working Architecture for IoT-Based Applications.
Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus
2018-05-23
The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications; however, there is still much research to be done to properly gear all the systems to work together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.
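The placement decision such a multi-level architecture has to make per task can be illustrated with a toy cost model; all constants and names below are invented for illustration and are not taken from the paper:

```python
# Toy offload policy: compute locally on the IoT device when that is
# faster, otherwise delegate to an upper (cloud) layer. Costs are
# crude estimates: CPU cycles / clock rate plus network transfer.
def place_task(cycles, payload_bytes, cpu_hz=50e6, uplink_bps=1e6,
               cloud_hz=2e9, rtt_s=0.05):
    local_s = cycles / cpu_hz
    remote_s = rtt_s + payload_bytes * 8 / uplink_bps + cycles / cloud_hz
    return "local" if local_s <= remote_s else "offload"

print(place_task(cycles=1e6, payload_bytes=2_000))   # light task -> local
print(place_task(cycles=5e9, payload_bytes=50_000))  # heavy task -> offload
```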
NASA Astrophysics Data System (ADS)
The CHAIN-REDS Project is organising a workshop on "e-Infrastructures for e-Sciences", focusing on Cloud Computing and Data Repositories, under the aegis of the European Commission and co-located with the International Conference on e-Science 2013 (IEEE2013) to be held in Beijing, P.R. of China on October 17-22, 2013. The core objective of the CHAIN-REDS project is to promote, coordinate and support the effort of a critical mass of non-European e-Infrastructures for Research and Education to collaborate with Europe in addressing the interoperability and interoperation of Grids and other Distributed Computing Infrastructures (DCI). From this perspective, CHAIN-REDS will optimise the interoperation of European infrastructures with those present in six other regions of the world, from both a development and a use point of view, catering to different communities. Overall, CHAIN-REDS will provide input for future strategies and decision-making regarding collaboration with other regions on e-Infrastructure deployment and availability of related data; it will raise the visibility of e-Infrastructures towards intercontinental audiences covering most of the world, and will provide support to establish globally connected and interoperable infrastructures, in particular between the EU and the developing regions. Organised by IHEP, INFN and Sigma Orionis with the support of all project partners, this workshop aims at: - Presenting the state of the art of Cloud computing in Europe and in China and discussing the opportunities offered by having interoperable and federated e-Infrastructures; - Exploring the existing initiatives of Data Infrastructures in Europe and China, and highlighting the Data Repositories of interest for the Virtual Research Communities in several domains such as Health, Agriculture, Climate, etc.
NASA Astrophysics Data System (ADS)
Papa, Mauricio; Shenoi, Sujeet
The information infrastructure -- comprising computers, embedded devices, networks and software systems -- is vital to day-to-day operations in every sector: information and telecommunications, banking and finance, energy, chemicals and hazardous materials, agriculture, food, water, public health, emergency services, transportation, postal and shipping, government and defense. Global business and industry, governments, indeed society itself, cannot function effectively if major components of the critical information infrastructure are degraded, disabled or destroyed. Critical Infrastructure Protection II describes original research results and innovative applications in the interdisciplinary field of critical infrastructure protection. Also, it highlights the importance of weaving science, technology and policy in crafting sophisticated, yet practical, solutions that will help secure information, computer and network assets in the various critical infrastructure sectors. Areas of coverage include: - Themes and Issues - Infrastructure Security - Control Systems Security - Security Strategies - Infrastructure Interdependencies - Infrastructure Modeling and Simulation This book is the second volume in the annual series produced by the International Federation for Information Processing (IFIP) Working Group 11.10 on Critical Infrastructure Protection, an international community of scientists, engineers, practitioners and policy makers dedicated to advancing research, development and implementation efforts focused on infrastructure protection. The book contains a selection of twenty edited papers from the Second Annual IFIP WG 11.10 International Conference on Critical Infrastructure Protection held at George Mason University, Arlington, Virginia, USA in the spring of 2008.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCaskey, Alexander J.
Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately, the software infrastructure required to enable this is largely lacking. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-generation computing hardware architectures such as quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.
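The host-accelerator split that such a framework provides can be sketched generically. The real XACC framework is C++ and its API differs; the Python below uses invented names purely to illustrate the pluggable kernel-offload pattern:

```python
class Accelerator:
    """Interface each hardware plugin (quantum, neuromorphic, ...)
    would implement."""
    def execute(self, kernel, *args):
        raise NotImplementedError

class ClassicalSimulator(Accelerator):
    """Stand-in back end that just runs the kernel on the local CPU."""
    def execute(self, kernel, *args):
        return kernel(*args)

REGISTRY = {"sim": ClassicalSimulator()}   # plugins register here

def offload(kernel, backend="sim"):
    """Decorator routing kernel calls to the selected accelerator."""
    accelerator = REGISTRY[backend]
    def wrapper(*args):
        return accelerator.execute(kernel, *args)
    return wrapper

@offload
def ansatz(theta):
    # The classically intractable work would live in this kernel.
    return theta * theta

print(ansatz(0.5))  # dispatched through the accelerator plugin -> 0.25
```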
Galaxy CloudMan: delivering cloud compute clusters.
Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James
2010-12-21
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The knowledge and effort required to deploy a compute cluster in the Amazon EC2 cloud are not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
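For readers unfamiliar with programmatic cluster provisioning, the sketch below launches a small analysis cluster on EC2 with the boto3 library. The AMI id, key pair and instance type are placeholders; CloudMan itself automates much more (shared storage, a job scheduler, a web console):

```python
import boto3

# Assumes AWS credentials are configured; all identifiers below are
# placeholders to adapt to your own account and region.
ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # e.g. an image with analysis tools
    MinCount=4, MaxCount=4,            # a four-node cluster
    InstanceType="t3.large",
    KeyName="my-keypair",
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "role", "Value": "analysis-cluster"}],
    }],
)
for node in instances:
    node.wait_until_running()
    node.reload()                      # refresh to pick up the DNS name
    print(node.id, node.public_dns_name)
```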
Cyber-workstation for computational neuroscience.
Digiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C; Fortes, Jose; Sanchez, Justin C
2010-01-01
A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.
NASA Astrophysics Data System (ADS)
Varela Rodriguez, F.
2011-12-01
The control system of each of the four major experiments at the CERN Large Hadron Collider (LHC) is distributed over up to 160 computers running either Linux or Microsoft Windows. A quick response to abnormal situations of the computer infrastructure is crucial to maximize the physics usage. For this reason, a tool was developed to supervise, identify errors in and troubleshoot such a large system. Although monitoring of the performance of the Linux computers and their processes has been available since the first versions of the tool, only recently has the software package been extended to provide similar functionality for the nodes running Microsoft Windows, as this platform is the most commonly used in the LHC detector control systems. In this paper, the architecture and the functionality of the Windows Management Instrumentation (WMI) client developed to provide centralized monitoring of the nodes running different flavours of the Microsoft platform, as well as the interface to the SCADA software of the control systems, are presented. The tool is currently being commissioned by the experiments and has already proven to be very efficient at optimizing the running systems and detecting misbehaving processes or nodes.
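Centralized polling of Windows nodes of the kind described can be sketched with the third-party Python wmi package (Windows-only, built on pywin32). Host names, credentials and the threshold below are placeholders, and this is not the tool's actual implementation:

```python
import wmi  # pip install wmi; wraps Windows Management Instrumentation

NODES = ["pc-det-001", "pc-det-002"]   # placeholder control-system hosts

for host in NODES:
    conn = wmi.WMI(computer=host, user="monitor", password="***")
    # Overall CPU load per processor object.
    for cpu in conn.Win32_Processor():
        print(host, "CPU load %:", cpu.LoadPercentage)
    # Flag processes with a large working set so operators can react.
    for proc in conn.Win32_Process():
        if proc.WorkingSetSize and int(proc.WorkingSetSize) > 2e9:
            print(host, "large process:", proc.Name, proc.ProcessId)
```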
Software Reuse Methods to Improve Technological Infrastructure for e-Science
NASA Technical Reports Server (NTRS)
Marshall, James J.; Downs, Robert R.; Mattmann, Chris A.
2011-01-01
Social computing has the potential to contribute to scientific research. Ongoing developments in information and communications technology improve capabilities for enabling scientific research, including research fostered by social computing capabilities. The recent emergence of e-Science practices has demonstrated the benefits from improvements in the technological infrastructure, or cyber-infrastructure, that has been developed to support science. Cloud computing is one example of this e-Science trend. Our own work in the area of software reuse offers methods that can be used to improve new technological development, including cloud computing capabilities, to support scientific research practices. In this paper, we focus on software reuse and its potential to contribute to the development and evaluation of information systems and related services designed to support new capabilities for conducting scientific research.
Making the most of cloud storage - a toolkit for exploitation by WLCG experiments
NASA Astrophysics Data System (ADS)
Alvarez Ayllon, Alejandro; Arsuaga Rios, Maria; Bitzes, Georgios; Furano, Fabrizio; Keeble, Oliver; Manzi, Andrea
2017-10-01
Understanding how cloud storage can be effectively used, either standalone or in support of its associated compute, is now an important consideration for WLCG. We report on a suite of extensions to familiar tools targeted at enabling the integration of cloud object stores into traditional grid infrastructures and workflows. Notable updates include support for a number of object store flavours in FTS3, Davix and gfal2, including mitigations for lack of vector reads; the extension of Dynafed to operate as a bridge between grid and cloud domains; protocol translation in FTS3; the implementation of extensions to DPM (also implemented by the dCache project) to allow 3rd party transfers over HTTP. The result is a toolkit which facilitates data movement and access between grid and cloud infrastructures, broadening the range of workflows suitable for cloud. We report on deployment scenarios and prototype experience, explaining how, for example, an Amazon S3 or Azure allocation can be exploited by grid workflows.
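As a flavour of what the toolkit enables, the sketch below uses the gfal2 Python bindings to copy a file from a grid storage element to an S3-style object store through one uniform API; both endpoint URLs are placeholders:

```python
import gfal2  # Python bindings of the gfal2 library mentioned above

ctx = gfal2.creat_context()

params = ctx.transfer_parameters()
params.overwrite = True
params.timeout = 300          # seconds

src = "davs://grid-se.example.org/dpm/vo/data/file.root"
dst = "s3://bucket.s3.example.com/file.root"

# The same call works across protocols; gfal2 picks the right plugin.
ctx.filecopy(params, src, dst)
print(ctx.stat(dst).st_size, "bytes now in the object store")
```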
RAPPORT: running scientific high-performance computing applications on the cloud.
Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt
2013-01-28
Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.
CERN data services for LHC computing
NASA Astrophysics Data System (ADS)
Espinal, X.; Bocchi, E.; Chan, B.; Fiorot, A.; Iven, J.; Lo Presti, G.; Lopez, J.; Gonzalez, H.; Lamanna, M.; Mascetti, L.; Moscicki, J.; Pace, A.; Peters, A.; Ponce, S.; Rousseau, H.; van der Ster, D.
2017-10-01
Dependability, resilience, adaptability and efficiency: growing requirements call for tailored storage services and novel solutions. Unprecedented volumes of data coming from the broad number of experiments at CERN need to be quickly available in a highly scalable way for large-scale processing and data distribution, while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14 GB/s during the 2015 LHC Pb-Pb run and 11 PB in July 2016) and with concurrent, complex production workloads. In parallel, our systems provide the platform for continuous user- and experiment-driven workloads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR for large-scale storage; CERNBox for end-user access and sharing; Ceph as the data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; and AFS for legacy distributed-file-system services. In this paper we summarise the experience in supporting LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment with pluggable protocols, tuneable QoS, sharing capabilities and fine-grained ACL management, while continuing to guarantee dependable and robust services.
International Symposium on Grids and Clouds (ISGC) 2016
NASA Astrophysics Data System (ADS)
The International Symposium on Grids and Clouds (ISGC) 2016 will be held at Academia Sinica in Taipei, Taiwan from 13-18 March 2016, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). The theme of ISGC 2016 is “Ubiquitous e-infrastructures and Applications”. Contemporary research is impossible without a strong IT component: researchers rely on the existence of stable and widely available e-infrastructures and their higher-level functions and properties. As a result of these expectations, e-infrastructures are becoming ubiquitous, providing an environment that supports large-scale collaborations that deal with global challenges as well as smaller and temporary research communities focusing on particular scientific problems. To support those diversified communities and their needs, the e-infrastructures themselves are becoming more layered and multifaceted, supporting larger groups of applications. Following on from last year's conference, ISGC 2016 continues its aim to bring together users and application developers with those responsible for the development and operation of multi-purpose ubiquitous e-infrastructures. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities, Arts, and Social Sciences (HASS) Applications, Virtual Research Environment (including Middleware, tools, services, workflow, etc.), Data Management, Big Data, Networking & Security, Infrastructure & Operations, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC), etc.
PanDA: Exascale Federation of Resources for the ATLAS Experiment at the LHC
NASA Astrophysics Data System (ADS)
Barreiro Megino, Fernando; Caballero Bejar, Jose; De, Kaushik; Hover, John; Klimentov, Alexei; Maeno, Tadashi; Nilsson, Paul; Oleynik, Danila; Padolski, Siarhei; Panitkin, Sergey; Petrosyan, Artem; Wenaus, Torre
2016-02-01
After a scheduled maintenance and upgrade period, the world's largest and most powerful machine - the Large Hadron Collider (LHC) - is about to enter its second run at unprecedented energies. In order to exploit the scientific potential of the machine, the experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousands of physics users and compared to simulated data. Given diverse funding constraints, the computational resources for the LHC have been deployed in a worldwide mesh of data centres, connected to each other through Grid technologies. The PanDA (Production and Distributed Analysis) system was developed in 2005 for the ATLAS experiment on top of this heterogeneous infrastructure to seamlessly integrate the computational resources and give users the feel of a unified system. Since its origins, PanDA has evolved together with upcoming computing paradigms in and outside HEP, such as changes in the networking model, Cloud Computing and HPC. It currently runs steadily on up to 200 thousand simultaneous cores (limited by the resources available to ATLAS), handles up to two million aggregated jobs per day and processes over an exabyte of data per year. The success of PanDA in ATLAS is triggering widespread adoption and testing by other experiments. In this contribution we give an overview of the PanDA components and focus on the new features and upcoming challenges that are relevant to the next decade of distributed computing workload management using PanDA.
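The pilot model underlying PanDA can be illustrated with a toy example: a central queue holds the work, and lightweight pilots running on heterogeneous resources pull jobs whenever they have a free slot. This is a conceptual sketch only, not the PanDA protocol:

```python
import queue
import threading
import time

central_queue = queue.Queue()
for i in range(6):
    central_queue.put(f"task-{i}")

def pilot(name):
    """Pull-based worker: fetch jobs until the queue is drained."""
    while True:
        try:
            job = central_queue.get_nowait()
        except queue.Empty:
            return                    # no work left: the pilot retires
        time.sleep(0.1)               # stand-in for the real payload
        print(f"{name} finished {job}")
        central_queue.task_done()

pilots = [threading.Thread(target=pilot, args=(f"pilot-{n}",)) for n in range(3)]
for p in pilots:
    p.start()
for p in pilots:
    p.join()
```

Because resources only ever pull, the model absorbs heterogeneous back ends (grid sites, clouds, HPC) behind one central queue.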
S3DB core: a framework for RDF generation and management in bioinformatics infrastructures
2010-01-01
Background: Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well-defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way of the personalization of medicine. Results: A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model built with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. Conclusions: The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large-scale multi-investigator research, clinical trials, and molecular epidemiology. PMID:20646315
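As a flavour of RDF generation for biomedical records, a minimal sketch with the rdflib library follows; the vocabulary is invented for illustration and is not the actual S3DB schema:

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical vocabulary for a tissue-sample record.
EX = Namespace("http://example.org/study/")

g = Graph()
g.bind("ex", EX)

sample = EX["sample42"]
g.add((sample, RDF.type, EX.TissueSample))
g.add((sample, EX.collectedBy, EX.lab7))
g.add((sample, EX.diagnosis, Literal("adenocarcinoma")))

# Triples serialize to any RDF syntax; Turtle is the most readable.
print(g.serialize(format="turtle"))
```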
The Czech National Grid Infrastructure
NASA Astrophysics Data System (ADS)
Chudoba, J.; Křenková, I.; Mulač, M.; Ruda, M.; Sitera, J.
2017-10-01
The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET, as the Czech National Research and Education Network (NREN), provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage resources owned by different organizations are connected by a sufficiently fast network to provide transparent access to all resources. We describe the computing infrastructure in more detail; it is based on several different technologies and covers grid, cloud and map-reduce environments. While the largest share of CPUs is still accessible via distributed Torque servers, providing an environment for long batch jobs, part of the infrastructure is available via standard EGI tools, a subset of NGI resources is provided into the EGI FedCloud environment with a cloud interface, and there is also a Hadoop cluster provided by the same e-infrastructure. A broad spectrum of computing servers is offered; users can choose from standard 2-CPU servers to large SMP machines with up to 6 TB of RAM or servers with GPU cards. Different groups have different priorities on various resources, and resource owners can even have exclusive access. Software is distributed via AFS. Storage servers offering up to tens of terabytes of disk space to individual users are connected via NFS4 on top of GPFS, and access to long-term HSM storage with petabyte capacity is also provided. An overview of available resources and recent usage statistics will be given.
Web-Based Integrated Research Environment for Aerodynamic Analyses and Design
NASA Astrophysics Data System (ADS)
Ahn, Jae Wan; Kim, Jin-Ho; Kim, Chongam; Cho, Jung-Hyun; Hur, Cinyoung; Kim, Yoonhee; Kang, Sang-Hyun; Kim, Byungsoo; Moon, Jong Bae; Cho, Kum Won
e-AIRS[1,2], an abbreviation of ‘e-Science Aerospace Integrated Research System,' is a virtual organization designed to support aerodynamic flow analyses in aerospace engineering using the e-Science environment. As the first step toward a virtual aerospace engineering organization, e-AIRS intends to give a full support of aerodynamic research process. Currently, e-AIRS can handle both the computational and experimental aerodynamic research on the e-Science infrastructure. In detail, users can conduct a full CFD (Computational Fluid Dynamics) research process, request wind tunnel experiment, perform comparative analysis between computational prediction and experimental measurement, and finally, collaborate with other researchers using the web portal. The present paper describes those services and the internal architecture of the e-AIRS system.
NASA Astrophysics Data System (ADS)
Yang, Wei; Hall, Trevor
2012-12-01
The Internet is entering an era of cloud computing to provide more cost-effective, eco-friendly and reliable services to consumer and business users, and the nature of Internet traffic will undergo a fundamental transformation. Consequently, the current Internet will no longer suffice for serving cloud traffic in metro areas. This work proposes an infrastructure with a unified control plane that integrates simple packet aggregation technology with optical express through the interoperation between IP routers and electrical traffic controllers in optical metro networks. The proposed infrastructure provides flexible, intelligent, and eco-friendly bandwidth on demand for cloud computing in metro areas.
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Harrison, Matt; Bailo, Daniele
2016-04-01
The EPOS-PP project (2010-2014) proposed an architecture and demonstrated feasibility with a prototype. Requirements based on use cases were collected and an inventory of assets (e.g. datasets, software, users, computing resources, equipment/detectors, laboratory services) (RIDE) was developed. The architecture evolved through three stages of refinement with much consultation, both with the EPOS community representing EPOS users and participants in geoscience and with the overall ICT community, especially those working on research initiatives such as the RDA (Research Data Alliance). The architecture consists of a central ICS (Integrated Core Services) comprising a portal and a catalog, the latter providing to end-users a 'map' of all EPOS resources (datasets, software, users, computing, equipment/detectors etc.). ICS is extended to ICS-d (distributed ICS) for certain services (such as visualisation software services or Cloud computing resources) and CES (Computational Earth Science) for specific simulation or analytical processing. ICS also communicates with TCS (Thematic Core Services), which represent European-wide portals to national and local assets, resources and services in the various specific domains (e.g. seismology, volcanology, geodesy) of EPOS. The EPOS-IP project (2015-2019) started in October 2015. Two work-packages cover the ICT aspects: WP6 involves interaction with the TCS, while WP7 concentrates on ICS, including interoperation with ICS-d and CES offerings; in short, the ICT architecture. Based on the experience and results of EPOS-PP, the ICT team held a pre-meeting in July 2015 and set out a project plan. The first major activity involved (re-)collecting requirements with use cases and updating the inventory of assets held by the various TCS in EPOS. The RIDE database of assets is currently being converted to CERIF (Common European Research Information Format, an EU Recommendation to Member States) to provide the basis for the EPOS-IP ICS catalog. In parallel, the ICT team is tracking developments in ICT for relevance to EPOS-IP. In particular, the potential utilisation of e-Infrastructures (e-Is) such as GEANT (network), AARC (security), EGI (Grid computing), EUDAT (data curation), PRACE (High Performance Computing) and HELIX-Nebula / Open Science Cloud (Cloud computing) is being assessed. Similarly, relationships with other e-Research Infrastructures (e-RIs) such as ENVRI+, EXCELERATE and other ESFRI (European Strategic Forum for Research Infrastructures) projects are being developed to share experience and technology and to promote interoperability. EPOS ICT team members are also involved in VRE4EIC, a project developing a reference architecture and component software services for a Virtual Research Environment to be superimposed on EPOS-ICS. The challenge now being tackled is therefore to keep consistency and interoperability among the different modules, initiatives and actors which participate in the process of running the EPOS platform. This implies both continuously tracking the IT aspects of the initiatives mentioned and refining the e-architecture designed so far. One major aspect of EPOS-IP is the ICT support for the legal, financial and governance aspects of the EPOS ERIC to be initiated during EPOS-IP. This implies a sophisticated AAAI (Authentication, Authorization, Accounting Infrastructure) with consistency throughout the software, communications and data stack.
Towards Portable Large-Scale Image Processing with High-Performance Computing.
Huo, Yuankai; Blaber, Justin; Damon, Stephen M; Boyd, Brian D; Bao, Shunxing; Parvathaneni, Prasanna; Noguera, Camilo Bermudez; Chaganti, Shikha; Nath, Vishwesh; Greer, Jasmine M; Lyu, Ilwoo; French, William R; Newton, Allen T; Rogers, Baxter P; Landman, Bennett A
2018-05-03
High-throughput, large-scale medical image computing demands tight integration of high-performance computing (HPC) infrastructure for data storage, job distribution, and image processing. The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has constructed a large-scale image storage and processing infrastructure that is composed of (1) a large-scale image database using the eXtensible Neuroimaging Archive Toolkit (XNAT), (2) a content-aware job scheduling platform using the Distributed Automation for XNAT pipeline automation tool (DAX), and (3) a wide variety of encapsulated image processing pipelines called "spiders." The VUIIS CCI medical image data storage and processing infrastructure has housed and processed nearly half a million medical image volumes with the Vanderbilt Advanced Computing Center for Research and Education (ACCRE), the HPC facility at Vanderbilt University. The initial deployment was native (i.e., direct installations on a bare-metal server) within the ACCRE hardware and software environments, which led to issues of portability and sustainability. First, it could be laborious to deploy the entire VUIIS CCI medical image data storage and processing infrastructure to another HPC center with differing hardware infrastructure, library availability, and software permission policies. Second, the spiders were not developed in an isolated manner, which has led to software dependency issues during system upgrades or remote software installations. To address such issues, herein we describe recent innovations using containerization techniques with XNAT/DAX, which are used to isolate the VUIIS CCI medical image data storage and processing infrastructure from the underlying hardware and software environments. The newly presented XNAT/DAX solution has the following new features: (1) multi-level portability, from the system level to the application level, (2) flexible and dynamic software development and expansion, and (3) scalable spider deployment compatible with HPC clusters and local workstations.
The JASMIN Cloud: specialised and hybrid to meet the needs of the Environmental Sciences Community
NASA Astrophysics Data System (ADS)
Kershaw, Philip; Lawrence, Bryan; Churchill, Jonathan; Pritchard, Matt
2014-05-01
Cloud computing provides enormous opportunities for the research community. The large public cloud providers offer near-limitless scaling capability. However, adapting cloud to scientific workloads is not without its problems. The commodity nature of public cloud infrastructure can be at odds with the specialist requirements of the research community. Issues such as trust, ownership of data, WAN bandwidth and costing models create additional barriers to more widespread adoption. Alongside the application of public cloud for scientific applications, a number of private cloud initiatives are underway in the research community, of which the JASMIN Cloud is one example. Here, cloud service models are being effectively superimposed over more established services such as data centres, compute cluster facilities and Grids. These have the potential to deliver the specialist infrastructure needed for the science community coupled with the benefits of a cloud service model. The JASMIN facility based at the Rutherford Appleton Laboratory was established in 2012 to support the data analysis requirements of the climate and Earth Observation community. In its first year of operation, the 5 PB of available storage capacity was filled and the hosted compute capability used extensively. JASMIN has modelled the concept of a centralised large-volume data analysis facility. Key characteristics have enabled success: peta-scale fast disk connected via low-latency networks to compute resources, and the use of virtualisation for effective management of the resources for a range of users. A second phase is now underway, funded through NERC's (Natural Environment Research Council) Big Data initiative. This will see significant expansion of the available resources, with a doubling of disk-based storage to 12 PB and an increase of compute capacity by a factor of ten to over 3000 processing cores. This expansion is accompanied by a broadening in the scope of JASMIN, as a service available to the entire UK environmental science community. Experience with the first phase demonstrated the range of user needs. A trade-off is needed between access privileges to resources, flexibility of use and security. This has influenced the form and types of service under development for the new phase. JASMIN will deploy a specialised private cloud organised into "Managed" and "Unmanaged" components. In the Managed Cloud, users have direct access to the storage and compute resources for optimal performance but, for reasons of security, via a more restrictive PaaS (Platform-as-a-Service) interface. The Unmanaged Cloud is deployed in an isolated part of the network but co-located with the rest of the infrastructure. This enables greater liberty for tenants - full IaaS (Infrastructure-as-a-Service) capability to provision customised infrastructure - whilst at the same time protecting more sensitive parts of the system from direct access using these elevated privileges. The private cloud will be augmented with cloud-bursting capability so that it can exploit the resources available from public clouds, making it effectively a hybrid solution. A single interface will overlay the functionality of both the private cloud and external interfaces to public cloud providers, giving users the flexibility to migrate resources between infrastructures as requirements dictate.
Conditions for Ubiquitous Computing: What Can Be Learned from a Longitudinal Study
ERIC Educational Resources Information Center
Lei, Jing
2010-01-01
Based on survey data and interview data collected over four academic years, this longitudinal study examined how a ubiquitous computing project evolved along with the changes in teachers, students, the human infrastructure, and technology infrastructure in the school. This study also investigated what conditions were necessary for successful…
Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem
NASA Astrophysics Data System (ADS)
Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.
2015-12-01
Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Subsequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.
Kapur, Tina; Pieper, Steve; Fedorov, Andriy; Fillion-Robin, J-C; Halle, Michael; O'Donnell, Lauren; Lasso, Andras; Ungi, Tamas; Pinter, Csaba; Finet, Julien; Pujol, Sonia; Jagadeesan, Jayender; Tokuda, Junichi; Norton, Isaiah; Estepar, Raul San Jose; Gering, David; Aerts, Hugo J W L; Jakab, Marianna; Hata, Nobuhiko; Ibanez, Luiz; Blezek, Daniel; Miller, Jim; Aylward, Stephen; Grimson, W Eric L; Fichtinger, Gabor; Wells, William M; Lorensen, William E; Schroeder, Will; Kikinis, Ron
2016-10-01
The National Alliance for Medical Image Computing (NA-MIC) was launched in 2004 with the goal of investigating and developing an open source software infrastructure for the extraction of information and knowledge from medical images using computational methods. Several leading research and engineering groups participated in this effort, which was funded by the US National Institutes of Health through a variety of infrastructure grants. This effort transformed 3D Slicer from an internal, Boston-based, academic research software application into a professionally maintained, robust, open source platform with international leadership and developer and user communities. Critical improvements to the widely used underlying open source libraries and tools (VTK, ITK, CMake, CDash, DCMTK) were an additional consequence of this effort. This project has contributed to close to a thousand peer-reviewed publications and a growing portfolio of US and international funded efforts expanding the use of these tools in new medical computing applications every year. In this editorial, we discuss what we believe are gaps in the way medical image computing is pursued today; how a well-executed research platform can enable discovery, innovation and reproducible science ("Open Science"); and how our quest to build such a software platform has evolved into a productive and rewarding social engineering exercise in building an open-access community with a shared vision.
Early experiences in developing and managing the neuroscience gateway.
Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T
2015-02-01
The last few decades have seen the emergence of computational neuroscience as a mature field in which researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and the associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on machines located at national supercomputer centers, dealing with their complex user interfaces, and handling data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research using high performance computing at the back end. We also look at the parallel scaling of some publicly available neuronal models and analyze recent usage data of the neuroscience gateway.
Snore related signals processing in a private cloud computing system.
Qian, Kun; Guo, Jian; Xu, Huijie; Zhu, Zhaomeng; Zhang, Gongxuan
2014-09-01
Snore related signals (SRS) have been demonstrated in recent years to carry important information about the obstruction site and degree in the upper airway of Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) patients. To make this acoustic signal analysis method more accurate and robust, big SRS data processing is inevitable. As an emerging concept and technology, cloud computing has motivated numerous researchers and engineers to exploit applications in both academia and industry, and it has the potential to enable an ambitious blueprint in biomedical engineering. Considering the security and transfer requirements of biomedical data, we designed a system based on private cloud computing to process SRS. We then ran comparative experiments, processing a 5-hour audio recording of an OSAHS patient on a personal computer, a server and the private cloud computing system, to demonstrate the efficiency of the proposed infrastructure.
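The chunk-and-parallelize pattern that makes hours-long recordings tractable can be sketched with Python's multiprocessing; the feature extracted below (short-time energy) is a simple stand-in, not the authors' SRS analysis:

```python
from multiprocessing import Pool

import numpy as np

RATE = 16000           # samples per second
CHUNK = 60 * RATE      # one-minute chunks

def energy(chunk):
    """Placeholder per-chunk feature: mean short-time energy."""
    return float(np.mean(chunk.astype(np.float64) ** 2))

def process(recording):
    chunks = [recording[i:i + CHUNK] for i in range(0, len(recording), CHUNK)]
    with Pool() as pool:           # one worker per available core
        return pool.map(energy, chunks)

if __name__ == "__main__":
    # Synthetic ten-minute stand-in for an overnight recording.
    audio = np.random.randint(-3000, 3000, size=10 * 60 * RATE, dtype=np.int16)
    print(len(process(audio)), "chunk-level features")
```

In a private cloud the local pool is replaced by distributed workers, but the decomposition of the recording is the same.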
LHC@Home: a BOINC-based volunteer computing infrastructure for physics studies at CERN
NASA Astrophysics Data System (ADS)
Barranco, Javier; Cai, Yunhai; Cameron, David; Crouch, Matthew; Maria, Riccardo De; Field, Laurence; Giovannozzi, Massimo; Hermes, Pascal; Høimyr, Nils; Kaltchev, Dobrin; Karastathis, Nikos; Luzzi, Cinzia; Maclean, Ewen; McIntosh, Eric; Mereghetti, Alessio; Molson, James; Nosochkov, Yuri; Pieloni, Tatiana; Reid, Ivan D.; Rivkin, Lenny; Segal, Ben; Sjobak, Kyrre; Skands, Peter; Tambasco, Claudia; Veken, Frederik Van der; Zacharov, Igor
2017-12-01
The LHC@Home BOINC project has provided computing capacity for numerical simulations to researchers at CERN since 2004, and since 2011 has been expanded with a wider range of applications. The traditional CERN accelerator physics simulation code SixTrack enjoys continuing volunteer support, and thanks to virtualisation a number of applications from the LHC experiment collaborations and particle theory groups have joined the consolidated LHC@Home BOINC project. This paper addresses the challenges related to traditional and virtualized applications in the BOINC environment, and how volunteer computing has been integrated into the overall computing strategy of the laboratory through the consolidated LHC@Home service. Thanks to the computing power provided by volunteers joining LHC@Home, numerous accelerator beam physics studies have been carried out, yielding an improved understanding of charged particle dynamics in the CERN Large Hadron Collider (LHC) and its future upgrades. The main results are highlighted in this paper.
Dynfarm: A Dynamic Site Extension
NASA Astrophysics Data System (ADS)
Ciaschini, V.; De Girolamo, D.
2017-10-01
Requests for computing resources from LHC experiments are constantly mounting, and so is their peak usage. Since dimensioning a site to handle peak usage is impractical due to the resource constraints that many publicly owned computing centres face, opportunistic usage of resources from external, even commercial, cloud providers is becoming more and more interesting, and is even the subject of an upcoming initiative from the EU commission, named HelixNebula. While extra resources are always welcome, to take full advantage of them they must be integrated into the site's own infrastructure and made available to users as if they were local resources. At the CNAF INFN Tier-1 we have developed a framework, called dynfarm, capable of taking external resources and, placing minimal and easily satisfied requirements upon them, fully integrating them into a pre-existing infrastructure and treating them as if they were local, fully-owned resources. In this article we give, for the first time, a complete description of the framework's architecture along with all of its capabilities, describing exactly what is possible with it and what its requirements are.
NASA Astrophysics Data System (ADS)
Shyu, Mei-Ling; Huang, Zifang; Luo, Hongli
In recent years, pervasive computing infrastructures have greatly improved the interaction between humans and systems. As we place more reliance on these computing infrastructures, we also face threats of network intrusion and/or new forms of undesirable IT-based activities. Hence, network security has become an extremely important issue, closely connected with homeland security, business transactions, and people's daily lives. Accurate and efficient intrusion detection technologies are required to safeguard network systems and the critical information transmitted through them. In this chapter, a novel network intrusion detection framework for mining and detecting sequential intrusion patterns is proposed. The proposed framework consists of a Collateral Representative Subspace Projection Modeling (C-RSPM) component for supervised classification, and an inter-transactional association rule mining method based on Layer Divided Modeling (LDM) for temporal pattern analysis. Experiments on the KDD99 data set and the traffic data set generated by a private LAN testbed show promising results, with high detection rates, low processing time, and low false alarm rates in mining and detecting sequential intrusion patterns.
Detecting Abnormal Machine Characteristics in Cloud Infrastructures
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Matthews, Bryan L.
2011-01-01
In the cloud computing environment, resources are accessed as services rather than as products. Monitoring this system for performance is crucial because of the typical pay-per-use packages bought by users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia, which lack system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine in the cloud in order to rank the machines by their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis, and at the end of the analysis our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure.
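The local scoring idea can be sketched as follows: each machine scores its current metrics against its own history, and only the scalar scores travel over the network for ranking. Metric names, sizes and values below are illustrative, not the paper's algorithm:

```python
import numpy as np

def local_anomaly_score(history, current):
    """Mean absolute z-score of the current sample w.r.t. history."""
    mu = history.mean(axis=0)
    sigma = history.std(axis=0) + 1e-9   # avoid division by zero
    return float(np.abs((current - mu) / sigma).mean())

rng = np.random.default_rng(0)
scores = {}
for machine in ("node01", "node02", "node03"):
    history = rng.normal(0.5, 0.1, size=(1000, 4))  # cpu, mem, io, net
    current = history[-1]
    if machine == "node02":                          # inject an anomaly
        current = np.array([0.95, 0.90, 0.10, 0.99])
    scores[machine] = local_anomaly_score(history, current)

# Only these scalars need to be centralized for the ranking step.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 2))
```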
The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi
2018-03-19
The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the flexibility needed to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, the query language, and the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.
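As a hedged illustration of the kind of daily aggregation such a system performs, the sketch below assumes a MongoDB-style document store of framework job report documents; the collection layout and field names are invented for the example and are not the actual schema.

```python
# Illustrative only: aggregate per-site job counts and CPU hours from a
# document store of job reports. Database/collection/field names are
# assumptions made for this example.
from pymongo import MongoClient

fwjr = MongoClient("mongodb://localhost:27017")["archive"]["fwjr"]
pipeline = [
    {"$match": {"day": "2018-03-19"}},
    {"$group": {"_id": "$site",
                "jobs": {"$sum": 1},
                "cpu_hours": {"$sum": "$cpu_hours"}}},
    {"$sort": {"jobs": -1}},
]
for row in fwjr.aggregate(pipeline):
    print(row["_id"], row["jobs"], row["cpu_hours"])
```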
UNH Data Cooperative: A Cyber Infrastructure for Earth System Studies
NASA Astrophysics Data System (ADS)
Braswell, B. H.; Fekete, B. M.; Prusevich, A.; Gliden, S.; Magill, A.; Vorosmarty, C. J.
2007-12-01
Earth system scientists and managers have a continuously growing demand for a wide array of earth observations derived from various data sources, including (a) modern satellite retrievals, (b) "in-situ" records, (c) various simulation outputs, and (d) assimilated data products combining model results with observational records. The sheer quantity of data and its formatting inconsistencies make it difficult for users to take full advantage of this important information resource, so the system could benefit from a thorough retooling of current data processing procedures and infrastructure. Emerging technologies like OPeNDAP and OGC map services, open standard data formats (NetCDF, HDF), and data cataloging systems (NASA-Echo, Global Change Master Directory, etc.) are providing the basis for a new approach to data management and processing, where web services are increasingly designed to serve computer-to-computer communication without human interaction and complex analyses can be carried out over distributed computer resources interconnected via cyber infrastructure. The UNH Earth System Data Collaborative is designed to utilize these emerging web technologies to offer new means of access to earth system data. While the UNH Data Collaborative serves a wide array of data, ranging from weather station data (Climate Portal) to ocean buoy records and ship tracks (Portsmouth Harbor Initiative) to land cover characteristics, the underlying data architecture shares common components for data mining and data dissemination via web services. Perhaps the most unique element of the UNH Data Cooperative's IT infrastructure is its prototype modeling environment for regional ecosystem surveillance over the Northeast corridor, which allows the integration of complex earth system model components with the Cooperative's data services. While the complexity of the IT infrastructure needed to perform complex computations is continuously increasing, scientists are often forced to spend a considerable amount of time solving basic data management and preprocessing tasks and dealing with low-level computational design problems such as the parallelization of model codes. Our modeling infrastructure is designed to take care of the bulk of the common tasks found in complex earth system models, such as I/O handling, computational domain and time management, and parallel execution of modeling tasks. It allows scientists to focus on the numerical implementation of the physical processes on a single computational object (typically a grid cell) while the framework takes care of preprocessing the input data, establishing the data exchange between computational objects, and executing the science code. In our presentation, we will discuss the key concepts of our modeling infrastructure and demonstrate its integration with the data services offered by the UNH Earth System Data Collaborative via web interfaces. We will lay out the road map to turn our prototype modeling environment into a truly community framework for a wide range of earth system scientists and environmental managers.
NASA Astrophysics Data System (ADS)
Uddin, M. Maruf; Fuad, Muzaddid-E.-Zaman; Rahaman, Md. Mashiur; Islam, M. Rabiul
2017-12-01
With the rapid decrease in the cost of computational infrastructure and more efficient algorithms for solving non-linear problems, Reynolds-averaged Navier-Stokes (RaNS) based Computational Fluid Dynamics (CFD) is now widely used. As a preliminary evaluation tool, CFD is used to calculate the hydrodynamic loads on offshore installations, ships, and other structures in the ocean at initial design stages. Traditionally, wedges have been studied more than circular cylinders because a cylinder section has zero deadrise angle at the instant of water impact, which increases with increasing submergence. In the present study, the RaNS-based commercial code ANSYS Fluent is used to simulate the water entry of a circular section at constant velocity. The present computational results are compared with experimental data and other numerical methods.
Enabling opportunistic resources for CMS Computing Operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hufnagel, Dirk
2015-12-23
With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize "opportunistic" resources (resources not owned by, or a priori configured for, CMS) to meet peak demands. In addition to our dedicated resources we look to add computing resources from non-CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and Parrot wrappers are used to enable access and bring the CMS environment into these non-CMS resources. Finally, we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.
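To illustrate the wrapper idea in miniature: Parrot (from CCTools) can interpose on filesystem access so that /cvmfs software areas become reachable on hosts that do not mount CVMFS natively. The payload command and paths below are placeholders, not CMS production scripts.

```python
# Sketch of launching a payload under Parrot so /cvmfs is reachable on a
# host without a native CVMFS mount. Command and paths are placeholders.
import subprocess

payload = "source /cvmfs/cms.cern.ch/cmsset_default.sh && cmsRun config.py"
subprocess.run(["parrot_run", "/bin/sh", "-c", payload], check=True)
```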
Dynamic VM Provisioning for TORQUE in a Cloud Environment
NASA Astrophysics Data System (ADS)
Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.
2014-06-01
Cloud computing, also known as Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.
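The following is a rough sketch of the provisioning loop such a setup implies, not the authors' actual tooling: count idle TORQUE jobs and boot an OpenStack worker when the backlog grows. The cloud name, threshold, and image/flavor/network IDs are placeholders.

```python
# Rough sketch of a dynamic-provisioning loop (not the paper's code):
# watch the TORQUE queue via qstat and boot an OpenStack worker when
# idle jobs pile up. All IDs below are placeholders.
import subprocess
import openstack

def idle_torque_jobs():
    out = subprocess.run(["qstat"], capture_output=True, text=True).stdout
    # In default qstat output the job state is the second-to-last column.
    return sum(1 for line in out.splitlines() if line.split()[-2:-1] == ["Q"])

conn = openstack.connect(cloud="mycloud")
if idle_torque_jobs() > 10:
    conn.compute.create_server(
        name="torque-worker",
        image_id="IMAGE_UUID",
        flavor_id="FLAVOR_ID",
        networks=[{"uuid": "NETWORK_UUID"}],
    )
```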
VMEbus based computer and real-time UNIX as infrastructure of DAQ
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yasu, Y.; Fujii, H.; Nomachi, M.
1994-12-31
This paper describes what the authors have constructed as the infrastructure of a data acquisition system (DAQ). The paper reports recent developments concerning the HP VME board computer with LynxOS (HP742rt/HP-RT) and Alpha/OSF1 with a VMEbus adapter. The paper also reports the current status of the development of a Benchmark Suite for Data Acquisition (DAQBENCH) for measuring not only the performance of VME/CAMAC access but also that of context switching, inter-process communications and so on, for various computers including workstation-based systems and VME board computers.
NASA Astrophysics Data System (ADS)
Sedlar, F.; Turpin, E.; Kerkez, B.
2014-12-01
As megacities around the world continue to develop at breakneck speed, future development, investment, and social wellbeing are threatened by a number of environmental and social factors. Chief among these is frequent, persistent, and unpredictable urban flooding. Jakarta, Indonesia, with a population of 28 million, is a prime example of a city plagued by such flooding. Yet although Jakarta has ample hydraulic infrastructure already in place, with more being constructed, the increasing severity of the flooding it experiences stems not from a lack of hydraulic infrastructure but rather from the failure of existing infrastructure. As was demonstrated during the most recent floods in Jakarta, the infrastructure failure is often the result of excessive amounts of trash in the flood canals. This trash clogs pumps and reduces the overall system capacity. Despite this critical weakness of flood control in Jakarta, no data exists on the overall amount of trash in the flood canals, much less on how it varies temporally and spatially. The recent availability of low-cost photography provides a means to obtain such data. Time-lapse photography postprocessed with computer vision algorithms yields a low-cost, remote, and automatic solution for measuring trash fluxes. When combined with the measurement of key hydrological parameters, a thorough understanding of the relationship between trash fluxes and the hydrology of massive urban areas becomes possible. This work examines algorithm development, quantifying trash parameters, and hydrological measurements, followed by data assimilation into existing hydraulic and hydrological models of Jakarta. The insights afforded by such an approach allow for more efficient operation of hydraulic infrastructure, knowledge of when and where critical levels of trash originate, and opportunities for community outreach, which is ultimately needed to reduce the trash in the flood canals of Jakarta and megacities around the world.
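As a hedged sketch of the computer-vision step described above, background subtraction over time-lapse frames gives a rough per-frame count of moving or novel pixels that could serve as a raw debris proxy; the video file name and parameters are placeholders, and any real deployment would calibrate such counts against manual annotations and hydrological measurements.

```python
# Minimal sketch of the kind of pipeline described: background
# subtraction on time-lapse frames yields a crude per-frame count of
# moving/novel pixels as a raw "floating debris" proxy.
import cv2

cap = cv2.VideoCapture("canal_timelapse.mp4")  # placeholder file name
subtractor = cv2.createBackgroundSubtractorMOG2(history=200)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)           # foreground (moving) pixels
    debris_pixels = cv2.countNonZero(mask)   # crude, uncalibrated proxy
    print(debris_pixels)
cap.release()
```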
ERIC Educational Resources Information Center
Greenhalgh-Spencer, Heather; Jerbi, Moja
2017-01-01
In this paper, we provide a design-actuality gap analysis of the internet infrastructure that exists in developing nations and nations in the global South, compared with the deployed internet computer technology (ICT)-assisted programs that are designed to use internet infrastructure to provide educational opportunities. Programs that specifically…
Closing the Gap: Cybersecurity for U.S. Forces and Commands
2017-03-30
…of information technology infrastructures, and includes the Internet, telecommunications networks, computer systems, and embedded processors and controllers in critical industries.
Tuning In: Challenging Design for Communities through a Field Study of Radio Amateurs
NASA Astrophysics Data System (ADS)
Bogdan, Cristian; Bowers, John
As illustrated by the emerging field of Communities and Technologies, the topic of community, whether further qualified by ‘virtual' (Rheingold 1993), ‘on line' or ‘networked' (Schuler 1996), has become a major focus for field study, design, technical infrastructural provision, as well as social, psychological and economic theorising. Let us review some early examples of this ‘turn to community'. (1999) discuss the ‘network communities of SeniorNet', an organisation that supports people over the age of 50 in the use of computer networking technologies. The SeniorNet study highlights the complex ‘collage' of participation and interaction styles that community members sustain, many of which go beyond conventional understandings of older people, their practices and relations to technology. While the members of SeniorNet are geographically dispersed, (1996) describe the ‘Blacksburg Electronic Village', a local community computing initiative centred around Blacksburg, Virginia, USA. As long ago as 1994, (1994) claimed the existence of over 100 such projects in the US with very diverse aims and experiences but all concerned to be responsive to a community's needs while exploiting the Internet and the technical developments it has made possible. For their part, (2001) offer some generic infrastructural tools for community computing, including support for ‘identity management'.
The National Information Infrastructure: Agenda for Action.
ERIC Educational Resources Information Center
Department of Commerce, Washington, DC. Information Infrastructure Task Force.
The National Information Infrastructure (NII) is planned as a web of communications networks, computers, databases, and consumer electronics that will put vast amounts of information at the users' fingertips. Private sector firms are beginning to develop this infrastructure, but essential roles remain for the Federal Government. The National…
USDA-ARS?s Scientific Manuscript database
Infrastructure-as-a-service (IaaS) clouds provide a new medium for deployment of environmental modeling applications. Harnessing advancements in virtualization, IaaS clouds can provide dynamic scalable infrastructure to better support scientific modeling computational demands. Providing scientific m...
NASA Astrophysics Data System (ADS)
Lengert, Wolfgang; Farres, Jordi; Lanari, Riccardo; Casu, Francesco; Manunta, Michele; Lassalle-Balier, Gerard
2014-05-01
Helix Nebula has established a growing public-private partnership of more than 30 commercial cloud providers, SMEs, and publicly funded research organisations and e-infrastructures. The Helix Nebula strategy is to establish a federated cloud service across Europe. Three high-profile flagships, sponsored by CERN (high energy physics), EMBL (life sciences) and ESA/DLR/CNES/CNR (earth science), have been deployed and extensively tested within this federated environment. The commitments behind these initial flagships have created a critical mass that attracts suppliers and users to the initiative, to work together towards an "Information as a Service" market place. Significant progress has been achieved in implementing the following 4 programmatic goals (as outlined in the strategic plan, Ref. 1): Goal #1: establish a cloud computing infrastructure for the European Research Area (ERA), serving as a platform for innovation and evolution of the overall infrastructure. Goal #2: identify and adopt suitable policies for trust, security and privacy at a European level, to be provided by the European cloud computing framework and infrastructure. Goal #3: create a lightweight governance structure for the future European cloud computing infrastructure that involves all the stakeholders and can evolve over time as the infrastructure, services and user base grow. Goal #4: define a funding scheme involving the three stakeholder groups (service suppliers, users, EC and national funding agencies) in a public-private-partnership model to implement a cloud computing infrastructure that delivers a sustainable business environment adhering to European-level policies. Now, in 2014, a first version of this generic cross-domain e-infrastructure is ready to go into operation, building on a federation of European industry and contributors (data, tools, knowledge, ...). This presentation describes how Helix Nebula is being used in the domain of earth science, focusing on geohazards. The so-called "Supersite Exploitation Platform" (SSEP) provides scientists with an overarching federated e-infrastructure with very fast access to (i) large volumes of data (EO/non-space data), (ii) computing resources (e.g. hybrid cloud/grid), (iii) processing software (e.g. toolboxes, RTMs, retrieval baselines, visualization routines), and (iv) general platform capabilities (e.g. user management and access control, accounting, information portal, collaborative tools, social networks, etc.). In this federation each data provider remains in full control of the implementation of its data policy. This presentation outlines the architecture (technical and services) supporting very heterogeneous science domains, as well as the procedures for newcomers to join the Helix Nebula Market Place. Ref. 1: http://cds.cern.ch/record/1374172/files/CERN-OPEN-2011-036.pdf
Approach to sustainable e-Infrastructures - The case of the Latin American Grid
NASA Astrophysics Data System (ADS)
Barbera, Roberto; Diacovo, Ramon; Brasileiro, Francisco; Carvalho, Diego; Dutra, Inês; Faerman, Marcio; Gavillet, Philippe; Hoeger, Herbert; Lopez Pourailly, Maria Jose; Marechal, Bernard; Garcia, Rafael Mayo; Neumann Ciuffo, Leandro; Ramos Pollan, Paul; Scardaci, Diego; Stanton, Michael
2010-05-01
The EELA (E-Infrastructure shared between Europe and Latin America) and EELA-2 (E-science grid facility for Europe and Latin America) projects, co-funded by the European Commission under FP6 and FP7, respectively, have been successful in building a high-capacity, production-quality, scalable Grid Facility for a wide spectrum of applications (e.g. Earth & Life Sciences, High Energy Physics, etc.) from several European and Latin American User Communities. This paper presents the 4-year experience of EELA and EELA-2 in: • Providing each Member Institution the unique opportunity to benefit from a huge distributed computing platform for its research activities, in particular through initiatives such as OurGrid, which proposes so-called Opportunistic Grid Computing, well adapted to small and medium research laboratories such as most of those of Latin America and Africa; • Developing a realistic strategy to ensure the long-term continuity of the e-Infrastructure in the Latin American continent, beyond the term of the EELA-2 project, in association with CLARA and in collaboration with EGI. Previous interactions between EELA and African Grid members at events such as IST Africa '07, '08 and '09, the International Conference on Open Access '08 and EuroAfriCa-ICT '08, to which EELA and EELA-2 contributed, have shown that the e-Infrastructure situation in Africa compares well with the Latin American one. This means that African Grids are likely to face the same problems that EELA and EELA-2 experienced, especially in getting the necessary User and Decision Maker support to create NGIs and, later, a possible continent-wide African Grid Initiative (AGI). The hope is that the EELA-2 endeavour towards sustainability, as described in this presentation, could help the progress of African Grids.
An EMSO data case study within the INDIGO-DC project
NASA Astrophysics Data System (ADS)
Monna, Stephen; Marcucci, Nicola M.; Marinaro, Giuditta; Fiore, Sandro; D'Anca, Alessandro; Antonacci, Marica; Beranzoli, Laura; Favali, Paolo
2017-04-01
We present our experience based on a case study within the INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation) project (www.indigo-datacloud.eu). The aim of INDIGO-DC is to develop a data and computing platform targeting scientific communities. Our case study is an example of activities performed by INGV using data from seafloor observatories that are nodes of the EMSO (European Multidisciplinary Seafloor and water column Observatory)-ERIC infrastructure (www.emso-eu.org). EMSO is composed of several deep-seafloor and water column observatories, deployed at key sites in European waters, thus forming a widely distributed pan-European infrastructure. In our case study we consider data collected by the NEMO-SN1 observatory, one of the EMSO nodes used for geohazard monitoring, located in the Western Ionian Sea in proximity to the Etna volcano. Starting from the case study, through an agile approach, we defined some requirements for INDIGO developers and tested some of the proposed INDIGO solutions that are of interest for our research community. Given that EMSO is a distributed infrastructure, we are interested in INDIGO solutions that allow access to distributed data storage. Access should be both user-oriented and machine-oriented, and should use a common identity and access system. For this purpose, we have been testing: - ONEDATA (https://onedata.org), as a global data management system. - INDIGO-IAM, as an Identity and Access Management system. Another aspect we are interested in is efficient data processing, and here we have focused on two INDIGO products: - Ophidia (http://ophidia.cmcc.it), a big data analytics framework for eScience for the analysis of multidimensional data. - A collection of INDIGO services to run processes for scientific computing through the INDIGO Orchestrator.
Autonomic Management of Application Workflows on Hybrid Computing Infrastructure
Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...
2011-01-01
In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its capabilities, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives, such as acceleration, conservation and resilience, can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.
A practical approach to virtualization in HEP
NASA Astrophysics Data System (ADS)
Buncic, P.; Aguado Sánchez, C.; Blomer, J.; Harutyunyan, A.; Mudrinic, M.
2011-01-01
In the attempt to solve the problem of processing data coming from the LHC experiments at CERN at a rate of 15 PB per year, for almost a decade the High Energy Physics (HEP) community has focused its efforts on the development of the Worldwide LHC Computing Grid. This generated great interest and expectations, promising to revolutionize computing. Meanwhile, having initially taken part in the Grid standardization process, industry has moved in a different direction and started promoting the Cloud Computing paradigm, which aims to solve problems on a similar scale and in an equally seamless way as was expected in the idealized Grid approach. A key enabling technology behind Cloud computing is server virtualization. In early 2008, an R&D project was established in the PH-SFT group at CERN to investigate how virtualization technology could be used to improve and simplify the daily interaction of physicists with experiment software frameworks and the Grid infrastructure. In this article we first briefly compare the Grid and Cloud computing paradigms and then summarize the results of the R&D activity, pointing out where and how virtualization technology could be effectively used in our field in order to maximize practical benefits whilst avoiding potential pitfalls.
A Grid Infrastructure for Supporting Space-based Science Operations
NASA Technical Reports Server (NTRS)
Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)
2002-01-01
Emerging technologies for computational grid infrastructures have the potential to revolutionize the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environment based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing at less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.
The CMS High Level Trigger System: Experience and Future Development
NASA Astrophysics Data System (ADS)
Bauer, G.; Behrens, U.; Bowen, M.; Branson, J.; Bukowiec, S.; Cittolin, S.; Coarasa, J. A.; Deldicque, C.; Dobson, M.; Dupont, A.; Erhan, S.; Flossdorf, A.; Gigi, D.; Glege, F.; Gomez-Reino, R.; Hartl, C.; Hegeman, J.; Holzner, A.; Hwong, Y. L.; Masetti, L.; Meijers, F.; Meschi, E.; Mommsen, R. K.; O'Dell, V.; Orsini, L.; Paus, C.; Petrucci, A.; Pieri, M.; Polese, G.; Racz, A.; Raginel, O.; Sakulin, H.; Sani, M.; Schwick, C.; Shpakov, D.; Simon, S.; Spataru, A. C.; Sumorok, K.
2012-12-01
The CMS experiment at the LHC features a two-level trigger system. Events accepted by the first-level trigger, at a maximum rate of 100 kHz, are read out by the Data Acquisition system (DAQ) and subsequently assembled in memory in a farm of computers running a software high-level trigger (HLT), which selects interesting events for offline storage and analysis at a rate of the order of a few hundred Hz. The HLT algorithms consist of sequences of offline-style reconstruction and filtering modules, executed on a farm of O(10000) CPU cores built from commodity hardware. Experience from the operation of the HLT system in the 2010/2011 collider run is reported. The current architecture of the CMS HLT and its integration with the CMS reconstruction framework and the CMS DAQ are discussed in the light of future development. The possible short- and medium-term evolution of the HLT software infrastructure to support extensions of the HLT computing power, and to address remaining performance and maintenance issues, is discussed.
BelleII@home: Integrate volunteer computing resources into DIRAC in a secure way
NASA Astrophysics Data System (ADS)
Wu, Wenjing; Hara, Takanori; Miyake, Hideki; Ueda, Ikuo; Kan, Wenxiao; Urquijo, Phillip
2017-10-01
The exploitation of volunteer computing resources has become a popular practice in the HEP computing community because of the huge amount of potential computing power it provides. In recent HEP experiments, grid middleware has been used to organize the services and the resources; however, it relies heavily on X.509 authentication, which is at odds with the untrusted nature of volunteer computing resources. One big challenge in utilizing volunteer computing resources is therefore how to integrate them into the grid middleware in a secure way. The DIRAC interware, which is commonly used as the major component of the grid computing infrastructure for several HEP experiments, poses an even bigger challenge for this paradox, as its pilot is more closely coupled with operations requiring X.509 authentication than the pilot implementations of its peer grid interware. The Belle II experiment is a B-factory experiment at KEK, and it uses DIRAC for its distributed computing. In the BelleII@home project, in order to integrate volunteer computing resources into the Belle II distributed computing platform in a secure way, we adopted a new approach which detaches the payload running from the Belle II DIRAC pilot (a customized pilot pulling and processing jobs from the Belle II distributed computing platform), so that the payload can run on volunteer computers without requiring any X.509 authentication. In this approach we developed a gateway service running on a trusted server which handles all the operations requiring X.509 authentication. So far, we have developed and deployed the prototype of BelleII@home and tested its full workflow, which proves the feasibility of this approach. The approach can also be applied to HPC systems whose worker nodes do not have outbound connectivity to interact with the DIRAC system.
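A conceptual sketch of the gateway pattern follows; the endpoints, job format, and helper functions are hypothetical stand-ins, chosen only to show that the volunteer speaks plain HTTPS while the gateway alone holds grid credentials.

```python
# Conceptual sketch of the gateway pattern (endpoints and payload format
# are hypothetical): volunteers talk plain HTTPS to this service, which
# alone holds the X.509 credentials needed to talk to DIRAC.
from flask import Flask, jsonify, request

app = Flask(__name__)

def pull_job_with_x509():
    # Placeholder: in the real system this call would authenticate to
    # DIRAC with grid certificates and fetch a payload description.
    return {"job_id": 42, "payload": "run_simulation.sh"}

def push_result_with_x509(result):
    # Placeholder for the certificate-authenticated upload to DIRAC.
    pass

@app.route("/fetch_job")
def fetch_job():
    return jsonify(pull_job_with_x509())

@app.route("/upload_result", methods=["POST"])
def upload_result():
    push_result_with_x509(request.get_json())
    return jsonify({"status": "ok"})
```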
Network Computing Infrastructure to Share Tools and Data in Global Nuclear Energy Partnership
NASA Astrophysics Data System (ADS)
Kim, Guehee; Suzuki, Yoshio; Teshima, Naoya
CCSE/JAEA (Center for Computational Science and e-Systems/Japan Atomic Energy Agency) integrated a prototype system of a network computing infrastructure for sharing tools and data to support the U.S.-Japan collaboration in GNEP (Global Nuclear Energy Partnership). We focused on three technical issues in applying our information infrastructure: accessibility, security, and usability. In designing the prototype system, we integrated and improved both network and Web technologies. For the accessibility issue, we adopted SSL-VPN (Secure Sockets Layer-Virtual Private Network) technology for access beyond firewalls. For the security issue, we developed an authentication gateway based on the PKI (Public Key Infrastructure) authentication mechanism to strengthen security. We also set a fine-grained access control policy for shared tools and data and used a shared-key-based encryption method to protect tools and data against leakage to third parties. For the usability issue, we chose Web browsers as the user interface and developed a Web application providing functions to support the sharing of tools and data. Using the WebDAV (Web-based Distributed Authoring and Versioning) function, users can manipulate shared tools and data through a Windows-like folder environment. We implemented the prototype system on AEGIS (Atomic Energy Grid Infrastructure), the Grid infrastructure for atomic energy research developed by CCSE/JAEA. The prototype system was used for trials in the first period of GNEP.
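Since WebDAV is an open HTTP extension, the sharing workflow can be illustrated generically; in the sketch below the server URL and credentials are placeholders and this is not the prototype's actual stack.

```python
# Generic WebDAV-style access to shared tools/data over HTTP
# (server URL and credentials are placeholders).
import requests

BASE = "https://gateway.example.org/webdav/shared"
auth = ("user", "password")

# List a shared folder (WebDAV PROPFIND, depth 1).
listing = requests.request("PROPFIND", BASE, headers={"Depth": "1"}, auth=auth)
print(listing.status_code)

# Upload a tool/data file (WebDAV PUT).
with open("analysis_tool.tar.gz", "rb") as f:
    requests.put(f"{BASE}/analysis_tool.tar.gz", data=f, auth=auth)
```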
Geospatial-enabled Data Exploration and Computation through Data Infrastructure Building Blocks
NASA Astrophysics Data System (ADS)
Song, C. X.; Biehl, L. L.; Merwade, V.; Villoria, N.
2015-12-01
Geospatial data are present everywhere today with the proliferation of location-aware computing devices and sensors. This is especially true in the scientific community, where large amounts of data are driving research and education activities in many domains. Collaboration over geospatial data, for example in modeling, data analysis and visualization, must still overcome the barriers of specialized software and expertise, among other challenges. The GABBs project aims at enabling broader access to geospatial data exploration and computation by developing spatial data infrastructure building blocks that leverage the capabilities of the end-to-end application service and virtualized computing framework in HUBzero. Funded by the NSF Data Infrastructure Building Blocks (DIBBS) initiative, GABBs provides a geospatial data architecture that integrates spatial data management, mapping and visualization, and will make it available as open source. The outcome of the project will enable users to rapidly create tools and share geospatial data and tools on the web for interactive exploration of data, without requiring significant software development skills, GIS expertise or IT administrative privileges. This presentation will describe the development of the geospatial data infrastructure building blocks and the scientific use cases that help drive the software development, as well as seek feedback from the user communities.
NASA Astrophysics Data System (ADS)
Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats
2014-06-01
Scientific communities have been at the forefront of adopting new technologies and methodologies in computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows and effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates, on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-as-a-Service (IaaS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to set up a batch system on cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to support interfacing GlideinWMS with different scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.
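A hedged sketch of the bursting decision only (GlideinWMS internals are far richer than this): if the HTCondor pool accumulates idle jobs, request cloud workers. The AMI ID, instance type, and threshold below are placeholders.

```python
# Sketch of a cloud-bursting decision, not GlideinWMS itself: count idle
# jobs in the HTCondor schedd and request EC2 workers above a threshold.
import boto3
import htcondor

schedd = htcondor.Schedd()
idle_jobs = len(schedd.query(constraint="JobStatus == 1"))  # 1 == Idle

if idle_jobs > 100:
    boto3.client("ec2").run_instances(
        ImageId="ami-xxxxxxxx",      # placeholder worker image
        InstanceType="m5.large",
        MinCount=1,
        MaxCount=10,
    )
```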
Building analytical platform with Big Data solutions for log files of PanDA infrastructure
NASA Astrophysics Data System (ADS)
Alekseev, A. A.; Barreiro Megino, F. G.; Klimentov, A. A.; Korchuganova, T. A.; Maeno, T.; Padolski, S. V.
2018-05-01
The paper describes the implementation of a high-performance system for the processing and analysis of log files for the PanDA infrastructure of the ATLAS experiment at the Large Hadron Collider (LHC), which is responsible for the workload management of the order of 2M daily jobs across the Worldwide LHC Computing Grid. The solution is based on the ELK technology stack, which includes several components: Filebeat, Logstash, ElasticSearch (ES), and Kibana. Filebeat is used to collect data from logs. Logstash processes the data and exports it to Elasticsearch. ES is responsible for centralized data storage. Data accumulated in ES can be viewed using Kibana. These components were integrated with the PanDA infrastructure and replaced previous log processing systems, increasing scalability and usability. The authors describe all the components and their configuration tuning for the current tasks, along with the scale of the actual system, and give several real-life examples of how this centralized log processing and storage service is used, to showcase the advantages for daily operations.
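An illustrative query against such an ES store might look like the sketch below (elasticsearch-py 7.x style); the index name and field names are assumptions made for the example, not the production schema.

```python
# Illustrative ES query: count ERROR log lines per site. Index and field
# names ("panda-logs", "loglevel", "site") are assumptions for the example.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
resp = es.search(index="panda-logs", body={
    "query": {"match": {"loglevel": "ERROR"}},
    "aggs": {"by_site": {"terms": {"field": "site.keyword", "size": 10}}},
    "size": 0,
})
for bucket in resp["aggregations"]["by_site"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```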
Extending the farm on external sites: the INFN Tier-1 experience
NASA Astrophysics Data System (ADS)
Boccali, T.; Cavalli, A.; Chiarelli, L.; Chierici, A.; Cesini, D.; Ciaschini, V.; Dal Pra, S.; dell'Agnello, L.; De Girolamo, D.; Falabella, A.; Fattibene, E.; Maron, G.; Prosperini, A.; Sapunenko, V.; Virgilio, S.; Zani, S.
2017-10-01
The Tier-1 at CNAF is the main INFN computing facility, offering computing and storage resources to more than 30 different scientific collaborations, including the 4 experiments at the LHC. A huge increase in computing needs is also foreseen in the coming years, mainly driven by the experiments at the LHC (especially starting with Run 3 in 2021) but also by other upcoming experiments such as CTA [1]. While we are considering the upgrade of the infrastructure of our data center, we are also evaluating the possibility of using CPU resources available in other data centres or even leased from commercial cloud providers. Hence, at the INFN Tier-1, besides participating in the EU project HNSciCloud, we have also pledged a small amount of computing resources (˜ 2000 cores) located at the Bari ReCaS [2] data center for the WLCG experiments for 2016, and we are testing the use of resources provided by a commercial cloud provider. While the Bari ReCaS data center is directly connected to the GARR network [3], with the obvious advantage of a low-latency and high-bandwidth connection, in the case of the commercial provider we rely only on the general-purpose network. In this paper we describe the setup phase and the first results of these installations, started in the last quarter of 2015, focusing on the issues that we have had to cope with and discussing the measured results in terms of efficiency.
On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers
NASA Astrophysics Data System (ADS)
Erli, G.; Fischer, F.; Fleig, G.; Giffels, M.; Hauth, T.; Quast, G.; Schnepf, M.; Heese, J.; Leppert, K.; Arnaez de Pedro, J.; Sträter, R.
2017-10-01
This contribution reports on solutions, experiences and recent developments with the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. Local resources of a physics institute are extended by private and commercial cloud sites, ranging from desktop clusters through institute clusters to HPC centers. Rather than relying on dedicated HEP computing centers, it is nowadays more reasonable and flexible to utilize remote computing capacity via virtualization techniques or container concepts. We report on recent experience from incorporating a remote HPC center (NEMO Cluster, Freiburg University) and resources dynamically requested from the commercial provider 1&1 Internet SE into our institute's computing infrastructure. The Freiburg HPC resources are requested via the standard batch system, allowing HPC and HEP applications to be executed simultaneously, such that regular batch jobs run side by side with virtual machines managed via OpenStack [1]. For the inclusion of the 1&1 commercial resources, a Python API and SDK, as well as the possibility to upload images, were available. Large-scale tests proved the capability to serve the scientific use case in the European 1&1 datacenters. The described environment at the Institute of Experimental Nuclear Physics (IEKP) at KIT serves the needs of researchers participating in the CMS and Belle II experiments. In total, resources exceeding half a million CPU hours have been provided by remote sites.
Computational simulation of concurrent engineering for aerospace propulsion systems
NASA Technical Reports Server (NTRS)
Chamis, C. C.; Singhal, S. N.
1992-01-01
Results are summarized from an investigation assessing the available infrastructure and the technology readiness for developing computational simulation methods/software for concurrent engineering. These results demonstrate that the development of computational simulation methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties - fundamental in developing such methods - is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering for propulsion systems and systems in general. Benefits and facets needing early attention in the development are outlined.
Computational simulation for concurrent engineering of aerospace propulsion systems
NASA Technical Reports Server (NTRS)
Chamis, C. C.; Singhal, S. N.
1993-01-01
Results are summarized from an investigation assessing the available infrastructure and the technology readiness for developing computational simulation methods/software for concurrent engineering. These results demonstrate that the development of computational simulation methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties - fundamental to developing such methods - is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering of propulsion systems and systems in general. Benefits and issues needing early attention in the development are outlined.
The future of PanDA in ATLAS distributed computing
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
Experiments at the Large Hadron Collider (LHC) face unprecedented computing challenges. Heterogeneous resources are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, while data processing requires more than a few billion hours of computing usage per year. The PanDA (Production and Distributed Analysis) system was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. In the process, the old batch job paradigm of locally managed computing in HEP was discarded in favour of a far more automated, flexible and scalable model. The success of PanDA in ATLAS is leading to widespread adoption and testing by other experiments. PanDA is the first exascale workload management system in HEP, already operating at more than a million computing jobs per day, and processing over an exabyte of data in 2013. There are many new challenges that PanDA will face in the near future, in addition to new challenges of scale, heterogeneity and increasing user base. PanDA will need to handle rapidly changing computing infrastructure, will require factorization of code for easier deployment, will need to incorporate additional information sources including network metrics in decision making, be able to control network circuits, handle dynamically sized workload processing, provide improved visualization, and face many other challenges. In this talk we will focus on the new features, planned or recently implemented, that are relevant to the next decade of distributed computing workload management using PanDA.
A service-based BLAST command tool supported by cloud infrastructures.
Carrión, Abel; Blanquer, Ignacio; Hernández, Vicente
2012-01-01
Notwithstanding the benefits of distributed-computing infrastructures for empowering bioinformatics analysis tools with the needed computing and storage capability, actual use of these infrastructures is still low. Learning curves and deployment difficulties have reduced their impact on the wider research community. This article presents a porting strategy for BLAST based on a multiplatform client and a service that provides the same interface as sequential BLAST, thus reducing the learning curve, with minimal impact on its integration into existing workflows. The porting has been done using the execution and data access components from the EC project Venus-C and the Windows Azure infrastructure provided in this project. The results obtained demonstrate a low overhead on the global execution framework and reasonable speed-up and cost-efficiency with respect to a sequential version.
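The paper's client targets the Venus-C/Azure services; as a rough, hedged analogue of "BLAST behind a service interface", Biopython's wrapper around NCBI's public web BLAST shows the same usage pattern (the sequence below is an arbitrary example, not from the paper).

```python
# Analogous pattern only: submit a sequence to a remote BLAST service and
# read back the top hits, keeping the familiar BLAST-style interface.
from Bio.Blast import NCBIWWW, NCBIXML

result = NCBIWWW.qblast("blastp", "nr", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
record = NCBIXML.read(result)
for alignment in record.alignments[:3]:
    print(alignment.title)
```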
The GÉANT network: addressing current and future needs of the HEP community
NASA Astrophysics Data System (ADS)
Capone, Vincenzo; Usman, Mian
2015-12-01
The GÉANT infrastructure is the backbone that serves the scientific communities in Europe for their data movement needs and their access to international research and education networks. Using its extensive fibre footprint and infrastructure in Europe, the GÉANT network delivers a portfolio of services aimed at best fitting the specific needs of its users, including an Authentication and Authorization Infrastructure, end-to-end performance monitoring, and advanced network services (dynamic circuits, L2-L3VPN, MD-VPN). This talk will outline the factors that help the GÉANT network respond to the needs of the High Energy Physics community, both in Europe and worldwide. The pan-European network provides the connectivity between 40 European national research and education networks. In addition, GÉANT also connects the European NRENs to the R&E networks in other world regions and reaches over 110 NRENs worldwide, making GÉANT the best-connected research and education network, with multiple intercontinental links to different continents, e.g. North and South America, Africa and Asia-Pacific. The computational needs of High Energy Physics have always had (and will keep having) a leading role among the scientific user groups of the GÉANT network: the LHCONE overlay network has been built, in collaboration with the other major world RENs, specifically to address the peculiar needs of LHC data movement. Recently, as a result of a series of coordinated efforts, the LHCONE network has been expanded to the Asia-Pacific area and is going to include some of the main regional R&E networks in the area. The LHC community is not the only one actively using a distributed computing model (hence the need for a high-performance network); new communities are arising, such as Belle II. GÉANT is deeply involved with the Belle II experiment as well, providing full support for its distributed computing model, along with a perfSONAR-based network monitoring system. GÉANT has also coordinated the setup of the network infrastructure to perform the Belle II Trans-Atlantic Data Challenge and has been active in helping the Belle II community sort out end-to-end performance issues. In this talk we will provide information about the current GÉANT network architecture and international connectivity, along with the upcoming upgrades and the planned and foreseeable improvements. We will also describe the implementation of the solutions provided to support the LHC and Belle II experiments.
Dinov, Ivo D; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H V; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D Stott; Toga, Arthur W
2008-05-28
The advancement of the computational biology field hinges on progress in three fundamental directions: the development of new computational algorithms, the availability of informatics resource management infrastructures, and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources: data, software tools and web services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community, and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existing computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources: the first is based on an ontology of computational biology resources, and the second is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project, both in terms of source code development and its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.
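Since iTools exposes machine interfaces to its meta-data repository, a client interaction can be pictured as in the purely hypothetical sketch below; the endpoint, parameters and fields are invented for illustration, and the real interfaces are documented on the iTools web page.

```python
# Purely hypothetical sketch of machine access to a resource meta-data
# repository in the spirit of iTools; the endpoint and fields are
# invented for illustration only.
import requests

resp = requests.get(
    "https://itools.example.org/api/resources",   # hypothetical endpoint
    params={"type": "software", "keyword": "alignment"},
)
for res in resp.json():
    print(res.get("name"), res.get("url"))
```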
Coalition Agents Experiment: Multi-Agent Co-operation in an International Coalition Setting
2002-04-01
middleware layer based on Java/Jini technology that provides the computing infrastructure to integrate heterogeneous agent communities and systems... "Bowl of Africa". (Figure 1. Map of Binni.) ...firestorm deception. Misinformation from Gao is intended to displace the firestorm to the west, allowing Gao and Agadez forces to clash in the region of the
An economic and financial exploratory
NASA Astrophysics Data System (ADS)
Cincotti, S.; Sornette, D.; Treleaven, P.; Battiston, S.; Caldarelli, G.; Hommes, C.; Kirman, A.
2012-11-01
This paper describes the vision of a European Exploratory for economics and finance using an interdisciplinary consortium of economists, natural scientists, computer scientists and engineers, who will combine their expertise to address the enormous challenges of the 21st century. This academic public facility is intended for economic modelling, investigating all aspects of risk and stability, improving financial technology, and evaluating proposed regulatory and taxation changes. The European Exploratory for economics and finance will be constituted as a network of infrastructure, observatories, data repositories, services and facilities and will foster the creation of a new cross-disciplinary research community of social scientists, complexity scientists and computing (ICT) scientists to collaborate in investigating major issues in economics and finance. It is also considered a cradle for training and collaboration with the private sector to spur spin-offs and job creation in Europe in the finance and economic sectors. The Exploratory will allow social scientists and regulators, as well as policy makers and the private sector, to conduct realistic investigations with real economic, financial and social data. The Exploratory will (i) continuously monitor and evaluate the status of the economies of countries in their various components, (ii) use, extend and develop a large variety of methods, including data mining, process mining, computational and artificial intelligence and other computer and complexity science techniques, coupled with economic theory and econometrics, and (iii) provide the framework and infrastructure to perform what-if analysis, scenario evaluations and computational, laboratory, field and web experiments to inform decision makers and help develop innovative policy, market and regulation designs.
NASA Astrophysics Data System (ADS)
Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.
2014-12-01
We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data-intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs is best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid and easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include provenance, publication of results, monitoring, and workflow tools. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC). The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At the same time, we are scoping seven new VLs and, in the process, identifying other common components to prioritise and focus development.
Cloud computing applications for biomedical science: A perspective.
Navale, Vivek; Bourne, Philip E
2018-06-01
Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gottlieb, Steven Arthur; DeTar, Carleton; Toussaint, Doug
This is the closeout report for the Indiana University portion of the National Computational Infrastructure for Lattice Gauge Theory project supported by the United States Department of Energy under the SciDAC program. It includes information about activities at Indiana University, the University of Arizona, and the University of Utah, as those three universities coordinated their activities.
Deploying Crowd-Sourced Formal Verification Systems in a DoD Network
2013-09-01
In 2014, cyber attacks on critical infrastructure are expected to increase... CSFV systems on the Internet, possibly using cloud infrastructure (Dean, 2013). By using Amazon Elastic Compute Cloud (EC2) systems, DARPA will use ordinary... through standard access methods. Those clients could be mobile phones, laptops, netbooks, tablet computers or personal digital assistants (PDAs) (Smoot
Estuarine wetland evolution including sea-level rise and infrastructure effects.
NASA Astrophysics Data System (ADS)
Rodriguez, Jose Fernando; Trivisonno, Franco; Rojas, Steven Sandi; Riccardi, Gerardo; Stenta, Hernan; Saco, Patricia Mabel
2015-04-01
Estuarine wetlands are an extremely valuable resource in terms of biotic diversity, flood attenuation, storm surge protection, groundwater recharge, filtering of surface flows and carbon sequestration. On a large scale the survival of these systems depends on the slope of the land and a balance between the rates of accretion and sea-level rise, but local man-made flow disturbances can have comparable effects. Climate change predictions for most of Australia include an accelerated sea level rise, which may challenge the survival of estuarine wetlands. Furthermore, coastal infrastructure poses an additional constraint on the adaptive capacity of these ecosystems. Numerical models are increasingly being used to assess wetland dynamics and to help manage some of these situations. We present results of a wetland evolution model that is based on computed values of hydroperiod and tidal range that drive vegetation preference. Our first application simulates the long term evolution of an Australian wetland heavily constricted by infrastructure that is undergoing the effects of predicted accelerated sea level rise. The wetland presents a vegetation zonation sequence mudflats - mangrove - saltmarsh from the seaward margin and up the topographic gradient but is also affected by compartmentalization due to internal road embankments and culverts that effectively attenuates tidal input to the upstream compartments. For this reason, the evolution model includes a 2D hydrodynamic module which is able to handle man-made flow controls and spatially varying roughness. It continually simulates tidal inputs into the wetland and computes annual values of hydroperiod and tidal range to update vegetation distribution based on preference to hydrodynamic conditions of the different vegetation types. It also computes soil accretion rates and updates roughness coefficient values according to evolving vegetation types. In order to explore in more detail the magnitude of flow attenuation due to roughness and its effects on the computation of tidal range and hydroperiod, we performed numerical experiments simulating floodplain flow on the side of a tidal creek using different roughness values. Even though the values of roughness that produce appreciable changes in hydroperiod and tidal range are relatively high, they are within the range expected for some of the wetland vegetation. Both applications of the model show that flow attenuation can play a major role in wetland hydrodynamics and that its effects must be considered when predicting wetland evolution under climate change scenarios, particularly in situations where existing infrastructure affects the flow.
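The annual update cycle described in this abstract (hydrodynamic simulation, hydroperiod and tidal range computation, vegetation reassignment, then accretion and roughness updates) lends itself to a compact sketch. The loop below is a toy illustration under invented thresholds, roughness and accretion values; it is not the authors' calibrated model, and the hydrodynamic step is reduced to a placeholder.

```python
import numpy as np

def classify_vegetation(hydroperiod, tidal_range):
    """Assign a vegetation type per cell from hydrodynamic preference
    (illustrative thresholds, not the paper's calibrated ones)."""
    veg = np.full(hydroperiod.shape, "saltmarsh", dtype=object)
    veg[hydroperiod > 0.8] = "mudflat"                      # nearly always wet
    veg[(hydroperiod > 0.3) & (hydroperiod <= 0.8)] = "mangrove"
    return veg

ROUGHNESS = {"mudflat": 0.02, "mangrove": 0.15, "saltmarsh": 0.08}   # assumed n
ACCRETION = {"mudflat": 0.002, "mangrove": 0.005, "saltmarsh": 0.003}  # m/yr, assumed

def simulate_year(elevation, roughness, sea_level):
    """Placeholder for the 2D hydrodynamic module: returns per-cell
    hydroperiod (fraction of time wet) and tidal range."""
    depth = np.clip(sea_level - elevation, 0.0, None)
    hydroperiod = np.tanh(depth / 0.5)            # toy depth-hydroperiod law
    tidal_range = np.maximum(1.0 - roughness, 0)  # toy roughness attenuation
    return hydroperiod, tidal_range

elevation = np.random.uniform(-0.5, 1.0, size=(50, 50))
roughness = np.full(elevation.shape, 0.05)
for year in range(100):
    sea_level = 0.005 * year                       # prescribed rise, m/yr
    hp, tr = simulate_year(elevation, roughness, sea_level)
    veg = classify_vegetation(hp, tr)
    elevation += np.vectorize(ACCRETION.get)(veg)  # soil accretion update
    roughness = np.vectorize(ROUGHNESS.get)(veg)   # roughness follows vegetation
```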
Doing Your Science While You're in Orbit
NASA Astrophysics Data System (ADS)
Green, Mark L.; Miller, Stephen D.; Vazhkudai, Sudharshan S.; Trater, James R.
2010-11-01
Large-scale neutron facilities such as the Spallation Neutron Source (SNS) located at Oak Ridge National Laboratory need easy-to-use access to Department of Energy Leadership Computing Facilities and experiment repository data. The Orbiter thick- and thin-client and its supporting Service Oriented Architecture (SOA) based services (available at https://orbiter.sns.gov) consist of standards-based components that are reusable and extensible for accessing high performance computing, data and computational grid infrastructure, and cluster-based resources easily from a user-configurable interface. The primary Orbiter system goals are (1) developing infrastructure for the creation and automation of virtual instrumentation experiment optimization, (2) developing user interfaces for thin- and thick-client access, (3) providing a prototype incorporating major instrument simulation packages, and (4) facilitating neutron science community access and collaboration. Secure Orbiter SOA authentication and authorization is achieved through the developed Virtual File System (VFS) services, which use Role-Based Access Control (RBAC) for data repository file access, thin- and thick-client functionality and application access, and computational job workflow management. The VFS Relational Database Management System (RDMS) consists of approximately 45 database tables describing 498 user accounts with 495 groups over 432,000 directories with 904,077 repository files. Over 59 million NeXus file metadata records are associated with the 12,800 unique NeXus file field/class names generated from the 52,824 repository NeXus files. Services are currently available that enable (a) summary dashboards of data repository status with Quality of Service (QoS) metrics, (b) data repository NeXus file field/class name full-text search capabilities within a Google-like interface, (c) a fully functional RBAC browser for the read-only data repository and shared areas, (d) user/group defined and shared metadata for data repository files, and (e) user, group, repository, and web 2.0 based global positioning with additional service capabilities. The SNS-based Orbiter SOA integration progress with the Distributed Data Analysis for Neutron Scattering Experiments (DANSE) software development project is summarized with an emphasis on DANSE Central Services and the Virtual Neutron Facility (VNF). Additionally, the DANSE utilization of the Orbiter SOA authentication, authorization, and data transfer services best-practice implementations is presented.
NASA Astrophysics Data System (ADS)
McKee, Shawn; Kissel, Ezra; Meekhof, Benjeman; Swany, Martin; Miller, Charles; Gregorowicz, Michael
2017-10-01
We report on the first year of the OSiRIS project (NSF Award #1541335; UM, IU, MSU and WSU), which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The project's goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to read, write, manage and share data directly from their own computing locations. The NSF CC*DNI DIBBS program, which funded OSiRIS, is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data, and we are exploring the creative use of Ceph and networking to address those challenges. While OSiRIS will eventually be serving a broad range of science domains, its first adopter will be the LHC ATLAS detector project via the ATLAS Great Lakes Tier-2 (AGLT2) jointly located at the University of Michigan and Michigan State University. Part of our presentation will cover how ATLAS is using the OSiRIS infrastructure and our experiences integrating our first user community. The presentation will also review the motivations for and goals of the project, the technical details of the OSiRIS infrastructure, the challenges in providing such an infrastructure, and the technical choices made to address those challenges. We will conclude with our plans for the remaining 4 years of the project and our vision for what we hope to deliver by the project's end.
Data Acquisition and Mass Storage
NASA Astrophysics Data System (ADS)
Vande Vyvre, P.
2004-08-01
The experiments performed at supercolliders will constitute a new challenge in several disciplines of High Energy Physics and Information Technology. This will definitely be the case for data acquisition and mass storage. The microelectronics, communication, and computing industries are maintaining an exponential increase of the performance of their products. The market of commodity products remains the largest and the most competitive market of technology products. This constitutes a strong incentive to use these commodity products extensively as components to build the data acquisition and computing infrastructures of the future generation of experiments. The present generation of experiments in Europe and in the US already constitutes an important step in this direction. The experience acquired in the design and the construction of the present experiments has to be complemented by a large R&D effort executed with good awareness of industry developments. The future experiments will also be expected to follow major trends of our present world: deliver physics results faster and become more and more visible and accessible. The present evolution of the technologies and the burgeoning of GRID projects indicate that these trends will be made possible. This paper includes a brief overview of the technologies currently used for the different tasks of the experimental data chain: data acquisition, selection, storage, processing, and analysis. The major trends of the computing and networking technologies are then indicated with particular attention paid to their influence on the future experiments. Finally, the vision of future data acquisition and processing systems and their promise for future supercolliders is presented.
Scaling the CERN OpenStack cloud
NASA Astrophysics Data System (ADS)
Bell, T.; Bompastor, B.; Bukowiec, S.; Castro Leon, J.; Denis, M. K.; van Eldik, J.; Fermin Lobo, M.; Fernandez Alvarez, L.; Fernandez Rodriguez, D.; Marino, A.; Moreira, B.; Noel, B.; Oulevey, T.; Takase, W.; Wiebalck, A.; Zilli, S.
2015-12-01
CERN has been running a production OpenStack cloud since July 2013 to support physics computing and infrastructure services for the site. In the past year, CERN Cloud Infrastructure has seen a constant increase in nodes, virtual machines, users and projects. This paper will present what has been done in order to make the CERN cloud infrastructure scale out.
Science of Security Lablet - Scalability and Usability
2014-12-16
mobile computing [19]. However, the high-level infrastructure design and our own implementation (both described throughout this paper) can easily... critical and infrastructural systems demands high levels of sophistication in the technical aspects of cybersecurity, software and hardware design... Forget, S. Komanduri, Alessandro Acquisti, Nicolas Christin, Lorrie Cranor, Rahul Telang. "Security Behavior Observatory: Infrastructure for Long-term
DOE Office of Scientific and Technical Information (OSTI.GOV)
Armstrong, Robert C.; Ray, Jaideep; Malony, A.
2003-11-01
We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high performance computing environment. We explore issues peculiar to component-based HPC applications and propose a performance measurement infrastructure for HPC based loosely on recent work done for Grid environments. A prototypical implementation of the infrastructure is used to collect data for three components in a scientific application and construct performance models for two of them. Both computational and message-passing performance are addressed.
Role of Computational Fluid Dynamics and Wind Tunnels in Aeronautics R and D
NASA Technical Reports Server (NTRS)
Malik, Mujeeb R.; Bushnell, Dennis M.
2012-01-01
The purpose of this report is to investigate the status and future projections for the question of supplanting wind tunnels with computation in design, and to intuit the potential impact of computational approaches on wind-tunnel utilization, all with an eye toward reducing the infrastructure cost at aeronautics R&D centers. Wind tunnels have been closing for myriad reasons, and such closings have reduced infrastructure costs. Further cost reductions are desired, and the work herein attempts to project which wind-tunnel capabilities can be replaced in the future and, if possible, the timing of such replacement. If the possibility exists to project when a facility could be closed, then maintenance and other associated costs could be rescheduled accordingly (i.e., before the fact) to obtain an even greater infrastructure cost reduction.
A Computational framework for telemedicine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.
1998-07-01
Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. High-speed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure for telemedicine. The Globus metacomputing toolkit can provide the necessary low-level mechanisms to enable a large-scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.
Information technology developments within the national biological information infrastructure
Cotter, G.; Frame, M.T.
2000-01-01
Looking out an office window or exploring a community park, one can easily see the tremendous challenges that biological information presents to the computer science community. Biological information varies in format and content depending on whether it pertains to a particular species (e.g. the Brown Tree Snake) or to a specific ecosystem, which often includes multiple species, land use characteristics, and geospatially referenced information. The complexity and uniqueness of each individual species or ecosystem do not easily lend themselves to today's computer science tools and applications. To address the challenges that the biological enterprise presents, the National Biological Information Infrastructure (NBII) (http://www.nbii.gov) was established in 1993. The NBII is designed to address these issues on a national scale within the United States, and through international partnerships abroad. This paper discusses current computer science efforts within the National Biological Information Infrastructure Program and future computer science research endeavors that are needed to address the ever-growing issues related to our Nation's biological concerns.
Use of agents to implement an integrated computing environment
NASA Technical Reports Server (NTRS)
Hale, Mark A.; Craig, James I.
1995-01-01
Integrated Product and Process Development (IPPD) embodies the simultaneous application of both system and quality engineering methods throughout an iterative design process. The use of IPPD results in the time-conscious, cost-saving development of engineering systems. To implement IPPD, a Decision-Based Design perspective is encapsulated in an approach that focuses on the role of the human designer in product development. The approach has two parts and is outlined in this paper. First, an architecture, called DREAMS, is being developed that facilitates design from a decision-based perspective. Second, a supporting computing infrastructure, called IMAGE, is being designed. Agents are used to implement the overall infrastructure on the computer. Successful agent utilization requires that they be made of three components: the resource, the model, and the wrap. Current work is focused on the development of generalized agent schemes and associated demonstration projects. When in place, the technology-independent computing infrastructure will aid the designer in systematically generating knowledge used to facilitate decision-making.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brower, Richard C.
This proposal is to develop the software and algorithmic infrastructure needed for the numerical study of quantum chromodynamics (QCD), and of theories that have been proposed to describe physics beyond the Standard Model (BSM) of high energy physics, on current and future computers. This infrastructure will enable users (1) to improve the accuracy of QCD calculations to the point where they no longer limit what can be learned from high-precision experiments that seek to test the Standard Model, and (2) to determine the predictions of BSM theories in order to understand which of them are consistent with the data that will soon be available from the LHC. Work will include the extension and optimizations of community codes for the next generation of leadership class computers, the IBM Blue Gene/Q and the Cray XE/XK, and for the dedicated hardware funded for our field by the Department of Energy. Members of our collaboration at Brookhaven National Laboratory and Columbia University worked on the design of the Blue Gene/Q, and have begun to develop software for it. Under this grant we will build upon their experience to produce high-efficiency production codes for this machine. Cray XE/XK computers with many thousands of GPU accelerators will soon be available, and the dedicated commodity clusters we obtain with DOE funding include growing numbers of GPUs. We will work with our partners in NVIDIA's Emerging Technology group to scale our existing software to thousands of GPUs, and to produce highly efficient production codes for these machines. Work under this grant will also include the development of new algorithms for the effective use of heterogeneous computers, and their integration into our codes. It will include improvements of Krylov solvers and the development of new multigrid methods in collaboration with members of the FASTMath SciDAC Institute, using their HYPRE framework, as well as work on improved symplectic integrators.
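The Krylov solvers mentioned above are iterative methods for large sparse linear systems. As a point of reference, here is a minimal conjugate-gradient solver in NumPy; it illustrates the class of algorithm only and is in no way the collaboration's production lattice QCD code, where the operator is a huge sparse Dirac matrix rather than a dense toy system.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal CG solver for a symmetric positive-definite A,
    illustrating the Krylov methods referred to above."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy usage on a small SPD system; lattice Dirac operators are vastly larger.
n = 100
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)   # construct an SPD matrix
b = np.random.rand(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))   # residual norm, should be ~1e-10
```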
Trans-Pacific Astronomy Experiment Project Status
NASA Technical Reports Server (NTRS)
Hsu, Eddie
2000-01-01
The Trans-Pacific Astronomy Experiment is Phase 2 of the Trans-Pacific High Data Rate Satcom Experiments, following the Trans-Pacific High Definition Video Experiment. It is part of the Global Information Infrastructure-Global Interoperability for Broadband Networks (GII-GIBN) Project, which provides a global information infrastructure involving broadband satellites and terrestrial networks and access to information by anyone, anywhere, at any time. A collaboration of government, industry, and academic organizations demonstrates the use of broadband satellite links in a global information infrastructure, with emphasis on astronomical observations, collaborative discussions and distance learning.
EGI-EUDAT integration activity - Pair data and high-throughput computing resources together
NASA Astrophysics Data System (ADS)
Scardaci, Diego; Viljoen, Matthew; Vitlacil, Dejan; Fiameni, Giuseppe; Chen, Yin; Sipos, Gergely; Ferrari, Tiziana
2016-04-01
EGI (www.egi.eu) is a publicly funded e-infrastructure put together to give scientists access to more than 530,000 logical CPUs, 200 PB of disk capacity and 300 PB of tape storage to drive research and innovation in Europe. The infrastructure provides both high throughput computing and cloud compute/storage capabilities. Resources are provided by about 350 resource centres which are distributed across 56 countries in Europe, the Asia-Pacific region, Canada and Latin America. EUDAT (www.eudat.eu) is a collaborative pan-European infrastructure providing research data services, training and consultancy for researchers, research communities, research infrastructures and data centres. EUDAT's vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres, combining the richness of numerous community-specific data repositories with the permanence and persistence of some of Europe's largest scientific data centres. In the context of their flagship projects, EGI-Engage and EUDAT2020, EGI and EUDAT started a collaboration in March 2015 to harmonise the two infrastructures, including technical interoperability, authentication, authorisation and identity management, policy and operations. The main objective of this work is to provide end users with seamless access to an integrated infrastructure offering both EGI and EUDAT services, thereby pairing data and high-throughput computing resources together. To define the roadmap of this collaboration, EGI and EUDAT selected a set of relevant user communities, already collaborating with both infrastructures, which could bring requirements and help to assign the right priorities to each of them. In this way, from the beginning, this activity has been driven by the end users. The identified user communities are relevant European research infrastructures in the fields of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D). The first outcome of this activity has been the definition of a generic use case that captures the typical user scenario with respect to the integrated use of the EGI and EUDAT infrastructures. This generic use case allows a user to instantiate a set of virtual machine images on the EGI Federated Cloud to perform computational jobs that analyse data previously stored on EUDAT long-term storage systems. The results of such analysis can be staged back to EUDAT storage and, if needed, assigned persistent identifiers (PIDs) for future use. The implementation of this generic use case requires the following integration activities between EGI and EUDAT: (1) harmonisation of the user authentication and authorisation models, and (2) implementation of interface connectors between the relevant EGI and EUDAT services, particularly the EGI cloud compute facilities and the EUDAT long-term storage and PID systems. In the presentation, the collected user requirements and the implementation status of the generic use case will be shown. Furthermore, we will describe how the generic use case is currently being applied to satisfy EPOS and ICOS needs.
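The generic use case above has a simple shape: stage data out of long-term storage, compute on cloud VMs, stage results back and mint PIDs. The sketch below captures that control flow only; the client objects and every method name (fetch, start_vm, run_job, assign_pid, etc.) are hypothetical placeholders, not the actual EGI or EUDAT service APIs.

```python
# Sketch of the generic EGI-EUDAT use case. All client interfaces here are
# invented for illustration; the real infrastructures expose their own
# cloud, storage and PID services.

def run_generic_use_case(cloud, storage, image, input_pids):
    # 1. Stage input data out of EUDAT long-term storage by PID.
    local_inputs = [storage.fetch(pid) for pid in input_pids]

    # 2. Instantiate a VM on the EGI Federated Cloud and run the analysis.
    vm = cloud.start_vm(image=image, cores=8, memory_gb=16)
    try:
        results = vm.run_job("analyse.sh", files=local_inputs)
    finally:
        vm.terminate()  # release cloud resources whatever happens

    # 3. Stage results back to EUDAT and assign persistent identifiers.
    pids = []
    for result in results:
        location = storage.store(result)
        pids.append(storage.assign_pid(location))
    return pids
```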
Experience of public procurement of Open Compute servers
NASA Astrophysics Data System (ADS)
Bärring, Olof; Guerri, Marco; Bonfillou, Eric; Valsan, Liviu; Grigore, Alexandru; Dore, Vincent; Gentit, Alain; Clement, Benoît; Grossir, Anthony
2015-12-01
The Open Compute Project (OCP, http://www.opencompute.org/) was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at the lowest possible cost. The technologies are released as open hardware, with the goal to develop servers and data centres following the model traditionally associated with open source software projects. In 2013 CERN acquired a few OCP servers in order to compare performance and power consumption with standard hardware. The conclusions were that there are sufficient savings to motivate an attempt to procure a large scale installation. One objective is to evaluate if the OCP market is sufficiently mature and broad enough to meet the constraints of a public procurement. This paper summarizes this procurement, which started in September 2014 and involved a Request for Information (RFI) to qualify bidders and a Request for Tender (RFT).
A Big Data Platform for Storing, Accessing, Mining and Learning Geospatial Data
NASA Astrophysics Data System (ADS)
Yang, C. P.; Bambacus, M.; Duffy, D.; Little, M. M.
2017-12-01
Big Data is becoming a norm in geoscience domains. A platform that is capable of efficiently managing, accessing, analyzing, mining, and learning from big data to extract new information and knowledge is desired. This paper introduces our latest effort on developing such a platform, based on our past years' experience with cloud and high performance computing, analyzing big data, comparing big data containers, and mining big geospatial data for new information. The platform includes four layers: a) the bottom layer is a computing infrastructure with proper network, computer, and storage systems; b) the second layer is a cloud computing layer, based on virtualization, that provides on-demand computing services for upper layers; c) the third layer consists of big data containers that are customized for dealing with different types of data and functionalities; d) the fourth layer is a big data presentation layer that supports the efficient management, access, analysis, mining and learning of big geospatial data.
Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures.
Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B
2013-01-01
A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
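Comparisons like the one described require recording peak memory alongside execution time for each assembler run. One way to do this on Linux is sketched below using Python's resource module; the assembler command line is a placeholder to be replaced with the actual tool invocation, and ru_maxrss units (kilobytes on Linux) are platform-dependent.

```python
import resource
import subprocess
import time

def run_and_measure(cmd):
    """Run a command and report wall time and the peak resident set size
    of waited-for child processes (Linux reports ru_maxrss in kilobytes)."""
    start = time.time()
    subprocess.run(cmd, check=True)
    elapsed = time.time() - start
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_kb / (1024 * 1024)  # convert KB -> GiB

if __name__ == "__main__":
    # Placeholder command: substitute the real assembler invocation here.
    elapsed, peak_gib = run_and_measure(["./assembler", "reads.fastq"])
    print(f"wall time: {elapsed:.1f} s, peak memory: {peak_gib:.2f} GiB")
```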
A Security Monitoring Framework For Virtualization Based HEP Infrastructures
NASA Astrophysics Data System (ADS)
Gomez Ramirez, A.; Martinez Pedreira, M.; Grigoras, C.; Betev, L.; Lara, C.; Kebschull, U.;
2017-10-01
High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequences of system calls for detecting anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that accomplishes these requirements, with a proof-of-concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves the security by isolating services and jobs without a significant performance impact. We also describe a dataset collected for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production jobs running in an ALICE Grid test site, together with a large set of malware samples. This malware set was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious jobs.
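One common way to flag anomalous jobs from resource-consumption measurements of the kind listed above is an isolation-forest detector. The sketch below uses scikit-learn's IsolationForest on synthetic stand-in data; the feature set, distributions and contamination rate are invented for illustration, not taken from the ALICE dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic per-job measurements: CPU %, RAM MB, network KB/s (invented).
normal_jobs = rng.normal(loc=[50, 2000, 300], scale=[10, 400, 80], size=(500, 3))
suspect_jobs = rng.normal(loc=[95, 6000, 4000], scale=[3, 500, 500], size=(5, 3))

# Train on presumed-benign production jobs.
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(normal_jobs)

# predict() returns -1 for an anomaly and +1 for a normal job.
labels = model.predict(np.vstack([normal_jobs[:5], suspect_jobs]))
print(labels)
```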
Integration of XRootD into the cloud infrastructure for ALICE data analysis
NASA Astrophysics Data System (ADS)
Kompaniets, Mikhail; Shadura, Oksana; Svirin, Pavlo; Yurchenko, Volodymyr; Zarochentsev, Andrey
2015-12-01
Cloud technologies allow easy load balancing between different tasks and projects. From the viewpoint of data analysis in the ALICE experiment, the cloud makes it possible to deploy software using the CERN Virtual Machine (CernVM) and the CernVM File System (CVMFS), to run different (including outdated) versions of software for long-term data preservation, and to dynamically allocate resources for different computing activities, e.g. a grid site, the ALICE Analysis Facility (AAF), and possible usage for local projects or other LHC experiments. We present a cloud solution for Tier-3 sites based on OpenStack and Ceph distributed storage with an integrated XRootD-based storage element (SE). One of the key features of the solution is that Ceph is used as a backend for the Cinder Block Storage service of OpenStack and, at the same time, as a storage backend for XRootD, with redundancy and availability of data preserved by the Ceph settings. For faster and easier OpenStack deployment, the Packstack solution was applied, which is based on the Puppet configuration management system. Ceph installation and configuration operations are structured, converted to Puppet manifests describing node configurations, and integrated into Packstack. This solution can be easily deployed, maintained and used even by small groups with limited computing resources and small organizations, which usually lack IT support. The proposed infrastructure has been tested on two different clouds (SPbSU and BITP) and integrates successfully with the ALICE data analysis model.
Operational flood control of a low-lying delta system using large time step Model Predictive Control
NASA Astrophysics Data System (ADS)
Tian, Xin; van Overloop, Peter-Jules; Negenborn, Rudy R.; van de Giesen, Nick
2015-01-01
The safety of low-lying deltas is threatened not only by riverine flooding but by storm-induced coastal flooding as well. For the purpose of flood control, these deltas are mostly protected in a man-made environment, where dikes, dams and other adjustable infrastructures, such as gates, barriers and pumps, are widely constructed. Instead of always reinforcing and heightening these structures, it is worth considering making the most of the existing infrastructure to reduce damage and manage the delta in an operational, integrated way. In this study, an advanced real-time control approach, Model Predictive Control (MPC), is proposed to operate these structures in the Dutch delta system (the Rhine-Meuse delta). The application covers non-linearity in the dynamic behavior of the water system and the structures. To deal with the non-linearity, a linearization scheme is applied which directly uses the gate height, instead of the structure flow, as the control variable. Given the fact that MPC needs to compute control actions in real time, we address issues regarding computational time. A new large time step scheme is proposed in order to save computation time, in which different control variables can have different control time steps. Simulation experiments demonstrate that Model Predictive Control with the large time step setting is able to control a delta system better and much more efficiently than the conventional operational schemes.
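The large-time-step idea (some control variables updated less often than the simulation step) can be illustrated with a schematic receding-horizon loop for a single controlled water level. Everything below is a toy stand-in under invented parameters: a one-reach water balance, a quadratic tracking cost, and a coarse control held constant over several simulation steps; it is not the paper's Rhine-Meuse model.

```python
import numpy as np
from scipy.optimize import minimize

# Toy single-reach water balance: level[k+1] = level[k] + dt/A * (inflow - u)
A_STORAGE = 1.0e6       # storage area, m^2 (assumed)
DT = 600.0              # simulation step, s
CONTROL_STEPS = 4       # one control value spans 4 simulation steps
HORIZON = 12            # prediction horizon in simulation steps
TARGET = 0.0            # target water level, m

def predict(level, inflows, u_coarse):
    """Roll the toy dynamics forward; each coarse control value is
    held constant over CONTROL_STEPS simulation steps."""
    levels = []
    for k in range(HORIZON):
        u = u_coarse[k // CONTROL_STEPS]
        level = level + DT / A_STORAGE * (inflows[k] - u)
        levels.append(level)
    return np.array(levels)

def cost(u_coarse, level, inflows):
    levels = predict(level, inflows, u_coarse)
    return np.sum((levels - TARGET) ** 2) + 1e-8 * np.sum(u_coarse ** 2)

level = 0.3                       # initial deviation from target, m
inflows = np.full(HORIZON, 50.0)  # forecast inflow, m^3/s
n_coarse = HORIZON // CONTROL_STEPS
res = minimize(cost, np.zeros(n_coarse), args=(level, inflows),
               bounds=[(0.0, 200.0)] * n_coarse)
print("coarse control sequence (m^3/s):", res.x)
```

Holding each control value over several simulation steps shrinks the optimization problem, which is the computational saving the large time step scheme is after.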
High-throughput neuroimaging-genetics computational infrastructure
Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Hobel, Sam; Vespa, Paul; Woo Moon, Seok; Van Horn, John D.; Franco, Joseph; Toga, Arthur W.
2014-01-01
Many contemporary neuroscientific investigations face significant challenges in terms of data management, computational processing, data mining, and results interpretation. These four pillars define the core infrastructure necessary to plan, organize, orchestrate, validate, and disseminate novel scientific methods, computational resources, and translational healthcare findings. Data management includes protocols for data acquisition, archival, query, transfer, retrieval, and aggregation. Computational processing involves the necessary software, hardware, and networking infrastructure required to handle large amounts of heterogeneous neuroimaging, genetics, clinical, and phenotypic data and meta-data. Data mining refers to the process of automatically extracting data features, characteristics and associations, which are not readily visible by human exploration of the raw dataset. Result interpretation includes scientific visualization, community validation of findings and reproducible findings. In this manuscript we describe the novel high-throughput neuroimaging-genetics computational infrastructure available at the Institute for Neuroimaging and Informatics (INI) and the Laboratory of Neuro Imaging (LONI) at the University of Southern California (USC). INI and LONI include ultra-high-field and standard-field MRI brain scanners along with an imaging-genetics database for storing the complete provenance of the raw and derived data and meta-data. In addition, the institute provides a large number of software tools for image and shape analysis, mathematical modeling, genomic sequence processing, and scientific visualization. A unique feature of this architecture is the Pipeline environment, which integrates the data management, processing, transfer, and visualization. Through its client-server architecture, the Pipeline environment provides a graphical user interface for designing, executing, monitoring, validating, and disseminating complex protocols that utilize diverse suites of software tools and web-services. These pipeline workflows are represented as portable XML objects which transfer the execution instructions and user specifications from the client user machine to remote pipeline servers for distributed computing. Using Alzheimer's and Parkinson's data, we provide several examples of translational applications using this infrastructure. PMID:24795619
Networking for large-scale science: infrastructure, provisioning, transport and application mapping
NASA Astrophysics Data System (ADS)
Rao, Nageswara S.; Carter, Steven M.; Wu, Qishi; Wing, William R.; Zhu, Mengxia; Mezzacappa, Anthony; Veeraraghavan, Malathi; Blondin, John M.
2005-01-01
Large-scale science computations and experiments require unprecedented network capabilities in the form of large bandwidth and dynamically stable connections to support data transfers, interactive visualizations, and monitoring and steering operations. A number of component technologies dealing with the infrastructure, provisioning, transport and application mappings must be developed and/or optimized to achieve these capabilities. We present a brief account of the following technologies that contribute toward achieving these network capabilities: (a) DOE UltraScienceNet and NSF CHEETAH network testbeds that provide on-demand and scheduled dedicated network connections; (b) experimental results on transport protocols that achieve close to 100% utilization on dedicated 1 Gbps wide-area channels; (c) a scheme for optimally mapping a visualization pipeline onto a network to minimize the end-to-end delays; and (d) interconnect configurations and protocols that provide multiple Gbps flows from a Cray X1 to external hosts.
XRootD popularity on hadoop clusters
NASA Astrophysics Data System (ADS)
Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico;
2017-10-01
Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized for running fast arbitrary queries; this offers the ability to test several MapReduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.
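The popularity aggregation at the heart of such a platform has a natural map-reduce form: map each access record to a (dataset, 1) pair, then reduce by summing counts per dataset. A pure-Python sketch follows; the record layout (dataset, user, bytes read) is an assumption for illustration, as real CMS monitoring records carry many more fields and the production query runs on the Hadoop cluster, not in-process.

```python
from collections import Counter
from functools import reduce

# Assumed record layout: (dataset, user, bytes_read).
records = [
    ("/store/data/Run2012A", "alice", 2_000_000),
    ("/store/mc/Summer12", "bob", 500_000),
    ("/store/data/Run2012A", "carol", 1_200_000),
]

def mapper(record):
    dataset, _user, _nbytes = record
    return Counter({dataset: 1})          # emit (dataset, 1)

def reducer(acc, partial):
    acc.update(partial)                   # sum counts per dataset
    return acc

popularity = reduce(reducer, map(mapper, records), Counter())
print(popularity.most_common())           # datasets ranked by access count
```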
Introduction of Cooperative Vehicle-to-Infrastructure Systems to Improve Speed Harmonization
DOT National Transportation Integrated Search
2016-03-01
This project executed a preliminary experiment of vehicle-to-infrastructure (V2I)-based speed harmonization in which speed guidance was communicated directly to vehicles. This experiment involved a set of micro-simulation experiments and a limited nu...
Code of Federal Regulations, 2012 CFR
2012-07-01
... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...
Code of Federal Regulations, 2013 CFR
2013-07-01
... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...
Code of Federal Regulations, 2014 CFR
2014-07-01
... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...
Dinov, Ivo D; Siegrist, Kyle; Pearl, Dennis K; Kalinin, Alexandr; Christou, Nicolas
2016-06-01
Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome, which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols.
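The modeling applications listed above (simulation, model-fitting, analytical properties, inter-distribution relations) can be reproduced programmatically; the sketch below does so with scipy.stats rather than the Distributome webapps themselves, with sample sizes and parameters chosen arbitrarily for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=3.0, size=1000)   # simulation

# Model-fitting: fit a candidate family to the simulated data.
shape, loc, scale = stats.gamma.fit(sample, floc=0)
print(f"fitted shape={shape:.2f}, scale={scale:.2f}")

# Analytical property versus its empirical counterpart.
print("analytical mean:", stats.gamma.mean(shape, loc=loc, scale=scale))
print("empirical mean:", sample.mean())

# Inter-distribution relation: a gamma with shape 1 is the exponential.
print(stats.gamma.pdf(2.0, a=1.0, scale=3.0),
      stats.expon.pdf(2.0, scale=3.0))   # identical densities
```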
The Liquid Argon Software Toolkit (LArSoft): Goals, Status and Plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pordes, Rush; Snider, Erica
LArSoft is a toolkit that provides a software infrastructure and algorithms for the simulation, reconstruction and analysis of events in Liquid Argon Time Projection Chambers (LArTPCs). It is used by the ArgoNeuT, LArIAT, MicroBooNE, DUNE (including 35ton prototype and ProtoDUNE) and SBND experiments. The LArSoft collaboration provides an environment for the development, use, and sharing of code across experiments. The ultimate goal is to develop fully automatic processes for reconstruction and analysis of LArTPC events. The toolkit is based on the art framework and has a well-defined architecture to interface to other packages, including to GEANT4 and GENIE simulation software and the Pandora software development kit for pattern recognition. It is designed to facilitate and support the evolution of algorithms including their transition to new computing platforms. The development of the toolkit is driven by the scientific stakeholders involved. The core infrastructure includes standard definitions of types and constants, means to input experiment geometries as well as meta and event-data in several formats, and relevant general utilities. Examples of algorithms experiments have contributed to date are: photon propagation; particle identification; hit finding; track finding and fitting; electromagnetic shower identification and reconstruction. We report on the status of the toolkit and plans for future work.
INFN-Pisa scientific computation environment (GRID, HPC and Interactive Analysis)
NASA Astrophysics Data System (ADS)
Arezzini, S.; Carboni, A.; Caruso, G.; Ciampa, A.; Coscetti, S.; Mazzoni, E.; Piras, S.
2014-06-01
The INFN-Pisa Tier2 infrastructure is described, optimized not only for GRID CPU and storage access, but also for a more interactive use of the resources, in order to provide good solutions for the final data analysis step. The data centre, equipped with about 6700 production cores, permits the use of modern analysis techniques realized via advanced statistical tools (like RooFit and RooStats) implemented in multicore systems. In particular, POSIX file storage access integrated with standard SRM access is provided. We therefore describe the unified storage infrastructure, based on GPFS and Xrootd, used both for the SRM data repository and for interactive POSIX access. Such a common infrastructure gives users transparent access to the Tier2 data for their interactive analysis. The organization of a specialized many-core CPU facility devoted to interactive analysis is also described, along with the login mechanism integrated with the INFN-AAI (national INFN infrastructure) to extend site access and use to a geographically distributed community. This infrastructure is also used as a national computing facility by the INFN theoretical community, enabling a synergic use of computing and storage resources. Our centre, initially developed for the HEP community, is now growing and also includes fully integrated HPC resources. In recent years a cluster facility (1000 cores, parallel use via InfiniBand connection) has been installed and managed, and we are now updating this facility to provide resources for all the intermediate-level HPC computing needs of the INFN theoretical national community.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saffer, Shelley
2014-12-01
This is the final report of DOE award DE-SC0001132, Advanced Artificial Science: the development of an artificial science and engineering research infrastructure to facilitate innovative computational modeling, analysis, and application to interdisciplinary areas of scientific investigation. This document describes the achievement of the goals and the resulting research made possible by this award.
Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments
Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.
2012-01-01
Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, demand constantly increasing computational capacity in order to process data and offer services to users. The nature of these applications implies the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSN infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621
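The redistribution idea above (low-demand tasks moved to idle low- and medium-resource nodes, heavy tasks kept on the high-performance facility) can be sketched as a greedy assignment. The node list, capacities and power figures below are invented for illustration and the greedy rule is a simplification, not the paper's actual heterogeneity- and application-aware technique.

```python
# Greedy sketch: assign each task to the node that would spend the least
# energy on it among those with spare capacity. All numbers are invented.

NODES = [
    {"name": "wsn-node-1", "capacity": 2.0, "watts_per_unit": 5.0},
    {"name": "wsn-node-2", "capacity": 3.0, "watts_per_unit": 6.0},
    {"name": "hpc", "capacity": 1000.0, "watts_per_unit": 40.0},
]

def assign(tasks):
    """Assign each (name, demand) task to the cheapest node with room."""
    load = {n["name"]: 0.0 for n in NODES}
    plan = {}
    for name, demand in sorted(tasks, key=lambda t: t[1]):  # light tasks first
        candidates = [n for n in NODES
                      if load[n["name"]] + demand <= n["capacity"]]
        best = min(candidates, key=lambda n: demand * n["watts_per_unit"])
        load[best["name"]] += demand
        plan[name] = best["name"]
    return plan

tasks = [("sensor-fusion", 0.5), ("alert-filter", 1.0), ("video-analytics", 200.0)]
print(assign(tasks))  # light tasks land on WSN nodes, the heavy one on HPC
```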
NASA Astrophysics Data System (ADS)
Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley
2015-04-01
The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections, in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that support the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large-volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment: a 1.2 PFlop supercomputer (Raijin), an HPC-class 3000-core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together, creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, providing a new infrastructure for future interdisciplinary research.
NAS infrastructure management system build 1.5 computer-human interface
DOT National Transportation Integrated Search
2001-01-01
Human factors engineers from the National Airspace System (NAS) Human Factors Branch (ACT-530) of the Federal Aviation Administration William J. Hughes Technical Center conducted an evaluation of the NAS Infrastructure Management System (NIMS) Build ...
Soga, Kenichi; Schooling, Jennifer
2016-08-06
Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optic sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors.
Tempest: GPU-CPU computing for high-throughput database spectral matching.
Milloy, Jeffrey A; Faherty, Brendan K; Gerber, Scott A
2012-07-06
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphics processing unit (GPU) to produce peptide spectral matches in a very high-throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and its comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
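The "Accelerated Score" is described as a computationally inexpensive dot product. A minimal sketch of that idea, assuming simple fixed-width m/z binning and cosine normalization (not Tempest's actual binning or normalization), might look like this:

```python
# Hedged sketch of a dot-product spectral similarity in the spirit of the
# "Accelerated Score" above; bin width and normalization are assumptions.
import numpy as np

def bin_spectrum(mz, intensity, bin_width=0.1, max_mz=2000.0):
    """Histogram peaks onto a fixed m/z grid so spectra become vectors."""
    bins = np.zeros(int(max_mz / bin_width))
    idx = np.clip(np.round(np.asarray(mz) / bin_width).astype(int),
                  0, len(bins) - 1)
    np.add.at(bins, idx, intensity)
    norm = np.linalg.norm(bins)
    return bins / norm if norm else bins

def accelerated_score(theo, exp):
    """Normalized dot product between binned spectra (cosine similarity)."""
    return float(np.dot(theo, exp))

theo = bin_spectrum([175.12, 303.18], [1.0, 1.0])        # theoretical peaks
exp  = bin_spectrum([175.11, 303.17, 400.2], [0.9, 0.8, 0.1])  # observed
print(accelerated_score(theo, exp))
```

In a GPU implementation, the same binned vectors would be scored for many candidate peptides in parallel; here the kernel is shown on the CPU for clarity.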
Grid computing technology for hydrological applications
NASA Astrophysics Data System (ADS)
Lecca, G.; Petitdidier, M.; Hluchy, L.; Ivanovic, M.; Kussul, N.; Ray, N.; Thieron, V.
2011-06-01
Advances in e-Infrastructure promise to revolutionize sensing systems and the way in which data are collected and assimilated, and complex water systems are simulated and visualized. According to the EU Infrastructure 2010 work-programme, data and compute infrastructures and their underlying technologies, whether oriented to tackle scientific challenges or complex problem solving in engineering, are expected to converge into so-called knowledge infrastructures, leading to more effective research, education and innovation in the next decade and beyond. Grid technology is recognized as a fundamental component of e-Infrastructures. Nevertheless, this emerging paradigm raises several topics, including data management, algorithm optimization, security, performance (speed, throughput, bandwidth, etc.), and scientific cooperation and collaboration issues, that require further examination to fully exploit it and to better inform future research policies. The paper illustrates the results of six different surface and subsurface hydrology applications that have been deployed on the Grid. All the applications aim to answer strong requirements from civil society at large, relating to natural and anthropogenic risks. Grid technology has been successfully tested to improve flood prediction, groundwater resources management and Black Sea hydrological survey, by providing large computing resources. It is also shown that Grid technology facilitates e-cooperation among partners by means of services for authentication and authorization, seamless access to distributed data sources, data protection and access rights, and standardization.
Transportation Infrastructure Design and Construction – Virtual Training Tools
DOT National Transportation Integrated Search
2003-09-01
This project will develop 3D interactive computer-training environments for a major element of transportation infrastructure : hot mix asphalt paving. These tools will include elements of hot mix design (including laboratory equipment) and constructi...
ERIC Educational Resources Information Center
National Inst. of Standards and Technology, Gaithersburg, MD.
An interconnection of computer networks, telecommunications services, and applications, the National Information Infrastructure (NII) can open up new vistas and profoundly change much of American life. This report explores some of the opportunities and obstacles to the use of the NII by people and organizations. The goal is to express how…
Integrating Network Management for Cloud Computing Services
2015-06-01
abstraction and system design. In this dissertation, we make three major contributions. We first propose to consolidate the traffic and infrastructure management... 1.3.1 Safe Datacenter Traffic/Infrastructure Management... 1.3.2 End-host/Network Cooperative Traffic Management... 1.3.3 Direct
Automating ATLAS Computing Operations using the Site Status Board
NASA Astrophysics Data System (ADS)
Andreeva, J.; Borrego Iglesias, C.; Campana, S.; Di Girolamo, A.; Dzhunov, I.; Espinal Curull, X.; Gayazov, S.; Magradze, E.; Nowotka, M.; Rinaldi, L.; Saiz, P.; Schovancova, J.; Stewart, G. A.; Wright, M.
2012-12-01
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities in the case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, the usability of a site from the ATLAS perspective is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will outline future plans.
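A minimal sketch of the metric-driven site exclusion that such a board enables is shown below; the metric names, window, and threshold are invented for illustration, and the real ATLAS sensors and alarm logic are considerably more involved.

```python
# Toy threshold-based site exclusion; history values are invented samples
# of a per-site efficiency metric of the kind the SSB aggregates.
def site_status(history, min_efficiency=0.8, window=6):
    """Average the last `window` efficiency samples for each site and
    flag sites that fall below the threshold for exclusion."""
    decisions = {}
    for site, samples in history.items():
        recent = samples[-window:]
        avg = sum(recent) / len(recent)
        decisions[site] = "online" if avg >= min_efficiency else "excluded"
    return decisions

history = {"SITE_A": [0.95, 0.97, 0.90, 0.99, 0.96, 0.98],
           "SITE_B": [0.92, 0.60, 0.40, 0.35, 0.50, 0.45]}
print(site_status(history))  # SITE_B would be automatically excluded
```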
A service brokering and recommendation mechanism for better selecting cloud services.
Gui, Zhipeng; Yang, Chaowei; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Yu, Manzhu; Sun, Min; Zhou, Nanyin; Jin, Baoxuan
2014-01-01
Cloud computing is becoming the new generation computing infrastructure, and many cloud vendors provide different types of cloud services. How to choose the best cloud services for specific applications is very challenging. Addressing this challenge requires balancing multiple factors, such as business demands, technologies, policies and preferences in addition to the computing requirements. This paper recommends a mechanism for selecting the best public cloud service at the levels of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). A systematic framework and associated workflow include cloud service filtration, solution generation, evaluation, and selection of public cloud services. Specifically, we propose the following: a hierarchical information model for integrating heterogeneous cloud information from different providers and a corresponding cloud information collecting mechanism; a cloud service classification model for categorizing and filtering cloud services and an application requirement schema for providing rules for creating application-specific configuration solutions; and a preference-aware solution evaluation model for evaluating and recommending solutions according to the preferences of application providers. To test the proposed framework and methodologies, a cloud service advisory tool prototype was developed, after which relevant experiments were conducted. The results show that the proposed system collects/updates/records the cloud information from multiple mainstream public cloud services in real time, generates feasible cloud configuration solutions according to user specifications and acceptable cost prediction, assesses solutions from multiple aspects (e.g., computing capability, potential cost and Service Level Agreement, SLA), offers rational recommendations based on user preferences and practical cloud provisioning, and visually presents and compares solutions through an interactive web Graphical User Interface (GUI).
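The preference-aware evaluation step can be pictured as a weighted multi-criteria ranking. The sketch below assumes pre-normalized criteria and illustrative weights; it is not the paper's actual evaluation schema.

```python
# Toy preference-aware ranking: score each candidate configuration as a
# weighted sum of normalized criteria. Criteria names and weights invented.
def rank_solutions(solutions, weights):
    """Return (score, name) pairs, best first; criteria are assumed to be
    normalized to [0, 1] with higher = better (cost pre-inverted)."""
    scored = [(sum(weights[c] * v for c, v in sol["criteria"].items()),
               sol["name"]) for sol in solutions]
    return sorted(scored, reverse=True)

weights = {"compute": 0.5, "cost": 0.3, "sla": 0.2}  # user preferences
solutions = [
    {"name": "provider-A.large",  "criteria": {"compute": 0.9, "cost": 0.4, "sla": 0.8}},
    {"name": "provider-B.medium", "criteria": {"compute": 0.6, "cost": 0.9, "sla": 0.7}},
]
print(rank_solutions(solutions, weights))
```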
Rapid prototyping of soil moisture estimates using the NASA Land Information System
NASA Astrophysics Data System (ADS)
Anantharaj, V.; Mostovoy, G.; Li, B.; Peters-Lidard, C.; Houser, P.; Moorhead, R.; Kumar, S.
2007-12-01
The Land Information System (LIS), developed at the NASA Goddard Space Flight Center, is a functional Land Data Assimilation System (LDAS) that incorporates a suite of land models in an interoperable computational framework. LIS has been integrated into a computational Rapid Prototyping Capabilities (RPC) infrastructure. LIS consists of a core, a number of community land models, data servers, and visualization systems, integrated in a high-performance computing environment. The land surface models (LSM) in LIS incorporate surface and atmospheric parameters of temperature, snow/water, vegetation, albedo, soil conditions, topography, and radiation. Many of these parameters are available from in-situ observations, numerical model analyses, and NASA, NOAA, and other remote sensing satellite platforms at various spatial and temporal resolutions. The computational resources available to LIS via the RPC infrastructure support e-Science experiments involving global land-atmosphere modeling studies at 1 km spatial resolution as well as regional studies at finer resolutions. The Noah Land Surface Model, available within LIS, is being used to rapidly prototype soil moisture estimates in order to evaluate the viability of other science applications for decision-making purposes. For example, LIS has been used to further extend the utility of the USDA Soil Climate Analysis Network of in-situ soil moisture observations. In addition, LIS also supports data assimilation capabilities that are used to assimilate remotely sensed soil moisture retrievals from the AMSR-E instrument onboard the Aqua satellite. The rapid prototyping of soil moisture estimates using LIS and their applications will be illustrated during the presentation.
Progress In Developing An In-Pile Acoustically Telemetered Sensor Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, James A.; Garrett, Steven L.; Heibel, Michael D.
2016-09-01
A salient grand challenge for a number of Department of Energy programs such as Fuel Cycle Research and Development (including Accident Tolerant Fuel research and the Transient Reactor Test Facility Restart experiments), Light Water Reactor Sustainability, and Advanced Reactor Technologies is to enhance our fundamental understanding of fuel and materials behavior under irradiation. Robust and accurate in-pile measurements will be instrumental in developing and validating a computationally predictive multi-scale understanding of nuclear fuel and materials. This sensing technology will enable the linking of fundamental micro-structural evolution mechanisms to the macroscopic degradation of fuels and materials. The in situ sensors and measurement systems will monitor local environmental parameters as well as characterize microstructure evolution during irradiation. One of the major roadblocks to developing practical, robust, and cost-effective in-pile sensor systems is instrument leads. If a wireless telemetry infrastructure can be developed for in-pile use, in-core measurements would become more attractive and effective. Thus, to be successful in accomplishing effective in-pile sensing and microstructure characterization, an interdisciplinary measurement infrastructure needs to be developed in parallel with the key sensing technology. For the discussion in this research, infrastructure is defined as the systems, technology, techniques, and algorithms that may be necessary in the delivery of beneficial and robust data from in-pile devices. The architecture of a system's infrastructure determines how well it operates and how flexible it is to meet future requirements. The limiting path for the effective deployment of the salient sensing technology will not be the sensors themselves but the infrastructure that is necessary to communicate data from in-pile to the outside world in a non-intrusive and reliable manner. This article gives a high-level overview of a promising telemetry infrastructure based on acoustic wireless transmission of data that is being developed and tested by INL, Penn State and Westinghouse.
ENES the European Network for Earth System modelling and its infrastructure projects IS-ENES
NASA Astrophysics Data System (ADS)
Guglielmo, Francesca; Joussaume, Sylvie; Parinet, Marie
2016-04-01
The scientific community working on climate modelling is organized within the European Network for Earth System modelling (ENES). In the past decade, several European university departments, research centres, meteorological services, computer centres, and industrial partners created ENES by signing a Memorandum of Understanding, with the purpose of working together and cooperating towards the further development of the network. As of 2015, the consortium counts 47 partners. The climate modelling community, and thus ENES, faces challenges that are both science-driven, i.e. analysing the full complexity of the Earth system to improve our understanding and prediction of climate change, and that have multi-faceted societal implications, since a better representation of climate change on regional scales leads to improved understanding and prediction of impacts and to the development and provision of climate services. ENES, promoting and endorsing projects and initiatives, helps develop and evaluate state-of-the-art climate and Earth system models, facilitates model inter-comparison studies, encourages exchanges of software and model results, and fosters the use of high-performance computing facilities dedicated to high-resolution multi-model experiments. ENES brings together public and private partners, integrates countries underrepresented in climate modelling studies, and reaches out to different user communities, thus enhancing European expertise and competitiveness. Given this need for sophisticated models, world-class high-performance computers, and state-of-the-art software solutions that make efficient use of models, data and hardware, a key role is played by the establishment and maintenance of a solid infrastructure that develops and provides services to the different user communities. ENES has investigated the infrastructural needs and has received funding from the EU FP7 programme for the IS-ENES (InfraStructure for ENES) phase I and II projects. We present here the case study of an existing network of institutions brought together toward common goals by a non-binding agreement, ENES, and of its two IS-ENES projects. The latter will be discussed in their double role: as a means to provide and/or maintain the actual infrastructure (hardware, software, skilled human resources, services) needed to achieve the ENES scientific goals set out in its strategy document, and as a way to give the network a structured way of working and of interacting with the extended community. The genesis and evolution of the network and the interaction between network and projects will also be analysed in terms of long-term sustainability.
Duarte, Afonso M. S.; Psomopoulos, Fotis E.; Blanchet, Christophe; Bonvin, Alexandre M. J. J.; Corpas, Manuel; Franc, Alain; Jimenez, Rafael C.; de Lucas, Jesus M.; Nyrönen, Tommi; Sipos, Gergely; Suhr, Stephanie B.
2015-01-01
With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.
GLIDE: a grid-based light-weight infrastructure for data-intensive environments
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Malek, Sam; Beckman, Nels; Mikic-Rakic, Marija; Medvidovic, Nenad; Chrichton, Daniel J.
2005-01-01
The promise of the grid is that it will enable public access and sharing of immense amounts of computational and data resources among dynamic coalitions of individuals and institutions. However, the current grid solutions make several limiting assumptions that curtail their widespread adoption. To address these limitations, we present GLIDE, a prototype light-weight, data-intensive middleware infrastructure that enables access to the robust data and computational power of the grid on DREAM platforms.
Spin-Off Successes of SETI Research at Berkeley
NASA Astrophysics Data System (ADS)
Douglas, K. A.; Anderson, D. P.; Bankay, R.; Chen, H.; Cobb, J.; Korpela, E. J.; Lebofsky, M.; Parsons, A.; von Korff, J.; Werthimer, D.
2009-12-01
Our group contributes to the Search for Extra-Terrestrial Intelligence (SETI) by developing and using world-class signal processing computers to analyze data collected on the Arecibo telescope. Although no patterned signal of extra-terrestrial origin has yet been detected, and the immediate prospects for making such a detection are highly uncertain, the SETI@home project has nonetheless proven the value of pursuing such research through its impact on the fields of distributed computing, real-time signal processing, and radio astronomy. The SETI@home project has spun off the Center for Astronomy Signal Processing and Electronics Research (CASPER) and the Berkeley Open Infrastructure for Network Computing (BOINC), both of which are responsible for catalyzing a smorgasbord of new research in scientific disciplines in countries around the world. Furthermore, the data collected and archived for the SETI@home project is proving valuable in data-mining experiments for mapping neutral galactic hydrogen and for detecting black-hole evaporation.
Lampa, Samuel; Alvarsson, Jonathan; Spjuth, Ola
2016-01-01
Predictive modelling in drug discovery is challenging to automate, as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data, or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems lack the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.
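For context, a minimal pipeline in plain Luigi (the system SciLuigi extends) looks roughly like the following; SciLuigi layers named in/out ports and workflow-level task wiring on top of tasks like these. The file names and the toy "training" step are placeholders, not part of SciLuigi itself.

```python
# Minimal two-task Luigi pipeline: a data-preparation task feeding a
# downstream task via file targets. Luigi resolves the dependency graph.
import luigi

class PrepareData(luigi.Task):
    def output(self):
        return luigi.LocalTarget("interactions.tsv")
    def run(self):
        with self.output().open("w") as f:
            f.write("compound\ttarget\taffinity\n")

class TrainModel(luigi.Task):
    def requires(self):
        return PrepareData()          # dependency declared inside the task
    def output(self):
        return luigi.LocalTarget("model.txt")
    def run(self):
        with self.input().open() as fin, self.output().open("w") as fout:
            fout.write(f"trained on {len(fin.readlines())} lines\n")

if __name__ == "__main__":
    luigi.build([TrainModel()], local_scheduler=True)
```

One motivation for SciLuigi noted in the paper is that plain Luigi hard-codes dependencies inside tasks (as in `requires()` above), which makes re-wiring workflows for cross-validation and parameter sweeps awkward.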
PRACE - The European HPC Infrastructure
NASA Astrophysics Data System (ADS)
Stadelmeyer, Peter
2014-05-01
The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high-impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. PRACE seeks to realize this mission by offering world-class computing and data management resources and services through a peer review process. This talk gives a general overview of PRACE and the PRACE research infrastructure (RI). PRACE is established as an international not-for-profit association, and the PRACE RI is a pan-European supercomputing infrastructure which offers access to computing and data management resources at partner sites distributed throughout Europe. Besides a short summary of the organization, history, and activities of PRACE, the talk explains how scientists and researchers from academia and industry around the world can access PRACE systems and which education and training activities PRACE offers. The overview also contains a selection of PRACE contributions to societal challenges and ongoing activities; examples of the latter include petascaling, an application benchmark suite, best-practice guides for the efficient use of key architectures, application enabling and scaling, new programming models, and industrial applications. The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels. The PRACE Research Infrastructure provides a persistent world-class high-performance computing service for scientists and researchers from academia and industry in Europe. The computer systems and their operations accessible through PRACE are provided by four PRACE members (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France). The Implementation Phase of PRACE receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreements RI-261557, RI-283493 and RI-312763. For more information, see www.prace-ri.eu
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu
Cooperative high-performance storage in the accelerated strategic computing initiative
NASA Technical Reports Server (NTRS)
Gary, Mark; Howard, Barry; Louis, Steve; Minuzzo, Kim; Seager, Mark
1996-01-01
The use and acceptance of new high-performance, parallel computing platforms will be impeded by the absence of an infrastructure capable of supporting orders-of-magnitude improvement in hierarchical storage and high-speed I/O (Input/Output). The distribution of these high-performance platforms and supporting infrastructures across a wide-area network further compounds this problem. We describe an architectural design and phased implementation plan for a distributed, Cooperative Storage Environment (CSE) to achieve the necessary performance, user transparency, site autonomy, communication, and security features needed to support the Accelerated Strategic Computing Initiative (ASCI). ASCI is a Department of Energy (DOE) program attempting to apply terascale platforms and Problem-Solving Environments (PSEs) toward real-world computational modeling and simulation problems. The ASCI mission must be carried out through a unified, multilaboratory effort, and will require highly secure, efficient access to vast amounts of data. The CSE provides a logically simple, geographically distributed, storage infrastructure of semi-autonomous cooperating sites to meet the strategic ASCI PSE goal of high-performance data storage and access at the user desktop.
Privacy and the National Information Infrastructure.
ERIC Educational Resources Information Center
Rotenberg, Marc
1994-01-01
Explains the work of Computer Professionals for Social Responsibility regarding privacy issues in the use of electronic networks; recommends principles that should be adopted for a National Information Infrastructure privacy code; discusses the need for public education; and suggests pertinent legislative proposals. (LRW)
Effecting IT infrastructure culture change: management by processes and metrics
NASA Technical Reports Server (NTRS)
Miller, R. L.
2001-01-01
This talk describes the processes and metrics used by the Jet Propulsion Laboratory to bring about the IT infrastructure culture change required to update and certify, as Y2K compliant, thousands of computers and millions of lines of code.
IEEE TRANSACTIONS ON CYBERNETICS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Craig R. RIeger; David H. Scheidt; William D. Smart
2014-11-01
Modern societies depend on complex and critical infrastructures for energy, transportation, sustenance, medical care, emergency response, and communications security. As computers, automation, and information technology (IT) have advanced, these technologies have been exploited to enhance the efficiency of operating the processes that make up these infrastructures.
A modular (almost) automatic set-up for elastic multi-tenants cloud (micro)infrastructures
NASA Astrophysics Data System (ADS)
Amoroso, A.; Astorino, F.; Bagnasco, S.; Balashov, N. A.; Bianchi, F.; Destefanis, M.; Lusso, S.; Maggiora, M.; Pellegrino, J.; Yan, L.; Yan, T.; Zhang, X.; Zhao, X.
2017-10-01
An auto-installing tool on a USB drive allows for quick and easy automatic deployment of OpenNebula-based cloud infrastructures remotely managed by a central VMDIRAC instance. A single team, at the main site of an HEP collaboration or elsewhere, can manage and run a relatively large network of federated (micro-)cloud infrastructures, making highly dynamic and elastic use of computing resources. Exploiting such an approach can lead to modular systems of cloud-bursting infrastructures addressing complex real-life scenarios.
The OSG open facility: A sharing ecosystem
Jayatilaka, B.; Levshina, T.; Rynge, M.; ...
2015-12-23
The Open Science Grid (OSG) ties together individual experiments' computing power, connecting their resources to create a large, robust computing grid. This computing infrastructure started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero. In the years since, the OSG has broadened its focus to also address the needs of other US researchers and has increased delivery of Distributed High Throughput Computing (DHTC) to users from a wide variety of disciplines via the OSG Open Facility. Presently, the Open Facility delivers about 100 million computing wall hours per year to researchers who are not already associated with the owners of the computing sites. This is accomplished primarily by harvesting and organizing the temporarily unused capacity (i.e., opportunistic cycles) from the sites in the OSG. Using these methods, OSG resource providers and scientists share computing hours with researchers in many other fields to enable their science, striving to make sure that this computing power is used with maximal efficiency. Furthermore, we believe that expanded access to DHTC is an essential tool for scientific innovation, and work continues on expanding this service.
Streaming support for data intensive cloud-based sequence analysis.
Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
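The core streaming idea, processing chunks of reads while later chunks are still in transit, can be sketched with a bounded queue connecting a transfer thread to a compute thread. The chunk source and the per-chunk analysis below are stand-ins for the real network transfer and NGS tools, not the elastream package's API.

```python
# Sketch of overlapping transfer with computation: a producer feeds read
# chunks into a bounded queue while a consumer analyses them immediately.
import queue, threading

def uploader(chunks, q):
    for chunk in chunks:          # stands in for network transfer
        q.put(chunk)              # blocks when the buffer is full
    q.put(None)                   # end-of-stream sentinel

def worker(q, results):
    while (chunk := q.get()) is not None:
        # toy per-chunk analysis: total bases in the chunk
        results.append(sum(len(read) for read in chunk))

chunks = [["ACGT" * 25] * 1000] * 4        # fake batches of 100 bp reads
q, results = queue.Queue(maxsize=2), []
t = threading.Thread(target=worker, args=(q, results))
t.start()
uploader(chunks, q)
t.join()
print(results)
```

The bounded queue is the key design choice: it caps memory while letting computation start as soon as the first chunk arrives, which is what mitigates the transfer-latency bottleneck described above.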
In-Use and Emerging Disruptive Technology Trends
2015-03-31
blog/establishing-zero-trust-infrastructure/ (accessed November 7, 2014) Mobile Thin Client End Points: In the early days of computing, the... companies are using their network infrastructure to break into the mobile broadband market. For example, Cablevision recently began providing a Wi-Fi... smartphones and mobile devices will be used within the Pentagon. A building-wide cellular infrastructure is not the answer to retrieving and sending
ERIC Educational Resources Information Center
Office of Science and Technology Policy, Washington, DC.
In this report, the National Information Infrastructure (NII) services issue is addressed, and activities to advance the development of NII services are recommended. The NII is envisioned to grow into a seamless web of communications networks, computers, databases, and consumer electronics that will put vast amounts of information at users'…
Dynamic interaction between vehicles and infrastructure experiment (DIVINE) : policy implications
DOT National Transportation Integrated Search
1998-01-01
The Dynamic Interaction between Vehicles and Infrastructure Experiment (DIVINE) Project provided scientific evidence of the dynamic effects of heavy vehicles and their suspension systems on pavements and bridges. These conclusions are detailed in the...
Dynamic interaction between vehicles and infrastructure experiment (DIVINE) : technical report
DOT National Transportation Integrated Search
1998-10-27
The Dynamic Interaction between Vehicles and Infrastructure Experiment (DIVINE) Project provides scientific evidence of the dynamic effects of heavy vehicles and their suspension systems on pavements and bridges in support of transport policy decisio...
Shrink-film microfluidic education modules: Complete devices within minutes.
Nguyen, Diep; McLane, Jolie; Lew, Valerie; Pegan, Jonathan; Khine, Michelle
2011-06-01
As advances in microfluidics continue to make contributions to diagnostics and life sciences, broader awareness of this expanding field becomes necessary. By leveraging low-cost microfabrication techniques that require no capital equipment or infrastructure, simple, accessible, and effective educational modules can be made available for a broad range of educational needs from middle school demonstrations to college laboratory classes. These modules demonstrate key microfluidic concepts such as diffusion and separation as well as "laboratory on-chip" applications including chemical reactions and biological assays. These modules are intended to provide an interdisciplinary hands-on experience, including chip design, fabrication of functional devices, and experiments at the microscale. Consequently, students will be able to conceptualize physics at small scales, gain experience in computer-aided design and microfabrication, and perform experiments, all in the context of addressing real-world challenges by making their own lab-on-chip devices.
Testing of SWMM Model’s LID Modules
EPA’s Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance to design and build multi-billion-dollar, multi-decade infrastructure upgrades. Since the 1970’s, EPA a...
Extensible Infrastructure for Browsing and Searching Abstracted Spacecraft Data
NASA Technical Reports Server (NTRS)
Wallick, Michael N.; Crockett, Thomas M.; Joswig, Joseph C.; Torres, Recaredo J.; Norris, Jeffrey S.; Fox, Jason M.; Powell, Mark W.; Mittman, David S.; Abramyan, Lucy; Shams, Khawaja S.;
2009-01-01
A computer program has been developed to provide a common interface for all space mission data, and allows different types of data to be displayed in the same context. This software provides an infrastructure for representing any type of mission data.
NASA Astrophysics Data System (ADS)
Slota, S.; Khalsa, S. J. S.
2015-12-01
Infrastructures are the result of systems, networks, and inter-networks that accrete, overlay and segment one another over time. As a result, working infrastructures represent a broad heterogeneity of elements - data types, computational resources, material substrates (computing hardware, physical infrastructure, labs, physical information resources, etc.), as well as organizational and social functions, which result in divergent outputs and goals. Cyberinfrastructure engineering often defaults to a separation of the social from the technical, with the result that the engineering succeeds only in limited ways or exposes unanticipated points of failure within the system. Studying the development of middleware intended to mediate interactions among systems within an earth systems science infrastructure exposes organizational, technical and standards-focused negotiations endemic to a fundamental trait of infrastructure: its characteristic invisibility in use. Intended to perform a core function within the EarthCube cyberinfrastructure, the development, governance and maintenance of an automated brokering system is a microcosm of large-scale infrastructural efforts. Points of potential system failure, regardless of the extent to which they are more social or more technical in nature, can be considered in terms of the reverse salient: a point of social and material configuration that momentarily lags behind the progress of an emerging or maturing infrastructure. The implementation of the BCube data broker has exposed reverse salients in regards to the overall EarthCube infrastructure (and the role of middleware brokering) in the form of organizational factors such as infrastructural alignment, maintenance and resilience; differing and incompatible practices of data discovery and evaluation among users and stakeholders; and a preponderance of local variations in the implementation of standards and authentication in data access. These issues are characterized by their role in increasing tension or friction among components that are on the path to convergence, and may help to predict otherwise-occluded endogenous points of failure or non-adoption in the infrastructure.
Parallel, distributed and GPU computing technologies in single-particle electron microscopy
Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger
2009-01-01
Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.
NASA Astrophysics Data System (ADS)
Fiore, Sandro; Płóciennik, Marcin; Doutriaux, Charles; Blanquer, Ignacio; Barbera, Roberto; Donvito, Giacinto; Williams, Dean N.; Anantharaj, Valentine; Salomoni, Davide D.; Aloisio, Giovanni
2017-04-01
In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and eco-systems where petabytes (PB) of data can be available and data can be distributed and/or replicated, such as the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, providing access to 2.5 PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). A case study on climate model intercomparison data analysis addressing several classes of multi-model experiments is being implemented in the context of the EU H2020 INDIGO-DataCloud project. Such experiments require the availability of large amounts of data (multi-terabyte order) related to the output of several climate model simulations as well as the exploitation of scientific data management tools for large-scale data analytics. More specifically, the talk discusses in detail a use case on precipitation trend analysis in terms of requirements, architectural design solution, and infrastructural implementation. The experiment has been tested and validated on CMIP5 datasets, in the context of a large-scale distributed testbed across the EU and US involving three ESGF sites (LLNL, ORNL, and CMCC) and one central orchestrator site (PSNC). The general "environment" of the case study relates to: (i) multi-model data analysis inter-comparison challenges; (ii) addressed on CMIP5 data; and (iii) made available through the IS-ENES/ESGF infrastructure. The added value of the solution proposed in the INDIGO-DataCloud project is summarized in the following: (i) it implements a different paradigm (from client-side to server-side); (ii) it intrinsically reduces data movement; (iii) it makes the end-user setup lightweight; (iv) it fosters re-usability (of data, final/intermediate products, workflows, sessions, etc.) since everything is managed on the server side; (v) it complements, extends and interoperates with the ESGF stack; (vi) it provides a "tool" for scientists to run multi-model experiments; and finally (vii) it can drastically reduce the time-to-solution for these experiments from weeks to hours. At the time this contribution is being written, the proposed testbed represents the first concrete implementation of a distributed multi-model experiment in the ESGF/CMIP context joining server-side and parallel processing, end-to-end workflow management and cloud computing. As opposed to the current scenario based on search & discovery, data download, and client-based data analysis, the INDIGO-DataCloud architectural solution described in this contribution addresses the scientific computing and analytics requirements by providing a paradigm shift based on server-side, high-performance big data frameworks jointly with two-level workflow management systems realized at the PaaS level via a cloud infrastructure.
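As a toy example of the kind of server-side reduction such an experiment pushes to the data, the sketch below fits a least-squares precipitation trend per grid cell over time; the array shapes and synthetic data are illustrative only and do not reflect the project's actual operators.

```python
# Toy server-side analytics kernel: per-grid-cell linear precipitation
# trend over a time axis, computed close to the data rather than after
# downloading it. Shapes and the synthetic field are made up.
import numpy as np

def precip_trend(precip, years):
    """precip: (time, lat, lon) array; returns a (lat, lon) slope map in
    precipitation units per year."""
    t, ny, nx = precip.shape
    flat = precip.reshape(t, ny * nx)
    slopes = np.polyfit(years, flat, deg=1)[0]   # row 0 holds the slopes
    return slopes.reshape(ny, nx)

years = np.arange(1950, 2006)
precip = np.random.default_rng(0).gamma(2.0, 1.0, size=(years.size, 18, 36))
print(precip_trend(precip, years).shape)         # (18, 36) trend map
```

Shipping only the small (lat, lon) trend map back to the user, rather than the full (time, lat, lon) cube, is the data-movement saving the server-side paradigm above is after.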
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schlicher, Bob G; Kulesz, James J; Abercrombie, Robert K
A principal tenet of the scientific method is that experiments must be repeatable; it relies on ceteris paribus (i.e., all other things being equal). As a scientific community involved in data sciences, we must investigate ways to establish an environment where experiments can be repeated. We can no longer allude to where the data comes from; we must add rigor to the data collection and management process from which our analysis is conducted. This paper describes a computing environment to support repeatable scientific big data experimentation on world-wide scientific literature, and recommends a system housed at the Oak Ridge National Laboratory in order to provide value to investigators from government agencies, academic institutions, and industry entities. The described computing environment also adheres to the recently instituted digital data management plan mandated by multiple US government agencies, which involves all stages of the digital data life cycle including capture, analysis, sharing, and preservation. It particularly focuses on the sharing and preservation of digital research data. The details of this computing environment are explained within the context of cloud services by the three-layer classification of Software as a Service, Platform as a Service, and Infrastructure as a Service.
Cloud Computing and Its Applications in GIS
NASA Astrophysics Data System (ADS)
Kang, Cao
2011-12-01
Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibility of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud-based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy-related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes them incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one-pixel-deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide-and-layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets.
It distinguishes two categories of commonly used data sets and designs corresponding data storage models for each. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the schema also takes advantage of the data compression characteristics of quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
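The subdivide-and-layer-wrap mechanics described in the second article can be sketched as follows; a trivial per-tile operation stands in for the actual distance computation, which in the dissertation wraps each tile with a one-pixel-deep layer of precomputed distance information before independent processing.

```python
# Sketch of tiling a raster into sub-images with a one-pixel halo of
# neighboring values, processing each tile independently, and
# reassembling the cores. The per-tile "processing" here is a placeholder.
import numpy as np

def tiles_with_halo(raster, tile=64):
    padded = np.pad(raster, 1, mode="edge")      # one-pixel wrap layer
    for i in range(0, raster.shape[0], tile):
        for j in range(0, raster.shape[1], tile):
            yield (i, j), padded[i:i + tile + 2, j:j + tile + 2]

def reassemble(shape, pieces):
    out = np.empty(shape)
    for (i, j), core in pieces:
        out[i:i + core.shape[0], j:j + core.shape[1]] = core
    return out

raster = np.random.default_rng(1).random((256, 256))
# Each tile could be shipped to a separate cloud node; here we just scale
# it locally and drop the halo afterwards.
pieces = [((i, j), block[1:-1, 1:-1] * 0.5)
          for (i, j), block in tiles_with_halo(raster)]
print(reassemble(raster.shape, pieces).shape)
```

Square tiles minimize the halo-to-core ratio, which is one way to read the dissertation's finding that near-square sub-images are computationally optimal.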
Analysis of Pervasive Mobile Ad Hoc Routing Protocols
NASA Astrophysics Data System (ADS)
Qadri, Nadia N.; Liotta, Antonio
Mobile ad hoc networks (MANETs) are a fundamental element of pervasive networks and therefore of pervasive systems that truly support pervasive computing, where users can communicate anywhere, anytime and on the fly. In fact, future advances in pervasive computing rely on advancements in mobile communication, which includes both infrastructure-based wireless networks and non-infrastructure-based MANETs. MANETs introduce a new communication paradigm which does not require a fixed infrastructure; they rely on wireless terminals for routing and transport services. Due to their highly dynamic topology, the absence of an established infrastructure for centralized administration, bandwidth-constrained wireless links, and limited resources, it is challenging to design an efficient and reliable routing protocol for MANETs. This chapter reviews the key studies carried out so far on the performance of mobile ad hoc routing protocols. We discuss performance issues and the metrics required for the evaluation of ad hoc routing protocols. This leads to a survey of existing work, which captures the performance of ad hoc routing algorithms and their behaviour from different perspectives, and highlights avenues for future research.
Open source system OpenVPN in a function of Virtual Private Network
NASA Astrophysics Data System (ADS)
Skendzic, A.; Kovacic, B.
2017-05-01
Use of Virtual Private Networks (VPNs) can establish a high level of security in network communication. VPN technology enables highly secure networking over distributed or public network infrastructure. VPNs apply their own security and management rules inside networks, and can be set up over different communication channels, such as the Internet or a separate ISP communication infrastructure. A VPN creates a secure communication channel over a public network between two endpoints (computers). OpenVPN is an open-source software product under the GNU General Public License (GPL) that can be used to establish VPN communication between two computers inside a business local network over public communication infrastructure. It uses dedicated security protocols and 256-bit encryption, and it is capable of traversing network address translators (NATs) and firewalls. It allows computers to authenticate each other using a pre-shared secret key, certificates, or a username and password. This work gives a review of VPN technology with special emphasis on OpenVPN. The paper also discusses the comparative and financial benefits of using open-source VPN software in a business environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hale, M.A.; Craig, J.I.
Integrated Product and Process Development (IPPD) embodies the simultaneous application of both systems and quality engineering methods throughout an iterative design process. The use of IPPD results in the time-conscious, cost-saving development of engineering systems. To implement IPPD, a Decision-Based Design perspective is encapsulated in an approach that focuses on the role of the human designer in product development. The approach has two parts and is outlined in this paper. First, an architecture, called DREAMS, is being developed that facilitates design from a decision-based perspective. Second, a supporting computing infrastructure, called IMAGE, is being designed. Agents are used to implement the overall infrastructure on the computer. Successful agent utilization requires that they be made of three components: the resource, the model, and the wrap. Current work is focused on the development of generalized agent schemes and associated demonstration projects. When in place, the technology-independent computing infrastructure will aid the designer in systematically generating knowledge used to facilitate decision-making.
Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds
NASA Astrophysics Data System (ADS)
Li, Rui; Chen, Lei; Li, Wen-Syan
Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for Internet-scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. The current Hadoop (like many other systems) requires users to configure the cloud infrastructure via programs and APIs, and such a configuration is fixed at runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works both for a single job and for a workload consisting of multiple jobs running concurrently, and aims at maximum throughput using a minimum set of processors.
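As a concrete illustration of the data-driven idea, the sketch below (plain Python; the inverse-throughput rule and the operator names are assumptions, not CloudWeaver's actual policy) re-balances a fixed processor pool toward the operators with the lowest observed throughput.

def rebalance(pool_size, throughput):
    # throughput: operator -> tuples/sec observed in the last window.
    # Give each operator a share inversely proportional to its throughput,
    # so bottleneck operators receive more processors.
    weights = {op: 1.0 / max(t, 1e-9) for op, t in throughput.items()}
    total = sum(weights.values())
    return {op: max(1, round(pool_size * w / total))
            for op, w in weights.items()}

print(rebalance(16, {"scan": 5000.0, "join": 800.0, "aggregate": 2500.0}))
# The slow join operator is granted the largest share of the 16 processors.

A runtime loop would call such a function at the end of each monitoring window and migrate processors accordingly, which is the adaptivity the chapter describes.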
Assessing the uptake of persistent identifiers by research infrastructure users
Maull, Keith E.
2017-01-01
Significant progress has been made in the past few years in the development of recommendations, policies, and procedures for creating and promoting citations to data sets, software, and other research infrastructures like computing facilities. Open questions remain, however, about the extent to which referencing practices of authors of scholarly publications are changing in ways desired by these initiatives. This paper uses four focused case studies to evaluate whether research infrastructures are being increasingly identified and referenced in the research literature via persistent citable identifiers. The findings of the case studies show that references to such resources are increasing, but that the patterns of these increases are variable. In addition, the study suggests that citation practices for data sets may change more slowly than citation practices for software and research facilities, due to the inertia of existing practices for referencing the use of data. Similarly, existing practices for acknowledging computing support may slow the adoption of formal citations for computing resources. PMID:28394907
Big-BOE: Fusing Spanish Official Gazette with Big Data Technology.
Basanta-Val, Pablo; Sánchez-Fernández, Luis
2018-06-01
The proliferation of new data sources, stemming from the adoption of open-data schemes, in combination with increasing computing capacity, has led to new types of analytics that process Internet-of-Things-scale data with low-cost engines that speed up data processing through parallel computing. In this context, the article presents an initiative, called BIG-Boletín Oficial del Estado (BOE), designed to process the Spanish official government gazette (BOE) with state-of-the-art processing engines, to reduce computation time and to offer additional speed-up for big data analysts. The goal of including a big data infrastructure is to be able to process different BOE documents in parallel with specific analytics, searching for several issues across different documents. The processing engine of the application infrastructure is described from an architectural perspective and in terms of performance, showing evidence of how this type of infrastructure improves the performance of several types of simple analytics as machines cooperate.
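A minimal sketch of the kind of parallel analytic described, written with PySpark under the assumption that the gazette is stored as one text file per BOE document; the paths and search terms are illustrative, not part of the BIG-BOE system itself.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("boe-search-sketch").getOrCreate()
# (path, full text) pairs, one per gazette document, scanned in parallel.
docs = spark.sparkContext.wholeTextFiles("hdfs:///boe/2018/*.txt")
terms = ["subvención", "contrato"]
hits = (docs.flatMap(lambda kv: [(t, kv[0]) for t in terms if t in kv[1]])
            .groupByKey()
            .mapValues(list))
for term, paths in hits.collect():
    print(term, "found in", len(paths), "documents")
spark.stop()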
NASA's Participation in the National Computational Grid
NASA Technical Reports Server (NTRS)
Feiereisen, William J.; Zornetzer, Steve F. (Technical Monitor)
1998-01-01
Over the last several years it has become evident that the character of NASA's supercomputing needs has changed. One of the major missions of the agency is to support the design and manufacture of aero- and space-vehicles with technologies that will significantly reduce their cost. It is becoming clear that improvements in the process of aerospace design and manufacturing will require a high-performance information infrastructure that allows geographically dispersed teams to draw upon resources that are broader than traditional supercomputing. A computational grid draws together our information resources into one system. We can foresee the time when a Grid will allow engineers and scientists to use the tools of supercomputers, databases, and online experimental devices in a virtual environment to collaborate with distant colleagues. The concept of a computational grid has been spoken of for many years, but several recent events are conspiring to allow us to actually build one. In late 1997 the National Science Foundation initiated the Partnerships for Advanced Computational Infrastructure (PACI) program, which is built around the idea of distributed high performance computing. The Alliance, led by the National Computational Science Alliance (NCSA), and the National Partnership for Advanced Computational Infrastructure (NPACI), led by the San Diego Supercomputing Center, have been instrumental in drawing together the "Grid Community" to identify the technology bottlenecks and propose a research agenda to address them. During the same period NASA has begun to reformulate parts of two major high performance computing research programs to concentrate on distributed high performance computing and has banded together with the PACI centers to address the research agenda in common.
Analysis Report for Exascale Storage Requirements for Scientific Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruwart, Thomas M.
Over the next 10 years, the Department of Energy will be transitioning from Petascale to Exascale computing, causing data storage, networking, and infrastructure requirements to increase by three orders of magnitude. The technologies and best practices used today are the result of a relatively slow evolution of ancestral technologies developed in the 1950s and 1960s. These include magnetic tape, magnetic disk, networking, databases, file systems, and operating systems. These technologies will continue to evolve over the next 10 to 15 years on a reasonably predictable path. Experience with the challenges involved in transitioning these fundamental technologies from Terascale to Petascale computing systems has raised questions about how they will scale another 3 or 4 orders of magnitude to meet the requirements imposed by Exascale computing systems. This report focuses on the most concerning scaling issues with data storage systems as they relate to High Performance Computing, and presents options for a path forward. Given the ability to store exponentially increasing amounts of data, far more advanced concepts and use of metadata will be critical to managing data in Exascale computing systems.
Computation of Asteroid Proper Elements on the Grid
NASA Astrophysics Data System (ADS)
Novakovic, B.; Balaz, A.; Knezevic, Z.; Potocnik, M.
2009-12-01
A procedure for gridification of the computation of asteroid proper orbital elements is described. The need to speed up the time-consuming computations and make them more efficient is justified by the large increase in observational data expected from the next generation of all-sky surveys. We give the basic notion of proper elements and of the contemporary theories and methods used to compute them for different populations of objects. Proper elements for nearly 70,000 asteroids have been derived since the Grid infrastructure was first used for this purpose. The average time for catalog updates is significantly shortened with respect to the time needed with stand-alone workstations. We also present the basics of Grid computing, the concepts of Grid middleware, and its Workload Management System. The practical steps we undertook to efficiently gridify our application are described in full detail. We present the results of comprehensive testing of the performance of different Grid sites, and offer some practical conclusions based on the benchmark results and on our experience. Finally, we propose some possibilities for future work.
The StratusLab cloud distribution: Use-cases and support for scientific applications
NASA Astrophysics Data System (ADS)
Floros, E.
2012-04-01
The StratusLab project is integrating an open cloud software distribution that enables organizations to set up and provide their own private or public IaaS (Infrastructure as a Service) computing clouds. The StratusLab distribution capitalizes on popular infrastructure virtualization solutions like KVM, the OpenNebula virtual machine manager, the Claudia service manager, and the SlipStream deployment platform, which are further enhanced and expanded with additional components developed within the project. The StratusLab distribution covers the core aspects of a cloud IaaS architecture, namely Computing (life-cycle management of virtual machines), Storage, Appliance management, and Networking. The resulting software stack provides a packaged turn-key solution for deploying cloud computing services. The cloud computing infrastructures deployed using StratusLab can support a wide range of scientific and business use cases. Grid computing has been the primary use case pursued by the project, and for this reason the initial priority has been support for the deployment and operation of fully virtualized production-level grid sites; a goal that has already been achieved by operating such a site as part of EGI's (European Grid Initiative) pan-European grid infrastructure. In this area the project is currently working to provide non-trivial capabilities like elastic and autonomic management of grid site resources. Although grid computing has been the motivating paradigm, StratusLab's cloud distribution can support a wider range of use cases. In this direction, we have developed and currently provide support for setting up general-purpose computing solutions like Hadoop, MPI, and Torque clusters. As far as scientific applications are concerned, the project is collaborating closely with the bioinformatics community in order to prepare VM appliances and deploy optimized services for bioinformatics applications. In a similar manner, additional scientific disciplines like Earth Science can take advantage of StratusLab cloud solutions. Interested users are welcome to join StratusLab's user community by getting access to the reference cloud services deployed by the project and offered to the public.
Infrastructure for Training and Partnerships: California Water and Coastal Ocean Resources
NASA Technical Reports Server (NTRS)
Siegel, David A.; Dozier, Jeffrey; Gautier, Catherine; Davis, Frank; Dickey, Tommy; Dunne, Thomas; Frew, James; Keller, Arturo; MacIntyre, Sally; Melack, John
2000-01-01
The purpose of this project was to advance the existing ICESS/Bren School computing infrastructure to allow scientists, students, and research trainees the opportunity to interact with environmental data and simulations in near-real time. Improvements made with the funding from this project have helped to strengthen the research efforts within both units, fostered graduate research training, and helped fortify partnerships with government and industry. With this funding, we were able to expand our computational environment, in which computer resources, software, and data sets are shared by ICESS/Bren School faculty researchers in all areas of Earth system science. All of the graduate and undergraduate students associated with the Donald Bren School of Environmental Science and Management and the Institute for Computational Earth System Science have benefited from the infrastructure upgrades accomplished by this project. Additionally, the upgrades fostered a significant number of research projects (attached is a list of the projects that benefited from the upgrades). As originally proposed, funding for this project provided the following infrastructure upgrades: 1) a modern file management system capable of interoperating between UNIX and NT file systems that can scale to 6.7 TB, 2) a Qualstar 40-slot tape library with two AIT tape drives and Legato Networker backup/archive software, 3) previously unavailable import/export capability for data sets on Zip, Jaz, DAT, 8mm, CD, and DLT media, in addition to a 622 Mb/s Internet 2 connection, 4) network switches capable of 100 Mbps to 128 desktop workstations, 5) the Portable Batch System (PBS) computational task scheduler, and 6) two Compaq/Digital Alpha XP1000 compute servers, each with 1.5 GB of RAM, along with an SGI Origin 2000 (purchased partially using funds from this project along with funding from various other sources) to be used for very large computations, as required for simulation of mesoscale meteorology or climate.
Physicists Get INSPIREd: INSPIRE Project and Grid Applications
NASA Astrophysics Data System (ADS)
Klem, Jukka; Iwaszkiewicz, Jan
2011-12-01
INSPIRE is the new high-energy physics scientific information system developed by CERN, DESY, Fermilab and SLAC. INSPIRE combines the curated and trusted contents of the SPIRES database with Invenio digital library technology. INSPIRE contains the entire HEP literature, with about one million records, and in addition to becoming the reference HEP scientific information platform, it aims to provide new kinds of data mining services and metrics to assess the impact of articles and authors. Grid and cloud computing provide new opportunities to offer better services in areas that require large CPU and storage resources, including document Optical Character Recognition (OCR) processing, full-text indexing of articles, and improved metrics. D4Science-II is a European project that develops and operates an e-Infrastructure supporting Virtual Research Environments (VREs). It develops an enabling technology (gCube) which implements a mechanism for facilitating the interoperation of its e-Infrastructure with other autonomously running data e-Infrastructures. As a result, this creates the core of an e-Infrastructure ecosystem. INSPIRE is one of the e-Infrastructures participating in the D4Science-II project. In the context of the D4Science-II project, the INSPIRE e-Infrastructure makes some of its resources and services available to other members of the resulting ecosystem. Moreover, it benefits from the ecosystem via a dedicated Virtual Organization giving access to an array of resources, ranging from the computing and storage resources of grid infrastructures to data and services.
SEED: A Suite of Instructional Laboratories for Computer Security Education
ERIC Educational Resources Information Center
Du, Wenliang; Wang, Ronghua
2008-01-01
The security and assurance of our computing infrastructure has become a national priority. To address this priority, higher education has gradually incorporated the principles of computer and information security into the mainstream undergraduate and graduate computer science curricula. To achieve effective education, learning security principles…
Storing and using health data in a virtual private cloud.
Regola, Nathan; Chawla, Nitesh V
2013-03-13
Electronic health records are being adopted at a rapid rate due to increased funding from the US federal government. Health data provide the opportunity to identify possible improvements in health care delivery by applying data mining and statistical methods to the data, and will also enable a wide variety of new applications that will be meaningful to patients and medical professionals. Researchers are often granted access to health care data to assist in the data mining process, but HIPAA regulations mandate comprehensive safeguards to protect the data. Often universities (and presumably other research organizations) have both an enterprise information technology infrastructure and a research infrastructure. Unfortunately, neither of these infrastructures is generally appropriate for sensitive data such as HIPAA-regulated health data, as they would require special accommodations on the part of the enterprise information technology (or increased security on the part of the research computing environment). Cloud computing, a concept that allows organizations to build complex infrastructures on leased resources, is rapidly evolving to the point that it is possible to build sophisticated network architectures with advanced security capabilities. We present a prototype infrastructure in Amazon's Virtual Private Cloud to allow researchers and practitioners to utilize the data in a HIPAA-compliant environment.
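The enclave pattern the paper describes can be sketched with boto3 against Amazon VPC; the CIDR blocks and names below are illustrative, and a real HIPAA deployment would add encryption, auditing, and tightly controlled gateways on top of this skeleton.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
# A private subnet for data-processing hosts; with no internet gateway
# attached, instances are unreachable from the public internet.
subnet_id = ec2.create_subnet(VpcId=vpc_id,
                              CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]
# A security group that only admits traffic originating inside the enclave.
sg_id = ec2.create_security_group(GroupName="hipaa-enclave",
                                  Description="internal-only access",
                                  VpcId=vpc_id)["GroupId"]
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{"IpProtocol": "-1",
                    "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}])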
Monitoring performance of a highly distributed and complex computing infrastructure in LHCb
NASA Astrophysics Data System (ADS)
Mathe, Z.; Haen, C.; Stagni, F.
2017-10-01
In order to ensure optimal performance of the LHCb Distributed Computing system, based on LHCbDIRAC, it is necessary to be able to inspect the behavior over time of many components: not only the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that are managed by this infrastructure. This consists of recording and then analyzing time series of a large number of observables, for which the use of SQL relational databases is far from optimal. Therefore, within DIRAC we have been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB), and as a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure, for which it collects data from all the components (agents, services, jobs) and allows the creation of reports through Kibana and a web user interface based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the general DIRAC framework, as well as an overview of the advantages of the pipeline aggregation used to create a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally, we present the performance of the system.
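To make the query style concrete, here is a hedged sketch using the elasticsearch-dsl Python library of the dynamic time bucketing plus a pipeline aggregation; the index name, field names, and interval are assumptions rather than LHCbDIRAC's actual schema, and fixed_interval assumes a recent Elasticsearch version.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch(["http://localhost:9200"])
s = Search(using=client, index="lhcb-jobs").filter("term", Status="Running")
# Dynamic bucketing of the time series into one-hour buckets.
s.aggs.bucket("per_hour", "date_histogram",
              field="timestamp", fixed_interval="1h") \
      .metric("jobs", "value_count", field="JobID")
# Pipeline aggregation computed over the histogram buckets.
s.aggs.pipeline("avg_per_hour", "avg_bucket", buckets_path="per_hour>jobs")
resp = s.execute()
for b in resp.aggregations.per_hour.buckets:
    print(b.key_as_string, b.jobs.value)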
Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duque, Earl P.N.; Whitlock, Brad J.
High performance computers have for many years been on a trajectory that gives them extraordinary compute power through the addition of more and more compute cores. At the same time, other system parameters, such as the amount of memory per core and the bandwidth to storage, have remained constant or have barely increased. This creates an imbalance in the computer, giving it the ability to compute a lot of data that it cannot reasonably save out due to time and storage constraints. While technologies have been invented to mitigate this problem (burst buffers, etc.), software has been adapting to employ in situ libraries, which perform data analysis and visualization on simulation data while it is still resident in memory. This avoids ever having to pay the cost of writing many terabytes of data files. Instead, in situ processing enables the creation of more concentrated data products such as statistics, plots, and data extracts, which are all far smaller than the full-sized volume data. With the increasing popularity of in situ methods, multiple in situ infrastructures have been created, each with its own mechanism for integrating with a simulation. To make it easier to instrument a simulation with multiple in situ infrastructures and include custom analysis algorithms, this project created the SENSEI framework.
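The core idea is easy to demonstrate: reduce the field to small data products while it is still in memory instead of writing it to disk each step. In this toy Python sketch (the "simulation" is a stand-in and every number is illustrative), a handful of floats per step replaces a volume dump of tens of megabytes.

import numpy as np

def step(field, t):
    # Stand-in for one time step of a real solver.
    return field + 0.01 * np.sin(t) * np.random.rand(*field.shape)

field = np.zeros((256, 256, 64))   # ~34 MB per step as float64
stats = []                         # the concentrated data product
for t in range(20):
    field = step(field, t)
    # In situ reduction: four floats per step instead of the full volume.
    stats.append((t, field.min(), field.mean(), field.max()))
np.savetxt("field_stats.csv", stats)   # kilobytes instead of gigabytes

Frameworks like SENSEI generalize exactly this pattern, routing the in-memory data to interchangeable analysis back ends instead of a hand-written reduction.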
Machine learning patterns for neuroimaging-genetic studies in the cloud.
Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand
2014-01-01
Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or brain pathology risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (the Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and reliability of our framework in the cloud, with a two-week deployment on hundreds of virtual machines.
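The statistical pattern (non-parametric, permutation-based inference driven by scikit-learn estimators) can be sketched as follows; the random arrays are toy stand-ins for genotypes X and a subcortical imaging phenotype y, and the estimator choice is an assumption, not the paper's exact pipeline.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5000))    # subjects x genetic variants (toy)
y = X[:, :10] @ rng.standard_normal(10) + rng.standard_normal(200)

model = RidgeCV(alphas=[0.1, 1.0, 10.0])
# Permutation null: break the genotype/phenotype link and re-score.
score, perm_scores, p_value = permutation_test_score(
    model, X, y, cv=5, n_permutations=100)
print(f"R^2 = {score:.3f}, permutation p = {p_value:.3f}")

Each permutation is independent of the others, which is what makes the procedure embarrassingly parallel and a natural fit for the cloud deployment described above.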
Applied Operations Research: Augmented Reality in an Industrial Environment
NASA Technical Reports Server (NTRS)
Cole, Stuart K.
2015-01-01
Augmented reality (AR) is the application of computer-generated data or graphics onto a real-world view. Its use provides the operator with additional information or heightened situational awareness. While advancements have been made in the automation and diagnostics of high-value critical equipment (HVCE) to improve readiness, reliability, and maintenance, the need to assist and support Operations and Maintenance staff persists. AR can improve the human-machine interface, where computer capabilities maximize the human experience and analysis capabilities. NASA operates multiple facilities with complex ground-based HVCE in support of national aeronautics and space exploration, and the need exists to improve operational support and close a gap in capability sustainment, where key and experienced staff consistently rotate work assignments and reach the end of their terms of service. Initiating an AR capability to augment and improve human abilities and training experience in the industrial environment requires planning and the establishment of a goal and objectives for the systems and their specific applications. This paper explores the use of AR in support of Operations staff in the real-time operation and maintenance of HVCE. The results include the identification of specific goals and objectives, as well as challenges related to availability and computer system infrastructure.
Advancing Cyberinfrastructure to support high resolution water resources modeling
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Ogden, F. L.; Jones, N.; Horsburgh, J. S.
2012-12-01
Addressing the problem of how the availability and quality of water resources at large scales are sensitive to climate variability, watershed alterations and management activities requires computational resources that combine data from multiple sources and support integrated modeling. Related cyberinfrastructure challenges include: 1) how can we best structure data and computer models to address this scientific problem through the use of high-performance and data-intensive computing, and 2) how can we do this in a way that discipline scientists without extensive computational and algorithmic knowledge and experience can take advantage of advances in cyberinfrastructure? This presentation will describe a new system called CI-WATER that is being developed to address these challenges and advance high resolution water resources modeling in the Western U.S. We are building on existing tools that enable collaboration to develop model and data interfaces that link integrated system models running within an HPC environment to multiple data sources. Our goal is to enhance the use of computational simulation and data-intensive modeling to better understand water resources. Addressing water resource problems in the Western U.S. requires simulation of natural and engineered systems, as well as representation of legal (water rights) and institutional constraints alongside the representation of physical processes. We are establishing data services to represent the engineered infrastructure and legal and institutional systems in a way that they can be used with high resolution multi-physics watershed modeling at high spatial resolution. These services will enable incorporation of location-specific information on water management infrastructure and systems into the assessment of regional water availability in the face of growing demands, uncertain future meteorological forcings, and existing prior-appropriations water rights. This presentation will discuss the informatics challenges involved with data management and easy-to-use access to high performance computing being tackled in this project.
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications, which often just require a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range, and require special high-performance hardware with low-latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.
Bioinformatics clouds for big data manipulation.
Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang
2012-11-28
As advances in life sciences and information technology bring profound changes to bioinformatics, owing to its interdisciplinary nature, bioinformatics is experiencing a new leap forward from in-house computing infrastructure to utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.
Knowledge-based decision support for Space Station assembly sequence planning
NASA Astrophysics Data System (ADS)
1991-04-01
A complete Personal Analysis Assistant (PAA) for Space Station Freedom (SSF) assembly sequence planning consists of three software components: the system infrastructure, intra-flight value added, and inter-flight value added. The system infrastructure is the substrate on which software elements providing inter-flight and intra-flight value-added functionality are built. It provides the capability for building representations of assembly sequence plans and specification of constraints and analysis options. Intra-flight value-added provides functionality that will, given the manifest for each flight, define cargo elements, place them in the National Space Transportation System (NSTS) cargo bay, compute performance measure values, and identify violated constraints. Inter-flight value-added provides functionality that will, given major milestone dates and capability requirements, determine the number and dates of required flights and develop a manifest for each flight. The current project is Phase 1 of a projected two phase program and delivers the system infrastructure. Intra- and inter-flight value-added were to be developed in Phase 2, which has not been funded. Based on experience derived from hundreds of projects conducted over the past seven years, ISX developed an Intelligent Systems Engineering (ISE) methodology that combines the methods of systems engineering and knowledge engineering to meet the special systems development requirements posed by intelligent systems, systems that blend artificial intelligence and other advanced technologies with more conventional computing technologies. The ISE methodology defines a phased program process that begins with an application assessment designed to provide a preliminary determination of the relative technical risks and payoffs associated with a potential application, and then moves through requirements analysis, system design, and development.
Securing the Data Storage and Processing in Cloud Computing Environment
ERIC Educational Resources Information Center
Owens, Rodney
2013-01-01
Organizations increasingly utilize cloud computing architectures to reduce costs and energy consumption both in the data warehouse and on mobile devices by better utilizing the computing resources available. However, the security and privacy issues with publicly available cloud computing infrastructures have not been studied to a sufficient depth…
Building Efficient Wireless Infrastructures for Pervasive Computing Environments
ERIC Educational Resources Information Center
Sheng, Bo
2010-01-01
Pervasive computing is an emerging concept that thoroughly brings computing devices and the consequent technology into people's daily life and activities. Most of these computing devices are very small, sometimes even "invisible", and often embedded into the objects surrounding people. In addition, these devices usually are not isolated, but…
A Service Brokering and Recommendation Mechanism for Better Selecting Cloud Services
Gui, Zhipeng; Yang, Chaowei; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Yu, Manzhu; Sun, Min; Zhou, Nanyin; Jin, Baoxuan
2014-01-01
Cloud computing is becoming the new generation computing infrastructure, and many cloud vendors provide different types of cloud services. How to choose the best cloud services for specific applications is very challenging. Addressing this challenge requires balancing multiple factors, such as business demands, technologies, policies and preferences, in addition to the computing requirements. This paper recommends a mechanism for selecting the best public cloud service at the levels of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). A systematic framework and associated workflow include cloud service filtration, solution generation, evaluation, and selection of public cloud services. Specifically, we propose the following: a hierarchical information model for integrating heterogeneous cloud information from different providers and a corresponding cloud information collecting mechanism; a cloud service classification model for categorizing and filtering cloud services and an application requirement schema for providing rules for creating application-specific configuration solutions; and a preference-aware solution evaluation model for evaluating and recommending solutions according to the preferences of application providers. To test the proposed framework and methodologies, a cloud service advisory tool prototype was developed, after which relevant experiments were conducted. The results show that the proposed system collects/updates/records cloud information from multiple mainstream public cloud services in real time; generates feasible cloud configuration solutions according to user specifications and acceptable cost prediction; assesses solutions from multiple aspects (e.g., computing capability, potential cost and Service Level Agreement, SLA) and offers rational recommendations based on user preferences and practical cloud provisioning; and visually presents and compares solutions through an interactive web Graphical User Interface (GUI). PMID:25170937
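The preference-aware evaluation step can be sketched in plain Python; the criteria, normalized values, and weights below are illustrative stand-ins, not the advisory tool's real data model.

candidates = {
    "provider-A": {"compute": 0.9, "cost": 0.4, "sla": 0.99},
    "provider-B": {"compute": 0.6, "cost": 0.9, "sla": 0.95},
    "provider-C": {"compute": 0.8, "cost": 0.7, "sla": 0.90},
}
# Application provider's preferences; "cost" is normalized as cheapness so
# that higher is better for every criterion.
weights = {"compute": 0.5, "cost": 0.3, "sla": 0.2}

def score(solution):
    # Preference-weighted sum over the normalized criteria.
    return sum(weights[c] * solution[c] for c in weights)

for name, sol in sorted(candidates.items(), key=lambda kv: -score(kv[1])):
    print(f"{name}: {score(sol):.3f}")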
NASA Astrophysics Data System (ADS)
Stagni, F.; McNab, A.; Luzzi, C.; Krzemien, W.; Consortium, DIRAC
2017-10-01
In the last few years, new types of computing models, such as IaaS (Infrastructure as a Service) and IaaC (Infrastructure as a Client), have gained popularity. Some new resources come as part of pledged resources, while others are opportunistic. Most, but not all, of these new infrastructures are based on virtualization techniques, and some of them present opportunities for multi-processor computing slots to the users. Virtual Organizations therefore face heterogeneity of the available resources, and the use of interware software like DIRAC to provide a transparent, uniform interface has become essential. Transparent access to the underlying resources is realized by implementing the pilot model. DIRAC's newest generation of generic pilots (the so-called Pilots 2.0) are the "pilots for all the skies", and were successfully released in production more than a year ago. They use a plugin mechanism that makes them easily adaptable. Pilots 2.0 have been used for fetching and running jobs on every type of resource, be it a Worker Node (WN) behind a CREAM/ARC/HTCondor/DIRAC Computing Element, a Virtual Machine running on IaaC infrastructures like Vac or BOINC, IaaS cloud resources managed by Vcycle, the LHCb High Level Trigger farm nodes, or any type of opportunistic computing resource. Make a machine a "Pilot Machine", and all differences between resources disappear. This contribution describes how pilots are made suitable for different resources, and the recent steps taken towards a fully unified framework, including monitoring. The cases of multi-processor computing slots, on either real or virtual machines, using the whole node or a partition of it, are also discussed.
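The plugin idea can be illustrated generically; the classes below are a hedged sketch of a command-based pilot, not DIRAC's actual Pilots 2.0 API.

import multiprocessing

class PilotCommand:
    def execute(self, context): ...

class DiscoverResource(PilotCommand):
    def execute(self, context):
        # Whole node or a partition of it, on a WN, VM, or HLT farm node.
        context["slots"] = multiprocessing.cpu_count()

class FetchJob(PilotCommand):
    def execute(self, context):
        context["job"] = {"id": 42, "cores": min(4, context["slots"])}

class RunJob(PilotCommand):
    def execute(self, context):
        job = context["job"]
        print(f"running job {job['id']} on {job['cores']} "
              f"of {context['slots']} slots")

def run_pilot(commands):
    context = {}
    for cmd in commands:    # the same loop runs on every type of resource
        cmd.execute(context)

run_pilot([DiscoverResource(), FetchJob(), RunJob()])

Supporting a new resource type then amounts to swapping in a different command plugin, which is the adaptability the contribution emphasizes.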
NiftyNet: a deep-learning platform for medical imaging.
Gibson, Eli; Li, Wenqi; Sudre, Carole; Fidon, Lucas; Shakir, Dzhoshkun I; Wang, Guotai; Eaton-Rosen, Zach; Gray, Robert; Doel, Tom; Hu, Yipeng; Whyntie, Tom; Nachev, Parashkev; Modat, Marc; Barratt, Dean C; Ourselin, Sébastien; Cardoso, M Jorge; Vercauteren, Tom
2018-05-01
Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this domain of application requires substantial implementation effort. Consequently, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. The NiftyNet infrastructure provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncrasies of medical image analysis and computer-assisted intervention. NiftyNet is built on the TensorFlow framework and supports features such as TensorBoard visualization of 2D and 3D images and computational graphs by default. We present three illustrative medical image analysis applications built using NiftyNet infrastructure: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. The NiftyNet infrastructure enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
The Magellan Final Report on Cloud Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coghlan, Susan; Yelick, Katherine
The goal of Magellan, a project funded through the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR), was to investigate the potential role of cloud computing in addressing the computing needs of the DOE Office of Science (SC), particularly related to serving the needs of mid-range computing and future data-intensive computing workloads. A set of research questions was formed to probe various aspects of cloud computing, from performance and usability to cost. To address these questions, a distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC). The testbed was designed to be flexible and capable enough to explore a variety of computing models and hardware design points in order to understand the impact on various scientific applications. During the project, the testbed also served as a valuable resource to application scientists. Applications from a diverse set of projects, such as MG-RAST (a metagenomics analysis server), the Joint Genome Institute, the STAR experiment at the Relativistic Heavy Ion Collider, and the Laser Interferometer Gravitational Wave Observatory (LIGO), were used by the Magellan project for benchmarking within the cloud, but the project teams were also able to accomplish important production science utilizing the Magellan cloud resources.
LEMON - LHC Era Monitoring for Large-Scale Infrastructures
NASA Astrophysics Data System (ADS)
Marian, Babik; Ivan, Fedorko; Nicholas, Hook; Hector, Lansdale Thomas; Daniel, Lenkes; Miroslav, Siket; Denis, Waldron
2011-12-01
At the present time computer centres are facing a massive rise in virtualization and cloud computing, as these solutions bring advantages to service providers and consolidate computer centre resources. As a result, however, monitoring complexity is increasing. Computer centre management requires not only monitoring servers, network equipment, and associated software, but also collecting additional environment and facilities data (e.g. temperature, power consumption, cooling efficiency, etc.) in order to maintain a good overview of infrastructure performance. The LHC Era Monitoring (Lemon) system addresses these requirements for a very large scale infrastructure. The Lemon agent, which collects data on every client and forwards the samples to the central measurement repository, provides a flexible interface that allows rapid development of new sensors. The system can also report on behalf of remote devices such as switches and power supplies. Online and historical data can be visualized via a web-based interface or retrieved via command-line tools. The Lemon Alarm System component can be used to notify the operator about error situations. In this article, an overview of Lemon monitoring is provided, together with a description of the CERN LEMON production instance. No direct comparison is made with other monitoring tools.
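The agent/sensor split can be illustrated with a small sketch; the interfaces below are assumptions for illustration, not Lemon's actual sensor API, and os.getloadavg is Unix-only.

import os
import time

class Sensor:
    name = "base"
    def sample(self):
        raise NotImplementedError

class LoadSensor(Sensor):
    name = "load1m"
    def sample(self):
        return os.getloadavg()[0]      # 1-minute load average

def agent_loop(sensors, forward, period=60.0, cycles=3):
    # The agent samples every registered sensor each period and forwards
    # the values to the central measurement repository.
    for _ in range(cycles):
        for s in sensors:
            forward(s.name, time.time(), s.sample())
        time.sleep(period)

# Stand-in for the transport to the central repository.
agent_loop([LoadSensor()], lambda n, t, v: print(n, t, v), period=1.0)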
DICOMGrid: a middleware to integrate PACS and EELA-2 grid infrastructure
NASA Astrophysics Data System (ADS)
Moreno, Ramon A.; de Sá Rebelo, Marina; Gutierrez, Marco A.
2010-03-01
Medical images provide a wealth of information for physicians, but the huge amount of data produced by medical imaging equipment in a modern health institution has not yet been explored to its full potential. Nowadays medical images are used in hospitals mostly as part of routine activities, while their intrinsic value for research is underestimated. Medical images can be used for the development of new visualization techniques, new algorithms for patient care, and new image processing techniques. These research areas usually require the use of huge volumes of data to obtain significant results, along with enormous computing capabilities. Such qualities are characteristic of grid computing systems such as the EELA-2 infrastructure. Grid technologies allow the sharing of data on a large scale in a safe and integrated environment and offer high computing capabilities. In this paper we describe DicomGrid, which stores and retrieves medical images, properly anonymized, that can be used by researchers to test new processing techniques, using the computational power offered by grid technology. A prototype of DicomGrid is under evaluation; it permits the submission of jobs into the EELA-2 grid infrastructure while offering a simple interface that requires minimal understanding of grid operation.
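A minimal sketch of the anonymization step performed before images enter the grid, using the pydicom library; the tag list is illustrative and far short of a complete de-identification profile, and the file paths are hypothetical.

import pydicom

def anonymize(path_in, path_out):
    ds = pydicom.dcmread(path_in)
    for tag in ("PatientName", "PatientID", "PatientBirthDate",
                "InstitutionName", "ReferringPhysicianName"):
        if hasattr(ds, tag):           # blank identifying fields if present
            setattr(ds, tag, "")
    ds.save_as(path_out)

anonymize("study/slice001.dcm", "anon/slice001.dcm")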
Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven
2010-11-01
The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.
Building an infrastructure at PICKSC for the educational use of kinetic software tools
NASA Astrophysics Data System (ADS)
Mori, W. B.; Decyk, V. K.; Tableman, A.; Fonseca, R. A.; Tsung, F. S.; Hu, Q.; Winjum, B. J.; Amorim, L. D.; An, W.; Dalichaouch, T. N.; Davidson, A.; Joglekar, A.; Li, F.; May, J.; Touati, M.; Xu, X. L.; Yu, P.
2016-10-01
One aim of the Particle-In-Cell and Kinetic Simulation Center (PICKSC) at UCLA is to coordinate a community development of educational software for undergraduate and graduate courses in plasma physics and computer science. The rich array of physical behaviors exhibited by plasmas can be difficult to grasp by students. If they are given the ability to quickly and easily explore plasma physics through kinetic simulations, and to make illustrative visualizations of plasma waves, particle motion in electromagnetic fields, instabilities, or other phenomena, then they can be equipped with first-hand experiences that inform and contextualize conventional texts and lectures. We are developing an infrastructure for any interested persons to take our kinetic codes, run them without any prerequisite knowledge, and explore desired scenarios. Furthermore, we are actively interested in any ideas or input from other plasma physicists. This poster aims to illustrate what we have developed and gather a community of interested users and developers. Supported by NSF under Grant ACI-1339893.
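As a flavor of what such educational kinetic codes look like, here is a minimal 1D electrostatic particle-in-cell sketch in Python (normalized units, nearest-grid-point weighting; all parameters illustrative and unrelated to PICKSC's actual codes). A seeded density perturbation oscillates at the plasma frequency, visible in the printed field energy.

import numpy as np

ng, L, n_per_cell, dt, steps = 64, 2 * np.pi, 50, 0.1, 200
dx = L / ng
npart = ng * n_per_cell
x = np.linspace(0, L, npart, endpoint=False)
x += 0.01 * np.cos(x)                      # seed a mode-1 perturbation
v = np.zeros(npart)
k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)   # angular wavenumbers

for step in range(steps):
    cells = (x / dx).astype(int) % ng
    # Nearest-grid-point deposit, with a neutralizing ion background.
    rho = 1.0 - np.bincount(cells, minlength=ng) / n_per_cell
    # Solve Gauss's law dE/dx = rho in Fourier space: E_k = rho_k / (i k).
    rho_k = np.fft.fft(rho)
    E_k = np.where(k != 0, rho_k / (1j * np.where(k != 0, k, 1.0)), 0.0)
    E = np.real(np.fft.ifft(E_k))
    v -= E[cells] * dt                      # electron charge/mass = -1
    x = (x + v * dt) % L
    if step % 50 == 0:
        print(step, "field energy:", 0.5 * np.sum(E * E) * dx)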
Technical support for Life Sciences communities on a production grid infrastructure.
Michel, Franck; Montagnat, Johan; Glatard, Tristan
2012-01-01
Production operation of large distributed computing infrastructures (DCI) still requires a lot of human intervention to reach acceptable quality of service. This may be achievable for scientific communities with solid IT support, but it remains a show-stopper for others. Some application execution environments are used to hide runtime technical issues from end users. But they mostly aim at fault-tolerance rather than incident resolution, and their operation still requires substantial manpower. A longer-term support activity is thus needed to ensure sustained quality of service for Virtual Organisations (VO). This paper describes how the biomed VO has addressed this challenge by setting up a technical support team. Its organisation, tooling, daily tasks, and procedures are described. Results are shown in terms of resource usage by end users, amount of reported incidents, and developed software tools. Based on our experience, we suggest ways to measure the impact of the technical support, perspectives to decrease its human cost and make it more community-specific.
Operating a production pilot factory serving several scientific domains
NASA Astrophysics Data System (ADS)
Sfiligoi, I.; Würthwein, F.; Andrews, W.; Dost, J. M.; MacNeill, I.; McCrea, A.; Sheripon, E.; Murphy, C. W.
2011-12-01
Pilot infrastructures are becoming prominent players in the Grid environment. One of their major advantages is the reduced effort required of the user communities (also known as Virtual Organizations or VOs), due to the outsourcing of the Grid interfacing services, i.e. the pilot factory, to Grid experts. One such pilot factory, based on the glideinWMS pilot infrastructure, is being operated by the Open Science Grid at the University of California San Diego (UCSD). This pilot factory serves multiple VOs from several scientific domains. Currently the three major clients are the analysis operations of the HEP experiment CMS, the community VO HCC, which serves mostly math, biology and computer science users, and the structural biology VO NEBioGrid. The UCSD glidein factory allows the served VOs to use Grid resources distributed over 150 sites in North and South America, Europe, and Asia. This paper presents the steps taken to create a production-quality pilot factory, together with the challenges encountered along the road.
Cyber-Physical Correlations for Infrastructure Resilience: A Game-Theoretic Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S; He, Fei; Ma, Chris Y. T.
In several critical infrastructures, the cyber and physical parts are correlated, so that disruptions to one affect the other and hence the whole system. These correlations may be exploited to strategically launch component attacks, and hence must be accounted for to ensure infrastructure resilience, specified by the system's survival probability. We characterize the cyber-physical interactions at two levels: (i) the failure correlation function specifies the conditional survival probability of the cyber sub-infrastructure given the physical sub-infrastructure, as a function of their marginal probabilities, and (ii) the individual survival probabilities of both sub-infrastructures are characterized by first-order differential conditions. We formulate a resilience problem for infrastructures composed of discrete components as a game between the provider and the attacker, wherein their utility functions consist of an infrastructure survival probability term and a cost term expressed in terms of the number of components attacked and reinforced. We derive Nash Equilibrium conditions and sensitivity functions that highlight the dependence of infrastructure resilience on the cost term, the correlation function, and the sub-infrastructure survival probabilities. These results generalize earlier ones based on linear failure correlation functions and independent component failures. We apply the results to models of cloud computing infrastructures and energy grids.
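A toy numeric sketch of this setup (illustrative functional forms, not the report's models) brute-forces a pure-strategy Nash equilibrium over small discrete choices of components reinforced and attacked.

import itertools

def survival(reinforced, attacked):
    # Stand-in for the survival probability: each unblocked attack cuts
    # survival multiplicatively (a crude correlation surrogate).
    return 0.95 ** max(attacked - reinforced, 0)

def utilities(r, a, c_provider=0.02, c_attacker=0.03):
    p = survival(r, a)
    # Survival-probability term minus a per-component cost term.
    return p - c_provider * r, (1 - p) - c_attacker * a

choices = range(6)
nash = [(r, a) for r, a in itertools.product(choices, choices)
        if all(utilities(r, a)[0] >= utilities(r2, a)[0] for r2 in choices)
        and all(utilities(r, a)[1] >= utilities(r, a2)[1] for a2 in choices)]
print("pure-strategy equilibria:", nash)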
New project to support scientific collaboration electronically
NASA Astrophysics Data System (ADS)
Clauer, C. R.; Rasmussen, C. E.; Niciejewski, R. J.; Killeen, T. L.; Kelly, J. D.; Zambre, Y.; Rosenberg, T. J.; Stauning, P.; Friis-Christensen, E.; Mende, S. B.; Weymouth, T. E.; Prakash, A.; McDaniel, S. E.; Olson, G. M.; Finholt, T. A.; Atkins, D. E.
A new multidisciplinary effort is linking research in the upper atmospheric and space, computer, and behavioral sciences to develop a prototype electronic environment for conducting team science worldwide. A real-world electronic collaboration testbed has been established to support scientific work centered around the experimental operations being conducted with instruments from the Sondrestrom Upper Atmospheric Research Facility in Kangerlussuaq, Greenland. Such group computing environments will become an important component of the National Information Infrastructure initiative, which is envisioned as the high-performance communications infrastructure to support national scientific research.
ENFIN--A European network for integrative systems biology.
Kahlem, Pascal; Clegg, Andrew; Reisinger, Florian; Xenarios, Ioannis; Hermjakob, Henning; Orengo, Christine; Birney, Ewan
2009-11-01
Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases and platforms, to enable both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects to bring the latest computational techniques to bear directly on questions dedicated to systems biology in the wet laboratory environment. The Network maintains close internal collaboration between experimental and computational research, enabling a permanent cycle of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods, and a novel platform for protein function analysis, FuncNet.
Investigations into Gravitational Wave Emission from Compact Body Inspiral Into Massive Black Holes
NASA Technical Reports Server (NTRS)
Hughes, Scott A.
2004-01-01
Much of the grant's support (and associated time) was used in developmental activity, building infrastructure for the core of the work that the grant supports. Though infrastructure development was the bulk of the activity supported this year, important progress was made in research as well. The two most important "infrastructure" items were in computing hardware and personnel. Research activities were primarily focused on improving and extending Hughes' Teukolsky-equation-based gravitational-wave generator. Several improvements have been incorporated into this generator.
X-ray-induced acoustic computed tomography of concrete infrastructure
NASA Astrophysics Data System (ADS)
Tang, Shanshan; Ramseyer, Chris; Samant, Pratik; Xiang, Liangzhong
2018-02-01
X-ray-induced Acoustic Computed Tomography (XACT) takes advantage of both X-ray absorption contrast and high ultrasonic resolution in a single imaging modality by making use of the thermoacoustic effect. In XACT, X-ray absorption by defects and other structures in concrete creates thermally induced pressure jumps that launch ultrasonic waves, which are then received by acoustic detectors to form images. In this research, XACT imaging was used to non-destructively test and identify defects in concrete. For concrete structures, we conclude that XACT imaging allows multiscale imaging at depths ranging from centimeters to meters, with spatial resolutions from sub-millimeter to centimeters. XACT imaging also holds promise for single-sided testing of concrete infrastructure and provides an optimal solution for nondestructive inspection of existing bridges, pavement, nuclear power plants, and other concrete infrastructure.
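As background, the thermoacoustic effect that XACT exploits is commonly summarized by the standard photoacoustic initial-pressure relation; this is textbook material stated here for context, not a formula quoted from the abstract:

```latex
p_0(\mathbf{r}) \;=\; \Gamma\,\eta_{th}\,\mu(\mathbf{r})\,F(\mathbf{r})
```

where \Gamma is the Grüneisen parameter, \eta_{th} the fraction of absorbed X-ray energy converted to heat, \mu the X-ray absorption coefficient, and F the X-ray fluence; a defect changes \mu locally, producing the pressure jump that launches the detected ultrasonic wave.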
Models of Educational Computing @ Home: New Frontiers for Research on Technology in Learning.
ERIC Educational Resources Information Center
Kafai, Yasmin B.; Fishman, Barry J.; Bruckman, Amy S.; Rockman, Saul
2002-01-01
Discusses models of home educational computing that are linked to learning in school and recommends the need for research that addresses the home as a computer-based learning environment. Topics include a history of research on educational computing at home; technological infrastructure, including software and compatibility; Internet access;…
A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system
NASA Astrophysics Data System (ADS)
Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.
2014-06-01
The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
Shrink-film microfluidic education modules: Complete devices within minutes
Nguyen, Diep; McLane, Jolie; Lew, Valerie; Pegan, Jonathan; Khine, Michelle
2011-01-01
As advances in microfluidics continue to make contributions to diagnostics and life sciences, broader awareness of this expanding field becomes necessary. By leveraging low-cost microfabrication techniques that require no capital equipment or infrastructure, simple, accessible, and effective educational modules can be made available for a broad range of educational needs from middle school demonstrations to college laboratory classes. These modules demonstrate key microfluidic concepts such as diffusion and separation as well as “laboratory on-chip” applications including chemical reactions and biological assays. These modules are intended to provide an interdisciplinary hands-on experience, including chip design, fabrication of functional devices, and experiments at the microscale. Consequently, students will be able to conceptualize physics at small scales, gain experience in computer-aided design and microfabrication, and perform experiments—all in the context of addressing real-world challenges by making their own lab-on-chip devices. PMID:21799715
Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R
2017-11-10
RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures on experimental data analyses. We highlight examples of workflows and use cases for prokaryotic, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we summarize the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Big data computing: Building a vision for ARS information management
USDA-ARS?s Scientific Manuscript database
Improvements are needed within the ARS to increase scientific capacity and keep pace with new developments in computer technologies that support data acquisition and analysis. Enhancements in computing power and IT infrastructure are needed to provide scientists better access to high performance com...
NASA Technical Reports Server (NTRS)
Hale, Mark A.; Craig, James I.; Mistree, Farrokh; Schrage, Daniel P.
1995-01-01
Computing architectures are being assembled that extend concurrent engineering practices by providing more efficient execution and collaboration on distributed, heterogeneous computing networks. Built on the successes of initial architectures, requirements for a next-generation design computing infrastructure can be developed. These requirements concentrate on those needed by a designer in decision-making processes from product conception to recycling, and can be categorized in two areas: design process and design information management. A designer both designs and executes design processes throughout design time to achieve better product and process capabilities while expending fewer resources. To accomplish this, information, or more appropriately design knowledge, needs to be adequately managed during product and process decomposition as well as recomposition. A foundation has been laid that captures these requirements in a design architecture called DREAMS (Developing Robust Engineering Analysis Models and Specifications). In addition, a computing infrastructure, called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment), is being developed that satisfies design requirements defined in DREAMS and incorporates enabling computational technologies.
Can Clouds replace Grids? Will Clouds replace Grids?
NASA Astrophysics Data System (ADS)
Shiers, J. D.
2010-04-01
The world's largest scientific machine - comprising dual 27 km circular proton accelerators cooled to 1.9 K and located some 100 m underground - currently relies on major production Grid infrastructures for the offline computing needs of the 4 main experiments that will take data at this facility. After many years of sometimes difficult preparation the computing service has been declared "open" and ready to meet the challenges that will come shortly when the machine restarts in 2009. But the service is not without its problems: reliability - as seen by the experiments, as opposed to that measured by the official tools - still needs to be significantly improved. Prolonged downtimes or degradations of major services or even complete sites are still too common, and the operational and coordination effort to keep the overall service running is probably not sustainable at this level. Recently "Cloud Computing" - in terms of pay-per-use fabric provisioning - has emerged as a potentially viable alternative, but with rather different strengths and no doubt weaknesses too. Based on the concrete needs of the LHC experiments - where the total data volume that will be acquired over the full lifetime of the project, including the additional data copies that are required by the Computing Models of the experiments, approaches 1 Exabyte - we analyze the pros and cons of Grids versus Clouds. This analysis covers not only technical issues - such as those related to demanding database and data management needs - but also sociological aspects, which cannot be ignored, whether in terms of funding or in the wider context of the essential but often overlooked role of science in society, education and economy.
Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A
2010-01-01
Especially in the life-science and health-care sectors, IT requirements are immense due to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role here for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters including very fast network interconnects within grid infrastructures now allows efficient parallel high-performance grid computing, and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands of this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combine classic with parallel high-performance grid usage, but more importantly can also increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a real opportunity for sectors such as life science and health care, as well as for grid infrastructures, through a higher level of resource efficiency.
Bioinformatics clouds for big data manipulation
2012-01-01
Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475
A numerical study on swimming micro-organisms inside a capillary tube
NASA Astrophysics Data System (ADS)
Zhu, Lailai; Lauga, Eric; Brandt, Luca
2011-11-01
The motility of micro-organisms is highly dependent on the surrounding environment, such as walls, free surfaces and neighbouring cells. In our current work, we perform simulations of swimming micro-organisms inside a capillary tube based on the boundary element method. We focus on the swimming speed, power consumption and locomotive trajectory of swimming cells for different levels of confinement. For a cell propelling itself by tangential surface deformation, we show that it will swim along a helical trajectory with a specified swimming gait. Such a helical trajectory was observed before in experiments on swimming Paramecium inside a capillary tube. Funding by VR (the Swedish Research Council) and the National Science Foundation (grant CBET-0746285 to E.L.) is gratefully acknowledged. Computer time provided by SNIC (Swedish National Infrastructure for Computing) is also acknowledged.
Meyer, Adrian; Green, Laura; Faulk, Ciearro; Galla, Stephen; Meyer, Anne-Marie
2016-01-01
Introduction: Large amounts of health data generated by a wide range of health care applications across a variety of systems have the potential to offer valuable insight into populations and health care systems, but robust and secure computing and analytic systems are required to leverage this information. Framework: We discuss our experiences deploying a Secure Data Analysis Platform (SeDAP), and provide a framework to plan, build and deploy a virtual desktop infrastructure (VDI) to enable innovation and collaboration while operating within academic funding structures. It outlines six core components: Security, Ease of Access, Performance, Cost, Tools, and Training. Conclusion: A platform like SeDAP is not successful simply through technical excellence and performance. Its adoption depends on a collaborative environment where researchers and users plan and evaluate the requirements of all aspects. PMID:27683665
Manned/Unmanned Common Architecture Program (MCAP) net centric flight tests
NASA Astrophysics Data System (ADS)
Johnson, Dale
2009-04-01
Properly architected avionics systems can reduce the costs of periodic functional improvements, maintenance, and obsolescence. With this in mind, the U.S. Army Aviation Applied Technology Directorate (AATD) initiated the Manned/Unmanned Common Architecture Program (MCAP) in 2003 to develop an affordable, high-performance embedded mission processing architecture for potential application to multiple aviation platforms. MCAP analyzed Army helicopter and unmanned air vehicle (UAV) missions, identified supporting subsystems, surveyed advanced hardware and software technologies, and defined computational infrastructure technical requirements. The project selected a set of modular open systems standards and market-driven commercial-off-the-shelf (COTS) electronics and software, and developed experimental mission processors, network architectures, and software infrastructures supporting the integration of new capabilities, interoperability, and life cycle cost reductions. MCAP integrated the new mission processing architecture into an AH-64D Apache Longbow and participated in Future Combat Systems (FCS) network-centric operations field experiments in 2006 and 2007 at White Sands Missile Range (WSMR), New Mexico and at the Nevada Test and Training Range (NTTR) in 2008. The MCAP Apache also participated in PM C4ISR On-the-Move (OTM) Capstone Experiments 2007 (E07) and 2008 (E08) at Ft. Dix, NJ and conducted Mesa, Arizona local area flight tests in December 2005, February 2006, and June 2008.
Three-Dimensional Space to Assess Cloud Interoperability
2013-03-01
1. Portability and Mobility ...a collection of network-enabled services that guarantees to provide a scalable, easily accessible, reliable, and personalized computing infrastructure... terms used in research to describe cloud models, such as SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service)...
Branch Campus Librarianship with Minimal Infrastructure: Rewards and Challenges
ERIC Educational Resources Information Center
Knickman, Elena; Walton, Kerry
2014-01-01
Delaware County Community College provides library services to its branch campus community members by stationing a librarian at a campus 5 to 20 hours each week, without any more library infrastructure than an Internet-enabled computer on the school network. Faculty and students have reacted favorably to the increased presence of librarians.…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williamson, Richard L.; Kochunas, Brendan; Adams, Brian M.
The Virtual Environment for Reactor Applications components included in this distribution comprise selected computational tools and supporting infrastructure that solve neutronics, thermal-hydraulics, fuel performance, and coupled neutronics-thermal-hydraulics problems. The infrastructure components provide a simplified common user input capability and provide for physics integration with data transfer and coupled-physics iterative solution algorithms.
Quantifying Security Threats and Their Impact
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aissa, Anis Ben; Abercrombie, Robert K; Sheldon, Frederick T
In earlier works, we presented a computational infrastructure that allows an analyst to estimate the security of a system in terms of the loss that each stakeholder stands to sustain as a result of security breakdowns. In this paper we illustrate this infrastructure by means of an example involving an e-commerce application.
Quantifying Security Threats and Their Potential Impacts: A Case Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aissa, Anis Ben; Abercrombie, Robert K; Sheldon, Frederick T
In earlier works, we presented a computational infrastructure that allows an analyst to estimate the security of a system in terms of the loss that each stakeholder stands to sustain as a result of security breakdowns. In this paper, we illustrate this infrastructure by means of an e-commerce application.
WLCG scale testing during CMS data challenges
NASA Astrophysics Data System (ADS)
Gutsche, O.; Hajdu, C.
2008-07-01
The CMS computing model to process and analyze LHC collision data follows a data-location driven approach and is using the WLCG infrastructure to provide access to GRID resources. As a preparation for data taking, CMS tests its computing model during dedicated data challenges. An important part of the challenges is the test of the user analysis which poses a special challenge for the infrastructure with its random distributed access patterns. The CMS Remote Analysis Builder (CRAB) handles all interactions with the WLCG infrastructure transparently for the user. During the 2006 challenge, CMS set its goal to test the infrastructure at a scale of 50,000 user jobs per day using CRAB. Both direct submissions by individual users and automated submissions by robots were used to achieve this goal. A report will be given about the outcome of the user analysis part of the challenge using both the EGEE and OSG parts of the WLCG. In particular, the difference in submission between both GRID middlewares (resource broker vs. direct submission) will be discussed. In the end, an outlook for the 2007 data challenge is given.
Defense strategies for cloud computing multi-site server infrastructures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Ma, Chris Y. T.; He, Fei
We consider cloud computing server infrastructures for big data applications, which consist of multiple server sites connected over a wide-area network. The sites house a number of servers, network elements and local-area connections, and the wide-area network plays a critical, asymmetric role of providing vital connectivity between them. We model this infrastructure as a system of systems, wherein the sites and wide-area network are represented by their cyber and physical components. These components can be disabled by cyber and physical attacks, and can also be protected against them using component reinforcements. The effects of attacks propagate within the systems, and also beyond them via the wide-area network. We characterize these effects using correlations at two levels: (a) an aggregate failure correlation function that specifies the infrastructure failure probability given the failure of an individual site or network, and (b) first-order differential conditions on system survival probabilities that characterize the component-level correlations within individual systems. We formulate a game between an attacker and a provider using utility functions composed of survival probability and cost terms. At Nash Equilibrium, we derive expressions for the expected capacity of the infrastructure, given by the number of operational servers connected to the network, for sum-form, product-form and composite utility functions.
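A minimal sketch of the expected-capacity quantity described above; the survival model and all numbers are illustrative assumptions, not the paper's parameterization:

```python
# Hedged sketch: expected number of operational servers still connected
# to the wide-area network. The survival model and figures are toy values.

def site_survival(reinforced, attacked, base=0.9):
    """Toy model: each attack lowers, each reinforcement raises,
    the site survival probability (clamped to [0, 1])."""
    p = base + 0.02 * reinforced - 0.05 * attacked
    return max(0.0, min(1.0, p))

def expected_capacity(sites, p_network):
    """Sum over sites of servers * site survival * network survival."""
    return p_network * sum(n * site_survival(r, a) for n, r, a in sites)

# (servers, reinforced components, attacked components) per site
sites = [(400, 5, 2), (250, 3, 4), (600, 8, 1)]
print(expected_capacity(sites, p_network=0.95))
```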
Storing and Using Health Data in a Virtual Private Cloud
Regola, Nathan
2013-01-01
Electronic health records are being adopted at a rapid rate due to increased funding from the US federal government. Health data provide the opportunity to identify possible improvements in health care delivery by applying data mining and statistical methods to the data, and will also enable a wide variety of new applications that will be meaningful to patients and medical professionals. Researchers are often granted access to health care data to assist in the data mining process, but HIPAA regulations mandate comprehensive safeguards to protect the data. Often universities (and presumably other research organizations) have an enterprise information technology infrastructure and a research infrastructure. Unfortunately, neither of these infrastructures is generally appropriate for sensitive research data such as HIPAA-protected health data, which requires special accommodations on the part of the enterprise information technology (or increased security on the part of the research computing environment). Cloud computing, a concept that allows organizations to build complex infrastructures on leased resources, is rapidly evolving to the point that it is possible to build sophisticated network architectures with advanced security capabilities. We present a prototype infrastructure in Amazon's Virtual Private Cloud to allow researchers and practitioners to utilize the data in a HIPAA-compliant environment. PMID:23485880
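A minimal sketch of how such an isolated environment might be stood up with boto3, the standard AWS SDK for Python; the CIDR blocks, group name, and trusted administrative range are assumptions for illustration, not details of the paper's prototype:

```python
# Hedged sketch: an isolated VPC with a private subnet and a security
# group that only admits SSH from a trusted administrative range.
# All addresses and names below are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Private network for the analysis hosts.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

# Security group restricting inbound access to one trusted range.
sg = ec2.create_security_group(
    GroupName="hipaa-analysis", Description="restricted research access",
    VpcId=vpc_id)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                    "IpRanges": [{"CidrIp": "192.0.2.0/24"}]}])
```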
Wang, Maocai; Dai, Guangming; Choo, Kim-Kwang Raymond; Jayaraman, Prem Prakash; Ranjan, Rajiv
2016-01-01
Information confidentiality is an essential requirement for cyber security in critical infrastructure. Identity-based cryptography, an increasingly popular branch of cryptography, is widely used to protect information confidentiality in the critical infrastructure sector due to the ability to directly compute a user's public key from the user's identity. However, computational requirements complicate the practical application of identity-based cryptography. In order to improve the efficiency of identity-based cryptography, this paper presents an effective method to construct pairing-friendly elliptic curves with low Hamming weight 4 under embedding degree 1. Based on the analysis of the Complex Multiplication (CM) method, the soundness of our method to calculate the characteristic of the finite field is proved. Three related algorithms to construct pairing-friendly elliptic curves are then put forward, and 10 elliptic curves with low Hamming weight 4 under 160 bits are presented to demonstrate the utility of our approach. Finally, the evaluation indicates that it is more efficient to compute the Tate pairing with our curves than with those of Bertoni et al.
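The two properties the paper optimizes for, embedding degree 1 (the subgroup order r divides q - 1) and a low Hamming weight for r, can be checked mechanically. A small sketch; the sample numbers are illustrative, not curves from the paper:

```python
# Hedged sketch: checking embedding degree and Hamming weight.
# The values of r and q below are toy examples, not the paper's curves.

def embedding_degree(q, r, k_max=50):
    """Smallest k with r | q**k - 1 (the pairing embedding degree)."""
    for k in range(1, k_max + 1):
        if pow(q, k, r) == 1:
            return k
    return None

def hamming_weight(n):
    return bin(n).count("1")

r = 2**160 + 2**79 + 2**17 + 1   # a weight-4 bit pattern (illustrative)
print(hamming_weight(r))          # -> 4
q = 2 * r + 1                     # then r | q - 1, so embedding degree 1
print(embedding_degree(q, r))     # -> 1
```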
Resilient workflows for computational mechanics platforms
NASA Astrophysics Data System (ADS)
Nguyên, Toàn; Trifan, Laurentiu; Désidéri, Jean-Antoine
2010-06-01
Workflow management systems have recently been the focus of much interest and many research and deployment efforts for scientific applications worldwide [26, 27]. Their ability to abstract applications by wrapping application codes has also underlined the usefulness of such systems for multidiscipline applications [23, 24]. When complex applications need to provide seamless interfaces hiding the technicalities of the computing infrastructures, their high-level modeling, monitoring and execution functionalities give production teams seamless and effective facilities [25, 31, 33]. Software integration infrastructures based on programming paradigms such as Python, Matlab and Scilab have also provided evidence of the usefulness of such approaches for the tight coupling of multidiscipline application codes [22, 24]. High-performance computing based on multi-core, multi-cluster infrastructures also opens new opportunities for more accurate, more extensive and more robust multi-discipline simulations for the decades to come [28]. This supports the goal of full flight-dynamics simulation for 3D aircraft models within the next decade, opening the way to virtual flight tests and certification of aircraft in the future [23, 24, 29].
Dai, Guangming
2016-01-01
Information confidentiality is an essential requirement for cyber security in critical infrastructure. Identity-based cryptography, an increasingly popular branch of cryptography, is widely used to protect information confidentiality in the critical infrastructure sector due to the ability to directly compute a user's public key from the user's identity. However, computational requirements complicate the practical application of identity-based cryptography. In order to improve the efficiency of identity-based cryptography, this paper presents an effective method to construct pairing-friendly elliptic curves with low Hamming weight 4 under embedding degree 1. Based on the analysis of the Complex Multiplication (CM) method, the soundness of our method to calculate the characteristic of the finite field is proved. Three related algorithms to construct pairing-friendly elliptic curves are then put forward, and 10 elliptic curves with low Hamming weight 4 under 160 bits are presented to demonstrate the utility of our approach. Finally, the evaluation indicates that it is more efficient to compute the Tate pairing with our curves than with those of Bertoni et al. PMID:27564373
Streaming Support for Data Intensive Cloud-Based Sequence Analysis
Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed
2013-01-01
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
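A minimal sketch of the streaming idea, assuming a chunked transfer and a per-chunk analysis step; the producer/consumer structure below is illustrative, not the elastream implementation:

```python
# Hedged sketch: overlap network transfer and processing so reads are
# analysed while later chunks are still uploading. Chunk size, the queue,
# and the per-chunk "analysis" are illustrative assumptions.
import queue
import threading

def transfer(chunks, buf):
    """Producer: simulates chunks of NGS data arriving over the network."""
    for chunk in chunks:
        buf.put(chunk)          # in reality: bytes received from the client
    buf.put(None)               # end-of-stream sentinel

def process(buf, results):
    """Consumer: processes each chunk as soon as it lands in the cloud."""
    while (chunk := buf.get()) is not None:
        results.append(len(chunk))   # stand-in for real per-read analysis

buf, results = queue.Queue(maxsize=4), []
chunks = [b"ACGT" * 1000] * 10       # toy NGS data
t = threading.Thread(target=transfer, args=(chunks, buf))
t.start()
process(buf, results)
t.join()
print(sum(results), "bases processed while streaming")
```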
Bringing Federated Identity to Grid Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teheran, Jeny
The Fermi National Accelerator Laboratory (FNAL) is facing the challenge of providing scientific data access and grid submission to scientific collaborations that span the globe but are hosted at FNAL. Users in these collaborations are currently required to register as an FNAL user and obtain FNAL credentials to access grid resources to perform their scientific computations. These requirements burden researchers with managing additional authentication credentials, and put additional load on FNAL for managing user identities. Our design integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and MyProxy with the FNAL grid submission system to provide secure access for users from diverse experiments and collaborations without requiring each user to have authentication credentials from FNAL. The design automates the handling of certificates so users do not need to manage them manually. Although the initial implementation is for FNAL's grid submission system, the design and the core of the implementation are general and could be applied to other distributed computing systems.
Improved ATLAS HammerCloud Monitoring for Local Site Administration
NASA Astrophysics Data System (ADS)
Böhler, M.; Elmsheuser, J.; Hönig, F.; Legger, F.; Mancinelli, V.; Sciacca, G.
2015-12-01
Every day hundreds of tests are run on the Worldwide LHC Computing Grid for the ATLAS and CMS experiments in order to evaluate the performance and reliability of the different computing sites. All this activity is steered, controlled, and monitored by the HammerCloud testing infrastructure. Sites with failing functionality tests are auto-excluded from the ATLAS computing grid; it is therefore essential to provide a detailed and well-organized web interface for the local site administrators, so that they can easily spot and promptly solve site issues. Additional functionality has been developed to extract and visualize the most relevant information. Site administrators can now be pointed easily to major site issues that lead to site blacklisting, as well as to possible minor issues that are usually not conspicuous enough to warrant the blacklisting of a specific site but can still cause undesired effects, such as a non-negligible job failure rate. This paper summarizes the different developments and optimizations of the HammerCloud web interface and gives an overview of typical use cases.
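The auto-exclusion rule might look like the following sketch; the threshold and minimum test count are assumptions for illustration, not HammerCloud's actual policy:

```python
# Hedged sketch: blacklist a site whose functional-test failure rate
# crosses a threshold. Threshold and minimum-evidence count are assumed.

def should_blacklist(results, threshold=0.5, min_tests=10):
    """results: list of booleans, True = test passed."""
    if len(results) < min_tests:
        return False                     # not enough evidence yet
    failure_rate = results.count(False) / len(results)
    return failure_rate > threshold

site_results = [True] * 4 + [False] * 8
print(should_blacklist(site_results))    # -> True
```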
Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment
NASA Astrophysics Data System (ADS)
Bowker, Geoffrey C.; Baker, Karen; Millerand, Florence; Ribes, David
This article presents Information Infrastructure Studies, a research area that takes up some core issues in digital information and organization research. Infrastructure Studies simultaneously addresses the technical, social, and organizational aspects of the development, usage, and maintenance of infrastructures in local communities as well as global arenas. While infrastructure is understood as a broad category referring to a variety of pervasive, enabling network resources such as railroad lines, plumbing and pipes, electrical power plants and wires, this article focuses on information infrastructure, such as computational services and help desks, or federating activities such as scientific data repositories and archives spanning the multiple disciplines needed to address issues such as climate warming and the biodiversity crisis. These are elements associated with the internet and, frequently today, with cyberinfrastructure or e-science endeavors. We argue that a theoretical understanding of infrastructure provides the context for a needed dialogue between the design, use, and sustainability of internet-based infrastructure services. This article outlines the research area and its overarching themes. Part one of the paper presents definitions for infrastructure and cyberinfrastructure, reviewing salient previous work. Part two portrays key ideas from infrastructure studies (knowledge work, social and political values, new forms of sociality, etc.). In closing, the character of the field today is considered.
NASA Astrophysics Data System (ADS)
Bogdanov, A. V.; Iuzhanin, N. V.; Zolotarev, V. I.; Ezhakova, T. R.
2017-12-01
In this article, the problem of supporting scientific projects throughout their lifecycle in a computer center is considered in every aspect of support. The Configuration Management system plays a connecting role in processes related to the provision and support of computer center services. In view of the strong integration of IT infrastructure components through virtualization, control of the infrastructure becomes even more critical to the support of research projects, which means higher requirements for the Configuration Management system. For every aspect of research project support, the influence of the Configuration Management system is reviewed, and the development of the corresponding elements of the system is described in the present paper.
Flexible Description and Adaptive Processing of Earth Observation Data through the BigEarth Platform
NASA Astrophysics Data System (ADS)
Gorgan, Dorian; Bacu, Victor; Stefanut, Teodor; Nandra, Cosmin; Mihon, Danut
2016-04-01
Earth Observation data repositories, which periodically grow by several terabytes, are becoming a critical issue for organizations. Managing the storage capacity of such big datasets, access policies, data protection, searching, and complex processing entails high costs, which calls for efficient solutions that balance the cost and value of data. Data can create value only when it is used, and data protection has to be oriented toward allowing innovation, which sometimes depends on creative people who achieve unexpectedly valuable results in a flexible and adaptive manner. Users need to describe and experiment with complex algorithms themselves, through analytics, in order to extract value from data. Analytics uses descriptive and predictive models to gain valuable knowledge and information from data analysis. Possible solutions for advanced processing of big Earth Observation data are given by HPC platforms such as the cloud. With platforms becoming more complex and heterogeneous, developing applications is even harder, and the efficient mapping of these applications to a suitable and optimal platform, working on huge distributed data repositories, is challenging and complex as well, even when using specialized software services. From the user's point of view, an optimal environment gives acceptable execution times, offers a high level of usability by hiding the complexity of the computing infrastructure, and supports open accessibility and control of application entities and functionality. The BigEarth platform [1] supports the entire flow, from flexible description of processing by basic operators to adaptive execution over cloud infrastructure [2]. The basic modules of the pipeline, such as the KEOPS [3] set of basic operators, the WorDeL language [4], the Planner for sequential and parallel processing, and the Executor based on virtual machines, are detailed as the main components of the BigEarth platform [5]. The presentation exemplifies the development of some Earth Observation oriented applications based on flexible description of processing, and adaptive and portable execution over Cloud infrastructure. Main references for further information: [1] BigEarth project, http://cgis.utcluj.ro/projects/bigearth [2] Gorgan, D., "Flexible and Adaptive Processing of Earth Observation Data over High Performance Computation Architectures", International Conference and Exhibition Satellite 2015, August 17-19, Houston, Texas, USA. [3] Mihon, D., Bacu, V., Colceriu, V., Gorgan, D., "Modeling of Earth Observation Use Cases through the KEOPS System", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 455-460, (2015). [4] Nandra, C., Gorgan, D., "Workflow Description Language for Defining Big Earth Data Processing Tasks", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 461-468, (2015). [5] Bacu, V., Stefan, T., Gorgan, D., "Adaptive Processing of Earth Observation Data on Cloud Infrastructures Based on Workflow Description", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 444-454, (2015).
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).
Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System
Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel
2017-01-01
OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997
The point-spread function measure of resolution for the 3-D electrical resistivity experiment
NASA Astrophysics Data System (ADS)
Oldenborger, Greg A.; Routh, Partha S.
2009-02-01
The solution appraisal component of the inverse problem involves investigation of the relationship between our estimated model and the actual model. However, full appraisal is difficult for large 3-D problems such as electrical resistivity tomography (ERT). We tackle the appraisal problem for 3-D ERT via the point-spread functions (PSFs) of the linearized resolution matrix. The PSFs represent the impulse response of the inverse solution and quantify our parameter-specific resolving capability. We implement an iterative least-squares solution of the PSF for the ERT experiment, using on-the-fly calculation of the sensitivity via an adjoint integral equation with stored Green's functions and subgrid reduction. For a synthetic example, analysis of individual PSFs demonstrates the truly 3-D character of the resolution. The PSFs for the ERT experiment are Gaussian-like in shape, with directional asymmetry and significant off-diagonal features. Computation of attributes representative of the blurring and localization of the PSF reveals significant spatial dependence of the resolution with some correlation to the electrode infrastructure. Application to a time-lapse ground-water monitoring experiment demonstrates the utility of the PSF for assessing feature discrimination, predicting artefacts and identifying model dependence of resolution. For a judicious selection of model parameters, we analyse the PSFs and their attributes to quantify the case-specific localized resolving capability and its variability over regions of interest. We observe approximate interborehole resolving capability of less than 1-1.5 m in the vertical direction and less than 1-2.5 m in the horizontal direction. Resolving capability deteriorates significantly outside the electrode infrastructure.
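For the linearized appraisal described above, a standard way to write the resolution matrix and its PSFs is shown below; the notation is assumed for illustration, and the paper's regularization details may differ:

```latex
\mathbf{m}_{\mathrm{est}} \;=\; \mathbf{R}\,\mathbf{m}_{\mathrm{true}},
\qquad
\mathbf{R} \;=\; \bigl(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \lambda\,\mathbf{W}^{\mathsf{T}}\mathbf{W}\bigr)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{J},
\qquad
\mathrm{PSF}_j \;=\; \mathbf{R}\,\mathbf{e}_j
```

Here J is the sensitivity (Jacobian) matrix and \lambda W^T W the regularization term; the PSF for model cell j is the j-th column of R, i.e. the blurred image of a unit impulse at that cell, which an iterative least-squares scheme can compute without ever forming R explicitly.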
NASA Astrophysics Data System (ADS)
De Bonis, Giulia; Bozza, Cristiano
2017-03-01
In the framework of Horizon 2020, the European Commission approved the ASTERICS initiative (ASTronomy ESFRI and Research Infrastructure CluSter) to collect knowledge and experiences from astronomy, astrophysics and particle physics and foster synergies among existing research infrastructures and scientific communities, hence paving the way for future ones. ASTERICS aims at producing a common set of tools and strategies to be applied in Astronomy ESFRI facilities. In particular, it will target the so-called multi-messenger approach to combine information from optical and radio telescopes, photon counters and neutrino telescopes. pLISA is a software tool under development in ASTERICS to help and promote machine learning as a unified approach to multivariate analysis of astrophysical data and signals. The library will offer a collection of classification parameters, estimators, classes and methods to be linked and used in reconstruction programs (and possibly also extended), to characterize events in terms of particle identification and energy. The pLISA library aims at offering the software infrastructure for applications developed inside different experiments, and has been designed with an effort to extrapolate general, physics-related estimators from the specific features of the data model of each particular experiment. pLISA is oriented towards parallel computing architectures, with awareness of the opportunity of using GPUs as accelerators demanding specifically optimized algorithms, and of the need to reduce the costs of processing hardware required for the reconstruction tasks. Indeed, a fast (ideally, real-time) reconstruction can open the way for the development or improvement of alert systems, typically required by multi-messenger search programmes among the different experimental facilities involved in ASTERICS.
NASA Astrophysics Data System (ADS)
Dewhurst, A.; Legger, F.
2015-12-01
The ATLAS experiment accumulated more than 140 PB of data during the first run of the Large Hadron Collider (LHC) at CERN. The analysis of such an amount of data is a challenging task for the distributed physics community. The Distributed Analysis (DA) system of the ATLAS experiment is an established and stable component of the ATLAS distributed computing operations. About half a million user jobs are running daily on DA resources, submitted by more than 1500 ATLAS physicists. The reliability of the DA system during the first run of the LHC and the following shutdown period has been high thanks to the continuous automatic validation of the distributed analysis sites and the user support provided by a dedicated team of expert shifters. During the LHC shutdown, the ATLAS computing model has undergone several changes to improve the analysis workflows, including the re-design of the production system, a new analysis data format and event model, and the development of common reduction and analysis frameworks. We report on the impact such changes have on the DA infrastructure, describe the new DA components, and include recent performance measurements.
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli
2016-09-10
The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available to most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible: infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling altogether. We present a high-throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages a hybrid computing infrastructure consisting of the AWS cloud, supercomputers and local high-performance computing infrastructures. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers: SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, the IBM Power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and the ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and transferred only a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low-coverage data can be accomplished in a scalable, cost-effective and fast manner by using heterogeneous computing platforms without compromising the quality of variants.
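The binning idea can be sketched as below, assuming fixed-size genomic bins dispatched round-robin across the heterogeneous platforms; the bin size, platform names and chromosome lengths are illustrative assumptions, not the paper's configuration:

```python
# Hedged sketch: partition the genome into fixed-size regions so each
# bin can be joint-called independently on whichever platform is free.

def make_bins(chrom_lengths, bin_size=1_000_000):
    """Yield (chrom, start, end) bins covering the genome."""
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, bin_size):
            yield chrom, start, min(start + bin_size, length)

chrom_lengths = {"chr20": 63_025_520, "chr21": 48_129_895}  # toy subset
bins = list(make_bins(chrom_lengths))

# Round-robin the bins across heterogeneous platforms (names assumed).
platforms = ["aws", "supercomputer", "local_cluster"]
schedule = {b: platforms[i % len(platforms)] for i, b in enumerate(bins)}
print(len(bins), "bins; first bin runs on", schedule[bins[0]])
```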
ERIC Educational Resources Information Center
Spennemann, Dirk H. R.; Atkinson, John; Cornforth, David
2007-01-01
Most universities have invested in extensive infrastructure in the form of computer laboratories and computer kiosks. However, is this investment justified when it is suggested that students work predominantly from home using their own computers? This paper provides an empirical study investigating how students at a regional multi-campus…
Easy, Collaborative and Engaging--The Use of Cloud Computing in the Design of Management Classrooms
ERIC Educational Resources Information Center
Schneckenberg, Dirk
2014-01-01
Background: Cloud computing has recently received interest in information systems research and practice as a new way to organise information with the help of an increasingly ubiquitous computer infrastructure. However, the use of cloud computing in higher education institutions and business schools, as well as its potential to create novel…
Computers and the Environment: Minimizing the Carbon Footprint
ERIC Educational Resources Information Center
Kaestner, Rich
2009-01-01
Computers can be good and bad for the environment; one can maximize the good and minimize the bad. When dealing with environmental issues, it's difficult to ignore the computing infrastructure. With an operations carbon footprint equal to the airline industry's, computer energy use is only part of the problem; everyone is also dealing with the use…
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high-throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model, and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability, with competitive assembly quality, compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure rather than a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of the traditional HPC cluster.
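A minimal sketch of the de Bruijn graph construction at the heart of assemblers like the one described above; k and the reads are toy examples, and a real assembler such as GiGA distributes this step over many nodes:

```python
# Hedged sketch: build a de Bruijn graph from reads. Each (k-1)-mer
# prefix maps to the set of (k-1)-mer suffixes that follow it in a read,
# i.e. the graph's edges. Toy, single-machine illustration only.
from collections import defaultdict

def de_bruijn(reads, k=4):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["ACGTACGA", "GTACGATT"]
for node, succs in sorted(de_bruijn(reads).items()):
    print(node, "->", sorted(succs))
```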
Frame, M.T.; Cotter, G.; Zolly, L.; Little, J.
2002-01-01
Whether your vantage point is that of an office window or a national park, your view undoubtedly encompasses a rich diversity of life forms, all carefully studied or managed by some scientist, resource manager, or planner. A few simple calculations - the number of species, their interrelationships, and the many researchers studying them - and you can easily see the tremendous challenges that the resulting biological data presents to the information and computer science communities. Biological information varies in format and content: it may pertain to a particular species or an entire ecosystem, and it can contain land use characteristics and geospatially referenced information. The complexity and uniqueness of each individual species or ecosystem do not easily lend themselves to today's computer science tools and applications. To address the challenges that the biological enterprise presents, the National Biological Information Infrastructure (NBII) (http://www.nbii.gov) was established in 1993 on the recommendation of the National Research Council (National Research Council 1993). The NBII is designed to address these issues on a national scale and through international partnerships. This paper discusses current information and computer science efforts within the National Biological Information Infrastructure Program, and future computer science research endeavors that are needed to address the ever-growing issues related to our nation's biological concerns. © 2003 by The Haworth Press, Inc. All rights reserved.
Changing from computing grid to knowledge grid in life-science grid.
Talukdar, Veera; Konar, Amit; Datta, Ayan; Choudhury, Anamika Roy
2009-09-01
Grid computing has great potential to become a standard cyberinfrastructure for life sciences, which often require high-performance computing and large data handling that exceed the computing capacity of a single institution. Grid computing applies the resources of many computers in a network to a single problem at the same time. It is useful for scientific problems that require a great number of computer processing cycles or access to a large amount of data. As biologists, we are constantly discovering millions of genes and genome features, which are assembled in libraries and distributed on computers around the world. This means that new, innovative methods must be developed that exploit the resources available for extensive calculations - for example, grid computing. This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have matured enough to solve high-throughput real-world life-science problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation to share tacit knowledge within a community. By extending the concept of the grid from computing grid to knowledge grid, it is possible to use a grid not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community.
The OSG Open Facility: an on-ramp for opportunistic scientific computing
NASA Astrophysics Data System (ADS)
Jayatilaka, B.; Levshina, T.; Sehgal, C.; Gardner, R.; Rynge, M.; Würthwein, F.
2017-10-01
The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years to a much larger user and resource platform. In addition to meeting the US LHC community’s computational needs, the OSG continues to be one of the largest providers of distributed high-throughput computing (DHTC) to researchers from a wide variety of disciplines via the OSG Open Facility. The Open Facility consists of OSG resources that are available opportunistically to users other than resource owners and their collaborators. In the past two years, the Open Facility has doubled its annual throughput to over 200 million wall hours. More than half of these resources are used by over 100 individual researchers from over 60 institutions in fields such as biology, medicine, math, economics, and many others. Over 10% of these individual users utilized in excess of 1 million computational hours each in the past year. The largest source of these cycles is temporary unused capacity at institutions affiliated with US LHC computational sites. An increasing fraction, however, comes from university HPC clusters and large national infrastructure supercomputers offering unused capacity. Such expansions have allowed the OSG to provide ample computational resources to both individual researchers and small groups as well as sizable international science collaborations such as LIGO, AMS, IceCube, and sPHENIX. Opening up access to the Fermilab FabrIc for Frontier Experiments (FIFE) project has also allowed experiments such as mu2e and NOvA to make substantial use of Open Facility resources, the former with over 40 million wall hours in a year. We present how this expansion was accomplished as well as future plans for keeping the OSG Open Facility at the forefront of enabling scientific research by way of DHTC.
Multi-level meta-workflows: new concept for regularly occurring tasks in quantum chemistry.
Arshad, Junaid; Hoffmann, Alexander; Gesing, Sandra; Grunzke, Richard; Krüger, Jens; Kiss, Tamas; Herres-Pawlis, Sonja; Terstyanszky, Gabor
2016-01-01
In Quantum Chemistry, many tasks recur frequently, e.g. geometry optimizations and benchmarking series. Here, workflows can help to reduce the time spent on manual job definition and output extraction. These workflows are executed on computing infrastructures and may require large computing and data resources. Scientific workflows hide these infrastructures and the resources needed to run them. It requires significant effort and specific expertise to design, implement and test these workflows. Many of them are complex, monolithic entities that can be used only for particular scientific experiments. Hence, their modification is not straightforward, which makes them almost impossible to share. To address these issues we propose developing atomic workflows and embedding them in meta-workflows. Atomic workflows deliver a well-defined, research-domain-specific function. Publishing workflows in repositories enables workflow sharing inside and/or among scientific communities. We formally specify atomic and meta-workflows in order to define the data structures to be used in repositories for uploading and sharing them. Additionally, we present a formal description of the orchestration of atomic workflows into meta-workflows. We investigated the operations that represent basic functionalities in Quantum Chemistry, developed the relevant atomic workflows and combined them into meta-workflows. Having these workflows, we defined the structure of the Quantum Chemistry workflow library and uploaded the workflows to the SHIWA Workflow Repository. Graphical abstract: meta-workflows and embedded workflows in the template representation.
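A minimal sketch of the atomic-/meta-workflow distinction the paper proposes, with all class and function names invented for illustration; this is not the SHIWA API.

```python
# Atomic workflows expose one well-defined domain function; a meta-workflow
# orchestrates them in sequence. Names here are illustrative only.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class AtomicWorkflow:
    name: str
    run: Callable[[Any], Any]  # one research-domain-specific function

@dataclass
class MetaWorkflow:
    name: str
    steps: list = field(default_factory=list)

    def execute(self, data: Any) -> Any:
        # Orchestration: pipe each atomic workflow's output into the next.
        for step in self.steps:
            data = step.run(data)
        return data

# Example: geometry optimization followed by output extraction.
geom_opt = AtomicWorkflow("geometry_optimization", lambda m: f"opt({m})")
extract  = AtomicWorkflow("output_extraction",   lambda r: f"energies({r})")
meta = MetaWorkflow("benchmark_series", [geom_opt, extract])
print(meta.execute("caffeine.xyz"))
```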
Comparing New Zealand's 'Middle Out' health information technology strategy with other OECD nations.
Bowden, Tom; Coiera, Enrico
2013-05-01
Implementation of efficient, universally applied, computer-to-computer communication is a high priority for many national health systems. As a consequence, much effort has been channelled into finding ways in which a patient's previous medical history can be made accessible when needed. A number of countries have attempted to share patients' records, with varying degrees of success. While most efforts to create record-sharing architectures have relied upon government-provided strategy and funding, New Zealand has taken a different approach. Like most British Commonwealth nations, New Zealand has a 'hybrid' publicly/privately funded health system. However, its information technology infrastructure and automation have largely been developed by the private sector, working closely with regional and central government agencies. Currently the sector is focused on finding ways in which patient records can be shared amongst providers across three different regions. New Zealand's healthcare IT model combines government-contributed funding, core infrastructure, facilitation and leadership with private-sector investment and skills, and is being delivered via a set of controlled experiments. The net result is a 'Middle Out' approach to healthcare automation. 'Middle Out' relies upon having a clear, well-articulated health-reform strategy and a determination by both public- and private-sector organisations to implement useful healthcare IT solutions by working closely together.
Quantitative Investigation of the Technologies That Support Cloud Computing
ERIC Educational Resources Information Center
Hu, Wenjin
2014-01-01
Cloud computing is dramatically shaping modern IT infrastructure. It virtualizes computing resources, provides elastic scalability, serves as a pay-as-you-use utility, simplifies the IT administrators' daily tasks, enhances the mobility and collaboration of data, and increases user productivity. We focus on providing generalized black-box…
An Infrastructure for Web-Based Computer Assisted Learning
ERIC Educational Resources Information Center
Joy, Mike; Muzykantskii, Boris; Rawles, Simon; Evans, Michael
2002-01-01
We describe an initiative under way at Warwick to provide a technical foundation for computer aided learning and computer-assisted assessment tools, which allows a rich dialogue sensitive to individual students' response patterns. The system distinguishes between dialogues for individual problems and the linking of problems. This enables a subject…
Evaluation of Service Level Agreement Approaches for Portfolio Management in the Financial Industry
NASA Astrophysics Data System (ADS)
Pontz, Tobias; Grauer, Manfred; Kuebert, Roland; Tenschert, Axel; Koller, Bastian
The idea of service-oriented Grid computing has the potential to bring about a fundamental paradigm change and a new architectural alignment in the design of IT infrastructures. There is a wide range of technical approaches from scientific communities describing basic infrastructures and middleware for integrating Grid resources, so that Grid applications are by now technically realizable. However, Grid computing needs viable business models and enhanced infrastructures to move from academic to commercial application. For commercial usage of these developments, service level agreements are needed. The approaches developed so far are primarily of academic interest and have mostly not been put into practice. Based on a business use case from the financial industry, five service level agreement approaches are evaluated in this paper. Based on the evaluation, a management architecture has been designed and implemented as a prototype.
SEE-GRID eInfrastructure for Regional eScience
NASA Astrophysics Data System (ADS)
Prnjat, Ognjen; Balaz, Antun; Vudragovic, Dusan; Liabotis, Ioannis; Sener, Cevat; Marovic, Branko; Kozlovszky, Miklos; Neagu, Gabriel
In the past six years, a number of targeted initiatives, funded by the European Commission via its information society and RTD programmes and by Greek infrastructure development actions, have articulated successful regional development actions in South East Europe that can serve as a role model for other international developments. The SEEREN (South-East European Research and Education Networking) initiative, through its two phases, established the SEE segment of the pan-European GÉANT network and successfully connected the research and scientific communities in the region. Currently, the SEE-LIGHT project is working towards establishing a dark-fibre backbone that will interconnect most national research and education networks in the region. On the distributed computing and storage provisioning (i.e. Grid) plane, the SEE-GRID (South-East European GRID e-Infrastructure Development) project, similarly through its two phases, has established a strong human network in the area of scientific computing, has set up a powerful regional Grid infrastructure, and has attracted a number of applications from different fields and from countries throughout South-East Europe. The current SEE-GRID-SCI project, ending in April 2010, empowers regional user communities from the fields of meteorology, seismology and environmental protection in the common use and sharing of the regional e-Infrastructure. Technical initiatives currently in formulation focus on a set of coordinated actions in the area of HPC and on application fields making use of HPC initiatives. Finally, the current SEERA-EI project brings together policy makers, i.e. programme managers, from 10 countries in the region. The project aims to establish a communication platform between programme managers, pave the way towards a common e-Infrastructure strategy and vision, and implement concrete actions for common funding of electronic infrastructures at the regional level. The regional vision of establishing an e-Infrastructure compatible with European developments, and of empowering the scientists of the region to participate equally in the use of pan-European infrastructures, is materializing through the above initiatives. This model has a number of concrete operational and organizational guidelines which can be adapted to help e-Infrastructure developments in other world regions. In this paper we review the most important developments and contributions of the SEE-GRID-SCI project.
Wan, Shixiang; Zou, Quan
2017-01-01
Multiple sequence alignment (MSA) plays a key role in biological sequence analysis, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing output has created a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing is a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient tool, HAlign-II, to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on DNA and protein data sets larger than 1 GB showed that HAlign-II saves both time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences, shows extremely high memory efficiency, and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure; its open-source code and datasets are available at http://lab.malab.cn/soft/halign.
ERIC Educational Resources Information Center
California Community Colleges, Sacramento. Office of the Chancellor.
This is the sixth report on the status and progress of the Telecommunications and Technology Infrastructure Program (TTIP), submitted by the California Community Colleges. In California, familiarity with and use of computers is fundamental to economic success. California is home to many of the major companies involved in creating the future of the…
Managing Information Technology as a Catalyst of Change. Track V: Optimizing the Infrastructure.
ERIC Educational Resources Information Center
CAUSE, Boulder, CO.
This track of the 1993 CAUSE Conference presents eight papers on developments in computer network infrastructure and the challenges for those who plan for, implement, and manage it in colleges and universities. Papers include: (1) "Where Do We Go from Here: Summative Assessment of a Five-Year Strategic Plan for Linking and Integrating…
Communication: Essential for Leadership to a Public Good--an Information Infrastructure.
ERIC Educational Resources Information Center
Moran, Robert F., Jr.
This paper discusses the central role of effective communication in library leadership and how a leadership role in the library and information community can define and help establish an information infrastructure in our society. The opportunity for this leadership to exist in the convergence of libraries and computer centers is examined in a…
Bethel, EW; Bauer, A; Abbasi, H; ...
2016-06-10
The considerable interest in the high performance computing (HPC) community in analyzing and visualizing data without first writing it to disk, i.e., in situ processing, is due to several factors. First is the I/O cost savings: data is analyzed and visualized while being generated, without first being stored on a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose complex behavior missed by coarse temporal sampling. Third is the ability to use all available resources, CPUs and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal of presenting existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.
Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.
Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P
2010-01-15
A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays and examine the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a cloud computing provider only for what is used. Moreover, as well as being financially efficient, cloud computing is an ecologically friendly technology; it enables efficient data-sharing and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public-domain 3' arrays.
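Identifying affected probes amounts to scanning probe sequences for runs of guanine. A minimal sketch, with invented probe sequences and a four-guanine threshold chosen here purely for illustration:

```python
# Flag 25-mer probes containing a run of four or more guanines ("G-spots"),
# which the study links to quadruplex formation. Probes below are invented.
import re

G_RUN = re.compile(r"G{4,}")

def has_g_spot(probe: str) -> bool:
    """Return True if the probe contains a run of >= 4 guanines."""
    return bool(G_RUN.search(probe.upper()))

probes = [
    "ATCGGGGTACCTGATTCAGGACTTA",  # contains GGGG -> potentially unreliable
    "ATCGAGTTACCTGATTCAGGACTTA",  # no G run
]
for p in probes:
    print(p, has_g_spot(p))
```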
Mishra, Bud; Daruwala, Raoul-Sam; Zhou, Yi; Ugel, Nadia; Policriti, Alberto; Antoniotti, Marco; Paxia, Salvatore; Rejali, Marc; Rudra, Archisman; Cherepinsky, Vera; Silver, Naomi; Casey, William; Piazza, Carla; Simeoni, Marta; Barbano, Paolo; Spivak, Marina; Feng, Jiawu; Gill, Ofer; Venkatesh, Mysore; Cheng, Fang; Sun, Bing; Ioniata, Iuliana; Anantharaman, Thomas; Hubbard, E Jane Albert; Pnueli, Amir; Harel, David; Chandru, Vijay; Hariharan, Ramesh; Wigler, Michael; Park, Frank; Lin, Shih-Chieh; Lazebnik, Yuri; Winkler, Franz; Cantor, Charles R; Carbone, Alessandra; Gromov, Mikhael
2003-01-01
We collaborate in a research program aimed at creating a rigorous framework, experimental infrastructure, and computational environment for understanding, experimenting with, manipulating, and modifying a diverse set of fundamental biological processes at multiple scales and spatio-temporal modes. The novelty of our research is based on an approach that (i) requires coevolution of experimental science and theoretical techniques and (ii) exploits a certain universality in biology guided by a parsimonious model of evolutionary mechanisms operating at the genomic level and manifesting at the proteomic, transcriptomic, phylogenic, and other higher levels. Our current program in "systems biology" endeavors to marry large-scale biological experiments with the tools to ponder and reason about large, complex, and subtle natural systems. To achieve this ambitious goal, ideas and concepts are combined from many different fields: biological experimentation, applied mathematical modeling, computational reasoning schemes, and large-scale numerical and symbolic simulations. From a biological viewpoint, the basic issues are many: (i) understanding common and shared structural motifs among biological processes; (ii) modeling biological noise due to interactions among a small number of key molecules or loss of synchrony; (iii) explaining the robustness of these systems in spite of such noise; and (iv) cataloging multistatic behavior and adaptation exhibited by many biological processes.
Evolution of user analysis on the grid in ATLAS
NASA Astrophysics Data System (ADS)
Dewhurst, A.; Legger, F.; ATLAS Collaboration
2017-10-01
More than one thousand physicists analyse data collected by the ATLAS experiment at the Large Hadron Collider (LHC) at CERN through 150 computing facilities around the world. Efficient distributed analysis requires optimal resource usage and the interplay of several factors: robust grid and software infrastructures, and system capability to adapt to different workloads. The continuous automatic validation of grid sites and the user support provided by a dedicated team of expert shifters have been proven to provide a solid distributed analysis system for ATLAS users. Typical user workflows on the grid, and their associated metrics, are discussed. Measurements of user job performance and typical requirements are also shown.
Recent developments in user-job management with Ganga
NASA Astrophysics Data System (ADS)
Currie, R.; Elmsheuser, J.; Fay, R.; Owen, P. H.; Richards, A.; Slater, M.; Sutcliffe, W.; Williams, M.
2015-12-01
The Ganga project was originally developed for use by the LHC experiments and has been used extensively throughout Run 1 in both LHCb and ATLAS. This document describes some of the most recent developments within the Ganga project. The handling of large-scale computational tasks has been improved in the form of the new GangaTasks infrastructure. Improvements in file handling through the new IGangaFile interface make handling files largely transparent to the end user. In addition, both the performance and the usability of Ganga have been addressed through the development of a new queues system that allows for parallel processing of job-related tasks.
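Ganga jobs are themselves configured in Python from within the Ganga shell. A hedged illustration of the style of session the abstract refers to; the executable and backend are placeholders, and the snippet assumes a working Ganga installation rather than plain Python:

```python
# Runs inside the Ganga interactive shell, where Job, Executable and Local
# are predefined; an illustrative session, not code from the paper.
j = Job(name="demo")
j.application = Executable(exe="/bin/echo", args=["hello", "grid"])
j.backend = Local()   # a grid backend would be configured here instead
j.submit()
# The GangaTasks infrastructure and the new queues system described in the
# paper sit on top of sessions like this, automating large sets of such jobs.
```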
CMS Distributed Computing Integration in the LHC sustained operations era
NASA Astrophysics Data System (ADS)
Grandi, C.; Bockelman, B.; Bonacorsi, D.; Fisk, I.; González Caballero, I.; Farina, F.; Hernández, J. M.; Padhi, S.; Sarkar, S.; Sciabà, A.; Sfiligoi, I.; Spiga, F.; Úbeda García, M.; Van Der Ster, D. C.; Zvada, M.
2011-12-01
After many years of preparation the CMS computing system has reached a situation where stability in operations limits the possibility to introduce innovative features. Nevertheless, it is the same need for stability and smooth operations that requires the introduction of features that were considered non-strategic in earlier phases. Examples are: adequate authorization to control and prioritize access to storage and computing resources; improved monitoring to investigate problems and identify bottlenecks in the infrastructure; increased automation to reduce the manpower needed for operations; and an effective process to deploy new releases of the software tools in production. We present the work of the CMS Distributed Computing Integration Activity, which is responsible for providing a liaison between the CMS distributed computing infrastructure and the software providers, both internal and external to CMS. In particular we describe the introduction of new middleware features during the last 18 months, as well as the requirements on Grid and Cloud software developers for the future.
NASA Astrophysics Data System (ADS)
Ramachandran, R.; Murphy, K. J.; Baynes, K.; Lynnes, C.
2016-12-01
With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way Earth observation data is processed, analyzed, and visualized. The cloud infrastructure provides the flexibility to scale up to large volumes of data and to handle high-velocity data streams efficiently. Having freely available Earth observation data collocated on a cloud infrastructure creates opportunities for innovation and value-added data re-use in ways unforeseen by the original data provider. These innovations spur new industries and applications and spawn new scientific pathways that were previously limited by data volume and computational infrastructure issues. NASA, in collaboration with Amazon, Google, and Microsoft, has developed a set of recommendations to enable efficient transfer of Earth observation data from existing data systems to a cloud computing infrastructure. The purpose of these recommendations is to provide guidelines against which all data providers can evaluate existing data systems, and which can be used to fix any issues uncovered, enabling efficient search, access, and use of large volumes of data. Additionally, these guidelines ensure that all cloud providers utilize a common methodology for bulk-downloading data from data providers, preventing the data providers from having to build custom capabilities to meet the needs of individual cloud providers. The intent is to share these recommendations with other Federal agencies and organizations that serve Earth observation data. Their adoption will also benefit data users interested in moving large volumes of data from data systems to any other location, including cloud providers, cloud users such as scientists, and other users working in high-performance computing environments.
Robust, Optimal Water Infrastructure Planning Under Deep Uncertainty Using Metamodels
NASA Astrophysics Data System (ADS)
Maier, H. R.; Beh, E. H. Y.; Zheng, F.; Dandy, G. C.; Kapelan, Z.
2015-12-01
Optimal long-term planning plays an important role in many water infrastructure problems. However, this task is complicated by deep uncertainty about future conditions, such as the impact of population dynamics and climate change. One way to deal with this uncertainty is by means of robustness, which aims to ensure that water infrastructure performs adequately under a range of plausible future conditions. However, as robustness calculations require computationally expensive system models to be run for a large number of scenarios, it is generally computationally intractable to include robustness as an objective in the development of optimal long-term infrastructure plans. In order to overcome this shortcoming, an approach is developed that uses metamodels instead of computationally expensive simulation models in robustness calculations. The approach is demonstrated for the optimal sequencing of water supply augmentation options for the southern portion of the water supply for Adelaide, South Australia. A 100-year planning horizon is subdivided into ten equal decision stages for the purpose of sequencing various water supply augmentation options, including desalination, stormwater harvesting and household rainwater tanks. The objectives include the minimization of average present value of supply augmentation costs, the minimization of average present value of greenhouse gas emissions and the maximization of supply robustness. The uncertain variables are rainfall, per capita water consumption and population. Decision variables are the implementation stages of the different water supply augmentation options. Artificial neural networks are used as metamodels to enable all objectives to be calculated in a computationally efficient manner at each of the decision stages. The results illustrate the importance of identifying optimal staged solutions to ensure robustness and sustainability of water supply into an uncertain long-term future.
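A hedged sketch of the metamodel idea: train a cheap neural-network surrogate of an expensive simulation, then evaluate robustness as the fraction of plausible futures in which supply stays adequate. The toy "simulation", its inputs and the threshold below are invented; the study's actual models and objectives are far richer.

```python
# Surrogate (metamodel) for robustness evaluation, per the general approach
# described above; all numbers and the stand-in model are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def expensive_simulation(x):
    # Stand-in for the real model: x = (rainfall, per-capita use, population).
    return x[:, 0] * 2.0 - x[:, 1] * 0.5 - x[:, 2] * 0.8

# Train the surrogate on a modest number of expensive model runs.
X_train = rng.uniform(0, 1, size=(200, 3))
y_train = expensive_simulation(X_train)
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=0).fit(X_train, y_train)

# Robustness: share of sampled future scenarios meeting a supply threshold.
scenarios = rng.uniform(0, 1, size=(100_000, 3))  # cheap with the surrogate
robustness = np.mean(surrogate.predict(scenarios) > 0.0)
print(f"robustness = {robustness:.3f}")
```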
NASA Astrophysics Data System (ADS)
Wyborn, Lesley; Evans, Ben; Foster, Clinton; Pugh, Timothy; Uhlherr, Alfred
2015-04-01
Digital geoscience data and information are integral to informing decisions on the social, economic and environmental management of natural resources. Traditionally, such decisions were focused on regional or national viewpoints only, but it is increasingly being recognised that global perspectives are required to meet new challenges such as predicting the impacts of climate change; sustainably exploiting scarce water, mineral and energy resources; and protecting our communities through better prediction of the behaviour of natural hazards. In recent years, technical advances in scientific instruments have resulted in a surge in data volumes, with data now being collected at unprecedented rates and at ever-increasing resolutions. The sizes of many earth science data sets now exceed the capacity of many government and academic organisations to locally store and dynamically access the data sets; to internally process and analyse them at high resolutions; and then to deliver them online to clients, partners and stakeholders. Fortunately, at the same time, computational capacities (both cloud and HPC) have commensurately increased: these can now provide the capability to effectively access the ever-growing data assets within realistic time frames. However, to achieve this, data and computing need to be co-located: bandwidth limits the capacity to move the large data sets, the transfers are too slow, and the latencies to access them are too high. These scenarios are driving the move towards more centralised high-performance (HP) infrastructures. The rapidly increasing scale of data and the growing complexity of software and hardware environments, combined with the energy costs of running such infrastructures, create a compelling economic argument for having just one or two major national (or continental) HP facilities that can be federated internationally to enable earth and environmental issues to be tackled at global scales. At the same time, if properly constructed, these infrastructures can also service very small-scale research projects. The National Computational Infrastructure (NCI) at the Australian National University (ANU) has built such an HP infrastructure as part of the Australian Government's National Collaborative Research Infrastructure Strategy. NCI operates as a formal partnership between the ANU and three major Australian national government scientific agencies: the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the Bureau of Meteorology and Geoscience Australia. The government partners agreed to explore the new opportunities offered within the partnership with NCI, rather than each running its own separate agenda independently. The data from these national agencies, as well as from collaborating overseas organisations (e.g., NASA, NOAA, USGS, CMIP), are either replicated to, or produced at, NCI. By co-locating and harmonising these vast data collections within the integrated HP computing environments at NCI, new opportunities have arisen for data-intensive interdisciplinary science at scales and resolutions not hitherto possible. The new NCI infrastructure has also enabled the blending of research by the university sector with the more operational business of government science agencies, with the fundamental shift being that researchers from both sectors work and collaborate within a federated data and computational environment that contains both national and international data collections.
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
NASA Astrophysics Data System (ADS)
Sahoo, Satya S.; Weatherly, D. Brent; Mutharaju, Raghava; Anantharam, Pramod; Sheth, Amit; Tarleton, Rick L.
Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T.cruzi). This provenance infrastructure, called T.cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called Parasite Experiment ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views called materialized provenance views (MPV) to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in literature.
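As a rough illustration of what an ontology-driven provenance query can look like, the sketch below uses rdflib and SPARQL. The namespace, class and property names, identifiers, and data file are invented placeholders; they do not reproduce the actual Parasite Experiment ontology or the PMS query operators.

```python
# Hedged sketch: retrieve the process steps that produced a given result
# from an RDF provenance export. All names below are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("tcruzi_provenance.rdf")  # hypothetical provenance export

query = """
PREFIX pe: <http://example.org/parasite-experiment#>
SELECT ?step ?timestamp WHERE {
    ?result pe:identifier "knockout_plasmid_42" .
    ?result pe:generatedBy ?step .
    ?step   pe:performedAt ?timestamp .
}
ORDER BY ?timestamp
"""
for step, timestamp in g.query(query):
    print(step, timestamp)
```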
AstroCloud, a Cyber-Infrastructure for Astronomy Research: Cloud Computing Environments
NASA Astrophysics Data System (ADS)
Li, C.; Wang, J.; Cui, C.; He, B.; Fan, D.; Yang, Y.; Chen, J.; Zhang, H.; Yu, C.; Xiao, J.; Wang, C.; Cao, Z.; Fan, Y.; Hong, Z.; Li, S.; Mi, L.; Wan, W.; Wang, J.; Yin, S.
2015-09-01
AstroCloud is a cyber-infrastructure for astronomy research initiated by the Chinese Virtual Observatory (China-VO) under funding support from the NDRC (National Development and Reform Commission) and CAS (Chinese Academy of Sciences). Based on CloudStack, an open-source platform, we set up the cloud computing environment for the AstroCloud project. It consists of five distributed nodes across mainland China. Users can use and analyze data in this cloud computing environment. Based on GlusterFS, we built a scalable cloud storage system in which each user has a private space that can be shared among different virtual machines and desktop systems. With this environment, astronomers can easily access astronomical data collected by different telescopes and data centers, and data producers can archive their datasets safely.
NASA Astrophysics Data System (ADS)
Demenev, A. G.
2018-02-01
The present work analyzes the high-performance computing (HPC) infrastructure capabilities available at Perm State University for solving aircraft engine aeroacoustics problems. We explore the ability to develop new computational aeroacoustics methods and solvers for computer-aided engineering (CAE) systems to handle complicated industrial problems of engine noise prediction. Leading aircraft engine engineering companies, including "UEC-Aviadvigatel" JSC (our industrial partner in Perm, Russia), require such methods and solvers to optimize aircraft engine geometry for fan noise reduction. We analyzed how Perm State University's HPC hardware resources and software services can be used efficiently. The results demonstrate that the Perm State University HPC infrastructure is mature enough to face industrial-scale problems in developing a CAE system with HPC methods and CFD solvers.
New Features in the Computational Infrastructure for Nuclear Astrophysics
NASA Astrophysics Data System (ADS)
Smith, M. S.; Lingerfelt, E. J.; Scott, J. P.; Hix, W. R.; Nesaraja, C. D.; Koura, H.; Roberts, L. F.
2006-04-01
The Computational Infrastructure for Nuclear Astrophysics is a suite of computer codes, online at nucastrodata.org, that streamlines the incorporation of recent nuclear physics results into astrophysical simulations. The freely available, cross-platform suite enables users to upload cross sections and S-factors, convert them into reaction rates, parameterize the rates, store the rates in customizable libraries, set up and run custom post-processing element synthesis calculations, and visualize the results. New features include the ability for users to comment on rates or libraries using an email-style interface, a nuclear mass model evaluator, enhanced techniques for rate parameterization, better treatment of rate inverses, and the creation and export of custom animations of simulation results. We also have online animations of r-process, rp-process, and neutrino-p process element synthesis occurring in stellar explosions.
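Rate parameterization in nuclear astrophysics is commonly done with the standard seven-parameter REACLIB form; the sketch below evaluates a rate from such coefficients. Whether the suite uses exactly this form internally is an assumption here, and the coefficients shown are illustrative, not a real rate set.

```python
# Seven-parameter REACLIB reaction-rate form:
#   rate = exp(a0 + a1/T9 + a2*T9^(-1/3) + a3*T9^(1/3)
#              + a4*T9 + a5*T9^(5/3) + a6*ln(T9))
import math

def reaclib_rate(a, t9):
    """Rate from REACLIB coefficients a[0..6] at temperature t9 (10^9 K)."""
    exponent = (a[0] + a[1] / t9 + a[2] * t9 ** (-1.0 / 3.0)
                + a[3] * t9 ** (1.0 / 3.0) + a[4] * t9
                + a[5] * t9 ** (5.0 / 3.0) + a[6] * math.log(t9))
    return math.exp(exponent)

coeffs = [12.3, -1.5, 0.0, 0.7, -0.1, 0.01, -0.66]  # illustrative only
print(reaclib_rate(coeffs, t9=1.0))
```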
NASA Astrophysics Data System (ADS)
Aktas, Mehmet; Aydin, Galip; Donnellan, Andrea; Fox, Geoffrey; Granat, Robert; Grant, Lisa; Lyzenga, Greg; McLeod, Dennis; Pallickara, Shrideep; Parker, Jay; Pierce, Marlon; Rundle, John; Sayar, Ahmet; Tullis, Terry
2006-12-01
We describe the goals and initial implementation of the International Solid Earth Virtual Observatory (iSERVO). This system is built using a Web Services approach to Grid computing infrastructure and is accessed via a component-based Web portal user interface. We describe our implementations of services used by this system, including Geographical Information System (GIS)-based data grid services for accessing remote data repositories and job management services for controlling multiple execution steps. iSERVO is an example of a larger trend to build globally scalable scientific computing infrastructures using the Service Oriented Architecture approach. Adoption of this approach raises a number of research challenges in millisecond-latency message systems suitable for internet-enabled scientific applications. We review our research in these areas.
An infrastructure for the integration of geoscience instruments and sensors on the Grid
NASA Astrophysics Data System (ADS)
Pugliese, R.; Prica, M.; Kourousias, G.; Del Linz, A.; Curri, A.
2009-04-01
The Grid, as a computing paradigm, has long been in the attention of both academia and industry [1]. The distributed and expandable nature of its general architecture results in scalability and more efficient utilisation of computing infrastructures. The scientific community, including that of the geosciences, often handles problems with very high requirements in data processing, transfer, and storage [2,3]. This has raised interest in Grid technologies, but these are often viewed solely as an access gateway to HPC. Suitable Grid infrastructures could provide the geoscience community with additional benefits such as sharing, remote access, and control of scientific systems. These systems can be scientific instruments, sensors, robots, cameras and any other device used in the geosciences. A practical, general, and feasible solution for Grid-enabling such devices requires non-intrusive extensions of core parts of the current Grid architecture. We propose an extended version of an architecture [4] that can serve as the solution to the problem: the Grid Instrument Element (IE) [5]. It is an addition to the existing core Grid parts, the Computing Element (CE) and the Storage Element (SE), which serve the purposes their names suggest. The IE and the related technologies have been developed in the EU project on the Deployment of Remote Instrumentation Infrastructure (DORII, supported by the European Commission within the 7th Framework Programme (FP7/2007-2013) under grant agreement no. RI-213110, http://www.dorii.eu). In DORII, partners from various scientific communities, including those of earthquake, environmental, and experimental science, have adopted the Instrument Element technology in order to integrate their devices into the Grid. The Oceanographic and coastal observation and modelling Mediterranean Ocean Observing Network of the Istituto Nazionale di Oceanografia e di Geofisica Sperimentale (OGS, http://www.ogs.trieste.it), a DORII partner, is in the process of deploying the above-mentioned Grid technologies on two types of observational modules: Argo profiling floats and a novel Autonomous Underwater Vehicle (AUV). In this paper (i) we define the need for integrating instrumentation into the Grid, (ii) we introduce the Instrument Element solution, (iii) we demonstrate a suitable end-user web portal for accessing Grid resources, and (iv) we describe, from the Grid-technology point of view, the process of integrating two advanced environmental monitoring devices into the Grid. References: [1] M. Surridge, S. Taylor, D. De Roure, and E. Zaluska, "Experiences with GRIA—Industrial Applications on a Web Services Grid," First International Conference on e-Science and Grid Computing, 2005, pp. 98-105. [2] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke, "The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets," Journal of Network and Computer Applications, vol. 23, 2000, pp. 187-200. [3] B. Allcock, J. Bester, J. Bresnahan, A.L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke, "Data management and transfer in high-performance computational grid environments," Parallel Computing, vol. 28, 2002, pp. 749-771. [4] E. Frizziero, M. Gulmini, F. Lelli, G. Maron, A. Oh, S. Orlando, A. Petrucci, S. Squizzato, and S. Traldi, "Instrument Element: A New Grid component that Enables the Control of Remote Instrumentation," Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), IEEE Computer Society, Washington, DC, USA, 2006. [5] R. Ranon, L. De Marco, A. Senerchia, S. Gabrielli, L. Chittaro, R. Pugliese, L. Del Cano, F. Asnicar, and M. Prica, "A Web-based Tool for Collaborative Access to Scientific Instruments in Cyberinfrastructures."
Globus Quick Start Guide. Globus Software Version 1.1
NASA Technical Reports Server (NTRS)
1999-01-01
The Globus Project is a community effort, led by Argonne National Laboratory and the University of Southern California's Information Sciences Institute. Globus is developing the basic software infrastructure for computations that integrate geographically distributed computational and information resources.
Visualization at Supercomputing Centers: The Tale of Little Big Iron and the Three Skinny Guys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, E. Wes; van Rosendale, John; Southard, Dale
2010-12-01
Supercomputing Centers (SCs) are unique resources that aim to enable scientific knowledge discovery through the use of large computational resources, the Big Iron. Design, acquisition, installation, and management of the Big Iron are activities that are carefully planned and monitored. Since these Big Iron systems produce a tsunami of data, it is natural to co-locate visualization and analysis infrastructure as part of the same facility. This infrastructure consists of hardware (Little Iron) and staff (Skinny Guys). Our collective experience suggests that the design, acquisition, installation, and management of the Little Iron and Skinny Guys do not receive the same level of treatment as that of the Big Iron. The main focus of this article is to explore different aspects of planning, designing, fielding, and maintaining the visualization and analysis infrastructure at supercomputing centers. Some of the questions we explore include: "How should the Little Iron be sized to adequately support visualization and analysis of data coming off the Big Iron?" and "What sort of capabilities does it need to have?" Related questions concern the size of the visualization support staff: "How big should a visualization program be (number of persons) and what should the staff do?" and "How much of the visualization should be provided as a support service, and how much should application scientists be expected to do on their own?"
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hadley, Mark D.; Clements, Samuel L.
2009-01-01
Battelle’s National Security & Defense objective is “applying unmatched expertise and unique facilities to deliver homeland security solutions. From detection and protection against weapons of mass destruction to emergency preparedness/response and protection of critical infrastructure, we are working with industry and government to integrate policy, operational, technological, and logistical parameters that will secure a safe future”. In an ongoing effort to meet this mission, engagements with industry intended to improve the operational and technical attributes of commercial solutions related to national security initiatives are necessary. This ensures that capabilities for protecting critical infrastructure assets are considered by commercial entities in their development, design, and deployment lifecycles, thus addressing the alignment of identified deficiencies and the improvements needed to support national cyber security initiatives. The Secure Firewall (Sidewinder) appliance by Secure Computing was assessed for applicable use in critical infrastructure control system environments, such as electric power, nuclear and other facilities containing critical systems that require augmented protection from cyber threats. The testing was performed in the Pacific Northwest National Laboratory’s (PNNL) Electric Infrastructure Operations Center (EIOC). The Secure Firewall was tested in a network configuration that emulates a typical control center network and then evaluated. A number of observations and recommendations relating to features currently included in the Secure Firewall that support critical infrastructure security needs are included in this report.
Migration of the CERN IT Data Centre Support System to ServiceNow
NASA Astrophysics Data System (ADS)
Alvarez Alonso, R.; Arneodo, G.; Barring, O.; Bonfillou, E.; Coelho dos Santos, M.; Dore, V.; Lefebure, V.; Fedorko, I.; Grossir, A.; Hefferman, J.; Mendez Lorenzo, P.; Moller, M.; Pera Mira, O.; Salter, W.; Trevisani, F.; Toteva, Z.
2014-06-01
The large potential and flexibility of the ServiceNow infrastructure, based on "best practice" methods, is allowing the migration of some of the ticketing systems traditionally used for monitoring the servers and services available at the CERN IT Computer Centre. This migration enables the standardization and globalization of the ticketing and control systems, implementing a generic system extensible to other departments and users. One of the activities of the Service Management project, together with the Computing Facilities group, has been the migration of the ITCM structure based on Remedy to ServiceNow, within the context of the ITIL process called Event Management. The experience gained during the first months of operation has been instrumental in the migration to ServiceNow of other service monitoring systems and databases. The use of this structure has also been extended to service tracking at the Wigner Centre in Budapest.
The distributed production system of the SuperB project: description and results
NASA Astrophysics Data System (ADS)
Brown, D.; Corvo, M.; Di Simone, A.; Fella, A.; Luppi, E.; Paoloni, E.; Stroili, R.; Tomassetti, L.
2011-12-01
The SuperB experiment needs large samples of Monte Carlo simulated events in order to finalize the detector design and to estimate the data analysis performance. The requirements are beyond the capabilities of a single computing farm, so a distributed production model capable of exploiting the existing HEP worldwide distributed computing infrastructure is needed. In this paper we describe the set of tools that have been developed to manage the production of the required simulated events. The production of events follows three main phases: distribution of input data files to the remote-site Storage Elements (SEs); job submission, via the SuperB GANGA interface, to all available remote sites; and transfer of output files to the CNAF repository. The job workflow includes procedures for consistency checking, monitoring, data handling and bookkeeping. A replication mechanism allows storing the job output on the local-site SE. Results from the 2010 official productions are reported.
SUPAR: Smartphone as a ubiquitous physical activity recognizer for u-healthcare services.
Fahim, Muhammad; Lee, Sungyoung; Yoon, Yongik
2014-01-01
The current generation of smartphones can be seen as among the most ubiquitous devices for physical activity recognition. In this paper we propose a physical activity recognizer that provides u-healthcare services in a cost-effective manner by utilizing a cloud computing infrastructure. Our model comprises the embedded triaxial accelerometer of the smartphone, which senses body movements, and a cloud server that stores and processes the sensory data for numerous kinds of services. We compute time- and frequency-domain features over the raw signals and evaluate different machine learning algorithms to identify an accurate activity recognition model for four kinds of physical activities (i.e., walking, running, cycling and hopping). During our experiments we found that the Support Vector Machine (SVM) algorithm outperforms its counterparts for the aforementioned physical activities. Furthermore, we also explain how the smartphone application and the cloud server communicate with each other.
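A hedged sketch of this kind of pipeline, with synthetic signals standing in for real accelerometer windows; the window length, feature set, and class labels are illustrative choices, not the paper's exact configuration.

```python
# Time- and frequency-domain features over windowed triaxial accelerometer
# signals, classified with an SVM. Data below is synthetic for illustration.
import numpy as np
from sklearn.svm import SVC

def extract_features(window):
    """window: (n_samples, 3) accelerometer readings -> feature vector."""
    feats = []
    for axis in range(3):
        sig = window[:, axis]
        mag = np.abs(np.fft.rfft(sig))          # frequency domain
        feats += [sig.mean(), sig.std(),        # time domain
                  mag.mean(), mag.argmax()]     # spectral energy and peak bin
    return np.array(feats)

rng = np.random.default_rng(1)
labels = ["walking", "running", "cycling", "hopping"]
X, y = [], []
for i, activity in enumerate(labels):
    for _ in range(50):  # synthetic 128-sample windows per activity
        t = np.arange(128)
        base = np.sin(2 * np.pi * (i + 1) * t / 64.0)
        window = np.column_stack([base + rng.normal(0, 0.3, 128)
                                  for _ in range(3)])
        X.append(extract_features(window))
        y.append(activity)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([X[0]])[0])  # -> "walking" on this toy data
```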
NASA Astrophysics Data System (ADS)
Puchala, Brian; Tarcea, Glenn; Marquis, Emmanuelle. A.; Hedstrom, Margaret; Jagadish, H. V.; Allison, John E.
2016-08-01
Accelerating the pace of materials discovery and development requires new approaches and means of collaborating and sharing information. To address this need, we are developing the Materials Commons, a collaboration platform and information repository for use by the structural materials community. The Materials Commons has been designed to be a continuous, seamless part of the scientific workflow process. Researchers upload the results of experiments and computations as they are performed, automatically where possible, along with the provenance information describing the experimental and computational processes. The Materials Commons website provides an easy-to-use interface for uploading and downloading data and data provenance, as well as for searching and sharing data. This paper provides an overview of the Materials Commons. Concepts are also outlined for integrating the Materials Commons with the broader Materials Information Infrastructure that is evolving to support the Materials Genome Initiative.
Laboratory on legs: an architecture for adjustable morphology with legged robots
NASA Astrophysics Data System (ADS)
Haynes, G. Clark; Pusey, Jason; Knopf, Ryan; Johnson, Aaron M.; Koditschek, Daniel E.
2012-06-01
For mobile robots, the essential units of actuation, computation, and sensing must be designed to fit within the body of the robot. Additional capabilities will largely depend upon a given activity, and should be easily reconfigurable to maximize the diversity of applications and experiments. To address this issue, we introduce a modular architecture, originally developed and tested in the design and implementation of the X-RHex hexapod, that allows the robot to operate as a mobile laboratory on legs. In the present paper we introduce the specification, design and earliest operational data of Canid, an actively driven compliant-spined quadruped whose completely different morphology and intended dynamical operating point are nevertheless built around exactly the same "Lab on Legs" actuation, computation, and sensing infrastructure. We also review, more briefly, a second RHex variation, the XRL platform, built using the same components.
Mobile Crowd Sensing for Traffic Prediction in Internet of Vehicles.
Wan, Jiafu; Liu, Jianqi; Shao, Zehui; Vasilakos, Athanasios V; Imran, Muhammad; Zhou, Keliang
2016-01-11
The advances in wireless communication techniques, mobile cloud computing, and automotive and intelligent terminal technology are driving the evolution of vehicle ad hoc networks into the Internet of Vehicles (IoV) paradigm. This leads to a change in the vehicle routing problem from a calculation based on static data towards real-time traffic prediction. In this paper, we first address the taxonomy of cloud-assisted IoV from the viewpoint of the service relationship between cloud computing and IoV. Then, we review the traditional traffic prediction approaches used by both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communications. On this basis, we propose a mobile crowd sensing technology to support the creation of dynamic route choices for drivers wishing to avoid congestion. Experiments were carried out to verify the proposed approaches. Finally, we discuss the outlook for reliable traffic prediction.
Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters
ERIC Educational Resources Information Center
Younge, Andrew J.
2016-01-01
With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…
Introducing the Cloud in an Introductory IT Course
ERIC Educational Resources Information Center
Woods, David M.
2018-01-01
Cloud computing is a rapidly emerging topic, but should it be included in an introductory IT course? The magnitude of cloud computing use, especially cloud infrastructure, along with students' limited knowledge of the topic support adding cloud content to the IT curriculum. There are several arguments that support including cloud computing in an…
ASSURED CLOUD COMPUTING UNIVERSITY CENTER OF EXCELLENCE (ACC UCOE)
2018-01-18
Research topics include infrastructure security, the design of algorithms and techniques for real-time assuredness in cloud computing, and map-reduce task assignment with data locality.
The Role of Wireless Computing Technology in the Design of Schools.
ERIC Educational Resources Information Center
Nair, Prakash
This document discusses integrating computers logically and affordably into a school building's infrastructure through the use of wireless technology. It begins by discussing why wireless networks using mobile computers are preferable to desktop machines in each classoom. It then explains the features of a wireless local area network (WLAN) and…
Low Latency Workflow Scheduling and an Application of Hyperspectral Brightness Temperatures
NASA Astrophysics Data System (ADS)
Nguyen, P. T.; Chapman, D. R.; Halem, M.
2012-12-01
New system analytics for Big Data computing holds the promise of major scientific breakthroughs and discoveries from the exploration and mining of the massive data sets becoming available to the science community. However, such data-intensive scientific applications face severe challenges in accessing, managing and analyzing petabytes of data. While the Hadoop MapReduce environment has been successfully applied to data-intensive problems arising in business, there are still many scientific problem domains where limitations in the functionality of MapReduce systems prevent its wide adoption by those communities. This is mainly because MapReduce does not readily support unique science discipline needs such as special science data formats, graphic and computational data analysis tools, maintaining high degrees of computational accuracy, and interfacing with applications' existing components across heterogeneous computing processors. We address some of these limitations by exploiting the MapReduce programming model for satellite data-intensive scientific problems, and we address scalability, reliability, scheduling, and data management issues when dealing with climate data records and their complex observational challenges. In addition, we present techniques to support unique Earth science discipline needs such as dealing with special science data formats (HDF and NetCDF). We have developed a Hadoop task scheduling algorithm that improves latency by 2x for a scientific workflow including the gridding of the EOS AIRS hyperspectral Brightness Temperatures (BT). This workflow processing algorithm has been tested on the Multicore Computing Center's private Hadoop-based Intel Nehalem cluster, as well as in a virtual mode under the open-source Eucalyptus cloud. The 55 TB AIRS hyperspectral L1b Brightness Temperature record has been gridded at a resolution of 0.5x1.0 degrees, and we have computed a 0.9 annual anti-correlation with the El Niño Southern Oscillation in the Niño 4 region, as well as a 1.9 Kelvin decadal Arctic warming in the 4 μm and 12 μm spectral regions. Additionally, we will present the frequency of extreme global warming events by use of the normalized maximum BT in a grid cell relative to its local standard deviation. A low-latency Hadoop scheduling environment maintains data integrity and fault tolerance in a MapReduce data-intensive cloud environment while improving the time-to-solution metric by 35% compared to a more traditional parallel processing system on the same dataset. Our next step will be to improve the usability of our Hadoop task scheduling system, to enable rapid prototyping of data-intensive experiments by means of processing "kernels". We will report on the performance and experience of implementing these experiments on the NEX testbed, and we propose the use of a graphical directed acyclic graph (DAG) interface to help develop on-demand scientific experiments. Our workflow system works within the Hadoop infrastructure as a replacement for the FIFO or Fair Scheduler, so Apache Pig Latin or other Apache tools may also be worth investigating on the NEX system to improve the usability of our workflow scheduling infrastructure for rapid experimentation.
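Gridding brightness temperatures maps naturally onto MapReduce: the mapper keys each observation by grid cell and the reducer averages per cell. A hedged Hadoop Streaming-style sketch, assuming a simplified "lat lon bt" text record format rather than the real AIRS HDF granules:

```python
# Hadoop Streaming mapper/reducer for gridding brightness temperatures (BT)
# at 0.5 deg (lat) x 1.0 deg (lon). Input layout "lat lon bt" is assumed.
import sys

def mapper():
    # Key each observation by its grid cell.
    for line in sys.stdin:
        lat, lon, bt = map(float, line.split())
        cell = (int(lat // 0.5), int(lon // 1.0))
        print(f"{cell[0]},{cell[1]}\t{bt}")

def reducer():
    # Input arrives sorted by key; emit the mean BT per grid cell.
    key, total, count = None, 0.0, 0
    for line in sys.stdin:
        cell, bt = line.rstrip("\n").split("\t")
        if cell != key:
            if key is not None:
                print(f"{key}\t{total / count}")
            key, total, count = cell, 0.0, 0
        total += float(bt)
        count += 1
    if key is not None:
        print(f"{key}\t{total / count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```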
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crabtree, George; Glotzer, Sharon; McCurdy, Bill
This report is based on a SC Workshop on Computational Materials Science and Chemistry for Innovation on July 26-27, 2010, to assess the potential of state-of-the-art computer simulations to accelerate understanding and discovery in materials science and chemistry, with a focus on potential impacts in energy technologies and innovation. The urgent demand for new energy technologies has greatly exceeded the capabilities of today's materials and chemical processes. To convert sunlight to fuel, efficiently store energy, or enable a new generation of energy production and utilization technologies requires the development of new materials and processes of unprecedented functionality and performance. New materials and processes are critical pacing elements for progress in advanced energy systems and virtually all industrial technologies. Over the past two decades, the United States has developed and deployed the world's most powerful collection of tools for the synthesis, processing, characterization, and simulation and modeling of materials and chemical systems at the nanoscale, dimensions of a few atoms to a few hundred atoms across. These tools, which include world-leading x-ray and neutron sources, nanoscale science facilities, and high-performance computers, provide an unprecedented view of the atomic-scale structure and dynamics of materials and the molecular-scale basis of chemical processes. For the first time in history, we are able to synthesize, characterize, and model materials and chemical behavior at the length scale where this behavior is controlled. This ability is transformational for the discovery process and, as a result, confers a significant competitive advantage. Perhaps the most spectacular increase in capability has been demonstrated in high performance computing. Over the past decade, computational power has increased by a factor of a million due to advances in hardware and software. This rate of improvement, which shows no sign of abating, has enabled the development of computer simulations and models of unprecedented fidelity. We are at the threshold of a new era where the integrated synthesis, characterization, and modeling of complex materials and chemical processes will transform our ability to understand and design new materials and chemistries with predictive power. In turn, this predictive capability will transform technological innovation by accelerating the development and deployment of new materials and processes in products and manufacturing. Harnessing the potential of computational science and engineering for the discovery and development of materials and chemical processes is essential to maintaining leadership in these foundational fields that underpin energy technologies and industrial competitiveness. Capitalizing on the opportunities presented by simulation-based engineering and science in materials and chemistry will require an integration of experimental capabilities with theoretical and computational modeling; the development of a robust and sustainable infrastructure to support the development and deployment of advanced computational models; and the assembly of a community of scientists and engineers to implement this integration and infrastructure. This community must extend to industry, where incorporating predictive materials science and chemistry into design tools can accelerate the product development cycle and drive economic competitiveness.
The confluence of new theories, new materials synthesis capabilities, and new computer platforms has created an unprecedented opportunity to implement a "materials-by-design" paradigm with wide-ranging benefits in technological innovation and scientific discovery. The Workshop on Computational Materials Science and Chemistry for Innovation was convened in Bethesda, Maryland, on July 26-27, 2010. Sponsored by the Department of Energy (DOE) Offices of Advanced Scientific Computing Research and Basic Energy Sciences, the workshop brought together 160 experts in materials science, chemistry, and computational science representing more than 65 universities, laboratories, and industries, and four agencies. The workshop examined seven foundational challenge areas in materials science and chemistry: materials for extreme conditions, self-assembly, light harvesting, chemical reactions, designer fluids, thin films and interfaces, and electronic structure. Each of these challenge areas is critical to the development of advanced energy systems, and each can be accelerated by the integrated application of predictive capability with theory and experiment. The workshop concluded that emerging capabilities in predictive modeling and simulation have the potential to revolutionize the development of new materials and chemical processes. Coupled with world-leading materials characterization and nanoscale science facilities, this predictive capability provides the foundation for an innovation ecosystem that can accelerate the discovery, development, and deployment of new technologies, including advanced energy systems. Delivering on the promise of this innovation ecosystem requires the following: (1) integration of synthesis, processing, characterization, theory, and simulation and modeling; many of the newly established Energy Frontier Research Centers and Energy Hubs are exploiting this integration. (2) Achieving and strengthening predictive capability in the seven foundational challenge areas described in this report, each critical to the development of advanced energy technologies. (3) Developing validated computational approaches that span vast differences in time and length scales, a fundamental computational challenge that crosscuts all of the foundational challenge areas; similarly challenging is the coupling of analytical data from the multiple instruments and techniques required to link these length and time scales. (4) Experimental validation and quantification of uncertainty in simulation and modeling; uncertainty quantification becomes increasingly challenging as simulations become more complex. (5) Robust and sustainable computational infrastructure, including software and applications; for modeling and simulation, software equals infrastructure, and because software is the critical infrastructure that translates huge arrays of experimental data into useful scientific understanding, an integrated approach for managing it is essential. (6) Efficient transfer and incorporation of simulation-based engineering and science in industry; strategies are needed for bridging the gap between research and industrial applications and for widespread industry adoption of integrated computational materials engineering.
Employing Replay Connectors for SIEM Operator Education
2013-09-01
...vast distances is now quicker and easier with the advancement in mobile computing devices and more ubiquitous connectivity and bandwidth. As a result... breakdown of the Critical Information Infrastructure (CII) is one of the core risks facing the international economy. The World Economic Forum...
NASA Astrophysics Data System (ADS)
Poat, M. D.; Lauret, J.; Betts, W.
2015-12-01
The STAR online computing environment is an intensive, ever-growing system used for real-time data collection and analysis. Composed of heterogeneous, sometimes custom-tuned groups of machines, the computing infrastructure was previously managed through manual configuration and monitored inconsistently by a combination of tools. This situation led to configuration inconsistency and an overload of repetitive tasks, along with lackluster communication between personnel and machines. Globally securing this heterogeneous cyberinfrastructure was tedious at best, and an agile, policy-driven system ensuring consistency was pursued. Three configuration management tools, Chef, Puppet, and CFEngine, have been compared in reliability, versatility and performance, along with a comparison of the infrastructure monitoring tools Nagios and Icinga. STAR has selected the CFEngine configuration management tool and the Icinga infrastructure monitoring system, leading to a versatile and sustainable solution. By leveraging these two tools STAR can now swiftly upgrade and modify the environment to its needs with ease, as well as promptly react to cyber-security requests. By creating a sustainable long-term monitoring solution, the detection of failures was reduced from days to minutes, allowing rapid action before issues become dire problems that could potentially cause loss of precious experimental data or uptime.
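CFEngine's appeal for a setup like this is its convergent model: every agent run compares a host's actual state against a declared desired state and repairs only the drift, so runs are safe to repeat. Below is a minimal Python sketch of that idea with an invented policy; real CFEngine policies are written in its own declarative language, not Python.

"""Toy illustration of convergent, policy-driven configuration management
in the CFEngine style: declare desired state, detect drift, repair it.
The policy below (path, mode, content) is invented for illustration.
"""
import os

POLICY = {  # desired state: path -> (octal mode, required content)
    "/tmp/demo/ntp.conf": (0o644, "server pool.ntp.org\n"),
}

def current_content(path):
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None

def converge(policy):
    for path, (mode, content) in policy.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if current_content(path) != content:       # repair drifted content
            with open(path, "w") as f:
                f.write(content)
            print(f"repaired content: {path}")
        if os.stat(path).st_mode & 0o777 != mode:  # repair drifted permissions
            os.chmod(path, mode)
            print(f"repaired mode: {path}")

if __name__ == "__main__":
    converge(POLICY)  # idempotent: a no-op once the host has converged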
The Experiment Factory: Standardizing Behavioral Experiments.
Sochat, Vanessa V; Eisenberg, Ian W; Enkavi, A Zeynep; Li, Jamie; Bissett, Patrick G; Poldrack, Russell A
2016-01-01
The administration of behavioral and experimental paradigms for psychology research is hindered by lack of a coordinated effort to develop and deploy standardized paradigms. While several frameworks (Mason and Suri, 2011; McDonnell et al., 2012; de Leeuw, 2015; Lange et al., 2015) have provided infrastructure and methods for individual research groups to develop paradigms, missing is a coordinated effort to develop paradigms linked with a system to easily deploy them. This disorganization leads to redundancy in development, divergent implementations of conceptually identical tasks, disorganized and error-prone code lacking documentation, and difficulty in replication. The ongoing reproducibility crisis in psychology and neuroscience research (Baker, 2015; Open Science Collaboration, 2015) highlights the urgency of this challenge: reproducible research in behavioral psychology is conditional on deployment of equivalent experiments. A large, accessible repository of experiments for researchers to develop collaboratively is most efficiently accomplished through an open source framework. Here we present the Experiment Factory, an open source framework for the development and deployment of web-based experiments. The modular infrastructure includes experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. We release this infrastructure with a deployment (http://www.expfactory.org) that researchers are currently using to run a set of over 80 standardized web-based experiments on Amazon Mechanical Turk. By providing open source tools for both deployment and development, this novel infrastructure holds promise to bring reproducibility to the administration of experiments, and accelerate scientific progress by providing a shared community resource of psychological paradigms.
Rebecca L. Flitcroft; Dana C. Dedrick; Courtland L. Smith; Cynthia A. Thieman; John P. Bolte
2009-01-01
Ecological problem solving requires a flexible social infrastructure that can incorporate scientific insights and adapt to changing conditions. As applied to watershed management, social infrastructure includes mechanisms to design, carry out, evaluate, and modify plans for resource protection or restoration. Efforts to apply the best science will not bring anticipated...
Structural health monitoring of civil infrastructure.
Brownjohn, J M W
2007-02-15
Structural health monitoring (SHM) is a term increasingly used in the last decade to describe a range of systems implemented on full-scale civil infrastructures and whose purposes are to assist and inform operators about continued 'fitness for purpose' of structures under gradual or sudden changes to their state, to learn about either or both of the load and response mechanisms. Arguably, various forms of SHM have been employed in civil infrastructure for at least half a century, but it is only in the last decade or two that computer-based systems are being designed for the purpose of assisting owners/operators of ageing infrastructure with timely information for their continued safe and economic operation. This paper describes the motivations for and recent history of SHM applications to various forms of civil infrastructure and provides case studies on specific types of structure. It ends with a discussion of the present state-of-the-art and future developments in terms of instrumentation, data acquisition, communication systems and data mining and presentation procedures for diagnosis of infrastructural 'health'.
Remote third shift EAST operation: a new paradigm
NASA Astrophysics Data System (ADS)
Schissel, D. P.; Coviello, E.; Eidietis, N.; Flanagan, S.; Garcia, F.; Humphreys, D.; Kostuk, M.; Lanctot, M.; Lee, X.; Margo, M.; Miller, D.; Parker, C.; Penaflor, B.; Qian, J. P.; Sun, X.; Tan, H.; Walker, M.; Xiao, B.; Yuan, Q.
2017-05-01
General Atomics' (GA) scientists in the United States remotely conducted experimental operation of the Experimental Advanced Superconducting Tokamak (EAST) in China during its third shift. Scientists led these experiments from a dedicated remote control room that utilized a novel computer science hardware and software infrastructure to allow data movement, visualization, and communication on the time scale of EAST's pulse cycle. This Fusion Science Collaboration Zone infrastructure allows the movement of large amounts of data between continents on a short time scale, with a 300-fold increase in data transfer rate over that available using the traditional transmission protocol. Real-time data from control systems are moved almost instantaneously. An event system tied to the EAST pulse cycle allows automatic initiation of data transfers, resulting in bulk EAST data being transferred to GA within minutes. The EAST data at GA are served via MDSplus to approved US collaborators, preventing multiple US clients from requesting data directly from EAST and competing for the long-haul network's bandwidth. At present there are 37 approved scientists from 8 US research institutions.
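The pattern described, an event tied to the pulse cycle that automatically kicks off a bulk transfer, can be sketched as a producer/consumer queue. The event source and transfer command below are stand-ins: the real system is driven by EAST's event system and an accelerated long-haul transfer tool, and the paths, host name and shot number are illustrative.

"""Sketch of pulse-cycle-triggered bulk data transfer: a 'shot complete'
event enqueues the shot, and a worker ships its files as they arrive.
Paths, host names and the transfer command are illustrative stand-ins.
"""
import queue
import threading

shots = queue.Queue()

def on_shot_complete(shot_number):
    """Callback that the pulse-cycle event system would invoke."""
    shots.put(shot_number)

def transfer_worker():
    while True:
        shot = shots.get()
        # Stand-in for the accelerated long-haul mover described in the paper
        print(f"transferring /east/shots/{shot}/ -> ga-mirror:/east/shots/{shot}/")
        shots.task_done()

threading.Thread(target=transfer_worker, daemon=True).start()
on_shot_complete(71320)  # fired automatically per pulse in the real system
shots.join()             # block until queued transfers complete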
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, E Wes; Brugger, Eric
Supercomputing centers are unique resources that aim to enable scientific knowledge discovery by employing large computational resources - the 'Big Iron.' Design, acquisition, installation, and management of the Big Iron are carefully planned and monitored. Because these Big Iron systems produce a tsunami of data, it's natural to colocate the visualization and analysis infrastructure. This infrastructure consists of hardware (Little Iron) and staff (Skinny Guys). Our collective experience suggests that design, acquisition, installation, and management of the Little Iron and Skinny Guys doesn't receive the same level of treatment as that of the Big Iron. This article explores the following questions about the Little Iron: How should we size the Little Iron to adequately support visualization and analysis of data coming off the Big Iron? What sort of capabilities must it have? Related questions concern the size of visualization support staff: How big should a visualization program be - that is, how many Skinny Guys should it have? What should the staff do? How much of the visualization should be provided as a support service, and how much should applications scientists be expected to do on their own?
Interoperable PKI Data Distribution in Computational Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pala, Massimiliano; Cholia, Shreyas; Rea, Scott A.
One of the most successful working examples of virtual organizations, computational grids need authentication mechanisms that inter-operate across domain boundaries. Public Key Infrastructures (PKIs) provide sufficient flexibility to allow resource managers to securely grant access to their systems in such distributed environments. However, as PKIs grow and services are added to enhance both security and usability, users and applications must struggle to discover available resources, particularly when the Certification Authority (CA) is alien to the relying party. This article presents how to overcome these limitations of the current grid authentication model by integrating the PKI Resource Query Protocol (PRQP) into the Grid Security Infrastructure (GSI).
Nickerson, David; Atalag, Koray; de Bono, Bernard; Geiger, Jörg; Goble, Carole; Hollmann, Susanne; Lonien, Joachim; Müller, Wolfgang; Regierer, Babette; Stanford, Natalie J; Golebiewski, Martin; Hunter, Peter
2016-04-06
Reconstructing and understanding the Human Physiome virtually is a complex mathematical problem, and a highly demanding computational challenge. Mathematical models spanning from the molecular level through to whole populations of individuals must be integrated, then personalized. This requires interoperability with multiple disparate and geographically separated data sources, and myriad computational software tools. Extracting and producing knowledge from such sources, even when the databases and software are readily available, is a challenging task. Despite the difficulties, researchers must frequently perform these tasks so that available knowledge can be continually integrated into the common framework required to realize the Human Physiome. Software and infrastructures that support the communities that generate these, together with their underlying standards to format, describe and interlink the corresponding data and computer models, are pivotal to the Human Physiome being realized. They provide the foundations for integrating, exchanging and re-using data and models efficiently, and correctly, while also supporting the dissemination of growing knowledge in these forms. In this paper, we explore the standards, software tooling, repositories and infrastructures that support this work, and detail what makes them vital to realizing the Human Physiome.
The TENCompetence Infrastructure: A Learning Network Implementation
NASA Astrophysics Data System (ADS)
Vogten, Hubert; Martens, Harrie; Lemmers, Ruud
The TENCompetence project developed a first release of a Learning Network infrastructure to support individuals, groups and organisations in professional competence development. This Learning Network infrastructure was released as open source to the community, thereby allowing users and organisations to use and contribute to this development as they see fit. The infrastructure consists of client applications providing the user experience and server components that provide the services to these clients. These services implement the domain model (Koper 2006) by provisioning the entities of the domain model (see also Sect. 18.4) and henceforth will be referenced as domain entity services.
Computational Infrastructure for Geodynamics (CIG)
NASA Astrophysics Data System (ADS)
Gurnis, M.; Kellogg, L. H.; Bloxham, J.; Hager, B. H.; Spiegelman, M.; Willett, S.; Wysession, M. E.; Aivazis, M.
2004-12-01
Solid earth geophysicists have a long tradition of writing scientific software to address a wide range of problems. In particular, computer simulations came into wide use in geophysics during the decade after the plate tectonic revolution. Solution schemes and numerical algorithms that developed in other areas of science, most notably engineering, fluid mechanics, and physics, were adapted with considerable success to geophysics. This software has largely been the product of individual efforts and although this approach has proven successful, its strength for solving problems of interest is now starting to show its limitations as we try to share codes and algorithms or when we want to recombine codes in novel ways to produce new science. With funding from the NSF, the US community has embarked on a Computational Infrastructure for Geodynamics (CIG) that will develop, support, and disseminate community-accessible software for the greater geodynamics community from model developers to end-users. The software is being developed for problems involving mantle and core dynamics, crustal and earthquake dynamics, magma migration, seismology, and other related topics. With a high level of community participation, CIG is leveraging state-of-the-art scientific computing into a suite of open-source tools and codes. The infrastructure that we are now starting to develop will consist of: (a) a coordinated effort to develop reusable, well-documented and open-source geodynamics software; (b) the basic building blocks - an infrastructure layer - of software by which state-of-the-art modeling codes can be quickly assembled; (c) extension of existing software frameworks to interlink multiple codes and data through a superstructure layer; (d) strategic partnerships with the larger world of computational science and geoinformatics; and (e) specialized training and workshops for both the geodynamics and broader Earth science communities. The CIG initiative has already started to leverage and develop long-term strategic partnerships with open source development efforts within the larger thrusts of scientific computing and geoinformatics. These strategic partnerships are essential as the frontier has moved into multi-scale and multi-physics problems in which many investigators now want to use simulation software for data interpretation, data assimilation, and hypothesis testing.
Kasam, Vinod; Salzemann, Jean; Botha, Marli; Dacosta, Ana; Degliesposti, Gianluca; Isea, Raul; Kim, Doman; Maass, Astrid; Kenyon, Colin; Rastelli, Giulio; Hofmann-Apitius, Martin; Breton, Vincent
2009-05-01
Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in recent years and the discovery of new drugs is more than ever needed. Out of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets to carry out rational drug discovery. Recent years have witnessed the emergence of grids, which are highly distributed computing infrastructures particularly well fitted for embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended up in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006, focussing on one well-known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase. In silico drug design, especially virtual high-throughput screening (vHTS), is a widely accepted technology in lead identification and lead optimization. This approach therefore builds upon the progress made in computational chemistry to achieve more accurate in silico docking and in information technology to design and operate large-scale grid infrastructures. On the computational side, a sustained infrastructure has been developed: docking at large scale, different strategies for result analysis, on-the-fly storage of the results in MySQL databases, and molecular dynamics refinement with MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising, and in vitro assays are underway for all the targets against which screening was performed. The current paper describes this rational drug discovery activity at large scale, in particular molecular docking using the FlexX software on computational grids to find hits against three different targets implicated in malaria: PfGST, PfDHFR, and PvDHFR (wild-type and mutant forms). This grid-enabled virtual screening approach is proposed to produce focused compound libraries for other biological targets relevant to fighting the infectious diseases of the developing world.
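The screening pattern described here is embarrassingly parallel: independent (target, ligand) docking tasks are fanned out to many nodes, with scores stored in a database as they arrive. Below is a minimal sketch in which multiprocessing stands in for grid nodes, sqlite3 stands in for the MySQL store, and a stub replaces the actual FlexX docking run; the compound IDs are invented.

"""Sketch of grid-style virtual screening: fan independent docking tasks
out to workers and store scores on the fly. multiprocessing stands in
for grid nodes, sqlite3 for MySQL, and dock() is a stub for FlexX.
"""
import random
import sqlite3
from multiprocessing import Pool

TARGETS = ["PfGST", "PfDHFR", "PvDHFR"]          # targets from the paper
LIGANDS = [f"CMPD{i:06d}" for i in range(100)]   # hypothetical compound IDs

def dock(task):
    """Stub docking score; a grid job would run FlexX here."""
    target, ligand = task
    return target, ligand, round(random.uniform(-40.0, -5.0), 2)

if __name__ == "__main__":
    db = sqlite3.connect("screening.db")
    db.execute("CREATE TABLE IF NOT EXISTS hits"
               " (target TEXT, ligand TEXT, score REAL)")
    tasks = [(t, l) for t in TARGETS for l in LIGANDS]
    with Pool() as pool:
        for row in pool.imap_unordered(dock, tasks):
            db.execute("INSERT INTO hits VALUES (?, ?, ?)", row)  # store on the fly
    db.commit()
    # Most negative scores first: these candidates would go on to MD
    # refinement and MM-PBSA/MM-GBSA rescoring
    for row in db.execute("SELECT * FROM hits ORDER BY score LIMIT 5"):
        print(row)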
Modeling, Simulation and Analysis of Public Key Infrastructure
NASA Technical Reports Server (NTRS)
Liu, Yuan-Kwei; Tuey, Richard; Ma, Paul (Technical Monitor)
1998-01-01
Security is an essential part of network communication. The advances in cryptography have provided solutions to many of the network security requirements. Public Key Infrastructure (PKI) is the foundation of cryptography applications. The main objective of this research is to design a model to simulate a reliable, scalable, manageable, and high-performance public key infrastructure. We build a model to simulate the NASA public key infrastructure using SimProcess and MATLAB software. The simulation spans from the top level all the way down to the computation needed for encryption, decryption, digital signatures, and a secure web server. The secure web server application could be utilized in wireless communications. The results of the simulation are analyzed and confirmed by using queueing theory.
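For the queueing-theory cross-check mentioned above, a single secure web server under Poisson arrivals and exponential service is the textbook M/M/1 model, whose utilization, mean occupancy and mean response time have closed forms. A small sketch follows, with illustrative rates (the actual rates are not given in the abstract).

"""M/M/1 closed-form check of the kind used to confirm a server simulation:
with Poisson arrivals at rate lam and exponential service at rate mu,
utilization, mean number in system and mean response time follow directly.
The rates below are illustrative, not taken from the paper.
"""
def mm1(lam, mu):
    assert lam < mu, "queue is unstable unless lambda < mu"
    rho = lam / mu       # server utilization
    L = rho / (1 - rho)  # mean number in system (Little's law: L = lam * W)
    W = 1 / (mu - lam)   # mean response time (waiting + service)
    return rho, L, W

# e.g. 80 signature requests/s against a server completing 100/s
rho, L, W = mm1(lam=80.0, mu=100.0)
print(f"utilization={rho:.0%}, in-system={L:.1f}, response={W * 1000:.0f} ms")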
An adaptive process-based cloud infrastructure for space situational awareness applications
NASA Astrophysics Data System (ADS)
Liu, Bingwei; Chen, Yu; Shen, Dan; Chen, Genshe; Pham, Khanh; Blasch, Erik; Rubin, Bruce
2014-06-01
Space situational awareness (SSA) and defense space control capabilities are top priorities for groups that own or operate man-made spacecraft. Also, with the growing amount of space debris, there is an increased demand for contextual understanding that necessitates the capability of collecting and processing a vast amount of sensor data. Cloud computing, which features scalable and flexible storage and computing services, has been recognized as an ideal candidate to meet the large-data contextual challenges of SSA. Cloud computing consists of physical service providers and middleware virtual machines together with infrastructure, platform, and software as a service (IaaS, PaaS, SaaS) models. However, the typical Virtual Machine (VM) abstraction is on a per-operating-system basis, which is too low-level and limits the flexibility of a mission application architecture. In response to this technical challenge, a novel adaptive process-based cloud infrastructure for SSA applications is proposed in this paper, and the design rationale and a prototype are examined in detail. The SSA Cloud (SSAC) conceptual capability will potentially support space situation monitoring and tracking, object identification, and threat assessment. Lastly, the benefits of a more granular and flexible allocation of cloud computing resources are illustrated for data processing and implementation considerations within a representative SSA system environment. We show that container-based virtualization performs better than hypervisor-based virtualization technology in an SSA scenario.
2010-04-01
...technology-centric operations such as computer network attack and computer network defense. This leads to the question of whether the US military is... information and infrastructure. For the purpose of military operations, CNO are divided into CNA, CND, and computer network exploitation (CNE), enabling... of a CNA if they take undesirable action,” and from a defensive stance in CND, “providing information about non-military threats to computers in...
EVER-EST: a virtual research environment for Earth Sciences
NASA Astrophysics Data System (ADS)
Marelli, Fulvio; Albani, Mirko; Glaves, Helen
2016-04-01
There is an increasing requirement for researchers to work collaboratively using common resources whilst being geographically dispersed. By creating a virtual research environment (VRE) using a service oriented architecture (SOA) tailored to the needs of Earth Science (ES) communities, the EVER-EST project will provide a range of both generic and domain-specific data management services to support a dynamic approach to collaborative research. EVER-EST will provide the means to overcome existing barriers to the sharing of Earth Science data and information, allowing research teams to discover, access, share and process heterogeneous data, algorithms, results and experiences within and across their communities, including those domains beyond Earth Science. Researchers will be able to seamlessly manage both the data involved in their computationally intensive disciplines and the scientific methods applied in their observations and modelling, which lead to the specific results that need to be attributable, validated and shared both within the community and more widely, e.g. in the form of scholarly communications. Central to the EVER-EST approach is the concept of the Research Object (RO), which provides a semantically rich mechanism to aggregate related resources about a scientific investigation so that they can be shared together using a single unique identifier. Although several e-laboratories are incorporating the research object concept in their infrastructure, the EVER-EST VRE will be the first infrastructure to leverage the concept of Research Objects and their application in observational rather than experimental disciplines. Development of the EVER-EST VRE will leverage the results of several previous projects which have produced state-of-the-art technologies for scientific data management and curation, as well as those which have developed models, techniques and tools for the preservation of scientific methods and their implementation in computational forms such as scientific workflows. The EVER-EST data processing infrastructure will be based on a Cloud Computing approach, in which new applications can be integrated using "virtual machines" that have their own specifications (disk size, processor speed, operating system etc.) and run on shared private (physical deployment over local hardware) or commercial Cloud infrastructures. The EVER-EST e-infrastructure will be validated by four virtual research communities (VRC) covering different multidisciplinary Earth Science domains including: ocean monitoring, natural hazards, land monitoring and risk management (volcanoes and seismicity). Each VRC will use the virtual research environment according to its own specific requirements for data, software, best practice and community engagement. This user-centric approach will allow an assessment to be made of the capability of the proposed solution to satisfy the heterogeneous needs of a variety of Earth Science communities for more effective collaboration, and higher efficiency and creativity in research. EVER-EST is funded by the European Commission's H2020 programme for three years starting in October 2015. The project is led by the European Space Agency (ESA) and involves some of the major European Earth Science data providers/users including NERC, DLR, INGV, CNR and SatCEN.
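A Research Object is, at heart, an aggregation of related resources plus annotations, shareable under one identifier. The sketch below shows that shape as a plain JSON manifest; the field names and URIs are loosely modeled on the RO idea and are hypothetical, not a normative EVER-EST serialization.

"""Illustrative shape of a Research Object: one identifier aggregating
data, method and results, with annotations. Field names and URIs are
hypothetical, not a normative RO-model or EVER-EST serialization.
"""
import json

research_object = {
    "@id": "https://example.org/ro/volcano-study-42",   # single shared identifier
    "aggregates": [
        {"uri": "data/ash-plume.nc",        "type": "Dataset"},
        {"uri": "workflows/retrieval.cwl",  "type": "Workflow"},
        {"uri": "results/plume-height.csv", "type": "Result"},
    ],
    "annotations": [
        {"about": "workflows/retrieval.cwl",
         "body": "Method validated against an archived eruption case study."},
    ],
}
print(json.dumps(research_object, indent=2))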
Hains, I M; Ward, R L; Pearson, S-A
2012-01-01
EviQ is a web-based oncology protocol system launched across Australia in 2005 (http://www.eviq.org.au). We evaluated eviQ use at the point-of-care and determined the factors impacting on its uptake and routine use in the first three years of operation. We conducted a suite of qualitative and quantitative studies with over 200 Australian oncology physicians, nurses and pharmacists working at treatment centres in diverse geographical locations. EviQ was part of routine care at many hospitals; however, the way in which it was used at the point-of-care varies according to clinician roles and hospital location. We identified a range of factors impacting on eviQ uptake and routine use. Infrastructure, such as availability of point-of-care computers, and formal policies endorsing eviQ are fundamental to increasing uptake. Furthermore, the level of clinical and computer experience of end-users, the attitudes and behaviour of clinicians, endorsement and promotion strategies, and level and type of eviQ education all need to be considered and managed to ensure that the system is being used to its full potential. Our findings show that the dissemination of web-based treatment protocols does not guarantee widespread use. Organisational, environmental and clinician-specific factors play a role in uptake and utilisation. The deployment of sufficient computer infrastructure, implementation of targeted training programmes and hospital policies and investment in marketing approaches are fundamental to uptake and continued use. This study highlights the value of ongoing monitoring and evaluation to ensure systems like eviQ achieve their primary purpose - reducing treatment variation and improving quality of care.
General consumer communication tools for improved image management and communication in medicine.
Rosset, Chantal; Rosset, Antoine; Ratib, Osman
2005-12-01
We elected to explore new technologies emerging on the general consumer market that can improve and facilitate image and data communication in medical and clinical environments. These new technologies developed for communication and storage of data can improve user convenience and facilitate the communication and transport of images and related data beyond the usual limits and restrictions of a traditional picture archiving and communication system (PACS) network. We specifically tested and implemented three new technologies provided on Apple computer platforms. (1) We adopted the iPod, an MP3 portable player with hard disk storage, to easily and quickly move large numbers of DICOM images. (2) We adopted iChat, a videoconference and instant-messaging software, to transmit DICOM images in real time to a distant computer for conferencing and teleradiology. (3) Finally, we developed a direct secure interface to use the iDisk service, a file-sharing service based on the WebDAV technology, to send and share DICOM files between distant computers. These three technologies were integrated in a new open-source image navigation and display software called OsiriX, allowing for manipulation and communication of multimodality and multidimensional DICOM image data sets. This software is freely available as an open-source project at http://homepage.mac.com/rossetantoine/OsiriX. Our experience showed that the implementation of these technologies allowed us to significantly enhance the existing PACS with valuable new features without any additional investment or the need for complex extensions of our infrastructure. The added features such as teleradiology, secure and convenient image and data communication, and the use of external data storage services open the gate to a much broader extension of our imaging infrastructure to the outside world.
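Because iDisk is plain WebDAV, sending a file amounts to an authenticated HTTP PUT. A minimal sketch with the requests library is shown below; the URL, credentials and file path are hypothetical, and a production tool, like the secure interface described above, would add further safeguards.

"""Minimal sketch of pushing a DICOM file to a WebDAV share such as iDisk:
a WebDAV upload is an ordinary authenticated HTTP PUT.
The URL, credentials and file name below are hypothetical.
"""
import requests

def put_dicom(local_path, base_url, auth):
    name = local_path.rsplit("/", 1)[-1]
    with open(local_path, "rb") as f:
        r = requests.put(f"{base_url}/{name}", data=f, auth=auth,
                         headers={"Content-Type": "application/dicom"})
    r.raise_for_status()  # expect 201 Created or 204 No Content on success
    return r.status_code

if __name__ == "__main__":
    print(put_dicom("study/IM0001.dcm",
                    "https://idisk.example.com/Documents",
                    auth=("user", "secret")))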
NASA Astrophysics Data System (ADS)
Wyborn, L. A.; Fraser, R.; Evans, B. J. K.; Friedrich, C.; Klump, J. F.; Lescinsky, D. T.
2017-12-01
Virtual Research Environments (VREs) are now part of academic infrastructures. Online research workflows can be orchestrated whereby data can be accessed from multiple external repositories, with processing taking place on public or private clouds and centralised supercomputers, using a mixture of user codes and well-used community software and libraries. VREs enable distributed members of research teams to actively work together to share data, models, tools, software, workflows, best practices, infrastructures, etc. These environments and their components are increasingly able to support the needs of undergraduate teaching. External to the research sector, they can also be reused by citizen scientists, and be repurposed for industry users to help accelerate the diffusion and hence enable the translation of research innovations. The Virtual Geophysics Laboratory (VGL) in Australia was started in 2012, built through a collaboration between CSIRO, the National Computational Infrastructure (NCI) and Geoscience Australia, with support funding from the Australian Government Department of Education. VGL comprises three main modules that provide an interface to enable users first to select their required data, then to choose a tool to process that data, and finally to access compute infrastructure for execution. VGL was initially built to give a specific set of researchers in government agencies access to specific data sets and a limited number of tools. Over the years it has evolved into a multi-purpose Earth science platform with access to an increased variety of data (e.g., Natural Hazards, Geochemistry), a broader range of software packages, and an increasing diversity of compute infrastructures. This expansion has been possible because of the approach of loosely coupling data, tools and compute resources via interfaces that are built on international standards and accessed as network-enabled services wherever possible. Although VGL was originally built for researchers who were not fussy about general usability, an increasing emphasis on User Interfaces (UIs) and stability will lead to increased uptake in the education and industry sectors. Simultaneously, improvements are being added to facilitate access to data and tools by experienced researchers who want direct access to both data and flexible workflows.
An authentication infrastructure for today and tomorrow
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engert, D.E.
1996-06-01
The Open Software Foundation's Distributed Computing Environment (OSF/DCE) was originally designed to provide a secure environment for distributed applications. By combining it with Kerberos Version 5 from MIT, it can be extended to provide network security as well. This combination can be used to build both an inter- and intra-organizational infrastructure while providing single sign-on for the user with overall improved security. The ESnet community of the Department of Energy is building just such an infrastructure. ESnet has modified these systems to improve their interoperability, while encouraging the developers to incorporate these changes and work more closely together to continue to improve the interoperability. The success of this infrastructure depends on its flexibility to meet the needs of many applications and network security requirements. The open nature of Kerberos, combined with the vendor support of OSF/DCE, provides the infrastructure for today and tomorrow.
NASA Astrophysics Data System (ADS)
Kershaw, P.
2016-12-01
CEDA, the Centre for Environmental Data Analysis, hosts a range of services on behalf of NERC (Natural Environment Research Council) for the UK environmental sciences community and its work with international partners. It is host to four data centres covering the atmospheric science, earth observation, climate and space data domain areas. It holds this data on behalf of a number of different providers, each with their own data policies, which has required the development of a comprehensive system to manage access. With the advent of CMIP5, CEDA committed to be one of a number of centres to host the climate model outputs and make them available through the Earth System Grid Federation, a globally distributed software infrastructure developed for this purpose. From the outset, a means for restricting access to datasets was required, necessitating the development of a federated system for authentication and authorisation so that access to data could be managed across multiple providers around the world. From 2012, CEDA has seen a further evolution with the development of JASMIN, a multi-petabyte data analysis facility. Hosted alongside the CEDA archive, it provides a range of services for users including a batch compute cluster, group workspaces and a community cloud. This has required significant changes and enhancements to the access control system. In common with many other examples in the research community, the experiences above underline the difficulties of developing collaborative e-Research infrastructures. Drawing from these, some recurring themes emerge. Clear requirements need to be established at the outset, recognising that implementing strict access policies can incur additional development and administrative overhead. An appropriate balance is needed between the ease of access desired by end users and the metrics and monitoring required by resource providers. The major technical challenge is not with security technologies themselves but with their effective integration with the services and resources they must protect. Effective policy and governance structures are needed for ongoing operations. Federated identity infrastructures often exist only at the national level, making it difficult for international research collaborations to exploit them.
NASA Astrophysics Data System (ADS)
Huang, Qian
2014-09-01
Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate the physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up performance by utilizing multiple computer resources to process a computational task simultaneously, thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and supercomputers. Today, there is tremendous growth in cloud computing. Infrastructure as a Service (IaaS), the on-demand and pay-as-you-go model, creates a flexible and cost-effective means to access computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services at the IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application based on it has been developed. In this paper, an overall description of this SaaS application is presented. This contribution can promote cloud application development in computational mineral physics, and cross-disciplinary studies.
WPS mediation: An approach to process geospatial data on different computing backends
NASA Astrophysics Data System (ADS)
Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas
2012-10-01
The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities, and various problems emerge when trying to use them in data- and computing-intensive domains such as the environmental sciences. These problems are usually not, or only partially, solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept that mediates between different geospatial and Grid software packages and proposes an extension of the WPS specification through two optional parameters. The applicability of this approach is demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits and issues that need to be further investigated to improve performance.
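The NDVI used for the demonstration is a simple band ratio, NDVI = (NIR - Red) / (NIR + Red), with values in [-1, 1]. Below is a sketch of the raster arithmetic such a WPS process would wrap, using illustrative reflectance arrays in place of the rasters an Execute request would reference.

"""Sketch of the band arithmetic behind an NDVI WPS process:
NDVI = (NIR - red) / (NIR + red), valued in [-1, 1].
The reflectance arrays are illustrative stand-ins for real rasters.
"""
import numpy as np

def ndvi(nir, red):
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    out = np.full(nir.shape, np.nan)  # NaN where both bands carry no signal
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

if __name__ == "__main__":
    nir = np.array([[0.50, 0.40], [0.30, 0.00]])
    red = np.array([[0.10, 0.30], [0.30, 0.00]])
    print(ndvi(nir, red))  # dense vegetation ~0.67, bare soil ~0.0, no-data NaN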