Sample records for distributed computing infrastructure

  1. Characterizing Crowd Participation and Productivity of Foldit Through Web Scraping

    DTIC Science & Technology

    2016-03-01

    Berkeley Open Infrastructure for Network Computing CDF Cumulative Distribution Function CPU Central Processing Unit CSSG Crowdsourced Serious Game...computers at once can create a similar capacity. According to Anderson [6], principal investigator for the Berkeley Open Infrastructure for Network...extraterrestrial life. From this project, a software-based distributed computing platform called the Berkeley Open Infrastructure for Network Computing

  2. A Development of Lightweight Grid Interface

    NASA Astrophysics Data System (ADS)

    Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.

    2011-12-01

    In order to help a rapid development of Grid/Cloud aware applications, we have developed API to abstract the distributed computing infrastructures based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications to access distributed computing infrastructures, such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), which is a set of command line interfaces (CLI) and APIs, aims to offer simpler API to combine several SAGA interfaces with richer functionalities. These CLIs of the UGAPI offer typical functionalities required by end users for job management and file access to the different distributed computing infrastructures as well as local computing resources. We have also built a web interface for the particle therapy simulation and demonstrated the large scale calculation using the different infrastructures at the same time. In this paper, we would like to present how the web interface based on UGAPI and SAGA achieve more efficient utilization of computing resources over the different infrastructures with technical details and practical experiences.

  3. Infrastructures for Distributed Computing: the case of BESIII

    NASA Astrophysics Data System (ADS)

    Pellegrino, J.

    2018-05-01

    The BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed to investigate Tau-Charm physics. Now BESIII has been running for several years and gathered more than 1PB raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based up on DIRAC and it is in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites from BESIII Collaboration from all over the world joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantages of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover new computing tools and upcoming infrastructures will be addressed.

  4. New security infrastructure model for distributed computing systems

    NASA Astrophysics Data System (ADS)

    Dubenskaya, J.; Kryukov, A.; Demichev, A.; Prikhodko, N.

    2016-02-01

    At the paper we propose a new approach to setting up a user-friendly and yet secure authentication and authorization procedure in a distributed computing system. The security concept of the most heterogeneous distributed computing systems is based on the public key infrastructure along with proxy certificates which are used for rights delegation. In practice a contradiction between the limited lifetime of the proxy certificates and the unpredictable time of the request processing is a big issue for the end users of the system. We propose to use unlimited in time hashes which are individual for each request instead of proxy certificate. Our approach allows to avoid using of the proxy certificates. Thus the security infrastructure of distributed computing system becomes easier for development, support and use.

  5. Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions.

    PubMed

    Dinov, Ivo D; Siegrist, Kyle; Pearl, Dennis K; Kalinin, Alexandr; Christou, Nicolas

    2016-06-01

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome , which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols.

  6. Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions

    PubMed Central

    Dinov, Ivo D.; Siegrist, Kyle; Pearl, Dennis K.; Kalinin, Alexandr; Christou, Nicolas

    2015-01-01

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome, which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols. PMID:27158191

  7. Distributed Accounting on the Grid

    NASA Technical Reports Server (NTRS)

    Thigpen, William; Hacker, Thomas J.; McGinnis, Laura F.; Athey, Brian D.

    2001-01-01

    By the late 1990s, the Internet was adequately equipped to move vast amounts of data between HPC (High Performance Computing) systems, and efforts were initiated to link together the national infrastructure of high performance computational and data storage resources together into a general computational utility 'grid', analogous to the national electrical power grid infrastructure. The purpose of the Computational grid is to provide dependable, consistent, pervasive, and inexpensive access to computational resources for the computing community in the form of a computing utility. This paper presents a fully distributed view of Grid usage accounting and a methodology for allocating Grid computational resources for use on a Grid computing system.

  8. Integration of Cloud resources in the LHCb Distributed Computing

    NASA Astrophysics Data System (ADS)

    Úbeda García, Mario; Méndez Muñoz, Víctor; Stagni, Federico; Cabarrou, Baptiste; Rauschmayr, Nathalie; Charpentier, Philippe; Closier, Joel

    2014-06-01

    This contribution describes how Cloud resources have been integrated in the LHCb Distributed Computing. LHCb is using its specific Dirac extension (LHCbDirac) as an interware for its Distributed Computing. So far, it was seamlessly integrating Grid resources and Computer clusters. The cloud extension of DIRAC (VMDIRAC) allows the integration of Cloud computing infrastructures. It is able to interact with multiple types of infrastructures in commercial and institutional clouds, supported by multiple interfaces (Amazon EC2, OpenNebula, OpenStack and CloudStack) - instantiates, monitors and manages Virtual Machines running on this aggregation of Cloud resources. Moreover, specifications for institutional Cloud resources proposed by Worldwide LHC Computing Grid (WLCG), mainly by the High Energy Physics Unix Information Exchange (HEPiX) group, have been taken into account. Several initiatives and computing resource providers in the eScience environment have already deployed IaaS in production during 2013. Keeping this on mind, pros and cons of a cloud based infrasctructure have been studied in contrast with the current setup. As a result, this work addresses four different use cases which represent a major improvement on several levels of our infrastructure. We describe the solution implemented by LHCb for the contextualisation of the VMs based on the idea of Cloud Site. We report on operational experience of using in production several institutional Cloud resources that are thus becoming integral part of the LHCb Distributed Computing resources. Furthermore, we describe as well the gradual migration of our Service Infrastructure towards a fully distributed architecture following the Service as a Service (SaaS) model.

  9. Dynamic Collaboration Infrastructure for Hydrologic Science

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.

    2016-12-01

    Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources" which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure available that can be accessed from environments like HydroShare is increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast number of data and computing infrastructure without needing to correspondingly learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure to enable the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure." In this presentation we discuss the results of this proof-of-concept prototype which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.

  10. Scientific Services on the Cloud

    NASA Astrophysics Data System (ADS)

    Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong

    Scientific Computing was one of the first every applications for parallel and distributed computation. To this date, scientific applications remain some of the most compute intensive, and have inspired creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administrated by a third party. Software abstraction and virtualization provide reliability, and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part of the scientific computing initiative.

  11. Consolidation and development roadmap of the EMI middleware

    NASA Astrophysics Data System (ADS)

    Kónya, B.; Aiftimiei, C.; Cecchi, M.; Field, L.; Fuhrmann, P.; Nilsen, J. K.; White, J.

    2012-12-01

    Scientific research communities have benefited recently from the increasing availability of computing and data infrastructures with unprecedented capabilities for large scale distributed initiatives. These infrastructures are largely defined and enabled by the middleware they deploy. One of the major issues in the current usage of research infrastructures is the need to use similar but often incompatible middleware solutions. The European Middleware Initiative (EMI) is a collaboration of the major European middleware providers ARC, dCache, gLite and UNICORE. EMI aims to: deliver a consolidated set of middleware components for deployment in EGI, PRACE and other Distributed Computing Infrastructures; extend the interoperability between grids and other computing infrastructures; strengthen the reliability of the services; establish a sustainable model to maintain and evolve the middleware; fulfil the requirements of the user communities. This paper presents the consolidation and development objectives of the EMI software stack covering the last two years. The EMI development roadmap is introduced along the four technical areas of compute, data, security and infrastructure. The compute area plan focuses on consolidation of standards and agreements through a unified interface for job submission and management, a common format for accounting, the wide adoption of GLUE schema version 2.0 and the provision of a common framework for the execution of parallel jobs. The security area is working towards a unified security model and lowering the barriers to Grid usage by allowing users to gain access with their own credentials. The data area is focusing on implementing standards to ensure interoperability with other grids and industry components and to reuse already existing clients in operating systems and open source distributions. One of the highlights of the infrastructure area is the consolidation of the information system services via the creation of a common information backbone.

  12. The Czech National Grid Infrastructure

    NASA Astrophysics Data System (ADS)

    Chudoba, J.; Křenková, I.; Mulač, M.; Ruda, M.; Sitera, J.

    2017-10-01

    The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET as the Czech National Research and Education Network (NREN) provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage resources owned by different organizations are connected by fast enough network to provide transparent access to all resources. We describe in more detail the computing infrastructure, which is based on several different technologies and covers grid, cloud and map-reduce environment. While the largest part of CPUs is still accessible via distributed torque servers, providing environment for long batch jobs, part of infrastructure is available via standard EGI tools in EGI, subset of NGI resources is provided into EGI FedCloud environment with cloud interface and there is also Hadoop cluster provided by the same e-infrastructure.A broad spectrum of computing servers is offered; users can choose from standard 2 CPU servers to large SMP machines with up to 6 TB of RAM or servers with GPU cards. Different groups have different priorities on various resources, resource owners can even have an exclusive access. The software is distributed via AFS. Storage servers offering up to tens of terabytes of disk space to individual users are connected via NFS4 on top of GPFS and access to long term HSM storage with peta-byte capacity is also provided. Overview of available resources and recent statistics of usage will be given.

  13. A Grid Infrastructure for Supporting Space-based Science Operations

    NASA Technical Reports Server (NTRS)

    Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)

    2002-01-01

    Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.

  14. Pilots 2.0: DIRAC pilots for all the skies

    NASA Astrophysics Data System (ADS)

    Stagni, F.; Tsaregorodtsev, A.; McNab, A.; Luzzi, C.

    2015-12-01

    In the last few years, new types of computing infrastructures, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are opportunistic. Most of these new infrastructures are based on virtualization techniques. Meanwhile, some concepts, such as distributed queues, lost appeal, while still supporting a vast amount of resources. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to hide the diversity of underlying resources has become essential. The DIRAC WMS is based on the concept of pilot jobs that was introduced back in 2004. A pilot is what creates the possibility to run jobs on a worker node. Within DIRAC, we developed a new generation of pilot jobs, that we dubbed Pilots 2.0. Pilots 2.0 are not tied to a specific infrastructure; rather they are generic, fully configurable and extendible pilots. A Pilot 2.0 can be sent, as a script to be run, or it can be fetched from a remote location. A pilot 2.0 can run on every computing resource, e.g.: on CREAM Computing elements, on DIRAC Computing elements, on Virtual Machines as part of the contextualization script, or IAAC resources, provided that these machines are properly configured, hiding all the details of the Worker Nodes (WNs) infrastructure. Pilots 2.0 can be generated server and client side. Pilots 2.0 are the “pilots to fly in all the skies”, aiming at easy use of computing power, in whatever form it is presented. Another aim is the unification and simplification of the monitoring infrastructure for all kinds of computing resources, by using pilots as a network of distributed sensors coordinated by a central resource monitoring system. Pilots 2.0 have been developed using the command pattern. VOs using DIRAC can tune pilots 2.0 as they need, and extend or replace each and every pilot command in an easy way. In this paper we describe how Pilots 2.0 work with distributed and heterogeneous resources providing the necessary abstraction to deal with different kind of computing resources.

  15. Online catalog access and distribution of remotely sensed information

    NASA Astrophysics Data System (ADS)

    Lutton, Stephen M.

    1997-09-01

    Remote sensing is providing voluminous data and value added information products. Electronic sensors, communication electronics, computer software, hardware, and network communications technology have matured to the point where a distributed infrastructure for remotely sensed information is a reality. The amount of remotely sensed data and information is making distributed infrastructure almost a necessity. This infrastructure provides data collection, archiving, cataloging, browsing, processing, and viewing for applications from scientific research to economic, legal, and national security decision making. The remote sensing field is entering a new exciting stage of commercial growth and expansion into the mainstream of government and business decision making. This paper overviews this new distributed infrastructure and then focuses on describing a software system for on-line catalog access and distribution of remotely sensed information.

  16. Cooperative high-performance storage in the accelerated strategic computing initiative

    NASA Technical Reports Server (NTRS)

    Gary, Mark; Howard, Barry; Louis, Steve; Minuzzo, Kim; Seager, Mark

    1996-01-01

    The use and acceptance of new high-performance, parallel computing platforms will be impeded by the absence of an infrastructure capable of supporting orders-of-magnitude improvement in hierarchical storage and high-speed I/O (Input/Output). The distribution of these high-performance platforms and supporting infrastructures across a wide-area network further compounds this problem. We describe an architectural design and phased implementation plan for a distributed, Cooperative Storage Environment (CSE) to achieve the necessary performance, user transparency, site autonomy, communication, and security features needed to support the Accelerated Strategic Computing Initiative (ASCI). ASCI is a Department of Energy (DOE) program attempting to apply terascale platforms and Problem-Solving Environments (PSEs) toward real-world computational modeling and simulation problems. The ASCI mission must be carried out through a unified, multilaboratory effort, and will require highly secure, efficient access to vast amounts of data. The CSE provides a logically simple, geographically distributed, storage infrastructure of semi-autonomous cooperating sites to meet the strategic ASCI PSE goal of highperformance data storage and access at the user desktop.

  17. The StratusLab cloud distribution: Use-cases and support for scientific applications

    NASA Astrophysics Data System (ADS)

    Floros, E.

    2012-04-01

    The StratusLab project is integrating an open cloud software distribution that enables organizations to setup and provide their own private or public IaaS (Infrastructure as a Service) computing clouds. StratusLab distribution capitalizes on popular infrastructure virtualization solutions like KVM, the OpenNebula virtual machine manager, Claudia service manager and SlipStream deployment platform, which are further enhanced and expanded with additional components developed within the project. The StratusLab distribution covers the core aspects of a cloud IaaS architecture, namely Computing (life-cycle management of virtual machines), Storage, Appliance management and Networking. The resulting software stack provides a packaged turn-key solution for deploying cloud computing services. The cloud computing infrastructures deployed using StratusLab can support a wide range of scientific and business use cases. Grid computing has been the primary use case pursued by the project and for this reason the initial priority has been the support for the deployment and operation of fully virtualized production-level grid sites; a goal that has already been achieved by operating such a site as part of EGI's (European Grid Initiative) pan-european grid infrastructure. In this area the project is currently working to provide non-trivial capabilities like elastic and autonomic management of grid site resources. Although grid computing has been the motivating paradigm, StratusLab's cloud distribution can support a wider range of use cases. Towards this direction, we have developed and currently provide support for setting up general purpose computing solutions like Hadoop, MPI and Torque clusters. For what concerns scientific applications the project is collaborating closely with the Bioinformatics community in order to prepare VM appliances and deploy optimized services for bioinformatics applications. In a similar manner additional scientific disciplines like Earth Science can take advantage of StratusLab cloud solutions. Interested users are welcomed to join StratusLab's user community by getting access to the reference cloud services deployed by the project and offered to the public.

  18. Grids, virtualization, and clouds at Fermilab

    DOE PAGES

    Timm, S.; Chadwick, K.; Garzoglio, G.; ...

    2014-06-11

    Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture andmore » the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud 6 GPCF) and core computing (Virtual Services). Lastly, this work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.« less

  19. Grids, virtualization, and clouds at Fermilab

    NASA Astrophysics Data System (ADS)

    Timm, S.; Chadwick, K.; Garzoglio, G.; Noh, S.

    2014-06-01

    Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud 6 GPCF) and core computing (Virtual Services). This work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.

  20. Sustaining a Community Computing Infrastructure for Online Teacher Professional Development: A Case Study of Designing Tapped In

    NASA Astrophysics Data System (ADS)

    Farooq, Umer; Schank, Patricia; Harris, Alexandra; Fusco, Judith; Schlager, Mark

    Community computing has recently grown to become a major research area in human-computer interaction. One of the objectives of community computing is to support computer-supported cooperative work among distributed collaborators working toward shared professional goals in online communities of practice. A core issue in designing and developing community computing infrastructures — the underlying sociotechnical layer that supports communitarian activities — is sustainability. Many community computing initiatives fail because the underlying infrastructure does not meet end user requirements; the community is unable to maintain a critical mass of users consistently over time; it generates insufficient social capital to support significant contributions by members of the community; or, as typically happens with funded initiatives, financial and human capital resource become unavailable to further maintain the infrastructure. On the basis of more than 9 years of design experience with Tapped In-an online community of practice for education professionals — we present a case study that discusses four design interventions that have sustained the Tapped In infrastructure and its community to date. These interventions represent broader design strategies for developing online environments for professional communities of practice.

  1. Managing a tier-2 computer centre with a private cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Bagnasco, Stefano; Berzano, Dario; Brunetti, Riccardo; Lusso, Stefano; Vallero, Sara

    2014-06-01

    In a typical scientific computing centre, several applications coexist and share a single physical infrastructure. An underlying Private Cloud infrastructure eases the management and maintenance of such heterogeneous applications (such as multipurpose or application-specific batch farms, Grid sites, interactive data analysis facilities and others), allowing dynamic allocation resources to any application. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques. Such infrastructures are being deployed in some large centres (see e.g. the CERN Agile Infrastructure project), but with several open-source tools reaching maturity this is becoming viable also for smaller sites. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, that hosts a full-fledged WLCG Tier-2 centre, an Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The private cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem and the OpenWRT Linux distribution (used for network virtualization); a future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and OCCI.

  2. Towards a Multi-Mission, Airborne Science Data System Environment

    NASA Astrophysics Data System (ADS)

    Crichton, D. J.; Hardman, S.; Law, E.; Freeborn, D.; Kay-Im, E.; Lau, G.; Oswald, J.

    2011-12-01

    NASA earth science instruments are increasingly relying on airborne missions. However, traditionally, there has been limited common infrastructure support available to principal investigators in the area of science data systems. As a result, each investigator has been required to develop their own computing infrastructures for the science data system. Typically there is little software reuse and many projects lack sufficient resources to provide a robust infrastructure to capture, process, distribute and archive the observations acquired from airborne flights. At NASA's Jet Propulsion Laboratory (JPL), we have been developing a multi-mission data system infrastructure for airborne instruments called the Airborne Cloud Computing Environment (ACCE). ACCE encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis in order to improve the quality, cost effectiveness, and capabilities to enable new scientific discovery and research in earth observation. This includes improving data system interoperability across each instrument. A principal characteristic is being able to provide an agile infrastructure that is architected to allow for a variety of configurations of the infrastructure from locally installed compute and storage services to provisioning those services via the "cloud" from cloud computer vendors such as Amazon.com. Investigators often have different needs that require a flexible configuration. The data system infrastructure is built on the Apache's Object Oriented Data Technology (OODT) suite of components which has been used for a number of spaceborne missions and provides a rich set of open source software components and services for constructing science processing and data management systems. In 2010, a partnership was formed between the ACCE team and the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to support the data processing and data management needs. A principal goal is to provide support for the Fourier Transform Spectrometer (FTS) instrument which will produce over 700,000 soundings over the life of their three-year mission. The cost to purchase and operate a cluster-based system in order to generate Level 2 Full Physics products from this data was prohibitive. Through an evaluation of cloud computing solutions, Amazon's Elastic Compute Cloud (EC2) was selected for the CARVE deployment. As the ACCE infrastructure is developed and extended to form an infrastructure for airborne missions, the experience of working with CARVE has provided a number of lessons learned and has proven to be important in reinforcing the unique aspects of airborne missions and the importance of the ACCE infrastructure in developing a cost effective, flexible multi-mission capability that leverages emerging capabilities in cloud computing, workflow management, and distributed computing.

  3. Dynamic VM Provisioning for TORQUE in a Cloud Environment

    NASA Astrophysics Data System (ADS)

    Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.

    2014-06-01

    Cloud computing, also known as an Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.

  4. CMS Distributed Computing Integration in the LHC sustained operations era

    NASA Astrophysics Data System (ADS)

    Grandi, C.; Bockelman, B.; Bonacorsi, D.; Fisk, I.; González Caballero, I.; Farina, F.; Hernández, J. M.; Padhi, S.; Sarkar, S.; Sciabà, A.; Sfiligoi, I.; Spiga, F.; Úbeda García, M.; Van Der Ster, D. C.; Zvada, M.

    2011-12-01

    After many years of preparation the CMS computing system has reached a situation where stability in operations limits the possibility to introduce innovative features. Nevertheless it is the same need of stability and smooth operations that requires the introduction of features that were considered not strategic in the previous phases. Examples are: adequate authorization to control and prioritize the access to storage and computing resources; improved monitoring to investigate problems and identify bottlenecks on the infrastructure; increased automation to reduce the manpower needed for operations; effective process to deploy in production new releases of the software tools. We present the work of the CMS Distributed Computing Integration Activity that is responsible for providing a liaison between the CMS distributed computing infrastructure and the software providers, both internal and external to CMS. In particular we describe the introduction of new middleware features during the last 18 months as well as the requirements to Grid and Cloud software developers for the future.

  5. Towards Portable Large-Scale Image Processing with High-Performance Computing.

    PubMed

    Huo, Yuankai; Blaber, Justin; Damon, Stephen M; Boyd, Brian D; Bao, Shunxing; Parvathaneni, Prasanna; Noguera, Camilo Bermudez; Chaganti, Shikha; Nath, Vishwesh; Greer, Jasmine M; Lyu, Ilwoo; French, William R; Newton, Allen T; Rogers, Baxter P; Landman, Bennett A

    2018-05-03

    High-throughput, large-scale medical image computing demands tight integration of high-performance computing (HPC) infrastructure for data storage, job distribution, and image processing. The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has constructed a large-scale image storage and processing infrastructure that is composed of (1) a large-scale image database using the eXtensible Neuroimaging Archive Toolkit (XNAT), (2) a content-aware job scheduling platform using the Distributed Automation for XNAT pipeline automation tool (DAX), and (3) a wide variety of encapsulated image processing pipelines called "spiders." The VUIIS CCI medical image data storage and processing infrastructure have housed and processed nearly half-million medical image volumes with Vanderbilt Advanced Computing Center for Research and Education (ACCRE), which is the HPC facility at the Vanderbilt University. The initial deployment was natively deployed (i.e., direct installations on a bare-metal server) within the ACCRE hardware and software environments, which lead to issues of portability and sustainability. First, it could be laborious to deploy the entire VUIIS CCI medical image data storage and processing infrastructure to another HPC center with varying hardware infrastructure, library availability, and software permission policies. Second, the spiders were not developed in an isolated manner, which has led to software dependency issues during system upgrades or remote software installation. To address such issues, herein, we describe recent innovations using containerization techniques with XNAT/DAX which are used to isolate the VUIIS CCI medical image data storage and processing infrastructure from the underlying hardware and software environments. The newly presented XNAT/DAX solution has the following new features: (1) multi-level portability from system level to the application level, (2) flexible and dynamic software development and expansion, and (3) scalable spider deployment compatible with HPC clusters and local workstations.

  6. A global distributed storage architecture

    NASA Technical Reports Server (NTRS)

    Lionikis, Nemo M.; Shields, Michael F.

    1996-01-01

    NSA architects and planners have come to realize that to gain the maximum benefit from, and keep pace with, emerging technologies, we must move to a radically different computing architecture. The compute complex of the future will be a distributed heterogeneous environment, where, to a much greater extent than today, network-based services are invoked to obtain resources. Among the rewards of implementing the services-based view are that it insulates the user from much of the complexity of our multi-platform, networked, computer and storage environment and hides its diverse underlying implementation details. In this paper, we will describe one of the fundamental services being built in our envisioned infrastructure; a global, distributed archive with near-real-time access characteristics. Our approach for adapting mass storage services to this infrastructure will become clear as the service is discussed.

  7. Monitoring performance of a highly distributed and complex computing infrastructure in LHCb

    NASA Astrophysics Data System (ADS)

    Mathe, Z.; Haen, C.; Stagni, F.

    2017-10-01

    In order to ensure an optimal performance of the LHCb Distributed Computing, based on LHCbDIRAC, it is necessary to be able to inspect the behavior over time of many components: firstly the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that are managed by this infrastructure. This consists of recording and then analyzing time series of a large number of observables, for which the usage of SQL relational databases is far from optimal. Therefore within DIRAC we have been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB) as a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure for which it collects data from all the components (agents, services, jobs) and allows creating reports through Kibana and a web user interface, which is based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the DIRAC general framework, as well as an overview of the advantages of the pipeline aggregation used for creating a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally we shall present the performances of that system.

  8. Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.

    ERIC Educational Resources Information Center

    Beltrametti, Monica; English, Will

    1994-01-01

    Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…

  9. AGIS: Integration of new technologies used in ATLAS Distributed Computing

    NASA Astrophysics Data System (ADS)

    Anisenkov, Alexey; Di Girolamo, Alessandro; Alandes Pradillo, Maria

    2017-10-01

    The variety of the ATLAS Distributed Computing infrastructure requires a central information system to define the topology of computing resources and to store different parameters and configuration data which are needed by various ATLAS software components. The ATLAS Grid Information System (AGIS) is the system designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by ATLAS Distributed Computing applications and services. Being an intermediate middleware system between clients and external information sources (like central BDII, GOCDB, MyOSG), AGIS defines the relations between experiment specific used resources and physical distributed computing capabilities. Being in production during LHC Runl AGIS became the central information system for Distributed Computing in ATLAS and it is continuously evolving to fulfil new user requests, enable enhanced operations and follow the extension of the ATLAS Computing model. The ATLAS Computing model and data structures used by Distributed Computing applications and services are continuously evolving and trend to fit newer requirements from ADC community. In this note, we describe the evolution and the recent developments of AGIS functionalities, related to integration of new technologies recently become widely used in ATLAS Computing, like flexible computing utilization of opportunistic Cloud and HPC resources, ObjectStore services integration for Distributed Data Management (Rucio) and ATLAS workload management (PanDA) systems, unified storage protocols declaration required for PandDA Pilot site movers and others. The improvements of information model and general updates are also shown, in particular we explain how other collaborations outside ATLAS could benefit the system as a computing resources information catalogue. AGIS is evolving towards a common information system, not coupled to a specific experiment.

  10. Distributed telemedicine for the National Information Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forslund, D.W.; Lee, Seong H.; Reverbel, F.C.

    1997-08-01

    TeleMed is an advanced system that provides a distributed multimedia electronic medical record available over a wide area network. It uses object-based computing, distributed data repositories, advanced graphical user interfaces, and visualization tools along with innovative concept extraction of image information for storing and accessing medical records developed in a separate project from 1994-5. In 1996, we began the transition to Java, extended the infrastructure, and worked to begin deploying TeleMed-like technologies throughout the nation. Other applications are mentioned.

  11. Flexible services for the support of research.

    PubMed

    Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John

    2013-01-28

    Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.

  12. The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications

    NASA Technical Reports Server (NTRS)

    Johnston, William E.

    2002-01-01

    With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale, computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of the integrated environments that can combine the resources needed to support the large- scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technologies infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources. such as databases, data catalogues, and archives, and; collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios. both large and small scale.

  13. Learning from Home: Implementing Technical Infrastructure for Distributed Learning via Home-PC within Telenor.

    ERIC Educational Resources Information Center

    Folkman, Kristian; Berge, Zane L.

    2002-01-01

    Presents results from a study conducted at Telenor, a Norwegian telecom operator, regarding the distribution of home personal computers to employees to encourage professional development outside working hours on an individual basis. Findings from a survey suggest that home computers can contribute positively to increasing employee's knowledge…

  14. Cloud Computing and Its Applications in GIS

    NASA Astrophysics Data System (ADS)

    Kang, Cao

    2011-12-01

    Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)

  15. Interoperable PKI Data Distribution in Computational Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pala, Massimiliano; Cholia, Shreyas; Rea, Scott A.

    One of the most successful working examples of virtual organizations, computational grids need authentication mechanisms that inter-operate across domain boundaries. Public Key Infrastructures(PKIs) provide sufficient flexibility to allow resource managers to securely grant access to their systems in such distributed environments. However, as PKIs grow and services are added to enhance both security and usability, users and applications must struggle to discover available resources-particularly when the Certification Authority (CA) is alien to the relying party. This article presents how to overcome these limitations of the current grid authentication model by integrating the PKI Resource Query Protocol (PRQP) into the Gridmore » Security Infrastructure (GSI).« less

  16. Advanced Optical Burst Switched Network Concepts

    NASA Astrophysics Data System (ADS)

    Nejabati, Reza; Aracil, Javier; Castoldi, Piero; de Leenheer, Marc; Simeonidou, Dimitra; Valcarenghi, Luca; Zervas, Georgios; Wu, Jian

    In recent years, as the bandwidth and the speed of networks have increased significantly, a new generation of network-based applications using the concept of distributed computing and collaborative services is emerging (e.g., Grid computing applications). The use of the available fiber and DWDM infrastructure for these applications is a logical choice offering huge amounts of cheap bandwidth and ensuring global reach of computing resources [230]. Currently, there is a great deal of interest in deploying optical circuit (wavelength) switched network infrastructure for distributed computing applications that require long-lived wavelength paths and address the specific needs of a small number of well-known users. Typical users are particle physicists who, due to their international collaborations and experiments, generate enormous amounts of data (Petabytes per year). These users require a network infrastructures that can support processing and analysis of large datasets through globally distributed computing resources [230]. However, providing wavelength granularity bandwidth services is not an efficient and scalable solution for applications and services that address a wider base of user communities with different traffic profiles and connectivity requirements. Examples of such applications may be: scientific collaboration in smaller scale (e.g., bioinformatics, environmental research), distributed virtual laboratories (e.g., remote instrumentation), e-health, national security and defense, personalized learning environments and digital libraries, evolving broadband user services (i.e., high resolution home video editing, real-time rendering, high definition interactive TV). As a specific example, in e-health services and in particular mammography applications due to the size and quantity of images produced by remote mammography, stringent network requirements are necessary. Initial calculations have shown that for 100 patients to be screened remotely, the network would have to securely transport 1.2 GB of data every 30 s [230]. According to the above explanation it is clear that these types of applications need a new network infrastructure and transport technology that makes large amounts of bandwidth at subwavelength granularity, storage, computation, and visualization resources potentially available to a wide user base for specified time durations. As these types of collaborative and network-based applications evolve addressing a wide range and large number of users, it is infeasible to build dedicated networks for each application type or category. Consequently, there should be an adaptive network infrastructure able to support all application types, each with their own access, network, and resource usage patterns. This infrastructure should offer flexible and intelligent network elements and control mechanism able to deploy new applications quickly and efficiently.

  17. Reconfiguring practice: the interdependence of experimental procedure and computing infrastructure in distributed earthquake engineering.

    PubMed

    De La Flor, Grace; Ojaghi, Mobin; Martínez, Ignacio Lamata; Jirotka, Marina; Williams, Martin S; Blakeborough, Anthony

    2010-09-13

    When transitioning local laboratory practices into distributed environments, the interdependent relationship between experimental procedure and the technologies used to execute experiments becomes highly visible and a focal point for system requirements. We present an analysis of ways in which this reciprocal relationship is reconfiguring laboratory practices in earthquake engineering as a new computing infrastructure is embedded within three laboratories in order to facilitate the execution of shared experiments across geographically distributed sites. The system has been developed as part of the UK Network for Earthquake Engineering Simulation e-Research project, which links together three earthquake engineering laboratories at the universities of Bristol, Cambridge and Oxford. We consider the ways in which researchers have successfully adapted their local laboratory practices through the modification of experimental procedure so that they may meet the challenges of coordinating distributed earthquake experiments.

  18. @neurIST: infrastructure for advanced disease management through integration of heterogeneous data, computing, and complex processing services.

    PubMed

    Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven

    2010-11-01

    The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.

  19. Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds

    NASA Astrophysics Data System (ADS)

    Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.

    In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.

  20. VERA 3.6 Release Notes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williamson, Richard L.; Kochunas, Brendan; Adams, Brian M.

    The Virtual Environment for Reactor Applications components included in this distribution include selected computational tools and supporting infrastructure that solve neutronics, thermal-hydraulics, fuel performance, and coupled neutronics-thermal hydraulics problems. The infrastructure components provide a simplified common user input capability and provide for the physics integration with data transfer and coupled-physics iterative solution algorithms.

  1. Globus Quick Start Guide. Globus Software Version 1.1

    NASA Technical Reports Server (NTRS)

    1999-01-01

    The Globus Project is a community effort, led by Argonne National Laboratory and the University of Southern California's Information Sciences Institute. Globus is developing the basic software infrastructure for computations that integrate geographically distributed computational and information resources.

  2. A service-based BLAST command tool supported by cloud infrastructures.

    PubMed

    Carrión, Abel; Blanquer, Ignacio; Hernández, Vicente

    2012-01-01

    Notwithstanding the benefits of distributed-computing infrastructures for empowering bioinformatics analysis tools with the needed computing and storage capability, the actual use of these infrastructures is still low. Learning curves and deployment difficulties have reduced the impact on the wide research community. This article presents a porting strategy of BLAST based on a multiplatform client and a service that provides the same interface as sequential BLAST, thus reducing learning curve and with minimal impact on their integration on existing workflows. The porting has been done using the execution and data access components from the EC project Venus-C and the Windows Azure infrastructure provided in this project. The results obtained demonstrate a low overhead on the global execution framework and reasonable speed-up and cost-efficiency with respect to a sequential version.

  3. Bringing the CMS distributed computing system into scalable operations

    NASA Astrophysics Data System (ADS)

    Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.

    2010-04-01

    Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.

  4. Complete distributed computing environment for a HEP experiment: experience with ARC-connected infrastructure for ATLAS

    NASA Astrophysics Data System (ADS)

    Read, A.; Taga, A.; O-Saada, F.; Pajchel, K.; Samset, B. H.; Cameron, D.

    2008-07-01

    Computing and storage resources connected by the Nordugrid ARC middleware in the Nordic countries, Switzerland and Slovenia are a part of the ATLAS computing Grid. This infrastructure is being commissioned with the ongoing ATLAS Monte Carlo simulation production in preparation for the commencement of data taking in 2008. The unique non-intrusive architecture of ARC, its straightforward interplay with the ATLAS Production System via the Dulcinea executor, and its performance during the commissioning exercise is described. ARC support for flexible and powerful end-user analysis within the GANGA distributed analysis framework is also shown. Whereas the storage solution for this Grid was earlier based on a large, distributed collection of GridFTP-servers, the ATLAS computing design includes a structured SRM-based system with a limited number of storage endpoints. The characteristics, integration and performance of the old and new storage solutions are presented. Although the hardware resources in this Grid are quite modest, it has provided more than double the agreed contribution to the ATLAS production with an efficiency above 95% during long periods of stable operation.

  5. An authentication infrastructure for today and tomorrow

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Engert, D.E.

    1996-06-01

    The Open Software Foundation`s Distributed Computing Environment (OSF/DCE) was originally designed to provide a secure environment for distributed applications. By combining it with Kerberos Version 5 from MIT, it can be extended to provide network security as well. This combination can be used to build both an inter and intra organizational infrastructure while providing single sign-on for the user with overall improved security. The ESnet community of the Department of Energy is building just such an infrastructure. ESnet has modified these systems to improve their interoperability, while encouraging the developers to incorporate these changes and work more closely together tomore » continue to improve the interoperability. The success of this infrastructure depends on its flexibility to meet the needs of many applications and network security requirements. The open nature of Kerberos, combined with the vendor support of OSF/DCE, provides the infrastructure for today and tomorrow.« less

  6. The International Symposium on Grids and Clouds

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the decennium anniversary of the ISGC which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia Pacific region to a coherent community. With the continuous support and dedication from the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that has produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.

  7. Grid Computing at GSI for ALICE and FAIR - present and future

    NASA Astrophysics Data System (ADS)

    Schwarz, Kilian; Uhlig, Florian; Karabowicz, Radoslaw; Montiel-Gonzalez, Almudena; Zynovyev, Mykhaylo; Preuss, Carsten

    2012-12-01

    The future FAIR experiments CBM and PANDA have computing requirements that fall in a category that could currently not be satisfied by one single computing centre. One needs a larger, distributed computing infrastructure to cope with the amount of data to be simulated and analysed. Since 2002, GSI operates a tier2 center for ALICE@CERN. The central component of the GSI computing facility and hence the core of the ALICE tier2 centre is a LSF/SGE batch farm, currently split into three subclusters with a total of 15000 CPU cores shared by the participating experiments, and accessible both locally and soon also completely via Grid. In terms of data storage, a 5.5 PB Lustre file system, directly accessible from all worker nodes is maintained, as well as a 300 TB xrootd-based Grid storage element. Based on this existing expertise, and utilising ALICE's middleware ‘AliEn’, the Grid infrastructure for PANDA and CBM is being built. Besides a tier0 centre at GSI, the computing Grids of the two FAIR collaborations encompass now more than 17 sites in 11 countries and are constantly expanding. The operation of the distributed FAIR computing infrastructure benefits significantly from the experience gained with the ALICE tier2 centre. A close collaboration between ALICE Offline and FAIR provides mutual advantages. The employment of a common Grid middleware as well as compatible simulation and analysis software frameworks ensure significant synergy effects.

  8. A Computational framework for telemedicine.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.

    1998-07-01

    Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. Highspeed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure formore » telemedicine. The Globus metacomputing toolkit can provide the necessary low level mechanisms to enable a large scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.« less

  9. NASA's Participation in the National Computational Grid

    NASA Technical Reports Server (NTRS)

    Feiereisen, William J.; Zornetzer, Steve F. (Technical Monitor)

    1998-01-01

    Over the last several years it has become evident that the character of NASA's supercomputing needs has changed. One of the major missions of the agency is to support the design and manufacture of aero- and space-vehicles with technologies that will significantly reduce their cost. It is becoming clear that improvements in the process of aerospace design and manufacturing will require a high performance information infrastructure that allows geographically dispersed teams to draw upon resources that are broader than traditional supercomputing. A computational grid draws together our information resources into one system. We can foresee the time when a Grid will allow engineers and scientists to use the tools of supercomputers, databases and on line experimental devices in a virtual environment to collaborate with distant colleagues. The concept of a computational grid has been spoken of for many years, but several events in recent times are conspiring to allow us to actually build one. In late 1997 the National Science Foundation initiated the Partnerships for Advanced Computational Infrastructure (PACI) which is built around the idea of distributed high performance computing. The Alliance lead, by the National Computational Science Alliance (NCSA), and the National Partnership for Advanced Computational Infrastructure (NPACI), lead by the San Diego Supercomputing Center, have been instrumental in drawing together the "Grid Community" to identify the technology bottlenecks and propose a research agenda to address them. During the same period NASA has begun to reformulate parts of two major high performance computing research programs to concentrate on distributed high performance computing and has banded together with the PACI centers to address the research agenda in common.

  10. Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

    ERIC Educational Resources Information Center

    Younge, Andrew J.

    2016-01-01

    With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…

  11. S3DB core: a framework for RDF generation and management in bioinformatics infrastructures

    PubMed Central

    2010-01-01

    Background Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way towards the personalization of Medicine. Results A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model build with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. Conclusions The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large scale multi-investigator research, clinical trials, and molecular epidemiology. PMID:20646315

  12. Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing.

    PubMed

    Hasson, Uri; Skipper, Jeremy I; Wilde, Michael J; Nusbaum, Howard C; Small, Steven L

    2008-01-15

    The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data.

  13. Improving the Analysis, Storage and Sharing of Neuroimaging Data using Relational Databases and Distributed Computing

    PubMed Central

    Hasson, Uri; Skipper, Jeremy I.; Wilde, Michael J.; Nusbaum, Howard C.; Small, Steven L.

    2007-01-01

    The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data. PMID:17964812

  14. Distributed Monitoring Infrastructure for Worldwide LHC Computing Grid

    NASA Astrophysics Data System (ADS)

    Andrade, P.; Babik, M.; Bhatt, K.; Chand, P.; Collados, D.; Duggal, V.; Fuente, P.; Hayashi, S.; Imamagic, E.; Joshi, P.; Kalmady, R.; Karnani, U.; Kumar, V.; Lapka, W.; Quick, R.; Tarragon, J.; Teige, S.; Triantafyllidis, C.

    2012-12-01

    The journey of a monitoring probe from its development phase to the moment its execution result is presented in an availability report is a complex process. It goes through multiple phases such as development, testing, integration, release, deployment, execution, data aggregation, computation, and reporting. Further, it involves people with different roles (developers, site managers, VO[1] managers, service managers, management), from different middleware providers (ARC[2], dCache[3], gLite[4], UNICORE[5] and VDT[6]), consortiums (WLCG[7], EMI[11], EGI[15], OSG[13]), and operational teams (GOC[16], OMB[8], OTAG[9], CSIRT[10]). The seamless harmonization of these distributed actors is in daily use for monitoring of the WLCG infrastructure. In this paper we describe the monitoring of the WLCG infrastructure from the operational perspective. We explain the complexity of the journey of a monitoring probe from its execution on a grid node to the visualization on the MyWLCG[27] portal where it is exposed to other clients. This monitoring workflow profits from the interoperability established between the SAM[19] and RSV[20] frameworks. We show how these two distributed structures are capable of uniting technologies and hiding the complexity around them, making them easy to be used by the community. Finally, the different supported deployment strategies, tailored not only for monitoring the entire infrastructure but also for monitoring sites and virtual organizations, are presented and the associated operational benefits highlighted.

  15. International Symposium on Grids and Clouds (ISGC) 2016

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2016 will be held at Academia Sinica in Taipei, Taiwan from 13-18 March 2016, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). The theme of ISGC 2016 focuses on“Ubiquitous e-infrastructures and Applications”. Contemporary research is impossible without a strong IT component - researchers rely on the existence of stable and widely available e-infrastructures and their higher level functions and properties. As a result of these expectations, e-Infrastructures are becoming ubiquitous, providing an environment that supports large scale collaborations that deal with global challenges as well as smaller and temporal research communities focusing on particular scientific problems. To support those diversified communities and their needs, the e-Infrastructures themselves are becoming more layered and multifaceted, supporting larger groups of applications. Following the call for the last year conference, ISGC 2016 continues its aim to bring together users and application developers with those responsible for the development and operation of multi-purpose ubiquitous e-Infrastructures. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities, Arts, and Social Sciences (HASS) Applications, Virtual Research Environment (including Middleware, tools, services, workflow, etc.), Data Management, Big Data, Networking & Security, Infrastructure & Operations, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC), etc.

  16. Infrastructure sensing.

    PubMed

    Soga, Kenichi; Schooling, Jennifer

    2016-08-06

    Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optics sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors.

  17. Infrastructure sensing

    PubMed Central

    Soga, Kenichi; Schooling, Jennifer

    2016-01-01

    Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optics sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors. PMID:27499845

  18. Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation

    NASA Astrophysics Data System (ADS)

    Anisenkov, A. V.

    2018-03-01

    In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on heterogeneous geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in the development of a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate the supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).

  19. INDIGO-DataCloud solutions for Earth Sciences

    NASA Astrophysics Data System (ADS)

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Fiore, Sandro; Monna, Stephen; Chen, Yin

    2017-04-01

    INDIGO-DataCloud (https://www.indigo-datacloud.eu/) is a European Commission funded project aiming to develop a data and computing platform targeting scientific communities, deployable on multiple hardware and provisioned over hybrid (private or public) e-infrastructures. The development of INDIGO solutions covers the different layers in cloud computing (IaaS, PaaS, SaaS), and provides tools to exploit resources like HPC or GPGPUs. INDIGO is oriented to support European Scientific research communities, that are well represented in the project. Twelve different Case Studies have been analyzed in detail from different fields: Biological & Medical sciences, Social sciences & Humanities, Environmental and Earth sciences and Physics & Astrophysics. INDIGO-DataCloud provides solutions to emerging challenges in Earth Science like: -Enabling an easy deployment of community services at different cloud sites. Many Earth Science research infrastructures often involve distributed observation stations across countries, and also have distributed data centers to support the corresponding data acquisition and curation. There is a need to easily deploy new data center services while the research infrastructure continuous spans. As an example: LifeWatch (ESFRI, Ecosystems and Biodiversity) uses INDIGO solutions to manage the deployment of services to perform complex hydrodynamics and water quality modelling over a Cloud Computing environment, predicting algae blooms, using the Docker technology: TOSCA requirement description, Docker repository, Orchestrator for deployment, AAI (AuthN, AuthZ) and OneData (Distributed Storage System). -Supporting Big Data Analysis. Nowadays, many Earth Science research communities produce large amounts of data and and are challenged by the difficulties of processing and analysing it. A climate models intercomparison data analysis case study for the European Network for Earth System Modelling (ENES) community has been setup, based on the Ophidia big data analysis framework and the Kepler workflow management system. Such services normally involve a large and distributed set of data and computing resources. In this regard, this case study exploits the INDIGO PaaS for a flexible and dynamic allocation of the resources at the infrastructural level. -Providing Distributed Data Storage Solutions. In order to allow scientific communities to perform heavy computation on huge datasets, INDIGO provides global data access solutions allowing researchers to access data in a distributed environment like fashion regardless of its location, and also to publish and share their research results with public or close communities. INDIGO solutions that support the access to distributed data storage (OneData) are being tested on EMSO infrastructure (Ocean Sciences and Geohazards) data. Another aspect of interest for the EMSO community is in efficient data processing by exploiting INDIGO services like PaaS Orchestrator. Further, for HPC exploitation, a new solution named Udocker has been implemented, enabling users to execute docker containers in supercomputers, without requiring administration privileges. This presentation will overview INDIGO solutions that are interesting and useful for Earth science communities and will show how they can be applied to other Case Studies.

  20. WPS mediation: An approach to process geospatial data on different computing backends

    NASA Astrophysics Data System (ADS)

    Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas

    2012-10-01

    The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities and various problems emerge when trying to use them in data and computing-intensive domains such as environmental sciences. These problems are usually not or only partially solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) related methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept by mediating different geospatial and Grid software packages, and by proposing an extension of WPS specification through two optional parameters. The applicability of this approach will be demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits, and issues that need to be further investigated to improve performances.

  1. A parallel-processing approach to computing for the geographic sciences; applications and systems enhancements

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Liu, Shu-Guang; Nichols, Erin; Haga, Jim; Maddox, Brian; Bilderback, Chris; Feller, Mark; Homer, George

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost, personal computers into parallel computing environments that provide a costeffective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting information science research into parallel computing systems and applications.

  2. Geospatial Data as a Service: Towards planetary scale real-time analytics

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.; Larraondo, P. R.; Antony, J.; Richards, C. J.

    2017-12-01

    The rapid growth of earth systems, environmental and geophysical datasets poses a challenge to both end-users and infrastructure providers. For infrastructure and data providers, tasks like managing, indexing and storing large collections of geospatial data needs to take into consideration the various use cases by which consumers will want to access and use the data. Considerable investment has been made by the Earth Science community to produce suitable real-time analytics platforms for geospatial data. There are currently different interfaces that have been defined to provide data services. Unfortunately, there is considerable difference on the standards, protocols or data models which have been designed to target specific communities or working groups. The Australian National University's National Computational Infrastructure (NCI) is used for a wide range of activities in the geospatial community. Earth observations, climate and weather forecasting are examples of these communities which generate large amounts of geospatial data. The NCI has been carrying out significant effort to develop a data and services model that enables the cross-disciplinary use of data. Recent developments in cloud and distributed computing provide a publicly accessible platform where new infrastructures can be built. One of the key components these technologies offer is the possibility of having "limitless" compute power next to where the data is stored. This model is rapidly transforming data delivery from centralised monolithic services towards ubiquitous distributed services that scale up and down adapting to fluctuations in the demand. NCI has developed GSKY, a scalable, distributed server which presents a new approach for geospatial data discovery and delivery based on OGC standards. We will present the architecture and motivating use-cases that drove GSKY's collaborative design, development and production deployment. We show our approach offers the community valuable exploratory analysis capabilities, for dealing with petabyte-scale geospatial data collections.

  3. e-Infrastructures for e-Sciences 2013 A CHAIN-REDS Workshop organised under the aegis of the European Commission

    NASA Astrophysics Data System (ADS)

    The CHAIN-REDS Project is organising a workshop on "e-Infrastructures for e-Sciences" focusing on Cloud Computing and Data Repositories under the aegis of the European Commission and in co-location with the International Conference on e-Science 2013 (IEEE2013) that will be held in Beijing, P.R. of China on October 17-22, 2013. The core objective of the CHAIN-REDS project is to promote, coordinate and support the effort of a critical mass of non-European e-Infrastructures for Research and Education to collaborate with Europe addressing interoperability and interoperation of Grids and other Distributed Computing Infrastructures (DCI). From this perspective, CHAIN-REDS will optimise the interoperation of European infrastructures with those present in 6 other regions of the world, both from a development and use point of view, and catering to different communities. Overall, CHAIN-REDS will provide input for future strategies and decision-making regarding collaboration with other regions on e-Infrastructure deployment and availability of related data; it will raise the visibility of e-Infrastructures towards intercontinental audiences, covering most of the world and will provide support to establish globally connected and interoperable infrastructures, in particular between the EU and the developing regions. Organised by IHEP, INFN and Sigma Orionis with the support of all project partners, this workshop will aim at: - Presenting the state of the art of Cloud computing in Europe and in China and discussing the opportunities offered by having interoperable and federated e-Infrastructures; - Exploring the existing initiatives of Data Infrastructures in Europe and China, and highlighting the Data Repositories of interest for the Virtual Research Communities in several domains such as Health, Agriculture, Climate, etc.

  4. Waggle: A Framework for Intelligent Attentive Sensing and Actuation

    NASA Astrophysics Data System (ADS)

    Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.

    2014-12-01

    Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both insitu and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports insitu computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.

  5. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  6. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  7. Next Generation Distributed Computing for Cancer Research

    PubMed Central

    Agarwal, Pankaj; Owzar, Kouros

    2014-01-01

    Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing. PMID:25983539

  8. Next generation distributed computing for cancer research.

    PubMed

    Agarwal, Pankaj; Owzar, Kouros

    2014-01-01

    Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing.

  9. NGScloud: RNA-seq analysis of non-model species using cloud computing.

    PubMed

    Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai

    2018-05-03

    RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon that permit the access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so its costs and times can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources, and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. unai.lopezdeheredia@upm.es.

  10. Network Computing Infrastructure to Share Tools and Data in Global Nuclear Energy Partnership

    NASA Astrophysics Data System (ADS)

    Kim, Guehee; Suzuki, Yoshio; Teshima, Naoya

    CCSE/JAEA (Center for Computational Science and e-Systems/Japan Atomic Energy Agency) integrated a prototype system of a network computing infrastructure for sharing tools and data to support the U.S. and Japan collaboration in GNEP (Global Nuclear Energy Partnership). We focused on three technical issues to apply our information process infrastructure, which are accessibility, security, and usability. In designing the prototype system, we integrated and improved both network and Web technologies. For the accessibility issue, we adopted SSL-VPN (Security Socket Layer-Virtual Private Network) technology for the access beyond firewalls. For the security issue, we developed an authentication gateway based on the PKI (Public Key Infrastructure) authentication mechanism to strengthen the security. Also, we set fine access control policy to shared tools and data and used shared key based encryption method to protect tools and data against leakage to third parties. For the usability issue, we chose Web browsers as user interface and developed Web application to provide functions to support sharing tools and data. By using WebDAV (Web-based Distributed Authoring and Versioning) function, users can manipulate shared tools and data through the Windows-like folder environment. We implemented the prototype system in Grid infrastructure for atomic energy research: AEGIS (Atomic Energy Grid Infrastructure) developed by CCSE/JAEA. The prototype system was applied for the trial use in the first period of GNEP.

  11. Parallel, distributed and GPU computing technologies in single-particle electron microscopy

    PubMed Central

    Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-01-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined. PMID:19564686

  12. Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

    PubMed

    Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-07-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.

  13. GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data

    NASA Astrophysics Data System (ADS)

    Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.

    2016-12-01

    Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.

  14. cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design

    PubMed Central

    Pan, Yuchao; Dong, Yuxi; Zhou, Jingtian; Hallen, Mark; Donald, Bruce R.; Xu, Wei

    2016-01-01

    Abstract Finding the global minimum energy conformation (GMEC) of a huge combinatorial search space is the key challenge in computational protein design (CPD) problems. Traditional algorithms lack a scalable and efficient distributed design scheme, preventing researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension to a widely used protein design software OSPREY, to allow the original design framework to scale to the commercial cloud infrastructures. We propose several novel designs to integrate both algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that have not been possible with previous approaches. PMID:27154509

  15. The open science grid

    NASA Astrophysics Data System (ADS)

    Pordes, Ruth; OSG Consortium; Petravick, Don; Kramer, Bill; Olson, Doug; Livny, Miron; Roy, Alain; Avery, Paul; Blackburn, Kent; Wenaus, Torre; Würthwein, Frank; Foster, Ian; Gardner, Rob; Wilde, Mike; Blatecky, Alan; McGee, John; Quick, Rob

    2007-07-01

    The Open Science Grid (OSG) provides a distributed facility where the Consortium members provide guaranteed and opportunistic access to shared computing and storage resources. OSG provides support for and evolution of the infrastructure through activities that cover operations, security, software, troubleshooting, addition of new capabilities, and support for existing and engagement with new communities. The OSG SciDAC-2 project provides specific activities to manage and evolve the distributed infrastructure and support it's use. The innovative aspects of the project are the maintenance and performance of a collaborative (shared & common) petascale national facility over tens of autonomous computing sites, for many hundreds of users, transferring terabytes of data a day, executing tens of thousands of jobs a day, and providing robust and usable resources for scientific groups of all types and sizes. More information can be found at the OSG web site: www.opensciencegrid.org.

  16. Knowledge Cultures and the Shaping of Work-Based Learning: The Case of Computer Engineering

    ERIC Educational Resources Information Center

    Nerland, Monika

    2008-01-01

    This paper examines how the knowledge culture of computer engineering--that is, the ways in which knowledge is produced, distributed, accumulated and collectively approached within this profession--serve to construct work-based learning in specific ways. Typically, the epistemic infrastructures take the form of information structures with a global…

  17. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    NASA Astrophysics Data System (ADS)

    Loring, B.; Karimabadi, H.; Rortershteyn, V.

    2015-10-01

    The surface line integral convolution(LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.

  18. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim

    2014-07-01

    The surface line integral convolution(LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not.more » We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.« less

  19. A Latency-Tolerant Partitioner for Distributed Computing on the Information Power Grid

    NASA Technical Reports Server (NTRS)

    Das, Sajal K.; Harvey, Daniel J.; Biwas, Rupak; Kwak, Dochan (Technical Monitor)

    2001-01-01

    NASA's Information Power Grid (IPG) is an infrastructure designed to harness the power of graphically distributed computers, databases, and human expertise, in order to solve large-scale realistic computational problems. This type of a meta-computing environment is necessary to present a unified virtual machine to application developers that hides the intricacies of a highly heterogeneous environment and yet maintains adequate security. In this paper, we present a novel partitioning scheme. called MinEX, that dynamically balances processor workloads while minimizing data movement and runtime communication, for applications that are executed in a parallel distributed fashion on the IPG. We also analyze the conditions that are required for the IPG to be an effective tool for such distributed computations. Our results show that MinEX is a viable load balancer provided the nodes of the IPG are connected by a high-speed asynchronous interconnection network.

  20. An EMSO data case study within the INDIGO-DC project

    NASA Astrophysics Data System (ADS)

    Monna, Stephen; Marcucci, Nicola M.; Marinaro, Giuditta; Fiore, Sandro; D'Anca, Alessandro; Antonacci, Marica; Beranzoli, Laura; Favali, Paolo

    2017-04-01

    We present our experience based on a case study within the INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation) project (www.indigo-datacloud.eu). The aim of INDIGO-DC is to develop a data and computing platform targeting scientific communities. Our case study is an example of activities performed by INGV using data from seafloor observatories that are nodes of the infrastructure EMSO (European Multidisciplinary Seafloor and water column Observatory)-ERIC (www.emso-eu.org). EMSO is composed of several deep-seafloor and water column observatories, deployed at key sites in the European waters, thus forming a widely distributed pan-European infrastructure. In our case study we consider data collected by the NEMO-SN1 observatory, one of the EMSO nodes used for geohazard monitoring, located in the Western Ionian Sea in proximity of Etna volcano. Starting from the case study, through an agile approach, we defined some requirements for INDIGO developers, and tested some of the proposed INDIGO solutions that are of interest for our research community. Given that EMSO is a distributed infrastructure, we are interested in INDIGO solutions that allow access to distributed data storage. Access should be both user-oriented and machine-oriented, and with the use of a common identity and access system. For this purpose, we have been testing: - ONEDATA (https://onedata.org), as global data management system. - INDIGO-IAM as Identity and Access Management system. Another aspect we are interested in is the efficient data processing, and we have focused on two types of INDIGO products: - Ophidia (http://ophidia.cmcc.it), a big data analytics framework for eScience for the analysis of multidimensional data. - A collection of INDIGO Services to run processes for scientific computing through the INDIGO Orchestrator.

  1. Open source system OpenVPN in a function of Virtual Private Network

    NASA Astrophysics Data System (ADS)

    Skendzic, A.; Kovacic, B.

    2017-05-01

    Using of Virtual Private Networks (VPN) can establish high security level in network communication. VPN technology enables high security networking using distributed or public network infrastructure. VPN uses different security and managing rules inside networks. It can be set up using different communication channels like Internet or separate ISP communication infrastructure. VPN private network makes security communication channel over public network between two endpoints (computers). OpenVPN is an open source software product under GNU General Public License (GPL) that can be used to establish VPN communication between two computers inside business local network over public communication infrastructure. It uses special security protocols and 256-bit Encryption and it is capable of traversing network address translators (NATs) and firewalls. It allows computers to authenticate each other using a pre-shared secret key, certificates or username and password. This work gives review of VPN technology with a special accent on OpenVPN. This paper will also give comparison and financial benefits of using open source VPN software in business environment.

  2. The Integration of CloudStack and OCCI/OpenNebula with DIRAC

    NASA Astrophysics Data System (ADS)

    Méndez Muñoz, Víctor; Fernández Albor, Víctor; Graciani Diaz, Ricardo; Casajús Ramo, Adriàn; Fernández Pena, Tomás; Merino Arévalo, Gonzalo; José Saborido Silva, Juan

    2012-12-01

    The increasing availability of Cloud resources is arising as a realistic alternative to the Grid as a paradigm for enabling scientific communities to access large distributed computing resources. The DIRAC framework for distributed computing is an easy way to efficiently access to resources from both systems. This paper explains the integration of DIRAC with two open-source Cloud Managers: OpenNebula (taking advantage of the OCCI standard) and CloudStack. These are computing tools to manage the complexity and heterogeneity of distributed data center infrastructures, allowing to create virtual clusters on demand, including public, private and hybrid clouds. This approach has required to develop an extension to the previous DIRAC Virtual Machine engine, which was developed for Amazon EC2, allowing the connection with these new cloud managers. In the OpenNebula case, the development has been based on the CernVM Virtual Software Appliance with appropriate contextualization, while in the case of CloudStack, the infrastructure has been kept more general, which permits other Virtual Machine sources and operating systems being used. In both cases, CernVM File System has been used to facilitate software distribution to the computing nodes. With the resulting infrastructure, the cloud resources are transparent to the users through a friendly interface, like the DIRAC Web Portal. The main purpose of this integration is to get a system that can manage cloud and grid resources at the same time. This particular feature pushes DIRAC to a new conceptual denomination as interware, integrating different middleware. Users from different communities do not need to care about the installation of the standard software that is available at the nodes, nor the operating system of the host machine which is transparent to the user. This paper presents an analysis of the overhead of the virtual layer, doing some tests to compare the proposed approach with the existing Grid solution. License Notice: Published under licence in Journal of Physics: Conference Series by IOP Publishing Ltd.

  3. Cloud@Home: A New Enhanced Computing Paradigm

    NASA Astrophysics Data System (ADS)

    Distefano, Salvatore; Cunsolo, Vincenzo D.; Puliafito, Antonio; Scarpa, Marco

    Cloud computing is a distributed computing paradigm that mixes aspects of Grid computing, ("… hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Foster, 2002)) Internet Computing ("…a computing platform geographically distributed across the Internet" (Milenkovic et al., 2003)), Utility computing ("a collection of technologies and business practices that enables computing to be delivered seamlessly and reliably across multiple computers, ... available as needed and billed according to usage, much like water and electricity are today" (Ross & Westerman, 2004)) Autonomic computing ("computing systems that can manage themselves given high-level objectives from administrators" (Kephart & Chess, 2003)), Edge computing ("… provides a generic template facility for any type of application to spread its execution across a dedicated grid, balancing the load …" Davis, Parikh, & Weihl, 2004) and Green computing (a new frontier of Ethical computing1 starting from the assumption that in next future energy costs will be related to the environment pollution).

  4. AstroCloud, a Cyber-Infrastructure for Astronomy Research: Cloud Computing Environments

    NASA Astrophysics Data System (ADS)

    Li, C.; Wang, J.; Cui, C.; He, B.; Fan, D.; Yang, Y.; Chen, J.; Zhang, H.; Yu, C.; Xiao, J.; Wang, C.; Cao, Z.; Fan, Y.; Hong, Z.; Li, S.; Mi, L.; Wan, W.; Wang, J.; Yin, S.

    2015-09-01

    AstroCloud is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on CloudStack, an open source software, we set up the cloud computing environment for AstroCloud Project. It consists of five distributed nodes across the mainland of China. Users can use and analysis data in this cloud computing environment. Based on GlusterFS, we built a scalable cloud storage system. Each user has a private space, which can be shared among different virtual machines and desktop systems. With this environments, astronomer can access to astronomical data collected by different telescopes and data centers easily, and data producers can archive their datasets safely.

  5. Grid computing technology for hydrological applications

    NASA Astrophysics Data System (ADS)

    Lecca, G.; Petitdidier, M.; Hluchy, L.; Ivanovic, M.; Kussul, N.; Ray, N.; Thieron, V.

    2011-06-01

    SummaryAdvances in e-Infrastructure promise to revolutionize sensing systems and the way in which data are collected and assimilated, and complex water systems are simulated and visualized. According to the EU Infrastructure 2010 work-programme, data and compute infrastructures and their underlying technologies, either oriented to tackle scientific challenges or complex problem solving in engineering, are expected to converge together into the so-called knowledge infrastructures, leading to a more effective research, education and innovation in the next decade and beyond. Grid technology is recognized as a fundamental component of e-Infrastructures. Nevertheless, this emerging paradigm highlights several topics, including data management, algorithm optimization, security, performance (speed, throughput, bandwidth, etc.), and scientific cooperation and collaboration issues that require further examination to fully exploit it and to better inform future research policies. The paper illustrates the results of six different surface and subsurface hydrology applications that have been deployed on the Grid. All the applications aim to answer to strong requirements from the Civil Society at large, relatively to natural and anthropogenic risks. Grid technology has been successfully tested to improve flood prediction, groundwater resources management and Black Sea hydrological survey, by providing large computing resources. It is also shown that Grid technology facilitates e-cooperation among partners by means of services for authentication and authorization, seamless access to distributed data sources, data protection and access right, and standardization.

  6. DREAMS and IMAGE: A Model and Computer Implementation for Concurrent, Life-Cycle Design of Complex Systems

    NASA Technical Reports Server (NTRS)

    Hale, Mark A.; Craig, James I.; Mistree, Farrokh; Schrage, Daniel P.

    1995-01-01

    Computing architectures are being assembled that extend concurrent engineering practices by providing more efficient execution and collaboration on distributed, heterogeneous computing networks. Built on the successes of initial architectures, requirements for a next-generation design computing infrastructure can be developed. These requirements concentrate on those needed by a designer in decision-making processes from product conception to recycling and can be categorized in two areas: design process and design information management. A designer both designs and executes design processes throughout design time to achieve better product and process capabilities while expanding fewer resources. In order to accomplish this, information, or more appropriately design knowledge, needs to be adequately managed during product and process decomposition as well as recomposition. A foundation has been laid that captures these requirements in a design architecture called DREAMS (Developing Robust Engineering Analysis Models and Specifications). In addition, a computing infrastructure, called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment), is being developed that satisfies design requirements defined in DREAMS and incorporates enabling computational technologies.

  7. The INDIGO-Datacloud Authentication and Authorization Infrastructure

    NASA Astrophysics Data System (ADS)

    Ceccanti, A.; Hardt, M.; Wegh, B.; Millar, AP; Caberletti, M.; Vianello, E.; Licehammer, S.

    2017-10-01

    Contemporary distributed computing infrastructures (DCIs) are not easily and securely accessible by scientists. These computing environments are typically hard to integrate due to interoperability problems resulting from the use of different authentication mechanisms, identity negotiation protocols and access control policies. Such limitations have a big impact on the user experience making it hard for user communities to port and run their scientific applications on resources aggregated from multiple providers. The INDIGO-DataCloud project wants to provide the services and tools needed to enable a secure composition of resources from multiple providers in support of scientific applications. In order to do so, a common AAI architecture has to be defined that supports multiple authentication mechanisms, support delegated authorization across services and can be easily integrated in off-the-shelf software. In this contribution we introduce the INDIGO Authentication and Authorization Infrastructure, describing its main components and their status and how authentication, delegation and authorization flows are implemented across services.

  8. About Distributed Simulation-based Optimization of Forming Processes using a Grid Architecture

    NASA Astrophysics Data System (ADS)

    Grauer, Manfred; Barth, Thomas

    2004-06-01

    Permanently increasing complexity of products and their manufacturing processes combined with a shorter "time-to-market" leads to more and more use of simulation and optimization software systems for product design. Finding a "good" design of a product implies the solution of computationally expensive optimization problems based on the results of simulation. Due to the computational load caused by the solution of these problems, the requirements on the Information&Telecommunication (IT) infrastructure of an enterprise or research facility are shifting from stand-alone resources towards the integration of software and hardware resources in a distributed environment for high-performance computing. Resources can either comprise software systems, hardware systems, or communication networks. An appropriate IT-infrastructure must provide the means to integrate all these resources and enable their use even across a network to cope with requirements from geographically distributed scenarios, e.g. in computational engineering and/or collaborative engineering. Integrating expert's knowledge into the optimization process is inevitable in order to reduce the complexity caused by the number of design variables and the high dimensionality of the design space. Hence, utilization of knowledge-based systems must be supported by providing data management facilities as a basis for knowledge extraction from product data. In this paper, the focus is put on a distributed problem solving environment (PSE) capable of providing access to a variety of necessary resources and services. A distributed approach integrating simulation and optimization on a network of workstations and cluster systems is presented. For geometry generation the CAD-system CATIA is used which is coupled with the FEM-simulation system INDEED for simulation of sheet-metal forming processes and the problem solving environment OpTiX for distributed optimization.

  9. Implementation of Grid Tier 2 and Tier 3 facilities on a Distributed OpenStack Cloud

    NASA Astrophysics Data System (ADS)

    Limosani, Antonio; Boland, Lucien; Coddington, Paul; Crosby, Sean; Huang, Joanna; Sevior, Martin; Wilson, Ross; Zhang, Shunde

    2014-06-01

    The Australian Government is making a AUD 100 million investment in Compute and Storage for the academic community. The Compute facilities are provided in the form of 30,000 CPU cores located at 8 nodes around Australia in a distributed virtualized Infrastructure as a Service facility based on OpenStack. The storage will eventually consist of over 100 petabytes located at 6 nodes. All will be linked via a 100 Gb/s network. This proceeding describes the development of a fully connected WLCG Tier-2 grid site as well as a general purpose Tier-3 computing cluster based on this architecture. The facility employs an extension to Torque to enable dynamic allocations of virtual machine instances. A base Scientific Linux virtual machine (VM) image is deployed in the OpenStack cloud and automatically configured as required using Puppet. Custom scripts are used to launch multiple VMs, integrate them into the dynamic Torque cluster and to mount remote file systems. We report on our experience in developing this nation-wide ATLAS and Belle II Tier 2 and Tier 3 computing infrastructure using the national Research Cloud and storage facilities.

  10. Integrating multiple scientific computing needs via a Private Cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Brunetti, R.; Lusso, S.; Vallero, S.

    2014-06-01

    In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It allows to dynamically and efficiently allocate resources to any application and to tailor the virtual machines according to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily and minimizing the downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, that hosts a full-fledged WLCG Tier-2 site and a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.

  11. Applications of the pipeline environment for visual informatics and genomics computations

    PubMed Central

    2011-01-01

    Background Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols. Results This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls. Conclusions The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators - experienced developers and novice users, user with or without access to advanced computational-resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community. PMID:21791102

  12. INFN-Pisa scientific computation environment (GRID, HPC and Interactive Analysis)

    NASA Astrophysics Data System (ADS)

    Arezzini, S.; Carboni, A.; Caruso, G.; Ciampa, A.; Coscetti, S.; Mazzoni, E.; Piras, S.

    2014-06-01

    The INFN-Pisa Tier2 infrastructure is described, optimized not only for GRID CPU and Storage access, but also for a more interactive use of the resources in order to provide good solutions for the final data analysis step. The Data Center, equipped with about 6700 production cores, permits the use of modern analysis techniques realized via advanced statistical tools (like RooFit and RooStat) implemented in multicore systems. In particular a POSIX file storage access integrated with standard SRM access is provided. Therefore the unified storage infrastructure is described, based on GPFS and Xrootd, used both for SRM data repository and interactive POSIX access. Such a common infrastructure allows a transparent access to the Tier2 data to the users for their interactive analysis. The organization of a specialized many cores CPU facility devoted to interactive analysis is also described along with the login mechanism integrated with the INFN-AAI (National INFN Infrastructure) to extend the site access and use to a geographical distributed community. Such infrastructure is used also for a national computing facility in use to the INFN theoretical community, it enables a synergic use of computing and storage resources. Our Center initially developed for the HEP community is now growing and includes also HPC resources fully integrated. In recent years has been installed and managed a cluster facility (1000 cores, parallel use via InfiniBand connection) and we are now updating this facility that will provide resources for all the intermediate level HPC computing needs of the INFN theoretical national community.

  13. A Domain-Specific Language for Aviation Domain Interoperability

    ERIC Educational Resources Information Center

    Comitz, Paul

    2013-01-01

    Modern information systems require a flexible, scalable, and upgradeable infrastructure that allows communication and collaboration between heterogeneous information processing and computing environments. Aviation systems from different organizations often use differing representations and distribution policies for the same data and messages,…

  14. OGC and Grid Interoperability in enviroGRIDS Project

    NASA Astrophysics Data System (ADS)

    Gorgan, Dorian; Rodila, Denisa; Bacu, Victor; Giuliani, Gregory; Ray, Nicolas

    2010-05-01

    EnviroGRIDS (Black Sea Catchment Observation and Assessment System supporting Sustainable Development) [1] is a 4-years FP7 Project aiming to address the subjects of ecologically unsustainable development and inadequate resource management. The project develops a Spatial Data Infrastructure of the Black Sea Catchment region. The geospatial technologies offer very specialized functionality for Earth Science oriented applications as well as the Grid oriented technology that is able to support distributed and parallel processing. One challenge of the enviroGRIDS project is the interoperability between geospatial and Grid infrastructures by providing the basic and the extended features of the both technologies. The geospatial interoperability technology has been promoted as a way of dealing with large volumes of geospatial data in distributed environments through the development of interoperable Web service specifications proposed by the Open Geospatial Consortium (OGC), with applications spread across multiple fields but especially in Earth observation research. Due to the huge volumes of data available in the geospatial domain and the additional introduced issues (data management, secure data transfer, data distribution and data computation), the need for an infrastructure capable to manage all those problems becomes an important aspect. The Grid promotes and facilitates the secure interoperations of geospatial heterogeneous distributed data within a distributed environment, the creation and management of large distributed computational jobs and assures a security level for communication and transfer of messages based on certificates. This presentation analysis and discusses the most significant use cases for enabling the OGC Web services interoperability with the Grid environment and focuses on the description and implementation of the most promising one. In these use cases we give a special attention to issues such as: the relations between computational grid and the OGC Web service protocols, the advantages offered by the Grid technology - such as providing a secure interoperability between the distributed geospatial resource -and the issues introduced by the integration of distributed geospatial data in a secure environment: data and service discovery, management, access and computation. enviroGRIDS project proposes a new architecture which allows a flexible and scalable approach for integrating the geospatial domain represented by the OGC Web services with the Grid domain represented by the gLite middleware. The parallelism offered by the Grid technology is discussed and explored at the data level, management level and computation level. The analysis is carried out for OGC Web service interoperability in general but specific details are emphasized for Web Map Service (WMS), Web Feature Service (WFS), Web Coverage Service (WCS), Web Processing Service (WPS) and Catalog Service for Web (CSW). Issues regarding the mapping and the interoperability between the OGC and the Grid standards and protocols are analyzed as they are the base in solving the communication problems between the two environments: grid and geospatial. The presetation mainly highlights how the Grid environment and Grid applications capabilities can be extended and utilized in geospatial interoperability. Interoperability between geospatial and Grid infrastructures provides features such as the specific geospatial complex functionality and the high power computation and security of the Grid, high spatial model resolution and geographical area covering, flexible combination and interoperability of the geographical models. According with the Service Oriented Architecture concepts and requirements of interoperability between geospatial and Grid infrastructures each of the main functionality is visible from enviroGRIDS Portal and consequently, by the end user applications such as Decision Maker/Citizen oriented Applications. The enviroGRIDS portal is the single way of the user to get into the system and the portal faces a unique style of the graphical user interface. Main reference for further information: [1] enviroGRIDS Project, http://www.envirogrids.net/

  15. The computing and data infrastructure to interconnect EEE stations

    NASA Astrophysics Data System (ADS)

    Noferini, F.; EEE Collaboration

    2016-07-01

    The Extreme Energy Event (EEE) experiment is devoted to the search of high energy cosmic rays through a network of telescopes installed in about 50 high schools distributed throughout the Italian territory. This project requires a peculiar data management infrastructure to collect data registered in stations very far from each other and to allow a coordinated analysis. Such an infrastructure is realized at INFN-CNAF, which operates a Cloud facility based on the OpenStack opensource Cloud framework and provides Infrastructure as a Service (IaaS) for its users. In 2014 EEE started to use it for collecting, monitoring and reconstructing the data acquired in all the EEE stations. For the synchronization between the stations and the INFN-CNAF infrastructure we used BitTorrent Sync, a free peer-to-peer software designed to optimize data syncronization between distributed nodes. All data folders are syncronized with the central repository in real time to allow an immediate reconstruction of the data and their publication in a monitoring webpage. We present the architecture and the functionalities of this data management system that provides a flexible environment for the specific needs of the EEE project.

  16. A high performance computing framework for physics-based modeling and simulation of military ground vehicles

    NASA Astrophysics Data System (ADS)

    Negrut, Dan; Lamb, David; Gorsich, David

    2011-06-01

    This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up of a combination of CPUs and Graphics Processing Units (GPUs). The computational dynamics applications targeted to execute on such a hardware topology include many-body dynamics, smoothed-particle hydrodynamics (SPH) fluid simulation, and fluid-solid interaction analysis. The underlying theme of the solution approach embraced by HCT is that of partitioning the domain of interest into a number of subdomains that are each managed by a separate core/accelerator (CPU/GPU) pair. Five components at the core of HCT enable the envisioned distributed computing approach to large-scale dynamical system simulation: (a) the ability to partition the problem according to the one-to-one mapping; i.e., spatial subdivision, discussed above (pre-processing); (b) a protocol for passing data between any two co-processors; (c) algorithms for element proximity computation; and (d) the ability to carry out post-processing in a distributed fashion. In this contribution the components (a) and (b) of the HCT are demonstrated via the example of the Discrete Element Method (DEM) for rigid body dynamics with friction and contact. The collision detection task required in frictional-contact dynamics (task (c) above), is shown to benefit on the GPU of a two order of magnitude gain in efficiency when compared to traditional sequential implementations. Note: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not imply its endorsement, recommendation, or favoring by the United States Army. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Army, and shall not be used for advertising or product endorsement purposes.

  17. WLCG scale testing during CMS data challenges

    NASA Astrophysics Data System (ADS)

    Gutsche, O.; Hajdu, C.

    2008-07-01

    The CMS computing model to process and analyze LHC collision data follows a data-location driven approach and is using the WLCG infrastructure to provide access to GRID resources. As a preparation for data taking, CMS tests its computing model during dedicated data challenges. An important part of the challenges is the test of the user analysis which poses a special challenge for the infrastructure with its random distributed access patterns. The CMS Remote Analysis Builder (CRAB) handles all interactions with the WLCG infrastructure transparently for the user. During the 2006 challenge, CMS set its goal to test the infrastructure at a scale of 50,000 user jobs per day using CRAB. Both direct submissions by individual users and automated submissions by robots were used to achieve this goal. A report will be given about the outcome of the user analysis part of the challenge using both the EGEE and OSG parts of the WLCG. In particular, the difference in submission between both GRID middlewares (resource broker vs. direct submission) will be discussed. In the end, an outlook for the 2007 data challenge is given.

  18. Expeditionary Oblong Mezzanine

    DTIC Science & Technology

    2016-03-01

    Operating System OSI Open Systems Interconnection OS X Operating System Ten PDU Power Distribution Unit POE Power Over Ethernet xvii SAAS ...providing infrastructure as a service (IaaS) and software as a service ( SaaS ) cloud computing technologies. IaaS is a way of providing computing services...such as servers, storage, and network equipment services (Mell & Grance, 2009). SaaS is a means of providing software and applications as an on

  19. Integrating Data Distribution and Data Assimilation Between the OOI CI and the NOAA DIF

    NASA Astrophysics Data System (ADS)

    Meisinger, M.; Arrott, M.; Clemesha, A.; Farcas, C.; Farcas, E.; Im, T.; Schofield, O.; Krueger, I.; Klacansky, I.; Orcutt, J.; Peach, C.; Chave, A.; Raymer, D.; Vernon, F.

    2008-12-01

    The Ocean Observatories Initiative (OOI) is an NSF funded program to establish the ocean observing infrastructure of the 21st century benefiting research and education. It is currently approaching final design and promises to deliver cyber and physical observatory infrastructure components as well as substantial core instrumentation to study environmental processes of the ocean at various scales, from coastal shelf-slope exchange processes to the deep ocean. The OOI's data distribution network lies at the heart of its cyber- infrastructure, which enables a multitude of science and education applications, ranging from data analysis, to processing, visualization and ontology supported query and mediation. In addition, it fundamentally supports a class of applications exploiting the knowledge gained from analyzing observational data for objective-driven ocean observing applications, such as automatically triggered response to episodic environmental events and interactive instrument tasking and control. The U.S. Department of Commerce through NOAA operates the Integrated Ocean Observing System (IOOS) providing continuous data in various formats, rates and scales on open oceans and coastal waters to scientists, managers, businesses, governments, and the public to support research and inform decision-making. The NOAA IOOS program initiated development of the Data Integration Framework (DIF) to improve management and delivery of an initial subset of ocean observations with the expectation of achieving improvements in a select set of NOAA's decision-support tools. Both OOI and NOAA through DIF collaborate on an effort to integrate the data distribution, access and analysis needs of both programs. We present details and early findings from this collaboration; one part of it is the development of a demonstrator combining web-based user access to oceanographic data through ERDDAP, efficient science data distribution, and scalable, self-healing deployment in a cloud computing environment. ERDDAP is a web-based front-end application integrating oceanographic data sources of various formats, for instance CDF data files as aggregated through NcML or presented using a THREDDS server. The OOI-designed data distribution network provides global traffic management and computational load balancing for observatory resources; it makes use of the OpenDAP Data Access Protocol (DAP) for efficient canonical science data distribution over the network. A cloud computing strategy is the basis for scalable, self-healing organization of an observatory's computing and storage resources, independent of the physical location and technical implementation of these resources.

  20. High-throughput neuroimaging-genetics computational infrastructure

    PubMed Central

    Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Hobel, Sam; Vespa, Paul; Woo Moon, Seok; Van Horn, John D.; Franco, Joseph; Toga, Arthur W.

    2014-01-01

    Many contemporary neuroscientific investigations face significant challenges in terms of data management, computational processing, data mining, and results interpretation. These four pillars define the core infrastructure necessary to plan, organize, orchestrate, validate, and disseminate novel scientific methods, computational resources, and translational healthcare findings. Data management includes protocols for data acquisition, archival, query, transfer, retrieval, and aggregation. Computational processing involves the necessary software, hardware, and networking infrastructure required to handle large amounts of heterogeneous neuroimaging, genetics, clinical, and phenotypic data and meta-data. Data mining refers to the process of automatically extracting data features, characteristics and associations, which are not readily visible by human exploration of the raw dataset. Result interpretation includes scientific visualization, community validation of findings and reproducible findings. In this manuscript we describe the novel high-throughput neuroimaging-genetics computational infrastructure available at the Institute for Neuroimaging and Informatics (INI) and the Laboratory of Neuro Imaging (LONI) at University of Southern California (USC). INI and LONI include ultra-high-field and standard-field MRI brain scanners along with an imaging-genetics database for storing the complete provenance of the raw and derived data and meta-data. In addition, the institute provides a large number of software tools for image and shape analysis, mathematical modeling, genomic sequence processing, and scientific visualization. A unique feature of this architecture is the Pipeline environment, which integrates the data management, processing, transfer, and visualization. Through its client-server architecture, the Pipeline environment provides a graphical user interface for designing, executing, monitoring validating, and disseminating of complex protocols that utilize diverse suites of software tools and web-services. These pipeline workflows are represented as portable XML objects which transfer the execution instructions and user specifications from the client user machine to remote pipeline servers for distributed computing. Using Alzheimer's and Parkinson's data, we provide several examples of translational applications using this infrastructure1. PMID:24795619

  1. AGIS: Evolution of Distributed Computing information system for ATLAS

    NASA Astrophysics Data System (ADS)

    Anisenkov, A.; Di Girolamo, A.; Alandes, M.; Karavakis, E.

    2015-12-01

    ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization of computing resources in order to meet the ATLAS requirements of petabytes scale data operations. It has been evolved after the first period of LHC data taking (Run-1) in order to cope with new challenges of the upcoming Run- 2. In this paper we describe the evolution and recent developments of the ATLAS Grid Information System (AGIS), developed in order to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.

  2. Fog-computing concept usage as means to enhance information and control system reliability

    NASA Astrophysics Data System (ADS)

    Melnik, E. V.; Klimenko, A. B.; Ivanov, D. Ya

    2018-05-01

    This paper focuses on the reliability issue of information and control systems (ICS). The authors propose using the elements of the fog-computing concept to enhance the reliability function. The key idea of fog-computing is to shift computations to the fog-layer of the network, and thus to decrease the workload of the communication environment and data processing components. As for ICS, workload also can be distributed among sensors, actuators and network infrastructure facilities near the sources of data. The authors simulated typical workload distribution situations for the “traditional” ICS architecture and for the one with fogcomputing concept elements usage. The paper contains some models, selected simulation results and conclusion about the prospects of the fog-computing as a means to enhance ICS reliability.

  3. Subtlenoise: sonification of distributed computing operations

    NASA Astrophysics Data System (ADS)

    Love, P. A.

    2015-12-01

    The operation of distributed computing systems requires comprehensive monitoring to ensure reliability and robustness. There are two components found in most monitoring systems: one being visually rich time-series graphs and another being notification systems for alerting operators under certain pre-defined conditions. In this paper the sonification of monitoring messages is explored using an architecture that fits easily within existing infrastructures based on mature opensource technologies such as ZeroMQ, Logstash, and Supercollider (a synth engine). Message attributes are mapped onto audio attributes based on broad classification of the message (continuous or discrete metrics) but keeping the audio stream subtle in nature. The benefits of audio rendering are described in the context of distributed computing operations and may provide a less intrusive way to understand the operational health of these systems.

  4. UNH Data Cooperative: A Cyber Infrastructure for Earth System Studies

    NASA Astrophysics Data System (ADS)

    Braswell, B. H.; Fekete, B. M.; Prusevich, A.; Gliden, S.; Magill, A.; Vorosmarty, C. J.

    2007-12-01

    Earth system scientists and managers have a continuously growing demand for a wide array of earth observations derived from various data sources including (a) modern satellite retrievals, (b) "in-situ" records, (c) various simulation outputs, and (d) assimilated data products combining model results with observational records. The sheer quantity of data, and formatting inconsistencies make it difficult for users to take full advantage of this important information resource. Thus the system could benefit from a thorough retooling of our current data processing procedures and infrastructure. Emerging technologies, like OPeNDAP and OGC map services, open standard data formats (NetCDF, HDF) data cataloging systems (NASA-Echo, Global Change Master Directory, etc.) are providing the basis for a new approach in data management and processing, where web- services are increasingly designed to serve computer-to-computer communications without human interactions and complex analysis can be carried out over distributed computer resources interconnected via cyber infrastructure. The UNH Earth System Data Collaborative is designed to utilize the aforementioned emerging web technologies to offer new means of access to earth system data. While the UNH Data Collaborative serves a wide array of data ranging from weather station data (Climate Portal) to ocean buoy records and ship tracks (Portsmouth Harbor Initiative) to land cover characteristics, etc. the underlaying data architecture shares common components for data mining and data dissemination via web-services. Perhaps the most unique element of the UNH Data Cooperative's IT infrastructure is its prototype modeling environment for regional ecosystem surveillance over the Northeast corridor, which allows the integration of complex earth system model components with the Cooperative's data services. While the complexity of the IT infrastructure to perform complex computations is continuously increasing, scientists are often forced to spend considerable amount of time to solve basic data management and preprocessing tasks and deal with low level computational design problems like parallelization of model codes. Our modeling infrastructure is designed to take care the bulk of the common tasks found in complex earth system models like I/O handling, computational domain and time management, parallel execution of the modeling tasks, etc. The modeling infrastructure allows scientists to focus on the numerical implementation of the physical processes on a single computational objects(typically grid cells) while the framework takes care of the preprocessing of input data, establishing of the data exchange between computation objects and the execution of the science code. In our presentation, we will discuss the key concepts of our modeling infrastructure. We will demonstrate integration of our modeling framework with data services offered by the UNH Earth System Data Collaborative via web interfaces. We will layout the road map to turn our prototype modeling environment into a truly community framework for wide range of earth system scientists and environmental managers.

  5. The architecture of a distributed medical dictionary.

    PubMed

    Fowler, J; Buffone, G; Moreau, D

    1995-01-01

    Exploiting high-speed computer networks to provide a national medical information infrastructure is a goal for medical informatics. The Distributed Medical Dictionary under development at Baylor College of Medicine is a model for an architecture that supports collaborative development of a distributed online medical terminology knowledge-base. A prototype is described that illustrates the concept. Issues that must be addressed by such a system include high availability, acceptable response time, support for local idiom, and control of vocabulary.

  6. PRACE - The European HPC Infrastructure

    NASA Astrophysics Data System (ADS)

    Stadelmeyer, Peter

    2014-05-01

    The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. PRACE seeks to realize this mission by offering world class computing and data management resources and services through a peer review process. This talk gives a general overview about PRACE and the PRACE research infrastructure (RI). PRACE is established as an international not-for-profit association and the PRACE RI is a pan-European supercomputing infrastructure which offers access to computing and data management resources at partner sites distributed throughout Europe. Besides a short summary about the organization, history, and activities of PRACE, it is explained how scientists and researchers from academia and industry from around the world can access PRACE systems and which education and training activities are offered by PRACE. The overview also contains a selection of PRACE contributions to societal challenges and ongoing activities. Examples of the latter are beside others petascaling, application benchmark suite, best practice guides for efficient use of key architectures, application enabling / scaling, new programming models, and industrial applications. The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels. The PRACE Research Infrastructure provides a persistent world-class high performance computing service for scientists and researchers from academia and industry in Europe. The computer systems and their operations accessible through PRACE are provided by 4 PRACE members (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France). The Implementation Phase of PRACE receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreements RI-261557, RI-283493 and RI-312763. For more information, see www.prace-ri.eu

  7. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis

    PubMed Central

    Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja

    2016-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu PMID:26989153

  8. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

    PubMed

    Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja

    2016-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure.The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press.

  9. Development of a Dynamically Configurable,Object-Oriented Framework for Distributed, Multi-modal Computational Aerospace Systems Simulation

    NASA Technical Reports Server (NTRS)

    Afjeh, Abdollah A.; Reed, John A.

    2003-01-01

    This research is aimed at developing a neiv and advanced simulation framework that will significantly improve the overall efficiency of aerospace systems design and development. This objective will be accomplished through an innovative integration of object-oriented and Web-based technologies ivith both new and proven simulation methodologies. The basic approach involves Ihree major areas of research: Aerospace system and component representation using a hierarchical object-oriented component model which enables the use of multimodels and enforces component interoperability. Collaborative software environment that streamlines the process of developing, sharing and integrating aerospace design and analysis models. . Development of a distributed infrastructure which enables Web-based exchange of models to simplify the collaborative design process, and to support computationally intensive aerospace design and analysis processes. Research for the first year dealt with the design of the basic architecture and supporting infrastructure, an initial implementation of that design, and a demonstration of its application to an example aircraft engine system simulation.

  10. VASA: Interactive Computational Steering of Large Asynchronous Simulation Pipelines for Societal Infrastructure.

    PubMed

    Ko, Sungahn; Zhao, Jieqiong; Xia, Jing; Afzal, Shehzad; Wang, Xiaoyu; Abram, Greg; Elmqvist, Niklas; Kne, Len; Van Riper, David; Gaither, Kelly; Kennedy, Shaun; Tolone, William; Ribarsky, William; Ebert, David S

    2014-12-01

    We present VASA, a visual analytics platform consisting of a desktop application, a component model, and a suite of distributed simulation components for modeling the impact of societal threats such as weather, food contamination, and traffic on critical infrastructure such as supply chains, road networks, and power grids. Each component encapsulates a high-fidelity simulation model that together form an asynchronous simulation pipeline: a system of systems of individual simulations with a common data and parameter exchange format. At the heart of VASA is the Workbench, a visual analytics application providing three distinct features: (1) low-fidelity approximations of the distributed simulation components using local simulation proxies to enable analysts to interactively configure a simulation run; (2) computational steering mechanisms to manage the execution of individual simulation components; and (3) spatiotemporal and interactive methods to explore the combined results of a simulation run. We showcase the utility of the platform using examples involving supply chains during a hurricane as well as food contamination in a fast food restaurant chain.

  11. Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics.

    PubMed

    Peek, N; Holmes, J H; Sun, J

    2014-08-15

    To review technical and methodological challenges for big data research in biomedicine and health. We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data. The life and biomedical sciences are massively contributing to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets. The major challenge for the near future is to transform analytical methods that are used in the biomedical and health domain, to fit the distributed storage and processing model that is required to handle big data, while ensuring confidentiality of the data being analyzed.

  12. Enterprise Cloud Architecture for Chinese Ministry of Railway

    NASA Astrophysics Data System (ADS)

    Shan, Xumei; Liu, Hefeng

    Enterprise like PRC Ministry of Railways (MOR), is facing various challenges ranging from highly distributed computing environment and low legacy system utilization, Cloud Computing is increasingly regarded as one workable solution to address this. This article describes full scale cloud solution with Intel Tashi as virtual machine infrastructure layer, Hadoop HDFS as computing platform, and self developed SaaS interface, gluing virtual machine and HDFS with Xen hypervisor. As a result, on demand computing task application and deployment have been tackled per MOR real working scenarios at the end of article.

  13. AGIS: The ATLAS Grid Information System

    NASA Astrophysics Data System (ADS)

    Anisenkov, A.; Di Girolamo, A.; Klimentov, A.; Oleynik, D.; Petrosyan, A.; Atlas Collaboration

    2014-06-01

    ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produced petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization and computing resources able to meet ATLAS requirements of petabytes scale data operations. In this paper we describe the ATLAS Grid Information System (AGIS), designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.

  14. Redefining Tactical Operations for MER Using Cloud Computing

    NASA Technical Reports Server (NTRS)

    Joswig, Joseph C.; Shams, Khawaja S.

    2011-01-01

    The Mars Exploration Rover Mission (MER) includes the twin rovers, Spirit and Opportunity, which have been performing geological research and surface exploration since early 2004. The rovers' durability well beyond their original prime mission (90 sols or Martian days) has allowed them to be a valuable platform for scientific research for well over 2000 sols, but as a by-product it has produced new challenges in providing efficient and cost-effective tactical operational planning. An early stage process adaptation was the move to distributed operations as mission scientists returned to their places of work in the summer of 2004, but they would still came together via teleconference and connected software to plan rover activities a few times a week. This distributed model has worked well since, but it requires the purchase, operation, and maintenance of a dedicated infrastructure at the Jet Propulsion Laboratory. This server infrastructure is costly to operate and the periodic nature of its usage (typically heavy usage for 8 hours every 2 days) has made moving to a cloud based tactical infrastructure an extremely tempting proposition. In this paper we will review both past and current implementations of the tactical planning application focusing on remote plan saving and discuss the unique challenges present with long-latency, distributed operations. We then detail the motivations behind our move to cloud based computing services and as well as our system design and implementation. We will discuss security and reliability concerns and how they were addressed

  15. A parallel-processing approach to computing for the geographic sciences

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a costeffective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.

  16. OSiRIS: a distributed Ceph deployment using software defined networking for multi-institutional research

    NASA Astrophysics Data System (ADS)

    McKee, Shawn; Kissel, Ezra; Meekhof, Benjeman; Swany, Martin; Miller, Charles; Gregorowicz, Michael

    2017-10-01

    We report on the first year of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The projects goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to read, write, manage and share data directly from their own computing locations. The NSF CC*DNI DIBBS program which funded OSiRIS is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data and we are exploring the creative use of Ceph and networking to address those challenges. While OSiRIS will eventually be serving a broad range of science domains, its first adopter will be the LHC ATLAS detector project via the ATLAS Great Lakes Tier-2 (AGLT2) jointly located at the University of Michigan and Michigan State University. Part of our presentation will cover how ATLAS is using the OSiRIS infrastructure and our experiences integrating our first user community. The presentation will also review the motivations for and goals of the project, the technical details of the OSiRIS infrastructure, the challenges in providing such an infrastructure, and the technical choices made to address those challenges. We will conclude with our plans for the remaining 4 years of the project and our vision for what we hope to deliver by the projects end.

  17. Controlling Infrastructure Costs: Right-Sizing the Mission Control Facility

    NASA Technical Reports Server (NTRS)

    Martin, Keith; Sen-Roy, Michael; Heiman, Jennifer

    2009-01-01

    Johnson Space Center's Mission Control Center is a space vehicle, space program agnostic facility. The current operational design is essentially identical to the original facility architecture that was developed and deployed in the mid-90's. In an effort to streamline the support costs of the mission critical facility, the Mission Operations Division (MOD) of Johnson Space Center (JSC) has sponsored an exploratory project to evaluate and inject current state-of-the-practice Information Technology (IT) tools, processes and technology into legacy operations. The general push in the IT industry has been trending towards a data-centric computer infrastructure for the past several years. Organizations facing challenges with facility operations costs are turning to creative solutions combining hardware consolidation, virtualization and remote access to meet and exceed performance, security, and availability requirements. The Operations Technology Facility (OTF) organization at the Johnson Space Center has been chartered to build and evaluate a parallel Mission Control infrastructure, replacing the existing, thick-client distributed computing model and network architecture with a data center model utilizing virtualization to provide the MCC Infrastructure as a Service. The OTF will design a replacement architecture for the Mission Control Facility, leveraging hardware consolidation through the use of blade servers, increasing utilization rates for compute platforms through virtualization while expanding connectivity options through the deployment of secure remote access. The architecture demonstrates the maturity of the technologies generally available in industry today and the ability to successfully abstract the tightly coupled relationship between thick-client software and legacy hardware into a hardware agnostic "Infrastructure as a Service" capability that can scale to meet future requirements of new space programs and spacecraft. This paper discusses the benefits and difficulties that a migration to cloud-based computing philosophies has uncovered when compared to the legacy Mission Control Center architecture. The team consists of system and software engineers with extensive experience with the MCC infrastructure and software currently used to support the International Space Station (ISS) and Space Shuttle program (SSP).

  18. Outlook for grid service technologies within the @neurIST eHealth environment.

    PubMed

    Arbona, A; Benkner, S; Fingberg, J; Frangi, A F; Hofmann, M; Hose, D R; Lonsdale, G; Ruefenacht, D; Viceconti, M

    2006-01-01

    The aim of the @neurIST project is to create an IT infrastructure for the management of all processes linked to research, diagnosis and treatment development for complex and multi-factorial diseases. The IT infrastructure will be developed for one such disease, cerebral aneurysm and subarachnoid haemorrhage, but its core technologies will be transferable to meet the needs of other medical areas. Since the IT infrastructure for @neurIST will need to encompass data repositories, computational analysis services and information systems handling multi-scale, multi-modal information at distributed sites, the natural basis for the IT infrastructure is a Grid Service middleware. The project will adopt a service-oriented architecture because it aims to provide a system addressing the needs of medical researchers, clinicians and health care specialists (and their IT providers/systems) and medical supplier/consulting industries.

  19. Community-driven computational biology with Debian Linux.

    PubMed

    Möller, Steffen; Krabbenhöft, Hajo Nils; Tille, Andreas; Paleino, David; Williams, Alan; Wolstencroft, Katy; Goble, Carole; Holland, Richard; Belhachemi, Dominique; Plessy, Charles

    2010-12-21

    The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers.

  20. The multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) high performance computing infrastructure: applications in neuroscience and neuroinformatics research

    PubMed Central

    Goscinski, Wojtek J.; McIntosh, Paul; Felzmann, Ulrich; Maksimenko, Anton; Hall, Christopher J.; Gureyev, Timur; Thompson, Darren; Janke, Andrew; Galloway, Graham; Killeen, Neil E. B.; Raniga, Parnesh; Kaluza, Owen; Ng, Amanda; Poudel, Govinda; Barnes, David G.; Nguyen, Toan; Bonnington, Paul; Egan, Gary F.

    2014-01-01

    The Multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) is a national imaging and visualization facility established by Monash University, the Australian Synchrotron, the Commonwealth Scientific Industrial Research Organization (CSIRO), and the Victorian Partnership for Advanced Computing (VPAC), with funding from the National Computational Infrastructure and the Victorian Government. The MASSIVE facility provides hardware, software, and expertise to drive research in the biomedical sciences, particularly advanced brain imaging research using synchrotron x-ray and infrared imaging, functional and structural magnetic resonance imaging (MRI), x-ray computer tomography (CT), electron microscopy and optical microscopy. The development of MASSIVE has been based on best practice in system integration methodologies, frameworks, and architectures. The facility has: (i) integrated multiple different neuroimaging analysis software components, (ii) enabled cross-platform and cross-modality integration of neuroinformatics tools, and (iii) brought together neuroimaging databases and analysis workflows. MASSIVE is now operational as a nationally distributed and integrated facility for neuroinfomatics and brain imaging research. PMID:24734019

  1. A prototype Infrastructure for Cloud-based distributed services in High Availability over WAN

    NASA Astrophysics Data System (ADS)

    Bulfon, C.; Carlino, G.; De Salvo, A.; Doria, A.; Graziosi, C.; Pardi, S.; Sanchez, A.; Carboni, M.; Bolletta, P.; Puccio, L.; Capone, V.; Merola, L.

    2015-12-01

    In this work we present the architectural and performance studies concerning a prototype of a distributed Tier2 infrastructure for HEP, instantiated between the two Italian sites of INFN-Romal and INFN-Napoli. The network infrastructure is based on a Layer-2 geographical link, provided by the Italian NREN (GARR), directly connecting the two remote LANs of the named sites. By exploiting the possibilities offered by the new distributed file systems, a shared storage area with synchronous copy has been set up. The computing infrastructure, based on an OpenStack facility, is using a set of distributed Hypervisors installed in both sites. The main parameter to be taken into account when managing two remote sites with a single framework is the effect of the latency, due to the distance and the end-to-end service overhead. In order to understand the capabilities and limits of our setup, the impact of latency has been investigated by means of a set of stress tests, including data I/O throughput, metadata access performance evaluation and network occupancy, during the life cycle of a Virtual Machine. A set of resilience tests has also been performed, in order to verify the stability of the system on the event of hardware or software faults. The results of this work show that the reliability and robustness of the chosen architecture are effective enough to build a production system and to provide common services. This prototype can also be extended to multiple sites with small changes of the network topology, thus creating a National Network of Cloud-based distributed services, in HA over WAN.

  2. Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization Case Study

    NASA Astrophysics Data System (ADS)

    Hassan, A. H.; Fluke, C. J.; Barnes, D. G.

    2012-09-01

    Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualizing tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure handling both batched and real-time data analysis and visualization tasks. Offering such functionality as a service in a “software as a service” manner will reduce the total cost of ownership, provide an easy to use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.

  3. Cloud flexibility using DIRAC interware

    NASA Astrophysics Data System (ADS)

    Fernandez Albor, Víctor; Seco Miguelez, Marcos; Fernandez Pena, Tomas; Mendez Muñoz, Victor; Saborido Silva, Juan Jose; Graciani Diaz, Ricardo

    2014-06-01

    Communities of different locations are running their computing jobs on dedicated infrastructures without the need to worry about software, hardware or even the site where their programs are going to be executed. Nevertheless, this usually implies that they are restricted to use certain types or versions of an Operating System because either their software needs an definite version of a system library or a specific platform is required by the collaboration to which they belong. On this scenario, if a data center wants to service software to incompatible communities, it has to split its physical resources among those communities. This splitting will inevitably lead to an underuse of resources because the data centers are bound to have periods where one or more of its subclusters are idle. It is, in this situation, where Cloud Computing provides the flexibility and reduction in computational cost that data centers are searching for. This paper describes a set of realistic tests that we ran on one of such implementations. The test comprise software from three different HEP communities (Auger, LHCb and QCD phenomelogists) and the Parsec Benchmark Suite running on one or more of three Linux flavors (SL5, Ubuntu 10.04 and Fedora 13). The implemented infrastructure has, at the cloud level, CloudStack that manages the virtual machines (VM) and the hosts on which they run, and, at the user level, the DIRAC framework along with a VM extension that will submit, monitorize and keep track of the user jobs and also requests CloudStack to start or stop the necessary VM's. In this infrastructure, the community software is distributed via the CernVM-FS, which has been proven to be a reliable and scalable software distribution system. With the resulting infrastructure, users are allowed to send their jobs transparently to the Data Center. The main purpose of this system is the creation of flexible cluster, multiplatform with an scalable method for software distribution for several VOs. Users from different communities do not need to care about the installation of the standard software that is available at the nodes, nor the operating system of the host machine, which is transparent to the user.

  4. Maintaining Traceability in an Evolving Distributed Computing Environment

    NASA Astrophysics Data System (ADS)

    Collier, I.; Wartel, R.

    2015-12-01

    The management of risk is fundamental to the operation of any distributed computing infrastructure. Identifying the cause of incidents is essential to prevent them from re-occurring. In addition, it is a goal to contain the impact of an incident while keeping services operational. For response to incidents to be acceptable this needs to be commensurate with the scale of the problem. The minimum level of traceability for distributed computing infrastructure usage is to be able to identify the source of all actions (executables, file transfers, pilot jobs, portal jobs, etc.) and the individual who initiated them. In addition, sufficiently fine-grained controls, such as blocking the originating user and monitoring to detect abnormal behaviour, are necessary for keeping services operational. It is essential to be able to understand the cause and to fix any problems before re-enabling access for the user. The aim is to be able to answer the basic questions who, what, where, and when concerning any incident. This requires retaining all relevant information, including timestamps and the digital identity of the user, sufficient to identify, for each service instance, and for every security event including at least the following: connect, authenticate, authorize (including identity changes) and disconnect. In traditional grid infrastructures (WLCG, EGI, OSG etc.) best practices and procedures for gathering and maintaining the information required to maintain traceability are well established. In particular, sites collect and store information required to ensure traceability of events at their sites. With the increased use of virtualisation and private and public clouds for HEP workloads established procedures, which are unable to see 'inside' running virtual machines no longer capture all the information required. Maintaining traceability will at least involve a shift of responsibility from sites to Virtual Organisations (VOs) bringing with it new requirements for their logging infrastructures. VOs indeed need to fulfil a new operational role and become fully active participants in the incident response process. We present an analysis of the changing requirements to maintain traceability for virtualised and cloud based workflows with particular reference to the work of the WLCG Traceability Working Group.

  5. Integration of end-user Cloud storage for CMS analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riahi, Hassen; Aimar, Alberto; Ayllon, Alejandro Alvarez

    End-user Cloud storage is increasing rapidly in popularity in research communities thanks to the collaboration capabilities it offers, namely synchronisation and sharing. CERN IT has implemented a model of such storage named, CERNBox, integrated with the CERN AuthN and AuthZ services. To exploit the use of the end-user Cloud storage for the distributed data analysis activity, the CMS experiment has started the integration of CERNBox as a Grid resource. This will allow CMS users to make use of their own storage in the Cloud for their analysis activities as well as to benefit from synchronisation and sharing capabilities to achievemore » results faster and more effectively. It will provide an integration model of Cloud storages in the Grid, which is implemented and commissioned over the world’s largest computing Grid infrastructure, Worldwide LHC Computing Grid (WLCG). In this paper, we present the integration strategy and infrastructure changes needed in order to transparently integrate end-user Cloud storage with the CMS distributed computing model. We describe the new challenges faced in data management between Grid and Cloud and how they were addressed, along with details of the support for Cloud storage recently introduced into the WLCG data movement middleware, FTS3. Finally, the commissioning experience of CERNBox for the distributed data analysis activity is also presented.« less

  6. Integration of end-user Cloud storage for CMS analysis

    DOE PAGES

    Riahi, Hassen; Aimar, Alberto; Ayllon, Alejandro Alvarez; ...

    2017-05-19

    End-user Cloud storage is increasing rapidly in popularity in research communities thanks to the collaboration capabilities it offers, namely synchronisation and sharing. CERN IT has implemented a model of such storage named, CERNBox, integrated with the CERN AuthN and AuthZ services. To exploit the use of the end-user Cloud storage for the distributed data analysis activity, the CMS experiment has started the integration of CERNBox as a Grid resource. This will allow CMS users to make use of their own storage in the Cloud for their analysis activities as well as to benefit from synchronisation and sharing capabilities to achievemore » results faster and more effectively. It will provide an integration model of Cloud storages in the Grid, which is implemented and commissioned over the world’s largest computing Grid infrastructure, Worldwide LHC Computing Grid (WLCG). In this paper, we present the integration strategy and infrastructure changes needed in order to transparently integrate end-user Cloud storage with the CMS distributed computing model. We describe the new challenges faced in data management between Grid and Cloud and how they were addressed, along with details of the support for Cloud storage recently introduced into the WLCG data movement middleware, FTS3. Finally, the commissioning experience of CERNBox for the distributed data analysis activity is also presented.« less

  7. Jali - Unstructured Mesh Infrastructure for Multi-Physics Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garimella, Rao V; Berndt, Markus; Coon, Ethan

    2017-04-13

    Jali is a parallel unstructured mesh infrastructure library designed for use by multi-physics simulations. It supports 2D and 3D arbitrary polyhedral meshes distributed over hundreds to thousands of nodes. Jali can read write Exodus II meshes along with fields and sets on the mesh and support for other formats is partially implemented or is (https://github.com/MeshToolkit/MSTK), an open source general purpose unstructured mesh infrastructure library from Los Alamos National Laboratory. While it has been made to work with other mesh frameworks such as MOAB and STKmesh in the past, support for maintaining the interface to these frameworks has been suspended formore » now. Jali supports distributed as well as on-node parallelism. Support of on-node parallelism is through direct use of the the mesh in multi-threaded constructs or through the use of "tiles" which are submeshes or sub-partitions of a partition destined for a compute node.« less

  8. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.

    PubMed

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

  9. Device-Enabled Authorization in the Grey System

    DTIC Science & Technology

    2005-02-01

    proof checker. Journal of Automated Reasoning 31(3-4):231–260, 2003. [7] D. Balfanz , D. Dean, and M. Spreitzer. A security infrastructure for...distributed Java applications. In Proceedings of the 21st IEEE Symposium on Security and Privacy, May 2002. [8] D. Balfanz and E. Felten. Hand-held computers

  10. Poster — Thur Eve — 74: Distributed, asynchronous, reactive dosimetric and outcomes analysis using DICOMautomaton

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, Haley; BC Cancer Agency, Surrey, B.C.; BC Cancer Agency, Vancouver, B.C.

    2014-08-15

    Many have speculated about the future of computational technology in clinical radiation oncology. It has been advocated that the next generation of computational infrastructure will improve on the current generation by incorporating richer aspects of automation, more heavily and seamlessly featuring distributed and parallel computation, and providing more flexibility toward aggregate data analysis. In this report we describe how a recently created — but currently existing — analysis framework (DICOMautomaton) incorporates these aspects. DICOMautomaton supports a variety of use cases but is especially suited for dosimetric outcomes correlation analysis, investigation and comparison of radiotherapy treatment efficacy, and dose-volume computation. Wemore » describe: how it overcomes computational bottlenecks by distributing workload across a network of machines; how modern, asynchronous computational techniques are used to reduce blocking and avoid unnecessary computation; and how issues of out-of-date data are addressed using reactive programming techniques and data dependency chains. We describe internal architecture of the software and give a detailed demonstration of how DICOMautomaton could be used to search for correlations between dosimetric and outcomes data.« less

  11. Downscaling seasonal to centennial simulations on distributed computing infrastructures using WRF model. The WRF4G project

    NASA Astrophysics Data System (ADS)

    Cofino, A. S.; Fernández Quiruelas, V.; Blanco Real, J. C.; García Díez, M.; Fernández, J.

    2013-12-01

    Nowadays Grid Computing is powerful computational tool which is ready to be used for scientific community in different areas (such as biomedicine, astrophysics, climate, etc.). However, the use of this distributed computing infrastructures (DCI) is not yet common practice in climate research, and only a few teams and applications in this area take advantage of this infrastructure. Thus, the WRF4G project objective is to popularize the use of this technology in the atmospheric sciences area. In order to achieve this objective, one of the most used applications has been taken (WRF; a limited- area model, successor of the MM5 model), that has a user community formed by more than 8000 researchers worldwide. This community develop its research activity on different areas and could benefit from the advantages of Grid resources (case study simulations, regional hind-cast/forecast, sensitivity studies, etc.). The WRF model is used by many groups, in the climate research community, to carry on downscaling simulations. Therefore this community will also benefit. However, Grid infrastructures have some drawbacks for the execution of applications that make an intensive use of CPU and memory for a long period of time. This makes necessary to develop a specific framework (middleware). This middleware encapsulates the application and provides appropriate services for the monitoring and management of the simulations and the data. Thus,another objective of theWRF4G project consists on the development of a generic adaptation of WRF to DCIs. It should simplify the access to the DCIs for the researchers, and also to free them from the technical and computational aspects of the use of theses DCI. Finally, in order to demonstrate the ability of WRF4G solving actual scientific challenges with interest and relevance on the climate science (implying a high computational cost) we will shown results from different kind of downscaling experiments, like ERA-Interim re-analysis, CMIP5 models, or seasonal. WRF4G is been used to run WRF simulations which are contributing to the CORDEX initiative and others projects like SPECS and EUPORIAS. This work is been partially funded by the European Regional Development Fund (ERDF) and the Spanish National R&D Plan 2008-2011 (CGL2011-28864)

  12. MindModeling@Home . . . and Anywhere Else You Have Idle Processors

    DTIC Science & Technology

    2009-12-01

    was SETI @Home. It was established in 1999 for the purpose of demonstrating the utility of “distributed grid computing” by providing a mechanism for...the public imagination, and SETI @Home remains the longest running and one of the most popular volunteer computing projects in the world. This...pursuits. Most of them, including SETI @Home, run on a software architecture called the Berkeley Open Infrastructure for Network Computing (BOINC). Some of

  13. NiftyNet: a deep-learning platform for medical imaging.

    PubMed

    Gibson, Eli; Li, Wenqi; Sudre, Carole; Fidon, Lucas; Shakir, Dzhoshkun I; Wang, Guotai; Eaton-Rosen, Zach; Gray, Robert; Doel, Tom; Hu, Yipeng; Whyntie, Tom; Nachev, Parashkev; Modat, Marc; Barratt, Dean C; Ourselin, Sébastien; Cardoso, M Jorge; Vercauteren, Tom

    2018-05-01

    Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this domain of application requires substantial implementation effort. Consequently, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. The NiftyNet infrastructure provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncracies of medical image analysis and computer-assisted intervention. NiftyNet is built on the TensorFlow framework and supports features such as TensorBoard visualization of 2D and 3D images and computational graphs by default. We present three illustrative medical image analysis applications built using NiftyNet infrastructure: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. The NiftyNet infrastructure enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  14. ChemCalc: a building block for tomorrow's chemical infrastructure.

    PubMed

    Patiny, Luc; Borel, Alain

    2013-05-24

    Web services, as an aspect of cloud computing, are becoming an important part of the general IT infrastructure, and scientific computing is no exception to this trend. We propose a simple approach to develop chemical Web services, through which servers could expose the essential data manipulation functionality that students and researchers need for chemical calculations. These services return their results as JSON (JavaScript Object Notation) objects, which facilitates their use for Web applications. The ChemCalc project http://www.chemcalc.org demonstrates this approach: we present three Web services related with mass spectrometry, namely isotopic distribution simulation, peptide fragmentation simulation, and molecular formula determination. We also developed a complete Web application based on these three Web services, taking advantage of modern HTML5 and JavaScript libraries (ChemDoodle and jQuery).

  15. Defense strategies for asymmetric networked systems under composite utilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S.; Ma, Chris Y. T.; Hausken, Kjell

    We consider an infrastructure of networked systems with discrete components that can be reinforced at certain costs to guard against attacks. The communications network plays a critical, asymmetric role of providing the vital connectivity between the systems. We characterize the correlations within this infrastructure at two levels using (a) aggregate failure correlation function that specifies the infrastructure failure probability giventhe failure of an individual system or network, and (b) first order differential conditions on system survival probabilities that characterize component-level correlations. We formulate an infrastructure survival game between an attacker and a provider, who attacks and reinforces individual components, respectively.more » They use the composite utility functions composed of a survival probability term and a cost term, and the previously studiedsum-form and product-form utility functions are their special cases. At Nash Equilibrium, we derive expressions for individual system survival probabilities and the expected total number of operational components. We apply and discuss these estimates for a simplified model of distributed cloud computing infrastructure« less

  16. Design and Implement of Astronomical Cloud Computing Environment In China-VO

    NASA Astrophysics Data System (ADS)

    Li, Changhua; Cui, Chenzhou; Mi, Linying; He, Boliang; Fan, Dongwei; Li, Shanshan; Yang, Sisi; Xu, Yunfei; Han, Jun; Chen, Junyi; Zhang, Hailong; Yu, Ce; Xiao, Jian; Wang, Chuanjun; Cao, Zihuang; Fan, Yufeng; Liu, Liang; Chen, Xiao; Song, Wenming; Du, Kangyu

    2017-06-01

    Astronomy cloud computing environment is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences). Based on virtualization technology, astronomy cloud computing environment was designed and implemented by China-VO team. It consists of five distributed nodes across the mainland of China. Astronomer can get compuitng and storage resource in this cloud computing environment. Through this environments, astronomer can easily search and analyze astronomical data collected by different telescopes and data centers , and avoid the large scale dataset transportation.

  17. Game-Theoretic strategies for systems of components using product-form utilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S; Ma, Cheng-Yu; Hausken, K.

    Many critical infrastructures are composed of multiple systems of components which are correlated so that disruptions to one may propagate to others. We consider such infrastructures with correlations characterized in two ways: (i) an aggregate failure correlation function specifies the conditional failure probability of the infrastructure given the failure of an individual system, and (ii) a pairwise correlation function between two systems specifies the failure probability of one system given the failure of the other. We formulate a game for ensuring the resilience of the infrastructure, wherein the utility functions of the provider and attacker are products of an infrastructuremore » survival probability term and a cost term, both expressed in terms of the numbers of system components attacked and reinforced. The survival probabilities of individual systems satisfy first-order differential conditions that lead to simple Nash Equilibrium conditions. We then derive sensitivity functions that highlight the dependence of infrastructure resilience on the cost terms, correlation functions, and individual system survival probabilities. We apply these results to simplified models of distributed cloud computing and energy grid infrastructures.« less

  18. Design for Run-Time Monitor on Cloud Computing

    NASA Astrophysics Data System (ADS)

    Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is the type of a parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation as well as underlying hardware through performance counter optimizing its computing configuration based on the analyzed data.

  19. Belle II grid computing: An overview of the distributed data management system.

    NASA Astrophysics Data System (ADS)

    Bansal, Vikas; Schram, Malachi; Belle Collaboration, II

    2017-01-01

    The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start physics data taking in 2018 and will accumulate 50/ab of e +e- collision data, about 50 times larger than the data set of the Belle experiment. The computing requirements of Belle II are comparable to those of a Run I LHC experiment. Computing at this scale requires efficient use of the compute grids in North America, Asia and Europe and will take advantage of upgrades to the high-speed global network. We present the architecture of data flow and data handling as a part of the Belle II computing infrastructure.

  20. Proposed Use of the NASA Ames Nebula Cloud Computing Platform for Numerical Weather Prediction and the Distribution of High Resolution Satellite Imagery

    NASA Technical Reports Server (NTRS)

    Limaye, Ashutosh S.; Molthan, Andrew L.; Srikishen, Jayanthi

    2010-01-01

    The development of the Nebula Cloud Computing Platform at NASA Ames Research Center provides an open-source solution for the deployment of scalable computing and storage capabilities relevant to the execution of real-time weather forecasts and the distribution of high resolution satellite data to the operational weather community. Two projects at Marshall Space Flight Center may benefit from use of the Nebula system. The NASA Short-term Prediction Research and Transition (SPoRT) Center facilitates the use of unique NASA satellite data and research capabilities in the operational weather community by providing datasets relevant to numerical weather prediction, and satellite data sets useful in weather analysis. SERVIR provides satellite data products for decision support, emphasizing environmental threats such as wildfires, floods, landslides, and other hazards, with interests in numerical weather prediction in support of disaster response. The Weather Research and Forecast (WRF) model Environmental Modeling System (WRF-EMS) has been configured for Nebula cloud computing use via the creation of a disk image and deployment of repeated instances. Given the available infrastructure within Nebula and the "infrastructure as a service" concept, the system appears well-suited for the rapid deployment of additional forecast models over different domains, in response to real-time research applications or disaster response. Future investigations into Nebula capabilities will focus on the development of a web mapping server and load balancing configuration to support the distribution of high resolution satellite data sets to users within the National Weather Service and international partners of SERVIR.

  1. A Disciplined Architectural Approach to Scaling Data Analysis for Massive, Scientific Data

    NASA Astrophysics Data System (ADS)

    Crichton, D. J.; Braverman, A. J.; Cinquini, L.; Turmon, M.; Lee, H.; Law, E.

    2014-12-01

    Data collections across remote sensing and ground-based instruments in astronomy, Earth science, and planetary science are outpacing scientists' ability to analyze them. Furthermore, the distribution, structure, and heterogeneity of the measurements themselves pose challenges that limit the scalability of data analysis using traditional approaches. Methods for developing science data processing pipelines, distribution of scientific datasets, and performing analysis will require innovative approaches that integrate cyber-infrastructure, algorithms, and data into more systematic approaches that can more efficiently compute and reduce data, particularly distributed data. This requires the integration of computer science, machine learning, statistics and domain expertise to identify scalable architectures for data analysis. The size of data returned from Earth Science observing satellites and the magnitude of data from climate model output, is predicted to grow into the tens of petabytes challenging current data analysis paradigms. This same kind of growth is present in astronomy and planetary science data. One of the major challenges in data science and related disciplines defining new approaches to scaling systems and analysis in order to increase scientific productivity and yield. Specific needs include: 1) identification of optimized system architectures for analyzing massive, distributed data sets; 2) algorithms for systematic analysis of massive data sets in distributed environments; and 3) the development of software infrastructures that are capable of performing massive, distributed data analysis across a comprehensive data science framework. NASA/JPL has begun an initiative in data science to address these challenges. Our goal is to evaluate how scientific productivity can be improved through optimized architectural topologies that identify how to deploy and manage the access, distribution, computation, and reduction of massive, distributed data, while managing the uncertainties of scientific conclusions derived from such capabilities. This talk will provide an overview of JPL's efforts in developing a comprehensive architectural approach to data science.

  2. A bioinformatics knowledge discovery in text application for grid computing

    PubMed Central

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-01-01

    Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749

  3. A bioinformatics knowledge discovery in text application for grid computing.

    PubMed

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-06-16

    A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.

  4. Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy

    PubMed Central

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-01-01

    Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313

  5. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.

    PubMed

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-06-01

    Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.

  6. Data Center Energy Efficiency Technologies and Methodologies: A Review of Commercial Technologies and Recommendations for Application to Department of Defense Systems

    DTIC Science & Technology

    2015-11-01

    provided by a stand-alone desktop or hand held computing device. This introduces into the discussion a large number of mobile , tactical command...control, communications, and computer (C4) systems across the Services. A couple of examples are mobile command posts mounted on the back of an M1152... infrastructure (DCPI). This term encompasses on-site backup generators, switchgear, uninterruptible power supplies (UPS), power distribution units

  7. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.

    PubMed

    Simonyan, Vahan; Mazumder, Raja

    2014-09-30

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.

  8. Detecting Abnormal Machine Characteristics in Cloud Infrastructures

    NASA Technical Reports Server (NTRS)

    Bhaduri, Kanishka; Das, Kamalika; Matthews, Bryan L.

    2011-01-01

    In the cloud computing environment resources are accessed as services rather than as a product. Monitoring this system for performance is crucial because of typical pay-peruse packages bought by the users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia1 which lacks system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine on the cloud in order to rank the machines in order of their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis and at the end of the analysis, our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure.

  9. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis

    PubMed Central

    Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953

  10. Advances in Spatial Data Infrastructure, Acquisition, Analysis, Archiving and Dissemination

    NASA Technical Reports Server (NTRS)

    Ramapriyan, Hampapuran K.; Rochon, Gilbert L.; Duerr, Ruth; Rank, Robert; Nativi, Stefano; Stocker, Erich Franz

    2010-01-01

    The authors review recent contributions to the state-of-thescience and benign proliferation of satellite remote sensing, spatial data infrastructure, near-real-time data acquisition, analysis on high performance computing platforms, sapient archiving, multi-modal dissemination and utilization for a wide array of scientific applications. The authors also address advances in Geoinformatics and its growing ubiquity, as evidenced by its inclusion as a focus area within the American Geophysical Union (AGU), European Geosciences Union (EGU), as well as by the evolution of the IEEE Geoscience and Remote Sensing Society's (GRSS) Data Archiving and Distribution Technical Committee (DAD TC).

  11. The framework for simulation of bioinspired security mechanisms against network infrastructure attacks.

    PubMed

    Shorov, Andrey; Kotenko, Igor

    2014-01-01

    The paper outlines a bioinspired approach named "network nervous system" and methods of simulation of infrastructure attacks and protection mechanisms based on this approach. The protection mechanisms based on this approach consist of distributed procedures of information collection and processing, which coordinate the activities of the main devices of a computer network, identify attacks, and determine necessary countermeasures. Attacks and protection mechanisms are specified as structural models using a set-theoretic approach. An environment for simulation of protection mechanisms based on the biological metaphor is considered; the experiments demonstrating the effectiveness of the protection mechanisms are described.

  12. GOES-R GS Product Generation Infrastructure Operations

    NASA Astrophysics Data System (ADS)

    Blanton, M.; Gundy, J.

    2012-12-01

    GOES-R GS Product Generation Infrastructure Operations: The GOES-R Ground System (GS) will produce a much larger set of products with higher data density than previous GOES systems. This requires considerably greater compute and memory resources to achieve the necessary latency and availability for these products. Over time, new algorithms could be added and existing ones removed or updated, but the GOES-R GS cannot go down during this time. To meet these GOES-R GS processing needs, the Harris Corporation will implement a Product Generation (PG) infrastructure that is scalable, extensible, extendable, modular and reliable. The primary parts of the PG infrastructure are the Service Based Architecture (SBA), which includes the Distributed Data Fabric (DDF). The SBA is the middleware that encapsulates and manages science algorithms that generate products. The SBA is divided into three parts, the Executive, which manages and configures the algorithm as a service, the Dispatcher, which provides data to the algorithm, and the Strategy, which determines when the algorithm can execute with the available data. The SBA is a distributed architecture, with services connected to each other over a compute grid and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so a scalable and reliable messaging is necessary. The SBA uses the DDF to provide this data communication layer between algorithms. The DDF provides an abstract interface over a distributed and persistent multi-layered storage system (memory based caching above disk-based storage) and an event system that allows algorithm services to know when data is available and to get the data that they need to begin processing when they need it. Together, the SBA and the DDF provide a flexible, high performance architecture that can meet the needs of product processing now and as they grow in the future.

  13. VERCE, Virtual Earthquake and Seismology Research Community in Europe, a new ESFRI initiative integrating data infrastructure, Grid and HPC infrastructures for data integration, data analysis and data modeling in seismology

    NASA Astrophysics Data System (ADS)

    van Hemert, Jano; Vilotte, Jean-Pierre

    2010-05-01

    Research in earthquake and seismology addresses fundamental problems in understanding Earth's internal wave sources and structures, and augment applications to societal concerns about natural hazards, energy resources and environmental change. This community is central to the European Plate Observing System (EPOS)—the ESFRI initiative in solid Earth Sciences. Global and regional seismology monitoring systems are continuously operated and are transmitting a growing wealth of data from Europe and from around the world. These tremendous volumes of seismograms, i.e., records of ground motions as a function of time, have a definite multi-use attribute, which puts a great premium on open-access data infrastructures that are integrated globally. In Europe, the earthquake and seismology community is part of the European Integrated Data Archives (EIDA) infrastructure and is structured as "horizontal" data services. On top of this distributed data archive system, the community has developed recently within the EC project NERIES advanced SOA-based web services and a unified portal system. Enabling advanced analysis of these data by utilising a data-aware distributed computing environment is instrumental to fully exploit the cornucopia of data and to guarantee optimal operation of the high-cost monitoring facilities. The strategy of VERCE is driven by the needs of data-intensive applications in data mining and modelling and will be illustrated through a set of applications. It aims to provide a comprehensive architecture and framework adapted to the scale and the diversity of these applications, and to integrate the community data infrastructure with Grid and HPC infrastructures. A first novel aspect is a service-oriented architecture that provides well-equipped integrated workbenches, with an efficient communication layer between data and Grid infrastructures, augmented with bridges to the HPC facilities. A second novel aspect is the coupling between Grid data analysis and HPC data modelling applications through workflow and data sharing mechanisms. VERCE will develop important interactions with the European infrastructure initiatives in Grid and HPC computing. The VERCE team: CNRS-France (IPG Paris, LGIT Grenoble), UEDIN (UK), KNMI-ORFEUS (Holland), EMSC, INGV (Italy), LMU (Germany), ULIV (UK), BADW-LRZ (Germany), SCAI (Germany), CINECA (Italy)

  14. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.

    PubMed

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).

  15. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System

    PubMed Central

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997

  16. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

    PubMed Central

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627

  17. Integrating Reconfigurable Hardware-Based Grid for High Performance Computing

    PubMed Central

    Dondo Gazzano, Julio; Sanchez Molina, Francisco; Rincon, Fernando; López, Juan Carlos

    2015-01-01

    FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors that they are able to achieve, the reduced power consumption, and the easiness and flexibility of the design process with fast iterations between consecutive versions are examples of benefits obtained with their use. However, there are still some difficulties when using reconfigurable platforms as accelerator that need to be addressed: the need of an in-depth application study to identify potential acceleration, the lack of tools for the deployment of computational problems in distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs. Besides, a set of services designed to facilitate the application deployment is described. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for deployment of high performance distributed applications simplifying development process. PMID:25874241

  18. Application of large-scale computing infrastructure for diverse environmental research applications using GC3Pie

    NASA Astrophysics Data System (ADS)

    Maffioletti, Sergio; Dawes, Nicholas; Bavay, Mathias; Sarni, Sofiane; Lehning, Michael

    2013-04-01

    The Swiss Experiment platform (SwissEx: http://www.swiss-experiment.ch) provides a distributed storage and processing infrastructure for environmental research experiments. The aim of the second phase project (the Open Support Platform for Environmental Research, OSPER, 2012-2015) is to develop the existing infrastructure to provide scientists with an improved workflow. This improved workflow will include pre-defined, documented and connected processing routines. A large-scale computing and data facility is required to provide reliable and scalable access to data for analysis, and it is desirable that such an infrastructure should be free of traditional data handling methods. Such an infrastructure has been developed using the cloud-based part of the Swiss national infrastructure SMSCG (http://www.smscg.ch) and Academic Cloud. The infrastructure under construction supports two main usage models: 1) Ad-hoc data analysis scripts: These scripts are simple processing scripts, written by the environmental researchers themselves, which can be applied to large data sets via the high power infrastructure. Examples of this type of script are spatial statistical analysis scripts (R-based scripts), mostly computed on raw meteorological and/or soil moisture data. These provide processed output in the form of a grid, a plot, or a kml. 2) Complex models: A more intense data analysis pipeline centered (initially) around the physical process model, Alpine3D, and the MeteoIO plugin; depending on the data set, this may require a tightly coupled infrastructure. SMSCG already supports Alpine3D executions as both regular grid jobs and as virtual software appliances. A dedicated appliance with the Alpine3D specific libraries has been created and made available through the SMSCG infrastructure. The analysis pipelines are activated and supervised by simple control scripts that, depending on the data fetched from the meteorological stations, launch new instances of the Alpine3D appliance, execute location-based subroutines at each grid point and store the results back into the central repository for post-processing. An optional extension of this infrastructure will be to provide a 'ring buffer'-type database infrastructure, such that model results (e.g. test runs made to check parameter dependency or for development) can be visualised and downloaded after completion without submitting them to a permanent storage infrastructure. Data organization Data collected from sensors are archived and classified in distributed sites connected with an open-source software middleware, GSN. Publicly available data are available through common web services and via a cloud storage server (based on Swift). Collocation of the data and processing in the cloud would eventually eliminate data transfer requirements. Execution control logic Execution of the data analysis pipelines (for both the R-based analysis and the Alpine3D simulations) has been implemented using the GC3Pie framework developed by UZH. (https://code.google.com/p/gc3pie/). This allows large-scale, fault-tolerant execution of the pipelines to be described in terms of software appliances. GC3Pie also allows supervision of the execution of large campaigns of appliances as a single simulation. This poster will present the fundamental architectural components of the data analysis pipelines together with initial experimental results.

  19. A Methodological Approach for Assessing Amplified Reflection Distributed Denial of Service on the Internet of Things

    PubMed Central

    Costa Gondim, João José; de Oliveira Albuquerque, Robson; Clayton Alves Nascimento, Anderson; García Villalba, Luis Javier; Kim, Tai-Hoon

    2016-01-01

    Concerns about security on Internet of Things (IoT) cover data privacy and integrity, access control, and availability. IoT abuse in distributed denial of service attacks is a major issue, as typical IoT devices’ limited computing, communications, and power resources are prioritized in implementing functionality rather than security features. Incidents involving attacks have been reported, but without clear characterization and evaluation of threats and impacts. The main purpose of this work is to methodically assess the possible impacts of a specific class–amplified reflection distributed denial of service attacks (AR-DDoS)–against IoT. The novel approach used to empirically examine the threat represented by running the attack over a controlled environment, with IoT devices, considered the perspective of an attacker. The methodology used in tests includes that perspective, and actively prospects vulnerabilities in computer systems. This methodology defines standardized procedures for tool-independent vulnerability assessment based on strategy, and the decision flows during execution of penetration tests (pentests). After validation in different scenarios, the methodology was applied in amplified reflection distributed denial of service (AR-DDoS) attack threat assessment. Results show that, according to attack intensity, AR-DDoS saturates reflector infrastructure. Therefore, concerns about AR-DDoS are founded, but expected impact on abused IoT infrastructure and devices will be possibly as hard as on final victims. PMID:27827931

  20. A Methodological Approach for Assessing Amplified Reflection Distributed Denial of Service on the Internet of Things.

    PubMed

    Costa Gondim, João José; de Oliveira Albuquerque, Robson; Clayton Alves Nascimento, Anderson; García Villalba, Luis Javier; Kim, Tai-Hoon

    2016-11-04

    Concerns about security on Internet of Things (IoT) cover data privacy and integrity, access control, and availability. IoT abuse in distributed denial of service attacks is a major issue, as typical IoT devices' limited computing, communications, and power resources are prioritized in implementing functionality rather than security features. Incidents involving attacks have been reported, but without clear characterization and evaluation of threats and impacts. The main purpose of this work is to methodically assess the possible impacts of a specific class-amplified reflection distributed denial of service attacks (AR-DDoS)-against IoT. The novel approach used to empirically examine the threat represented by running the attack over a controlled environment, with IoT devices, considered the perspective of an attacker. The methodology used in tests includes that perspective, and actively prospects vulnerabilities in computer systems. This methodology defines standardized procedures for tool-independent vulnerability assessment based on strategy, and the decision flows during execution of penetration tests (pentests). After validation in different scenarios, the methodology was applied in amplified reflection distributed denial of service (AR-DDoS) attack threat assessment. Results show that, according to attack intensity, AR-DDoS saturates reflector infrastructure. Therefore, concerns about AR-DDoS are founded, but expected impact on abused IoT infrastructure and devices will be possibly as hard as on final victims.

  1. The GENIUS Grid Portal and robot certificates: a new tool for e-Science

    PubMed Central

    Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio

    2009-01-01

    Background Grid technology is the computing model which allows users to share a wide pletora of distributed computational resources regardless of their geographical location. Up to now, the high security policy requested in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. Methods Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful for instance to automate grid service monitoring, data processing production, distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. Results The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can so be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness to a wide number of potential users. Conclusion The adoption of Grid portals extended with robot certificates, can really contribute to creating transparent access to computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities. PMID:19534747

  2. The GENIUS Grid Portal and robot certificates: a new tool for e-Science.

    PubMed

    Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio

    2009-06-16

    Grid technology is the computing model which allows users to share a wide pletora of distributed computational resources regardless of their geographical location. Up to now, the high security policy requested in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful for instance to automate grid service monitoring, data processing production, distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can so be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness to a wide number of potential users. The adoption of Grid portals extended with robot certificates, can really contribute to creating transparent access to computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities.

  3. Interactive analysis of geographically distributed population imaging data collections over light-path data networks

    NASA Astrophysics Data System (ADS)

    van Lew, Baldur; Botha, Charl P.; Milles, Julien R.; Vrooman, Henri A.; van de Giessen, Martijn; Lelieveldt, Boudewijn P. F.

    2015-03-01

    The cohort size required in epidemiological imaging genetics studies often mandates the pooling of data from multiple hospitals. Patient data, however, is subject to strict privacy protection regimes, and physical data storage may be legally restricted to a hospital network. To enable biomarker discovery, fast data access and interactive data exploration must be combined with high-performance computing resources, while respecting privacy regulations. We present a system using fast and inherently secure light-paths to access distributed data, thereby obviating the need for a central data repository. A secure private cloud computing framework facilitates interactive, computationally intensive exploration of this geographically distributed, privacy sensitive data. As a proof of concept, MRI brain imaging data hosted at two remote sites were processed in response to a user command at a third site. The system was able to automatically start virtual machines, run a selected processing pipeline and write results to a user accessible database, while keeping data locally stored in the hospitals. Individual tasks took approximately 50% longer compared to a locally hosted blade server but the cloud infrastructure reduced the total elapsed time by a factor of 40 using 70 virtual machines in the cloud. We demonstrated that the combination light-path and private cloud is a viable means of building an analysis infrastructure for secure data analysis. The system requires further work in the areas of error handling, load balancing and secure support of multiple users.

  4. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  5. EGI-EUDAT integration activity - Pair data and high-throughput computing resources together

    NASA Astrophysics Data System (ADS)

    Scardaci, Diego; Viljoen, Matthew; Vitlacil, Dejan; Fiameni, Giuseppe; Chen, Yin; sipos, Gergely; Ferrari, Tiziana

    2016-04-01

    EGI (www.egi.eu) is a publicly funded e-infrastructure put together to give scientists access to more than 530,000 logical CPUs, 200 PB of disk capacity and 300 PB of tape storage to drive research and innovation in Europe. The infrastructure provides both high throughput computing and cloud compute/storage capabilities. Resources are provided by about 350 resource centres which are distributed across 56 countries in Europe, the Asia-Pacific region, Canada and Latin America. EUDAT (www.eudat.eu) is a collaborative Pan-European infrastructure providing research data services, training and consultancy for researchers, research communities, research infrastructures and data centres. EUDAT's vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres, combining the richness of numerous community-specific data repositories with the permanence and persistence of some of Europe's largest scientific data centres. EGI and EUDAT, in the context of their flagship projects, EGI-Engage and EUDAT2020, started in March 2015 a collaboration to harmonise the two infrastructures, including technical interoperability, authentication, authorisation and identity management, policy and operations. The main objective of this work is to provide end-users with a seamless access to an integrated infrastructure offering both EGI and EUDAT services and, then, pairing data and high-throughput computing resources together. To define the roadmap of this collaboration, EGI and EUDAT selected a set of relevant user communities, already collaborating with both infrastructures, which could bring requirements and help to assign the right priorities to each of them. In this way, from the beginning, this activity has been really driven by the end users. The identified user communities are relevant European Research infrastructure in the field of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D). The first outcome of this activity has been the definition of a generic use case that captures the typical user scenario with respect the integrated use of the EGI and EUDAT infrastructures. This generic use case allows a user to instantiate a set of Virtual Machine images on the EGI Federated Cloud to perform computational jobs that analyse data previously stored on EUDAT long-term storage systems. The results of such analysis can be staged back to EUDAT storages, and if needed, allocated with Permanent identifyers (PIDs) for future use. The implementation of this generic use case requires the following integration activities between EGI and EUDAT: (1) harmonisation of the user authentication and authorisation models, (2) implementing interface connectors between the relevant EGI and EUDAT services, particularly EGI Cloud compute facilities and EUDAT long-term storage and PID systems. In the presentation, the collected user requirements and the implementation status of the universal use case will be showed. Furthermore, how the universal use case is currently applied to satisfy EPOS and ICOS needs will be described.

  6. Processing LHC data in the UK

    PubMed Central

    Colling, D.; Britton, D.; Gordon, J.; Lloyd, S.; Doyle, A.; Gronbech, P.; Coles, J.; Sansum, A.; Patrick, G.; Jones, R.; Middleton, R.; Kelsey, D.; Cass, A.; Geddes, N.; Clark, P.; Barnby, L.

    2013-01-01

    The Large Hadron Collider (LHC) is one of the greatest scientific endeavours to date. The construction of the collider itself and the experiments that collect data from it represent a huge investment, both financially and in terms of human effort, in our hope to understand the way the Universe works at a deeper level. Yet the volumes of data produced are so large that they cannot be analysed at any single computing centre. Instead, the experiments have all adopted distributed computing models based on the LHC Computing Grid. Without the correct functioning of this grid infrastructure the experiments would not be able to understand the data that they have collected. Within the UK, the Grid infrastructure needed by the experiments is provided by the GridPP project. We report on the operations, performance and contributions made to the experiments by the GridPP project during the years of 2010 and 2011—the first two significant years of the running of the LHC. PMID:23230163

  7. Community-driven computational biology with Debian Linux

    PubMed Central

    2010-01-01

    Background The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments. Results The Debian Med initiative provides ready and coherent software packages for medical informatics and bioinformatics. These packages can be used together in Taverna workflows via the UseCase plugin to manage execution on local or remote machines. If such packages are available in cloud computing environments, the underlying hardware and the analysis pipelines can be shared along with the software. Conclusions Debian Med closes the gap between developers and users. It provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology. For geographically distributed teams it can ensure they are working on the same versions of tools, in the same conditions. This contributes to the world-wide networking of researchers. PMID:21210984

  8. CERN's Common Unix and X Terminal Environment

    NASA Astrophysics Data System (ADS)

    Cass, Tony

    The Desktop Infrastructure Group of CERN's Computing and Networks Division has developed a Common Unix and X Terminal Environment to ease the migration to Unix based Interactive Computing. The CUTE architecture relies on a distributed filesystem—currently Trans arc's AFS—to enable essentially interchangeable client work-stations to access both "home directory" and program files transparently. Additionally, we provide a suite of programs to configure workstations for CUTE and to ensure continued compatibility. This paper describes the different components and the development of the CUTE architecture.

  9. Infrastructure and the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Dowler, P.; Gaudet, S.; Schade, D.

    2011-07-01

    The modern data center is faced with architectural and software engineering challenges that grow along with the challenges facing observatories: massive data flow, distributed computing environments, and distributed teams collaborating on large and small projects. By using VO standards as key components of the infrastructure, projects can take advantage of a decade of intellectual investment by the IVOA community. By their nature, these standards are proven and tested designs that already exist. Adopting VO standards saves considerable design effort, allows projects to take advantage of open-source software and test suites to speed development, and enables the use of third party tools that understand the VO protocols. The evolving CADC architecture now makes heavy use of VO standards. We show examples of how these standards may be used directly, coupled with non-VO standards, or extended with custom capabilities to solve real problems and provide value to our users. In the end, we use VO services as major parts of the core infrastructure to reduce cost rather than as an extra layer with additional cost and we can deliver more general purpose and robust services to our user community.

  10. Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

    PubMed Central

    Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

    2009-01-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578

  11. Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

    PubMed

    Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

    2009-06-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).

  12. AIMES Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Katz, Daniel S; Jha, Shantenu; Weissman, Jon

    2017-01-31

    This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large-scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable and interoperablemore » distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.« less

  13. AIMES Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weissman, Jon; Katz, Dan; Jha, Shantenu

    2017-01-31

    This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable andmore » interoperable distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.« less

  14. Enabling BOINC in infrastructure as a service cloud system

    NASA Astrophysics Data System (ADS)

    Montes, Diego; Añel, Juan A.; Pena, Tomás F.; Uhe, Peter; Wallom, David C. H.

    2017-02-01

    Volunteer or crowd computing is becoming increasingly popular for solving complex research problems from an increasingly diverse range of areas. The majority of these have been built using the Berkeley Open Infrastructure for Network Computing (BOINC) platform, which provides a range of different services to manage all computation aspects of a project. The BOINC system is ideal in those cases where not only does the research community involved need low-cost access to massive computing resources but also where there is a significant public interest in the research being done.We discuss the way in which cloud services can help BOINC-based projects to deliver results in a fast, on demand manner. This is difficult to achieve using volunteers, and at the same time, using scalable cloud resources for short on demand projects can optimize the use of the available resources. We show how this design can be used as an efficient distributed computing platform within the cloud, and outline new approaches that could open up new possibilities in this field, using Climateprediction.net (http://www.climateprediction.net/) as a case study.

  15. A Framework for Debugging Geoscience Projects in a High Performance Computing Environment

    NASA Astrophysics Data System (ADS)

    Baxter, C.; Matott, L.

    2012-12-01

    High performance computing (HPC) infrastructure has become ubiquitous in today's world with the emergence of commercial cloud computing and academic supercomputing centers. Teams of geoscientists, hydrologists and engineers can take advantage of this infrastructure to undertake large research projects - for example, linking one or more site-specific environmental models with soft computing algorithms, such as heuristic global search procedures, to perform parameter estimation and predictive uncertainty analysis, and/or design least-cost remediation systems. However, the size, complexity and distributed nature of these projects can make identifying failures in the associated numerical experiments using conventional ad-hoc approaches both time- consuming and ineffective. To address these problems a multi-tiered debugging framework has been developed. The framework allows for quickly isolating and remedying a number of potential experimental failures, including: failures in the HPC scheduler; bugs in the soft computing code; bugs in the modeling code; and permissions and access control errors. The utility of the framework is demonstrated via application to a series of over 200,000 numerical experiments involving a suite of 5 heuristic global search algorithms and 15 mathematical test functions serving as cheap analogues for the simulation-based optimization of pump-and-treat subsurface remediation systems.

  16. Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

    NASA Astrophysics Data System (ADS)

    Klimentov, A.; Buncic, P.; De, K.; Jha, S.; Maeno, T.; Mount, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Porter, R. J.; Read, K. F.; Vaniachine, A.; Wells, J. C.; Wenaus, T.

    2015-05-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(102) sites, O(105) cores, O(108) jobs per year, O(103) users, and ATLAS data volume is O(1017) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled ‘Next Generation Workload Management and Analysis System for Big Data’ (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.

  17. Game-theoretic strategies for asymmetric networked systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S.; Ma, Chris Y. T.; Hausken, Kjell

    Abstract—We consider an infrastructure consisting of a network of systems each composed of discrete components that can be reinforced at a certain cost to guard against attacks. The network provides the vital connectivity between systems, and hence plays a critical, asymmetric role in the infrastructure operations. We characterize the system-level correlations using the aggregate failure correlation function that specifies the infrastructure failure probability given the failure of an individual system or network. The survival probabilities of systems and network satisfy first-order differential conditions that capture the component-level correlations. We formulate the problem of ensuring the infrastructure survival as a gamemore » between anattacker and a provider, using the sum-form and product-form utility functions, each composed of a survival probability term and a cost term. We derive Nash Equilibrium conditions which provide expressions for individual system survival probabilities, and also the expected capacity specified by the total number of operational components. These expressions differ only in a single term for the sum-form and product-form utilities, despite their significant differences.We apply these results to simplified models of distributed cloud computing infrastructures.« less

  18. Sensor4PRI: A Sensor Platform for the Protection of Railway Infrastructures

    PubMed Central

    Cañete, Eduardo; Chen, Jaime; Díaz, Manuel; Llopis, Luis; Rubio, Bartolomé

    2015-01-01

    Wireless Sensor Networks constitute pervasive and distributed computing systems and are potentially one of the most important technologies of this century. They have been specifically identified as a good candidate to become an integral part of the protection of critical infrastructures. In this paper we focus on railway infrastructure protection and we present the details of a sensor platform designed to be integrated into a slab track system in order to carry out both installation and maintenance monitoring activities. In the installation phase, the platform helps operators to install the slab tracks in the right position. In the maintenance phase, the platform collects information about the structural health and behavior of the infrastructure when a train travels along it and relays the readings to a base station. The base station uses trains as data mules to upload the information to the internet. The use of a train as a data mule is especially suitable for collecting information from remote or inaccessible places which do not have a direct connection to the internet and require less network infrastructure. The overall aim of the system is to deploy a permanent economically viable monitoring system to improve the safety of railway infrastructures. PMID:25734648

  19. The distributed production system of the SuperB project: description and results

    NASA Astrophysics Data System (ADS)

    Brown, D.; Corvo, M.; Di Simone, A.; Fella, A.; Luppi, E.; Paoloni, E.; Stroili, R.; Tomassetti, L.

    2011-12-01

    The SuperB experiment needs large samples of MonteCarlo simulated events in order to finalize the detector design and to estimate the data analysis performances. The requirements are beyond the capabilities of a single computing farm, so a distributed production model capable of exploiting the existing HEP worldwide distributed computing infrastructure is needed. In this paper we describe the set of tools that have been developed to manage the production of the required simulated events. The production of events follows three main phases: distribution of input data files to the remote site Storage Elements (SE); job submission, via SuperB GANGA interface, to all available remote sites; output files transfer to CNAF repository. The job workflow includes procedures for consistency checking, monitoring, data handling and bookkeeping. A replication mechanism allows storing the job output on the local site SE. Results from 2010 official productions are reported.

  20. Scaling to diversity: The DERECHOS distributed infrastructure for analyzing and sharing data

    NASA Astrophysics Data System (ADS)

    Rilee, M. L.; Kuo, K. S.; Clune, T.; Oloso, A.; Brown, P. G.

    2016-12-01

    Integrating Earth Science data from diverse sources such as satellite imagery and simulation output can be expensive and time-consuming, limiting scientific inquiry and the quality of our analyses. Reducing these costs will improve innovation and quality in science. The current Earth Science data infrastructure focuses on downloading data based on requests formed from the search and analysis of associated metadata. And while the data products provided by archives may use the best available data sharing technologies, scientist end-users generally do not have such resources (including staff) available to them. Furthermore, only once an end-user has received the data from multiple diverse sources and has integrated them can the actual analysis and synthesis begin. The cost of getting from idea to where synthesis can start dramatically slows progress. In this presentation we discuss a distributed computational and data storage framework that eliminates much of the aforementioned cost. The SciDB distributed array database is central as it is optimized for scientific computing involving very large arrays, performing better than less specialized frameworks like Spark. Adding spatiotemporal functions to the SciDB creates a powerful platform for analyzing and integrating massive, distributed datasets. SciDB allows Big Earth Data analysis to be performed "in place" without the need for expensive downloads and end-user resources. Spatiotemporal indexing technologies such as the hierarchical triangular mesh enable the compute and storage affinity needed to efficiently perform co-located and conditional analyses minimizing data transfers. These technologies automate the integration of diverse data sources using the framework, a critical step beyond current metadata search and analysis. Instead of downloading data into their idiosyncratic local environments, end-users can generate and share data products integrated from diverse multiple sources using a common shared environment, turning distributed active archive centers (DAACs) from warehouses into distributed active analysis centers.

  1. The Framework for Simulation of Bioinspired Security Mechanisms against Network Infrastructure Attacks

    PubMed Central

    Kotenko, Igor

    2014-01-01

    The paper outlines a bioinspired approach named “network nervous system" and methods of simulation of infrastructure attacks and protection mechanisms based on this approach. The protection mechanisms based on this approach consist of distributed prosedures of information collection and processing, which coordinate the activities of the main devices of a computer network, identify attacks, and determine nessesary countermeasures. Attacks and protection mechanisms are specified as structural models using a set-theoretic approach. An environment for simulation of protection mechanisms based on the biological metaphor is considered; the experiments demonstrating the effectiveness of the protection mechanisms are described. PMID:25254229

  2. Automation of multi-agent control for complex dynamic systems in heterogeneous computational network

    NASA Astrophysics Data System (ADS)

    Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan

    2017-01-01

    The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). The specialists in the field of parallel and distributed computing give the special attention to a scalability of applications for problem solving. An effective management of the scalable application in the heterogeneous distributed computing environment is still a non-trivial issue. Control systems that operate in networks, especially relate to this issue. We propose a new approach to the multi-agent management for the scalable applications in the heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for an automation of the problem solving. Advantages of the proposed approach are demonstrated on the parametric synthesis example of the static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automation of the multi-agent control for the systems in a parallel mode with various degrees of its detailed elaboration.

  3. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill

    2000-01-01

    We use the term "Grid" to refer to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. This infrastructure includes: (1) Tools for constructing collaborative, application oriented Problem Solving Environments / Frameworks (the primary user interfaces for Grids); (2) Programming environments, tools, and services providing various approaches for building applications that use aggregated computing and storage resources, and federated data sources; (3) Comprehensive and consistent set of location independent tools and services for accessing and managing dynamic collections of widely distributed resources: heterogeneous computing systems, storage systems, real-time data sources and instruments, human collaborators, and communications systems; (4) Operational infrastructure including management tools for distributed systems and distributed resources, user services, accounting and auditing, strong and location independent user authentication and authorization, and overall system security services The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks. Such Grids will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. Examples of these problems include: (1) Coupled, multidisciplinary simulations too large for single systems (e.g., multi-component NPSS turbomachine simulation); (2) Use of widely distributed, federated data archives (e.g., simultaneous access to metrological, topological, aircraft performance, and flight path scheduling databases supporting a National Air Space Simulation systems}; (3) Coupling large-scale computing and data systems to scientific and engineering instruments (e.g., realtime interaction with experiments through real-time data analysis and interpretation presented to the experimentalist in ways that allow direct interaction with the experiment (instead of just with instrument control); (5) Highly interactive, augmented reality and virtual reality remote collaborations (e.g., Ames / Boeing Remote Help Desk providing field maintenance use of coupled video and NDI to a remote, on-line airframe structures expert who uses this data to index into detailed design databases, and returns 3D internal aircraft geometry to the field); (5) Single computational problems too large for any single system (e.g. the rotocraft reference calculation). Grids also have the potential to provide pools of resources that could be called on in extraordinary / rapid response situations (such as disaster response) because they can provide common interfaces and access mechanisms, standardized management, and uniform user authentication and authorization, for large collections of distributed resources (whether or not they normally function in concert). IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: the scientist / design engineer whose primary interest is problem solving (e.g. determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. The results of the analysis of the needs of these two types of users provides a broad set of requirements that gives rise to a general set of required capabilities. The IPG project is intended to address all of these requirements. In some cases the required computing technology exists, and in some cases it must be researched and developed. The project is using available technology to provide a prototype set of capabilities in a persistent distributed computing testbed. Beyond this, there are required capabilities that are not immediately available, and whose development spans the range from near-term engineering development (one to two years) to much longer term R&D (three to six years). Additional information is contained in the original.

  4. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    PubMed Central

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811

  5. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    PubMed

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.

  6. The Next Generation of Lab and Classroom Computing - The Silver Lining

    DTIC Science & Technology

    2016-12-01

    desktop infrastructure (VDI) solution, as well as the computing solutions at three universities, was selected as the basis for comparison. The research... infrastructure , VDI, hardware cost, software cost, manpower, availability, cloud computing, private cloud, bring your own device, BYOD, thin client...virtual desktop infrastructure (VDI) solution, as well as the computing solutions at three universities, was selected as the basis for comparison. The

  7. Designing and validating the joint battlespace infosphere

    NASA Astrophysics Data System (ADS)

    Peterson, Gregory D.; Alexander, W. Perry; Birdwell, J. Douglas

    2001-08-01

    Fielding and managing the dynamic, complex information systems infrastructure necessary for defense operations presents significant opportunities for revolutionary improvements in capabilities. An example of this technology trend is the creation and validation of the Joint Battlespace Infosphere (JBI) being developed by the Air Force Research Lab. The JBI is a system of systems that integrates, aggregates, and distributes information to users at all echelons, from the command center to the battlefield. The JBI is a key enabler of meeting the Air Force's Joint Vision 2010 core competencies such as Information Superiority, by providing increased situational awareness, planning capabilities, and dynamic execution. At the same time, creating this new operational environment introduces significant risk due to an increased dependency on computational and communications infrastructure combined with more sophisticated and frequent threats. Hence, the challenge facing the nation is the most effective means to exploit new computational and communications technologies while mitigating the impact of attacks, faults, and unanticipated usage patterns.

  8. Running SW4 On New Commodity Technology Systems (CTS-1) Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodgers, Arthur J.; Petersson, N. Anders; Pitarka, Arben

    We have recently been running earthquake ground motion simulations with SW4 on the new capacity computing systems, called the Commodity Technology Systems - 1 (CTS-1) at Lawrence Livermore National Laboratory (LLNL). SW4 is a fourth order time domain finite difference code developed by LLNL and distributed by the Computational Infrastructure for Geodynamics (CIG). SW4 simulates seismic wave propagation in complex three-dimensional Earth models including anelasticity and surface topography. We are modeling near-fault earthquake strong ground motions for the purposes of evaluating the response of engineered structures, such as nuclear power plants and other critical infrastructure. Engineering analysis of structures requiresmore » the inclusion of high frequencies which can cause damage, but are often difficult to include in simulations because of the need for large memory to model fine grid spacing on large domains.« less

  9. Lowering the barriers to computational modeling of Earth's surface: coupling Jupyter Notebooks with Landlab, HydroShare, and CyberGIS for research and education.

    NASA Astrophysics Data System (ADS)

    Bandaragoda, C.; Castronova, A. M.; Phuong, J.; Istanbulluoglu, E.; Strauch, R. L.; Nudurupati, S. S.; Tarboton, D. G.; Wang, S. W.; Yin, D.; Barnhart, K. R.; Tucker, G. E.; Hutton, E.; Hobley, D. E. J.; Gasparini, N. M.; Adams, J. M.

    2017-12-01

    The ability to test hypotheses about hydrology, geomorphology and atmospheric processes is invaluable to research in the era of big data. Although community resources are available, there remain significant educational, logistical and time investment barriers to their use. Knowledge infrastructure is an emerging intellectual framework to understand how people are creating, sharing and distributing knowledge - which has been dramatically transformed by Internet technologies. In addition to the technical and social components in a cyberinfrastructure system, knowledge infrastructure considers educational, institutional, and open source governance components required to advance knowledge. We are designing an infrastructure environment that lowers common barriers to reproducing modeling experiments for earth surface investigation. Landlab is an open-source modeling toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for sharing hydrologic data and models. CyberGIS-Jupyter is an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using the Jupyter Notebook based on ROGER - the first cyberGIS supercomputer, so that models that can be elastically reproduced through cloud computing approaches. Our team of geomorphologists, hydrologists, and computer geoscientists has created a new infrastructure environment that combines these three pieces of software to enable knowledge discovery. Through this novel integration, any user can interactively execute and explore their shared data and model resources. Landlab on HydroShare with CyberGIS-Jupyter supports the modeling continuum from fully developed modelling applications, prototyping new science tools, hands on research demonstrations for training workshops, and classroom applications. Computational geospatial models based on big data and high performance computing can now be more efficiently developed, improved, scaled, and seamlessly reproduced among multidisciplinary users, thereby expanding the active learning curriculum and research opportunities for students in earth surface modeling and informatics.

  10. Use of Emerging Grid Computing Technologies for the Analysis of LIGO Data

    NASA Astrophysics Data System (ADS)

    Koranda, Scott

    2004-03-01

    The LIGO Scientific Collaboration (LSC) today faces the challenge of enabling analysis of terabytes of LIGO data by hundreds of scientists from institutions all around the world. To meet this challenge the LSC is developing tools, infrastructure, applications, and expertise leveraging Grid Computing technologies available today, and making available to LSC scientists compute resources at sites across the United States and Europe. We use digital credentials for strong and secure authentication and authorization to compute resources and data. Building on top of products from the Globus project for high-speed data transfer and information discovery we have created the Lightweight Data Replicator (LDR) to securely and robustly replicate data to resource sites. We have deployed at our computing sites the Virtual Data Toolkit (VDT) Server and Client packages, developed in collaboration with our partners in the GriPhyN and iVDGL projects, providing uniform access to distributed resources for users and their applications. Taken together these Grid Computing technologies and infrastructure have formed the LSC DataGrid--a coherent and uniform environment across two continents for the analysis of gravitational-wave detector data. Much work, however, remains in order to scale current analyses and recent lessons learned need to be integrated into the next generation of Grid middleware.

  11. Design and implementation of a reliable and cost-effective cloud computing infrastructure: the INFN Napoli experience

    NASA Astrophysics Data System (ADS)

    Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.

    2012-12-01

    Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.

  12. Economic models for management of resources in peer-to-peer and grid computing

    NASA Astrophysics Data System (ADS)

    Buyya, Rajkumar; Stockinger, Heinz; Giddy, Jonathan; Abramson, David

    2001-07-01

    The accelerated development in Peer-to-Peer (P2P) and Grid computing has positioned them as promising next generation computing platforms. They enable the creation of Virtual Enterprises (VE) for sharing resources distributed across the world. However, resource management, application development and usage models in these environments is a complex undertaking. This is due to the geographic distribution of resources that are owned by different organizations or peers. The resource owners of each of these resources have different usage or access policies and cost models, and varying loads and availability. In order to address complex resource management issues, we have proposed a computational economy framework for resource allocation and for regulating supply and demand in Grid computing environments. The framework provides mechanisms for optimizing resource provider and consumer objective functions through trading and brokering services. In a real world market, there exist various economic models for setting the price for goods based on supply-and-demand and their value to the user. They include commodity market, posted price, tenders and auctions. In this paper, we discuss the use of these models for interaction between Grid components in deciding resource value and the necessary infrastructure to realize them. In addition to normal services offered by Grid computing systems, we need an infrastructure to support interaction protocols, allocation mechanisms, currency, secure banking, and enforcement services. Furthermore, we demonstrate the usage of some of these economic models in resource brokering through Nimrod/G deadline and cost-based scheduling for two different optimization strategies on the World Wide Grid (WWG) testbed that contains peer-to-peer resources located on five continents: Asia, Australia, Europe, North America, and South America.

  13. SEE-GRID eInfrastructure for Regional eScience

    NASA Astrophysics Data System (ADS)

    Prnjat, Ognjen; Balaz, Antun; Vudragovic, Dusan; Liabotis, Ioannis; Sener, Cevat; Marovic, Branko; Kozlovszky, Miklos; Neagu, Gabriel

    In the past 6 years, a number of targeted initiatives, funded by the European Commission via its information society and RTD programmes and Greek infrastructure development actions, have articulated a successful regional development actions in South East Europe that can be used as a role model for other international developments. The SEEREN (South-East European Research and Education Networking initiative) project, through its two phases, established the SEE segment of the pan-European G ´EANT network and successfully connected the research and scientific communities in the region. Currently, the SEE-LIGHT project is working towards establishing a dark-fiber backbone that will interconnect most national Research and Education networks in the region. On the distributed computing and storage provisioning i.e. Grid plane, the SEE-GRID (South-East European GRID e-Infrastructure Development) project, similarly through its two phases, has established a strong human network in the area of scientific computing and has set up a powerful regional Grid infrastructure, and attracted a number of applications from different fields from countries throughout the South-East Europe. The current SEEGRID-SCI project, ending in April 2010, empowers the regional user communities from fields of meteorology, seismology and environmental protection in common use and sharing of the regional e-Infrastructure. Current technical initiatives in formulation are focusing on a set of coordinated actions in the area of HPC and application fields making use of HPC initiatives. Finally, the current SEERA-EI project brings together policy makers - programme managers from 10 countries in the region. The project aims to establish a communication platform between programme managers, pave the way towards common e-Infrastructure strategy and vision, and implement concrete actions for common funding of electronic infrastructures on the regional level. The regional vision on establishing an e-Infrastructure compatible with European developments, and empowering the scientists in the region in equal participation in the use of pan- European infrastructures, is materializing through the above initiatives. This model has a number of concrete operational and organizational guidelines which can be adapted to help e-Infrastructure developments in other world regions. In this paper we review the most important developments and contributions by the SEEGRID- SCI project.

  14. The Gain of Resource Delegation in Distributed Computing Environments

    NASA Astrophysics Data System (ADS)

    Fölling, Alexander; Grimme, Christian; Lepping, Joachim; Papaspyrou, Alexander

    In this paper, we address job scheduling in Distributed Computing Infrastructures, that is a loosely coupled network of autonomous acting High Performance Computing systems. In contrast to the common approach of mutual workload exchange, we consider the more intuitive operator's viewpoint of load-dependent resource reconfiguration. In case of a site's over-utilization, the scheduling system is able to lease resources from other sites to keep up service quality for its local user community. Contrary, the granting of idle resources can increase utilization in times of low local workload and thus ensure higher efficiency. The evaluation considers real workload data and is done with respect to common service quality indicators. For two simple resource exchange policies and three basic setups we show the possible gain of this approach and analyze the dynamics in workload-adaptive reconfiguration behavior.

  15. The dependence of educational infrastructure on clinical infrastructure.

    PubMed Central

    Cimino, C.

    1998-01-01

    The Albert Einstein College of Medicine needed to assess the growth of its infrastructure for educational computing as a first step to determining if student needs were being met. Included in computing infrastructure are space, equipment, software, and computing services. The infrastructure was assessed by reviewing purchasing and support logs for a six year period from 1992 to 1998. This included equipment, software, and e-mail accounts provided to students and to faculty for educational purposes. Student space has grown at a constant rate (averaging 14% increase each year respectively). Student equipment on campus has grown by a constant amount each year (average 8.3 computers each year). Student infrastructure off campus and educational support of faculty has not kept pace. It has either declined or remained level over the six year period. The availability of electronic mail clearly demonstrates this with accounts being used by 99% of students, 78% of Basic Science Course Leaders, 38% of Clerkship Directors, 18% of Clerkship Site Directors, and 8% of Clinical Elective Directors. The collection of the initial descriptive infrastructure data has revealed problems that may generalize to other medical schools. The discrepancy between infrastructure available to students and faculty on campus and students and faculty off campus creates a setting where students perceive a paradoxical declining support for computer use as they progress through medical school. While clinical infrastructure may be growing, it is at the expense of educational infrastructure at affiliate hospitals. PMID:9929262

  16. Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure

    DTIC Science & Technology

    2017-01-05

    AFRL-AFOSR-JP-TR-2017-0002 Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure Manabu...Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure 5a.  CONTRACT NUMBER 5b.  GRANT NUMBER FA2386...UNLIMITED: PB Public Release 13. SUPPLEMENTARY NOTES 14. ABSTRACT This report for the project titled ’Advanced Computational Methods for Optimization of

  17. KODAMA and VPC based Framework for Ubiquitous Systems and its Experiment

    NASA Astrophysics Data System (ADS)

    Takahashi, Kenichi; Amamiya, Satoshi; Iwao, Tadashige; Zhong, Guoqiang; Kainuma, Tatsuya; Amamiya, Makoto

    Recently, agent technologies have attracted a lot of interest as an emerging programming paradigm. With such agent technologies, services are provided through collaboration among agents. At the same time, the spread of mobile technologies and communication infrastructures has made it possible to access the network anytime and from anywhere. Using agents and mobile technologies to realize ubiquitous computing systems, we propose a new framework based on KODAMA and VPC. KODAMA provides distributed management mechanisms by using the concept of community and communication infrastructure to deliver messages among agents without agents being aware of the physical network. VPC provides a method of defining peer-to-peer services based on agent communication with policy packages. By merging the characteristics of both KODAMA and VPC functions, we propose a new framework for ubiquitous computing environments. It provides distributed management functions according to the concept of agent communities, agent communications which are abstracted from the physical environment, and agent collaboration with policy packages. Using our new framework, we conducted a large-scale experiment in shopping malls in Nagoya, which sent advertisement e-mails to users' cellular phones according to user location and attributes. The empirical results showed that our new framework worked effectively for sales in shopping malls.

  18. Establishing a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia.

    PubMed

    Schneider, Maria Victoria; Griffin, Philippa C; Tyagi, Sonika; Flannery, Madison; Dayalan, Saravanan; Gladman, Simon; Watson-Haigh, Nathan; Bayer, Philipp E; Charleston, Michael; Cooke, Ira; Cook, Rob; Edwards, Richard J; Edwards, David; Gorse, Dominique; McConville, Malcolm; Powell, David; Wilkins, Marc R; Lonie, Andrew

    2017-06-30

    EMBL Australia Bioinformatics Resource (EMBL-ABR) is a developing national research infrastructure, providing bioinformatics resources and support to life science and biomedical researchers in Australia. EMBL-ABR comprises 10 geographically distributed national nodes with one coordinating hub, with current funding provided through Bioplatforms Australia and the University of Melbourne for its initial 2-year development phase. The EMBL-ABR mission is to: (1) increase Australia's capacity in bioinformatics and data sciences; (2) contribute to the development of training in bioinformatics skills; (3) showcase Australian data sets at an international level and (4) enable engagement in international programs. The activities of EMBL-ABR are focussed in six key areas, aligning with comparable international initiatives such as ELIXIR, CyVerse and NIH Commons. These key areas-Tools, Data, Standards, Platforms, Compute and Training-are described in this article. © The Author 2017. Published by Oxford University Press.

  19. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    NASA Astrophysics Data System (ADS)

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.

  20. The International Symposium on Grids and Clouds and the Open Grid Forum

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds 20111 was held at Academia Sinica in Taipei, Taiwan on 19th to 25th March 2011. A series of workshops and tutorials preceded the symposium. The aim of ISGC is to promote the use of grid and cloud computing in the Asia Pacific region. Over the 9 years that ISGC has been running, the programme has evolved to become more user community focused with subjects reaching out to a larger population. Research communities are making widespread use of distributed computing facilities. Linking together data centers, production grids, desktop systems or public clouds, many researchers are able to do more research and produce results more quickly. They could do much more if the computing infrastructures they use worked together more effectively. Changes in the way we approach distributed computing, and new services from commercial providers, mean that boundaries are starting to blur. This opens the way for hybrid solutions that make it easier for researchers to get their job done. Consequently the theme for ISGC2011 was the opportunities that better integrated computing infrastructures can bring, and the steps needed to achieve the vision of a seamless global research infrastructure. 2011 is a year of firsts for ISGC. First the title - while the acronym remains the same, its meaning has changed to reflect the evolution of computing: The International Symposium on Grids and Clouds. Secondly the programming - ISGC 2011 has always included topical workshops and tutorials. But 2011 is the first year that ISGC has been held in conjunction with the Open Grid Forum2 which held its 31st meeting with a series of working group sessions. The ISGC plenary session included keynote speakers from OGF that highlighted the relevance of standards for the research community. ISGC with its focus on applications and operational aspects complemented well with OGF's focus on standards development. ISGC brought to OGF real-life use cases and needs to be addressed while OGF exposed the state of current developments and issues to be resolved if commonalities are to be exploited. Another first is for the Proceedings for 2011, an open access online publishing scheme will ensure these Proceedings will appear more quickly and more people will have access to the results, providing a long-term online archive of the event. The symposium attracted more than 212 participants from 29 countries spanning Asia, Europe and the Americas. Coming so soon after the earthquake and tsunami in Japan, the participation of our Japanese colleagues was particularly appreciated. Keynotes by invited speakers highlighted the impact of distributed computing infrastructures in the social sciences and humanities, high energy physics, earth and life sciences. Plenary sessions entitled Grid Activities in Asia Pacific surveyed the state of grid deployment across 11 Asian countries. Through the parallel sessions, the impact of distributed computing infrastructures in a range of research disciplines was highlighted. Operational procedures, middleware and security aspects were addressed in a dedicated sessions. The symposium was covered online in real-time by the GridCast team from the GridTalk project. A running blog including summarises of specific sessions as well as video interviews with keynote speakers and personalities and photos. As with all regions of the world, grid and cloud computing has to be prove it is adding value to researchers if it is be accepted by them and demonstrate its impact on society as a while if it to be supported by national governments, funding agencies and the general public. ISGC has helped foster the emergence of a strong regional interest in the earth and life sciences, notably for natural disaster mitigation and bioinformatics studies. Prof. Simon C. Lin organised an intense social programme with a gastronomic tour of Taipei culminating with a banquet for all the symposium's participants at the hotel Palais de Chine. I would like to thank all the members of the programme committee, the participants and above all our hosts, Prof. Simon C. Lin and his excellent support team at Academia Sinica. Dr. Bob Jones Programme Chair 1 http://event.twgrid.org/isgc2011/ 2 http://www.gridforum.org/

  1. KeyWare: an open wireless distributed computing environment

    NASA Astrophysics Data System (ADS)

    Shpantzer, Isaac; Schoenfeld, Larry; Grindahl, Merv; Kelman, Vladimir

    1995-12-01

    Deployment of distributed applications in the wireless domain lack equivalent tools, methodologies, architectures, and network management that exist in LAN based applications. A wireless distributed computing environment (KeyWareTM) based on intelligent agents within a multiple client multiple server scheme was developed to resolve this problem. KeyWare renders concurrent application services to wireline and wireless client nodes encapsulated in multiple paradigms such as message delivery, database access, e-mail, and file transfer. These services and paradigms are optimized to cope with temporal and spatial radio coverage, high latency, limited throughput and transmission costs. A unified network management paradigm for both wireless and wireline facilitates seamless extensions of LAN- based management tools to include wireless nodes. A set of object oriented tools and methodologies enables direct asynchronous invocation of agent-based services supplemented by tool-sets matched to supported KeyWare paradigms. The open architecture embodiment of KeyWare enables a wide selection of client node computing platforms, operating systems, transport protocols, radio modems and infrastructures while maintaining application portability.

  2. POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.; Betts, W.

    2015-12-01

    The STAR online computing infrastructure has become an intensive dynamic system used for first-hand data collection and analysis resulting in a dense collection of data output. As we have transitioned to our current state, inefficient, limited storage systems have become an impediment to fast feedback to online shift crews. Motivation for a centrally accessible, scalable and redundant distributed storage system had become a necessity in this environment. OpenStack Swift Object Storage and Ceph Object Storage are two eye-opening technologies as community use and development have led to success elsewhere. In this contribution, OpenStack Swift and Ceph have been put to the test with single and parallel I/O tests, emulating real world scenarios for data processing and workflows. The Ceph file system storage, offering a POSIX compliant file system mounted similarly to an NFS share was of particular interest as it aligned with our requirements and was retained as our solution. I/O performance tests were run against the Ceph POSIX file system and have presented surprising results indicating true potential for fast I/O and reliability. STAR'S online compute farm historical use has been for job submission and first hand data analysis. The goal of reusing the online compute farm to maintain a storage cluster and job submission will be an efficient use of the current infrastructure.

  3. New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases

    NASA Astrophysics Data System (ADS)

    Brescia, Massimo

    2012-11-01

    Data mining, or Knowledge Discovery in Databases (KDD), while being the main methodology to extract the scientific information contained in Massive Data Sets (MDS), needs to tackle crucial problems since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies carried out a huge production of data, whose related warehouse management and the need to optimize analysis and mining procedures lead to a change in concept on modern science. Classical data exploration, based on local user own data storage and limited computing infrastructures, is no more efficient in the case of MDS, worldwide spread over inhomogeneous data centres and requiring teraflop processing power. In this context modern experimental and observational science requires a good understanding of computer science, network infrastructures, Data Mining, etc. i.e. of all those techniques which fall into the domain of the so called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists and this reflects in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object oriented programming, distributed computing, parallel programming need to become an essential part of scientific background. A possible practical solution is to provide the research community with easy-to understand, easy-to-use tools, based on the Web 2.0 technologies and Machine Learning methodology. Tools where almost all the complexity is hidden to the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations will be described in the detail in the chapter. Moreover, examples of modern applications offering to a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms will be also introduced.

  4. The future of PanDA in ATLAS distributed computing

    NASA Astrophysics Data System (ADS)

    De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.

    2015-12-01

    Experiments at the Large Hadron Collider (LHC) face unprecedented computing challenges. Heterogeneous resources are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, while data processing requires more than a few billion hours of computing usage per year. The PanDA (Production and Distributed Analysis) system was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. In the process, the old batch job paradigm of locally managed computing in HEP was discarded in favour of a far more automated, flexible and scalable model. The success of PanDA in ATLAS is leading to widespread adoption and testing by other experiments. PanDA is the first exascale workload management system in HEP, already operating at more than a million computing jobs per day, and processing over an exabyte of data in 2013. There are many new challenges that PanDA will face in the near future, in addition to new challenges of scale, heterogeneity and increasing user base. PanDA will need to handle rapidly changing computing infrastructure, will require factorization of code for easier deployment, will need to incorporate additional information sources including network metrics in decision making, be able to control network circuits, handle dynamically sized workload processing, provide improved visualization, and face many other challenges. In this talk we will focus on the new features, planned or recently implemented, that are relevant to the next decade of distributed computing workload management using PanDA.

  5. Cloud Computing with iPlant Atmosphere.

    PubMed

    McKay, Sheldon J; Skidmore, Edwin J; LaRose, Christopher J; Mercer, Andre W; Noutsos, Christos

    2013-10-15

    Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Copyright © 2013 John Wiley & Sons, Inc.

  6. Parallel Processing of Images in Mobile Devices using BOINC

    NASA Astrophysics Data System (ADS)

    Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computers grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required to modify the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  7. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing

    PubMed Central

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-01-01

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination. PMID:29565313

  8. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing.

    PubMed

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-03-22

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination.

  9. WISDOM-II: screening against multiple targets implicated in malaria using computational grid infrastructures.

    PubMed

    Kasam, Vinod; Salzemann, Jean; Botha, Marli; Dacosta, Ana; Degliesposti, Gianluca; Isea, Raul; Kim, Doman; Maass, Astrid; Kenyon, Colin; Rastelli, Giulio; Hofmann-Apitius, Martin; Breton, Vincent

    2009-05-01

    Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in the recent years and the discovery of new drugs is more than ever needed. Out of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets to carry out rational drug discovery. Recent years have witnessed the emergence of grids, which are highly distributed computing infrastructures particularly well fitted for embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended up in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006 focussing on one well known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase. In silico drug design, especially vHTS is a widely and well-accepted technology in lead identification and lead optimization. This approach, therefore builds, upon the progress made in computational chemistry to achieve more accurate in silico docking and in information technology to design and operate large scale grid infrastructures. On the computational side, a sustained infrastructure has been developed: docking at large scale, using different strategies in result analysis, storing of the results on the fly into MySQL databases and application of molecular dynamics refinement are MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising. Based on the modeling results, In vitro results are underway for all the targets against which screening is performed. The current paper describes the rational drug discovery activity at large scale, especially molecular docking using FlexX software on computational grids in finding hits against three different targets (PfGST, PfDHFR, PvDHFR (wild type and mutant forms) implicated in malaria. Grid-enabled virtual screening approach is proposed to produce focus compound libraries for other biological targets relevant to fight the infectious diseases of the developing world.

  10. The CMS Tier0 goes cloud and grid for LHC Run 2

    DOE PAGES

    Hufnagel, Dirk

    2015-12-23

    In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threadedmore » framework to deal with the increased event complexity and to ensure efficient use of the resources. Furthermore, this contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.« less

  11. Data issues in the life sciences.

    PubMed

    Thessen, Anne E; Patterson, David J

    2011-01-01

    We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the "Big New Biology". Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While motivated to have a shared open infrastructure and data pool, and pressured by funding agencies in move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.

  12. Data issues in the life sciences

    PubMed Central

    Thessen, Anne E.; Patterson, David J.

    2011-01-01

    Abstract We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the “Big New Biology”. Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While motivated to have a shared open infrastructure and data pool, and pressured by funding agencies in move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available. PMID:22207805

  13. The CMS TierO goes Cloud and Grid for LHC Run 2

    NASA Astrophysics Data System (ADS)

    Hufnagel, Dirk

    2015-12-01

    In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threaded framework to deal with the increased event complexity and to ensure efficient use of the resources. This contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.

  14. 77 FR 28901 - Amended Certification Regarding Eligibility To Apply for Worker Adjustment Assistance; The UBS...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-16

    ... Partners, Inc., Corporate Center Division, Group Technology Infrastructure Services, Infrastructure Service... Infrastructure Services, Distributed Systems and Storage Group, Chicago, Illinois. The workers provide... unit formerly known as Group Technology Infrastructure Services, Distributed Systems and Storage is...

  15. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

    PubMed

    Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

    2014-01-13

    Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.

  16. Cloud Computing in Support of Applied Learning: A Baseline Study of Infrastructure Design at Southern Polytechnic State University

    ERIC Educational Resources Information Center

    Conn, Samuel S.; Reichgelt, Han

    2013-01-01

    Cloud computing represents an architecture and paradigm of computing designed to deliver infrastructure, platforms, and software as constructible computing resources on demand to networked users. As campuses are challenged to better accommodate academic needs for applications and computing environments, cloud computing can provide an accommodating…

  17. Do regions of ALICE matter? Social relationships and data exchanges in the Grid

    NASA Astrophysics Data System (ADS)

    Widmer, E. D.; Carminati, F.; Grigoras, C.; Viry, G.; Galli Carminati, G.

    2012-06-01

    Following a previous publication [1], this study aims at investigating the impact of regional affiliations of centres on the organisation of collaboration within the Distributed Computing ALICE infrastructure, based on social networks methods. A self-administered questionnaire was sent to all centre managers about support, email interactions and wished collaborations in the infrastructure. Several additional measures, stemming from technical observations were produced, such as bandwidth, data transfers and Internet Round Trip Time (RTT) were also included. Information for 50 centres were considered (60% response rate). Empirical analysis shows that despite the centralisation on CERN, the network is highly organised by regions. The results are discussed in the light of policy and efficiency issues.

  18. Do regions matter in ALICE?. Social relationships and data exchanges in the Grid

    NASA Astrophysics Data System (ADS)

    Widmer, E. D.; Viry, G.; Carminati, F.; Galli-Carminati, G.

    2012-02-01

    This study aims at investigating the impact of regional affiliations of centres on the organisation of collaborations within the Distributed Computing ALICE infrastructure, based on social networks methods. A self-administered questionnaire was sent to all centre managers about support, email interactions and wished collaborations in the infrastructure. Several additional measures, stemming from technical observations were collected, such as bandwidth, data transfers and Internet Round Trip Time (RTT) were also included. Information for 50 centres were considered (about 70% response rate). Empirical analysis shows that despite the centralisation on CERN, the network is highly organised by regions. The results are discussed in the light of policy and efficiency issues.

  19. Building a Generic Virtual Research Environment Framework for Multiple Earth and Space Science Domains and a Diversity of Users.

    NASA Astrophysics Data System (ADS)

    Wyborn, L. A.; Fraser, R.; Evans, B. J. K.; Friedrich, C.; Klump, J. F.; Lescinsky, D. T.

    2017-12-01

    Virtual Research Environments (VREs) are now part of academic infrastructures. Online research workflows can be orchestrated whereby data can be accessed from multiple external repositories with processing taking place on public or private clouds, and centralised supercomputers using a mixture of user codes, and well-used community software and libraries. VREs enable distributed members of research teams to actively work together to share data, models, tools, software, workflows, best practices, infrastructures, etc. These environments and their components are increasingly able to support the needs of undergraduate teaching. External to the research sector, they can also be reused by citizen scientists, and be repurposed for industry users to help accelerate the diffusion and hence enable the translation of research innovations. The Virtual Geophysics Laboratory (VGL) in Australia was started in 2012, built using a collaboration between CSIRO, the National Computational Infrastructure (NCI) and Geoscience Australia, with support funding from the Australian Government Department of Education. VGL comprises three main modules that provide an interface to enable users to first select their required data; to choose a tool to process that data; and then access compute infrastructure for execution. VGL was initially built to enable a specific set of researchers in government agencies access to specific data sets and a limited number of tools. Over the years it has evolved into a multi-purpose Earth science platform with access to an increased variety of data (e.g., Natural Hazards, Geochemistry), a broader range of software packages, and an increasing diversity of compute infrastructures. This expansion has been possible because of the approach to loosely couple data, tools and compute resources via interfaces that are built on international standards and accessed as network-enabled services wherever possible. Built originally for researchers that were not fussy about general usability, increasing emphasis on User Interfaces (UIs) and stability will lead to increased uptake in the education and industry sectors. Simultaneously, improvements are being added to facilitate access to data and tools by experienced researchers who want direct access to both data and flexible workflows.

  20. Automating ATLAS Computing Operations using the Site Status Board

    NASA Astrophysics Data System (ADS)

    J, Andreeva; Iglesias C, Borrego; S, Campana; Girolamo A, Di; I, Dzhunov; Curull X, Espinal; S, Gayazov; E, Magradze; M, Nowotka M.; L, Rinaldi; P, Saiz; J, Schovancova; A, Stewart G.; M, Wright

    2012-12-01

    The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities, in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will overview future plans.

  1. Predictive Anomaly Management for Resilient Virtualized Computing Infrastructures

    DTIC Science & Technology

    2015-05-27

    PREC: Practical Root Exploit Containment for Android Devices, ACM Conference on Data and Application Security and Privacy (CODASPY) . 03-MAR-14...05-OCT-11, . : , Hiep Nguyen, Yongmin Tan, Xiaohui Gu. Propagation-aware Anomaly Localization for Cloud Hosted Distributed Applications , ACM...Workshop on Managing Large-Scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques (SLAML) in conjunction with SOSP

  2. Real-Time Optimization and Control of Next-Generation Distribution

    Science.gov Websites

    Infrastructure | Grid Modernization | NREL Real-Time Optimization and Control of Next -Generation Distribution Infrastructure Real-Time Optimization and Control of Next-Generation Distribution Infrastructure This project develops innovative, real-time optimization and control methods for next-generation

  3. Caribou distribution during the post-calving period in relation to infrastructure in the Prudhoe Bay oil field, Alaska

    USGS Publications Warehouse

    Cronin, Matthew A.; Amstrup, Steven C.; Durner, George M.; Noel, Lynn E.; McDonald, Trent L.; Ballard, Warren B.

    1998-01-01

    There is concern that caribou (Rangifer tarandus) may avoid roads and facilities (i.e., infrastructure) in the Prudhoe Bay oil field (PBOF) in northern Alaska, and that this avoidance can have negative effects on the animals. We quantified the relationship between caribou distribution and PBOF infrastructure during the post-calving period (mid-June to mid-August) with aerial surveys from 1990 to 1995. We conducted four to eight surveys per year with complete coverage of the PBOF. We identified active oil field infrastructure and used a geographic information system (GIS) to construct ten 1 km wide concentric intervals surrounding the infrastructure. We tested whether caribou distribution is related to distance from infrastructure with a chi-squared habitat utilization-availability analysis and log-linear regression. We considered bulls, calves, and total caribou of all sex/age classes separately. The habitat utilization-availability analysis indicated there was no consistent trend of attraction to or avoidance of infrastructure. Caribou frequently were more abundant than expected in the intervals close to infrastructure, and this trend was more pronounced for bulls and for total caribou of all sex/age classes than for calves. Log-linear regression (with Poisson error structure) of numbers of caribou and distance from infrastructure were also done, with and without combining data into the 1 km distance intervals. The analysis without intervals revealed no relationship between caribou distribution and distance from oil field infrastructure, or between caribou distribution and Julian date, year, or distance from the Beaufort Sea coast. The log-linear regression with caribou combined into distance intervals showed the density of bulls and total caribou of all sex/age classes declined with distance from infrastructure. Our results indicate that during the post-calving period: 1) caribou distribution is largely unrelated to distance from infrastructure; 2) caribou regularly use habitats in the PBOF; 3) caribou often occur close to infrastructure; and 4) caribou do not appear to avoid oil field infrastructure.

  4. SCALING AN URBAN EMERGENCY EVACUATION FRAMEWORK: CHALLENGES AND PRACTICES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karthik, Rajasekar; Lu, Wei

    2014-01-01

    Critical infrastructure disruption, caused by severe weather events, natural disasters, terrorist attacks, etc., has significant impacts on urban transportation systems. We built a computational framework to simulate urban transportation systems under critical infrastructure disruption in order to aid real-time emergency evacuation. This framework will use large scale datasets to provide a scalable tool for emergency planning and management. Our framework, World-Wide Emergency Evacuation (WWEE), integrates population distribution and urban infrastructure networks to model travel demand in emergency situations at global level. Also, a computational model of agent-based traffic simulation is used to provide an optimal evacuation plan for traffic operationmore » purpose [1]. In addition, our framework provides a web-based high resolution visualization tool for emergency evacuation modelers and practitioners. We have successfully tested our framework with scenarios in both United States (Alexandria, VA) and Europe (Berlin, Germany) [2]. However, there are still some major drawbacks for scaling this framework to handle big data workloads in real time. On our back-end, lack of proper infrastructure limits us in ability to process large amounts of data, run the simulation efficiently and quickly, and provide fast retrieval and serving of data. On the front-end, the visualization performance of microscopic evacuation results is still not efficient enough due to high volume data communication between server and client. We are addressing these drawbacks by using cloud computing and next-generation web technologies, namely Node.js, NoSQL, WebGL, Open Layers 3 and HTML5 technologies. We will describe briefly about each one and how we are using and leveraging these technologies to provide an efficient tool for emergency management organizations. Our early experimentation demonstrates that using above technologies is a promising approach to build a scalable and high performance urban emergency evacuation framework that can improve traffic mobility and safety under critical infrastructure disruption in today s socially connected world.« less

  5. Defense Strategies for Asymmetric Networked Systems with Discrete Components.

    PubMed

    Rao, Nageswara S V; Ma, Chris Y T; Hausken, Kjell; He, Fei; Yau, David K Y; Zhuang, Jun

    2018-05-03

    We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models.

  6. Defense Strategies for Asymmetric Networked Systems with Discrete Components

    PubMed Central

    Rao, Nageswara S. V.; Ma, Chris Y. T.; Hausken, Kjell; He, Fei; Yau, David K. Y.

    2018-01-01

    We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models. PMID:29751588

  7. Computers, the Internet and medical education in Africa.

    PubMed

    Williams, Christopher D; Pitchforth, Emma L; O'Callaghan, Christopher

    2010-05-01

    OBJECTIVES This study aimed to explore the use of information and communications technology (ICT) in undergraduate medical education in developing countries. METHODS Educators (deans and heads of medical education) in English-speaking countries across Africa were sent a questionnaire to establish the current state of ICT at medical schools. Non-respondents were contacted firstly by e-mail, subsequently by two postal mailings at 3-month intervals, and finally by telephone. Main outcome measures included cross-sectional data about the availability of computers, specifications, Internet connection speeds, use of ICT by students, and teaching of ICT and computerised research skills, presented by country or region. RESULTS The mean computer : student ratio was 0.123. Internet speeds were rated as 'slow' or 'very slow' on a 5-point Likert scale by 25.0% of respondents overall, but by 58.3% in East Africa and 33.3% in West Africa (including Cameroon). Mean estimates showed that campus computers more commonly supported CD-ROM (91.4%) and sound (87.3%) than DVD-ROM (48.1%) and Internet (72.5%). The teaching of ICT and computerised research skills, and the use of computers by medical students for research, assignments and personal projects were common. CONCLUSIONS It is clear that ICT infrastructure in Africa lags behind that in other regions. Poor download speeds limit the potential of Internet resources (especially videos, sound and other large downloads) to benefit students, particularly in East and West (including Cameroon) Africa. CD-ROM capability is more widely available, but has not yet gained momentum as a means of distributing materials. Despite infrastructure limitations, ICT is already being used and there is enthusiasm for developing this further. Priority should be given to developing partnerships to improve ICT infrastructure and maximise the potential of existing technology.

  8. VOSpace: a Prototype for Grid 2.0

    NASA Astrophysics Data System (ADS)

    Graham, M. J.; Morris, D.; Rixon, G.

    2007-10-01

    As Grid 1.0 was characterized by distributed computation, so Grid 2.0 will be characterized by distributed data and the infrastructure needed to support and exploit it: the emerging success of Amazon S3 is already testimony to this. VOSpace is the IVOA interface standard for accessing distributed data. Although the base definition (VOSpace 1.0) only relates to flat, unconnected data stores, subsequent versions will add additional layers of functionality. In this paper, we consider how incorporating popular web concepts such as folksonomies (tagging), social networking, and data-spaces could lead to a much richer data environment than provided by a traditional collection of networked data stores.

  9. Federation in genomics pipelines: techniques and challenges.

    PubMed

    Chaterji, Somali; Koo, Jinkyu; Li, Ninghui; Meyer, Folker; Grama, Ananth; Bagchi, Saurabh

    2017-08-29

    Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. SBGrid Consortium for structural biology and Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, with fast-increasing data volumes and increasing processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Being a widely used resource, we have to move toward federation without disrupting 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. It is hoped that our manuscript can serve to spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus, allow them to scale to support larger user bases. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

    DOE PAGES

    Klimentov, A.; Buncic, P.; De, K.; ...

    2015-05-22

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10 2) sites, O(10 5) cores, O(10 8) jobs per year, O(10 3) users, and ATLAS data volume is O(10 17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.« less

  11. Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klimentov, A.; Buncic, P.; De, K.

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Managementmore » System (WMS) for managing the workflow for all data processing on hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10 2) sites, O(10 5) cores, O(10 8) jobs per year, O(10 3) users, and ATLAS data volume is O(10 17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA) is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to setup and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute" together with ALICE distributed computing and ORNL computing professionals. Our approach to integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. Finally, we will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.« less

  12. Monitoring techniques and alarm procedures for CMS services and sites in WLCG

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Molina-Perez, J.; Bonacorsi, D.; Gutsche, O.

    2012-01-01

    The CMS offline computing system is composed of roughly 80 sites (including most experienced T3s) and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS, the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range, the latter measures the quality of service and warns managers when service is affected. CMS has established computing shift procedures with personnel operatingmore » worldwide from remote Computing Centers, under the supervision of the Computing Run Coordinator at CERN. This dedicated 24/7 computing shift personnel is contributing to detect and react timely on any unexpected error and hence ensure that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure the 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first two years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the proactive alarming systems and the proficient troubleshooting procedures that helped the CMS Computing facilities and infrastructure to operate at high reliability levels.« less

  13. Grids: The Top Ten Questions

    DOE PAGES

    Schopf, Jennifer M.; Nitzberg, Bill

    2002-01-01

    The design and implementation of a national computing system and data grid has become a reachable goal from both the computer science and computational science point of view. A distributed infrastructure capable of sophisticated computational functions can bring many benefits to scientific work, but poses many challenges, both technical and socio-political. Technical challenges include having basic software tools, higher-level services, functioning and pervasive security, and standards, while socio-political issues include building a user community, adding incentives for sites to be part of a user-centric environment, and educating funding sources about the needs of this community. This paper details the areasmore » relating to Grid research that we feel still need to be addressed to fully leverage the advantages of the Grid.« less

  14. Swiss Experiment: Design, implemention and use of a cross-disciplinary infrastructure for data intensive science

    NASA Astrophysics Data System (ADS)

    Dawes, N.; Salehi, A.; Clifton, A.; Bavay, M.; Aberer, K.; Parlange, M. B.; Lehning, M.

    2010-12-01

    It has long been known that environmental processes are cross-disciplinary, but data has continued to be acquired and held for a single purpose. Swiss Experiment is a rapidly evolving cross-disciplinary, distributed sensor data infrastructure, where tools for the environmental science community stem directly from computer science research. The platform uses the bleeding edge of computer science to acquire, store and distribute data and metadata from all environmental science disciplines at a variety of temporal and spatial resolutions. SwissEx is simultaneously developing new technologies to allow low cost, high spatial and temporal resolution measurements such that small areas can be intensely monitored. This data is then combined with existing widespread, low density measurements in the cross-disciplinary platform to provide well documented datasets, which are of use to multiple research disciplines. We present a flexible, generic infrastructure at an advanced stage of development. The infrastructure makes the most of Web 2.0 technologies for a collaborative working environment and as a user interface for a metadata database. This environment is already closely integrated with GSN, an open-source database middleware developed under Swiss Experiment for acquisition and storage of generic time-series data (2D and 3D). GSN can be queried directly by common data processing packages and makes data available in real-time to models and 3rd party software interfaces via its web service interface. It also provides real-time push or pull data exchange between instances, a user management system which leaves data owners in charge of their data, advanced real-time processing and much more. The SwissEx interface is increasingly gaining users and supporting environmental science in Switzerland. It is also an integral part of environmental education projects ClimAtscope and O3E, where the technologies can provide rapid feedback of results for children of all ages and where the data from their own stations can be compared to national data networks.

  15. A Cloud-based Infrastructure and Architecture for Environmental System Research

    NASA Astrophysics Data System (ADS)

    Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.

    2016-12-01

    The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics service and utilities to the environmental system research community, and will provide unique capabilities that allows terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflow in a straightforward fashion. The infrastructure will minimize large disruptions from current project-based data submission workflows for better acceptances from existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by provide unified data services and infrastructure. The Infrastructure consists of two key components (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from environmental system research community, and (2) scalable data management services and system, originated and development by ORNL data centers.

  16. SimBOX: a scalable architecture for aggregate distributed command and control of spaceport and service constellation

    NASA Astrophysics Data System (ADS)

    Prasad, Guru; Jayaram, Sanjay; Ward, Jami; Gupta, Pankaj

    2004-08-01

    In this paper, Aximetric proposes a decentralized Command and Control (C2) architecture for a distributed control of a cluster of on-board health monitoring and software enabled control systems called SimBOX that will use some of the real-time infrastructure (RTI) functionality from the current military real-time simulation architecture. The uniqueness of the approach is to provide a "plug and play environment" for various system components that run at various data rates (Hz) and the ability to replicate or transfer C2 operations to various subsystems in a scalable manner. This is possible by providing a communication bus called "Distributed Shared Data Bus" and a distributed computing environment used to scale the control needs by providing a self-contained computing, data logging and control function module that can be rapidly reconfigured to perform different functions. This kind of software-enabled control is very much needed to meet the needs of future aerospace command and control functions.

  17. SimBox: a simulation-based scalable architecture for distributed command and control of spaceport and service constellations

    NASA Astrophysics Data System (ADS)

    Prasad, Guru; Jayaram, Sanjay; Ward, Jami; Gupta, Pankaj

    2004-09-01

    In this paper, Aximetric proposes a decentralized Command and Control (C2) architecture for a distributed control of a cluster of on-board health monitoring and software enabled control systems called SimBOX that will use some of the real-time infrastructure (RTI) functionality from the current military real-time simulation architecture. The uniqueness of the approach is to provide a "plug and play environment" for various system components that run at various data rates (Hz) and the ability to replicate or transfer C2 operations to various subsystems in a scalable manner. This is possible by providing a communication bus called "Distributed Shared Data Bus" and a distributed computing environment used to scale the control needs by providing a self-contained computing, data logging and control function module that can be rapidly reconfigured to perform different functions. This kind of software-enabled control is very much needed to meet the needs of future aerospace command and control functions.

  18. Towards a single seismological service infrastructure in Europe

    NASA Astrophysics Data System (ADS)

    Spinuso, A.; Trani, L.; Frobert, L.; Van Eck, T.

    2012-04-01

    In the last five year services and data providers, within the seismological community in Europe, focused their efforts in migrating the way of opening their archives towards a Service Oriented Architecture (SOA). This process tries to follow pragmatically the technological trends and available solutions aiming at effectively improving all the data stewardship activities. These advancements are possible thanks to the cooperation and the follow-ups of several EC infrastructural projects that, by looking at general purpose techniques, combine their developments envisioning a multidisciplinary platform for the earth observation as the final common objective (EPOS, Earth Plate Observation System) One of the first results of this effort is the Earthquake Data Portal (http://www.seismicportal.eu), which provides a collection of tools to discover, visualize and access a variety of seismological data sets like seismic waveform, accelerometric data, earthquake catalogs and parameters. The Portal offers a cohesive distributed search environment, linking data search and access across multiple data providers through interactive web-services, map-based tools and diverse command-line clients. Our work continues under other EU FP7 projects. Here we will address initiatives in two of those projects. The NERA, (Network of European Research Infrastructures for Earthquake Risk Assessment and Mitigation) project will implement a Common Services Architecture based on OGC services APIs, in order to provide Resource-Oriented common interfaces across the data access and processing services. This will improve interoperability between tools and across projects, enabling the development of higher-level applications that can uniformly access the data and processing services of all participants. This effort will be conducted jointly with the VERCE project (Virtual Earthquake and Seismology Research Community for Europe). VERCE aims to enable seismologists to exploit the wealth of seismic data within a data-intensive computation framework, which will be tailored to the specific needs of the community. It will provide a new interoperable infrastructure, as the computational backbone laying behind the publicly available interfaces. VERCE will have to face the challenges of implementing a service oriented architecture providing an efficient layer between the Data and the Grid infrastructures, coupling HPC data analysis and HPC data modeling applications through the execution of workflows and data sharing mechanism. Online registries of interoperable worklflow components, storage of intermediate results and data provenance are those aspects that are currently under investigations to make the VERCE facilities usable from a large scale of users, data and service providers. For such purposes the adoption of a Digital Object Architecture, to create online catalogs referencing and describing semantically all these distributed resources, such as datasets, computational processes and derivative products, is seen as one of the viable solution to monitor and steer the usage of the infrastructure, increasing its efficiency and the cooperation among the community.

  19. Contingency theoretic methodology for agent-based web-oriented manufacturing systems

    NASA Astrophysics Data System (ADS)

    Durrett, John R.; Burnell, Lisa J.; Priest, John W.

    2000-12-01

    The development of distributed, agent-based, web-oriented, N-tier Information Systems (IS) must be supported by a design methodology capable of responding to the convergence of shifts in business process design, organizational structure, computing, and telecommunications infrastructures. We introduce a contingency theoretic model for the use of open, ubiquitous software infrastructure in the design of flexible organizational IS. Our basic premise is that developers should change in the way they view the software design process from a view toward the solution of a problem to one of the dynamic creation of teams of software components. We postulate that developing effective, efficient, flexible, component-based distributed software requires reconceptualizing the current development model. The basic concepts of distributed software design are merged with the environment-causes-structure relationship from contingency theory; the task-uncertainty of organizational- information-processing relationships from information processing theory; and the concept of inter-process dependencies from coordination theory. Software processes are considered as employees, groups of processes as software teams, and distributed systems as software organizations. Design techniques already used in the design of flexible business processes and well researched in the domain of the organizational sciences are presented. Guidelines that can be utilized in the creation of component-based distributed software will be discussed.

  20. Web Services and Handle Infrastructure - WDCC's Contributions to International Projects

    NASA Astrophysics Data System (ADS)

    Föll, G.; Weigelt, T.; Kindermann, S.; Lautenschlager, M.; Toussaint, F.

    2012-04-01

    Climate science demands on data management are growing rapidly as climate models grow in the precision with which they depict spatial structures and in the completeness with which they describe a vast range of physical processes. The ExArch project is exploring the challenges of developing a software management infrastructure which will scale to the multi-exabyte archives of climate data which are likely to be crucial to major policy decisions in by the end of the decade. The ExArch approach to future integration of exascale climate archives is based on one hand on a distributed web service architecture providing data analysis and quality control functionality across archvies. On the other hand a consistent persistent identifier infrastructure is deployed to support distributed data management and data replication. Distributed data analysis functionality is based on the CDO climate data operators' package. The CDO-Tool is used for processing of the archived data and metadata. CDO is a collection of command line Operators to manipulate and analyse Climate and forecast model Data. A range of formats is supported and over 500 operators are provided. CDO presently is designed to work in a scripting environment with local files. ExArch will extend the tool to support efficient usage in an exascale archive with distributed data and computational resources by providing flexible scheduling capabilities. Quality control will become increasingly important in an exascale computing context. Researchers will be dealing with millions of data files from multiple sources and will need to know whether the files satisfy a range of basic quality criterea. Hence ExArch will provide a flexible and extensible quality control system. The data will be held at more than 30 computing centres and data archives around the world, but for users it will appear as a single archive due to a standardized ExArch Web Processing Service. Data infrastructures such as the one built by ExArch can greatly benefit from assigning persistent identifiers (PIDs) to the main entities, such as data and metadata records. A PID should then not only consist of a globally unique identifier, but also support built-in facilities to relate PIDs to each other, to build multi-hierarchical virtual collections and to enable attaching basic metadata directly to PIDs. With such a toolset, PIDs can support crucial data management tasks. For example, data replication performed in ExArch can be supported through PIDs as they can help to establish durable links between identical copies. By linking derivative data objects together, their provenance can be traced with a level of detail and reliability currently unavailable in the Earth system modelling domain. Regarding data transfers, virtual collections of PIDs may be used to package data prior to transmission. If the PID of such a collection is used as the primary key in data transfers, safety of transfer and traceability of data objects across repositories increases. End-users can benefit from PIDs as well since they make data discovery independent from particular storage sites and enable user-friendly communication about primary research objects. A generic PID system can in fact be a fundamental building block for scientific e-infrastructures across projects and domains.

  1. OOI CyberInfrastructure - Next Generation Oceanographic Research

    NASA Astrophysics Data System (ADS)

    Farcas, C.; Fox, P.; Arrott, M.; Farcas, E.; Klacansky, I.; Krueger, I.; Meisinger, M.; Orcutt, J.

    2008-12-01

    Software has become a key enabling technology for scientific discovery, observation, modeling, and exploitation of natural phenomena. New value emerges from the integration of individual subsystems into networked federations of capabilities exposed to the scientific community. Such data-intensive interoperability networks are crucial for future scientific collaborative research, as they open up new ways of fusing data from different sources and across various domains, and analysis on wide geographic areas. The recently established NSF OOI program, through its CyberInfrastructure component addresses this challenge by providing broad access from sensor networks for data acquisition up to computational grids for massive computations and binding infrastructure facilitating policy management and governance of the emerging system-of-scientific-systems. We provide insight into the integration core of this effort, namely, a hierarchic service-oriented architecture for a robust, performant, and maintainable implementation. We first discuss the relationship between data management and CI crosscutting concerns such as identity management, policy and governance, which define the organizational contexts for data access and usage. Next, we detail critical services including data ingestion, transformation, preservation, inventory, and presentation. To address interoperability issues between data represented in various formats we employ a semantic framework derived from the Earth System Grid technology, a canonical representation for scientific data based on DAP/OPeNDAP, and related data publishers such as ERDDAP. Finally, we briefly present the underlying transport based on a messaging infrastructure over the AMQP protocol, and the preservation based on a distributed file system through SDSC iRODS.

  2. JINR cloud infrastructure evolution

    NASA Astrophysics Data System (ADS)

    Baranov, A. V.; Balashov, N. A.; Kutovskiy, N. A.; Semenov, R. N.

    2016-09-01

    To fulfil JINR commitments in different national and international projects related to the use of modern information technologies such as cloud and grid computing as well as to provide a modern tool for JINR users for their scientific research a cloud infrastructure was deployed at Laboratory of Information Technologies of Joint Institute for Nuclear Research. OpenNebula software was chosen as a cloud platform. Initially it was set up in simple configuration with single front-end host and a few cloud nodes. Some custom development was done to tune JINR cloud installation to fit local needs: web form in the cloud web-interface for resources request, a menu item with cloud utilization statistics, user authentication via Kerberos, custom driver for OpenVZ containers. Because of high demand in that cloud service and its resources over-utilization it was re-designed to cover increasing users' needs in capacity, availability and reliability. Recently a new cloud instance has been deployed in high-availability configuration with distributed network file system and additional computing power.

  3. Software Defined Networking challenges and future direction: A case study of implementing SDN features on OpenStack private cloud

    NASA Astrophysics Data System (ADS)

    Shamugam, Veeramani; Murray, I.; Leong, J. A.; Sidhu, Amandeep S.

    2016-03-01

    Cloud computing provides services on demand instantly, such as access to network infrastructure consisting of computing hardware, operating systems, network storage, database and applications. Network usage and demands are growing at a very fast rate and to meet the current requirements, there is a need for automatic infrastructure scaling. Traditional networks are difficult to automate because of the distributed nature of their decision making process for switching or routing which are collocated on the same device. Managing complex environments using traditional networks is time-consuming and expensive, especially in the case of generating virtual machines, migration and network configuration. To mitigate the challenges, network operations require efficient, flexible, agile and scalable software defined networks (SDN). This paper discuss various issues in SDN and suggests how to mitigate the network management related issues. A private cloud prototype test bed was setup to implement the SDN on the OpenStack platform to test and evaluate the various network performances provided by the various configurations.

  4. The Sunrise project: An R&D project for a national information infrastructure prototype

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Juhnyoung

    1995-02-01

    Sunrise is a Los Alamos National Laboratory (LANL) project started in October 1993. It is intended to a prototype National Information Infrastructure (NII) development project. A main focus of Sunrise is to tie together enabling technologies (networking, object-oriented distributed computing, graphical interfaces, security, multimedia technologies, and data mining technologies) with several specific applications. A diverse set of application areas was chosen to ensure that the solutions developed in the project are as generic as possible. Some of the application areas are materials modeling, medical records and image analysis, transportation simulations, and education. This paper provides a description of Sunrise andmore » a view of the architecture and objectives of this evolving project. The primary objectives of Sunrise are three-fold: (1) To develop common information-enabling tools for advanced scientific research and its applications to industry; (2) To enhance the capabilities of important research programs at the Laboratory; and (3) To define a new way of collaboration between computer science and industrially relevant research.« less

  5. Cloud-Based Perception and Control of Sensor Nets and Robot Swarms

    DTIC Science & Technology

    2016-04-01

    distributed stream processing framework provides the necessary API and infrastructure to develop and execute such applications in a cluster of computation...streaming DDDAS applications based on challenges they present to the backend Cloud control system. Figure 2 Parallel SLAM Application 3 1) Set of...the art deep learning- based object detectors can recognize among hundreds of object classes and this capability would be very useful for mobile

  6. Toward an automated parallel computing environment for geosciences

    NASA Astrophysics Data System (ADS)

    Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping

    2007-08-01

    Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.

  7. Dashboard Task Monitor for Managing ATLAS User Analysis on the Grid

    NASA Astrophysics Data System (ADS)

    Sargsyan, L.; Andreeva, J.; Jha, M.; Karavakis, E.; Kokoszkiewicz, L.; Saiz, P.; Schovancova, J.; Tuckett, D.; Atlas Collaboration

    2014-06-01

    The organization of the distributed user analysis on the Worldwide LHC Computing Grid (WLCG) infrastructure is one of the most challenging tasks among the computing activities at the Large Hadron Collider. The Experiment Dashboard offers a solution that not only monitors but also manages (kill, resubmit) user tasks and jobs via a web interface. The ATLAS Dashboard Task Monitor provides analysis users with a tool that is independent of the operating system and Grid environment. This contribution describes the functionality of the application and its implementation details, in particular authentication, authorization and audit of the management operations.

  8. Separating Added Value from Hype: Some Experiences and Prognostications

    NASA Astrophysics Data System (ADS)

    Reed, Dan

    2004-03-01

    These are exciting times for the interplay of science and computing technology. As new data archives, instruments and computing facilities are connected nationally and internationally, a new model of distributed scientific collaboration is emerging. However, any new technology brings both opportunities and challenges -- Grids are no exception. In this talk, we will discuss some of the experiences deploying Grid software in production environments, illustrated with experiences from the NSF PACI Alliance, the NSF Extensible Terascale Facility (ETF) and other Grid projects. From these experiences, we derive some guidelines for deployment and some suggestions for community engagement, software development and infrastructure

  9. Efficient workload management in geographically distributed data centers leveraging autoregressive models

    NASA Astrophysics Data System (ADS)

    Altomare, Albino; Cesario, Eugenio; Mastroianni, Carlo

    2016-10-01

    The opportunity of using Cloud resources on a pay-as-you-go basis and the availability of powerful data centers and high bandwidth connections are speeding up the success and popularity of Cloud systems, which is making on-demand computing a common practice for enterprises and scientific communities. The reasons for this success include natural business distribution, the need for high availability and disaster tolerance, the sheer size of their computational infrastructure, and/or the desire to provide uniform access times to the infrastructure from widely distributed client sites. Nevertheless, the expansion of large data centers is resulting in a huge rise of electrical power consumed by hardware facilities and cooling systems. The geographical distribution of data centers is becoming an opportunity: the variability of electricity prices, environmental conditions and client requests, both from site to site and with time, makes it possible to intelligently and dynamically (re)distribute the computational workload and achieve as diverse business goals as: the reduction of costs, energy consumption and carbon emissions, the satisfaction of performance constraints, the adherence to Service Level Agreement established with users, etc. This paper proposes an approach that helps to achieve the business goals established by the data center administrators. The workload distribution is driven by a fitness function, evaluated for each data center, which weighs some key parameters related to business objectives, among which, the price of electricity, the carbon emission rate, the balance of load among the data centers etc. For example, the energy costs can be reduced by using a "follow the moon" approach, e.g. by migrating the workload to data centers where the price of electricity is lower at that time. Our approach uses data about historical usage of the data centers and data about environmental conditions to predict, with the help of regressive models, the values of the parameters of the fitness function, and then to appropriately tune the weights assigned to the parameters in accordance to the business goals. Preliminary experimental results, presented in this paper, show encouraging benefits.

  10. International Symposium on Grids and Clouds (ISGC) 2014

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2014 will be held at Academia Sinica in Taipei, Taiwan from 23-28 March 2014, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC).“Bringing the data scientist to global e-Infrastructures” is the theme of ISGC 2014. The last decade has seen the phenomenal growth in the production of data in all forms by all research communities to produce a deluge of data from which information and knowledge need to be extracted. Key to this success will be the data scientist - educated to use advanced algorithms, applications and infrastructures - collaborating internationally to tackle society’s challenges. ISGC 2014 will bring together researchers working in all aspects of data science from different disciplines around the world to collaborate and educate themselves in the latest achievements and techniques being used to tackle the data deluge. In addition to the regular workshops, technical presentations and plenary keynotes, ISGC this year will focus on how to grow the data science community by considering the educational foundation needed for tomorrow’s data scientist. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities & Social Sciences Application, Virtual Research Environment (including Middleware, tools, services, workflow, ... etc.), Data Management, Big Data, Infrastructure & Operations Management, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC).

  11. SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology.

    PubMed

    Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E; Troein, Carl; Millar, Andrew J; Goryanin, Igor; Gilmore, Stephen

    2013-03-01

    Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI's use of standard data formats. All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials.

  12. Computing shifts to monitor ATLAS distributed computing infrastructure and operations

    NASA Astrophysics Data System (ADS)

    Adam, C.; Barberis, D.; Crépé-Renaudin, S.; De, K.; Fassi, F.; Stradling, A.; Svatos, M.; Vartapetian, A.; Wolters, H.

    2017-10-01

    The ATLAS Distributed Computing (ADC) group established a new Computing Run Coordinator (CRC) shift at the start of LHC Run 2 in 2015. The main goal was to rely on a person with a good overview of the ADC activities to ease the ADC experts’ workload. The CRC shifter keeps track of ADC tasks related to their fields of expertise and responsibility. At the same time, the shifter maintains a global view of the day-to-day operations of the ADC system. During Run 1, this task was accomplished by a person of the expert team called the ADC Manager on Duty (AMOD), a position that was removed during the shutdown period due to the reduced number and availability of ADC experts foreseen for Run 2. The CRC position was proposed to cover some of the AMODs former functions, while allowing more people involved in computing to participate. In this way, CRC shifters help with the training of future ADC experts. The CRC shifters coordinate daily ADC shift operations, including tracking open issues, reporting, and representing ADC in relevant meetings. The CRC also facilitates communication between the ADC experts team and the other ADC shifters. These include the Distributed Analysis Support Team (DAST), which is the first point of contact for addressing all distributed analysis questions, and the ATLAS Distributed Computing Shifters (ADCoS), which check and report problems in central services, sites, Tier-0 export, data transfers and production tasks. Finally, the CRC looks at the level of ADC activities on a weekly or monthly timescale to ensure that ADC resources are used efficiently.

  13. Internet of things for an age-friendly healthcare.

    PubMed

    Konstantinidis, Evdokimos I; Bamparopoulos, Giorgos; Billis, Antonis; Bamidis, Panagiotis D

    2015-01-01

    In healthcare applications a large cohort of recent implementations utilises IoT-oriented infrastructures (XMPP) as well as smart mobile devices as communication gateways. IoT characteristi Communication/Connectivity, Pervasive Computing and Ambient Intelligence, are all highly related to Active and Healthy Aging environments. This paper presents a new idea, that of IoT enabled devices which are directly connected to the IoT (a glucose meter is used as an example herein), complying with the XMPP messaging protocol and the incorporation of a recently released Controller Application Communication (CAC) framework for distributed, cross-platform communication. A web based exergaming platform and a disease management tool, provide the vehicles for the demonstration of the feasibility and the successful implementation and integration of the aforementioned infrastructure.

  14. DNAseq Workflow in a Diagnostic Context and an Example of a User Friendly Implementation.

    PubMed

    Wolf, Beat; Kuonen, Pierre; Dandekar, Thomas; Atlan, David

    2015-01-01

    Over recent years next generation sequencing (NGS) technologies evolved from costly tools used by very few, to a much more accessible and economically viable technology. Through this recently gained popularity, its use-cases expanded from research environments into clinical settings. But the technical know-how and infrastructure required to analyze the data remain an obstacle for a wider adoption of this technology, especially in smaller laboratories. We present GensearchNGS, a commercial DNAseq software suite distributed by Phenosystems SA. The focus of GensearchNGS is the optimal usage of already existing infrastructure, while keeping its use simple. This is achieved through the integration of existing tools in a comprehensive software environment, as well as custom algorithms developed with the restrictions of limited infrastructures in mind. This includes the possibility to connect multiple computers to speed up computing intensive parts of the analysis such as sequence alignments. We present a typical DNAseq workflow for NGS data analysis and the approach GensearchNGS takes to implement it. The presented workflow goes from raw data quality control to the final variant report. This includes features such as gene panels and the integration of online databases, like Ensembl for annotations or Cafe Variome for variant sharing.

  15. NCI's Distributed Geospatial Data Server

    NASA Astrophysics Data System (ADS)

    Larraondo, P. R.; Evans, B. J. K.; Antony, J.

    2016-12-01

    Earth systems, environmental and geophysics datasets are an extremely valuable source of information about the state and evolution of the Earth. However, different disciplines and applications require this data to be post-processed in different ways before it can be used. For researchers experimenting with algorithms across large datasets or combining multiple data sets, the traditional approach to batch data processing and storing all the output for later analysis rapidly becomes unfeasible, and often requires additional work to publish for others to use. Recent developments on distributed computing using interactive access to significant cloud infrastructure opens the door for new ways of processing data on demand, hence alleviating the need for storage space for each individual copy of each product. The Australian National Computational Infrastructure (NCI) has developed a highly distributed geospatial data server which supports interactive processing of large geospatial data products, including satellite Earth Observation data and global model data, using flexible user-defined functions. This system dynamically and efficiently distributes the required computations among cloud nodes and thus provides a scalable analysis capability. In many cases this completely alleviates the need to preprocess and store the data as products. This system presents a standards-compliant interface, allowing ready accessibility for users of the data. Typical data wrangling problems such as handling different file formats and data types, or harmonising the coordinate projections or temporal and spatial resolutions, can now be handled automatically by this service. The geospatial data server exposes functionality for specifying how the data should be aggregated and transformed. The resulting products can be served using several standards such as the Open Geospatial Consortium's (OGC) Web Map Service (WMS) or Web Feature Service (WFS), Open Street Map tiles, or raw binary arrays under different conventions. We will show some cases where we have used this new capability to provide a significant improvement over previous approaches.

  16. 1001 Ways to run AutoDock Vina for virtual screening

    NASA Astrophysics Data System (ADS)

    Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D.

    2016-03-01

    Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.

  17. 1001 Ways to run AutoDock Vina for virtual screening.

    PubMed

    Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D

    2016-03-01

    Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.

  18. Distributed analysis in ATLAS

    NASA Astrophysics Data System (ADS)

    Dewhurst, A.; Legger, F.

    2015-12-01

    The ATLAS experiment accumulated more than 140 PB of data during the first run of the Large Hadron Collider (LHC) at CERN. The analysis of such an amount of data is a challenging task for the distributed physics community. The Distributed Analysis (DA) system of the ATLAS experiment is an established and stable component of the ATLAS distributed computing operations. About half a million user jobs are running daily on DA resources, submitted by more than 1500 ATLAS physicists. The reliability of the DA system during the first run of the LHC and the following shutdown period has been high thanks to the continuous automatic validation of the distributed analysis sites and the user support provided by a dedicated team of expert shifters. During the LHC shutdown, the ATLAS computing model has undergone several changes to improve the analysis workflows, including the re-design of the production system, a new analysis data format and event model, and the development of common reduction and analysis frameworks. We report on the impact such changes have on the DA infrastructure, describe the new DA components, and include recent performance measurements.

  19. A Study of ATLAS Grid Performance for Distributed Analysis

    NASA Astrophysics Data System (ADS)

    Panitkin, Sergey; Fine, Valery; Wenaus, Torre

    2012-12-01

    In the past two years the ATLAS Collaboration at the LHC has collected a large volume of data and published a number of ground breaking papers. The Grid-based ATLAS distributed computing infrastructure played a crucial role in enabling timely analysis of the data. We will present a study of the performance and usage of the ATLAS Grid as platform for physics analysis in 2011. This includes studies of general properties as well as timing properties of user jobs (wait time, run time, etc). These studies are based on mining of data archived by the PanDA workload management system.

  20. Modeling the probability distribution of peak discharge for infiltrating hillslopes

    NASA Astrophysics Data System (ADS)

    Baiamonte, Giorgio; Singh, Vijay P.

    2017-07-01

    Hillslope response plays a fundamental role in the prediction of peak discharge at the basin outlet. The peak discharge for the critical duration of rainfall and its probability distribution are needed for designing urban infrastructure facilities. This study derives the probability distribution, denoted as GABS model, by coupling three models: (1) the Green-Ampt model for computing infiltration, (2) the kinematic wave model for computing discharge hydrograph from the hillslope, and (3) the intensity-duration-frequency (IDF) model for computing design rainfall intensity. The Hortonian mechanism for runoff generation is employed for computing the surface runoff hydrograph. Since the antecedent soil moisture condition (ASMC) significantly affects the rate of infiltration, its effect on the probability distribution of peak discharge is investigated. Application to a watershed in Sicily, Italy, shows that with the increase of probability, the expected effect of ASMC to increase the maximum discharge diminishes. Only for low values of probability, the critical duration of rainfall is influenced by ASMC, whereas its effect on the peak discharge seems to be less for any probability. For a set of parameters, the derived probability distribution of peak discharge seems to be fitted by the gamma distribution well. Finally, an application to a small watershed, with the aim to test the possibility to arrange in advance the rational runoff coefficient tables to be used for the rational method, and a comparison between peak discharges obtained by the GABS model with those measured in an experimental flume for a loamy-sand soil were carried out.

  1. mGrid: A load-balanced distributed computing environment for the remote execution of the user-defined Matlab code

    PubMed Central

    Karpievitch, Yuliya V; Almeida, Jonas S

    2006-01-01

    Background Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources then are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. Results mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Conclusion Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet. PMID:16539707

  2. mGrid: a load-balanced distributed computing environment for the remote execution of the user-defined Matlab code.

    PubMed

    Karpievitch, Yuliya V; Almeida, Jonas S

    2006-03-15

    Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources then are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet.

  3. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

    PubMed Central

    2014-01-01

    Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification. PMID:24418292

  4. Implementing Computer-Aided Instruction in Distance Education: An Infrastructure. RR/89-06.

    ERIC Educational Resources Information Center

    Kotze, Paula

    The infrastructure required for the implementation of computer aided instruction is described with particular reference to the distance education environment at the University of South Africa. A review of the state of the art of online distance education in the United States and Europe is followed by an outline of the proposed infrastructure for…

  5. Adventures in Private Cloud: Balancing Cost and Capability at the CloudSat Data Processing Center

    NASA Astrophysics Data System (ADS)

    Partain, P.; Finley, S.; Fluke, J.; Haynes, J. M.; Cronk, H. Q.; Miller, S. D.

    2016-12-01

    Since the beginning of the CloudSat Mission in 2006, The CloudSat Data Processing Center (DPC) at the Cooperative Institute for Research in the Atmosphere (CIRA) has been ingesting data from the satellite and other A-Train sensors, producing data products, and distributing them to researchers around the world. The computing infrastructure was specifically designed to fulfill the requirements as specified at the beginning of what nominally was a two-year mission. The environment consisted of servers dedicated to specific processing tasks in a rigid workflow to generate the required products. To the benefit of science and with credit to the mission engineers, CloudSat has lasted well beyond its planned lifetime and is still collecting data ten years later. Over that period requirements of the data processing system have greatly expanded and opportunities for providing value-added services have presented themselves. But while demands on the system have increased, the initial design allowed for very little expansion in terms of scalability and flexibility. The design did change to include virtual machine processing nodes and distributed workflows but infrastructure management was still a time consuming task when system modification was required to run new tests or implement new processes. To address the scalability, flexibility, and manageability of the system Cloud computing methods and technologies are now being employed. The use of a public cloud like Amazon Elastic Compute Cloud or Google Compute Engine was considered but, among other issues, data transfer and storage cost becomes a problem especially when demand fluctuates as a result of reprocessing and the introduction of new products and services. Instead, the existing system was converted to an on premises private Cloud using the OpenStack computing platform and Ceph software defined storage to reap the benefits of the Cloud computing paradigm. This work details the decisions that were made, the benefits that have been realized, the difficulties that were encountered and issues that still exist.

  6. Evolution of the ATLAS distributed computing system during the LHC long shutdown

    NASA Astrophysics Data System (ADS)

    Campana, S.; Atlas Collaboration

    2014-06-01

    The ATLAS Distributed Computing project (ADC) was established in 2007 to develop and operate a framework, following the ATLAS computing model, to enable data storage, processing and bookkeeping on top of the Worldwide LHC Computing Grid (WLCG) distributed infrastructure. ADC development has always been driven by operations and this contributed to its success. The system has fulfilled the demanding requirements of ATLAS, daily consolidating worldwide up to 1 PB of data and running more than 1.5 million payloads distributed globally, supporting almost one thousand concurrent distributed analysis users. Comprehensive automation and monitoring minimized the operational manpower required. The flexibility of the system to adjust to operational needs has been important to the success of the ATLAS physics program. The LHC shutdown in 2013-2015 affords an opportunity to improve the system in light of operational experience and scale it to cope with the demanding requirements of 2015 and beyond, most notably a much higher trigger rate and event pileup. We will describe the evolution of the ADC software foreseen during this period. This includes consolidating the existing Production and Distributed Analysis framework (PanDA) and ATLAS Grid Information System (AGIS), together with the development and commissioning of next generation systems for distributed data management (DDM/Rucio) and production (Prodsys-2). We will explain how new technologies such as Cloud Computing and NoSQL databases, which ATLAS investigated as R&D projects in past years, will be integrated in production. Finally, we will describe more fundamental developments such as breaking job-to-data locality by exploiting storage federations and caches, and event level (rather than file or dataset level) workload engines.

  7. Using Swarming Agents for Scalable Security in Large Network Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crouse, Michael; White, Jacob L.; Fulp, Errin W.

    2011-09-23

    The difficulty of securing computer infrastructures increases as they grow in size and complexity. Network-based security solutions such as IDS and firewalls cannot scale because of exponentially increasing computational costs inherent in detecting the rapidly growing number of threat signatures. Hostbased solutions like virus scanners and IDS suffer similar issues, and these are compounded when enterprises try to monitor these in a centralized manner. Swarm-based autonomous agent systems like digital ants and artificial immune systems can provide a scalable security solution for large network environments. The digital ants approach offers a biologically inspired design where each ant in the virtualmore » colony can detect atoms of evidence that may help identify a possible threat. By assembling the atomic evidences from different ant types the colony may detect the threat. This decentralized approach can require, on average, fewer computational resources than traditional centralized solutions; however there are limits to its scalability. This paper describes how dividing a large infrastructure into smaller managed enclaves allows the digital ant framework to effectively operate in larger environments. Experimental results will show that using smaller enclaves allows for more consistent distribution of agents and results in faster response times.« less

  8. High Resolution Sensing and Control of Urban Water Networks

    NASA Astrophysics Data System (ADS)

    Bartos, M. D.; Wong, B. P.; Kerkez, B.

    2016-12-01

    We present a framework to enable high-resolution sensing, modeling, and control of urban watersheds using (i) a distributed sensor network based on low-cost cellular-enabled motes, (ii) hydraulic models powered by a cloud computing infrastructure, and (iii) automated actuation valves that allow infrastructure to be controlled in real time. This platform initiates two major advances. First, we achieve a high density of measurements in urban environments, with an anticipated 40+ sensors over each urban area of interest. In addition to new measurements, we also illustrate the design and evaluation of a "smart" control system for real-world hydraulic networks. This control system improves water quality and mitigates flooding by using real-time hydraulic models to adaptively control releases from retention basins. We evaluate the potential of this platform through two ongoing deployments: (i) a flood monitoring network in the Dallas-Fort Worth metropolitan area that detects and anticipates floods at the level of individual roadways, and (ii) a real-time hydraulic control system in the city of Ann Arbor, MI—soon to be one of the most densely instrumented urban watersheds in the United States. Through these applications, we demonstrate that distributed sensing and control of water infrastructure can improve flash flood predictions, emergency response, and stormwater contaminant mitigation.

  9. Data Center Consolidation: A Step towards Infrastructure Clouds

    NASA Astrophysics Data System (ADS)

    Winter, Markus

    Application service providers face enormous challenges and rising costs in managing and operating a growing number of heterogeneous system and computing landscapes. Limitations of traditional computing environments force IT decision-makers to reorganize computing resources within the data center, as continuous growth leads to an inefficient utilization of the underlying hardware infrastructure. This paper discusses a way for infrastructure providers to improve data center operations based on the findings of a case study on resource utilization of very large business applications and presents an outlook beyond server consolidation endeavors, transforming corporate data centers into compute clouds.

  10. Elastic Cloud Computing Infrastructures in the Open Cirrus Testbed Implemented via Eucalyptus

    NASA Astrophysics Data System (ADS)

    Baun, Christian; Kunze, Marcel

    Cloud computing realizes the advantages and overcomes some restrictionsof the grid computing paradigm. Elastic infrastructures can easily be createdand managed by cloud users. In order to accelerate the research ondata center management and cloud services the OpenCirrusTM researchtestbed has been started by HP, Intel and Yahoo!. Although commercialcloud offerings are proprietary, Open Source solutions exist in the field ofIaaS with Eucalyptus, PaaS with AppScale and at the applications layerwith Hadoop MapReduce. This paper examines the I/O performance ofcloud computing infrastructures implemented with Eucalyptus in contrastto Amazon S3.

  11. The Earth System Grid Federation (ESGF) Project

    NASA Astrophysics Data System (ADS)

    Carenton-Madiec, Nicolas; Denvil, Sébastien; Greenslade, Mark

    2015-04-01

    The Earth System Grid Federation (ESGF) Peer-to-Peer (P2P) enterprise system is a collaboration that develops, deploys and maintains software infrastructure for the management, dissemination, and analysis of model output and observational data. ESGF's primary goal is to facilitate advancements in Earth System Science. It is an interagency and international effort led by the US Department of Energy (DOE), and co-funded by National Aeronautics and Space Administration (NASA), National Oceanic and Atmospheric Administration (NOAA), National Science Foundation (NSF), Infrastructure for the European Network of Earth System Modelling (IS-ENES) and international laboratories such as the Max Planck Institute for Meteorology (MPI-M) german Climate Computing Centre (DKRZ), the Australian National University (ANU) National Computational Infrastructure (NCI), Institut Pierre-Simon Laplace (IPSL), and the British Atmospheric Data Center (BADC). Its main mission is to support current CMIP5 activities and prepare for future assesments. The ESGF architecture is based on a system of autonomous and distributed nodes, which interoperate through common acceptance of federation protocols and trust agreements. Data is stored at multiple nodes around the world, and served through local data and metadata services. Nodes exchange information about their data holdings and services, trust each other for registering users and establishing access control decisions. The net result is that a user can use a web browser, connect to any node, and seamlessly find and access data throughout the federation. This type of collaborative working organization and distributed architecture context en-lighted the need of integration and testing processes definition to ensure the quality of software releases and interoperability. This presentation will introduce the ESGF project and demonstrate the range of tools and processes that have been set up to support release management activities.

  12. Climate Science's Globally Distributed Infrastructure

    NASA Astrophysics Data System (ADS)

    Williams, D. N.

    2016-12-01

    The Earth System Grid Federation (ESGF) is primarily funded by the Department of Energy's (DOE's) Office of Science (the Office of Biological and Environmental Research [BER] Climate Data Informatics Program and the Office of Advanced Scientific Computing Research Next Generation Network for Science Program), the National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA), and the National Science Foundation (NSF), the European Infrastructure for the European Network for Earth System Modeling (IS-ENES), and the Australian National University (ANU). Support also comes from other U.S. federal and international agencies. The federation works across multiple worldwide data centers and spans seven international network organizations to provide users with the ability to access, analyze, and visualize data using a globally federated collection of networks, computers, and software. Its architecture employs a series of geographically distributed peer nodes that are independently administered and united by common federation protocols and application programming interfaces (APIs). The full ESGF infrastructure has now been adopted by multiple Earth science projects and allows access to petabytes of geophysical data, including the Coupled Model Intercomparison Project (CMIP; output used by the Intergovernmental Panel on Climate Change assessment reports), multiple model intercomparison projects (MIPs; endorsed by the World Climate Research Programme [WCRP]), and the Accelerated Climate Modeling for Energy (ACME; ESGF is included in the overarching ACME workflow process to store model output). ESGF is a successful example of integration of disparate open-source technologies into a cohesive functional system that serves the needs the global climate science community. Data served by ESGF includes not only model output but also observational data from satellites and instruments, reanalysis, and generated images.

  13. The Neurona at Home project: Simulating a large-scale cellular automata brain in a distributed computing environment

    NASA Astrophysics Data System (ADS)

    Acedo, L.; Villanueva-Oller, J.; Moraño, J. A.; Villanueva, R.-J.

    2013-01-01

    The Berkeley Open Infrastructure for Network Computing (BOINC) has become the standard open source solution for grid computing in the Internet. Volunteers use their computers to complete an small part of the task assigned by a dedicated server. We have developed a BOINC project called Neurona@Home whose objective is to simulate a cellular automata random network with, at least, one million neurons. We consider a cellular automata version of the integrate-and-fire model in which excitatory and inhibitory nodes can activate or deactivate neighbor nodes according to a set of probabilistic rules. Our aim is to determine the phase diagram of the model and its behaviour and to compare it with the electroencephalographic signals measured in real brains.

  14. Network and computing infrastructure for scientific applications in Georgia

    NASA Astrophysics Data System (ADS)

    Kvatadze, R.; Modebadze, Z.

    2016-09-01

    Status of network and computing infrastructure and available services for research and education community of Georgia are presented. Research and Educational Networking Association - GRENA provides the following network services: Internet connectivity, network services, cyber security, technical support, etc. Computing resources used by the research teams are located at GRENA and at major state universities. GE-01-GRENA site is included in European Grid infrastructure. Paper also contains information about programs of Learning Center and research and development projects in which GRENA is participating.

  15. Intelligent Agents for the Digital Battlefield

    DTIC Science & Technology

    1998-11-01

    specific outcome of our long term research will be the development of a collaborative agent technology system, CATS , that will provide the underlying...software infrastructure needed to build large, heterogeneous, distributed agent applications. CATS will provide a software environment through which multiple...intelligent agents may interact with other agents, both human and computational. In addition, CATS will contain a number of intelligent agent components that will be useful for a wide variety of applications.

  16. Mock Data Challenge for the MPD/NICA Experiment on the HybriLIT Cluster

    NASA Astrophysics Data System (ADS)

    Gertsenberger, Konstantin; Rogachevsky, Oleg

    2018-02-01

    Simulation of data processing before receiving first experimental data is an important issue in high-energy physics experiments. This article presents the current Event Data Model and the Mock Data Challenge for the MPD experiment at the NICA accelerator complex which uses ongoing simulation studies to exercise in a stress-testing the distributed computing infrastructure and experiment software in the full production environment from simulated data through the physical analysis.

  17. Do Clouds Compute? A Framework for Estimating the Value of Cloud Computing

    NASA Astrophysics Data System (ADS)

    Klems, Markus; Nimis, Jens; Tai, Stefan

    On-demand provisioning of scalable and reliable compute services, along with a cost model that charges consumers based on actual service usage, has been an objective in distributed computing research and industry for a while. Cloud Computing promises to deliver on this objective: consumers are able to rent infrastructure in the Cloud as needed, deploy applications and store data, and access them via Web protocols on a pay-per-use basis. The acceptance of Cloud Computing, however, depends on the ability for Cloud Computing providers and consumers to implement a model for business value co-creation. Therefore, a systematic approach to measure costs and benefits of Cloud Computing is needed. In this paper, we discuss the need for valuation of Cloud Computing, identify key components, and structure these components in a framework. The framework assists decision makers in estimating Cloud Computing costs and to compare these costs to conventional IT solutions. We demonstrate by means of representative use cases how our framework can be applied to real world scenarios.

  18. Applications integration in a hybrid cloud computing environment: modelling and platform

    NASA Astrophysics Data System (ADS)

    Li, Qing; Wang, Ze-yuan; Li, Wei-hua; Li, Jun; Wang, Cheng; Du, Rui-yang

    2013-08-01

    With the development of application services providers and cloud computing, more and more small- and medium-sized business enterprises use software services and even infrastructure services provided by professional information service companies to replace all or part of their information systems (ISs). These information service companies provide applications, such as data storage, computing processes, document sharing and even management information system services as public resources to support the business process management of their customers. However, no cloud computing service vendor can satisfy the full functional IS requirements of an enterprise. As a result, enterprises often have to simultaneously use systems distributed in different clouds and their intra enterprise ISs. Thus, this article presents a framework to integrate applications deployed in public clouds and intra ISs. A run-time platform is developed and a cross-computing environment process modelling technique is also developed to improve the feasibility of ISs under hybrid cloud computing environments.

  19. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less

  20. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149

  1. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less

  2. The EGI-Engage EPOS Competence Center - Interoperating heterogeneous AAI mechanisms and Orchestrating distributed computational resources

    NASA Astrophysics Data System (ADS)

    Bailo, Daniele; Scardaci, Diego; Spinuso, Alessandro; Sterzel, Mariusz; Schwichtenberg, Horst; Gemuend, Andre

    2016-04-01

    The mission of EGI-Engage project [1] is to accelerate the implementation of the Open Science Commons vision, where researchers from all disciplines have easy and open access to the innovative digital services, data, knowledge and expertise they need for collaborative and excellent research. The Open Science Commons is grounded on three pillars: the e-Infrastructure Commons, an ecosystem of services that constitute the foundation layer of distributed infrastructures; the Open Data Commons, where observations, results and applications are increasingly available for scientific research and for anyone to use and reuse; and the Knowledge Commons, in which communities have shared ownership of knowledge, participate in the co-development of software and are technically supported to exploit state-of-the-art digital services. To develop the Knowledge Commons, EGI-Engage is supporting the work of a set of community-specific Competence Centres, with participants from user communities (scientific institutes), National Grid Initiatives (NGIs), technology and service providers. Competence Centres collect and analyse requirements, integrate community-specific applications into state-of-the-art services, foster interoperability across e-Infrastructures, and evolve services through a user-centric development model. One of these Competence Centres is focussed on the European Plate Observing System (EPOS) [2] as representative of the solid earth science communities. EPOS is a pan-European long-term plan to integrate data, software and services from the distributed (and already existing) Research Infrastructures all over Europe, in the domain of the solid earth science. EPOS will enable innovative multidisciplinary research for a better understanding of the Earth's physical and chemical processes that control earthquakes, volcanic eruptions, ground instability and tsunami as well as the processes driving tectonics and Earth's surface dynamics. EPOS will improve our ability to better manage the use of the subsurface of the Earth. EPOS started its Implementation Phase in October 2015 and is now actively working in order to integrate multidisciplinary data into a single e-infrastructure. Multidisciplinary data are organized and governed by the Thematic Core Services (TCS) - European wide organizations and e-Infrastructure providing community specific data and data products - and are driven by various scientific communities encompassing a wide spectrum of Earth science disciplines. TCS data, data products and services will be integrated into the Integrated Core Services (ICS) system, that will ensure their interoperability and access to these services by the scientific community as well as other users within the society. The EPOS competence center (EPOS CC) goal is to tackle two of the main challenges that the ICS are going to face in the near future, by taking advantage of the technical solutions provided by EGI. In order to do this, we will present the two pilot use cases the EGI-EPOS CC is developing: 1) The AAI pilot, dealing with the provision of transparent and homogeneous access to the ICS infrastructure to users owning different kind of credentials (e.g. eduGain, OpenID Connect, X509 certificates etc.). Here the focus is on the mechanisms which allow the credential delegation. 2) The computational pilot, Improve the back-end services of an existing application in the field of Computational Seismology, developed in the context of the EC funded project VERCE. The application allows the processing and the comparison of data resulting from the simulation of seismic wave propagation following a real earthquake and real measurements recorded by seismographs. While the simulation data is produced directly by the users and stored in a Data Management System, the observations need to be pre-staged from institutional data-services, which are maintained by the community itself. This use case aims at exploiting the EGI FedCloud e-infrastructure for Data Intensive analysis and also explores possible interaction with other Common Data Infrastructure initiatives as EUDAT. In the presentation, the state of the art of the two use cases, together with the open challenges and the future application will be discussed. Also, possible integration of EGI solutions with EPOS and other e-infrastructure providers will be considered. [1] EGI-ENGAGE https://www.egi.eu/about/egi-engage/ [2] EPOS http://www.epos-eu.org/

  3. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less

  4. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE PAGES

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    2018-03-19

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less

  5. A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth

    2005-03-15

    The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scalemore » long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front end interfaces provides tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.« less

  6. A network-based distributed, media-rich computing and information environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Phillips, R.L.

    1995-12-31

    Sunrise is a Los Alamos National Laboratory (LANL) project started in October 1993. It is intended to be a prototype National Information Infrastructure development project. A main focus of Sunrise is to tie together enabling technologies (networking, object-oriented distributed computing, graphical interfaces, security, multi-media technologies, and data-mining technologies) with several specific applications. A diverse set of application areas was chosen to ensure that the solutions developed in the project are as generic as possible. Some of the application areas are materials modeling, medical records and image analysis, transportation simulations, and K-12 education. This paper provides a description of Sunrise andmore » a view of the architecture and objectives of this evolving project. The primary objectives of Sunrise are three-fold: (1) To develop common information-enabling tools for advanced scientific research and its applications to industry; (2) To enhance the capabilities of important research programs at the Laboratory; (3) To define a new way of collaboration between computer science and industrially-relevant research.« less

  7. Modeling the Cloud to Enhance Capabilities for Crises and Catastrophe Management

    DTIC Science & Technology

    2016-11-16

    order for cloud computing infrastructures to be successfully deployed in real world scenarios as tools for crisis and catastrophe management, where...Statement of the Problem Studied As cloud computing becomes the dominant computational infrastructure[1] and cloud technologies make a transition to hosting...1. Formulate rigorous mathematical models representing technological capabilities and resources in cloud computing for performance modeling and

  8. Infrastructure Systems for Advanced Computing in E-science applications

    NASA Astrophysics Data System (ADS)

    Terzo, Olivier

    2013-04-01

    In the e-science field are growing needs for having computing infrastructure more dynamic and customizable with a model of use "on demand" that follow the exact request in term of resources and storage capacities. The integration of grid and cloud infrastructure solutions allows us to offer services that can adapt the availability in terms of up scaling and downscaling resources. The main challenges for e-sciences domains will on implement infrastructure solutions for scientific computing that allow to adapt dynamically the demands of computing resources with a strong emphasis on optimizing the use of computing resources for reducing costs of investments. Instrumentation, data volumes, algorithms, analysis contribute to increase the complexity for applications who require high processing power and storage for a limited time and often exceeds the computational resources that equip the majority of laboratories, research Unit in an organization. Very often it is necessary to adapt or even tweak rethink tools, algorithms, and consolidate existing applications through a phase of reverse engineering in order to adapt them to a deployment on Cloud infrastructure. For example, in areas such as rainfall monitoring, meteorological analysis, Hydrometeorology, Climatology Bioinformatics Next Generation Sequencing, Computational Electromagnetic, Radio occultation, the complexity of the analysis raises several issues such as the processing time, the scheduling of tasks of processing, storage of results, a multi users environment. For these reasons, it is necessary to rethink the writing model of E-Science applications in order to be already adapted to exploit the potentiality of cloud computing services through the uses of IaaS, PaaS and SaaS layer. An other important focus is on create/use hybrid infrastructure typically a federation between Private and public cloud, in fact in this way when all resources owned by the organization are all used it will be easy with a federate cloud infrastructure to add some additional resources form the Public cloud for following the needs in term of computational and storage resources and release them where process are finished. Following the hybrid model, the scheduling approach is important for managing both cloud models. Thanks to this model infrastructure every time resources are available for additional request in term of IT capacities that can used "on demand" for a limited time without having to proceed to purchase additional servers.

  9. NIF ICCS network design and loading analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tietbohl, G; Bryant, R

    The National Ignition Facility (NIF) is housed within a large facility about the size of two football fields. The Integrated Computer Control System (ICCS) is distributed throughout this facility and requires the integration of about 40,000 control points and over 500 video sources. This integration is provided by approximately 700 control computers distributed throughout the NIF facility and a network that provides the communication infrastructure. A main control room houses a set of seven computer consoles providing operator access and control of the various distributed front-end processors (FEPs). There are also remote workstations distributed within the facility that allow providemore » operator console functions while personnel are testing and troubleshooting throughout the facility. The operator workstations communicate with the FEPs which implement the localized control and monitoring functions. There are different types of FEPs for the various subsystems being controlled. This report describes the design of the NIF ICCS network and how it meets the traffic loads that will are expected and the requirements of the Sub-System Design Requirements (SSDR's). This document supersedes the earlier reports entitled Analysis of the National Ignition Facility Network, dated November 6, 1996 and The National Ignition Facility Digital Video and Control Network, dated July 9, 1996. For an overview of the ICCS, refer to the document NIF Integrated Computer Controls System Description (NIF-3738).« less

  10. Cloud Infrastructure & Applications - CloudIA

    NASA Astrophysics Data System (ADS)

    Sulistio, Anthony; Reich, Christoph; Doelitzscher, Frank

    The idea behind Cloud Computing is to deliver Infrastructure-as-a-Services and Software-as-a-Service over the Internet on an easy pay-per-use business model. To harness the potentials of Cloud Computing for e-Learning and research purposes, and to small- and medium-sized enterprises, the Hochschule Furtwangen University establishes a new project, called Cloud Infrastructure & Applications (CloudIA). The CloudIA project is a market-oriented cloud infrastructure that leverages different virtualization technologies, by supporting Service-Level Agreements for various service offerings. This paper describes the CloudIA project in details and mentions our early experiences in building a private cloud using an existing infrastructure.

  11. CloudMan as a platform for tool, data, and analysis distribution.

    PubMed

    Afgan, Enis; Chapman, Brad; Taylor, James

    2012-11-27

    Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.

  12. Overview of ATLAS PanDA Workload Management

    NASA Astrophysics Data System (ADS)

    Maeno, T.; De, K.; Wenaus, T.; Nilsson, P.; Stewart, G. A.; Walker, R.; Stradling, A.; Caballero, J.; Potekhin, M.; Smith, D.; ATLAS Collaboration

    2011-12-01

    The Production and Distributed Analysis System (PanDA) plays a key role in the ATLAS distributed computing infrastructure. All ATLAS Monte-Carlo simulation and data reprocessing jobs pass through the PanDA system. We will describe how PanDA manages job execution on the grid using dynamic resource estimation and data replication together with intelligent brokerage in order to meet the scaling and automation requirements of ATLAS distributed computing. PanDA is also the primary ATLAS system for processing user and group analysis jobs, bringing further requirements for quick, flexible adaptation to the rapidly evolving analysis use cases of the early datataking phase, in addition to the high reliability, robustness and usability needed to provide efficient and transparent utilization of the grid for analysis users. We will describe how PanDA meets ATLAS requirements, the evolution of the system in light of operational experience, how the system has performed during the first LHC data-taking phase and plans for the future.

  13. Overview of ATLAS PanDA Workload Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maeno T.; De K.; Wenaus T.

    2011-01-01

    The Production and Distributed Analysis System (PanDA) plays a key role in the ATLAS distributed computing infrastructure. All ATLAS Monte-Carlo simulation and data reprocessing jobs pass through the PanDA system. We will describe how PanDA manages job execution on the grid using dynamic resource estimation and data replication together with intelligent brokerage in order to meet the scaling and automation requirements of ATLAS distributed computing. PanDA is also the primary ATLAS system for processing user and group analysis jobs, bringing further requirements for quick, flexible adaptation to the rapidly evolving analysis use cases of the early datataking phase, in additionmore » to the high reliability, robustness and usability needed to provide efficient and transparent utilization of the grid for analysis users. We will describe how PanDA meets ATLAS requirements, the evolution of the system in light of operational experience, how the system has performed during the first LHC data-taking phase and plans for the future.« less

  14. Cloud access to interoperable IVOA-compliant VOSpace storage

    NASA Astrophysics Data System (ADS)

    Bertocco, S.; Dowler, P.; Gaudet, S.; Major, B.; Pasian, F.; Taffoni, G.

    2018-07-01

    Handling, processing and archiving the huge amount of data produced by the new generation of experiments and instruments in Astronomy and Astrophysics are among the more exciting challenges to address in designing the future data management infrastructures and computing services. We investigated the feasibility of a data management and computation infrastructure, available world-wide, with the aim of merging the FAIR data management provided by IVOA standards with the efficiency and reliability of a cloud approach. Our work involved the Canadian Advanced Network for Astronomy Research (CANFAR) infrastructure and the European EGI federated cloud (EFC). We designed and deployed a pilot data management and computation infrastructure that provides IVOA-compliant VOSpace storage resources and wide access to interoperable federated clouds. In this paper, we detail the main user requirements covered, the technical choices and the implemented solutions and we describe the resulting Hybrid cloud Worldwide infrastructure, its benefits and limitations.

  15. [Development of a secure and cost-effective infrastructure for the access of arbitrary web-based image distribution systems].

    PubMed

    Hackländer, T; Kleber, K; Schneider, H; Demabre, N; Cramer, B M

    2004-08-01

    To build an infrastructure that enables radiologists on-call and external users a teleradiological access to the HTML-based image distribution system inside the hospital via internet. In addition, no investment costs should arise on the user side and the image data should be sent renamed using cryptographic techniques. A pure HTML-based system manages the image distribution inside the hospital, with an open source project extending this system through a secure gateway outside the firewall of the hospital. The gateway handles the communication between the external users and the HTML server within the network of the hospital. A second firewall is installed between the gateway and the external users and builds up a virtual private network (VPN). A connection between the gateway and the external user is only acknowledged if the computers involved authenticate each other via certificates and the external users authenticate via a multi-stage password system. All data are transferred encrypted. External users get only access to images that have been renamed to a pseudonym by means of automated processing before. With an ADSL internet access, external users achieve an image load frequency of 0.4 CT images per second. More than 90 % of the delay during image transfer results from security checks within the firewalls. Data passing the gateway induce no measurable delay. Project goals were realized by means of an infrastructure that works vendor independently with any HTML-based image distribution systems. The requirements of data security were realized using state-of-the-art web techniques. Adequate access and transfer speed lead to a widespread acceptance of the system on the part of external users.

  16. A cyber infrastructure for the SKA Telescope Manager

    NASA Astrophysics Data System (ADS)

    Barbosa, Domingos; Barraca, João. P.; Carvalho, Bruno; Maia, Dalmiro; Gupta, Yashwant; Natarajan, Swaminathan; Le Roux, Gerhard; Swart, Paul

    2016-07-01

    The Square Kilometre Array Telescope Manager (SKA TM) will be responsible for assisting the SKA Operations and Observation Management, carrying out System diagnosis and collecting Monitoring and Control data from the SKA subsystems and components. To provide adequate compute resources, scalability, operation continuity and high availability, as well as strict Quality of Service, the TM cyber-infrastructure (embodied in the Local Infrastructure - LINFRA) consists of COTS hardware and infrastructural software (for example: server monitoring software, host operating system, virtualization software, device firmware), providing a specially tailored Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solution. The TM infrastructure provides services in the form of computational power, software defined networking, power, storage abstractions, and high level, state of the art IaaS and PaaS management interfaces. This cyber platform will be tailored to each of the two SKA Phase 1 telescopes (SKA_MID in South Africa and SKA_LOW in Australia) instances, each presenting different computational and storage infrastructures and conditioned by location. This cyber platform will provide a compute model enabling TM to manage the deployment and execution of its multiple components (observation scheduler, proposal submission tools, MandC components, Forensic tools and several Databases, etc). In this sense, the TM LINFRA is primarily focused towards the provision of isolated instances, mostly resorting to virtualization technologies, while defaulting to bare hardware if specifically required due to performance, security, availability, or other requirement.

  17. The High-Performance Computing and Communications program, the national information infrastructure and health care.

    PubMed Central

    Lindberg, D A; Humphreys, B L

    1995-01-01

    The High-Performance Computing and Communications (HPCC) program is a multiagency federal effort to advance the state of computing and communications and to provide the technologic platform on which the National Information Infrastructure (NII) can be built. The HPCC program supports the development of high-speed computers, high-speed telecommunications, related software and algorithms, education and training, and information infrastructure technology and applications. The vision of the NII is to extend access to high-performance computing and communications to virtually every U.S. citizen so that the technology can be used to improve the civil infrastructure, lifelong learning, energy management, health care, etc. Development of the NII will require resolution of complex economic and social issues, including information privacy. Health-related applications supported under the HPCC program and NII initiatives include connection of health care institutions to the Internet; enhanced access to gene sequence data; the "Visible Human" Project; and test-bed projects in telemedicine, electronic patient records, shared informatics tool development, and image systems. PMID:7614116

  18. 76 FR 20045 - The Ubs Group, a Division Of Ubs Ag, Also Known as Ubs Financial Services, Inc. and/or Ubs-Glb...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-11

    ... Known as Brinson Partners, Inc., Corporate Center Division; Group Technology Infrastructure Services... Division, Group Technology Infrastructure Services, Distributed Systems and Storage Group, Chicago... Infrastructure Services, Distributed Systems and Storage Group have their wages reported under a separate...

  19. Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing

    NASA Technical Reports Server (NTRS)

    Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane

    2012-01-01

    Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. In particular, for disasters, response requires rapid access to large data volumes, substantial storage space and high performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation is of work being conducted by Applied Sciences Program Office at NASA-Stennis Space Center. A prototypical set of image manipulation and transformation processes that incorporate sample Unmanned Airborne System data were developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing of open source software developed process code on a local prototype platform, and then transitioning this code with associated environment requirements into an analogous, but memory and processor enhanced cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar type data processes. Cloud infrastructure service providers were evaluated by taking these locally tested processing functions, and then applying them to a given cloud-enabled infrastructure to assesses and compare environment setup options and enabled technologies. This project reviews findings that were observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.

  20. Evaluation of a grid based molecular dynamics approach for polypeptide simulations.

    PubMed

    Merelli, Ivan; Morra, Giulia; Milanesi, Luciano

    2007-09-01

    Molecular dynamics is very important for biomedical research because it makes possible simulation of the behavior of a biological macromolecule in silico. However, molecular dynamics is computationally rather expensive: the simulation of some nanoseconds of dynamics for a large macromolecule such as a protein takes very long time, due to the high number of operations that are needed for solving the Newton's equations in the case of a system of thousands of atoms. In order to obtain biologically significant data, it is desirable to use high-performance computation resources to perform these simulations. Recently, a distributed computing approach based on replacing a single long simulation with many independent short trajectories has been introduced, which in many cases provides valuable results. This study concerns the development of an infrastructure to run molecular dynamics simulations on a grid platform in a distributed way. The implemented software allows the parallel submission of different simulations that are singularly short but together bring important biological information. Moreover, each simulation is divided into a chain of jobs to avoid data loss in case of system failure and to contain the dimension of each data transfer from the grid. The results confirm that the distributed approach on grid computing is particularly suitable for molecular dynamics simulations thanks to the elevated scalability.

  1. Future Naval Use of COTS Networking Infrastructure

    DTIC Science & Technology

    2009-07-01

    user to benefit from Google’s vast databases and computational resources. Obviously, the ability to harness the full power of the Cloud could be... Computing Impact Findings Action Items Take-Aways Appendices: Pages 54-68 A. Terms of Reference Document B. Sample Definitions of Cloud ...and definition of Cloud Computing . While Cloud Computing is developing in many variations – including Infrastructure as a Service (IaaS), Platform as

  2. The Climate-G Portal: a Grid Enabled Scientifc Gateway for Climate Change

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni

    2010-05-01

    Grid portals are web gateways aiming at concealing the underlying infrastructure through a pervasive, transparent, user-friendly, ubiquitous and seamless access to heterogeneous and geographical spread resources (i.e. storage, computational facilities, services, sensors, network, databases). Definitively they provide an enhanced problem-solving environment able to deal with modern, large scale scientific and engineering problems. Scientific gateways are able to introduce a revolution in the way scientists and researchers organize and carry out their activities. Access to distributed resources, complex workflow capabilities, and community-oriented functionalities are just some of the features that can be provided by such a web-based environment. In the context of the EGEE NA4 Earth Science Cluster, Climate-G is a distributed testbed focusing on climate change research topics. The Euro-Mediterranean Center for Climate Change (CMCC) is actively participating in the testbed providing the scientific gateway (Climate-G Portal) to access to the entire infrastructure. The Climate-G Portal has to face important and critical challenges as well as has to satisfy and address key requirements. In the following, the most relevant ones are presented and discussed. Transparency: the portal has to provide a transparent access to the underlying infrastructure preventing users from dealing with low level details and the complexity of a distributed grid environment. Security: users must be authenticated and authorized on the portal to access and exploit portal functionalities. A wide set of roles is needed to clearly assign the proper one to each user. The access to the computational grid must be completely secured, since the target infrastructure to run jobs is a production grid environment. A security infrastructure (based on X509v3 digital certificates) is strongly needed. Pervasivity and ubiquity: the access to the system must be pervasive and ubiquitous. This is easily true due to the nature of the needed web approach. Usability and simplicity: the portal has to provide simple, high level and user friendly interfaces to ease the access and exploitation of the entire system. Coexistence of general purpose and domain oriented services: along with general purpose services (file transfer, job submission, etc.), the portal has to provide domain based services and functionalities. Subsetting of data, visualization of 2D maps around a virtual globe, delivery of maps through OGC compliant interfaces (i.e. Web Map Service - WMS) are just some examples. Since april 2009, about 70 users (85% coming from the climate change community) got access to the portal. A key challenge of this work is the idea to provide users with an integrated working environment, that is a place where scientists can find huge amount of data, complete metadata support, a wide set of data access services, data visualization and analysis tools, easy access to the underlying grid infrastructure and advanced monitoring interfaces.

  3. Evolution of user analysis on the grid in ATLAS

    NASA Astrophysics Data System (ADS)

    Dewhurst, A.; Legger, F.; ATLAS Collaboration

    2017-10-01

    More than one thousand physicists analyse data collected by the ATLAS experiment at the Large Hadron Collider (LHC) at CERN through 150 computing facilities around the world. Efficient distributed analysis requires optimal resource usage and the interplay of several factors: robust grid and software infrastructures, and system capability to adapt to different workloads. The continuous automatic validation of grid sites and the user support provided by a dedicated team of expert shifters have been proven to provide a solid distributed analysis system for ATLAS users. Typical user workflows on the grid, and their associated metrics, are discussed. Measurements of user job performance and typical requirements are also shown.

  4. PD2P: PanDA Dynamic Data Placement for ATLAS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maeno, T.; De, K.; Panitkin, S.

    2012-12-13

    The PanDA (Production and Distributed Analysis) system plays a key role in the ATLAS distributed computing infrastructure. PanDA is the ATLAS workload management system for processing all Monte-Carlo (MC) simulation and data reprocessing jobs in addition to user and group analysis jobs. The PanDA Dynamic Data Placement (PD2P) system has been developed to cope with difficulties of data placement for ATLAS. We will describe the design of the new system, its performance during the past year of data taking, dramatic improvements it has brought about in the efficient use of storage and processing resources, and plans for the future.

  5. Aspects, Wrappers and Events

    NASA Technical Reports Server (NTRS)

    Filman, Robert E.

    2003-01-01

    This viewgraph presentation provides information on Object Infrastructure Framework (OIF), an Aspect-Oriented Programming (AOP) system. The presentation begins with an introduction to the difficulties and requirements of distributed computing, including functional and non-functional requirements (ilities). The architecture of Distributed Object Technology includes stubs, proxies for implementation objects, and skeletons, proxies for client applications. The key OIF ideas (injecting behavior, annotated communications, thread contexts, and pragma) are discussed. OIF is an AOP mechanism; AOP is centered on: 1) Separate expression of crosscutting concerns; 2) Mechanisms to weave the separate expressions into a unified system. AOP is software engineering technology for separately expressing systematic properties while nevertheless producing running systems that embody these properties.

  6. Modeling and Managing Risk in Billing Infrastructures

    NASA Astrophysics Data System (ADS)

    Baiardi, Fabrizio; Telmon, Claudio; Sgandurra, Daniele

    This paper discusses risk modeling and risk management in information and communications technology (ICT) systems for which the attack impact distribution is heavy tailed (e.g., power law distribution) and the average risk is unbounded. Systems with these properties include billing infrastructures used to charge customers for services they access. Attacks against billing infrastructures can be classified as peripheral attacks and backbone attacks. The goal of a peripheral attack is to tamper with user bills; a backbone attack seeks to seize control of the billing infrastructure. The probability distribution of the overall impact of an attack on a billing infrastructure also has a heavy-tailed curve. This implies that the probability of a massive impact cannot be ignored and that the average impact may be unbounded - thus, even the most expensive countermeasures would be cost effective. Consequently, the only strategy for managing risk is to increase the resilience of the infrastructure by employing redundant components.

  7. Message Efficient Checkpointing and Rollback Recovery in Heterogeneous Mobile Networks

    NASA Astrophysics Data System (ADS)

    Jaggi, Parmeet Kaur; Singh, Awadhesh Kumar

    2016-06-01

    Heterogeneous networks provide an appealing way of expanding the computing capability of mobile networks by combining infrastructure-less mobile ad-hoc networks with the infrastructure-based cellular mobile networks. The nodes in such a network range from low-power nodes to macro base stations and thus, vary greatly in their capabilities such as computation power and battery power. The nodes are susceptible to different types of transient and permanent failures and therefore, the algorithms designed for such networks need to be fault-tolerant. The article presents a checkpointing algorithm for the rollback recovery of mobile hosts in a heterogeneous mobile network. Checkpointing is a well established approach to provide fault tolerance in static and cellular mobile distributed systems. However, the use of checkpointing for fault tolerance in a heterogeneous environment remains to be explored. The proposed protocol is based on the results of zigzag paths and zigzag cycles by Netzer-Xu. Considering the heterogeneity prevalent in the network, an uncoordinated checkpointing technique is employed. Yet, useless checkpoints are avoided without causing a high message overhead.

  8. Optimizing Mexico’s Water Distribution Services

    DTIC Science & Technology

    2011-10-28

    government pursued a decentralization policy in the water distribution infrastructure sector.5 This is evident in Article 115 of the Mexican Constitution ...infrastructure, monitoring water 5 Ibid, 47. 6 Mexican Constitution . http://www.oas.org/juridico...54 Apogee Research International, Ltd., Innovative Financing of Water and Wastewater Infrastructure in the NAFTA Partners: A Focus on

  9. Application of parallel distributed Lagrange multiplier technique to simulate coupled Fluid-Granular flows in pipes with varying Cross-Sectional area

    DOE PAGES

    Kanarska, Yuliya; Walton, Otis

    2015-11-30

    Fluid-granular flows are common phenomena in nature and industry. Here, an efficient computational technique based on the distributed Lagrange multiplier method is utilized to simulate complex fluid-granular flows. Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain, however, Lagrange multiplier constrains are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. The particle–particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEMmore » method with some modifications using the volume of an overlapping region as an input to the contact forces. Here, a parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library.« less

  10. Diamond Eye: a distributed architecture for image data mining

    NASA Astrophysics Data System (ADS)

    Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem

    1999-02-01

    Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.

  11. Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem

    NASA Astrophysics Data System (ADS)

    Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.

    2015-12-01

    Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Subsequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.

  12. A heuristic for efficient data distribution management in distributed simulation

    NASA Astrophysics Data System (ADS)

    Gupta, Pankaj; Guha, Ratan K.

    2005-05-01

    In this paper, we propose an algorithm for reducing the complexity of region matching and efficient multicasting in data distribution management component of High Level Architecture (HLA) Run Time Infrastructure (RTI). The current data distribution management (DDM) techniques rely on computing the intersection between the subscription and update regions. When a subscription region and an update region of different federates overlap, RTI establishes communication between the publisher and the subscriber. It subsequently routes the updates from the publisher to the subscriber. The proposed algorithm computes the update/subscription regions matching for dynamic allocation of multicast group. It provides new multicast routines that exploit the connectivity of federation by communicating updates regarding interactions and routes information only to those federates that require them. The region-matching problem in DDM reduces to clique-covering problem using the connections graph abstraction where the federations represent the vertices and the update/subscribe relations represent the edges. We develop an abstract model based on connection graph for data distribution management. Using this abstract model, we propose a heuristic for solving the region-matching problem of DDM. We also provide complexity analysis of the proposed heuristics.

  13. The Virtual Geophysics Laboratory (VGL): Scientific Workflows Operating Across Organizations and Across Infrastructures

    NASA Astrophysics Data System (ADS)

    Cox, S. J.; Wyborn, L. A.; Fraser, R.; Rankine, T.; Woodcock, R.; Vote, J.; Evans, B.

    2012-12-01

    The Virtual Geophysics Laboratory (VGL) is web portal that provides geoscientists with an integrated online environment that: seamlessly accesses geophysical and geoscience data services from the AuScope national geoscience information infrastructure; loosely couples these data to a variety of gesocience software tools; and provides large scale processing facilities via cloud computing. VGL is a collaboration between CSIRO, Geoscience Australia, National Computational Infrastructure, Monash University, Australian National University and the University of Queensland. The VGL provides a distributed system whereby a user can enter an online virtual laboratory to seamlessly connect to OGC web services for geoscience data. The data is supplied in open standards formats using international standards like GeoSciML. A VGL user uses a web mapping interface to discover and filter the data sources using spatial and attribute filters to define a subset. Once the data is selected the user is not required to download the data. VGL collates the service query information for later in the processing workflow where it will be staged directly to the computing facilities. The combination of deferring data download and access to Cloud computing enables VGL users to access their data at higher resolutions and to undertake larger scale inversions, more complex models and simulations than their own local computing facilities might allow. Inside the Virtual Geophysics Laboratory, the user has access to a library of existing models, complete with exemplar workflows for specific scientific problems based on those models. For example, the user can load a geological model published by Geoscience Australia, apply a basic deformation workflow provided by a CSIRO scientist, and have it run in a scientific code from Monash. Finally the user can publish these results to share with a colleague or cite in a paper. This opens new opportunities for access and collaboration as all the resources (models, code, data, processing) are shared in the one virtual laboratory. VGL provides end users with access to an intuitive, user-centered interface that leverages cloud storage and cloud and cluster processing from both the research communities and commercial suppliers (e.g. Amazon). As the underlying data and information services are agnostic of the scientific domain, they can support many other data types. This fundamental characteristic results in a highly reusable virtual laboratory infrastructure that could also be used for example natural hazards, satellite processing, soil geochemistry, climate modeling, agriculture crop modeling.

  14. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill; Feiereisen, William (Technical Monitor)

    2000-01-01

    The term "Grid" refers to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. The vision for NASN's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks that will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: The scientist / design engineer whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user if the tool designer: The computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. This paper describes the current state of IPG (the operational testbed), the set of capabilities being put into place for the operational prototype IPG, as well as some of the longer term R&D tasks.

  15. Toward a web-based real-time radiation treatment planning system in a cloud computing environment.

    PubMed

    Na, Yong Hum; Suh, Tae-Suk; Kapp, Daniel S; Xing, Lei

    2013-09-21

    To exploit the potential dosimetric advantages of intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT), an in-depth approach is required to provide efficient computing methods. This needs to incorporate clinically related organ specific constraints, Monte Carlo (MC) dose calculations, and large-scale plan optimization. This paper describes our first steps toward a web-based real-time radiation treatment planning system in a cloud computing environment (CCE). The Amazon Elastic Compute Cloud (EC2) with a master node (named m2.xlarge containing 17.1 GB of memory, two virtual cores with 3.25 EC2 Compute Units each, 420 GB of instance storage, 64-bit platform) is used as the backbone of cloud computing for dose calculation and plan optimization. The master node is able to scale the workers on an 'on-demand' basis. MC dose calculation is employed to generate accurate beamlet dose kernels by parallel tasks. The intensity modulation optimization uses total-variation regularization (TVR) and generates piecewise constant fluence maps for each initial beam direction in a distributed manner over the CCE. The optimized fluence maps are segmented into deliverable apertures. The shape of each aperture is iteratively rectified to be a sequence of arcs using the manufacture's constraints. The output plan file from the EC2 is sent to the simple storage service. Three de-identified clinical cancer treatment plans have been studied for evaluating the performance of the new planning platform with 6 MV flattening filter free beams (40 × 40 cm(2)) from the Varian TrueBeam(TM) STx linear accelerator. A CCE leads to speed-ups of up to 14-fold for both dose kernel calculations and plan optimizations in the head and neck, lung, and prostate cancer cases considered in this study. The proposed system relies on a CCE that is able to provide an infrastructure for parallel and distributed computing. The resultant plans from the cloud computing are identical to PC-based IMRT and VMAT plans, confirming the reliability of the cloud computing platform. This cloud computing infrastructure has been established for a radiation treatment planning. It substantially improves the speed of inverse planning and makes future on-treatment adaptive re-planning possible.

  16. Toward a web-based real-time radiation treatment planning system in a cloud computing environment

    NASA Astrophysics Data System (ADS)

    Hum Na, Yong; Suh, Tae-Suk; Kapp, Daniel S.; Xing, Lei

    2013-09-01

    To exploit the potential dosimetric advantages of intensity modulated radiation therapy (IMRT) and volumetric modulated arc therapy (VMAT), an in-depth approach is required to provide efficient computing methods. This needs to incorporate clinically related organ specific constraints, Monte Carlo (MC) dose calculations, and large-scale plan optimization. This paper describes our first steps toward a web-based real-time radiation treatment planning system in a cloud computing environment (CCE). The Amazon Elastic Compute Cloud (EC2) with a master node (named m2.xlarge containing 17.1 GB of memory, two virtual cores with 3.25 EC2 Compute Units each, 420 GB of instance storage, 64-bit platform) is used as the backbone of cloud computing for dose calculation and plan optimization. The master node is able to scale the workers on an ‘on-demand’ basis. MC dose calculation is employed to generate accurate beamlet dose kernels by parallel tasks. The intensity modulation optimization uses total-variation regularization (TVR) and generates piecewise constant fluence maps for each initial beam direction in a distributed manner over the CCE. The optimized fluence maps are segmented into deliverable apertures. The shape of each aperture is iteratively rectified to be a sequence of arcs using the manufacture’s constraints. The output plan file from the EC2 is sent to the simple storage service. Three de-identified clinical cancer treatment plans have been studied for evaluating the performance of the new planning platform with 6 MV flattening filter free beams (40 × 40 cm2) from the Varian TrueBeamTM STx linear accelerator. A CCE leads to speed-ups of up to 14-fold for both dose kernel calculations and plan optimizations in the head and neck, lung, and prostate cancer cases considered in this study. The proposed system relies on a CCE that is able to provide an infrastructure for parallel and distributed computing. The resultant plans from the cloud computing are identical to PC-based IMRT and VMAT plans, confirming the reliability of the cloud computing platform. This cloud computing infrastructure has been established for a radiation treatment planning. It substantially improves the speed of inverse planning and makes future on-treatment adaptive re-planning possible.

  17. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  18. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  19. Application-Level Interoperability Across Grids and Clouds

    NASA Astrophysics Data System (ADS)

    Jha, Shantenu; Luckow, Andre; Merzky, Andre; Erdely, Miklos; Sehgal, Saurabh

    Application-level interoperability is defined as the ability of an application to utilize multiple distributed heterogeneous resources. Such interoperability is becoming increasingly important with increasing volumes of data, multiple sources of data as well as resource types. The primary aim of this chapter is to understand different ways in which application-level interoperability can be provided across distributed infrastructure. We achieve this by (i) using the canonical wordcount application, based on an enhanced version of MapReduce that scales-out across clusters, clouds, and HPC resources, (ii) establishing how SAGA enables the execution of wordcount application using MapReduce and other programming models such as Sphere concurrently, and (iii) demonstrating the scale-out of ensemble-based biomolecular simulations across multiple resources. We show user-level control of the relative placement of compute and data and also provide simple performance measures and analysis of SAGA-MapReduce when using multiple, different, heterogeneous infrastructures concurrently for the same problem instance. Finally, we discuss Azure and some of the system-level abstractions that it provides and show how it is used to support ensemble-based biomolecular simulations.

  20. Anomaly-based intrusion detection for SCADA systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, D.; Usynin, A.; Hines, J. W.

    2006-07-01

    Most critical infrastructure such as chemical processing plants, electrical generation and distribution networks, and gas distribution is monitored and controlled by Supervisory Control and Data Acquisition Systems (SCADA. These systems have been the focus of increased security and there are concerns that they could be the target of international terrorists. With the constantly growing number of internet related computer attacks, there is evidence that our critical infrastructure may also be vulnerable. Researchers estimate that malicious online actions may cause $75 billion at 2007. One of the interesting countermeasures for enhancing information system security is called intrusion detection. This paper willmore » briefly discuss the history of research in intrusion detection techniques and introduce the two basic detection approaches: signature detection and anomaly detection. Finally, it presents the application of techniques developed for monitoring critical process systems, such as nuclear power plants, to anomaly intrusion detection. The method uses an auto-associative kernel regression (AAKR) model coupled with the statistical probability ratio test (SPRT) and applied to a simulated SCADA system. The results show that these methods can be generally used to detect a variety of common attacks. (authors)« less

  1. Minimizing Overhead for Secure Computation and Fully Homomorphic Encryption: Overhead

    DTIC Science & Technology

    2015-11-01

    many inputs. We also improved our compiler infrastructure to handle very large circuits in a more scalable way. In Jan’13, we employed the AESNI and...Amazon’s elastic compute infrastructure , and is running under a Xen hypervisor. Since we do not have direct access to the bare metal, we cannot...creating novel opportunities for compressing au- thentication overhead. It is especially compelling that existing public key infrastructures can be used

  2. Architecture of a spatial data service system for statistical analysis and visualization of regional climate changes

    NASA Astrophysics Data System (ADS)

    Titov, A. G.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The use of large geospatial datasets in climate change studies requires the development of a set of Spatial Data Infrastructure (SDI) elements, including geoprocessing and cartographical visualization web services. This paper presents the architecture of a geospatial OGC web service system as an integral part of a virtual research environment (VRE) general architecture for statistical processing and visualization of meteorological and climatic data. The architecture is a set of interconnected standalone SDI nodes with corresponding data storage systems. Each node runs a specialized software, such as a geoportal, cartographical web services (WMS/WFS), a metadata catalog, and a MySQL database of technical metadata describing geospatial datasets available for the node. It also contains geospatial data processing services (WPS) based on a modular computing backend realizing statistical processing functionality and, thus, providing analysis of large datasets with the results of visualization and export into files of standard formats (XML, binary, etc.). Some cartographical web services have been developed in a system’s prototype to provide capabilities to work with raster and vector geospatial data based on OGC web services. The distributed architecture presented allows easy addition of new nodes, computing and data storage systems, and provides a solid computational infrastructure for regional climate change studies based on modern Web and GIS technologies.

  3. An Overview of the Distributed Space Exploration Simulation (DSES) Project

    NASA Technical Reports Server (NTRS)

    Crues, Edwin Z.; Chung, Victoria I.; Blum, Michael G.; Bowman, James D.

    2007-01-01

    This paper describes the Distributed Space Exploration Simulation (DSES) Project, a research and development collaboration between NASA centers which investigates technologies, and processes related to integrated, distributed simulation of complex space systems in support of NASA's Exploration Initiative. In particular, it describes the three major components of DSES: network infrastructure, software infrastructure and simulation development. With regard to network infrastructure, DSES is developing a Distributed Simulation Network for use by all NASA centers. With regard to software, DSES is developing software models, tools and procedures that streamline distributed simulation development and provide an interoperable infrastructure for agency-wide integrated simulation. Finally, with regard to simulation development, DSES is developing an integrated end-to-end simulation capability to support NASA development of new exploration spacecraft and missions. This paper presents the current status and plans for these three areas, including examples of specific simulations.

  4. The EPOS Vision for the Open Science Cloud

    NASA Astrophysics Data System (ADS)

    Jeffery, Keith; Harrison, Matt; Cocco, Massimo

    2016-04-01

    Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time and so CLOUD computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of Cloud Computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications to allow CLOUD middleware to optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use. Already discussions are underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS.. Provision of access to ICS-Ds from ICS-C concerns several aspects: (a) Technical : it may be more or less difficult to connect and pass from ICS-C to the ICS-d/ CES the 'package' (probably a virtual machine) of data and software; (b) Security/privacy : including passing personal information e.g. related to AAAI (Authentication, authorization, accounting Infrastructure); (c) financial and legal : such as payment, licence conditions; Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of federated approach to HPC(High Performance Computing) -HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.

  5. [Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

    PubMed

    Yokohama, Noriya

    2013-07-01

    This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.

  6. 75 FR 70899 - Submission for OMB Review; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-19

    ... submit to the Office of Management and Budget (OMB) for clearance the following proposal for collection... Annual Burden Hours: 2,952. Public Computer Center Reports (Quarterly and Annually) Number of Respondents... specific to Infrastructure and Comprehensive Community Infrastructure, Public Computer Center, and...

  7. Architecture for distributed design and fabrication

    NASA Astrophysics Data System (ADS)

    McIlrath, Michael B.; Boning, Duane S.; Troxel, Donald E.

    1997-01-01

    We describe a flexible, distributed system architecture capable of supporting collaborative design and fabrication of semi-conductor devices and integrated circuits. Such capabilities are of particular importance in the development of new technologies, where both equipment and expertise are limited. Distributed fabrication enables direct, remote, physical experimentation in the development of leading edge technology, where the necessary manufacturing resources are new, expensive, and scarce. Computational resources, software, processing equipment, and people may all be widely distributed; their effective integration is essential in order to achieve the realization of new technologies for specific product requirements. Our architecture leverages is essential in order to achieve the realization of new technologies for specific product requirements. Our architecture leverages current vendor and consortia developments to define software interfaces and infrastructure based on existing and merging networking, CIM, and CAD standards. Process engineers and product designers access processing and simulation results through a common interface and collaborate across the distributed manufacturing environment.

  8. Distribution system model calibration with big data from AMI and PV inverters

    DOE PAGES

    Peppanen, Jouni; Reno, Matthew J.; Broderick, Robert J.; ...

    2016-03-03

    Efficient management and coordination of distributed energy resources with advanced automation schemes requires accurate distribution system modeling and monitoring. Big data from smart meters and photovoltaic (PV) micro-inverters can be leveraged to calibrate existing utility models. This paper presents computationally efficient distribution system parameter estimation algorithms to improve the accuracy of existing utility feeder radial secondary circuit model parameters. The method is demonstrated using a real utility feeder model with advanced metering infrastructure (AMI) and PV micro-inverters, along with alternative parameter estimation approaches that can be used to improve secondary circuit models when limited measurement data is available. Lastly, themore » parameter estimation accuracy is demonstrated for both a three-phase test circuit with typical secondary circuit topologies and single-phase secondary circuits in a real mixed-phase test system.« less

  9. Distribution system model calibration with big data from AMI and PV inverters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peppanen, Jouni; Reno, Matthew J.; Broderick, Robert J.

    Efficient management and coordination of distributed energy resources with advanced automation schemes requires accurate distribution system modeling and monitoring. Big data from smart meters and photovoltaic (PV) micro-inverters can be leveraged to calibrate existing utility models. This paper presents computationally efficient distribution system parameter estimation algorithms to improve the accuracy of existing utility feeder radial secondary circuit model parameters. The method is demonstrated using a real utility feeder model with advanced metering infrastructure (AMI) and PV micro-inverters, along with alternative parameter estimation approaches that can be used to improve secondary circuit models when limited measurement data is available. Lastly, themore » parameter estimation accuracy is demonstrated for both a three-phase test circuit with typical secondary circuit topologies and single-phase secondary circuits in a real mixed-phase test system.« less

  10. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models

    DOE PAGES

    Rao, Nageswara S. V.; Poole, Stephen W.; Ma, Chris Y. T.; ...

    2015-04-06

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical sub-infrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein theirmore » components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. In conclusion, the analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures.« less

  11. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S. V.; Poole, Stephen W.; Ma, Chris Y. T.

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical sub-infrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein theirmore » components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. In conclusion, the analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures.« less

  12. The open black box: The role of the end-user in GIS integration

    USGS Publications Warehouse

    Poore, B.S.

    2003-01-01

    Formalist theories of knowledge that underpin GIS scholarship on integration neglect the importance and creativity of end-users in knowledge construction. This has practical consequences for the success of large distributed databases that contribute to spatial-data infrastructures. Spatial-data infrastructures depend on participation at local levels, such as counties and watersheds, and they must be developed to support feedback from local users. Looking carefully at the work of scientists in a watershed in Puget Sound, Washington, USA during the salmon crisis reveals that the work of these end-users articulates different worlds of knowledge. This view of the user is consonant with recent work in science and technology studies and research into computer-supported cooperative work. GIS theory will be enhanced when it makes room for these users and supports their practical work. ?? / Canadian Association of Geographers.

  13. Approach to sustainable e-Infrastructures - The case of the Latin American Grid

    NASA Astrophysics Data System (ADS)

    Barbera, Roberto; Diacovo, Ramon; Brasileiro, Francisco; Carvalho, Diego; Dutra, Inês; Faerman, Marcio; Gavillet, Philippe; Hoeger, Herbert; Lopez Pourailly, Maria Jose; Marechal, Bernard; Garcia, Rafael Mayo; Neumann Ciuffo, Leandro; Ramos Pollan, Paul; Scardaci, Diego; Stanton, Michael

    2010-05-01

    The EELA (E-Infrastructure shared between Europe and Latin America) and EELA-2 (E-science grid facility for Europe and Latin America) projects, co-funded by the European Commission under FP6 and FP7, respectively, have been successful in building a high capacity, production-quality, scalable Grid Facility for a wide spectrum of applications (e.g. Earth & Life Sciences, High energy physics, etc.) from several European and Latin American User Communities. This paper presents the 4-year experience of EELA and EELA-2 in: • Providing each Member Institution the unique opportunity to benefit of a huge distributed computing platform for its research activities, in particular through initiatives such as OurGrid which proposes a so-called Opportunistic Grid Computing well adapted to small and medium Research Laboratories such as most of those of Latin America and Africa; • Developing a realistic strategy to ensure the long-term continuity of the e-Infrastructure in the Latin American continent, beyond the term of the EELA-2 project, in association with CLARA and collaborating with EGI. Previous interactions between EELA and African Grid members at events such as the IST Africa'07, 08 and 09, the International Conference on Open Access'08 and EuroAfriCa-ICT'08, to which EELA and EELA-2 contributed, have shown that the e-Infrastructure situation in Africa compares well with the Latin American one. This means that African Grids are likely to face the same problems that EELA and EELA-2 experienced, especially in getting the necessary User and Decision Makers support to create NGIs and, later, a possible continent-wide African Grid Initiative (AGI). The hope is that the EELA-2 endeavour towards sustainability as described in this presentation could help the progress of African Grids.

  14. A Primer on High-Throughput Computing for Genomic Selection

    PubMed Central

    Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel

    2011-01-01

    High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provide a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303

  15. CloudMan as a platform for tool, data, and analysis distribution

    PubMed Central

    2012-01-01

    Background Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions. PMID:23181507

  16. Critical infrastructure protection : significant challenges in developing national capabilities

    DOT National Transportation Integrated Search

    2001-04-01

    To address the concerns about protecting the nation's critical computer-dependent infrastructure, this General Accounting Office (GOA) report describes the progress of the National Infrastructure Protection Center (NIPC) in (1) developing national ca...

  17. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  18. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  19. Body area network--a key infrastructure element for patient-centered telemedicine.

    PubMed

    Norgall, Thomas; Schmidt, Robert; von der Grün, Thomas

    2004-01-01

    The Body Area Network (BAN) extends the range of existing wireless network technologies by an ultra-low range, ultra-low power network solution optimised for long-term or continuous healthcare applications. It enables wireless radio communication between several miniaturised, intelligent Body Sensor (or actor) Units (BSU) and a single Body Central Unit (BCU) worn at the human body. A separate wireless transmission link from the BCU to a network access point--using different technology--provides for online access to BAN components via usual network infrastructure. The BAN network protocol maintains dynamic ad-hoc network configuration scenarios and co-existence of multiple networks.BAN is expected to become a basic infrastructure element for electronic health services: By integrating patient-attached sensors and mobile actor units, distributed information and data processing systems, the range of medical workflow can be extended to include applications like wireless multi-parameter patient monitoring and therapy support. Beyond clinical use and professional disease management environments, private personal health assistance scenarios (without financial reimbursement by health agencies / insurance companies) enable a wide range of applications and services in future pervasive computing and networking environments.

  20. Evolution of the Virtualized HPC Infrastructure of Novosibirsk Scientific Center

    NASA Astrophysics Data System (ADS)

    Adakin, A.; Anisenkov, A.; Belov, S.; Chubarov, D.; Kalyuzhny, V.; Kaplin, V.; Korol, A.; Kuchin, N.; Lomakin, S.; Nikultsev, V.; Skovpen, K.; Sukharev, A.; Zaytsev, A.

    2012-12-01

    Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of Russian Academy of Sciences including Budker Institute of Nuclear Physics (BINP), Institute of Computational Technologies, and Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG). Since each institute has specific requirements on the architecture of computing farms involved in its research field, currently we've got several computing facilities hosted by NSC institutes, each optimized for a particular set of tasks, of which the largest are the NSU Supercomputer Center, Siberian Supercomputer Center (ICM&MG), and a Grid Computing Facility of BINP. A dedicated optical network with the initial bandwidth of 10 Gb/s connecting these three facilities was built in order to make it possible to share the computing resources among the research communities, thus increasing the efficiency of operating the existing computing facilities and offering a common platform for building the computing infrastructure for future scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technology based on XEN and KVM platforms. This contribution gives a thorough review of the present status and future development prospects for the NSC virtualized computing infrastructure and the experience gained while using it for running production data analysis jobs related to HEP experiments being carried out at BINP, especially the KEDR detector experiment at the VEPP-4M electron-positron collider.

  1. Sector and Sphere: the design and implementation of a high-performance data cloud

    PubMed Central

    Gu, Yunhong; Grossman, Robert L.

    2009-01-01

    Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. PMID:19451100

  2. The GÉANT network: addressing current and future needs of the HEP community

    NASA Astrophysics Data System (ADS)

    Capone, Vincenzo; Usman, Mian

    2015-12-01

    The GÉANT infrastructure is the backbone that serves the scientific communities in Europe for their data movement needs and their access to international research and education networks. Using the extensive fibre footprint and infrastructure in Europe the GÉANT network delivers a portfolio of services aimed to best fit the specific needs of the users, including Authentication and Authorization Infrastructure, end-to-end performance monitoring, advanced network services (dynamic circuits, L2-L3VPN, MD-VPN). This talk will outline the factors that help the GÉANT network to respond to the needs of the High Energy Physics community, both in Europe and worldwide. The Pan-European network provides the connectivity between 40 European national research and education networks. In addition, GÉANT also connects the European NRENs to the R&E networks in other world region and has reach to over 110 NREN worldwide, making GÉANT the best connected Research and Education network, with its multiple intercontinental links to different continents e.g. North and South America, Africa and Asia-Pacific. The High Energy Physics computational needs have always had (and will keep having) a leading role among the scientific user groups of the GÉANT network: the LHCONE overlay network has been built, in collaboration with the other big world REN, specifically to address the peculiar needs of the LHC data movement. Recently, as a result of a series of coordinated efforts, the LHCONE network has been expanded to the Asia-Pacific area, and is going to include some of the main regional R&E network in the area. The LHC community is not the only one that is actively using a distributed computing model (hence the need for a high-performance network); new communities are arising, as BELLE II. GÉANT is deeply involved also with the BELLE II Experiment, to provide full support to their distributed computing model, along with a perfSONAR-based network monitoring system. GÉANT has also coordinated the setup of the network infrastructure to perform the BELLE II Trans-Atlantic Data Challenge, and has been active on helping the BELLE II community to sort out their end-to-end performance issues. In this talk we will provide information about the current GÉANT network architecture and of the international connectivity, along with the upcoming upgrades and the planned and foreseeable improvements. We will also describe the implementation of the solutions provided to support the LHC and BELLE II experiments.

  3. SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology

    PubMed Central

    Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E.; Troein, Carl; Millar, Andrew J.; Goryanin, Igor; Gilmore, Stephen

    2013-01-01

    Summary: Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI’s use of standard data formats. Availability and implementation: All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials. Contact: stg@inf.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23329415

  4. Grid and Cloud for Developing Countries

    NASA Astrophysics Data System (ADS)

    Petitdidier, Monique

    2014-05-01

    The European Grid e-infrastructure has shown the capacity to connect geographically distributed heterogeneous compute resources in a secure way taking advantages of a robust and fast REN (Research and Education Network). In many countries like in Africa the first step has been to implement a REN and regional organizations like Ubuntunet, WACREN or ASREN to coordinate the development, improvement of the network and its interconnection. The Internet connections are still exploding in those countries. The second step has been to fill up compute needs of the scientists. Even if many of them have their own multi-core or not laptops for more and more applications it is not enough because they have to face intensive computing due to the large amount of data to be processed and/or complex codes. So far one solution has been to go abroad in Europe or in America to run large applications or not to participate to international communities. The Grid is very attractive to connect geographically-distributed heterogeneous resources, aggregate new ones and create new sites on the REN with a secure access. All the users have the same servicers even if they have no resources in their institute. With faster and more robust internet they will be able to take advantage of the European Grid. There are different initiatives to provide resources and training like UNESCO/HP Brain Gain initiative, EUMEDGrid, ..Nowadays Cloud becomes very attractive and they start to be developed in some countries. In this talk challenges for those countries to implement such e-infrastructures, to develop in parallel scientific and technical research and education in the new technologies will be presented illustrated by examples.

  5. Editorial [Special issue on software defined networks and infrastructures, network function virtualisation, autonomous systems and network management

    DOE PAGES

    Biswas, Amitava; Liu, Chen; Monga, Inder; ...

    2016-01-01

    For last few years, there has been a tremendous growth in data traffic due to high adoption rate of mobile devices and cloud computing. Internet of things (IoT) will stimulate even further growth. This is increasing scale and complexity of telecom/internet service provider (SP) and enterprise data centre (DC) compute and network infrastructures. As a result, managing these large network-compute converged infrastructures is becoming complex and cumbersome. To cope up, network and DC operators are trying to automate network and system operations, administrations and management (OAM) functions. OAM includes all non-functional mechanisms which keep the network running.

  6. Resource Aware Intelligent Network Services (RAINS) Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lehman, Tom; Yang, Xi

    The Resource Aware Intelligent Network Services (RAINS) project conducted research and developed technologies in the area of cyber infrastructure resource modeling and computation. The goal of this work was to provide a foundation to enable intelligent, software defined services which spanned the network AND the resources which connect to the network. A Multi-Resource Service Plane (MRSP) was defined, which allows resource owners/managers to locate and place themselves from a topology and service availability perspective within the dynamic networked cyberinfrastructure ecosystem. The MRSP enables the presentation of integrated topology views and computation results which can include resources across the spectrum ofmore » compute, storage, and networks. The RAINS project developed MSRP includes the following key components: i) Multi-Resource Service (MRS) Ontology/Multi-Resource Markup Language (MRML), ii) Resource Computation Engine (RCE), iii) Modular Driver Framework (to allow integration of a variety of external resources). The MRS/MRML is a general and extensible modeling framework that allows for resource owners to model, or describe, a wide variety of resource types. All resources are described using three categories of elements: Resources, Services, and Relationships between the elements. This modeling framework defines a common method for the transformation of cyber infrastructure resources into data in the form of MRML models. In order to realize this infrastructure datification, the RAINS project developed a model based computation system, i.e. “RAINS Computation Engine (RCE)”. The RCE has the ability to ingest, process, integrate, and compute based on automatically generated MRML models. The RCE interacts with the resources thru system drivers which are specific to the type of external network or resource controller. The RAINS project developed a modular and pluggable driver system which facilities a variety of resource controllers to automatically generate, maintain, and distribute MRML based resource descriptions. Once all of the resource topologies are absorbed by the RCE, a connected graph of the full distributed system topology is constructed, which forms the basis for computation and workflow processing. The RCE includes a Modular Computation Element (MCE) framework which allows for tailoring of the computation process to the specific set of resources under control, and the services desired. The input and output of an MCE are both model data based on MRS/MRML ontology and schema. Some of the RAINS project accomplishments include: Development of general and extensible multi-resource modeling framework; Design of a Resource Computation Engine (RCE) system which includes the following key capabilities; Absorb a variety of multi-resource model types and build integrated models; Novel architecture which uses model based communications across the full stack for all Flexible provision of abstract or intent based user facing interfaces; Workflow processing based on model descriptions; Release of the RCE as an open source software; Deployment of RCE in the University of Maryland/Mid-Atlantic Crossroad ScienceDMZ in prototype mode with a plan under way to transition to production; Deployment at the Argonne National Laboratory DTN Facility in prototype mode; Selection of RCE by the DOE SENSE (SDN for End-to-end Networked Science at the Exascale) project as the basis for their orchestration service.« less

  7. Geospatial Applications on Different Parallel and Distributed Systems in enviroGRIDS Project

    NASA Astrophysics Data System (ADS)

    Rodila, D.; Bacu, V.; Gorgan, D.

    2012-04-01

    The execution of Earth Science applications and services on parallel and distributed systems has become a necessity especially due to the large amounts of Geospatial data these applications require and the large geographical areas they cover. The parallelization of these applications comes to solve important performance issues and can spread from task parallelism to data parallelism as well. Parallel and distributed architectures such as Grid, Cloud, Multicore, etc. seem to offer the necessary functionalities to solve important problems in the Earth Science domain: storing, distribution, management, processing and security of Geospatial data, execution of complex processing through task and data parallelism, etc. A main goal of the FP7-funded project enviroGRIDS (Black Sea Catchment Observation and Assessment System supporting Sustainable Development) [1] is the development of a Spatial Data Infrastructure targeting this catchment region but also the development of standardized and specialized tools for storing, analyzing, processing and visualizing the Geospatial data concerning this area. For achieving these objectives, the enviroGRIDS deals with the execution of different Earth Science applications, such as hydrological models, Geospatial Web services standardized by the Open Geospatial Consortium (OGC) and others, on parallel and distributed architecture to maximize the obtained performance. This presentation analysis the integration and execution of Geospatial applications on different parallel and distributed architectures and the possibility of choosing among these architectures based on application characteristics and user requirements through a specialized component. Versions of the proposed platform have been used in enviroGRIDS project on different use cases such as: the execution of Geospatial Web services both on Web and Grid infrastructures [2] and the execution of SWAT hydrological models both on Grid and Multicore architectures [3]. The current focus is to integrate in the proposed platform the Cloud infrastructure, which is still a paradigm with critical problems to be solved despite the great efforts and investments. Cloud computing comes as a new way of delivering resources while using a large set of old as well as new technologies and tools for providing the necessary functionalities. The main challenges in the Cloud computing, most of them identified also in the Open Cloud Manifesto 2009, address resource management and monitoring, data and application interoperability and portability, security, scalability, software licensing, etc. We propose a platform able to execute different Geospatial applications on different parallel and distributed architectures such as Grid, Cloud, Multicore, etc. with the possibility of choosing among these architectures based on application characteristics and complexity, user requirements, necessary performances, cost support, etc. The execution redirection on a selected architecture is realized through a specialized component and has the purpose of offering a flexible way in achieving the best performances considering the existing restrictions.

  8. Cloud computing can simplify HIT infrastructure management.

    PubMed

    Glaser, John

    2011-08-01

    Software as a Service (SaaS), built on cloud computing technology, is emerging as the forerunner in IT infrastructure because it helps healthcare providers reduce capital investments. Cloud computing leads to predictable, monthly, fixed operating expenses for hospital IT staff. Outsourced cloud computing facilities are state-of-the-art data centers boasting some of the most sophisticated networking equipment on the market. The SaaS model helps hospitals safeguard against technology obsolescence, minimizes maintenance requirements, and simplifies management.

  9. CTserver: A Computational Thermodynamics Server for the Geoscience Community

    NASA Astrophysics Data System (ADS)

    Kress, V. C.; Ghiorso, M. S.

    2006-12-01

    The CTserver platform is an Internet-based computational resource that provides on-demand services in Computational Thermodynamics (CT) to a diverse geoscience user base. This NSF-supported resource can be accessed at ctserver.ofm-research.org. The CTserver infrastructure leverages a high-quality and rigorously tested software library of routines for computing equilibrium phase assemblages and for evaluating internally consistent thermodynamic properties of materials, e.g. mineral solid solutions and a variety of geological fluids, including magmas. Thermodynamic models are currently available for 167 phases. Recent additions include Duan, Møller and Weare's model for supercritical C-O-H-S, extended to include SO2 and S2 species, and an entirely new associated solution model for O-S-Fe-Ni sulfide liquids. This software library is accessed via the CORBA Internet protocol for client-server communication. CORBA provides a standardized, object-oriented, language and platform independent, fast, low-bandwidth interface to phase property modules running on the server cluster. Network transport, language translation and resource allocation are handled by the CORBA interface. Users access server functionality in two principal ways. Clients written as browser- based Java applets may be downloaded which provide specific functionality such as retrieval of thermodynamic properties of phases, computation of phase equilibria for systems of specified composition, or modeling the evolution of these systems along some particular reaction path. This level of user interaction requires minimal programming effort and is ideal for classroom use. A more universal and flexible mode of CTserver access involves making remote procedure calls from user programs directly to the server public interface. The CTserver infrastructure relieves the user of the burden of implementing and testing the often complex thermodynamic models of real liquids and solids. A pilot application of this distributed architecture involves CFD computation of magma convection at Volcan Villarrica with magma properties and phase proportions calculated at each spatial node and at each time step via distributed function calls to MELTS-objects executing on the CTserver. Documentation and programming examples are provided at http://ctserver.ofm- research.org.

  10. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo

    Here, computing systems for LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth to mention here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.

  11. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE PAGES

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo; ...

    2017-12-06

    Here, computing systems for LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth to mention here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.

  12. Characterization and classification of vegetation canopy structure and distribution within the Great Smoky Mountains National Park using LiDAR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Jitendra; HargroveJr., William Walter; Norman, Steven P

    Vegetation canopy structure is a critically important habit characteristic for many threatened and endangered birds and other animal species, and it is key information needed by forest and wildlife managers for monitoring and managing forest resources, conservation planning and fostering biodiversity. Advances in Light Detection and Ranging (LiDAR) technologies have enabled remote sensing-based studies of vegetation canopies by capturing three-dimensional structures, yielding information not available in two-dimensional images of the landscape pro- vided by traditional multi-spectral remote sensing platforms. However, the large volume data sets produced by airborne LiDAR instruments pose a significant computational challenge, requiring algorithms to identify andmore » analyze patterns of interest buried within LiDAR point clouds in a computationally efficient manner, utilizing state-of-art computing infrastructure. We developed and applied a computationally efficient approach to analyze a large volume of LiDAR data and to characterize and map the vegetation canopy structures for 139,859 hectares (540 sq. miles) in the Great Smoky Mountains National Park. This study helps improve our understanding of the distribution of vegetation and animal habitats in this extremely diverse ecosystem.« less

  13. An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J.; Lytle, John K. (Technical Monitor)

    2002-01-01

    Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT). This paper discusses the salient features of the NPSS Architecture including its interface layer, object layer, implementation for accessing legacy codes, numerical zooming infrastructure and its computing layer. The computing layer focuses on the use and deployment of these propulsion simulations on parallel and distributed computing platforms which has been the focus of NASA Ames. Additional features of the object oriented architecture that support MultiDisciplinary (MD) Coupling, computer aided design (CAD) access and MD coupling objects will be discussed. Included will be a discussion of the successes, challenges and benefits of implementing this architecture.

  14. A Serviced-based Approach to Connect Seismological Infrastructures: Current Efforts at the IRIS DMC

    NASA Astrophysics Data System (ADS)

    Ahern, Tim; Trabant, Chad

    2014-05-01

    As part of the COOPEUS initiative to build infrastructure that connects European and US research infrastructures, IRIS has advocated for the development of Federated services based upon internationally recognized standards using web services. By deploying International Federation of Digital Seismograph Networks (FDSN) endorsed web services at multiple data centers in the US and Europe, we have shown that integration within seismological domain can be realized. By deploying identical methods to invoke the web services at multiple centers this approach can significantly ease the methods through which a scientist can access seismic data (time series, metadata, and earthquake catalogs) from distributed federated centers. IRIS has developed an IRIS federator that helps a user identify where seismic data from global seismic networks can be accessed. The web services based federator can build the appropriate URLs and return them to client software running on the scientists own computer. These URLs are then used to directly pull data from the distributed center in a very peer-based fashion. IRIS is also involved in deploying web services across horizontal domains. As part of the US National Science Foundation's (NSF) EarthCube effort, an IRIS led EarthCube Building Block's project is underway. When completed this project will aid in the discovery, access, and usability of data across multiple geoscienece domains. This presentation will summarize current IRIS efforts in building vertical integration infrastructure within seismology working closely with 5 centers in Europe and 2 centers in the US, as well as how we are taking first steps toward horizontal integration of data from 14 different domains in the US, in Europe, and around the world.

  15. Biomedical informatics research network: building a national collaboratory to hasten the derivation of new understanding and treatment of disease.

    PubMed

    Grethe, Jeffrey S; Baru, Chaitan; Gupta, Amarnath; James, Mark; Ludaescher, Bertram; Martone, Maryann E; Papadopoulos, Philip M; Peltier, Steven T; Rajasekar, Arcot; Santini, Simone; Zaslavsky, Ilya N; Ellisman, Mark H

    2005-01-01

    Through support from the National Institutes of Health's National Center for Research Resources, the Biomedical Informatics Research Network (BIRN) is pioneering the use of advanced cyberinfrastructure for medical research. By synchronizing developments in advanced wide area networking, distributed computing, distributed database federation, and other emerging capabilities of e-science, the BIRN has created a collaborative environment that is paving the way for biomedical research and clinical information management. The BIRN Coordinating Center (BIRN-CC) is orchestrating the development and deployment of key infrastructure components for immediate and long-range support of biomedical and clinical research being pursued by domain scientists in three neuroimaging test beds.

  16. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.

  17. Critical Infrastructure Protection II, The International Federation for Information Processing, Volume 290.

    NASA Astrophysics Data System (ADS)

    Papa, Mauricio; Shenoi, Sujeet

    The information infrastructure -- comprising computers, embedded devices, networks and software systems -- is vital to day-to-day operations in every sector: information and telecommunications, banking and finance, energy, chemicals and hazardous materials, agriculture, food, water, public health, emergency services, transportation, postal and shipping, government and defense. Global business and industry, governments, indeed society itself, cannot function effectively if major components of the critical information infrastructure are degraded, disabled or destroyed. Critical Infrastructure Protection II describes original research results and innovative applications in the interdisciplinary field of critical infrastructure protection. Also, it highlights the importance of weaving science, technology and policy in crafting sophisticated, yet practical, solutions that will help secure information, computer and network assets in the various critical infrastructure sectors. Areas of coverage include: - Themes and Issues - Infrastructure Security - Control Systems Security - Security Strategies - Infrastructure Interdependencies - Infrastructure Modeling and Simulation This book is the second volume in the annual series produced by the International Federation for Information Processing (IFIP) Working Group 11.10 on Critical Infrastructure Protection, an international community of scientists, engineers, practitioners and policy makers dedicated to advancing research, development and implementation efforts focused on infrastructure protection. The book contains a selection of twenty edited papers from the Second Annual IFIP WG 11.10 International Conference on Critical Infrastructure Protection held at George Mason University, Arlington, Virginia, USA in the spring of 2008.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCaskey, Alexander J.

    Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately the software infrastructure required to enable this is lacking or not available. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-gen computing hardware architectures like quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly Kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.

  19. Scalable collaborative risk management technology for complex critical systems

    NASA Technical Reports Server (NTRS)

    Campbell, Scott; Torgerson, Leigh; Burleigh, Scott; Feather, Martin S.; Kiper, James D.

    2004-01-01

    We describe here our project and plans to develop methods, software tools, and infrastructure tools to address challenges relating to geographically distributed software development. Specifically, this work is creating an infrastructure that supports applications working over distributed geographical and organizational domains and is using this infrastructure to develop a tool that supports project development using risk management and analysis techniques where the participants are not collocated.

  20. Galaxy CloudMan: delivering cloud compute clusters.

    PubMed

    Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James

    2010-12-21

    Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.

  1. Analysis of outcomes in radiation oncology: An integrated computational platform

    PubMed Central

    Liu, Dezhi; Ajlouni, Munther; Jin, Jian-Yue; Ryu, Samuel; Siddiqui, Farzan; Patel, Anushka; Movsas, Benjamin; Chetty, Indrin J.

    2009-01-01

    Radiotherapy research and outcome analyses are essential for evaluating new methods of radiation delivery and for assessing the benefits of a given technology on locoregional control and overall survival. In this article, a computational platform is presented to facilitate radiotherapy research and outcome studies in radiation oncology. This computational platform consists of (1) an infrastructural database that stores patient diagnosis, IMRT treatment details, and follow-up information, (2) an interface tool that is used to import and export IMRT plans in DICOM RT and AAPM/RTOG formats from a wide range of planning systems to facilitate reproducible research, (3) a graphical data analysis and programming tool that visualizes all aspects of an IMRT plan including dose, contour, and image data to aid the analysis of treatment plans, and (4) a software package that calculates radiobiological models to evaluate IMRT treatment plans. Given the limited number of general-purpose computational environments for radiotherapy research and outcome studies, this computational platform represents a powerful and convenient tool that is well suited for analyzing dose distributions biologically and correlating them with the delivered radiation dose distributions and other patient-related clinical factors. In addition the database is web-based and accessible by multiple users, facilitating its convenient application and use. PMID:19544785

  2. Software Reuse Methods to Improve Technological Infrastructure for e-Science

    NASA Technical Reports Server (NTRS)

    Marshall, James J.; Downs, Robert R.; Mattmann, Chris A.

    2011-01-01

    Social computing has the potential to contribute to scientific research. Ongoing developments in information and communications technology improve capabilities for enabling scientific research, including research fostered by social computing capabilities. The recent emergence of e-Science practices has demonstrated the benefits from improvements in the technological infrastructure, or cyber-infrastructure, that has been developed to support science. Cloud computing is one example of this e-Science trend. Our own work in the area of software reuse offers methods that can be used to improve new technological development, including cloud computing capabilities, to support scientific research practices. In this paper, we focus on software reuse and its potential to contribute to the development and evaluation of information systems and related services designed to support new capabilities for conducting scientific research.

  3. An Ensemble-Based Smoother with Retrospectively Updated Weights for Highly Nonlinear Systems

    NASA Technical Reports Server (NTRS)

    Chin, T. M.; Turmon, M. J.; Jewell, J. B.; Ghil, M.

    2006-01-01

    Monte Carlo computational methods have been introduced into data assimilation for nonlinear systems in order to alleviate the computational burden of updating and propagating the full probability distribution. By propagating an ensemble of representative states, algorithms like the ensemble Kalman filter (EnKF) and the resampled particle filter (RPF) rely on the existing modeling infrastructure to approximate the distribution based on the evolution of this ensemble. This work presents an ensemble-based smoother that is applicable to the Monte Carlo filtering schemes like EnKF and RPF. At the minor cost of retrospectively updating a set of weights for ensemble members, this smoother has demonstrated superior capabilities in state tracking for two highly nonlinear problems: the double-well potential and trivariate Lorenz systems. The algorithm does not require retrospective adaptation of the ensemble members themselves, and it is thus suited to a streaming operational mode. The accuracy of the proposed backward-update scheme in estimating non-Gaussian distributions is evaluated by comparison to the more accurate estimates provided by a Markov chain Monte Carlo algorithm.

  4. RAPPORT: running scientific high-performance computing applications on the cloud.

    PubMed

    Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt

    2013-01-28

    Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

  5. AstroGrid-D: Grid technology for astronomical science

    NASA Astrophysics Data System (ADS)

    Enke, Harry; Steinmetz, Matthias; Adorf, Hans-Martin; Beck-Ratzka, Alexander; Breitling, Frank; Brüsemeister, Thomas; Carlson, Arthur; Ensslin, Torsten; Högqvist, Mikael; Nickelt, Iliya; Radke, Thomas; Reinefeld, Alexander; Reiser, Angelika; Scholl, Tobias; Spurzem, Rainer; Steinacker, Jürgen; Voges, Wolfgang; Wambsganß, Joachim; White, Steve

    2011-02-01

    We present status and results of AstroGrid-D, a joint effort of astrophysicists and computer scientists to employ grid technology for scientific applications. AstroGrid-D provides access to a network of distributed machines with a set of commands as well as software interfaces. It allows simple use of computer and storage facilities and to schedule or monitor compute tasks and data management. It is based on the Globus Toolkit middleware (GT4). Chapter 1 describes the context which led to the demand for advanced software solutions in Astrophysics, and we state the goals of the project. We then present characteristic astrophysical applications that have been implemented on AstroGrid-D in chapter 2. We describe simulations of different complexity, compute-intensive calculations running on multiple sites (Section 2.1), and advanced applications for specific scientific purposes (Section 2.2), such as a connection to robotic telescopes (Section 2.2.3). We can show from these examples how grid execution improves e.g. the scientific workflow. Chapter 3 explains the software tools and services that we adapted or newly developed. Section 3.1 is focused on the administrative aspects of the infrastructure, to manage users and monitor activity. Section 3.2 characterises the central components of our architecture: The AstroGrid-D information service to collect and store metadata, a file management system, the data management system, and a job manager for automatic submission of compute tasks. We summarise the successfully established infrastructure in chapter 4, concluding with our future plans to establish AstroGrid-D as a platform of modern e-Astronomy.

  6. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models.

    PubMed

    Rao, Nageswara S V; Poole, Stephen W; Ma, Chris Y T; He, Fei; Zhuang, Jun; Yau, David K Y

    2016-04-01

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities, expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical subinfrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein their components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures, are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. The analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures. © 2015 Society for Risk Analysis.

  7. An infrastructure with a unified control plane to integrate IP into optical metro networks to provide flexible and intelligent bandwidth on demand for cloud computing

    NASA Astrophysics Data System (ADS)

    Yang, Wei; Hall, Trevor

    2012-12-01

    The Internet is entering an era of cloud computing to provide more cost effective, eco-friendly and reliable services to consumer and business users and the nature of the Internet traffic will undertake a fundamental transformation. Consequently, the current Internet will no longer suffice for serving cloud traffic in metro areas. This work proposes an infrastructure with a unified control plane that integrates simple packet aggregation technology with optical express through the interoperation between IP routers and electrical traffic controllers in optical metro networks. The proposed infrastructure provides flexible, intelligent, and eco-friendly bandwidth on demand for cloud computing in metro areas.

  8. Optimisation of the usage of LHC and local computing resources in a multidisciplinary physics department hosting a WLCG Tier-2 centre

    NASA Astrophysics Data System (ADS)

    Barberis, Stefano; Carminati, Leonardo; Leveraro, Franco; Mazza, Simone Michele; Perini, Laura; Perlz, Francesco; Rebatto, David; Tura, Ruggero; Vaccarossa, Luca; Villaplana, Miguel

    2015-12-01

    We present the approach of the University of Milan Physics Department and the local unit of INFN to allow and encourage the sharing among different research areas of computing, storage and networking resources (the largest ones being those composing the Milan WLCG Tier-2 centre and tailored to the needs of the ATLAS experiment). Computing resources are organised as independent HTCondor pools, with a global master in charge of monitoring them and optimising their usage. The configuration has to provide satisfactory throughput for both serial and parallel (multicore, MPI) jobs. A combination of local, remote and cloud storage options are available. The experience of users from different research areas operating on this shared infrastructure is discussed. The promising direction of improving scientific computing throughput by federating access to distributed computing and storage also seems to fit very well with the objectives listed in the European Horizon 2020 framework for research and development.

  9. BelleII@home: Integrate volunteer computing resources into DIRAC in a secure way

    NASA Astrophysics Data System (ADS)

    Wu, Wenjing; Hara, Takanori; Miyake, Hideki; Ueda, Ikuo; Kan, Wenxiao; Urquijo, Phillip

    2017-10-01

    The exploitation of volunteer computing resources has become a popular practice in the HEP computing community as the huge amount of potential computing power it provides. In the recent HEP experiments, the grid middleware has been used to organize the services and the resources, however it relies heavily on the X.509 authentication, which is contradictory to the untrusted feature of volunteer computing resources, therefore one big challenge to utilize the volunteer computing resources is how to integrate them into the grid middleware in a secure way. The DIRAC interware which is commonly used as the major component of the grid computing infrastructure for several HEP experiments proposes an even bigger challenge to this paradox as its pilot is more closely coupled with operations requiring the X.509 authentication compared to the implementations of pilot in its peer grid interware. The Belle II experiment is a B-factory experiment at KEK, and it uses DIRAC for its distributed computing. In the project of BelleII@home, in order to integrate the volunteer computing resources into the Belle II distributed computing platform in a secure way, we adopted a new approach which detaches the payload running from the Belle II DIRAC pilot which is a customized pilot pulling and processing jobs from the Belle II distributed computing platform, so that the payload can run on volunteer computers without requiring any X.509 authentication. In this approach we developed a gateway service running on a trusted server which handles all the operations requiring the X.509 authentication. So far, we have developed and deployed the prototype of BelleII@home, and tested its full workflow which proves the feasibility of this approach. This approach can also be applied on HPC systems whose work nodes do not have outbound connectivity to interact with the DIRAC system in general.

  10. Secure dissemination of electronic healthcare records in distributed wireless environments.

    PubMed

    Belsis, Petros; Vassis, Dimitris; Skourlas, Christos; Pantziou, Grammati

    2008-01-01

    A new networking paradigm has emerged with the appearance of wireless computing. Among else ad-hoc networks, mobile and ubiquitous environments can boost the performance of systems in which they get applied. Among else, medical environments are a convenient example of their applicability. With the utilisation of wireless infrastructures, medical data may be accessible to healthcare practitioners, enabling continuous access to medical data. Due to the critical nature of medical information, the design and implementation of these infrastructures demands special treatment in order to meet specific requirements; among else, special care should be taken in order to manage interoperability, security, and in order to deal with bandwidth and hardware resource constraints that characterize the wireless topology. In this paper we present an architecture that attempts to deal with these issues; moreover, in order to prove the validity of our approach we have also evaluated the performance of our platform through simulation in different operating scenarios.

  11. Effective Team Support: From Modeling to Software Agents

    NASA Technical Reports Server (NTRS)

    Remington, Roger W. (Technical Monitor); John, Bonnie; Sycara, Katia

    2003-01-01

    The purpose of this research contract was to perform multidisciplinary research between CMU psychologists, computer scientists and engineers and NASA researchers to design a next generation collaborative system to support a team of human experts and intelligent agents. To achieve robust performance enhancement of such a system, we had proposed to perform task and cognitive modeling to thoroughly understand the impact technology makes on the organization and on key individual personnel. Guided by cognitively-inspired requirements, we would then develop software agents that support the human team in decision making, information filtering, information distribution and integration to enhance team situational awareness. During the period covered by this final report, we made substantial progress in modeling infrastructure and task infrastructure. Work is continuing under a different contract to complete empirical data collection, cognitive modeling, and the building of software agents to support the teams task.

  12. XRootD popularity on hadoop clusters

    NASA Astrophysics Data System (ADS)

    Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico; CMS Collaboration

    2017-10-01

    Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.

  13. A system to build distributed multivariate models and manage disparate data sharing policies: implementation in the scalable national network for effectiveness research.

    PubMed

    Meeker, Daniella; Jiang, Xiaoqian; Matheny, Michael E; Farcas, Claudiu; D'Arcy, Michel; Pearlman, Laura; Nookala, Lavanya; Day, Michele E; Kim, Katherine K; Kim, Hyeoneui; Boxwala, Aziz; El-Kareh, Robert; Kuo, Grace M; Resnic, Frederic S; Kesselman, Carl; Ohno-Machado, Lucila

    2015-11-01

    Centralized and federated models for sharing data in research networks currently exist. To build multivariate data analysis for centralized networks, transfer of patient-level data to a central computation resource is necessary. The authors implemented distributed multivariate models for federated networks in which patient-level data is kept at each site and data exchange policies are managed in a study-centric manner. The objective was to implement infrastructure that supports the functionality of some existing research networks (e.g., cohort discovery, workflow management, and estimation of multivariate analytic models on centralized data) while adding additional important new features, such as algorithms for distributed iterative multivariate models, a graphical interface for multivariate model specification, synchronous and asynchronous response to network queries, investigator-initiated studies, and study-based control of staff, protocols, and data sharing policies. Based on the requirements gathered from statisticians, administrators, and investigators from multiple institutions, the authors developed infrastructure and tools to support multisite comparative effectiveness studies using web services for multivariate statistical estimation in the SCANNER federated network. The authors implemented massively parallel (map-reduce) computation methods and a new policy management system to enable each study initiated by network participants to define the ways in which data may be processed, managed, queried, and shared. The authors illustrated the use of these systems among institutions with highly different policies and operating under different state laws. Federated research networks need not limit distributed query functionality to count queries, cohort discovery, or independently estimated analytic models. Multivariate analyses can be efficiently and securely conducted without patient-level data transport, allowing institutions with strict local data storage requirements to participate in sophisticated analyses based on federated research networks. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  14. An infrastructure for the integration of geoscience instruments and sensors on the Grid

    NASA Astrophysics Data System (ADS)

    Pugliese, R.; Prica, M.; Kourousias, G.; Del Linz, A.; Curri, A.

    2009-04-01

    The Grid, as a computing paradigm, has long been in the attention of both academia and industry[1]. The distributed and expandable nature of its general architecture result to scalability and more efficient utilisation of the computing infrastructures. The scientific community, including that of geosciences, often handles problems with very high requirements in data processing, transferring, and storing[2,3]. This has raised the interest on Grid technologies but these are often viewed solely as an access gateway to HPC. Suitable Grid infrastructures could provide the geoscience community with additional benefits like those of sharing, remote access and control of scientific systems. These systems can be scientific instruments, sensors, robots, cameras and any other device used in geosciences. The solution for practical, general, and feasible Grid-enabling of such devices requires non-intrusive extensions on core parts of the current Grid architecture. We propose an extended version of an architecture[4] that can serve as the solution to the problem. The solution we propose is called Grid Instrument Element (IE) [5]. It is an addition to the existing core Grid parts; the Computing Element (CE) and the Storage Element (SE) that serve the purposes that their name suggests. The IE that we will be referring to, and the related technologies have been developed in the EU project on the Deployment of Remote Instrumentation Infrastructure (DORII1). In DORII, partners of various scientific communities including those of Earthquake, Environmental science, and Experimental science, have adopted the technology of the Instrument Element in order to integrate to the Grid their devices. The Oceanographic and coastal observation and modelling Mediterranean Ocean Observing Network (OGS2), a DORII partner, is in the process of deploying the above mentioned Grid technologies on two types of observational modules: Argo profiling floats and a novel Autonomous Underwater Vehicle (AUV). In this paper i) we define the need for integration of instrumentation in the Grid, ii) we introduce the solution of the Instrument Element, iii) we demonstrate a suitable end-user web portal for accessing Grid resources, iv) we describe from the Grid-technological point of view the process of the integration to the Grid of two advanced environmental monitoring devices. References [1] M. Surridge, S. Taylor, D. De Roure, and E. Zaluska, "Experiences with GRIA—Industrial Applications on a Web Services Grid," e-Science and Grid Computing, First International Conference on e-Science and Grid Computing, 2005, pp. 98-105. [2] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke, "The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets," Journal of Network and Computer Applications, vol. 23, 2000, pp. 187-200. [3] B. Allcock, J. Bester, J. Bresnahan, A.L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke, "Data management and transfer in high-performance computational grid environments," Parallel Computing, vol. 28, 2002, pp. 749-771. [4] E. Frizziero, M. Gulmini, F. Lelli, G. Maron, A. Oh, S. Orlando, A. Petrucci, S. Squizzato, and S. Traldi, "Instrument Element: A New Grid component that Enables the Control of Remote Instrumentation," Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06)-Volume 00, IEEE Computer Society Washington, DC, USA, 2006. [5] R. Ranon, L. De Marco, A. Senerchia, S. Gabrielli, L. Chittaro, R. Pugliese, L. Del Cano, F. Asnicar, and M. Prica, "A Web-based Tool for Collaborative Access to Scientific Instruments in Cyberinfrastructures." 1 The DORII project is supported by the European Commission within the 7th Framework Programme (FP7/2007-2013) under grant agreement no. RI-213110. URL: http://www.dorii.eu 2 Istituto Nazionale di Oceanografia e di Geofisica Sperimentale. URL: http://www.ogs.trieste.it

  15. Galaxy CloudMan: delivering cloud compute clusters

    PubMed Central

    2010-01-01

    Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983

  16. Conditions for Ubiquitous Computing: What Can Be Learned from a Longitudinal Study

    ERIC Educational Resources Information Center

    Lei, Jing

    2010-01-01

    Based on survey data and interview data collected over four academic years, this longitudinal study examined how a ubiquitous computing project evolved along with the changes in teachers, students, the human infrastructure, and technology infrastructure in the school. This study also investigated what conditions were necessary for successful…

  17. A Security Monitoring Framework For Virtualization Based HEP Infrastructures

    NASA Astrophysics Data System (ADS)

    Gomez Ramirez, A.; Martinez Pedreira, M.; Grigoras, C.; Betev, L.; Lara, C.; Kebschull, U.; ALICE Collaboration

    2017-10-01

    High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequence of system calls for detecting anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that accomplishes these requirements, with a proof of concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves the security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and a big set of malware samples. This malware set was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.

  18. Estuarine wetland evolution including sea-level rise and infrastructure effects.

    NASA Astrophysics Data System (ADS)

    Rodriguez, Jose Fernando; Trivisonno, Franco; Rojas, Steven Sandi; Riccardi, Gerardo; Stenta, Hernan; Saco, Patricia Mabel

    2015-04-01

    Estuarine wetlands are an extremely valuable resource in terms of biotic diversity, flood attenuation, storm surge protection, groundwater recharge, filtering of surface flows and carbon sequestration. On a large scale the survival of these systems depends on the slope of the land and a balance between the rates of accretion and sea-level rise, but local man-made flow disturbances can have comparable effects. Climate change predictions for most of Australia include an accelerated sea level rise, which may challenge the survival of estuarine wetlands. Furthermore, coastal infrastructure poses an additional constraint on the adaptive capacity of these ecosystems. Numerical models are increasingly being used to assess wetland dynamics and to help manage some of these situations. We present results of a wetland evolution model that is based on computed values of hydroperiod and tidal range that drive vegetation preference. Our first application simulates the long term evolution of an Australian wetland heavily constricted by infrastructure that is undergoing the effects of predicted accelerated sea level rise. The wetland presents a vegetation zonation sequence mudflats - mangrove - saltmarsh from the seaward margin and up the topographic gradient but is also affected by compartmentalization due to internal road embankments and culverts that effectively attenuates tidal input to the upstream compartments. For this reason, the evolution model includes a 2D hydrodynamic module which is able to handle man-made flow controls and spatially varying roughness. It continually simulates tidal inputs into the wetland and computes annual values of hydroperiod and tidal range to update vegetation distribution based on preference to hydrodynamic conditions of the different vegetation types. It also computes soil accretion rates and updates roughness coefficient values according to evolving vegetation types. In order to explore in more detail the magnitude of flow attenuation due to roughness and its effects on the computation of tidal range and hydroperiod, we performed numerical experiments simulating floodplain flow on the side of a tidal creek using different roughness values. Even though the values of roughness that produce appreciable changes in hydroperiod and tidal range are relatively high, they are within the range expected for some of the wetland vegetation. Both applications of the model show that flow attenuation can play a major role in wetland hydrodynamics and that its effects must be considered when predicting wetland evolution under climate change scenarios, particularly in situations where existing infrastructure affects the flow.

  19. Data Provenance Hybridization Supporting Extreme-Scale Scientific WorkflowApplications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi

    As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs To gain insight and a more quantitative understanding of a workflow’s performance our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics helping to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) andmore » its hybrid data store combining both of these data provenance perspectives.« less

  20. Federated data storage and management infrastructure

    NASA Astrophysics Data System (ADS)

    Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.

    2016-10-01

    The Large Hadron Collider (LHC)’ operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs of at least orders of magnitude; it will require new approaches in data storage organization and data handling. In our project we address the fundamental problem of designing of architecture to integrate a distributed heterogeneous disk resources for LHC experiments and other data- intensive science applications and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St.-Petersburg and Gatchina, as well as Russian / CERN federation. We have conducted extensive tests of underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including the ones running on supercomputing platform, cloud computing and Grid for ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally to demonstrate our ability to efficiently use federated data storage experiment wide within National Academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bio-informatics.

  1. Air Pollution Monitoring and Mining Based on Sensor Grid in London

    PubMed Central

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-01-01

    In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a two-layer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm. PMID:27879895

  2. Enabling scientific teamwork

    NASA Astrophysics Data System (ADS)

    Hereld, Mark; Hudson, Randy; Norris, John; Papka, Michael E.; Uram, Thomas

    2009-07-01

    The Computer Supported Collaborative Work research community has identified that the technology used to support distributed teams of researchers, such as email, instant messaging, and conferencing environments, are not enough. Building from a list of areas where it is believed technology can help support distributed teams, we have divided our efforts into support of asynchronous and synchronous activities. This paper will describe two of our recent efforts to improve the productivity of distributed science teams. One effort focused on supporting the management and tracking of milestones and results, with the hope of helping manage information overload. The second effort focused on providing an environment that supports real-time analysis of data. Both of these efforts are seen as add-ons to the existing collaborative infrastructure, developed to enhance the experience of teams working at a distance by removing barriers to effective communication.

  3. Air Pollution Monitoring and Mining Based on Sensor Grid in London.

    PubMed

    Ma, Yajie; Richards, Mark; Ghanem, Moustafa; Guo, Yike; Hassard, John

    2008-06-01

    In this paper, we present a distributed infrastructure based on wireless sensors network and Grid computing technology for air pollution monitoring and mining, which aims to develop low-cost and ubiquitous sensor networks to collect real-time, large scale and comprehensive environmental data from road traffic emissions for air pollution monitoring in urban environment. The main informatics challenges in respect to constructing the high-throughput sensor Grid are discussed in this paper. We present a twolayer network framework, a P2P e-Science Grid architecture, and the distributed data mining algorithm as the solutions to address the challenges. We simulated the system in TinyOS to examine the operation of each sensor as well as the networking performance. We also present the distributed data mining result to examine the effectiveness of the algorithm.

  4. Advances in Grid Computing for the FabrIc for Frontier Experiments Project at Fermialb

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herner, K.; Alba Hernandex, A. F.; Bhat, S.

    The FabrIc for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientic Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of diering size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certicate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the projects inception the experiments under the FIFE umbrella have signicantly matured, and present an increasinglymore » complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the eorts of the HEP Cloud projects to provision a variety of back end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana. in helping experiments manage their large-scale production work ows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular work ows, and support troubleshooting and triage in case of problems. Recently a new certicate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specic third-party Certicate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.« less

  5. Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab

    NASA Astrophysics Data System (ADS)

    Herner, K.; Alba Hernandez, A. F.; Bhat, S.; Box, D.; Boyd, J.; Di Benedetto, V.; Ding, P.; Dykstra, D.; Fattoruso, M.; Garzoglio, G.; Kirby, M.; Kreymer, A.; Levshina, T.; Mazzacane, A.; Mengel, M.; Mhashilkar, P.; Podstavkov, V.; Retzke, K.; Sharma, N.; Teheran, J.

    2017-10-01

    The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the projects inception the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana. in helping experiments manage their large-scale production workflows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.

  6. VMEbus based computer and real-time UNIX as infrastructure of DAQ

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yasu, Y.; Fujii, H.; Nomachi, M.

    1994-12-31

    This paper describes what the authors have constructed as the infrastructure of data acquisition system (DAQ). The paper reports recent developments concerned with HP VME board computer with LynxOS (HP742rt/HP-RT) and Alpha/OSF1 with VMEbus adapter. The paper also reports current status of developing a Benchmark Suite for Data Acquisition (DAQBENCH) for measuring not only the performance of VME/CAMAC access but also that of the context switching, the inter-process communications and so on, for various computers including Workstation-based systems and VME board computers.

  7. Development, deployment and operations of ATLAS databases

    NASA Astrophysics Data System (ADS)

    Vaniachine, A. V.; Schmitt, J. G. v. d.

    2008-07-01

    In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on ATLAS multi-grid infrastructure. We describe development and commissioning of major ATLAS database applications for online and offline. We present the first scalability test results and ramp-up schedule over the initial LHC years of operations towards the nominal year of ATLAS running, when the database storage volumes are expected to reach 6.1 TB for the Tag DB and 1.0 TB for the Conditions DB. ATLAS database applications require robust operational infrastructure for data replication between online and offline at Tier-0, and for the distribution of the offline data to Tier-1 and Tier-2 computing centers. We describe ATLAS experience with Oracle Streams and other technologies for coordinated replication of databases in the framework of the WLCG 3D services.

  8. The BaBar Data Reconstruction Control System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ceseracciu, A

    2005-04-20

    The BaBar experiment is characterized by extremely high luminosity and very large volume of data produced and stored, with increasing computing requirements each year. To fulfill these requirements a Control System has been designed and developed for the offline distributed data reconstruction system. The control system described in this paper provides the performance and flexibility needed to manage a large number of small computing farms, and takes full benefit of OO design. The infrastructure is well isolated from the processing layer, it is generic and flexible, based on a light framework providing message passing and cooperative multitasking. The system ismore » distributed in a hierarchical way: the top-level system is organized in farms, farms in services, and services in subservices or code modules. It provides a powerful Finite State Machine framework to describe custom processing models in a simple regular language. This paper describes the design and evolution of this control system, currently in use at SLAC and Padova on {approx}450 CPUs organized in 9 farms.« less

  9. Approaches to a global quantum key distribution network

    NASA Astrophysics Data System (ADS)

    Islam, Tanvirul; Bedington, Robert; Ling, Alexander

    2017-10-01

    Progress in realising quantum computers threatens to weaken existing public key encryption infrastructure. A global quantum key distribution (QKD) network can play a role in computational attack-resistant encryption. Such a network could use a constellation of high altitude platforms such as airships and satellites as trusted nodes to facilitate QKD between any two points on the globe on demand. This requires both space-to-ground and inter-platform links. However, the prohibitive cost of traditional satellite based development limits the experimental work demonstrating relevant technologies. To accelerate progress towards a global network, we use an emerging class of shoe-box sized spacecraft known as CubeSats. We have designed a polarization entangled photon pair source that can operate on board CubeSats. The robustness and miniature form factor of our entanglement source makes it especially suitable for performing pathfinder missions that studies QKD between two high altitude platforms. The technological outcomes of such mission would be the essential building blocks for a global QKD network.

  10. The BaBar Data Reconstruction Control System

    NASA Astrophysics Data System (ADS)

    Ceseracciu, A.; Piemontese, M.; Tehrani, F. S.; Pulliam, T. M.; Galeazzi, F.

    2005-08-01

    The BaBar experiment is characterized by extremely high luminosity and very large volume of data produced and stored, with increasing computing requirements each year. To fulfill these requirements a control system has been designed and developed for the offline distributed data reconstruction system. The control system described in this paper provides the performance and flexibility needed to manage a large number of small computing farms, and takes full benefit of object oriented (OO) design. The infrastructure is well isolated from the processing layer, it is generic and flexible, based on a light framework providing message passing and cooperative multitasking. The system is distributed in a hierarchical way: the top-level system is organized in farms, farms in services, and services in subservices or code modules. It provides a powerful finite state machine framework to describe custom processing models in a simple regular language. This paper describes the design and evolution of this control system, currently in use at SLAC and Padova on /spl sim/450 CPUs organized in nine farms.

  11. Introducing high performance distributed logging service for ACS

    NASA Astrophysics Data System (ADS)

    Avarias, Jorge A.; López, Joao S.; Maureira, Cristián; Sommer, Heiko; Chiozzi, Gianluca

    2010-07-01

    The ALMA Common Software (ACS) is a software framework that provides the infrastructure for the Atacama Large Millimeter Array and other projects. ACS, based on CORBA, offers basic services and common design patterns for distributed software. Every properly built system needs to be able to log status and error information. Logging in a single computer scenario can be as easy as using fprintf statements. However, in a distributed system, it must provide a way to centralize all logging data in a single place without overloading the network nor complicating the applications. ACS provides a complete logging service infrastructure in which every log has an associated priority and timestamp, allowing filtering at different levels of the system (application, service and clients). Currently the ACS logging service uses an implementation of the CORBA Telecom Log Service in a customized way, using only a minimal subset of the features provided by the standard. The most relevant feature used by ACS is the ability to treat the logs as event data that gets distributed over the network in a publisher-subscriber paradigm. For this purpose the CORBA Notification Service, which is resource intensive, is used. On the other hand, the Data Distribution Service (DDS) provides an alternative standard for publisher-subscriber communication for real-time systems, offering better performance and featuring decentralized message processing. The current document describes how the new high performance logging service of ACS has been modeled and developed using DDS, replacing the Telecom Log Service. Benefits and drawbacks are analyzed. A benchmark is presented comparing the differences between the implementations.

  12. Technography and Design-Actuality Gap-Analysis of Internet Computer Technologies-Assisted Education: Western Expectations and Global Education

    ERIC Educational Resources Information Center

    Greenhalgh-Spencer, Heather; Jerbi, Moja

    2017-01-01

    In this paper, we provide a design-actuality gap-analysis of the internet infrastructure that exists in developing nations and nations in the global South with the deployed internet computer technologies (ICT)-assisted programs that are designed to use internet infrastructure to provide educational opportunities. Programs that specifically…

  13. Closing the Gap: Cybersecurity for U.S. Forces and Commands

    DTIC Science & Technology

    2017-03-30

    Dickson, Ph.D. Professor of Military Studies , JAWS Thesis Advisor Kevin Therrien, Col, USAF Committee Member Stephen Rogers, Colonel, USA Director...infrastructures, and includes the Internet, telecommunications networks, computer systems, and embedded processors and controllers in critical industries.”5...of information technology infrastructures, including the Internet, telecommunications networks, computer systems, and embedded processors and

  14. The National Information Infrastructure: Agenda for Action.

    ERIC Educational Resources Information Center

    Department of Commerce, Washington, DC. Information Infrastructure Task Force.

    The National Information Infrastructure (NII) is planned as a web of communications networks, computers, databases, and consumer electronics that will put vast amounts of information at the users' fingertips. Private sector firms are beginning to develop this infrastructure, but essential roles remain for the Federal Government. The National…

  15. The virtual machine (VM) scaler: an infrastructure manager supporting environmental modeling on IaaS clouds

    USDA-ARS?s Scientific Manuscript database

    Infrastructure-as-a-service (IaaS) clouds provide a new medium for deployment of environmental modeling applications. Harnessing advancements in virtualization, IaaS clouds can provide dynamic scalable infrastructure to better support scientific modeling computational demands. Providing scientific m...

  16. Helix Nebula: Enabling federation of existing data infrastructures and data services to an overarching cross-domain e-infrastructure

    NASA Astrophysics Data System (ADS)

    Lengert, Wolfgang; Farres, Jordi; Lanari, Riccardo; Casu, Francesco; Manunta, Michele; Lassalle-Balier, Gerard

    2014-05-01

    Helix Nebula has established a growing public private partnership of more than 30 commercial cloud providers, SMEs, and publicly funded research organisations and e-infrastructures. The Helix Nebula strategy is to establish a federated cloud service across Europe. Three high-profile flagships, sponsored by CERN (high energy physics), EMBL (life sciences) and ESA/DLR/CNES/CNR (earth science), have been deployed and extensively tested within this federated environment. The commitments behind these initial flagships have created a critical mass that attracts suppliers and users to the initiative, to work together towards an "Information as a Service" market place. Significant progress in implementing the following 4 programmatic goals (as outlined in the strategic Plan Ref.1) has been achieved: × Goal #1 Establish a Cloud Computing Infrastructure for the European Research Area (ERA) serving as a platform for innovation and evolution of the overall infrastructure. × Goal #2 Identify and adopt suitable policies for trust, security and privacy on a European-level can be provided by the European Cloud Computing framework and infrastructure. × Goal #3 Create a light-weight governance structure for the future European Cloud Computing Infrastructure that involves all the stakeholders and can evolve over time as the infrastructure, services and user-base grows. × Goal #4 Define a funding scheme involving the three stake-holder groups (service suppliers, users, EC and national funding agencies) into a Public-Private-Partnership model to implement a Cloud Computing Infrastructure that delivers a sustainable business environment adhering to European level policies. Now in 2014 a first version of this generic cross-domain e-infrastructure is ready to go into operations building on federation of European industry and contributors (data, tools, knowledge, ...). This presentation describes how Helix Nebula is being used in the domain of earth science focusing on geohazards. The so called "Supersite Exploitation Platform" (SSEP) provides scientists an overarching federated e-infrastructure with a very fast access to (i) large volume of data (EO/non-space data), (ii) computing resources (e.g. hybrid cloud/grid), (iii) processing software (e.g. toolboxes, RTMs, retrieval baselines, visualization routines), and (iv) general platform capabilities (e.g. user management and access control, accounting, information portal, collaborative tools, social networks etc.). In this federation each data provider remains in full control of the implementation of its data policy. This presentation outlines the Architecture (technical and services) supporting very heterogeneous science domains as well as the procedures for new-comers to join the Helix Nebula Market Place. Ref.1 http://cds.cern.ch/record/1374172/files/CERN-OPEN-2011-036.pdf

  17. Microelectromechanical Systems

    NASA Technical Reports Server (NTRS)

    Gabriel, Kaigham J.

    1995-01-01

    Micro-electromechanical systems (MEMS) is an enabling technology that merges computation and communication with sensing and actuation to change the way people and machines interact with the physical world. MEMS is a manufacturing technology that will impact widespread applications including: miniature inertial measurement measurement units for competent munitions and personal navigation; distributed unattended sensors; mass data storage devices; miniature analytical instruments; embedded pressure sensors; non-invasive biomedical sensors; fiber-optics components and networks; distributed aerodynamic control; and on-demand structural strength. The long term goal of ARPA's MEMS program is to merge information processing with sensing and actuation to realize new systems and strategies for both perceiving and controlling systems, processes, and the environment. The MEMS program has three major thrusts: advanced devices and processes, system design, and infrastructure.

  18. EUROPLANET-RI modelling service for the planetary science community: European Modelling and Data Analysis Facility (EMDAF)

    NASA Astrophysics Data System (ADS)

    Khodachenko, Maxim; Miller, Steven; Stoeckler, Robert; Topf, Florian

    2010-05-01

    Computational modeling and observational data analysis are two major aspects of the modern scientific research. Both appear nowadays under extensive development and application. Many of the scientific goals of planetary space missions require robust models of planetary objects and environments as well as efficient data analysis algorithms, to predict conditions for mission planning and to interpret the experimental data. Europe has great strength in these areas, but it is insufficiently coordinated; individual groups, models, techniques and algorithms need to be coupled and integrated. Existing level of scientific cooperation and the technical capabilities for operative communication, allow considerable progress in the development of a distributed international Research Infrastructure (RI) which is based on the existing in Europe computational modelling and data analysis centers, providing the scientific community with dedicated services in the fields of their computational and data analysis expertise. These services will appear as a product of the collaborative communication and joint research efforts of the numerical and data analysis experts together with planetary scientists. The major goal of the EUROPLANET-RI / EMDAF is to make computational models and data analysis algorithms associated with particular national RIs and teams, as well as their outputs, more readily available to their potential user community and more tailored to scientific user requirements, without compromising front-line specialized research on model and data analysis algorithms development and software implementation. This objective will be met through four keys subdivisions/tasks of EMAF: 1) an Interactive Catalogue of Planetary Models; 2) a Distributed Planetary Modelling Laboratory; 3) a Distributed Data Analysis Laboratory, and 4) enabling Models and Routines for High Performance Computing Grids. Using the advantages of the coordinated operation and efficient communication between the involved computational modelling, research and data analysis expert teams and their related research infrastructures, EMDAF will provide a 1) flexible, 2) scientific user oriented, 3) continuously developing and fast upgrading computational and data analysis service to support and intensify the European planetary scientific research. At the beginning EMDAF will create a set of demonstrators and operational tests of this service in key areas of European planetary science. This work will aim at the following objectives: (a) Development and implementation of tools for distant interactive communication between the planetary scientists and computing experts (including related RIs); (b) Development of standard routine packages, and user-friendly interfaces for operation of the existing numerical codes and data analysis algorithms by the specialized planetary scientists; (c) Development of a prototype of numerical modelling services "on demand" for space missions and planetary researchers; (d) Development of a prototype of data analysis services "on demand" for space missions and planetary researchers; (e) Development of a prototype of coordinated interconnected simulations of planetary phenomena and objects (global multi-model simulators); (f) Providing the demonstrators of a coordinated use of high performance computing facilities (super-computer networks), done in cooperation with European HPC Grid DEISA.

  19. Integrating Xgrid into the HENP distributed computing model

    NASA Astrophysics Data System (ADS)

    Hajdu, L.; Kocoloski, A.; Lauret, J.; Miller, M.

    2008-07-01

    Modern Macintosh computers feature Xgrid, a distributed computing architecture built directly into Apple's OS X operating system. While the approach is radically different from those generally expected by the Unix based Grid infrastructures (Open Science Grid, TeraGrid, EGEE), opportunistic computing on Xgrid is nonetheless a tempting and novel way to assemble a computing cluster with a minimum of additional configuration. In fact, it requires only the default operating system and authentication to a central controller from each node. OS X also implements arbitrarily extensible metadata, allowing an instantly updated file catalog to be stored as part of the filesystem itself. The low barrier to entry allows an Xgrid cluster to grow quickly and organically. This paper and presentation will detail the steps that can be taken to make such a cluster a viable resource for HENP research computing. We will further show how to provide to users a unified job submission framework by integrating Xgrid through the STAR Unified Meta-Scheduler (SUMS), making tasks and jobs submission effortlessly at reach for those users already using the tool for traditional Grid or local cluster job submission. We will discuss additional steps that can be taken to make an Xgrid cluster a full partner in grid computing initiatives, focusing on Open Science Grid integration. MIT's Xgrid system currently supports the work of multiple research groups in the Laboratory for Nuclear Science, and has become an important tool for generating simulations and conducting data analyses at the Massachusetts Institute of Technology.

  20. Autonomic Management of Application Workflows on Hybrid Computing Infrastructure

    DOE PAGES

    Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...

    2011-01-01

    In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints.more » The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different applications objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.« less

  1. Progress on the Fabric for Frontier Experiments Project at Fermilab

    NASA Astrophysics Data System (ADS)

    Box, Dennis; Boyd, Joseph; Dykstra, Dave; Garzoglio, Gabriele; Herner, Kenneth; Kirby, Michael; Kreymer, Arthur; Levshina, Tanya; Mhashilkar, Parag; Sharma, Neha

    2015-12-01

    The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.

  2. 7 CFR 1717.851 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... distribution system means any system of community infrastructure whose primary function is the distribution of... communication system means any system of community infrastructure whose primary function is the provision of... primary function is the supplying of water and/or the collection and treatment of waste water and whose...

  3. 7 CFR 1717.851 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... distribution system means any system of community infrastructure whose primary function is the distribution of... communication system means any system of community infrastructure whose primary function is the provision of... primary function is the supplying of water and/or the collection and treatment of waste water and whose...

  4. 7 CFR 1717.851 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... distribution system means any system of community infrastructure whose primary function is the distribution of... communication system means any system of community infrastructure whose primary function is the provision of... primary function is the supplying of water and/or the collection and treatment of waste water and whose...

  5. 7 CFR 1717.851 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... distribution system means any system of community infrastructure whose primary function is the distribution of... communication system means any system of community infrastructure whose primary function is the provision of... primary function is the supplying of water and/or the collection and treatment of waste water and whose...

  6. 7 CFR 1717.851 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... distribution system means any system of community infrastructure whose primary function is the distribution of... communication system means any system of community infrastructure whose primary function is the provision of... primary function is the supplying of water and/or the collection and treatment of waste water and whose...

  7. Transformation of OODT CAS to Perform Larger Tasks

    NASA Technical Reports Server (NTRS)

    Mattmann, Chris; Freeborn, Dana; Crichton, Daniel; Hughes, John; Ramirez, Paul; Hardman, Sean; Woollard, David; Kelly, Sean

    2008-01-01

    A computer program denoted OODT CAS has been transformed to enable performance of larger tasks that involve greatly increased data volumes and increasingly intensive processing of data on heterogeneous, geographically dispersed computers. Prior to the transformation, OODT CAS (also alternatively denoted, simply, 'CAS') [wherein 'OODT' signifies 'Object-Oriented Data Technology' and 'CAS' signifies 'Catalog and Archive Service'] was a proven software component used to manage scientific data from spaceflight missions. In the transformation, CAS was split into two separate components representing its canonical capabilities: file management and workflow management. In addition, CAS was augmented by addition of a resource-management component. This third component enables CAS to manage heterogeneous computing by use of diverse resources, including high-performance clusters of computers, commodity computing hardware, and grid computing infrastructures. CAS is now more easily maintainable, evolvable, and reusable. These components can be used separately or, taking advantage of synergies, can be used together. Other elements of the transformation included addition of a separate Web presentation layer that supports distribution of data products via Really Simple Syndication (RSS) feeds, and provision for full Resource Description Framework (RDF) exports of metadata.

  8. Spin-Off Successes of SETI Research at Berkeley

    NASA Astrophysics Data System (ADS)

    Douglas, K. A.; Anderson, D. P.; Bankay, R.; Chen, H.; Cobb, J.; Korpela, E. J.; Lebofsky, M.; Parsons, A.; von Korff, J.; Werthimer, D.

    2009-12-01

    Our group contributes to the Search for Extra-Terrestrial Intelligence (SETI) by developing and using world-class signal processing computers to analyze data collected on the Arecibo telescope. Although no patterned signal of extra-terrestrial origin has yet been detected, and the immediate prospects for making such a detection are highly uncertain, the SETI@home project has nonetheless proven the value of pursuing such research through its impact on the fields of distributed computing, real-time signal processing, and radio astronomy. The SETI@home project has spun off the Center for Astronomy Signal Processing and Electronics Research (CASPER) and the Berkeley Open Infrastructure for Networked Computing (BOINC), both of which are responsible for catalyzing a smorgasbord of new research in scientific disciplines in countries around the world. Futhermore, the data collected and archived for the SETI@home project is proving valuable in data-mining experiments for mapping neutral galatic hydrogen and for detecting black-hole evaporation.

  9. Security and Policy for Group Collaboration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ian Foster; Carl Kesselman

    2006-07-31

    “Security and Policy for Group Collaboration” was a Collaboratory Middleware research project aimed at providing the fundamental security and policy infrastructure required to support the creation and operation of distributed, computationally enabled collaborations. The project developed infrastructure that exploits innovative new techniques to address challenging issues of scale, dynamics, distribution, and role. To reduce greatly the cost of adding new members to a collaboration, we developed and evaluated new techniques for creating and managing credentials based on public key certificates, including support for online certificate generation, online certificate repositories, and support for multiple certificate authorities. To facilitate the integration ofmore » new resources into a collaboration, we improved significantly the integration of local security environments. To make it easy to create and change the role and associated privileges of both resources and participants of collaboration, we developed community wide authorization services that provide distributed, scalable means for specifying policy. These services make it possible for the delegation of capability from the community to a specific user, class of user or resource. Finally, we instantiated our research results into a framework that makes it useable to a wide range of collaborative tools. The resulting mechanisms and software have been widely adopted within DOE projects and in many other scientific projects. The widespread adoption of our Globus Toolkit technology has provided, and continues to provide, a natural dissemination and technology transfer vehicle for our results.« less

  10. Grid infrastructure for automatic processing of SAR data for flood applications

    NASA Astrophysics Data System (ADS)

    Kussul, Natalia; Skakun, Serhiy; Shelestov, Andrii

    2010-05-01

    More and more geosciences applications are being put on to the Grids. Due to the complexity of geosciences applications that is caused by complex workflow, the use of computationally intensive environmental models, the need of management and integration of heterogeneous data sets, Grid offers solutions to tackle these problems. Many geosciences applications, especially those related to the disaster management and mitigations require the geospatial services to be delivered in proper time. For example, information on flooded areas should be provided to corresponding organizations (local authorities, civil protection agencies, UN agencies etc.) no more than in 24 h to be able to effectively allocate resources required to mitigate the disaster. Therefore, providing infrastructure and services that will enable automatic generation of products based on the integration of heterogeneous data represents the tasks of great importance. In this paper we present Grid infrastructure for automatic processing of synthetic-aperture radar (SAR) satellite images to derive flood products. In particular, we use SAR data acquired by ESA's ENVSAT satellite, and neural networks to derive flood extent. The data are provided in operational mode from ESA rolling archive (within ESA Category-1 grant). We developed a portal that is based on OpenLayers frameworks and provides access point to the developed services. Through the portal the user can define geographical region and search for the required data. Upon selection of data sets a workflow is automatically generated and executed on the resources of Grid infrastructure. For workflow execution and management we use Karajan language. The workflow of SAR data processing consists of the following steps: image calibration, image orthorectification, image processing with neural networks, topographic effects removal, geocoding and transformation to lat/long projection, and visualisation. These steps are executed by different software, and can be executed by different resources of the Grid system. The resulting geospatial services are available in various OGC standards such as KML and WMS. Currently, the Grid infrastructure integrates the resources of several geographically distributed organizations, in particular: Space Research Institute NASU-NSAU (Ukraine) with deployed computational and storage nodes based on Globus Toolkit 4 (htpp://www.globus.org) and gLite 3 (http://glite.web.cern.ch) middleware, access to geospatial data and a Grid portal; Institute of Cybernetics of NASU (Ukraine) with deployed computational and storage nodes (SCIT-1/2/3 clusters) based on Globus Toolkit 4 middleware and access to computational resources (approximately 500 processors); Center of Earth Observation and Digital Earth Chinese Academy of Sciences (CEODE-CAS, China) with deployed computational nodes based on Globus Toolkit 4 middleware and access to geospatial data (approximately 16 processors). We are currently adding new geospatial services based on optical satellite data, namely MODIS. This work is carried out jointly with the CEODE-CAS. Using workflow patterns that were developed for SAR data processing we are building new workflows for optical data processing.

  11. Analysis of an algorithm for distributed recognition and accountability

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ko, C.; Frincke, D.A.; Goan, T. Jr.

    1993-08-01

    Computer and network systems are available to attacks. Abandoning the existing huge infrastructure of possibly-insecure computer and network systems is impossible, and replacing them by totally secure systems may not be feasible or cost effective. A common element in many attacks is that a single user will often attempt to intrude upon multiple resources throughout a network. Detecting the attack can become significantly easier by compiling and integrating evidence of such intrusion attempts across the network rather than attempting to assess the situation from the vantage point of only a single host. To solve this problem, we suggest an approachmore » for distributed recognition and accountability (DRA), which consists of algorithms which ``process,`` at a central location, distributed and asynchronous ``reports`` generated by computers (or a subset thereof) throughout the network. Our highest-priority objectives are to observe ways by which an individual moves around in a network of computers, including changing user names to possibly hide his/her true identity, and to associate all activities of multiple instance of the same individual to the same network-wide user. We present the DRA algorithm and a sketch of its proof under an initial set of simplifying albeit realistic assumptions. Later, we relax these assumptions to accommodate pragmatic aspects such as missing or delayed ``reports,`` clock slew, tampered ``reports,`` etc. We believe that such algorithms will have widespread applications in the future, particularly in intrusion-detection system.« less

  12. Geospatial-enabled Data Exploration and Computation through Data Infrastructure Building Blocks

    NASA Astrophysics Data System (ADS)

    Song, C. X.; Biehl, L. L.; Merwade, V.; Villoria, N.

    2015-12-01

    Geospatial data are present everywhere today with the proliferation of location-aware computing devices and sensors. This is especially true in the scientific community where large amounts of data are driving research and education activities in many domains. Collaboration over geospatial data, for example, in modeling, data analysis and visualization, must still overcome the barriers of specialized software and expertise among other challenges. The GABBs project aims at enabling broader access to geospatial data exploration and computation by developing spatial data infrastructure building blocks that leverage capabilities of end-to-end application service and virtualized computing framework in HUBzero. Funded by NSF Data Infrastructure Building Blocks (DIBBS) initiative, GABBs provides a geospatial data architecture that integrates spatial data management, mapping and visualization and will make it available as open source. The outcome of the project will enable users to rapidly create tools and share geospatial data and tools on the web for interactive exploration of data without requiring significant software development skills, GIS expertise or IT administrative privileges. This presentation will describe the development of geospatial data infrastructure building blocks and the scientific use cases that help drive the software development, as well as seek feedback from the user communities.

  13. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-06-01

    Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  14. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt

    Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared overmore » the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.« less

  15. EUDAT: A New Cross-Disciplinary Data Infrastructure For Science

    NASA Astrophysics Data System (ADS)

    Lecarpentier, Damien; Michelini, Alberto; Wittenburg, Peter

    2013-04-01

    In recent years significant investments have been made by the European Commission and European member states to create a pan-European e-Infrastructure supporting multiple research communities. As a result, a European e-Infrastructure ecosystem is currently taking shape, with communication networks, distributed grids and HPC facilities providing European researchers from all fields with state-of-the-art instruments and services that support the deployment of new research facilities on a pan-European level. However, the accelerated proliferation of data - newly available from powerful new scientific instruments, simulations and the digitization of existing resources - has created a new impetus for increasing efforts and investments in order to tackle the specific challenges of data management, and to ensure a coherent approach to research data access and preservation. EUDAT is a pan-European initiative that started in October 2011 and which aims to help overcome these challenges by laying out the foundations of a Collaborative Data Infrastructure (CDI) in which centres offering community-specific support services to their users could rely on a set of common data services shared between different research communities. Although research communities from different disciplines have different ambitions and approaches - particularly with respect to data organization and content - they also share many basic service requirements. This commonality makes it possible for EUDAT to establish common data services, designed to support multiple research communities, as part of this CDI. During the first year, EUDAT has been reviewing the approaches and requirements of a first subset of communities from linguistics (CLARIN), solid earth sciences (EPOS), climate sciences (ENES), environmental sciences (LIFEWATCH), and biological and medical sciences (VPH), and shortlisted four generic services to be deployed as shared services on the EUDAT infrastructure. These services are data replication from site to site, data staging to compute facilities, metadata, and easy storage. A number of enabling services such as distributed authentication and authorization, persistent identifiers, hosting of services, workspaces and centre registry were also discussed. The services being designed in EUDAT will thus be of interest to a broad range of communities that lack their own robust data infrastructures, or that are simply looking for additional storage and/or computing capacities to better access, use, re-use, and preserve their data. The first pilots were completed in 2012 and a pre-production ready operational infrastructure, comprised of five sites (RZG, CINECA, SARA, CSC, FZJ), offering 480TB of online storage and 4PB of near-line (tape) storage, initially serving four user communities (ENES, EPOS, CLARIN, VPH) was established. These services shall be available to all communities in a production environment by 2014. Although EUDAT has initially focused on a subset of research communities, it aims to engage with other communities interested in adapting their solutions or contributing to the design of the infrastructure. Discussions with other research communities - belonging to the fields of environmental sciences, biomedical science, physics, social sciences and humanities - have already begun and are following a pattern similar to the one we adopted with the initial communities. The next step will consist of integrating representatives from these communities into the existing pilots and task forces so as to include them in the process of designing the services and, ultimately, shaping the future CDI.

  16. NHERI: Advancing the Research Infrastructure of the Multi-Hazard Community

    NASA Astrophysics Data System (ADS)

    Blain, C. A.; Ramirez, J. A.; Bobet, A.; Browning, J.; Edge, B.; Holmes, W.; Johnson, D.; Robertson, I.; Smith, T.; Zuo, D.

    2017-12-01

    The Natural Hazards Engineering Research Infrastructure (NHERI), supported by the National Science Foundation (NSF), is a distributed, multi-user national facility that provides the natural hazards research community with access to an advanced research infrastructure. Components of NHERI are comprised of a Network Coordination Office (NCO), a cloud-based cyberinfrastructure (DesignSafe-CI), a computational modeling and simulation center (SimCenter), and eight Experimental Facilities (EFs), including a post-disaster, rapid response research facility (RAPID). Utimately NHERI enables researchers to explore and test ground-breaking concepts to protect homes, businesses and infrastructure lifelines from earthquakes, windstorms, tsunamis, and surge enabling innovations to help prevent natural hazards from becoming societal disasters. When coupled with education and community outreach, NHERI will facilitate research and educational advances that contribute knowledge and innovation toward improving the resiliency of the nation's civil infrastructure to withstand natural hazards. The unique capabilities and coordinating activities over Year 1 between NHERI's DesignSafe-CI, the SimCenter, and individual EFs will be presented. Basic descriptions of each component are also found at https://www.designsafe-ci.org/facilities/. Additionally to be discussed are the various roles of the NCO in leading development of a 5-year multi-hazard science plan, coordinating facility scheduling and fostering the sharing of technical knowledge and best practices, leading education and outreach programs such as the recent Summer Institute and multi-facility REU program, ensuring a platform for technology transfer to practicing engineers, and developing strategic national and international partnerships to support a diverse multi-hazard research and user community.

  17. Neuroimaging Study Designs, Computational Analyses and Data Provenance Using the LONI Pipeline

    PubMed Central

    Dinov, Ivo; Lozev, Kamen; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Pierce, Jonathan; Zamanyan, Alen; Chakrapani, Shruthi; Van Horn, John; Parker, D. Stott; Magsipoc, Rico; Leung, Kelvin; Gutman, Boris; Woods, Roger; Toga, Arthur

    2010-01-01

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges—management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu. PMID:20927408

  18. Challenges in Managing Trustworthy Large-scale Digital Science

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.

    2017-12-01

    The increased use of large-scale international digital science has opened a number of challenges for managing, handling, using and preserving scientific information. The large volumes of information are driven by three main categories - model outputs including coupled models and ensembles, data products that have been processing to a level of usability, and increasingly heuristically driven data analysis. These data products are increasingly the ones that are usable by the broad communities, and far in excess of the raw instruments data outputs. The data, software and workflows are then shared and replicated to allow broad use at an international scale, which places further demands of infrastructure to support how the information is managed reliably across distributed resources. Users necessarily rely on these underlying "black boxes" so that they are productive to produce new scientific outcomes. The software for these systems depend on computational infrastructure, software interconnected systems, and information capture systems. This ranges from the fundamentals of the reliability of the compute hardware, system software stacks and libraries, and the model software. Due to these complexities and capacity of the infrastructure, there is an increased emphasis of transparency of the approach and robustness of the methods over the full reproducibility. Furthermore, with large volume data management, it is increasingly difficult to store the historical versions of all model and derived data. Instead, the emphasis is on the ability to access the updated products and the reliability by which both previous outcomes are still relevant and can be updated for the new information. We will discuss these challenges and some of the approaches underway that are being used to address these issues.

  19. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data.

    PubMed

    Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun

    2012-01-01

    Next Generation Sequencing is highly resource intensive. NGS Tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform provides demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.

  20. FDA's Activities Supporting Regulatory Application of "Next Gen" Sequencing Technologies.

    PubMed

    Wilson, Carolyn A; Simonyan, Vahan

    2014-01-01

    Applications of next-generation sequencing (NGS) technologies require availability and access to an information technology (IT) infrastructure and bioinformatics tools for large amounts of data storage and analyses. The U.S. Food and Drug Administration (FDA) anticipates that the use of NGS data to support regulatory submissions will continue to increase as the scientific and clinical communities become more familiar with the technologies and identify more ways to apply these advanced methods to support development and evaluation of new biomedical products. FDA laboratories are conducting research on different NGS platforms and developing the IT infrastructure and bioinformatics tools needed to enable regulatory evaluation of the technologies and the data sponsors will submit. A High-performance Integrated Virtual Environment, or HIVE, has been launched, and development and refinement continues as a collaborative effort between the FDA and George Washington University to provide the tools to support these needs. The use of a highly parallelized environment facilitated by use of distributed cloud storage and computation has resulted in a platform that is both rapid and responsive to changing scientific needs. The FDA plans to further develop in-house capacity in this area, while also supporting engagement by the external community, by sponsoring an open, public workshop to discuss NGS technologies and data formats standardization, and to promote the adoption of interoperability protocols in September 2014. Next-generation sequencing (NGS) technologies are enabling breakthroughs in how the biomedical community is developing and evaluating medical products. One example is the potential application of this method to the detection and identification of microbial contaminants in biologic products. In order for the U.S. Food and Drug Administration (FDA) to be able to evaluate the utility of this technology, we need to have the information technology infrastructure and bioinformatics tools to be able to store and analyze large amounts of data. To address this need, we have developed the High-performance Integrated Virtual Environment, or HIVE. HIVE uses a combination of distributed cloud storage and distributed cloud computations to provide a platform that is both rapid and responsive to support the growing and increasingly diverse scientific and regulatory needs of FDA scientists in their evaluation of NGS in research and ultimately for evaluation of NGS data in regulatory submissions. © PDA, Inc. 2014.

  1. Developing an Index to Measure Health System Performance: Measurement for Districts of Nepal.

    PubMed

    Kandel, N; Fric, A; Lamichhane, J

    2014-01-01

    Various frameworks for measuring health system performance have been proposed and discussed. The scope of using performance indicators are broad, ranging from examining national health system to individual patients at various levels of health system. Development of innovative and easy index is essential to measure multidimensionality of health systems. We used indicators, which also serve as proxy to the set of activities, whose primary goal is to maintain and improve health. We used eleven indicators of MDGs, which represent all dimensions of health to develop index. These indicators are computed with similar methodology that of human development index. We used published data of Nepal for computation of the index for districts of Nepal as an illustration. To validate our finding, we compared the indices of these districts with other development indices of Nepal. An index for each district has been computed from eleven indicators. Then indices are compared with that of human development index, socio-economic and infrastructure development indices and findings has shown the similarity on distribution of districts. Categories of low and high performing districts on health system performance are also having low and high human development, socio-economic, and infrastructure indices respectively. This methodology of computing index from various indicators could assist policy makers and program managers to prioritize activities based on their performance. Validation of the findings with that of other development indicators show that this can be one of the tools, which can assist on assessing health system performance for policy makers, program managers and others.

  2. Computational simulation of concurrent engineering for aerospace propulsion systems

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Singhal, S. N.

    1992-01-01

    Results are summarized of an investigation to assess the infrastructure available and the technology readiness in order to develop computational simulation methods/software for concurrent engineering. These results demonstrate that development of computational simulations methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties - fundamental in developing such methods, is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering for propulsion systems and systems in general. Benefits and facets needing early attention in the development are outlined.

  3. Computational simulation for concurrent engineering of aerospace propulsion systems

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Singhal, S. N.

    1993-01-01

    Results are summarized for an investigation to assess the infrastructure available and the technology readiness in order to develop computational simulation methods/software for concurrent engineering. These results demonstrate that development of computational simulation methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties--fundamental to develop such methods, is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering of propulsion systems and systems in general. Benefits and issues needing early attention in the development are outlined.

  4. Computational simulation for concurrent engineering of aerospace propulsion systems

    NASA Astrophysics Data System (ADS)

    Chamis, C. C.; Singhal, S. N.

    1993-02-01

    Results are summarized for an investigation to assess the infrastructure available and the technology readiness in order to develop computational simulation methods/software for concurrent engineering. These results demonstrate that development of computational simulation methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties--fundamental to develop such methods, is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering of propulsion systems and systems in general. Benefits and issues needing early attention in the development are outlined.

  5. The OSG open facility: A sharing ecosystem

    DOE PAGES

    Jayatilaka, B.; Levshina, T.; Rynge, M.; ...

    2015-12-23

    The Open Science Grid (OSG) ties together individual experiments’ computing power, connecting their resources to create a large, robust computing grid, this computing infrastructure started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero. In the years since, the OSG has broadened its focus to also address the needs of other US researchers and increased delivery of Distributed High Through-put Computing (DHTC) to users from a wide variety of disciplines via the OSG Open Facility. Presently, the Open Facility delivers about 100 million computing wall hours per year to researchers whomore » are not already associated with the owners of the computing sites, this is primarily accomplished by harvesting and organizing the temporarily unused capacity (i.e. opportunistic cycles) from the sites in the OSG. Using these methods, OSG resource providers and scientists share computing hours with researchers in many other fields to enable their science, striving to make sure that these computing power used with maximal efficiency. Furthermore, we believe that expanded access to DHTC is an essential tool for scientific innovation and work continues in expanding this service.« less

  6. Computational Science in Armenia (Invited Talk)

    NASA Astrophysics Data System (ADS)

    Marandjian, H.; Shoukourian, Yu.

    This survey is devoted to the development of informatics and computer science in Armenia. The results in theoretical computer science (algebraic models, solutions to systems of general form recursive equations, the methods of coding theory, pattern recognition and image processing), constitute the theoretical basis for developing problem-solving-oriented environments. As examples can be mentioned: a synthesizer of optimized distributed recursive programs, software tools for cluster-oriented implementations of two-dimensional cellular automata, a grid-aware web interface with advanced service trading for linear algebra calculations. In the direction of solving scientific problems that require high-performance computing resources, examples of completed projects include the field of physics (parallel computing of complex quantum systems), astrophysics (Armenian virtual laboratory), biology (molecular dynamics study of human red blood cell membrane), meteorology (implementing and evaluating the Weather Research and Forecast Model for the territory of Armenia). The overview also notes that the Institute for Informatics and Automation Problems of the National Academy of Sciences of Armenia has established a scientific and educational infrastructure, uniting computing clusters of scientific and educational institutions of the country and provides the scientific community with access to local and international computational resources, that is a strong support for computational science in Armenia.

  7. GSKY: A scalable distributed geospatial data server on the cloud

    NASA Astrophysics Data System (ADS)

    Rozas Larraondo, Pablo; Pringle, Sean; Antony, Joseph; Evans, Ben

    2017-04-01

    Earth systems, environmental and geophysical datasets are an extremely valuable sources of information about the state and evolution of the Earth. Being able to combine information coming from different geospatial collections is in increasing demand by the scientific community, and requires managing and manipulating data with different formats and performing operations such as map reprojections, resampling and other transformations. Due to the large data volume inherent in these collections, storing multiple copies of them is unfeasible and so such data manipulation must be performed on-the-fly using efficient, high performance techniques. Ideally this should be performed using a trusted data service and common system libraries to ensure wide use and reproducibility. Recent developments in distributed computing based on dynamic access to significant cloud infrastructure opens the door for such new ways of processing geospatial data on demand. The National Computational Infrastructure (NCI), hosted at the Australian National University (ANU), has over 10 Petabytes of nationally significant research data collections. Some of these collections, which comprise a variety of observed and modelled geospatial data, are now made available via a highly distributed geospatial data server, called GSKY (pronounced [jee-skee]). GSKY supports on demand processing of large geospatial data products such as satellite earth observation data as well as numerical weather products, allowing interactive exploration and analysis of the data. It dynamically and efficiently distributes the required computations among cloud nodes providing a scalable analysis framework that can adapt to serve large number of concurrent users. Typical geospatial workflows handling different file formats and data types, or blending data in different coordinate projections and spatio-temporal resolutions, is handled transparently by GSKY. This is achieved by decoupling the data ingestion and indexing process as an independent service. An indexing service crawls data collections either locally or remotely by extracting, storing and indexing all spatio-temporal metadata associated with each individual record. GSKY provides the user with the ability of specifying how ingested data should be aggregated, transformed and presented. It presents an OGC standards-compliant interface, allowing ready accessibility for users of the data via Web Map Services (WMS), Web Processing Services (WPS) or raw data arrays using Web Coverage Services (WCS). The presentation will show some cases where we have used this new capability to provide a significant improvement over previous approaches.

  8. iTools: a framework for classification, categorization and integration of computational biology resources.

    PubMed

    Dinov, Ivo D; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H V; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D Stott; Toga, Arthur W

    2008-05-28

    The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.

  9. iTools: A Framework for Classification, Categorization and Integration of Computational Biology Resources

    PubMed Central

    Dinov, Ivo D.; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H. V.; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D. Stott; Toga, Arthur W.

    2008-01-01

    The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources–data, software tools and web-services. The iTools design, implementation and resource meta - data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu. PMID:18509477

  10. Reframing the Dissemination Challenge: A Marketing and Distribution Perspective

    PubMed Central

    Bernhardt, Jay M.

    2009-01-01

    A fundamental obstacle to successful dissemination and implementation of evidence-based public health programs is the near-total absence of systems and infrastructure for marketing and distribution. We describe the functions of a marketing and distribution system, and we explain how it would help move effective public health programs from research to practice. Then we critically evaluate the 4 dominant strategies now used to promote dissemination and implementation, and we explain how each would be enhanced by marketing and distribution systems. Finally, we make 6 recommendations for building the needed system infrastructure and discuss the responsibility within the public health community for implementation of these recommendations. Without serious investment in such infrastructure, application of proven solutions in public health practice will continue to occur slowly and rarely. PMID:19833993

  11. Reframing the dissemination challenge: a marketing and distribution perspective.

    PubMed

    Kreuter, Matthew W; Bernhardt, Jay M

    2009-12-01

    A fundamental obstacle to successful dissemination and implementation of evidence-based public health programs is the near-total absence of systems and infrastructure for marketing and distribution. We describe the functions of a marketing and distribution system, and we explain how it would help move effective public health programs from research to practice. Then we critically evaluate the 4 dominant strategies now used to promote dissemination and implementation, and we explain how each would be enhanced by marketing and distribution systems. Finally, we make 6 recommendations for building the needed system infrastructure and discuss the responsibility within the public health community for implementation of these recommendations. Without serious investment in such infrastructure, application of proven solutions in public health practice will continue to occur slowly and rarely.

  12. Characterization of attacks on public telephone networks

    NASA Astrophysics Data System (ADS)

    Lorenz, Gary V.; Manes, Gavin W.; Hale, John C.; Marks, Donald; Davis, Kenneth; Shenoi, Sujeet

    2001-02-01

    The U.S. Public Telephone Network (PTN) is a massively connected distributed information systems, much like the Internet. PTN signaling, transmission and operations functions must be protected from physical and cyber attacks to ensure the reliable delivery of telecommunications services. The increasing convergence of PTNs with wireless communications systems, computer networks and the Internet itself poses serious threats to our nation's telecommunications infrastructure. Legacy technologies and advanced services encumber well-known and as of yet undiscovered vulnerabilities that render them susceptible to cyber attacks. This paper presents a taxonomy of cyber attacks on PTNs in converged environments that synthesizes exploits in computer and communications network domains. The taxonomy provides an opportunity for the systematic exploration of mitigative and preventive strategies, as well as for the identification and classification of emerging threats.

  13. Enabling Data Intensive Science through Service Oriented Science: Virtual Laboratories and Science Gateways

    NASA Astrophysics Data System (ADS)

    Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.

    2014-12-01

    We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs is best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid/easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include: provenance, publication of results, monitoring, workflow tools, etc. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC).The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At same time, we are scoping seven new VLs and in the process, identifying other common components to prioritise and focus development.

  14. PanDA: Exascale Federation of Resources for the ATLAS Experiment at the LHC

    NASA Astrophysics Data System (ADS)

    Barreiro Megino, Fernando; Caballero Bejar, Jose; De, Kaushik; Hover, John; Klimentov, Alexei; Maeno, Tadashi; Nilsson, Paul; Oleynik, Danila; Padolski, Siarhei; Panitkin, Sergey; Petrosyan, Artem; Wenaus, Torre

    2016-02-01

    After a scheduled maintenance and upgrade period, the world's largest and most powerful machine - the Large Hadron Collider(LHC) - is about to enter its second run at unprecedented energies. In order to exploit the scientific potential of the machine, the experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousand of physics users and compared to simulated data. Given diverse funding constraints, the computational resources for the LHC have been deployed in a worldwide mesh of data centres, connected to each other through Grid technologies. The PanDA (Production and Distributed Analysis) system was developed in 2005 for the ATLAS experiment on top of this heterogeneous infrastructure to seamlessly integrate the computational resources and give the users the feeling of a unique system. Since its origins, PanDA has evolved together with upcoming computing paradigms in and outside HEP, such as changes in the networking model, Cloud Computing and HPC. It is currently running steadily up to 200 thousand simultaneous cores (limited by the available resources for ATLAS), up to two million aggregated jobs per day and processes over an exabyte of data per year. The success of PanDA in ATLAS is triggering the widespread adoption and testing by other experiments. In this contribution we will give an overview of the PanDA components and focus on the new features and upcoming challenges that are relevant to the next decade of distributed computing workload management using PanDA.

  15. Data distribution service-based interoperability framework for smart grid testbed infrastructure

    DOE PAGES

    Youssef, Tarek A.; Elsayed, Ahmed T.; Mohammed, Osama A.

    2016-03-02

    This study presents the design and implementation of a communication and control infrastructure for smart grid operation. The proposed infrastructure enhances the reliability of the measurements and control network. The advantages of utilizing the data-centric over message-centric communication approach are discussed in the context of smart grid applications. The data distribution service (DDS) is used to implement a data-centric common data bus for the smart grid. This common data bus improves the communication reliability, enabling distributed control and smart load management. These enhancements are achieved by avoiding a single point of failure while enabling peer-to-peer communication and an automatic discoverymore » feature for dynamic participating nodes. The infrastructure and ideas presented in this paper were implemented and tested on the smart grid testbed. A toolbox and application programing interface for the testbed infrastructure are developed in order to facilitate interoperability and remote access to the testbed. This interface allows control, monitoring, and performing of experiments remotely. Furthermore, it could be used to integrate multidisciplinary testbeds to study complex cyber-physical systems (CPS).« less

  16. Cloud computing applications for biomedical science: A perspective.

    PubMed

    Navale, Vivek; Bourne, Philip E

    2018-06-01

    Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.

  17. Cloud computing applications for biomedical science: A perspective

    PubMed Central

    2018-01-01

    Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176

  18. A system for distributed intrusion detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snapp, S.R.; Brentano, J.; Dias, G.V.

    1991-01-01

    The study of providing security in computer networks is a rapidly growing area of interest because the network is the medium over which most attacks or intrusions on computer systems are launched. One approach to solving this problem is the intrusion-detection concept, whose basic premise is that not only abandoning the existing and huge infrastructure of possibly-insecure computer and network systems is impossible, but also replacing them by totally-secure systems may not be feasible or cost effective. Previous work on intrusion-detection systems were performed on stand-alone hosts and on a broadcast local area network (LAN) environment. The focus of ourmore » present research is to extend our network intrusion-detection concept from the LAN environment to arbitarily wider areas with the network topology being arbitrary as well. The generalized distributed environment is heterogeneous, i.e., the network nodes can be hosts or servers from different vendors, or some of them could be LAN managers, like our previous work, a network security monitor (NSM), as well. The proposed architecture for this distributed intrusion-detection system consists of the following components: a host manager in each host; a LAN manager for monitoring each LAN in the system; and a central manager which is placed at a single secure location and which receives reports from various host and LAN managers to process these reports, correlate them, and detect intrusions. 11 refs., 2 figs.« less

  19. Corridor One:An Integrated Distance Visualization Enuronments for SSI+ASCI Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christopher R. Johnson, Charles D. Hansen

    2001-10-29

    The goal of Corridor One: An Integrated Distance Visualization Environment for ASCI and SSI Application was to combine the forces of six leading edge laboratories working in the areas of visualization and distributed computing and high performance networking (Argonne National Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory, University of Illinois, University of Utah and Princeton University) to develop and deploy the most advanced integrated distance visualization environment for large-scale scientific visualization and demonstrate it on applications relevant to the DOE SSI and ASCI programs. The Corridor One team brought world class expertise in parallel rendering, deep image basedmore » rendering, immersive environment technology, large-format multi-projector wall based displays, volume and surface visualization algorithms, collaboration tools and streaming media technology, network protocols for image transmission, high-performance networking, quality of service technology and distributed computing middleware. Our strategy was to build on the very successful teams that produced the I-WAY, ''Computational Grids'' and CAVE technology and to add these to the teams that have developed the fastest parallel visualizations systems and the most widely used networking infrastructure for multicast and distributed media. Unfortunately, just as we were getting going on the Corridor One project, DOE cut the program after the first year. As such, our final report consists of our progress during year one of the grant.« less

  20. Local storage federation through XRootD architecture for interactive distributed analysis

    NASA Astrophysics Data System (ADS)

    Colamaria, F.; Colella, D.; Donvito, G.; Elia, D.; Franco, A.; Luparello, G.; Maggi, G.; Miniello, G.; Vallero, S.; Vino, G.

    2015-12-01

    A cloud-based Virtual Analysis Facility (VAF) for the ALICE experiment at the LHC has been deployed in Bari. Similar facilities are currently running in other Italian sites with the aim to create a federation of interoperating farms able to provide their computing resources for interactive distributed analysis. The use of cloud technology, along with elastic provisioning of computing resources as an alternative to the grid for running data intensive analyses, is the main challenge of these facilities. One of the crucial aspects of the user-driven analysis execution is the data access. A local storage facility has the disadvantage that the stored data can be accessed only locally, i.e. from within the single VAF. To overcome such a limitation a federated infrastructure, which provides full access to all the data belonging to the federation independently from the site where they are stored, has been set up. The federation architecture exploits both cloud computing and XRootD technologies, in order to provide a dynamic, easy-to-use and well performing solution for data handling. It should allow the users to store the files and efficiently retrieve the data, since it implements a dynamic distributed cache among many datacenters in Italy connected to one another through the high-bandwidth national network. Details on the preliminary architecture implementation and performance studies are discussed.

  1. Network-based reading system for lung cancer screening CT

    NASA Astrophysics Data System (ADS)

    Fujino, Yuichi; Fujimura, Kaori; Nomura, Shin-ichiro; Kawashima, Harumi; Tsuchikawa, Megumu; Matsumoto, Toru; Nagao, Kei-ichi; Uruma, Takahiro; Yamamoto, Shinji; Takizawa, Hotaka; Kuroda, Chikazumi; Nakayama, Tomio

    2006-03-01

    This research aims to support chest computed tomography (CT) medical checkups to decrease the death rate by lung cancer. We have developed a remote cooperative reading system for lung cancer screening over the Internet, a secure transmission function, and a cooperative reading environment. It is called the Network-based Reading System. A telemedicine system involves many issues, such as network costs and data security if we use it over the Internet, which is an open network. In Japan, broadband access is widespread and its cost is the lowest in the world. We developed our system considering human machine interface and security. It consists of data entry terminals, a database server, a computer aided diagnosis (CAD) system, and some reading terminals. It uses a secure Digital Imaging and Communication in Medicine (DICOM) encrypting method and Public Key Infrastructure (PKI) based secure DICOM image data distribution. We carried out an experimental trial over the Japan Gigabit Network (JGN), which is the testbed for the Japanese next-generation network, and conducted verification experiments of secure screening image distribution, some kinds of data addition, and remote cooperative reading. We found that network bandwidth of about 1.5 Mbps enabled distribution of screening images and cooperative reading and that the encryption and image distribution methods we proposed were applicable to the encryption and distribution of general DICOM images via the Internet.

  2. National Computational Infrastructure for Lattice Gauge Theory SciDAC-2 Closeout Report Indiana University Component

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gottlieb, Steven Arthur; DeTar, Carleton; Tousaint, Doug

    This is the closeout report for the Indiana University portion of the National Computational Infrastructure for Lattice Gauge Theory project supported by the United States Department of Energy under the SciDAC program. It includes information about activities at Indian University, the University of Arizona, and the University of Utah, as those three universities coordinated their activities.

  3. Deploying Crowd-Sourced Formal Verification Systems in a DoD Network

    DTIC Science & Technology

    2013-09-01

    INTENTIONALLY LEFT BLANK 1 I. INTRODUCTION A. INTRODUCTION In 2014 cyber attacks on critical infrastructure are expected to increase...CSFV systems on the Internet‒‒possibly using cloud infrastructure (Dean, 2013). By using Amazon Compute Cloud (EC2) systems, DARPA will use ordinary...through standard access methods. Those clients could be mobile phones, laptops, netbooks, tablet computers or personal digital assistants (PDAs) (Smoot

  4. Distributed data networks: a blueprint for Big Data sharing and healthcare analytics.

    PubMed

    Popovic, Jennifer R

    2017-01-01

    This paper defines the attributes of distributed data networks and outlines the data and analytic infrastructure needed to build and maintain a successful network. We use examples from one successful implementation of a large-scale, multisite, healthcare-related distributed data network, the U.S. Food and Drug Administration-sponsored Sentinel Initiative. Analytic infrastructure-development concepts are discussed from the perspective of promoting six pillars of analytic infrastructure: consistency, reusability, flexibility, scalability, transparency, and reproducibility. This paper also introduces one use case for machine learning algorithm development to fully utilize and advance the portfolio of population health analytics, particularly those using multisite administrative data sources. © 2016 New York Academy of Sciences.

  5. Infrastructure and distributed learning methodology for privacy-preserving multi-centric rapid learning health care: euroCAT.

    PubMed

    Deist, Timo M; Jochems, A; van Soest, Johan; Nalbantov, Georgi; Oberije, Cary; Walsh, Seán; Eble, Michael; Bulens, Paul; Coucke, Philippe; Dries, Wim; Dekker, Andre; Lambin, Philippe

    2017-06-01

    Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future 'big data' infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade [Formula: see text]. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.

  6. MicROS-drt: supporting real-time and scalable data distribution in distributed robotic systems.

    PubMed

    Ding, Bo; Wang, Huaimin; Fan, Zedong; Zhang, Pengfei; Liu, Hui

    A primary requirement in distributed robotic software systems is the dissemination of data to all interested collaborative entities in a timely and scalable manner. However, providing such a service in a highly dynamic and resource-limited robotic environment is a challenging task, and existing robot software infrastructure has limitations in this aspect. This paper presents a novel robot software infrastructure, micROS-drt, which supports real-time and scalable data distribution. The solution is based on a loosely coupled data publish-subscribe model with the ability to support various time-related constraints. And to realize this model, a mature data distribution standard, the data distribution service for real-time systems (DDS), is adopted as the foundation of the transport layer of this software infrastructure. By elaborately adapting and encapsulating the capability of the underlying DDS middleware, micROS-drt can meet the requirement of real-time and scalable data distribution in distributed robotic systems. Evaluation results in terms of scalability, latency jitter and transport priority as well as the experiment on real robots validate the effectiveness of this work.

  7. Acquisition system for the "EMSO Generic Instrument Module" (EGIM) and analysis of the data obtained during its first deployment at OBSEA site (Spain)

    NASA Astrophysics Data System (ADS)

    Garcia, Oscar; Mihai Toma, Daniel; Dañobeitia, Juanjo; del Rio, Joaquin; Bartolome, Rafael; Martínez, Enoc; Nogueras, Marc; Bghiel, Ikram; Lanteri, Nadine; Rolin, Jean Francois; Beranzoli, Laura; Favali, Paolo

    2017-04-01

    The EMSODEV project (EMSO implementation and operation: DEVelopment of instrument module) is an Horizon-2020 UE project whose overall objective is the operation of eleven seafloor observatories and four test sites. These infrastructures are distributed throughout European seas, from the Arctic across the Atlantic and the Mediterranean to the Black Sea, and are managed by the European consortium EMSO-ERIC (European Research Infrastructure Consortium) with the participation of 8 European countries and other associated partners. Recently, we have implemented a Generic Sensor Module (EGIM) within the EMSO-ERIC distributed marine research infrastructure. EGIM is able to operate on any EMSO observatory node, mooring line, seabed station, cabled or non-cabled and surface buoy. The main role of EGIM is to measure homogeneously a set of core variables using the same hardware, sensor references, qualification methods, calibration methods, data format and access, maintenance procedures in several European ocean locations. The EGIM module acquires a wide range of ocean parameters in a long-term consistent, accurate and comparable manner from disciplines such as biology, geology, chemistry, physics, engineering, and computer science, from polar to subtropical environments, through the water column down to the deep sea. Our work includes developing standard-compliant generic software for Sensor Web Enablement (SWE) on EGIM and to perform the first onshore and offshore test bench, to support the sensors data acquisition on a new interoperable EGIM system. EGIM in its turn is linked to an acquisition drives processes, a centralized Sensor Observation Service (SOS) server and a laboratory monitor system (LabMonitor) that records events and alarms during acquisition. The measurements recorded along EMSO NODES are essential to accurately respond to the social and scientific challenges such as climate change, changes in marine ecosystems, and marine hazards. This presentation shows the first EGIM deployment and the SWE infrastructure, developed to manage the data acquisition from the underwater sensors and their insertion to the SOS interface.

  8. The Distributed Space Exploration Simulation (DSES)

    NASA Technical Reports Server (NTRS)

    Crues, Edwin Z.; Chung, Victoria I.; Blum, Mike G.; Bowman, James D.

    2007-01-01

    The paper describes the Distributed Space Exploration Simulation (DSES) Project, a research and development collaboration between NASA centers which focuses on the investigation and development of technologies, processes and integrated simulations related to the collaborative distributed simulation of complex space systems in support of NASA's Exploration Initiative. This paper describes the three major components of DSES: network infrastructure, software infrastructure and simulation development. In the network work area, DSES is developing a Distributed Simulation Network that will provide agency wide support for distributed simulation between all NASA centers. In the software work area, DSES is developing a collection of software models, tool and procedures that ease the burden of developing distributed simulations and provides a consistent interoperability infrastructure for agency wide participation in integrated simulation. Finally, for simulation development, DSES is developing an integrated end-to-end simulation capability to support NASA development of new exploration spacecraft and missions. This paper will present current status and plans for each of these work areas with specific examples of simulations that support NASA's exploration initiatives.

  9. Scaling the CERN OpenStack cloud

    NASA Astrophysics Data System (ADS)

    Bell, T.; Bompastor, B.; Bukowiec, S.; Castro Leon, J.; Denis, M. K.; van Eldik, J.; Fermin Lobo, M.; Fernandez Alvarez, L.; Fernandez Rodriguez, D.; Marino, A.; Moreira, B.; Noel, B.; Oulevey, T.; Takase, W.; Wiebalck, A.; Zilli, S.

    2015-12-01

    CERN has been running a production OpenStack cloud since July 2013 to support physics computing and infrastructure services for the site. In the past year, CERN Cloud Infrastructure has seen a constant increase in nodes, virtual machines, users and projects. This paper will present what has been done in order to make the CERN cloud infrastructure scale out.

  10. Science of Security Lablet - Scalability and Usability

    DTIC Science & Technology

    2014-12-16

    mobile computing [19]. However, the high-level infrastructure design and our own implementation (both described throughout this paper) can easily...critical and infrastructural systems demands high levels of sophistication in the technical aspects of cybersecurity, software and hardware design...Forget, S. Komanduri, Alessandro Acquisti, Nicolas Christin, Lorrie Cranor, Rahul Telang. "Security Behavior Observatory: Infrastructure for Long-term

  11. Performance measurement and modeling of component applications in a high performance computing environment : a case study.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Armstrong, Robert C.; Ray, Jaideep; Malony, A.

    2003-11-01

    We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high performance computing environment. We explore issues peculiar to component-based HPC applications and propose a performance measurement infrastructure for HPC based loosely on recent work done for Grid environments. A prototypical implementation of the infrastructure is used to collect data for a three components in a scientific application and construct performance models for two of them. Both computational and message-passing performance are addressed.

  12. Role of Computational Fluid Dynamics and Wind Tunnels in Aeronautics R and D

    NASA Technical Reports Server (NTRS)

    Malik, Murjeeb R.; Bushnell, Dennis M.

    2012-01-01

    The purpose of this report is to investigate the status and future projections for the question of supplantation of wind tunnels by computation in design and to intuit the potential impact of computation approaches on wind-tunnel utilization all with an eye toward reducing the infrastructure cost at aeronautics R&D centers. Wind tunnels have been closing for myriad reasons, and such closings have reduced infrastructure costs. Further cost reductions are desired, and the work herein attempts to project which wind-tunnel capabilities can be replaced in the future and, if possible, the timing of such. If the possibility exists to project when a facility could be closed, then maintenance and other associated costs could be rescheduled accordingly (i.e., before the fact) to obtain an even greater infrastructure cost reduction.

  13. Evolution of a Materials Data Infrastructure

    NASA Astrophysics Data System (ADS)

    Warren, James A.; Ward, Charles H.

    2018-06-01

    The field of materials science and engineering is writing a new chapter in its evolution, one of digitally empowered materials discovery, development, and deployment. The 2008 Integrated Computational Materials Engineering (ICME) study report helped usher in this paradigm shift, making a compelling case and strong recommendations for an infrastructure supporting ICME that would enable access to precompetitive materials data for both scientific and engineering applications. With the launch of the Materials Genome Initiative in 2011, which drew substantial inspiration from the ICME study, digital data was highlighted as a core component of a Materials Innovation Infrastructure, along with experimental and computational tools. Over the past 10 years, our understanding of what it takes to provide accessible materials data has matured and rapid progress has been made in establishing a Materials Data Infrastructure (MDI). We are learning that the MDI is essential to eliminating the seams between experiment and computation by providing a means for them to connect effortlessly. Additionally, the MDI is becoming an enabler, allowing materials engineering to tie into a much broader model-based engineering enterprise for product design.

  14. Building Cyberinfrastructures for Earth and Space Sciences so that they will come: lessons learnt from Australia

    NASA Astrophysics Data System (ADS)

    Wyborn, L. A.; Woodcock, R.

    2013-12-01

    One of the greatest drivers for change in the way scientific research is undertaken in Australia was the development of the Australian eResearch Infrastructure which was coordinated by the then Australian Government Department of Innovation, Industry, Science and Research. There were two main tranches of funding: the 2007-2013 National Collaborative Research Infrastructure Strategy (NCRIS) and the 2009 Education and Investment Framework (EIF) Super Science Initiative. Investments were in two areas: the Australian e-Research Infrastructure and domain specific capabilities: combined investment in both is 1,452M with at least 456M being invested in eResearch infrastructure. NCRIS was specifically designed as a community-guided process to provide researchers, both academic and government, with major research facilities, supporting infrastructures and networks necessary for world-class research. Extensive community engagement was sought to inform decisions on where Australia could best make strategic infrastructure investments to further develop its research capacity and improve research outcomes over the next 5 to 10years. The current (2007-2014) Australian e-Research Infrastructure has 2 components: 1. The National eResearch physical infrastructure which includes two petascale HPC facilities (one in Canberra and one in Perth), a 10 Gbps national network (National Research Network), a national data storage infrastructure comprising 8 multi petabyte data stores and shared access methods (Australian Access Federation). 2. A second component is focused on research integration infrastructures and includes the Australian National Data Service, which is concerned with better management, description and access to distributed research data in Australia and the National eResearch Collaboration Tools and Resources (NeCTAR) project. NeCTAR is centred on developing problem oriented digital laboratories which provide better and coordinated access to research tools, data environments and workflows. The eResearch Infrastructure Stack is designed to support 12 individual domain-specific capabilities. Four are relevant to the Earth and Space Sciences: (1) AuScope (a national Earth Science Infrastructure Program), (2) the Integrated Marine Observing System (IMOS), (3) the Terrestrial Ecosystems Research Network (TERN) and (4) the Australian Urban Research Infrastructure Network (AURIN). The two main research integration infrastructures, ANDS and NeCTAR, are seen as pivotal to the success of the Australian eResearch Infrastructure. Without them, there was a risk that that the investments in new computers and data storage would provide physical infrastructure, but few would come to use it as the skills barriers to entry were too high. ANDS focused on transforming Australia's research data environment. Its flagship is Research Data Australia, an Internet-based discovery service designed to provide rich connections between data, projects, researchers and institutions, and promote visibility of Australian research data collections in search engines. NeCTAR focused on building eResearch infrastructure in four areas: virtual laboratories, tools, a federated research cloud and a hosting service. Combined, ANDS and NeCTAR are ensuring that people ARE coming and ARE using the physical infrastructures that were built.

  15. Information technology developments within the national biological information infrastructure

    USGS Publications Warehouse

    Cotter, G.; Frame, M.T.

    2000-01-01

    Looking out an office window or exploring a community park, one can easily see the tremendous challenges that biological information presents the computer science community. Biological information varies in format and content depending whether or not it is information pertaining to a particular species (i.e. Brown Tree Snake), or a specific ecosystem, which often includes multiple species, land use characteristics, and geospatially referenced information. The complexity and uniqueness of each individual species or ecosystem do not easily lend themselves to today's computer science tools and applications. To address the challenges that the biological enterprise presents the National Biological Information Infrastructure (NBII) (http://www.nbii.gov) was established in 1993. The NBII is designed to address these issues on a National scale within the United States, and through international partnerships abroad. This paper discusses current computer science efforts within the National Biological Information Infrastructure Program and future computer science research endeavors that are needed to address the ever-growing issues related to our Nation's biological concerns.

  16. Use of agents to implement an integrated computing environment

    NASA Technical Reports Server (NTRS)

    Hale, Mark A.; Craig, James I.

    1995-01-01

    Integrated Product and Process Development (IPPD) embodies the simultaneous application to both system and quality engineering methods throughout an iterative design process. The use of IPPD results in the time-conscious, cost-saving development of engineering systems. To implement IPPD, a Decision-Based Design perspective is encapsulated in an approach that focuses on the role of the human designer in product development. The approach has two parts and is outlined in this paper. First, an architecture, called DREAMS, is being developed that facilitates design from a decision-based perspective. Second, a supporting computing infrastructure, called IMAGE, is being designed. Agents are used to implement the overall infrastructure on the computer. Successful agent utilization requires that they be made of three components: the resource, the model, and the wrap. Current work is focused on the development of generalized agent schemes and associated demonstration projects. When in place, the technology independent computing infrastructure will aid the designer in systematically generating knowledge used to facilitate decision-making.

  17. Life science research and drug discovery at the turn of the 21st century: the experience of SwissBioGrid.

    PubMed

    den Besten, Matthijs; Thomas, Arthur J; Schroeder, Ralph

    2009-04-22

    It is often said that the life sciences are transforming into an information science. As laboratory experiments are starting to yield ever increasing amounts of data and the capacity to deal with those data is catching up, an increasing share of scientific activity is seen to be taking place outside the laboratories, sifting through the data and modelling "in silico" the processes observed "in vitro." The transformation of the life sciences and similar developments in other disciplines have inspired a variety of initiatives around the world to create technical infrastructure to support the new scientific practices that are emerging. The e-Science programme in the United Kingdom and the NSF Office for Cyberinfrastructure are examples of these. In Switzerland there have been no such national initiatives. Yet, this has not prevented scientists from exploring the development of similar types of computing infrastructures. In 2004, a group of researchers in Switzerland established a project, SwissBioGrid, to explore whether Grid computing technologies could be successfully deployed within the life sciences. This paper presents their experiences as a case study of how the life sciences are currently operating as an information science and presents the lessons learned about how existing institutional and technical arrangements facilitate or impede this operation. SwissBioGrid gave rise to two pilot projects: one for proteomics data analysis and the other for high-throughput molecular docking ("virtual screening") to find new drugs for neglected diseases (specifically, for dengue fever). The proteomics project was an example of a data management problem, applying many different analysis algorithms to Terabyte-sized datasets from mass spectrometry, involving comparisons with many different reference databases; the virtual screening project was more a purely computational problem, modelling the interactions of millions of small molecules with a limited number of protein targets on the coat of the dengue virus. Both present interesting lessons about how scientific practices are changing when they tackle the problems of large-scale data analysis and data management by means of creating a novel technical infrastructure. In the experience of SwissBioGrid, data intensive discovery has a lot to gain from close collaboration with industry and harnessing distributed computing power. Yet the diversity in life science research implies only a limited role for generic infrastructure; and the transience of support means that researchers need to integrate their efforts with others if they want to sustain the benefits of their success, which are otherwise lost.

  18. Leveraging the Cloud for Robust and Efficient Lunar Image Processing

    NASA Technical Reports Server (NTRS)

    Chang, George; Malhotra, Shan; Wolgast, Paul

    2011-01-01

    The Lunar Mapping and Modeling Project (LMMP) is tasked to aggregate lunar data, from the Apollo era to the latest instruments on the LRO spacecraft, into a central repository accessible by scientists and the general public. A critical function of this task is to provide users with the best solution for browsing the vast amounts of imagery available. The image files LMMP manages range from a few gigabytes to hundreds of gigabytes in size with new data arriving every day. Despite this ever-increasing amount of data, LMMP must make the data readily available in a timely manner for users to view and analyze. This is accomplished by tiling large images into smaller images using Hadoop, a distributed computing software platform implementation of the MapReduce framework, running on a small cluster of machines locally. Additionally, the software is implemented to use Amazon's Elastic Compute Cloud (EC2) facility. We also developed a hybrid solution to serve images to users by leveraging cloud storage using Amazon's Simple Storage Service (S3) for public data while keeping private information on our own data servers. By using Cloud Computing, we improve upon our local solution by reducing the need to manage our own hardware and computing infrastructure, thereby reducing costs. Further, by using a hybrid of local and cloud storage, we are able to provide data to our users more efficiently and securely. 12 This paper examines the use of a distributed approach with Hadoop to tile images, an approach that provides significant improvements in image processing time, from hours to minutes. This paper describes the constraints imposed on the solution and the resulting techniques developed for the hybrid solution of a customized Hadoop infrastructure over local and cloud resources in managing this ever-growing data set. It examines the performance trade-offs of using the more plentiful resources of the cloud, such as those provided by S3, against the bandwidth limitations such use encounters with remote resources. As part of this discussion this paper will outline some of the technologies employed, the reasons for their selection, the resulting performance metrics and the direction the project is headed based upon the demonstrated capabilities thus far.

  19. Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chase Qishi; Zhu, Michelle Mengxia

    The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models featuremore » diverse scientific workflows as simple as linear pipelines or as complex as a directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications. We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100Gpbs Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.« less

  20. Transforming Our Cities: High-Performance Green Infrastructure (WERF Report INFR1R11)

    EPA Science Inventory

    The objective of this project is to demonstrate that the highly distributed real-time control (DRTC) technologies for green infrastructure being developed by the research team can play a critical role in transforming our nation’s urban infrastructure. These technologies include a...

  1. Grid accounting service: state and future development

    NASA Astrophysics Data System (ADS)

    Levshina, T.; Sehgal, C.; Bockelman, B.; Weitzel, D.; Guru, A.

    2014-06-01

    During the last decade, large-scale federated distributed infrastructures have been continually developed and expanded. One of the crucial components of a cyber-infrastructure is an accounting service that collects data related to resource utilization and identity of users using resources. The accounting service is important for verifying pledged resource allocation per particular groups and users, providing reports for funding agencies and resource providers, and understanding hardware provisioning requirements. It can also be used for end-to-end troubleshooting as well as billing purposes. In this work we describe Gratia, a federated accounting service jointly developed at Fermilab and Holland Computing Center at University of Nebraska-Lincoln. The Open Science Grid, Fermilab, HCC, and several other institutions have used Gratia in production for several years. The current development activities include expanding Virtual Machines provisioning information, XSEDE allocation usage accounting, and Campus Grids resource utilization. We also identify the direction of future work: improvement and expansion of Cloud accounting, persistent and elastic storage space allocation, and the incorporation of WAN and LAN network metrics.

  2. Technical support for Life Sciences communities on a production grid infrastructure.

    PubMed

    Michel, Franck; Montagnat, Johan; Glatard, Tristan

    2012-01-01

    Production operation of large distributed computing infrastructures (DCI) still requires a lot of human intervention to reach acceptable quality of service. This may be achievable for scientific communities with solid IT support, but it remains a show-stopper for others. Some application execution environments are used to hide runtime technical issues from end users. But they mostly aim at fault-tolerance rather than incident resolution, and their operation still requires substantial manpower. A longer-term support activity is thus needed to ensure sustained quality of service for Virtual Organisations (VO). This paper describes how the biomed VO has addressed this challenge by setting up a technical support team. Its organisation, tooling, daily tasks, and procedures are described. Results are shown in terms of resource usage by end users, amount of reported incidents, and developed software tools. Based on our experience, we suggest ways to measure the impact of the technical support, perspectives to decrease its human cost and make it more community-specific.

  3. Operating a production pilot factory serving several scientific domains

    NASA Astrophysics Data System (ADS)

    Sfiligoi, I.; Würthwein, F.; Andrews, W.; Dost, J. M.; MacNeill, I.; McCrea, A.; Sheripon, E.; Murphy, C. W.

    2011-12-01

    Pilot infrastructures are becoming prominent players in the Grid environment. One of the major advantages is represented by the reduced effort required by the user communities (also known as Virtual Organizations or VOs) due to the outsourcing of the Grid interfacing services, i.e. the pilot factory, to Grid experts. One such pilot factory, based on the glideinWMS pilot infrastructure, is being operated by the Open Science Grid at University of California San Diego (UCSD). This pilot factory is serving multiple VOs from several scientific domains. Currently the three major clients are the analysis operations of the HEP experiment CMS, the community VO HCC, which serves mostly math, biology and computer science users, and the structural biology VO NEBioGrid. The UCSD glidein factory allows the served VOs to use Grid resources distributed over 150 sites in North and South America, in Europe, and in Asia. This paper presents the steps taken to create a production quality pilot factory, together with the challenges encountered along the road.

  4. Benchmarking infrastructure for mutation text mining

    PubMed Central

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  5. Benchmarking infrastructure for mutation text mining.

    PubMed

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

  6. Asteroids@Home

    NASA Astrophysics Data System (ADS)

    Durech, Josef; Hanus, J.; Vanco, R.

    2012-10-01

    We present a new project called Asteroids@home (http://asteroidsathome.net/boinc). It is a volunteer-computing project that uses an open-source BOINC (Berkeley Open Infrastructure for Network Computing) software to distribute tasks to volunteers, who provide their computing resources. The project was created at the Astronomical Institute, Charles University in Prague, in cooperation with the Czech National Team. The scientific aim of the project is to solve a time-consuming inverse problem of shape reconstruction of asteroids from sparse-in-time photometry. The time-demanding nature of the problem comes from the fact that with sparse-in-time photometry the rotation period of an asteroid is not apriori known and a huge parameter space must be densely scanned for the best solution. The nature of the problem makes it an ideal task to be solved by distributed computing - the period parameter space can be divided into small bins that can be scanned separately and then joined together to give the globally best solution. In the framework of the the project, we process asteroid photometric data from surveys together with asteroid lightcurves and we derive asteroid shapes and spin states. The algorithm is based on the lightcurve inversion method developed by Kaasalainen et al. (Icarus 153, 37, 2001). The enormous potential of distributed computing will enable us to effectively process also the data from future surveys (Large Synoptic Survey Telescope, Gaia mission, etc.). We also plan to process data of a synthetic asteroid population to reveal biases of the method. In our presentation, we will describe the project, show the first results (new models of asteroids), and discuss the possibilities of its further development. This work has been supported by the grant GACR P209/10/0537 of the Czech Science Foundation and by the Research Program MSM0021620860 of the Ministry of Education of the Czech Republic.

  7. Modelling noise propagation using Grid Resources. Progress within GDI-Grid

    NASA Astrophysics Data System (ADS)

    Kiehle, Christian; Mayer, Christian; Padberg, Alexander; Stapelfeld, Hartmut

    2010-05-01

    Modelling noise propagation using Grid Resources. Progress within GDI-Grid. GDI-Grid (english: SDI-Grid) is a research project funded by the German Ministry for Science and Education (BMBF). It aims at bridging the gaps between OGC Web Services (OWS) and Grid infrastructures and identifying the potential of utilizing the superior storage capacities and computational power of grid infrastructures for geospatial applications while keeping the well-known service interfaces specified by the OGC. The project considers all major OGC webservice interfaces for Web Mapping (WMS), Feature access (Web Feature Service), Coverage access (Web Coverage Service) and processing (Web Processing Service). The major challenge within GDI-Grid is the harmonization of diverging standards as defined by standardization bodies for Grid computing and spatial information exchange. The project started in 2007 and will continue until June 2010. The concept for the gridification of OWS developed by lat/lon GmbH and the Department of Geography of the University of Bonn is applied to three real-world scenarios in order to check its practicability: a flood simulation, a scenario for emergency routing and a noise propagation simulation. The latter scenario is addressed by the Stapelfeldt Ingenieurgesellschaft mbH located in Dortmund adapting their LimA software to utilize grid resources. Noise mapping of e.g. traffic noise in urban agglomerates and along major trunk roads is a reoccurring demand of the EU Noise Directive. Input data requires road net and traffic, terrain, buildings and noise protection screens as well as population distribution. Noise impact levels are generally calculated in 10 m grid and along relevant building facades. For each receiver position sources within a typical range of 2000 m are split down into small segments, depending on local geometry. For each of the segments propagation analysis includes diffraction effects caused by all obstacles on the path of sound propagation. This immense intensive calculation needs to be performed for a major part of European landscape. A LINUX version of the commercial LimA software for noise mapping analysis has been implemented on a test cluster within the German D-GRID computer network. Results and performance indicators will be presented. The presentation is an extension to last-years presentation "Spatial Data Infrastructures and Grid Computing: the GDI-Grid project" that described the gridification concept developed in the GDI-Grid project and provided an overview of the conceptual gaps between Grid Computing and Spatial Data Infrastructures. Results from the GDI-Grid project are incorporated in the OGC-OGF (Open Grid Forum) collaboration efforts as well as the OGC WPS 2.0 standards working group developing the next major version of the WPS specification.

  8. Solving optimization problems on computational grids.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wright, S. J.; Mathematics and Computer Science

    2001-05-01

    Multiprocessor computing platforms, which have become more and more widely available since the mid-1980s, are now heavily used by organizations that need to solve very demanding computational problems. Parallel computing is now central to the culture of many research communities. Novel parallel approaches were developed for global optimization, network optimization, and direct-search methods for nonlinear optimization. Activity was particularly widespread in parallel branch-and-bound approaches for various problems in combinatorial and network optimization. As the cost of personal computers and low-end workstations has continued to fall, while the speed and capacity of processors and networks have increased dramatically, 'cluster' platforms havemore » become popular in many settings. A somewhat different type of parallel computing platform know as a computational grid (alternatively, metacomputer) has arisen in comparatively recent times. Broadly speaking, this term refers not to a multiprocessor with identical processing nodes but rather to a heterogeneous collection of devices that are widely distributed, possibly around the globe. The advantage of such platforms is obvious: they have the potential to deliver enormous computing power. Just as obviously, however, the complexity of grids makes them very difficult to use. The Condor team, headed by Miron Livny at the University of Wisconsin, were among the pioneers in providing infrastructure for grid computations. More recently, the Globus project has developed technologies to support computations on geographically distributed platforms consisting of high-end computers, storage and visualization devices, and other scientific instruments. In 1997, we started the metaneos project as a collaborative effort between optimization specialists and the Condor and Globus groups. Our aim was to address complex, difficult optimization problems in several areas, designing and implementing the algorithms and the software infrastructure need to solve these problems on computational grids. This article describes some of the results we have obtained during the first three years of the metaneos project. Our efforts have led to development of the runtime support library MW for implementing algorithms with master-worker control structure on Condor platforms. This work is discussed here, along with work on algorithms and codes for integer linear programming, the quadratic assignment problem, and stochastic linear programmming. Our experiences in the metaneos project have shown that cheap, powerful computational grids can be used to tackle large optimization problems of various types. In an industrial or commercial setting, the results demonstrate that one may not have to buy powerful computational servers to solve many of the large problems arising in areas such as scheduling, portfolio optimization, or logistics; the idle time on employee workstations (or, at worst, an investment in a modest cluster of PCs) may do the job. For the optimization research community, our results motivate further work on parallel, grid-enabled algorithms for solving very large problems of other types. The fact that very large problems can be solved cheaply allows researchers to better understand issues of 'practical' complexity and of the role of heuristics.« less

  9. Moving code - Sharing geoprocessing logic on the Web

    NASA Astrophysics Data System (ADS)

    Müller, Matthias; Bernard, Lars; Kadner, Daniel

    2013-09-01

    Efficient data processing is a long-standing challenge in remote sensing. Effective and efficient algorithms are required for product generation in ground processing systems, event-based or on-demand analysis, environmental monitoring, and data mining. Furthermore, the increasing number of survey missions and the exponentially growing data volume in recent years have created demand for better software reuse as well as an efficient use of scalable processing infrastructures. Solutions that address both demands simultaneously have begun to slowly appear, but they seldom consider the possibility to coordinate development and maintenance efforts across different institutions, community projects, and software vendors. This paper presents a new approach to share, reuse, and possibly standardise geoprocessing logic in the field of remote sensing. Drawing from the principles of service-oriented design and distributed processing, this paper introduces moving-code packages as self-describing software components that contain algorithmic code and machine-readable descriptions of the provided functionality, platform, and infrastructure, as well as basic information about exploitation rights. Furthermore, the paper presents a lean publishing mechanism by which to distribute these packages on the Web and to integrate them in different processing environments ranging from monolithic workstations to elastic computational environments or "clouds". The paper concludes with an outlook toward community repositories for reusable geoprocessing logic and their possible impact on data-driven science in general.

  10. Adaptive Resource Utilization Prediction System for Infrastructure as a Service Cloud.

    PubMed

    Zia Ullah, Qazi; Hassan, Shahzad; Khan, Gul Muhammad

    2017-01-01

    Infrastructure as a Service (IaaS) cloud provides resources as a service from a pool of compute, network, and storage resources. Cloud providers can manage their resource usage by knowing future usage demand from the current and past usage patterns of resources. Resource usage prediction is of great importance for dynamic scaling of cloud resources to achieve efficiency in terms of cost and energy consumption while keeping quality of service. The purpose of this paper is to present a real-time resource usage prediction system. The system takes real-time utilization of resources and feeds utilization values into several buffers based on the type of resources and time span size. Buffers are read by R language based statistical system. These buffers' data are checked to determine whether their data follows Gaussian distribution or not. In case of following Gaussian distribution, Autoregressive Integrated Moving Average (ARIMA) is applied; otherwise Autoregressive Neural Network (AR-NN) is applied. In ARIMA process, a model is selected based on minimum Akaike Information Criterion (AIC) values. Similarly, in AR-NN process, a network with the lowest Network Information Criterion (NIC) value is selected. We have evaluated our system with real traces of CPU utilization of an IaaS cloud of one hundred and twenty servers.

  11. CERN data services for LHC computing

    NASA Astrophysics Data System (ADS)

    Espinal, X.; Bocchi, E.; Chan, B.; Fiorot, A.; Iven, J.; Lo Presti, G.; Lopez, J.; Gonzalez, H.; Lamanna, M.; Mascetti, L.; Moscicki, J.; Pace, A.; Peters, A.; Ponce, S.; Rousseau, H.; van der Ster, D.

    2017-10-01

    Dependability, resilience, adaptability and efficiency. Growing requirements require tailoring storage services and novel solutions. Unprecedented volumes of data coming from the broad number of experiments at CERN need to be quickly available in a highly scalable way for large-scale processing and data distribution while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14GB/s during 2015 LHC Pb-Pb run and 11PB in July 2016) and with concurrent complex production work-loads. In parallel our systems provide the platform for the continuous user and experiment driven work-loads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR as a large-scale storage; CERNBox for end-user access and sharing; Ceph as data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy distributed-file-system services. In this paper we will summarise the experience in supporting LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment with pluggable protocols, tuneable QoS, sharing capabilities and fine grained ACLs management while continuing to guarantee dependable and robust services.

  12. Adaptive Resource Utilization Prediction System for Infrastructure as a Service Cloud

    PubMed Central

    Hassan, Shahzad; Khan, Gul Muhammad

    2017-01-01

    Infrastructure as a Service (IaaS) cloud provides resources as a service from a pool of compute, network, and storage resources. Cloud providers can manage their resource usage by knowing future usage demand from the current and past usage patterns of resources. Resource usage prediction is of great importance for dynamic scaling of cloud resources to achieve efficiency in terms of cost and energy consumption while keeping quality of service. The purpose of this paper is to present a real-time resource usage prediction system. The system takes real-time utilization of resources and feeds utilization values into several buffers based on the type of resources and time span size. Buffers are read by R language based statistical system. These buffers' data are checked to determine whether their data follows Gaussian distribution or not. In case of following Gaussian distribution, Autoregressive Integrated Moving Average (ARIMA) is applied; otherwise Autoregressive Neural Network (AR-NN) is applied. In ARIMA process, a model is selected based on minimum Akaike Information Criterion (AIC) values. Similarly, in AR-NN process, a network with the lowest Network Information Criterion (NIC) value is selected. We have evaluated our system with real traces of CPU utilization of an IaaS cloud of one hundred and twenty servers. PMID:28811819

  13. Co-location and Self-Similar Topologies of Urban Infrastructure Networks

    NASA Astrophysics Data System (ADS)

    Klinkhamer, Christopher; Zhan, Xianyuan; Ukkusuri, Satish; Elisabeth, Krueger; Paik, Kyungrock; Rao, Suresh

    2016-04-01

    The co-location of urban infrastructure is too obvious to be easily ignored. For reasons of practicality, reliability, and eminent domain, the spatial locations of many urban infrastructure networks, including drainage, sanitary sewers, and road networks, are well correlated. However, important questions dealing with correlations in the network topologies of differing infrastructure types remain unanswered. Here, we have extracted randomly distributed, nested subnets from the urban drainage, sanitary sewer, and road networks in two distinctly different cities: Amman, Jordan; and Indianapolis, USA. Network analyses were performed for each randomly chosen subnet (location and size), using a dual-mapping approach (Hierarchical Intersection Continuity Negotiation). Topological metrics for each infrastructure type were calculated and compared for all subnets in a given city. Despite large differences in the climate, governance, and populace of the two cities, and functional properties of the different infrastructure types, these infrastructure networks are shown to be highly spatially homogenous. Furthermore, strong correlations are found between topological metrics of differing types of surface and subsurface infrastructure networks. Also, the network topologies of each infrastructure type for both cities are shown to exhibit self-similar characteristics (i.e., power law node-degree distributions, [p(k) = ak-γ]. These findings can be used to assist city planners and engineers either expanding or retrofitting existing infrastructure, or in the case of developing countries, building new cities from the ground up. In addition, the self-similar nature of these infrastructure networks holds significant implications for the vulnerability of these critical infrastructure networks to external hazards and ways in which network resilience can be improved.

  14. 32 CFR 240.4 - Policy.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...

  15. 32 CFR 240.4 - Policy.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...

  16. 32 CFR 240.4 - Policy.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...

  17. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.

  18. Progress on the FabrIc for Frontier Experiments project at Fermilab

    DOE PAGES

    Box, Dennis; Boyd, Joseph; Dykstra, Dave; ...

    2015-12-23

    The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high throughput computing, data management, database access and collaboration within experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercialmore » cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including new job submission services, software and reference data distribution through CVMFS repositories, flexible data transfer client, and access to opportunistic resources on the Open Science Grid. Hence, the progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide« less

  19. The Magellan Final Report on Cloud Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    ,; Coghlan, Susan; Yelick, Katherine

    The goal of Magellan, a project funded through the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR), was to investigate the potential role of cloud computing in addressing the computing needs for the DOE Office of Science (SC), particularly related to serving the needs of mid- range computing and future data-intensive computing workloads. A set of research questions was formed to probe various aspects of cloud computing from performance, usability, and cost. To address these questions, a distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computingmore » Center (NERSC). The testbed was designed to be flexible and capable enough to explore a variety of computing models and hardware design points in order to understand the impact for various scientific applications. During the project, the testbed also served as a valuable resource to application scientists. Applications from a diverse set of projects such as MG-RAST (a metagenomics analysis server), the Joint Genome Institute, the STAR experiment at the Relativistic Heavy Ion Collider, and the Laser Interferometer Gravitational Wave Observatory (LIGO), were used by the Magellan project for benchmarking within the cloud, but the project teams were also able to accomplish important production science utilizing the Magellan cloud resources.« less

  20. The FIFE Project at Fermilab

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Box, D.; Boyd, J.; Di Benedetto, V.

    2016-01-01

    The FabrIc for Frontier Experiments (FIFE) project is an initiative within the Fermilab Scientific Computing Division designed to steer the computing model for non-LHC Fermilab experiments across multiple physics areas. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying size, needs, and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of solutions for high throughput computing, data management, database access and collaboration management within an experiment. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid compute sites alongmore » with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating into experiment computing operations several services including a common job submission service, software and reference data distribution through CVMFS repositories, flexible and robust data transfer clients, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken the leading role in defining the computing model for Fermilab experiments, aided in the design of experiments beyond those hosted at Fermilab, and will continue to define the future direction of high throughput computing for future physics experiments worldwide.« less

  1. The AAL project: automated monitoring and intelligent analysis for the ATLAS data taking infrastructure

    NASA Astrophysics Data System (ADS)

    Kazarov, A.; Lehmann Miotto, G.; Magnoni, L.

    2012-06-01

    The Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at CERN is the infrastructure responsible for collecting and transferring ATLAS experimental data from detectors to the mass storage system. It relies on a large, distributed computing environment, including thousands of computing nodes with thousands of application running concurrently. In such a complex environment, information analysis is fundamental for controlling applications behavior, error reporting and operational monitoring. During data taking runs, streams of messages sent by applications via the message reporting system together with data published from applications via information services are the main sources of knowledge about correctness of running operations. The flow of data produced (with an average rate of O(1-10KHz)) is constantly monitored by experts to detect problem or misbehavior. This requires strong competence and experience in understanding and discovering problems and root causes, and often the meaningful information is not in the single message or update, but in the aggregated behavior in a certain time-line. The AAL project is meant at reducing the man power needs and at assuring a constant high quality of problem detection by automating most of the monitoring tasks and providing real-time correlation of data-taking and system metrics. This project combines technologies coming from different disciplines, in particular it leverages on an Event Driven Architecture to unify the flow of data from the ATLAS infrastructure, on a Complex Event Processing (CEP) engine for correlation of events and on a message oriented architecture for components integration. The project is composed of 2 main components: a core processing engine, responsible for correlation of events through expert-defined queries and a web based front-end to present real-time information and interact with the system. All components works in a loose-coupled event based architecture, with a message broker to centralize all communication between modules. The result is an intelligent system able to extract and compute relevant information from the flow of operational data to provide real-time feedback to human experts who can promptly react when needed. The paper presents the design and implementation of the AAL project, together with the results of its usage as automated monitoring assistant for the ATLAS data taking infrastructure.

  2. NaradaBrokering as Middleware Fabric for Grid-based Remote Visualization Services

    NASA Astrophysics Data System (ADS)

    Pallickara, S.; Erlebacher, G.; Yuen, D.; Fox, G.; Pierce, M.

    2003-12-01

    Remote Visualization Services (RVS) have tended to rely on approaches based on the client server paradigm. The simplicity in these approaches is offset by problems such as single-point-of-failures, scaling and availability. Furthermore, as the complexity, scale and scope of the services hosted on this paradigm increase, this approach becomes increasingly unsuitable. We propose a scheme based on top of a distributed brokering infrastructure, NaradaBrokering, which comprises a distributed network of broker nodes. These broker nodes are organized in a cluster-based architecture that can scale to very large sizes. The broker network is resilient to broker failures and efficiently routes interactions to entities that expressed an interest in them. In our approach to RVS, services advertise their capabilities to the broker network, which manages these service advertisements. Among the services considered within our system are those that perform graphic transformations, mediate access to specialized datasets and finally those that manage the execution of specified tasks. There could be multiple instances of each of these services and the system ensures that load for a given service is distributed efficiently over these service instances. Among the features provided in our approach are efficient discovery of services and asynchronous interactions between services and service requestors (which could themselves be other services). Entities need not be online during the execution of the service request. The system also ensures that entities can be notified about task executions, partial results and failures that might have taken place during service execution. The system also facilitates specification of task overrides, distribution of execution results to alternate devices (which were not used to originally request service execution) and to multiple users. These RVS services could of course be either OGSA (Open Grid Services Architecture) based Grid services or traditional Web services. The brokering infrastructure will manage the service advertisements and the invocation of these services. This scheme ensures that the fundamental Grid computing concept is met - provide computing capabilities of those that are willing to provide it to those that seek the same. {[1]} The NaradaBrokering Project: http://www.naradabrokering.org

  3. Essays on the Impacts of Geography and Institutions on Access to Energy and Public Infrastructure Services

    NASA Astrophysics Data System (ADS)

    Archibong, Belinda

    While previous literature has emphasized the importance of energy and public infrastructure services for economic development, questions surrounding the implications of unequal spatial distribution in access to these resources remain, particularly in the developing country context. This dissertation provides evidence on the nature, origins and implications of this distribution uniting three strands of research from the development and political economy, regional science and energy economics fields. The dissertation unites three papers on the nature of spatial inequality of access to energy and infrastructure with further implications for conflict risk , the historical institutional and biogeographical determinants of current distribution of access to energy and public infrastructure services and the response of households to fuel price changes over time. Chapter 2 uses a novel survey dataset to provide evidence for spatial clustering of public infrastructure non-functionality at schools by geopolitical zone in Nigeria with further implications for armed conflict risk in the region. Chapter 3 investigates the drivers of the results in chapter 2, exploiting variation in the spatial distribution of precolonial institutions and geography in the region, to provide evidence for the long-term impacts of these factors on current heterogeneity of access to public services. Chapter 4 addresses the policy implications of energy access, providing the first multi-year evidence on firewood demand elasticities in India, using the spatial variation in prices for estimation.

  4. Improving ATLAS grid site reliability with functional tests using HammerCloud

    NASA Astrophysics Data System (ADS)

    Elmsheuser, Johannes; Legger, Federica; Medrano Llamas, Ramon; Sciacca, Gianfranco; van der Ster, Dan

    2012-12-01

    With the exponential growth of LHC (Large Hadron Collider) data in 2011, and more coming in 2012, distributed computing has become the established way to analyse collider data. The ATLAS grid infrastructure includes almost 100 sites worldwide, ranging from large national computing centers to smaller university clusters. These facilities are used for data reconstruction and simulation, which are centrally managed by the ATLAS production system, and for distributed user analysis. To ensure the smooth operation of such a complex system, regular tests of all sites are necessary to validate the site capability of successfully executing user and production jobs. We report on the development, optimization and results of an automated functional testing suite using the HammerCloud framework. Functional tests are short lightweight applications covering typical user analysis and production schemes, which are periodically submitted to all ATLAS grid sites. Results from those tests are collected and used to evaluate site performances. Sites that fail or are unable to run the tests are automatically excluded from the PanDA brokerage system, therefore avoiding user or production jobs to be sent to problematic sites.

  5. Evolving the Technical Infrastructure of the Planetary Data System for the 21st Century

    NASA Technical Reports Server (NTRS)

    Beebe, Reta F.; Crichton, D.; Hughes, S.; Grayzeck, E.

    2010-01-01

    The Planetary Data System (PDS) was established in 1989 as a distributed system to assure scientific oversight. Initially the PDS followed guidelines recommended by the National Academies Committee on Data Management and Computation (CODMAC, 1982) and placed emphasis on archiving validated datasets. But overtime user demands, supported by increased computing capabilities and communication methods, have placed increasing demands on the PDS. The PDS must add additional services to better enable scientific analysis within distributed environments and to ensure that those services integrate with existing systems and data. To face these challenges the Planetary Data System (PDS) must modernize its architecture and technical implementation. The PDS 2010 project addresses these challenges. As part of this project, the PDS has three fundamental project goals that include: (1) Providing more efficient client delivery of data by data providers to the PDS (2) Enabling a stable, long-term usable planetary science data archive (3) Enabling services for the data consumer to find, access and use the data they require in contemporary data formats. In order to achieve these goals, the PDS 2010 project is upgrading both the technical infrastructure and the data standards to support increased efficiency in data delivery as well as usability of the PDS. Efforts are underway to interface with missions as early as possible and to streamline the preparation and delivery of data to the PDS. Likewise, the PDS is working to define and plan for data services that will help researchers to perform analysis in cost-constrained environments. This presentation will cover the PDS 2010 project including the goals, data standards and technical implementation plans that are underway within the Planetary Data System. It will discuss the plans for moving from the current system, version PDS 3, to version PDS 4.

  6. Advanced Artificial Science. The development of an artificial science and engineering research infrastructure to facilitate innovative computational modeling, analysis, and application to interdisciplinary areas of scientific investigation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saffer, Shelley

    2014-12-01

    This is a final report of the DOE award DE-SC0001132, Advanced Artificial Science. The development of an artificial science and engineering research infrastructure to facilitate innovative computational modeling, analysis, and application to interdisciplinary areas of scientific investigation. This document describes the achievements of the goals, and resulting research made possible by this award.

  7. Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments

    PubMed Central

    Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.

    2012-01-01

    Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621

  8. GSDC: A Unique Data Center in Korea for HEP research

    NASA Astrophysics Data System (ADS)

    Ahn, Sang-Un

    2017-04-01

    Global Science experimental Data hub Center (GSDC) at Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea established for promoting the fundamental research fields by supporting them with the expertise on Information and Communication Technology (ICT) and the infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC) and Networking. GSDC has supported various research fields in South Korea dealing with the large scale of data, e.g. RENO experiment for neutrino research, LIGO experiment for gravitational wave detection, Genome sequencing project for bio-medical, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for ALICE experiment using the LHC at CERN since 2013. In this talk, we present the overview on computing infrastructure that GSDC runs for the research fields and we discuss on the data center infrastructure management system deployed at GSDC.

  9. First results from a combined analysis of CERN computing infrastructure metrics

    NASA Astrophysics Data System (ADS)

    Duellmann, Dirk; Nieke, Christian

    2017-10-01

    The IT Analysis Working Group (AWG) has been formed at CERN across individual computing units and the experiments to attempt a cross cutting analysis of computing infrastructure and application metrics. In this presentation we will describe the first results obtained using medium/long term data (1 months — 1 year) correlating box level metrics, job level metrics from LSF and HTCondor, IO metrics from the physics analysis disk pools (EOS) and networking and application level metrics from the experiment dashboards. We will cover in particular the measurement of hardware performance and prediction of job duration, the latency sensitivity of different job types and a search for bottlenecks with the production job mix in the current infrastructure. The presentation will conclude with the proposal of a small set of metrics to simplify drawing conclusions also in the more constrained environment of public cloud deployments.

  10. Ubiquitous green computing techniques for high demand applications in Smart environments.

    PubMed

    Zapater, Marina; Sanchez, Cesar; Ayala, Jose L; Moya, Jose M; Risco-Martín, José L

    2012-01-01

    Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time.

  11. NCI's High Performance Computing (HPC) and High Performance Data (HPD) Computing Platform for Environmental and Earth System Data Science

    NASA Astrophysics Data System (ADS)

    Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley

    2015-04-01

    The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together - creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, and to provide a new infrastructure for future interdisciplinary research.

  12. A mathematical model for generating bipartite graphs and its application to protein networks

    NASA Astrophysics Data System (ADS)

    Nacher, J. C.; Ochiai, T.; Hayashida, M.; Akutsu, T.

    2009-12-01

    Complex systems arise in many different contexts from large communication systems and transportation infrastructures to molecular biology. Most of these systems can be organized into networks composed of nodes and interacting edges. Here, we present a theoretical model that constructs bipartite networks with the particular feature that the degree distribution can be tuned depending on the probability rate of fundamental processes. We then use this model to investigate protein-domain networks. A protein can be composed of up to hundreds of domains. Each domain represents a conserved sequence segment with specific functional tasks. We analyze the distribution of domains in Homo sapiens and Arabidopsis thaliana organisms and the statistical analysis shows that while (a) the number of domain types shared by k proteins exhibits a power-law distribution, (b) the number of proteins composed of k types of domains decays as an exponential distribution. The proposed mathematical model generates bipartite graphs and predicts the emergence of this mixing of (a) power-law and (b) exponential distributions. Our theoretical and computational results show that this model requires (1) growth process and (2) copy mechanism.

  13. NAS infrastructure management system build 1.5 computer-human interface

    DOT National Transportation Integrated Search

    2001-01-01

    Human factors engineers from the National Airspace System (NAS) Human Factors Branch (ACT-530) of the Federal Aviation Administration William J. Hughes Technical Center conducted an evaluation of the NAS Infrastructure Management System (NIMS) Build ...

  14. EC FP6 Enviro-RISKS project outcomes in area of Earth and Space Science Informatics applications

    NASA Astrophysics Data System (ADS)

    Gordov, E. P.; Zakarin, E. A.

    2009-04-01

    Nowadays the community acknowledged that to understand dynamics of regional environment properly and perform its assessment on the base of monitoring and modeling more strong involvement of information-computational technologies (ICT) is required, which should lead to development of information-computational infrastructure as an inherent part of such investigations. This paper is based on the Report&Recommendations (www.dmi.dk/dmi/sr08-05-4.pdf) of the Enviro-RISKS (Man-induced Environmental Risks: Monitoring, Management and Remediation of Man-made Changes in Siberia) Project Thematic expert group for Information Systems, Integration and Synthesis Focus and presents results of activities of Project Partners in area of Information Technologies for Environmental Sciences development and usage. Approaches used the web-based Information Technologies and the GIS-based Information Technologies are described and a way to their integration is outlined. In particular, developed in course of the Project carrying out Enviro-RISKS web portal and its Climate site (http://climate.risks.scert.ru/), providing an access to interactive web-system for regional climate assessment on the base of standard meteorological data archives, which is a key element of the information-computational infrastructure of the Siberia Integrated Regional Study (SIRS), is described in details as well as developed on the base of GIS technology system for monitoring and modeling air and water pollutions transport and transformations. The later is quite useful for practical applications realization of geoinformation modeling, in which relevant mathematical models are plunged into GIS and all the modeling and analysis phases are accomplished in the informational sphere, based on the real data including those coming from satellites. Major efforts currently are undertaken in attempt to integrate GIS based environmental applications with web accessibility, computing power and data interoperability thus to exploit completely huge potential of web bases technologies. In particular, development of a region devoted web portal using approached suggested by the Open Geospatial Consortium has been started recently. The state of the art of the information-computational infrastructure in the targeted region is quite a step in the process of development of a distributed collaborative information-computational environment to support multidisciplinary investigations of Earth regional environment, especially those required meteorology, atmospheric pollution transport and climate modeling. Established in process of the Project carrying out cooperative links, new Partners initiatives, and gained expertise allow us to hope that this infrastructure rather soon will make significant input into understanding regional environmental processes in their relationships with Global Change. In particular, this infrastructure will play a role of the 'underlying mechanics' of the research work, leaving the earth scientists to concentrate on their investigations as well as providing the environment to make research results available and understandable to everyone. Additionally to the core FP6 Enviro-RISKS project (INCO-CT-2004-013427) support this activity was partially supported by SB RAS Integration Project 34, SB RAS Basic Program Project 4.5.2.2 and APN Project CBA2007-08NSY. Valuable input into the expert group work and elaborated outcomes of Profs. V. Lykosov and A. Starchenko, Drs. D. Belikov, , M. Korets, S. Kostrykin, B. Mirkarimova, I. Okladnikov, , A. Titov and A. Tridvornov is acknowledged.

  15. A Performance Comparison of Tree and Ring Topologies in Distributed System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Min

    A distributed system is a collection of computers that are connected via a communication network. Distributed systems have become commonplace due to the wide availability of low-cost, high performance computers and network devices. However, the management infrastructure often does not scale well when distributed systems get very large. Some of the considerations in building a distributed system are the choice of the network topology and the method used to construct the distributed system so as to optimize the scalability and reliability of the system, lower the cost of linking nodes together and minimize the message delay in transmission, and simplifymore » system resource management. We have developed a new distributed management system that is able to handle the dynamic increase of system size, detect and recover the unexpected failure of system services, and manage system resources. The topologies used in the system are the tree-structured network and the ring-structured network. This thesis presents the research background, system components, design, implementation, experiment results and the conclusions of our work. The thesis is organized as follows: the research background is presented in chapter 1. Chapter 2 describes the system components, including the different node types and different connection types used in the system. In chapter 3, we describe the message types and message formats in the system. We discuss the system design and implementation in chapter 4. In chapter 5, we present the test environment and results, Finally, we conclude with a summary and describe our future work in chapter 6.« less

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amerio, S.; Behari, S.; Boyd, J.

    The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have approximately 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 and beyond. To achieve this goal, we have implemented a system that utilizes virtualization, automated validation, and migration to new standards inmore » both software and data storage technology and leverages resources available from currently-running experiments at Fermilab. Lastly, these efforts have also provided useful lessons in ensuring long-term data access for numerous experiments, and enable high-quality scientific output for years to come.« less

  17. Data preservation at the Fermilab Tevatron

    NASA Astrophysics Data System (ADS)

    Amerio, S.; Behari, S.; Boyd, J.; Brochmann, M.; Culbertson, R.; Diesburg, M.; Freeman, J.; Garren, L.; Greenlee, H.; Herner, K.; Illingworth, R.; Jayatilaka, B.; Jonckheere, A.; Li, Q.; Naymola, S.; Oleynik, G.; Sakumoto, W.; Varnes, E.; Vellidis, C.; Watts, G.; White, S.

    2017-04-01

    The Fermilab Tevatron collider's data-taking run ended in September 2011, yielding a dataset with rich scientific potential. The CDF and D0 experiments each have approximately 9 PB of collider and simulated data stored on tape. A large computing infrastructure consisting of tape storage, disk cache, and distributed grid computing for physics analysis with the Tevatron data is present at Fermilab. The Fermilab Run II data preservation project intends to keep this analysis capability sustained through the year 2020 and beyond. To achieve this goal, we have implemented a system that utilizes virtualization, automated validation, and migration to new standards in both software and data storage technology and leverages resources available from currently-running experiments at Fermilab. These efforts have also provided useful lessons in ensuring long-term data access for numerous experiments, and enable high-quality scientific output for years to come.

  18. Distributed Processing of Sentinel-2 Products using the BIGEARTH Platform

    NASA Astrophysics Data System (ADS)

    Bacu, Victor; Stefanut, Teodor; Nandra, Constantin; Mihon, Danut; Gorgan, Dorian

    2017-04-01

    The constellation of observational satellites orbiting around Earth is constantly increasing, providing more data that need to be processed in order to extract meaningful information and knowledge from it. Sentinel-2 satellites, part of the Copernicus Earth Observation program, aim to be used in agriculture, forestry and many other land management applications. ESA's SNAP toolbox can be used to process data gathered by Sentinel-2 satellites but is limited to the resources provided by a stand-alone computer. In this paper we present a cloud based software platform that makes use of this toolbox together with other remote sensing software applications to process Sentinel-2 products. The BIGEARTH software platform [1] offers an integrated solution for processing Earth Observation data coming from different sources (such as satellites or on-site sensors). The flow of processing is defined as a chain of tasks based on the WorDeL description language [2]. Each task could rely on a different software technology (such as Grass GIS and ESA's SNAP) in order to process the input data. One important feature of the BIGEARTH platform comes from this possibility of interconnection and integration, throughout the same flow of processing, of the various well known software technologies. All this integration is transparent from the user perspective. The proposed platform extends the SNAP capabilities by enabling specialists to easily scale the processing over distributed architectures, according to their specific needs and resources. The software platform [3] can be used in multiple configurations. In the basic one the software platform runs as a standalone application inside a virtual machine. Obviously in this case the computational resources are limited but it will give an overview of the functionalities of the software platform, and also the possibility to define the flow of processing and later on to execute it on a more complex infrastructure. The most complex and robust configuration is based on cloud computing and allows the installation on a private or public cloud infrastructure. In this configuration, the processing resources can be dynamically allocated and the execution time can be considerably improved by the available virtual resources and the number of parallelizable sequences in the processing flow. The presentation highlights the benefits and issues of the proposed solution by analyzing some significant experimental use cases. Main references for further information: [1] BigEarth project, http://cgis.utcluj.ro/projects/bigearth [2] Constantin Nandra, Dorian Gorgan: "Defining Earth data batch processing tasks by means of a flexible workflow description language", ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., III-4, 59-66, (2016). [3] Victor Bacu, Teodor Stefanut, Dorian Gorgan, "Adaptive Processing of Earth Observation Data on Cloud Infrastructures Based on Workflow Description", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp.444-454, (2015).

  19. Augmenting Trust Establishment in Dynamic Systems with Social Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lagesse, Brent J; Kumar, Mohan; Venkatesh, Svetha

    2010-01-01

    Social networking has recently flourished in popularity through the use of social websites. Pervasive computing resources have allowed people stay well-connected to each other through access to social networking resources. We take the position that utilizing information produced by relationships within social networks can assist in the establishment of trust for other pervasive computing applications. Furthermore, we describe how such a system can augment a sensor infrastructure used for event observation with information from mobile sensors (ie, mobile phones with cameras) controlled by potentially untrusted third parties. Pervasive computing systems are invisible systems, oriented around the user. As a result,more » many future pervasive systems are likely to include a social aspect to the system. The social communities that are developed in these systems can augment existing trust mechanisms with information about pre-trusted entities or entities to initially consider when beginning to establish trust. An example of such a system is the Collaborative Virtual Observation (CoVO) system fuses sensor information from disaparate sources in soft real-time to recreate a scene that provides observation of an event that has recently transpired. To accomplish this, CoVO must efficently access services whilst protecting the data from corruption from unknown remote nodes. CoVO combines dynamic service composition with virtual observation to utilize existing infrastructure with third party services available in the environment. Since these services are not under the control of the system, they may be unreliable or malicious. When an event of interest occurs, the given infrastructure (bus cameras, etc.) may not sufficiently cover the necessary information (be it in space, time, or sensor type). To enhance observation of the event, infrastructure is augmented with information from sensors in the environment that the infrastructure does not control. These sensors may be unreliable, uncooperative, or even malicious. Additionally, to execute queries in soft real-time, processing must be distributed to available systems in the environment. We propose to use information from social networks to satisfy these requirements. In this paper, we present our position that knowledge gained from social activities can be used to augment trust mechanisms in pervasive computing. The system uses social behavior of nodes to predict a subset that it wants to query for information. In this context, social behavior such as transit patterns and schedules (which can be used to determine if a queried node is likely to be reliable) or known relationships, such as a phone's address book, that can be used to determine networks of nodes that may also be able to assist in retrieving information. Neither implicit nor explicit relationships necessarily imply that the user trusts an entity, but rather will provide a starting place for establishing trust. The proposed framework utilizes social network information to assist in trust establishment when third-party sensors are used for sensing events.« less

  20. International Convergence on Geoscience Cyberinfrastructure

    NASA Astrophysics Data System (ADS)

    Allison, M. L.; Atkinson, R.; Arctur, D. K.; Cox, S.; Jackson, I.; Nativi, S.; Wyborn, L. A.

    2012-04-01

    There is growing international consensus on addressing the challenges to cyber(e)-infrastructure for the geosciences. These challenges include: Creating common standards and protocols; Engaging the vast number of distributed data resources; Establishing practices for recognition of and respect for intellectual property; Developing simple data and resource discovery and access systems; Building mechanisms to encourage development of web service tools and workflows for data analysis; Brokering the diverse disciplinary service buses; Creating sustainable business models for maintenance and evolution of information resources; Integrating the data management life-cycle into the practice of science. Efforts around the world are converging towards de facto creation of an integrated global digital data network for the geosciences based on common standards and protocols for data discovery and access, and a shared vision of distributed, web-based, open source interoperable data access and integration. Commonalities include use of Open Geospatial Consortium (OGC) and ISO specifications and standardized data interchange mechanisms. For multidisciplinarity, mediation, adaptation, and profiling services have been successfully introduced to leverage the geosciences standards which are commonly used by the different geoscience communities -introducing a brokering approach which extends the basic SOA archetype. Principal challenges are less technical than cultural, social, and organizational. Before we can make data interoperable, we must make people interoperable. These challenges are being met by increased coordination of development activities (technical, organizational, social) among leaders and practitioners in national and international efforts across the geosciences to foster commonalities across disparate networks. In doing so, we will 1) leverage and share resources, and developments, 2) facilitate and enhance emerging technical and structural advances, 3) promote interoperability across scientific domains, 4) support the promulgation and institutionalization of agreed-upon standards, protocols, and practice, and 5) enhance knowledge transfer not only across the community, but into the domain sciences, 6) lower existing entry barriers for users and data producers, 7) build on the existing disciplinary infrastructures leveraging their service buses. . All of these objectives are required for establishing a permanent and sustainable cyber(e)-infrastructure for the geosciences. The rationale for this approach is well articulated in the AuScope mission statement: "Many of these problems can only be solved on a national, if not global scale. No single researcher, research institution, discipline or jurisdiction can provide the solutions. We increasingly need to embrace e-Research techniques and use the internet not only to access nationally distributed datasets, instruments and compute infrastructure, but also to build online, 'virtual' communities of globally dispersed researchers." Multidisciplinary interoperability can be successfully pursued by adopting a "system of systems" or a "Network of Networks" philosophy. This approach aims to: (a) supplement but not supplant systems mandates and governance arrangements; (b) keep the existing capacities as autonomous as possible; (c) lower entry barriers; (d) Build incrementally on existing infrastructures (information systems); (e) incorporate heterogeneous resources by introducing distribution and mediation functionalities. This approach has been adopted by the European INSPIRE (Infrastructure for Spatial Information in the European Community) initiative and by the international GEOSS (Global Earth Observation System of Systems) programme.

Top